100+ datasets found
  1. Policy Dataset

    • kaggle.com
    Updated Jun 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    sjagkoo7 (2023). Policy Dataset [Dataset]. https://www.kaggle.com/datasets/sjagkoo7/policy
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    sjagkoo7
    Description

    Design a prediction model if a customer having income more than 50000 dollar then need to advise for ploicy. This prediction will help team to take decisions for providing the financial assistance for low income group customers.

  2. Employee Policy Compliance Dataset

    • kaggle.com
    zip
    Updated Dec 24, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Laraib Nadeem (2024). Employee Policy Compliance Dataset [Dataset]. https://www.kaggle.com/datasets/laraibnadeem2023/employee-policy-compliance-dataset
    Explore at:
    zip(66165 bytes)Available download formats
    Dataset updated
    Dec 24, 2024
    Authors
    Laraib Nadeem
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This synthetic dataset has been carefully crafted to simulate policy compliance scenarios in organizations. It contains features relevant to evaluating adherence to regulations, such as compliance status, risk indicators, and operational attributes. The data is intended for research, experimentation, and machine learning applications, particularly in the fields of classification, predictive analytics, and risk assessment.

    The dataset is fully synthetic, ensuring privacy and data security while maintaining realistic patterns and relationships. It provides an excellent starting point for researchers and data scientists exploring policy compliance modeling and related challenges.

    Features:

    The dataset contains 4,000 rows and 12 columns. Below is a detailed description of each feature:

    1. Employee_ID - Unique identifier for each employee.
    2. Name - The name of the employee.
    3. Working_Days - The number of days the employee worked in a given month.
    4. Target_Sales - The sales target assigned to the employee for the month.
    5. Actual_Sales - The actual sales achieved by the employee for the month.
    6. Customer_Satisfaction_Score - A numerical score representing customer satisfaction, ranging from 1 to 5.
    7. Policy_Compliance - Indicates whether the employee complied with company policies. Possible values: Yes, No.
    8. Low_Working_Days - Boolean flag indicating if the employee worked fewer than the required number of days.
    9. Target_Not_Met - Boolean flag indicating if the employee failed to meet their sales target.
    10. Low_Customer_Satisfaction - Boolean flag indicating if the employee's customer satisfaction score was below a threshold.
    11. Non_Compliance_Reason - A text field explaining the reason for non-compliance, if applicable.
    12. Month - The month corresponding to the record

    Purpose:

    This dataset is designed for: * Training machine learning models to predict policy compliance. * Exploring relationships between operational attributes and compliance outcomes. * Generating insights for decision-making and policy optimization.

    Licensing:

    The dataset is released under the CC BY 4.0 license, allowing free use with proper attribution.

  3. Electronic Health Legal Data

    • kaggle.com
    zip
    Updated Jan 29, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). Electronic Health Legal Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/electronic-health-legal-data
    Explore at:
    zip(192951 bytes)Available download formats
    Dataset updated
    Jan 29, 2023
    Authors
    The Devastator
    License

    Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Electronic Health Legal Data

    Exploring Laws and Regulations

    By US Open Data Portal, data.gov [source]

    About this dataset

    This Electronic Health Information Legal Epidemiology dataset offers an extensive collection of legal and epidemiological data that can be used to understand the complexities of electronic health information. It contains a detailed balance of variables, including legal requirements, enforcement mechanisms, proprietary tools, access restrictions, privacy and security implications, data rights and responsibilities, user accounts and authentication systems. This powerful set provides researchers with real-world insights into the functioning of EHI law in order to assess its impact on patient safety and public health outcomes. With such data it is possible to gain a better understanding of current policies regarding the regulation of electronic health information as well as their potential for improvement in safeguarding patient confidentiality. Use this dataset to explore how these laws impact our healthcare system by exploring patterns across different groups over time or analyze changes leading up to new versions or updates. Make exciting discoveries with this comprehensive dataset!

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    • Start by familiarizing yourself with the different columns of the dataset. Examine each column closely and look up any unfamiliar terminology to get a better understanding of what the columns are referencing.

    • Once you understand the data and what it is intended to represent, think about how you might want to use it in your analysis. You may want to create a research question, or narrower focus for your project surrounding legal epidemiology of electronic health information that can be answered with this data set.

    • After creating your research plan, begin manipulating and cleaning up the data as needed in order to prepare it for analysis or visualization as specified in your project plan or research question/model design steps you have outlined .

    4 .Next, perform exploratory data analysis (EDA) on relevant subsets of data from specific countries if needed on specific subsets based on targets of interests (e.g gender). Filter out irrelevant information necessary for drawing meaningful insights; analyze patterns and trends observed in your filtered datasets ; compare areas which have differing rates e-health related rules and regulations tying decisions made by elected officials strongly driven by demographics , socioeconomics factors ,ideology etc.. . Look out for correlations using statistical information as needed throughout all stages in process from filtering out dis-informative subgroups from full population set til generating visualizations(graphs/ diagrams) depicting valid insight leveraging descriptive / predictive models properly validate against reference datasets when available always keep openness principal during gathering info especially when needs requires contact external sources such validating multiple sources work best provide strong seals establishing validity accuracy facts statement representing humans case scenarios digital support suitably localized supporting local languages culture respectively while keeping secure datasets private visible limited particular users duly authorized access 5 Finally create concrete summaries reporting discoveries create share findings preferably infographics showcasing evidence observances providing overall assessment main conclusions protocols developed so far broader community indirectly related interested professionals able benefit those results ideas complete transparently freely adapted locally ported increase overall global society level enhancing potentiality range impact derive conditions allowing wider adoption increased usage diffusion capture wide spread change movement affect global e-health legal domain clear manner

    Research Ideas

    • Studying how technology affects public health policies and practice - Using the data, researchers can look at the various types of legal regulations related to electronic health information to examine any relations between technology and public health decisions in certain areas or regions.
    • Evaluating trends in legal epidemiology – With this data, policymakers can identify patterns that help measure the evolution of electronic health information regulations over time and investigate why such rules are changing within different states or countries.
    • Analysing possible impacts on healthcare costs – Looking at changes in laws, regulations, and standards relate...
  4. hr-policies-qa-dataset

    • kaggle.com
    zip
    Updated Sep 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Syncora_ai (2025). hr-policies-qa-dataset [Dataset]. https://www.kaggle.com/datasets/syncoraai/hr-policies-qa-dataset
    Explore at:
    zip(54895 bytes)Available download formats
    Dataset updated
    Sep 11, 2025
    Authors
    Syncora_ai
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    🏢 HR Policies Q&A Synthetic Dataset

    This synthetic dataset for LLM training captures realistic employee–assistant interactions about HR and compliance policies.
    Generated using Syncora.ai's synthetic data generation engine, it provides privacy-safe, high-quality conversations for training Large Language Models (LLMs) to handle HR-related queries.

    Perfect for researchers, HR tech startups, and AI developers building chatbots, compliance assistants, or policy QA systems — without exposing sensitive employee data.

    🧠 Context & Applications

    HR departments handle countless queries on policies, compliance, and workplace practices.
    This dataset simulates those Q&A flows, making it a powerful dataset for LLM training and research.

    You can use it for:

    • HR chatbot prototyping
    • Policy compliance assistants
    • Internal knowledge base fine-tuning
    • Generative AI experimentation
    • Synthetic benchmarking in enterprise QA systems

    📊 Dataset Features

    ColumnDescription
    roleRole of the message author (system, user, or assistant)
    contentActual text of the message
    messagesGrouped sequence of role–content exchanges (conversation turns)

    Each entry represents a self-contained dialogue snippet designed to reflect natural HR conversations, ideal for synthetic data generation research.

    📦 This Repo Contains

    • HR Policies QA Dataset – JSON format, ready to use for LLM training or evaluation
    • Jupyter Notebook – Explore the dataset structure and basic preprocessing
    • Synthetic Data Tools – Generate your own datasets using Syncora.ai
    • Generate Synthetic Data
      Need more? Use Syncora.ai’s synthetic data generation tool to create custom HR/compliance datasets. Our process is simple, reliable, and ensures privacy.

    🧪 ML & Research Use Cases

    • Policy Chatbots — Train assistants to answer compliance and HR questions
    • Knowledge Management — Fine-tune models for consistent responses
    • Synthetic Data Research — Explore structured dialogue datasets without legal risks
    • Evaluation Benchmarks — Test enterprise AI assistants on HR-related queries
    • Dataset Expansion — Combine this dataset with your own data using synthetic generation

    🔒 Why Syncora.ai Synthetic Data?

    • Zero real-user data → Zero privacy liability
    • High realism → Actionable insights for LLM training
    • Fully customizable → Generate synthetic data tailored to your domain
    • Ethically aligned → Safe and responsible dataset creation

    Whether you're building an HR assistant, compliance bot, or experimenting with enterprise LLMs, Syncora.ai synthetic datasets give you trustworthy, free datasets to start with — and scalable tools to grow further.

    💬 Questions or Contributions?

    Got feedback, research use cases, or want to collaborate?
    Open an issue or reach out — we’re excited to work with AI researchers, HR tech builders, and compliance innovators.

    BOOK A DEMO

    ⚠️ Disclaimer

    This dataset is 100% synthetic and does not represent real employees or organizations.
    It is intended solely for research, educational, and experimental use in HR analytics, compliance automation, and machine learning.

  5. Global Internet Usage by Country (2000-2023)

    • kaggle.com
    zip
    Updated Mar 25, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Melek Nur (2025). Global Internet Usage by Country (2000-2023) [Dataset]. https://www.kaggle.com/datasets/meleknur/global-internet-usage-by-country-2000-2023
    Explore at:
    zip(17617 bytes)Available download formats
    Dataset updated
    Mar 25, 2025
    Authors
    Melek Nur
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset contains country-level internet usage data from 2000 to 2023. It provides the percentage of the population using the internet in different countries over time. This data can be useful for analyzing global internet penetration, digital adoption trends, and technological growth across regions.

    🔹 Dataset Information:

    • Country Name & Code – Identifies each country.
    • Yearly Internet Usage (%) – The percentage of the population using the internet from 2000 to 2023.
    • Missing Values (No Data) – Some countries may have missing data for certain years.

    📈 Potential Use Cases:

    • Trend Analysis: Explore how internet usage has evolved globally.
    • Regional Comparisons: Compare internet adoption across countries and continents.
    • Machine Learning Applications
      • Time-Series Forecasting – Predict future internet usage trends using models like ARIMA, LSTM, or Prophet.
      • Clustering & Segmentation – Group countries based on their internet adoption rates using k-means or hierarchical clustering.
      • Feature Engineering – Use internet penetration as a predictor in socio-economic or technological development models.

    📌 Source:

    Modified from this source World bank group data

    This dataset is valuable for data visualization, time-series analysis, and policy-making research related to digital growth.

  6. World Internet Usage Data (2023 Updated)

    • kaggle.com
    zip
    Updated Dec 21, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kanchana1990 (2024). World Internet Usage Data (2023 Updated) [Dataset]. https://www.kaggle.com/datasets/kanchana1990/world-internet-usage-data-2023-updated
    Explore at:
    zip(3946 bytes)Available download formats
    Dataset updated
    Dec 21, 2024
    Authors
    Kanchana1990
    License

    Open Data Commons Attribution License (ODC-By) v1.0https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    Dataset Overview

    This dataset provides a comprehensive overview of internet usage across countries as of 2024. It includes data on the percentage of the population using the internet, sourced from multiple organizations such as the World Bank (WB), International Telecommunication Union (ITU), and the CIA. The dataset covers all United Nations member states, excluding North Korea, and provides insights into internet penetration rates, user counts, and trends over recent years. The data is derived from household surveys and internet subscription statistics, offering a reliable snapshot of global digital connectivity.

    Data Science Applications

    This dataset can be used in various data science applications, including: - Digital Divide Analysis: Evaluate disparities in internet access between developed and developing nations. - Trend Analysis: Study the growth of internet penetration over time across different regions. - Policy Recommendations: Assist policymakers in identifying underserved areas and strategizing for improved connectivity. - Market Research: Help businesses identify potential markets for digital products or services. - Correlation Studies: Analyze relationships between internet penetration and socioeconomic indicators like GDP, education levels, or urbanization.

    Column Descriptors

    The dataset contains the following columns: 1. Location: Country or region name. 2. Rate (WB): Percentage of the population using the internet (World Bank data). 3. Year (WB): Year corresponding to the World Bank data. 4. Rate (ITU): Percentage of the population using the internet (ITU data). 5. Year (ITU): Year corresponding to the ITU data. 6. Users (CIA): Estimated number of internet users in absolute terms (CIA data). 7. Year (CIA): Year corresponding to the CIA data. 8. Notes: Additional notes or observations about specific entries.

    Ethically Mined Data

    The data has been sourced from publicly available and reputable organizations such as the World Bank, ITU, and CIA. These sources ensure transparency and ethical collection methods through household surveys and official statistics. The dataset excludes North Korea due to limited reliable information on its internet usage.

    Acknowledgements

    This dataset is based on information compiled from: - World Bank - International Telecommunication Union - CIA World Factbook - Wikipedia's "List of countries by number of Internet users" page

    Special thanks to these organizations for providing open access to this valuable information, enabling deeper insights into global digital connectivity trends.

    Citations: [1] https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users [2] https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users

  7. Impact of social media on suicide rates

    • kaggle.com
    zip
    Updated Oct 21, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aadya Singh (2024). Impact of social media on suicide rates [Dataset]. https://www.kaggle.com/datasets/aadyasingh55/impact-of-social-media-on-suicide-rates
    Explore at:
    zip(811 bytes)Available download formats
    Dataset updated
    Oct 21, 2024
    Authors
    Aadya Singh
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0)https://creativecommons.org/licenses/by-nc-sa/3.0/
    License information was derived automatically

    Description

    Impact of Social Media on Suicide Rates: Produced Results

    Overview

    This dataset explores the impact of social media usage on suicide rates, presenting an analysis based on social media platform data and WHO suicide rate statistics. It is an insightful resource for researchers, data scientists, and analysts looking to understand the correlation between increased social media activity and suicide rates across different regions and demographics.

    Content

    The dataset includes the following key sources:

    WHO Suicide Rate Data (SDGSUICIDE): Retrieved from WHO data export, which tracks global suicide rates. Social Media Usage Data: Information from major social media platforms, sourced from Kaggle, supplemented with data from:

    Facebook: Statista

    Twitter: Twitter Investor Relations

    Instagram: Facebook Investor Relations

    Acknowledgements

    We would like to acknowledge:

    World Health Organization (WHO): For providing global suicide rate data, accessible under their data policy (WHO Data Policy). Kaggle Dataset Contributors: For social media usage data that played a crucial role in the analysis.

    Usage

    This dataset is useful for studying the potential social factors contributing to suicide rates, especially the role of social media. Analysts can explore correlations using time-series analysis, regression models, or other statistical tools to derive meaningful insights. Please ensure compliance with the Creative Commons Attribution Non-Commercial Share Alike 4.0 International License (CC BY-NC-SA 4.0).

    Data Files

    Impact-of-social-media-on-suicide-rates-results-1.1.0.zip (90.9 kB) Contains processed results and supplementary data.

    Citations

    If you use this dataset in your work, please cite:

    Martin Winkler. (2021). Impact of social media on suicide rates: produced results (1.1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.4701587 https://zenodo.org/records/4701587

    License

    This dataset is released under the Creative Commons Attribution Non-Commercial Share Alike 4.0 International (CC BY-NC-SA 4.0) license. You are free to share and adapt the material, provided proper attribution is given, it's not used for commercial purposes, and any derivatives are distributed under the same license.

    Columns

    Year: The year of the recorded data. Sex: Demographic indicator (e.g., male, female). Suicide Rate % Change Since 2010: Percentage change in suicide rates compared to the year 2010. Twitter User Count % Change Since 2010: Percentage change in Twitter user counts compared to the year 2010. Facebook User Count % Change Since 2010: Percentage change in Facebook user counts compared to the year 2010.

    Data Bins

    The dataset includes categorized data ranges, allowing for analysis of trends within specified intervals. For example, ranges for suicide rates, Twitter user counts, and Facebook user counts are represented in bins for better granularity.

    Count Summary

    The dataset summarizes counts for various intervals, enabling researchers to identify trends and patterns over time, highlighting periods of significant change or stability in both suicide rates and social media usage.

    Use Cases

    This dataset can be used for:

    Statistical analysis to understand correlations between social media usage and mental health outcomes. Academic research focused on public health, psychology, or sociology. Policy-making discussions aimed at addressing mental health concerns linked to social media.

    Cautions

    The dataset contains sensitive information regarding suicide rates. Users should handle this data with care and sensitivity, considering ethical implications when presenting findings.

  8. Personal Location Data Market

    • kaggle.com
    zip
    Updated Nov 15, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2022). Personal Location Data Market [Dataset]. https://www.kaggle.com/datasets/thedevastator/location-data-companies-a-comprehensive-survey
    Explore at:
    zip(8520 bytes)Available download formats
    Dataset updated
    Nov 15, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Personal Location Data Market

    Data From: "There’s a Multibillion-Dollar Market for Your Phone’s Location Data"

    Original source. Author: The Markup

    About this dataset

    There’s a multibillion-dollar market for your phone’s location data. We surveyed 100 companies to find out who they are, what they do with your data, and whether they follow best practices.

    Your phone’s location is constantly being tracked and collected by hundreds of companies, many of which are unknown to you. This data is valuable—and it’s being bought and sold in a thriving industry with little regulation.

    The Markup surveyed 100 companies that collect or sell location data to get a better understanding of this industry and what it means for your privacy. We asked these companies about their policies and practices around collecting, using, and selling location data. We also reviewed their public statements and website disclosures related to privacy.

    What we found was an industry that lacks transparency and accountability, with few companies following best practices around protecting the privacy of their users’ data. In many cases, these companies are collecting more data than they need, retaining it for longer than necessary, sharing it with third parties without user consent, or failing to secure it properly—putting users at risk of identity theft, fraud, or other harms.

    If you care about your privacy, you should know who has access to your location data—and what they’re doing with it. This dataset contains information on the 100 companies we surveyed so that you can make informed choices about which ones to trust with your personal data

    How to use the dataset

    This dataset contains information on companies that collect and sell location data. The data includes the company name, website, logo, narrative, company response, privacy email, privacy policy, and whether or not the company is a California-licensed data broker

    Research Ideas

    • To study how location data is collected and sold
    • To understand the business model of location data companies
    • To learn about the privacy policies of these companies

    Acknowledgements

    This dataset was compiled and analyzed by The Markup. The Markup is a nonprofit newsroom that investigates how powerful institutions impact our lives

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: location-data-companies.csv | Column name | Description | |:-------------------|:--------------------------------------------------------------------| | name | The name of the company. (String) | | website | The company's website. (String) | | logo | The company's logo. (String) | | narrative | A description of the company. (String) | | privacy_email | The company's privacy email address. (String) | | privacy_policy | The company's privacy policy. (String) | | CA_broker | Whether the company is a California-licensed data broker. (Boolean) |

  9. CityTrek-14K

    • kaggle.com
    zip
    Updated Jan 13, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sobhan Moosavi (2024). CityTrek-14K [Dataset]. https://www.kaggle.com/datasets/sobhanmoosavi/citytrek-14k
    Explore at:
    zip(182314065 bytes)Available download formats
    Dataset updated
    Jan 13, 2024
    Authors
    Sobhan Moosavi
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Description

    CityTrek-14K is a distinctive, extensive dataset that includes 14,000 trajectories from 280 drivers, each contributing 50 trajectories, in three major U.S. cities: Philadelphia (PA), Atlanta (GA), and Memphis (TN). It features a time series data set capturing details like timestamps, vehicle speeds, and GPS coordinates, with a collection frequency of 1Hz. Although the dataset includes location data, strict anonymization practices were adhered to, ensuring personal information like home or work addresses remain confidential. The CityTrek-14K dataset offers a comprehensive view of driving patterns, encompassing over 4,800 hours of driving data and spanning more than 189,000 miles, collected between July 2017 and March 2019. The dataset comprises two distinct files: the first is a summary of the trips, and the second is a trajectory data file that includes detailed records captured every second.

    Acknowledgements

    If you use this dataset, please kindly cite the following paper: - Moosavi, Sobhan, and Rajiv Ramnath. "Context-aware driver risk prediction with telematics data." Accident Analysis & Prevention 192 (2023): 107269.

    Data Collection Methodology

    The CityTrek-14K dataset was collected using specially designed devices installed in vehicles. These devices were configured to record and transmit data frequently. Further details about this data collection process are elaborated in the paper mentioned above.

    Potential Applications

    The CityTrek-14K dataset is versatile, suitable for numerous applications such as: - Traffic Modeling and ETA Prediction: The dataset contains detailed route information and travel times, making it an excellent resource for large-scale traffic modeling and ETA modeling techniques. - Route Optimization: With its detailed trajectory data, the dataset is ideal for developing and testing route optimization techniques, providing insights into efficient pathfinding methods. - Modeling and Analyzing Driver Behavior: As each driver in the dataset has exactly 50 trajectories recorded, this allows for a comprehensive analysis of driver behavior, offering a unique opportunity to study and model driving patterns and habits.

    Usage Policy and Legal Disclaimer

    This dataset is being distributed solely for research purposes under the Creative Commons Attribution-Noncommercial-ShareAlike license (CC BY-NC-SA 4.0). By downloading the dataset, you agree to use it only for non-commercial, research, or academic applications. If you use this dataset, it is necessary to cite the paper mentioned above.

    Inquiries or need help?

    For any inquiries or assistance, please contact Sobhan Moosavi at sobhan.mehr84@gmail.com

  10. UNICON Energy Consumption Dataset

    • kaggle.com
    zip
    Updated Nov 9, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CDAClab (2022). UNICON Energy Consumption Dataset [Dataset]. https://www.kaggle.com/datasets/cdaclab/unicon
    Explore at:
    zip(148437018 bytes)Available download formats
    Dataset updated
    Nov 9, 2022
    Authors
    CDAClab
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    UNICON, a large-scale open dataset on UNIversity CONsumption of utilities, electricity, gas and water. This dataset is publicly released as part of La Trobe University’s commitment to Net Zero Carbon Emissions by 2029, for which we are building the La Trobe Energy AI/Analytics Platform (LEAP) that leverages Artificial Intelligence (AI) and Data Analytics to analyse, predict and optimize the consumption, generation and utilization of electricity, renewables, gas and water resources. UNICON contains consumption data for La Trobe’s five campuses in geographically distributed regions, across four years, 2018-2021 inclusive. This includes the COVID-19 global pandemic timeline of university shutdown and work from home measures that led to a significant decrease in the consumption of utilities. The consumption data consists of smart electricity meter readings at 15-minute granularity, gas meter readings at hourly intervals and water meter readings at 15-minute intervals. UNICON also contains weather data from the closest weather station to each campus, collected at two-speed latency of 1 minute and 10 minutes. The dataset is annotated with internal events of significance, such as energy conservation measures (ECMs) and other measurement and validation (M&V) activities conducted as part of LEAP optimization. To the best of our knowledge, this is the first large-scale, comprehensive, open dataset for the three main utilities, electricity, gas, and water consumption in a multi-campus university setting.

    Dataset file descriptions

    • campus_meta.csv – This file contains information about each campus in the university network.
    • nmi_meta.csv – Information about NMIs such as campus location and peak demand is listed in this file.
    • building_meta.csv – This file contains meta information about buildings in each campus which include campus location, floor area and etc.
    • calender.csv – University calendar for the data collection period is included in this file.
    • events.csv – There are series of events happened at each building which include energy efficiency projects such as LED installation and HVAC system updates. This file contains the dates related to each event at building level.
    • nmi_consumption.csv – Consumption data of NMIs are recorded in this file.
    • building_consumption.csv – Consumption data of buildings are recorded in this file.
    • building_submeter_consumption.csv – Consumption data of building sub-meters are recorded in this file.
    • gas_consumption.csv – Gas consumption data of available campuses are recorded in this file.
    • water_consumption.csv – Water consumption data of available campuses are recorded in this file.
    • weather_data.csv – Weather data collected from respective weather stations.

    Acknowledgements

    Please cite the following paper if you use this dataset:

    • H. Moraliyage, N. Mills, P. Rathnayake, D. De Silva and A. Jennings, "UNICON: An Open Dataset of Electricity, Gas and Water Consumption in a Large Multi-Campus University Setting," 2022 15th International Conference on Human System Interaction (HSI), 2022, pp. 1-8, https://doi.org/10.1109/HSI55341.2022.9869498

    Usage Policy and Legal Disclaimer

    This dataset is being distributed only for Research purposes, under Creative Commons Attribution-Noncommercial-ShareAlike license (CC BY-NC-SA 4.0). By clicking on download button(s) below, you are agreeing to use this data only for non-commercial, research, or academic applications. You may need to cite the above papers if you use this dataset.

    Github: https://github.com/CDAC-lab/UNICON

  11. Data from: Tobacco control

    • kaggle.com
    zip
    Updated Jul 11, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Prathamesh keote (2024). Tobacco control [Dataset]. https://www.kaggle.com/datasets/shreyaskeote23/tobacco-control/data
    Explore at:
    zip(220365 bytes)Available download formats
    Dataset updated
    Jul 11, 2024
    Authors
    Prathamesh keote
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides comprehensive data on global tobacco control, encompassing various aspects such as usage statistics, policy enforcement, health warnings, and retail prices. The files included are:

    Age-standardized estimates of current tobacco use, tobacco smoking, and cigarette smoking: Statistics on tobacco usage adjusted for age differences. Enforce bans on tobacco advertising: Data on the enforcement of bans on tobacco advertising across different countries. Health warnings on cigarette packages: Information on the prevalence and effectiveness of health warnings on cigarette packages. MPOWER Overview: An overview of the MPOWER measures (Monitor, Protect, Offer, Warn, Enforce, Raise) implemented globally. Non-age-standardized estimates of current tobacco use, tobacco smoking, and cigarette smoking: Raw statistics on tobacco usage without age adjustment. Retail price for a pack of 20 cigarettes: Data on the retail price of a 20-cigarette pack in various regions.

  12. North Carolina Social and Human Services Dataset

    • kaggle.com
    zip
    Updated May 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Varun Deepak Gudhe (2024). North Carolina Social and Human Services Dataset [Dataset]. https://www.kaggle.com/datasets/varundeepakgudhe/north-carolina-social-and-human-services-dataset
    Explore at:
    zip(1460450 bytes)Available download formats
    Dataset updated
    May 3, 2024
    Authors
    Varun Deepak Gudhe
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    North Carolina
    Description

    This dataset encompasses comprehensive social and human services data for North Carolina, offering insights into public assistance, child services, vocational rehabilitation, and transfer payments across state and county levels. Each entry delineates specific services within various geographical areas, classified by type, for each year recorded. This rich dataset enables a deep dive into the trends and distributions of social services, assisting in policy-making and community support initiatives.

    Suggested usage

    1. Policy Development and Evaluation:

      • Government agencies and policymakers can utilize the data to assess the effectiveness of current social service programs and to design new policies. Analyzing trends over time can help identify needs and allocate resources more effectively.
    2. Academic Research:

      • Researchers in social sciences, public health, and economics could use the dataset to study the impact of social services on various demographics within North Carolina. This can lead to scholarly articles, studies on social welfare, and the development of new theories in social service provision.
    3. Community Planning:

      • Local government planners and community organizations can use the dataset to better understand the distribution of services such as child services and vocational rehabilitation, and plan community resources accordingly.
    4. Grant Writing and Funding Applications:

      • Non-profit organizations can use detailed data to justify the need for funding in grant applications. By showing specific needs within communities, they can target their proposals to address gaps in services.
    5. Public Awareness and Advocacy:

      • Advocacy groups can use the data to raise public awareness about the state of social services in North Carolina. This can drive campaigns for enhanced funding or changes in how services are delivered.
    6. Economic Analysis:

      • Economists could explore the dataset to correlate the investment in social services with economic outcomes like employment rates, economic mobility, and community health indicators.
  13. US Accidents Dataset (2016 - 2023) (49 states)

    • kaggle.com
    zip
    Updated Aug 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ML_GOD_SIDDHARTH (2024). US Accidents Dataset (2016 - 2023) (49 states) [Dataset]. https://www.kaggle.com/datasets/mlgodsiddharth/usa-accidents-dataset49-states-subset-of
    Explore at:
    zip(206667276 bytes)Available download formats
    Dataset updated
    Aug 16, 2024
    Authors
    ML_GOD_SIDDHARTH
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    This is a subset of dataset https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents with some columns dropped making it easier for practicing EDA
    Acknowledgements If you use this dataset, please kindly cite the following papers:

    Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, and Rajiv Ramnath. “A Countrywide Traffic Accident Dataset.”, 2019.

    Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, Radu Teodorescu, and Rajiv Ramnath. "Accident Risk Prediction based on Heterogeneous Sparse Data: New Dataset and Insights." In proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM, 2019.

    Content This dataset was collected in real-time using multiple Traffic APIs. It contains accident data collected from February 2016 to March 2023 for the Contiguous United States. For more details about this dataset, please visit [here].

    Inspiration The US-Accidents dataset can be used for numerous applications, such as real-time car accident prediction, studying car accident hotspot locations, casualty analysis, extracting cause and effect rules to predict car accidents, and studying the impact of precipitation or other environmental stimuli on accident occurrence. The most recent release of the dataset can also be useful for studying the impact of COVID-19 on traffic behavior and accidents.

    Sampled Data (New!) For those requiring a smaller, more manageable dataset, a sampled version is available which includes 500,000 accidents. This sample is extracted from the original dataset for easier handling and analysis.

    Other Details Please note that the dataset may be missing data for certain days, which could be due to network connectivity issues during data collection. Regrettably, the dataset will no longer be updated, and this version should be considered the latest.

    Usage Policy and Legal Disclaimer This dataset is being distributed solely for research purposes under the Creative Commons Attribution-Noncommercial-ShareAlike license (CC BY-NC-SA 4.0). By downloading the dataset, you agree to use it only for non-commercial, research, or academic applications. If you use this dataset, it is necessary to cite the papers mentioned above.

    Inquiries or need help? For any inquiries or assistance, please contact Sobhan Moosavi at sobhan.mehr84@gmail.com

  14. UNISOLAR Solar Power Generation Dataset

    • kaggle.com
    zip
    Updated Nov 9, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CDAClab (2022). UNISOLAR Solar Power Generation Dataset [Dataset]. https://www.kaggle.com/datasets/cdaclab/unisolar/suggestions
    Explore at:
    zip(15462044 bytes)Available download formats
    Dataset updated
    Nov 9, 2022
    Authors
    CDAClab
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    UNISOLAR dataset contains high-granularity Photovoltaic (PV) solar energy generation, solar irradiance, and weather data from 42 PV sites deployed across five campuses at La Trobe University, Victoria, Australia. The dataset includes approximately two years of PV solar energy generation data collected at 15-minute intervals. Geographical placement and engineering specifications for each of the sites are also provided to aid researchers in modellin solar energy generation. Weather data is available at 1-minute intervals and is provided by the Australian Bureau of Meteorology (BOM). Apparent temperature, air temperature, dew point temperature, relative humidity, wind speed, and wind direction were provided under the weather data. The paper describes the data collection methods, cleaning, and merging with weather data. This dataset can be used to forecast, benchmark, and enhance operational outcomes in solar sites.

    Acknowledgements

    Please cite the following paper if you use this dataset:

    • S. Wimalaratne, D. Haputhanthri, S. Kahawala, G. Gamage, D. Alahakoon and A. Jennings, "UNISOLAR: An Open Dataset of Photovoltaic Solar Energy Generation in a Large Multi-Campus University Setting," 2022 15th International Conference on Human System Interaction (HSI), 2022, pp. 1-5, doi: 10.1109/HSI55341.2022.9869474.

    Usage Policy and Legal Disclaimer

    This dataset is being distributed only for Research purposes, under Creative Commons Attribution-Noncommercial-ShareAlike license (CC BY-NC-SA 4.0). By clicking on download button(s) below, you are agreeing to use this data only for non-commercial, research, or academic applications. You may need to cite the above papers if you use this dataset.

    Github: https://github.com/CDAC-lab/UNISOLAR

  15. Insurance Policy Assets, Liabilities, and Premiums

    • kaggle.com
    Updated Jan 7, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). Insurance Policy Assets, Liabilities, and Premiums [Dataset]. https://www.kaggle.com/datasets/thedevastator/ny-insurance-policy-assets-liabilities-and-premi
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 7, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    NY Insurance Policy Assets, Liabilities, and Premiums Annually

    Investigating the Impact of Financial Health on Health Insurance Costs

    By State of New York [source]

    About this dataset

    This dataset tracks health insurance premiums written in New York annually since 2004. It provides vital insight into the amount of money and risk taken on by insurance companies in the state: including what types of insurers are writing policies, how much they are taking on in assets and liabilities, and how this has shifted over time. This data will be invaluable to those looking to understand large scale trends in terms of the health insurance industry. The data has been updated as recently as 2021, so it provides a comprehensive picture of changes year-over-year spanning nearly two decades

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    This dataset contains vital information regarding health insurance premiums, assets and liabilities related to policies written in New York annually. It is designed to provide key insights into the performance of insurance companies in New York state.

    The data consists of Type of Insurer, Company Name, Year, Assets, Liabilities and Premium Written for each policy written in every year since 2009. This data can be used to gain greater insight into the performance of certain companies within this industry over time as well as creating benchmarked comparison metrics against other companies within this market space.

    For individual or team exploration projects – you may want to compare one company’s yearly assets/liabilities or premiums against the average value for that same period in order to identify high or low performing periods or take a look at how some variables changed across a 5 year (or wider) timescale e.g compare how did assets/liabilites changed over the duration of 5 years?

    By utilizing basic data visualizations like scatterplots and bar graphs we can start gaining more insights from our analysis by looking at potential correlations between variables such as: Are premium prices related to their assets? Does company size have an impact on the premium price? Have liabilities remained constant compared with past years?

    Administrators in management roles could also use this dataset to track yearly changes within their own companys results- such as tracking existing trends over longer periods with pay attention for changes which require further investigation/ research as necessary .

    All in all this data set is a great tool for students , researchers & analysts alike!

    Research Ideas

    • Establishing a baseline of average health insurance premiums in New York by year across different insurers.
    • Comparing insurance company assets and liabilities with their premium-written to provide an understanding of how profitable they are in the New York market.
    • Tracking the growth and success of health insurers in the New York over time to understand changes in industry trends or policy standards

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: health-insurance-premiums-on-policies-written-in-new-york-annually-1.csv | Column name | Description | |:--------------------|:--------------------------------------------------------------------------------------------------------------------------------| | Type of Insurer | This column indicates the type of insurer that wrote the policy. (String) | | Company Name | This column indicates the name of the company that wrote the policy. (String) | | Year | This column indicates the year that the policy was written in. (Integer) | | Assets | This column indicates the total assets of the company that wrote the policy. (Integer) | | Liabilities | This column indicates the total liabilities of the company that wrote the policy. (Integer) | | Premium Written | This column indicates the total amount paid by an individual or organization for a given product or service annually. (Integer) |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit State of New York.

  16. Product Retail Prices per month from 2017-2025

    • kaggle.com
    zip
    Updated Apr 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aradhana Hirapara (2025). Product Retail Prices per month from 2017-2025 [Dataset]. https://www.kaggle.com/datasets/aradhanahirapara/product-retail-price-survey-2017-2025
    Explore at:
    zip(2543973 bytes)Available download formats
    Dataset updated
    Apr 13, 2025
    Authors
    Aradhana Hirapara
    Description

    This dataset contains monthly retail price data for a wide range of consumer products sold in various Canadian provinces over several years. It has been enriched with tax, category, and classification metadata for deeper insights.

    Usefulness of the Dataset

    This dataset can be used for:

    Use CaseDescription
    Price Trend AnalysisTrack price movements over time, province, and product category.
    Inflation StudiesExamine inflation on essentials vs non-essentials over time.
    Regional Price ComparisonAnalyze cost disparities for the same goods across provinces.
    Tax Policy ImpactUnderstand how tax laws affect consumer pricing by region.
    Budget OptimizationIdentify high-cost vs low-cost essentials for better planning.
    Machine Learning IntegrationUse in models for price prediction or consumer segmentation.

    Purpose and Use Cases

    This dataset is ideal for:

    🏛️ Policy Analysis

    Understand how federal and provincial taxes shape price access — especially for essentials like milk, bread, or medications.

    🧍‍♀️ Consumer Insights

    See how costs for personal care, food, and baby goods evolve month-over-month in each region.

    💸 Inflation & Seasonality

    Analyze how monthly or yearly trends (e.g., holiday spikes or inflation events) affect product pricing.

    🌍 Social Impact Studies

    Measure product accessibility gaps between provinces for low-income consumers or high-tax regions.

    🛍️ Retail & Budget Planning

    Guide families, retailers, or policymakers on where and when to buy or subsidize certain products.

  17. Environmental and Energy Policy Impacts: OECD

    • kaggle.com
    zip
    Updated Jun 2, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ioana Birlan (2024). Environmental and Energy Policy Impacts: OECD [Dataset]. https://www.kaggle.com/datasets/ioanabirlan/green-growth
    Explore at:
    zip(1903 bytes)Available download formats
    Dataset updated
    Jun 2, 2024
    Authors
    Ioana Birlan
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset provides comprehensive panel data (2000-2019) on various environmental and energy metrics across two regions, focusing on indicators that influence CO2 emissions. It includes data from OECD statistics and encompasses variables that serve as indicators of smart urban development and governmental policies. Researchers and policymakers can use this dataset to analyze the impact of various factors on CO2 emissions and to compare the effectiveness of environmental policies between OECD countries in Europe and America. The dataset is valuable for exploring significant differences and similarities in environmental and energy policies, municipal waste management, renewable energy adoption, and technology development.

    Variables included: - Production-based CO2 emissions - Total primary energy supply - Renewable energy supply, % total energy supply - Municipal waste recycled or composted, % treated waste - Mortality from exposure to ambient PM2.5 - Welfare costs of premature mortalities from exposure to ambient PM2.5 - Development of environment-related technologies, % all technologies - Relative advantage in environment-related technology - Environmentally related taxes, % GDP - Terrestrial protected area, % land area - Population density inhabitants per km²

    Usage: This dataset is ideal for analyzing the determinants of CO2 emissions and understanding the effectiveness of different environmental policies. Users can explore:

    • The relationship between renewable energy supply and CO2 emissions.
    • Comparative analysis of CO2 emissions across OECD countries in Europe and America.
    • Evaluation of the effectiveness of environmental policies and their economic implications.
    • Assessment of the role of renewable energy and technology development in reducing emissions.
    • Exploration of smart urban development indicators and their impact.
    • The impact of municipal waste management practices on environmental outcomes.
    • Differences in environmental technology development between European and American OECD countries.
    • The influence of population density and urban development on CO2 emissions.
    • Statistical tests to test significant difference between the two regions.
    • Forecasting emissions for both regions
    • Scenario tests such as Monte Carlo Simulations

    Source: The data is sourced from OECD Statistics

  18. World Bank Indicators (1960‑Present)

    • kaggle.com
    zip
    Updated May 29, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    George DiNicola (2025). World Bank Indicators (1960‑Present) [Dataset]. https://www.kaggle.com/datasets/georgejdinicola/world-bank-indicators
    Explore at:
    zip(52559856 bytes)Available download formats
    Dataset updated
    May 29, 2025
    Authors
    George DiNicola
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset provides a comprehensive collection of time series data sourced from the World Bank Open Data Platform, covering a wide range of global indicators from 1960 to the most recently published year. It includes economic, social, environmental, and demographic metrics, making it an ideal resource for researchers, data scientists, and policymakers interested in global development trends, economic forecasting, or socio-economic analysis.

    A tutorial on how to combined the dataset topics together into one large dataset can be found here

    Why this Dataset?

    My motivation for this project was to curate a high-quality collection of datasets for World Bank indicators organized by topics and structured in time-series, making them more accessible for data science projects. Since the World Bank’s Kaggle datasets have not been updated since 2019 https://www.kaggle.com/organizations/theworldbank, I saw an opportunity to provide more current data for the data analysis community.

    Dataset Collection Contents

    This collection brings together more than 800 World Bank indicators organized into 18 topic‑specific CSV files. Each file is structured as a country‑year panel: every row represents a unique combination of year (1960‑present) and ISO‑3 country code, while the columns hold the topic’s indicators.

    The collection includes datasets with a variety of indicators, such as: - Economic Metrics: GDP growth (%), GDP per capita, consumer price inflation, merchandise trade, gross capital formation, and more.
    - Social Metrics: School enrollment (primary, secondary, tertiary), infant mortality rate, maternal mortality rate, poverty headcount, and more.
    - Environmental Metrics: Forest area, renewable energy consumption, food production indices, and more.
    - Demographic Metrics: Urban population, life expectancy, net migration, and more.

    Usage

    This dataset is ideal for a variety of applications, including: - Economic forecasting and trend analysis (e.g., GDP growth, inflation).
    - Socio-economic studies (e.g., education, health, poverty).
    - Environmental impact analysis (e.g., renewable energy adoption).
    - Demographic research (e.g., population trends, migration).

    Topic datasets can be merged with each other using year and country code. This tutorial with notebook code can help you get started quickly.

    Collection Methodology

    The data is collected via a custom software application that discovers and groups high-quality indicators with rules-based logic & artificial intelligence, generates metadata, and performs ETL for the data from the World Bank API. The result is a clean, up‑to‑date collection of World Bank indicators in time-series format that is ready for analysis—no manual downloads or data wrangling required.

    Modifications

    The original World Bank data has been aggregated and transformed for ease of use. Missing values have been preserved as provided by the World Bank, and no significant transformations have been applied beyond formatting and aggregation into a single file.

    Source & Attribution

    The World Bank: World Development Indicators

    This dataset is publicly available and sourced from the World Bank Open Data Platform and is made available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. When using this data, please attribute the World Bank as follows: "Data sourced from the World Bank, licensed under CC BY 4.0." For more details on the World Bank’s terms of use, visit: https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets.

    License

    This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    Feel free to use this data in Kaggle notebooks, academic research, or policy analysis. If you create a derived dataset or analysis, I encourage you to share it with the Kaggle community.

  19. NYC Open Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    NYC Open Data (2019). NYC Open Data [Dataset]. https://www.kaggle.com/datasets/nycopendata/new-york
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    NYC Open Data
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    NYC Open Data is an opportunity to engage New Yorkers in the information that is produced and used by City government. We believe that every New Yorker can benefit from Open Data, and Open Data can benefit from every New Yorker. Source: https://opendata.cityofnewyork.us/overview/

    Content

    Thanks to NYC Open Data, which makes public data generated by city agencies available for public use, and Citi Bike, we've incorporated over 150 GB of data in 5 open datasets into Google BigQuery Public Datasets, including:

    • Over 8 million 311 service requests from 2012-2016

    • More than 1 million motor vehicle collisions 2012-present

    • Citi Bike stations and 30 million Citi Bike trips 2013-present

    • Over 1 billion Yellow and Green Taxi rides from 2009-present

    • Over 500,000 sidewalk trees surveyed decennially in 1995, 2005, and 2015

    This dataset is deprecated and not being updated.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://opendata.cityofnewyork.us/

    https://cloud.google.com/blog/big-data/2017/01/new-york-city-public-datasets-now-available-on-google-bigquery

    This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofnewyork.us/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    By accessing datasets and feeds available through NYC Open Data, the user agrees to all of the Terms of Use of NYC.gov as well as the Privacy Policy for NYC.gov. The user also agrees to any additional terms of use defined by the agencies, bureaus, and offices providing data. Public data sets made available on NYC Open Data are provided for informational purposes. The City does not warranty the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set made available on NYC Open Data, nor are any such warranties to be implied or inferred with respect to the public data sets furnished therein.

    The City is not liable for any deficiencies in the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set, or application utilizing such data set, provided by any third party.

    Banner Photo by @bicadmedia from Unplash.

    Inspiration

    On which New York City streets are you most likely to find a loud party?

    Can you find the Virginia Pines in New York City?

    Where was the only collision caused by an animal that injured a cyclist?

    What’s the Citi Bike record for the Longest Distance in the Shortest Time (on a route with at least 100 rides)?

    https://cloud.google.com/blog/big-data/2017/01/images/148467900588042/nyc-dataset-6.png" alt="enter image description here"> https://cloud.google.com/blog/big-data/2017/01/images/148467900588042/nyc-dataset-6.png

  20. Insurance Dataset Based on Real-World Statistics

    • kaggle.com
    zip
    Updated Jan 19, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    SamiAlyasin (2025). Insurance Dataset Based on Real-World Statistics [Dataset]. https://www.kaggle.com/datasets/samialyasin/insurance-data-personal-auto-line-of-business
    Explore at:
    zip(157388 bytes)Available download formats
    Dataset updated
    Jan 19, 2025
    Authors
    SamiAlyasin
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    World
    Description

    This dataset is a synthetic yet realistic representation of personal auto insurance data, crafted using real-world statistics. While actual insurance data is sensitive and unavailable for public use, this dataset bridges the gap by offering a safe and practical alternative for building robust data science projects.

    Why This Dataset? - Realistic Foundation: Synthetic data generated from real-world statistical patterns ensures practical relevance. - Safe for Use: No personal or sensitive information—completely anonymized and compliant with data privacy standards. - Flexible Applications: Ideal for testing models, developing prototypes, and showcasing portfolio projects.

    How You Can Use It: - Build machine learning models for predicting customer conversion and retention. - Design risk assessment tools or premium optimization algorithms. - Create dashboards to visualize trends in customer segmentation and policy data. - Explore innovative solutions for the insurance industry using a realistic data foundation.

    This dataset empowers you to work on real-world insurance scenarios without compromising on data sensitivity.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
sjagkoo7 (2023). Policy Dataset [Dataset]. https://www.kaggle.com/datasets/sjagkoo7/policy
Organization logo

Policy Dataset

Predict - Whether Policy can be offered or not

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 4, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
sjagkoo7
Description

Design a prediction model if a customer having income more than 50000 dollar then need to advise for ploicy. This prediction will help team to take decisions for providing the financial assistance for low income group customers.

Search
Clear search
Close search
Google apps
Main menu