100+ datasets found
  1. Artificial Intelligence (AI) awareness, use and impact, Great Britain

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Jun 16, 2023
    Cite
    Office for National Statistics (2023). Artificial Intelligence (AI) awareness, use and impact, Great Britain [Dataset]. https://www.ons.gov.uk/businessindustryandtrade/itandinternetindustry/datasets/artificialintelligenceaiawarenessuseandimpactgreatbritain
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    United Kingdom
    Description

    Data from the Opinion and Lifestyle Survey (OPN) on the use of Artificial Intelligence (AI) and how people feel about its uptake in today’s society.

  2. Nationwide real-world implementation of AI for cancer detection in...

    • search.dataone.org
    • datadryad.org
    Updated Jan 7, 2025
    Cite
    Nora Eisemann; Stefan Bunk; Trasias Mukama; Hannah Baltus; Susanne Elsner; Timo Gomille; Gerold Hecht; Sylvia Heywang-Köbrunner; Regine Rathmann; Katja Siegmann-Luz; Thilo Töllner; Toni Werner Vomweg; Christian Leibig; Alexander Katalinic (2025). Nationwide real-world implementation of AI for cancer detection in population-based mammography screening (PRAIM) [Dataset]. http://doi.org/10.5061/dryad.zs7h44jgn
    Explore at:
    Dataset updated
    Jan 7, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Nora Eisemann; Stefan Bunk; Trasias Mukama; Hannah Baltus; Susanne Elsner; Timo Gomille; Gerold Hecht; Sylvia Heywang-Köbrunner; Regine Rathmann; Katja Siegmann-Luz; Thilo Töllner; Toni Werner Vomweg; Christian Leibig; Alexander Katalinic
    Description

    The PRAIM study (PRospective multicenter observational study of an integrated AI system with live Monitoring) assessed the impact of an AI-based decision support software on breast cancer screening outcomes. This Dryad data package contains the anonymized data from 461 818 screening cases across 12 screening sites in Germany. Variables include screening outcomes like cancer detection, use of AI software, radiologist assessments, cancer characteristics, and further metadata. The data can be used to reproduce the analyses on performance of AI-supported breast cancer screening versus standard of care published in Nature Medicine: Nationwide real-world implementation of AI for cancer detection in population-based mammography screening.

    # Nationwide real-world implementation of AI for cancer detection in population-based mammography screening (PRAIM) – Dataset

    The PRAIM study (PRospective multicenter observational study of an integrated Artificial Intelligence system with live Monitoring) was a study conducted within the German breast cancer screening program from July 2021 to February 2023 to assess the impact of an AI-based decision support software. This dataset contains the data from PRAIM.

    Context

    The PRAIM study has been published in Nature Medicine. Please refer to the article Nationwide real-world implementation of AI for cancer detection in population-based mammography screening for further information on study design, results, and discussion of impact. The study has been previously registered in the German Clinical Trials Register and the study protocol can be found on the [website of the Univ...

  3. Data_Sheet_1_Data and model bias in artificial intelligence for healthcare...

    • frontiersin.figshare.com
    zip
    Updated Jun 3, 2023
    Cite
    Vithya Yogarajan; Gillian Dobbie; Sharon Leitch; Te Taka Keegan; Joshua Bensemann; Michael Witbrock; Varsha Asrani; David Reith (2023). Data_Sheet_1_Data and model bias in artificial intelligence for healthcare applications in New Zealand.zip [Dataset]. http://doi.org/10.3389/fcomp.2022.1070493.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Vithya Yogarajan; Gillian Dobbie; Sharon Leitch; Te Taka Keegan; Joshua Bensemann; Michael Witbrock; Varsha Asrani; David Reith
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    New Zealand
    Description

    Introduction

    Developments in Artificial Intelligence (AI) are adopted widely in healthcare. However, the introduction and use of AI may come with biases and disparities, resulting in concerns about healthcare access and outcomes for underrepresented indigenous populations. In New Zealand, Māori experience significant inequities in health compared to the non-Indigenous population. This research explores equity concepts and fairness measures concerning AI for healthcare in New Zealand.

    Methods

    This research considers data and model bias in NZ-based electronic health records (EHRs). Two very distinct NZ datasets are used in this research, one obtained from one hospital and another from multiple GP practices, where clinicians obtain both datasets. To ensure research equality and fair inclusion of Māori, we combine expertise in Artificial Intelligence (AI), New Zealand clinical context, and te ao Māori. The mitigation of inequity needs to be addressed in data collection, model development, and model deployment. In this paper, we analyze data and algorithmic bias concerning data collection and model development, training and testing using health data collected by experts. We use fairness measures such as disparate impact scores, equal opportunities and equalized odds to analyze tabular data. Furthermore, token frequencies, statistical significance testing and fairness measures for word embeddings, such as WEAT and WEFE frameworks, are used to analyze bias in free-form medical text. The AI model predictions are also explained using SHAP and LIME.

    Results

    This research analyzed fairness metrics for NZ EHRs while considering data and algorithmic bias. We show evidence of bias due to the changes made in algorithmic design. Furthermore, we observe unintentional bias due to the underlying pre-trained models used to represent text data. This research addresses some vital issues while opening up the need and opportunity for future research.

    Discussions

    This research takes early steps toward developing a model of socially responsible and fair AI for New Zealand's population. We provided an overview of reproducible concepts that can be adopted toward any NZ population data. Furthermore, we discuss the gaps and future research avenues that will enable more focused development of fairness measures suitable for the New Zealand population's needs and social structure. One of the primary focuses of this research was ensuring fair inclusions. As such, we combine expertise in AI, clinical knowledge, and the representation of indigenous populations. This inclusion of experts will be vital moving forward, proving a stepping stone toward the integration of AI for better outcomes in healthcare.
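
    Two of the tabular fairness measures mentioned above can be illustrated with a minimal Python sketch. It computes a disparate impact ratio and an equal opportunity difference on made-up binary predictions. The group labels "a" and "b", the toy data, and the function names are hypothetical; none of them come from the study's datasets or code.

```python
from typing import List, Sequence


def _rate(values: Sequence[int]) -> float:
    """Share of positive (1) entries in a non-empty binary sequence."""
    if not values:
        raise ValueError("cannot compute a rate over an empty group")
    return sum(values) / len(values)


def disparate_impact(preds: Sequence[int], groups: Sequence[str],
                     protected: str, reference: str) -> float:
    """Ratio of positive-prediction rates: protected group / reference group."""
    prot = [p for p, g in zip(preds, groups) if g == protected]
    ref = [p for p, g in zip(preds, groups) if g == reference]
    return _rate(prot) / _rate(ref)


def equal_opportunity_difference(preds: Sequence[int], labels: Sequence[int],
                                 groups: Sequence[str],
                                 protected: str, reference: str) -> float:
    """Difference in true positive rates: protected group minus reference group."""
    def tpr(group: str) -> float:
        # Predictions for members of `group` whose true label is positive.
        hits: List[int] = [p for p, y, g in zip(preds, labels, groups)
                           if g == group and y == 1]
        return _rate(hits)
    return tpr(protected) - tpr(reference)


# Hypothetical toy data: four records per group.
groups = ["a"] * 4 + ["b"] * 4
labels = [1, 1, 0, 0, 1, 1, 0, 0]
preds = [1, 0, 0, 0, 1, 1, 0, 0]

di = disparate_impact(preds, groups, protected="a", reference="b")
eod = equal_opportunity_difference(preds, labels, groups,
                                   protected="a", reference="b")
print(f"disparate impact: {di:.2f}, equal opportunity difference: {eod:.2f}")
```

    On these measures, a ratio of 1 and a difference of 0 would indicate parity between the two groups.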

  4. Mental Health Chatbot Pairs

    • kaggle.com
    Updated Nov 27, 2023
    Cite
    The Devastator (2023). Mental Health Chatbot Pairs [Dataset]. https://www.kaggle.com/datasets/thedevastator/mental-health-chatbot-pairs
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 27, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Mental Health Chatbot Pairs

    AI-based Tailored Support for Mental Health Conversation

    By Huggingface Hub [source]

    About this dataset

    This dataset contains a compilation of carefully crafted Q&A pairs which are designed to provide AI-based tailored support for mental health. These carefully chosen questions and answers offer an avenue for those looking for help to gain the assistance they need. With these pre-processed conversations, Artificial Intelligence (AI) solutions can be developed and deployed to better understand and respond appropriately to individual needs based on their input. This comprehensive dataset is crafted by experts in the mental health field, providing insightful content that will further research in this growing area. These data points will be invaluable for developing the next generation of personalized AI-based mental health chatbots capable of truly understanding what people need.

    How to use the dataset

    This dataset contains pre-processed Q&A pairs for AI-based tailored support for mental health. As such, it represents an excellent starting point in building a conversational model which can handle conversations about mental health issues. Here are some tips on how to use this dataset to its fullest potential:

    • Understand your data: Spend time getting to know the text of the conversation between the user and the chatbot and familiarize yourself with what type of questions and answers are included in this specific dataset. This will help you better formulate queries for your own conversational model or develop new ones you can add yourself.

    • Refine your language processing models: By studying the patterns in syntax, grammar, tone, and voice within this conversational dataset, you can hone your natural language processing capabilities, such as keyword or entity extraction, before implementing them in a larger bot system.

    • Test assumptions: Have an idea of what may work best with a particular audience or context? Check whether these assumptions hold by applying different variations of text to this dataset before rolling out changes across other channels or programs that use AI or chatbot services.

    • Research and analyze results: After testing different scenarios with real-world users using various forms of Q&A from this dataset, analyze and record any relevant results to better understand how users behave after tailored text conversations about mental health topics. The more information you collect, the closer you get to creating effective AI-powered conversations that deliver the desired outcomes.

    Research Ideas

    • Developing a chatbot for personalized mental health advice and guidance tailored to individuals' unique needs, experiences, and struggles.
    • Creating an AI-driven diagnostic system that can interpret mental health conversations and provide targeted recommendations for interventions or treatments based on clinical expertise.
    • Designing an AI-powered recommendation engine to suggest relevant content such as articles, videos, or podcasts based on users’ questions or topics of discussion during their conversation with the chatbot.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv

    | Column name | Description |
    |:------------|:------------|
    | text | The text of the conversation between the user and the chatbot. (String) |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub.

  5. Development of an AI/ML-ready knee ultrasound dataset in a population-based...

    • dataone.org
    Updated Nov 8, 2023
    Cite
    Nelson, Amanda (2023). Development of an AI/ML-ready knee ultrasound dataset in a population-based cohort [Dataset]. http://doi.org/10.7910/DVN/SKP9IB
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Nelson, Amanda
    Description

    About this data

    An ultrasound dataset to use in the discovery of ultrasound features associated with pain and radiographic change in KOA is highly innovative and will be a major step forward for the field. These ultrasound images originate from the diverse and inclusive population-based Johnston County Health Study (JoCoHS). This dataset is designed to adhere to FAIR principles and was funded in part by an Administrative Supplement to Improve the AI/ML-Readiness of NIH-Supported Data (3R01AR077060-03S1).

    Working with this dataset

    WorkingWithTheDataset.ipynb Jupyter notebook

    If you are familiar with working with Jupyter notebooks, we recommend using the WorkingWithTheDataset.ipynb Jupyter notebook to retrieve, validate, and learn more about the dataset. You should download the latest WorkingWithTheDataset.ipynb file and upload it to an online Jupyter environment such as https://colab.research.google.com, or use the notebook in your Jupyter environment of choice. You will also need to download the CONFIGURATION_SETTINGS.template.md file from this dataset, since its contents are used to configure the Jupyter notebook.

    Note: at the time of this writing, we do not recommend using Binder (mybinder.org) if you are interested in only reviewing the WorkingWithTheDataset.ipynb notebook. When Binder loads the dataset, it will download all files from this dataset, resulting in a long build time. However, if you plan to work with all files in the dataset, then Binder might work for you. We do not offer support for this service or other Jupyter Lab environments.

    Metadata

    The DatasetMetadata.json file contains general information about the files and variables within this dataset. We use it as our validation metadata to verify the data we are importing into this Dataverse dataset. This file is also the most comprehensive with regard to the dataset metadata.

    Data collection in progress

    This dataset is not complete and will be updated regularly as additional data is collected.

  6. Special Population use of Service Category

    • data.austintexas.gov
    • datahub.austintexas.gov
    • +2more
    application/rdfxml +5
    Updated Apr 28, 2021
    + more versions
    Cite
    City of Austin, Texas - data.austintexas.gov (2021). Special Population use of Service Category [Dataset]. https://data.austintexas.gov/Health-and-Community-Services/Special-Population-use-of-Service-Category/jwva-euqc
    Explore at:
    Available download formats: json, csv, xml, application/rssxml, application/rdfxml, tsv
    Dataset updated
    Apr 28, 2021
    Dataset authored and provided by
    City of Austin, Texas - data.austintexas.gov
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    This dataset contains EIIHA populations who received services funded by the Ryan White Part A grant. EIIHA stands for Early Identification of Individuals with HIV/AIDS. The special populations (EIIHA) with HIV are:

    • Black MSM: Black men and Black transgender women who have sex with men.
    • Latinx MSM: Latinx men and Latinx transgender women who have sex with men.
    • Black Women: Black women.
    • Transgender: Transgender men and women.

    These populations have the biggest disparities among people living with HIV. Other data is the number of clients and units used in each service category in Ryan White Part A, a grant that provides services for those with HIV.

  7. Fairness perceptions of AI use by tax administration

    • sodha.be
    • datacatalogue.cessda.eu
    • +1more
    csv, tsv
    Updated Mar 28, 2024
    Cite
    Anouk Decuypere; Anouk Decuypere (2024). Fairness perceptions of AI use by tax administration [Dataset]. http://doi.org/10.34934/DVN/1NHGXH
    Explore at:
    Available download formats: csv(926438), csv(69827), tsv(71300), tsv(966134)
    Dataset updated
    Mar 28, 2024
    Dataset provided by
    Social Sciences and Digital Humanities Archive – SODHA
    Authors
    Anouk Decuypere; Anouk Decuypere
    License

    https://www.sodha.be/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.34934/DVN/1NHGXH

    Time period covered
    2022 - 2023
    Dataset funded by
    Research Foundation Flanders
    Belgian National Bank
    Description

    We tested whether the proportion of AI versus auditors in fraud selection matters for fairness, and whether there is an impact of transparency (explanations). We found that a higher proportion of AI was more procedurally fair, mostly through bias suppression and consistency, and that the attitude toward AI and trust in the administration explained most variance. Transparency (explanations) had no impact. We also found two small negative interaction effects concerning trust and procedural fairness: with high trust in the tax administration, fairness increased less (as AI increased). Conversely, with low trust, fairness increased more (as AI increased). Dataset 1 was used for the pilot (with students and professionals). Dataset 2 was a representative dataset for the Flemish population.

  8. LLM - Detect AI Generated Text Dataset

    • kaggle.com
    Updated Nov 8, 2023
    Cite
    sunil thite (2023). LLM - Detect AI Generated Text Dataset [Dataset]. https://www.kaggle.com/datasets/sunilthite/llm-detect-ai-generated-text-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    sunil thite
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset contains both AI-generated and human-written essays for training purposes. The challenge is to develop a machine learning model that can accurately detect whether an essay was written by a student or an LLM. The competition dataset comprises a mix of student-written essays and essays generated by a variety of LLMs.

    The dataset contains more than 28,000 essays, both student-written and AI-generated.

    Features:

    1. text: the essay text.
    2. generated: the target label (0 = human-written essay, 1 = AI-generated essay).
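
    As a first check on a file with these two columns, the class balance of the generated label can be counted with the standard library alone. The sketch below is a minimal illustration: the three sample rows are invented for the example, and the function name count_labels is hypothetical. It assumes only that the CSV has a header row with text and generated columns, as described above.

```python
import csv
import io
from collections import Counter
from typing import TextIO

# Invented sample rows standing in for the real file (not actual dataset content).
SAMPLE_CSV = '''text,generated
"An essay written by a student.",0
"An essay produced by a language model.",1
"Another student essay.",0
'''


def count_labels(csv_file: TextIO) -> Counter:
    """Count rows per target label (0 = human-written, 1 = AI-generated)."""
    reader = csv.DictReader(csv_file)
    return Counter(int(row["generated"]) for row in reader)


counts = count_labels(io.StringIO(SAMPLE_CSV))
print(f"human-written: {counts[0]}, AI-generated: {counts[1]}")
```

    To run it on the downloaded file instead, pass an open file handle (for example one opened with newline="" and encoding="utf-8") in place of the in-memory sample.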

  9. ‘Special Population use of Service Category’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 26, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Special Population use of Service Category’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-special-population-use-of-service-category-7524/fb7f736d/?iid=002-929&v=presentation
    Explore at:
    Dataset updated
    Jan 26, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘Special Population use of Service Category’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/7cf74acd-3ad9-4d02-a58f-2a5e352d8b9c on 26 January 2022.

    --- Dataset description provided by original source is as follows ---

    This dataset contains EIIHA populations who received services funded by the Ryan White Part A grant. EIIHA stands for Early Identification of Individuals with HIV/AIDS. The special populations (EIIHA) with HIV are:

    • Black MSM: Black men and Black transgender women who have sex with men.
    • Latinx MSM: Latinx men and Latinx transgender women who have sex with men.
    • Black Women: Black women.
    • Transgender: Transgender men and women.

    These populations have the biggest disparities among people living with HIV. Other data is the number of clients and units used in each service category in Ryan White Part A, a grant that provides services for those with HIV.

    --- Original source retains full ownership of the source dataset ---

  10. ‘2021 World Population (updated daily)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 29, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘2021 World Population (updated daily)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-2021-world-population-updated-daily-3a7e/latest
    Explore at:
    Dataset updated
    Jan 29, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    World
    Description

    Analysis of ‘2021 World Population (updated daily)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rsrishav/world-population on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    2021 World Population dataset which gets updated daily.

    Content

    • 2021_population.csv: Contains only the live 2021 population count, which is updated daily. It also contains more information, such as each country's growth rate and area.
    • timeseries_population_count.csv: Contains the live population count, which is updated daily, and also retains the last updated data. Data in this file is organized day-wise.

    Inspiration

    This type of data can be used for population-related use cases, such as my own dataset, COVID Vaccination in World (updated daily), which requires population data. I believe there are more use cases that I haven't explored yet but that other Kagglers might need. Time-series use cases can be implemented on this data, but I know it will take time to compile that amount of data. So stay tuned.

    --- Original source retains full ownership of the source dataset ---

  11. Public Use Microdata Samples (PUMS)

    • datasets.ai
    • cmr.earthdata.nasa.gov
    • +1more
    21, 22
    Updated Sep 7, 2024
    + more versions
    Cite
    National Aeronautics and Space Administration (2024). Public Use Microdata Samples (PUMS) [Dataset]. https://datasets.ai/datasets/public-use-microdata-samples-pums
    Explore at:
    Available download formats: 21, 22
    Dataset updated
    Sep 7, 2024
    Dataset authored and provided by
    National Aeronautics and Space Administration
    Description

    The Public Use Microdata Samples (PUMS) are computer-accessible files containing records for a sample of housing units, with information on the characteristics of each housing unit and the people in it for 1940-1990. Within the limits of sample size and geographical detail, these files allow users to prepare virtually any tabulations they require. Each datafile is documented in a codebook containing a data dictionary and supporting appendix information. Electronic versions of the codebooks are only available for the 1980 and 1990 datafiles. Identifying information has been removed to protect the confidentiality of the respondents. PUMS is produced by the United States Census Bureau (USCB) and is distributed by USCB, Inter-university Consortium for Political and Social Research (ICPSR), and Columbia University Center for International Earth Science Information Network (CIESIN).

  12. ‘Population by Country - 2020’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Population by Country - 2020’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-population-by-country-2020-c8b7/latest
    Explore at:
    Dataset updated
    Feb 13, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘Population by Country - 2020’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/tanuprabhu/population-by-country-2020 on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    I always wanted to access a data set that was related to the world’s population (Country wise). But I could not find a properly documented data set. Rather, I just created one manually.

    Content

    Now I knew I wanted to create a dataset but I did not know how to do so. So, I started to search for the content (Population of countries) on the internet. Obviously, Wikipedia was my first search. But I don't know why the results were not acceptable. And also there were only I think 190 or more countries. So then I surfed the internet for quite some time until then I stumbled upon a great website. I think you probably have heard about this. The name of the website is Worldometer. This is exactly the website I was looking for. This website had more details than Wikipedia. Also, this website had more rows I mean more countries with their population.

    Once I found the data, my next hard task was to download it. Of course, I could not get the raw form of the data, and I did not email them about it. So I learned a new skill that is very important for a data scientist. I read somewhere that to obtain data from websites you need to use this technique. Any guesses? Keep reading and you will find out in the next paragraph.

    You are right, it's web scraping. I learned this so that I could convert the data into CSV format. I will give you the scraper code that I wrote; I also found a way to directly convert the pandas data frame to a CSV (comma-separated values) file and store it on my computer. Just go through my code and you will know what I'm talking about.

    Below is the code that I used to scrape the data from the website:

    [Image: screenshot of the scraper code]

    Acknowledgements

    Now I couldn't have got the data without Worldometer. So special thanks to the website. It is because of them I was able to get the data.

    Inspiration

    As far as I know, I don't have any questions to ask. You can find your own ways to use the data and let me know via a kernel if you find something interesting.

    --- Original source retains full ownership of the source dataset ---

  13. Multi-turn Prompts Dataset

    • kaggle.com
    Updated Oct 25, 2024
    + more versions
    Cite
    SoftAge.AI (2024). Multi-turn Prompts Dataset [Dataset]. https://www.kaggle.com/datasets/softageai/multi-turn-prompts-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Oct 25, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    SoftAge.AI
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset consists of 400 text-only fine-tuned versions of multi-turn conversations in the English language based on 10 categories and 19 use cases. It has been generated with ethically sourced human-in-the-loop data methods and aligned with supervised fine-tuning, direct preference optimization, and reinforcement learning through human feedback.

    The human-annotated data is focused on data quality and precision to enhance the generative response of models used for AI chatbots, thereby improving their recall memory and recognition ability for continued assistance.

    Key Features Prompts focused on user intent and were devised using natural language processing techniques. Multi-turn prompts with up to 5 turns to enhance responsive memory of large language models for pretraining. Conversational interactions for queries related to varied aspects of writing, coding, knowledge assistance, data manipulation, reasoning, and classification.

    Dataset Source Subject matter expert annotators @SoftAgeAI have annotated the data at simple and complex levels, focusing on quality factors such as content accuracy, clarity, coherence, grammar, depth of information, and overall usefulness.

    Structure & Fields The dataset is organized into different columns, which are detailed below:

    P1, R1, P2, R2, P3, R3, P4, R4, P5 (object): These columns represent the sequence of prompts (P) and responses (R) within a single interaction. Each interaction can have up to 5 prompts and 5 corresponding responses, capturing the flow of a conversation. The prompts are user inputs, and the responses are the model's outputs. Use Case (object): Specifies the primary application or scenario for which the interaction is designed, such as "Q&A helper" or "Writing assistant." This classification helps in identifying the purpose of the dialogue. Type (object): Indicates the complexity of the interaction, with entries labeled as "Complex" in this dataset. This denotes that the dialogues involve more intricate and multi-layered exchanges. Category (object): Broadly categorizes the interaction type, such as "Open-ended QA" or "Writing." This provides context on the nature of the conversation, whether it is for generating creative content, providing detailed answers, or engaging in complex problem-solving. Intended Use Cases

    The dataset can enhance query assistance model functioning related to shopping, coding, creative writing, travel assistance, marketing, citation, academic writing, language assistance, research topics, specialized knowledge, reasoning, and STEM-based. The dataset intends to aid generative models for e-commerce, customer assistance, marketing, education, suggestive user queries, and generic chatbots. It can pre-train large language models with supervision-based fine-tuned annotated data and for retrieval-augmented generative models. The dataset stands free of violence-based interactions that can lead to harm, conflict, discrimination, brutality, or misinformation. Potential Limitations & Biases This is a static dataset, so the information is dated May 2024.

    Note

    If you have any questions related to our data annotation and human review services for large language model training and fine-tuning, please contact us at SoftAge Information Technology Limited at info@softage.ai.

  14. People Ai Dataset

    • universe.roboflow.com
    zip
    Updated Jun 7, 2025
    Cite
    Milestone01 (2025). People Ai Dataset [Dataset]. https://universe.roboflow.com/milestone01/people-ai/model/8
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 7, 2025
    Dataset authored and provided by
    Milestone01
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    People Bounding Boxes
    Description

    People AI

    ## Overview
    
    People AI is a dataset for object detection tasks - it contains People annotations for 1,250 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  15. GPT vs. Human: A Corpus of Research Abstracts

    • kaggle.com
    Updated Oct 18, 2023
    Cite
    Helene Eriksen (2023). GPT vs. Human: A Corpus of Research Abstracts [Dataset]. https://www.kaggle.com/datasets/heleneeriksen/gpt-vs-human-a-corpus-of-research-abstracts
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Oct 18, 2023
    Dataset provided by
    Kaggle
    Authors
    Helene Eriksen
    Description

    Overview: This dataset offers a unique collection of research abstracts, comprising both human-written and AI-generated versions. Each entry provides a title, followed by the abstract text, with annotations specifying if the abstract was human-authored or generated by GPT. This dataset was created for the research article Detecting AI Authorship: Analyzing Descriptive Features for AI Detection.

    Structure: The dataset is structured in the following manner:

    title: A research paper's title that remains consistent for both human-written and GPT-generated abstracts.

    abstract: The main content of the abstract. Each title is associated with two abstract texts — one penned by a human author, and another created by GPT.

    ai_generated (Boolean): True indicates the abstract was generated by GPT. False indicates the abstract was human-authored.

    is_ai_generated (Binary): 1 denotes an AI-generated abstract. 0 denotes a human-written abstract.
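
    The field layout above suggests a simple way to pair each title's human-written and GPT-generated abstracts. The following is a minimal sketch using only the Python standard library. The sample rows are invented, the helper names are hypothetical, and the code assumes that either the ai_generated or the is_ai_generated column is present in the file.

```python
import csv
import io
from collections import defaultdict


def is_ai(row):
    """Read the label from whichever of the two described columns is present."""
    raw = row.get("ai_generated", row.get("is_ai_generated", ""))
    return str(raw).strip().lower() in {"true", "1"}


def pair_by_title(rows):
    """Group abstracts by title into {'human': ..., 'gpt': ...} entries."""
    pairs = defaultdict(dict)
    for row in rows:
        key = "gpt" if is_ai(row) else "human"
        pairs[row["title"]][key] = row["abstract"]
    return dict(pairs)


# Invented sample standing in for the real CSV file.
SAMPLE_CSV = """title,abstract,ai_generated
A Study of X,Human-written text.,False
A Study of X,Generated text.,True
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
pairs = pair_by_title(rows)
```

    Keying on the title relies on the description's statement that a title remains consistent across both versions of an abstract.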

    Human abstracts were taken from this dataset: https://www.kaggle.com/datasets/Cornell-University/arxiv

    Licence: This dataset is under the MIT licence (https://opensource.org/license/mit/), which grants permission to "any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software... "

  16. Dec 1998 Current Population Survey: Computer and Internet Use Supplement

    • s.cnmilf.com
    • datasets.ai
    Updated Jul 19, 2023
    Cite
    U.S. Census Bureau (2023). Dec 1998 Current Population Survey: Computer and Internet Use Supplement [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/dec-1998-current-population-survey-computer-and-internet-use-supplement
    Explore at:
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    United States Census Bureau: http://census.gov/
    Description

    Information on person and household broadband (high-speed Internet) use, where it is used, by what types of devices, what type of service provider, and other characteristics.

  17. LLM: 7 prompt training dataset

    • kaggle.com
    Updated Nov 15, 2023
    Cite
    Carl McBride Ellis (2023). LLM: 7 prompt training dataset [Dataset]. https://www.kaggle.com/datasets/carlmcbrideellis/llm-7-prompt-training-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Nov 15, 2023
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Carl McBride Ellis
    License

    https://cdla.io/sharing-1-0/

    Description
    • Version 4: Adding the data from "LLM-generated essay using PaLM from Google Gen-AI" kindly generated by Kingki19 / Muhammad Rizqi.
      File: train_essays_RDizzl3_seven_v2.csv
      Human texts: 14,247; LLM texts: 3,004

      See also: a new dataset of an additional 4900 LLM generated texts: LLM: Mistral-7B Instruct texts



    • Version 3: "**The RDizzl3 Seven**"
      File: train_essays_RDizzl3_seven_v1.csv

    • "Car-free cities"

    • "Does the electoral college work?"

    • "Exploring Venus"

    • "The Face on Mars"

    • "Facial action coding system"

    • "A Cowboy Who Rode the Waves"

    • "Driverless cars"

    How this dataset was made: see the notebook "LLM: Make 7 prompt train dataset"

    • Version 2: (train_essays_7_prompts_v2.csv) This dataset is composed of 13,712 human texts and 1,638 AI-LLM generated texts originating from 7 of the PERSUADE 2.0 corpus prompts.

    Namely:

    • "Car-free cities"
    • "Does the electoral college work?"
    • "Exploring Venus"
    • "The Face on Mars"
    • "Facial action coding system"
    • "Seeking multiple opinions"
    • "Phones and driving"

    This dataset is a derivative of the datasets

    as well as the original competition training dataset

    • Version 1: This dataset is composed of 13,712 human texts and 1,165 AI-LLM generated texts originating from 7 of the PERSUADE 2.0 corpus prompts.

  18. AI corporate investment worldwide 2015-2022

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). AI corporate investment worldwide 2015-2022 [Dataset]. https://www.statista.com/statistics/941137/ai-investment-and-funding-worldwide/
    Explore at:
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista: http://statista.com/
    Area covered
    Worldwide
    Description

    In 2022, the global total corporate investment in artificial intelligence (AI) reached almost ** billion U.S. dollars, a slight decrease from the previous year. In 2018, the yearly investment in AI saw a slight downturn, but that was only temporary. Private investments account for a bulk of total AI corporate investment. AI investment has increased more than ******* since 2016, a staggering growth in any market. It is a testament to the importance of the development of AI around the world.

    What is Artificial Intelligence (AI)?

    Artificial intelligence, once the subject of people’s imaginations and the main plot of science fiction movies for decades, is no longer a piece of fiction, but rather commonplace in people’s daily lives whether they realize it or not. AI refers to the ability of a computer or machine to imitate the capacities of the human brain, which often learns from previous experiences to understand and respond to language, decisions, and problems. These AI capabilities, such as computer vision and conversational interfaces, have become embedded throughout various industries’ standard business processes.

    AI investment and startups

    The global AI market, valued at ***** billion U.S. dollars as of 2023, continues to grow driven by the influx of investments it receives. This is a rapidly growing market, looking to expand from billions to trillions of U.S. dollars in market size in the coming years. From 2020 to 2022, investment in startups globally, and in particular AI startups, increased by **** billion U.S. dollars, nearly double its previous investments, with much of it coming from private capital from U.S. companies. The most recent top-funded AI businesses are all machine learning and chatbot companies, focusing on human interface with machines.

  19. Landscape Object Detection On Satellite Images With Ai Dataset

    • universe.roboflow.com
    zip
    Updated Jun 28, 2023
    Cite
    Satellite Images (2023). Landscape Object Detection On Satellite Images With Ai Dataset [Dataset]. https://universe.roboflow.com/satellite-images-i8zj5/landscape-object-detection-on-satellite-images-with-ai
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 28, 2023
    Dataset authored and provided by
    Satellite Images
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Landscape Objects Bounding Boxes
    Description

    Detecting Landscape Objects on Satellite Images with Artificial Intelligence

    In recent years, there has been a significant increase in the use of artificial intelligence (AI) for image recognition and object detection. This technology has proven to be useful in a wide range of applications, from self-driving cars to facial recognition systems. In this project, the focus lies on using AI to detect landscape objects in satellite images (aerial photography angle), with the goal of creating an annotated map of the Netherlands with the coordinates of the given landscape objects.

    Background Information

    Problem Statement

    One of the things that Naturalis does is conduct research into the distribution of wild bees (Naturalis, n.d.). For this research they use a model that predicts whether or not a certain species can occur at a given location. At the moment there is not yet a way to generate a digital inventory of landscape features, such as the presence of trees, ponds and hedges, with their precise locations on the digital map. The current models rely on species observation data and climate variables, but it is expected that adding detailed physical landscape information could increase the prediction accuracy. Common maps do not contain this level of detail, but high-resolution satellite images do.

    Possible opportunities

    Following from the problem statement, Naturalis does not currently have a map with the level of detail needed to detect landscape elements as they would like. The idea emerged that it should be possible to use satellite images to find the locations of small landscape elements and produce an annotated map. By refining the accuracy of the current prediction model in this way, researchers can gain a deeper understanding of wild bees in the Netherlands, with the goal of taking effective measures to protect wild bees and their living environment.

    Goal of project

    The goal of the project is to develop an artificial intelligence model for landscape detection on satellite images in order to create an annotated map of the Netherlands, which would in turn increase the prediction accuracy of the current model used at Naturalis. The project aims to address the lack of detailed landscape maps, which could revolutionize the way Naturalis conducts its research on wild bees. The ultimate long-term aim of the project is to use this comprehensive knowledge to protect both the wild bee population and its natural habitats in the Netherlands.

    Data Collection

    Google Earth

    One of the main challenges of this project was the difficulty in obtaining a qualified dataset (with or without data annotation). Obtaining high-quality satellite images for the project presents challenges in terms of cost and time. The cost of obtaining high-quality satellite images of the Netherlands is $1,038,575 in total (for further details and information of the costs of satellite images. On top of that, the acquisition process for such images involves various steps: from the initial request to the actual delivery of the images, numerous protocols and processes need to be followed.

    After conducting further research, the best possible solution was to use Google Earth as the primary source of data. While Google Earth is not allowed to be used for commercial or promotional purposes, this project is for research purposes only for Naturalis on their research of wild bees, hence the regulation does not apply in this case.

  20. Population - Persons with Disabilities

    • catalog.data.gov
    • datasets.ai
    Updated Apr 5, 2025
    Cite
    Arlington County (2025). Population - Persons with Disabilities [Dataset]. https://catalog.data.gov/dataset/population-persons-with-disabilities
    Explore at:
    Dataset updated
    Apr 5, 2025
    Dataset provided by
    Arlington County
    Description

    The Arlington Profile combines countywide data sources and provides a comprehensive outlook of the most current data on population, housing, employment, development, transportation, and community services. These datasets are used to obtain an understanding of community, plan future services/needs, guide policy decisions, and secure grant funding. A PDF Version of the Arlington Profile can be accessed on the Arlington County website.
