100+ datasets found

Artificial Intelligence (AI) awareness, use and impact, Great Britain
ons.gov.uk
cy.ons.gov.uk
xlsx
Updated Jun 16, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Office for National Statistics (2023). Artificial Intelligence (AI) awareness, use and impact, Great Britain [Dataset]. https://www.ons.gov.uk/businessindustryandtrade/itandinternetindustry/datasets/artificialintelligenceaiawarenessuseandimpactgreatbritain
Explore at:
xlsxAvailable download formats
Dataset updated
Jun 16, 2023
Dataset provided by
Office for National Statisticshttp://www.ons.gov.uk/
License
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Area covered
United Kingdom
Description
Data from the Opinion and Lifestyle Survey (OPN) on the use of Artificial Intelligence (AI) and how people feel about its uptake in today’s society.
d
Nationwide real-world implementation of AI for cancer detection in...
search.dataone.org
datadryad.org
Updated Jan 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nora Eisemann; Stefan Bunk; Trasias Mukama; Hannah Baltus; Susanne Elsner; Timo Gomille; Gerold Hecht; Sylvia Heywang-KÃ¶brunner; Regine Rathmann; Katja Siegmann-Luz; Thilo TÃ¶llner; Toni Werner Vomweg; Christian Leibig; Alexander Katalinic (2025). Nationwide real-world implementation of AI for cancer detection in population-based mammography screening (PRAIM) [Dataset]. http://doi.org/10.5061/dryad.zs7h44jgn
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.zs7h44jgn
Dataset updated
Jan 7, 2025
Dataset provided by
Dryad Digital Repository
Authors
Nora Eisemann; Stefan Bunk; Trasias Mukama; Hannah Baltus; Susanne Elsner; Timo Gomille; Gerold Hecht; Sylvia Heywang-KÃ¶brunner; Regine Rathmann; Katja Siegmann-Luz; Thilo TÃ¶llner; Toni Werner Vomweg; Christian Leibig; Alexander Katalinic
Description
The PRAIM study (PRospective multicenter observationalÂ study of an integrated AI system with live Monitoring) assessed the impact of an AI-based decision support software on breast cancer screening outcomes. This Dryad data package contains the anonymized data from 461 818 screening cases across 12 screening sites in Germany. Variables include screening outcomes like cancer detection, use of AI software, radiologist assessments, cancer characteristics, and further metadata. The data can be used to reproduce the analyses on performance of AI-supported breast cancer screening versus standard of care published in Nature Medicine: Nationwide real-world implementation of AI for cancer detection in population-based mammography screening., , , # Nationwide real-world implementation of AI for cancer detection in population-based mammography screening (PRAIM) â€“ Dataset

The PRAIM study (PRospective multicenter observational study of an integrated Artificial Intelligence system with live Monitoring) was a study conducted within the German breast cancer screening program from July 2021 to February 2023 to assess the impact of an AI-based decision support software. This dataset contains the data from PRAIM.

Context

The PRAIM study has been published in Nature Medicine. Please refer to the article Nationwide real-world implementation of AI for cancer detection in population-based mammography screening for further information on study design, results, and discussion of impact. The study has been previously registered in the German Clinical Trials Register and the study protocol can be found on the [website of the Univ...
f
Data_Sheet_1_Data and model bias in artificial intelligence for healthcare...
frontiersin.figshare.com
zip
Updated Jun 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vithya Yogarajan; Gillian Dobbie; Sharon Leitch; Te Taka Keegan; Joshua Bensemann; Michael Witbrock; Varsha Asrani; David Reith (2023). Data_Sheet_1_Data and model bias in artificial intelligence for healthcare applications in New Zealand.zip [Dataset]. http://doi.org/10.3389/fcomp.2022.1070493.s001
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.3389/fcomp.2022.1070493.s001
Dataset updated
Jun 3, 2023
Dataset provided by
Frontiers
Authors
Vithya Yogarajan; Gillian Dobbie; Sharon Leitch; Te Taka Keegan; Joshua Bensemann; Michael Witbrock; Varsha Asrani; David Reith
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
New Zealand
Description
IntroductionDevelopments in Artificial Intelligence (AI) are adopted widely in healthcare. However, the introduction and use of AI may come with biases and disparities, resulting in concerns about healthcare access and outcomes for underrepresented indigenous populations. In New Zealand, Māori experience significant inequities in health compared to the non-Indigenous population. This research explores equity concepts and fairness measures concerning AI for healthcare in New Zealand.MethodsThis research considers data and model bias in NZ-based electronic health records (EHRs). Two very distinct NZ datasets are used in this research, one obtained from one hospital and another from multiple GP practices, where clinicians obtain both datasets. To ensure research equality and fair inclusion of Māori, we combine expertise in Artificial Intelligence (AI), New Zealand clinical context, and te ao Māori. The mitigation of inequity needs to be addressed in data collection, model development, and model deployment. In this paper, we analyze data and algorithmic bias concerning data collection and model development, training and testing using health data collected by experts. We use fairness measures such as disparate impact scores, equal opportunities and equalized odds to analyze tabular data. Furthermore, token frequencies, statistical significance testing and fairness measures for word embeddings, such as WEAT and WEFE frameworks, are used to analyze bias in free-form medical text. The AI model predictions are also explained using SHAP and LIME.ResultsThis research analyzed fairness metrics for NZ EHRs while considering data and algorithmic bias. We show evidence of bias due to the changes made in algorithmic design. Furthermore, we observe unintentional bias due to the underlying pre-trained models used to represent text data. This research addresses some vital issues while opening up the need and opportunity for future research.DiscussionsThis research takes early steps toward developing a model of socially responsible and fair AI for New Zealand's population. We provided an overview of reproducible concepts that can be adopted toward any NZ population data. Furthermore, we discuss the gaps and future research avenues that will enable more focused development of fairness measures suitable for the New Zealand population's needs and social structure. One of the primary focuses of this research was ensuring fair inclusions. As such, we combine expertise in AI, clinical knowledge, and the representation of indigenous populations. This inclusion of experts will be vital moving forward, proving a stepping stone toward the integration of AI for better outcomes in healthcare.
Mental Health Chatbot Pairs
kaggle.com
Updated Nov 27, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). Mental Health Chatbot Pairs [Dataset]. https://www.kaggle.com/datasets/thedevastator/mental-health-chatbot-pairs
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 27, 2023
Dataset provided by
Kaggle
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Mental Health Chatbot Pairs

AI-based Tailored Support for Mental Health Conversation

By Huggingface Hub [source]

About this dataset

This dataset contains a compilation of carefully-crafted Q&A pairs which are designed to provide AI-based tailored support for mental health. These carefully chosen questions and answers offer an avenue for those looking for help to gain the assistance they need. With these pre-processed conversations, Artificial Intelligence (AI) solutions can be developed and deployed to better understand and respond appropriately to individual needs based on their input. This comprehensive dataset is crafted by experts in the mental health field, providing insightful content that will further research in this growing area. These data points will be invaluable for developing the next generation of personalized AI-based mental health chatbots capable of truly understanding what people need

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

This dataset contains pre-processed Q&A pairs for AI-based tailored support for mental health. As such, it represents an excellent starting point in building a conversational model which can handle conversations about mental health issues. Here are some tips on how to use this dataset to its fullest potential:

Understand your data: Spend time getting to know the text of the conversation between the user and the chatbot and familiarize yourself with what type of questions and answers are included in this specific dataset. This will help you better formulate queries for your own conversational model or develop new ones you can add yourself.

Refine your language processing models: By studying the patterns in syntax, grammar, tone, voice, etc., within this conversational data set you can hone your natural language processing capabilities - such as keyword extractions or entity extraction – prior to implementing them into a larger bot system .

Test assumptions: Have an idea of what you think may work best with a particular audience or context? See if these assumptions pan out by applying different variations of text to this dataset to see if it works before rolling out changes across other channels or programs that utilize AI/chatbot services

Research & Analyze Results : After testing out different scenarios on real-world users by using various forms of q&a within this chatbot pair data set , analyze & record any relevant results pertaining towards understanding user behavior better through further analysis after being exposed to tailored texted conversations about Mental Health topics both passively & actively . The more information you collect here , leads us closer towards creating effective AI powered conversations that bring our desired outcomes from our customer base .

Research Ideas

Developing a chatbot for personalized mental health advice and guidance tailored to individuals' unique needs, experiences, and struggles.

Creating an AI-driven diagnostic system that can interpret mental health conversations and provide targeted recommendations for interventions or treatments based on clinical expertise.

Designing an AI-powered recommendation engine to suggest relevant content such as articles, videos, or podcasts based on users’ questions or topics of discussion during their conversation with the chatbot

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv | Column name | Description | |:--------------|:------------------------------------------------------------------------| | text | The text of the conversation between the user and the chatbot. (String) |

Acknowledgements

If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit Huggingface Hub.
d
Development of an AI/ML-ready knee ultrasound dataset in a population-based...
dataone.org
Updated Nov 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nelson, Amanda (2023). Development of an AI/ML-ready knee ultrasound dataset in a population-based cohort [Dataset]. http://doi.org/10.7910/DVN/SKP9IB
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/SKP9IB
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Nelson, Amanda
Description
About this data An ultrasound dataset to use in the discovery of ultrasound features associated with pain and radiographic change in KOA is highly innovative and will be a major step forward for the field. These ultrasound images originate from the diverse and inclusive population-based Johnston County Health Study (JoCoHS). This dataset is designed to adhere to FAIR principles and was funded in part by an Administrative Supplement to Improve the AI/ML-Readiness of NIH-Supported Data (3R01AR077060-03S1). Working with this dataset WorkingWithTheDataset.ipynb Jupyter notebook If you are familiar with working with Jupyter notebooks, we recommend using the WorkingWithTheDataset.ipynb Jupyter notebook to retrieve, validate, and learn more about the dataset. You should downloading the latest WorkingWithTheDataset.ipynb file and uploading it to an online Jupyter environment such as https://colab.research.google.com or use the notebook in your Jupyter environment of choice. You will also need to download the CONFIGURATION_SETTINGS.template.md file from this dataset since the contents are used to configure the Jupyter notebook. Note: at the time of this writing, we do not recommend using Binder (mybinder.org) if you are interested in only reviewing the WorkingWithTheDataset.ipynb notebook. When Binder loads the dataset, it will download all files from this dataset, resulting in a long build time. However, if you plan to work with all files in the dataset then Binder might work for you. We do not offer support for this service or other Jupyter Lab environments. Metadata The DatasetMetadata.json file contains general information about the files and variables within this dataset. We use it as our validation metadata to verify the data we are importing into this Dataverse dataset. This file is also the most comprehensive with regards to the dataset metadata. Data collection in progress This dataset is not complete and will be updated regularly as additional data is collected.
O
Special Population use of Service Category
data.austintexas.gov
datahub.austintexas.gov
+2more
application/rdfxml +5
Updated Apr 28, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
City of Austin, Texas - data.austintexas.gov (2021). Special Population use of Service Category [Dataset]. https://data.austintexas.gov/Health-and-Community-Services/Special-Population-use-of-Service-Category/jwva-euqc
Explore at:
json, csv, xml, application/rssxml, application/rdfxml, tsvAvailable download formats
Dataset updated
Apr 28, 2021
Dataset authored and provided by
City of Austin, Texas - data.austintexas.gov
License
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Description
This data set contains EIIHA populations who received services funded by Ryan White Part A Grant. EIIHA is Early Identification of Individuals with HIV/AIDS (EIIHA) The special populations (EIIHA) with HIV are: Black MSM = Black men and Black transgender women who have sex with men. Latinx MSM = Latinx men and Latinx Transgender women who have sex with men. Black Women - Black women Transgender - Transgender men and women. These populations have the biggest disparities of people living with HIV. Other data is the number of clients and units used in each service category in the Ryan White Part A, a grant that provides services for those with HIV.
S
Fairness perceptions of AI use by tax administration
sodha.be
datacatalogue.cessda.eu
+1more
csv, tsv
Updated Mar 28, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anouk Decuypere; Anouk Decuypere (2024). Fairness perceptions of AI use by tax administration [Dataset]. http://doi.org/10.34934/DVN/1NHGXH
Explore at:
csv(926438), csv(69827), tsv(71300), tsv(966134)Available download formats
Unique identifier
https://doi.org/10.34934/DVN/1NHGXH
Dataset updated
Mar 28, 2024
Dataset provided by
Social Sciences and Digital Humanities Archive – SODHA
Authors
Anouk Decuypere; Anouk Decuypere
License
https://www.sodha.be/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.34934/DVN/1NHGXHhttps://www.sodha.be/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.34934/DVN/1NHGXH
Time period covered
2022 - 2023
Dataset funded by
Research Foundation Flanders
Belgian National Bank
Description
We tested whether the proportion of AI versus auditors in fraud selection matters for fairness, and whether there is an impact of transparency (explanations). We found that a higher proportion of AI was more procedurally fair, mostly through bias suppression and consistency, and that the attitude toward AI and trust in the administration explained most variance. Transparency (explanations) had no impact. We also found two small negative interaction effects concerning trust and procedural fairness: with high trust in the tax administration, fairness increased less (as AI increased). Conversely, with low trust, fairness increased more (as AI increased). Dataset 1 was used for the pilot (with students and professionals) Dataset 2 was a representative dataset for the Flemish population.
LLM - Detect AI Generated Text Dataset
kaggle.com
Updated Nov 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
sunil thite (2023). LLM - Detect AI Generated Text Dataset [Dataset]. https://www.kaggle.com/datasets/sunilthite/llm-detect-ai-generated-text-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 8, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
sunil thite
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
In this Dataset contains both AI Generated Essay and Human Written Essay for Training Purpose This dataset challenge is to to develop a machine learning model that can accurately detect whether an essay was written by a student or an LLM. The competition dataset comprises a mix of student-written essays and essays generated by a variety of LLMs.

Dataset contains more than 28,000 essay written by student and AI generated.

Features : 1. text : Which contains essay text 2. generated : This is target label . 0 - Human Written Essay , 1 - AI Generated Essay
A
‘Special Population use of Service Category’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 26, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Special Population use of Service Category’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-special-population-use-of-service-category-7524/fb7f736d/?iid=002-929&v=presentation
Explore at:
Dataset updated
Jan 26, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Special Population use of Service Category’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/7cf74acd-3ad9-4d02-a58f-2a5e352d8b9c on 26 January 2022.

--- Dataset description provided by original source is as follows ---

This data set contains EIIHA populations who received services funded by Ryan White Part A Grant. EIIHA is Early Identification of Individuals with HIV/AIDS (EIIHA) The special populations (EIIHA) with HIV are: Black MSM = Black men and Black transgender women who have sex with men. Latinx MSM = Latinx men and Latinx Transgender women who have sex with men. Black Women - Black women Transgender - Transgender men and women. These populations have the biggest disparities of people living with HIV. Other data is the number of clients and units used in each service category in the Ryan White Part A, a grant that provides services for those with HIV.

--- Original source retains full ownership of the source dataset ---
A
‘2021 World Population (updated daily)’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 29, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘2021 World Population (updated daily)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-2021-world-population-updated-daily-3a7e/latest
Explore at:
Dataset updated
Jan 29, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
World
Description
Analysis of ‘2021 World Population (updated daily)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rsrishav/world-population on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

2021 World Population dataset which gets updated daily.

Content

2021_population.csv: File contains data for only live 2021 population count which gets updated daily. Also contains more information about the country's growth rate, area, etc. timeseries_population_count.csv: File contains data for live population count which gets updated daily but it contains last updated data also. Data in this file is managed day-wise.

Inspiration

This type of data can be used for population-related use cases. Like, my own dataset COVID Vaccination in World (updated daily), which requires population data. I believe there are more use cases that I didn't explore yet but might other Kaggler needs this. Time-series related use-case can be implemented on this data but I know it will take time to compile that amount of data. So stay tuned.

--- Original source retains full ownership of the source dataset ---
d
Public Use Microdata Samples (PUMS)
datasets.ai
cmr.earthdata.nasa.gov
+1more
21, 22
Updated Sep 7, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
National Aeronautics and Space Administration (2024). Public Use Microdata Samples (PUMS) [Dataset]. https://datasets.ai/datasets/public-use-microdata-samples-pums
Explore at:
21, 22Available download formats
Dataset updated
Sep 7, 2024
Dataset authored and provided by
National Aeronautics and Space Administration
Description
The Public Use Microdata Samples (PUMS) are computer-accessible files containing records for a sample of housing Units, with information on the characteristics of each housing Unit and the people in it for 1940-1990. Within the limits of sample size and geographical detail, these files allow users to prepare virtually any tabulations they require. Each datafile is documented in a codebook containing a data dictionary and supporting appendix information. Electronic versions for the codebooks are only available for the 1980 and 1990 datafiles. Identifying information has been removed to protect the confidentiality of the respondents. PUMS is produced by the United States Census Bureau (USCB) and is distributed by USCB, Inter-university Consortium for Political and Social Research (ICPSR), and Columbia University Center for International Earth Science Information Network (CIESIN).
A
‘Population by Country - 2020’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 13, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Population by Country - 2020’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-population-by-country-2020-c8b7/latest
Explore at:
Dataset updated
Feb 13, 2020
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Population by Country - 2020’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/tanuprabhu/population-by-country-2020 on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

I always wanted to access a data set that was related to the world’s population (Country wise). But I could not find a properly documented data set. Rather, I just created one manually.

Content

Now I knew I wanted to create a dataset but I did not know how to do so. So, I started to search for the content (Population of countries) on the internet. Obviously, Wikipedia was my first search. But I don't know why the results were not acceptable. And also there were only I think 190 or more countries. So then I surfed the internet for quite some time until then I stumbled upon a great website. I think you probably have heard about this. The name of the website is Worldometer. This is exactly the website I was looking for. This website had more details than Wikipedia. Also, this website had more rows I mean more countries with their population.

Once I got the data, now my next hard task was to download it. Of course, I could not get the raw form of data. I did not mail them regarding the data. Now I learned a new skill which is very important for a data scientist. I read somewhere that to obtain the data from websites you need to use this technique. Any guesses, keep reading you will come to know in the next paragraph.

https://fiverr-res.cloudinary.com/images/t_main1,q_auto,f_auto/gigs/119580480/original/68088c5f588ec32a6b3a3a67ec0d1b5a8a70648d/do-web-scraping-and-data-mining-with-python.png" alt="alt text">

You are right its, Web Scraping. Now I learned this so that I could convert the data into a CSV format. Now I will give you the scraper code that I wrote and also I somehow found a way to directly convert the pandas data frame to a CSV(Comma-separated fo format) and store it on my computer. Now just go through my code and you will know what I'm talking about.

Below is the code that I used to scrape the code from the website

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3200273%2Fe814c2739b99d221de328c72a0b2571e%2FCapture.PNG?generation=1581314967227445&alt=media" alt="">

Acknowledgements

Now I couldn't have got the data without Worldometer. So special thanks to the website. It is because of them I was able to get the data.

Inspiration

As far as I know, I don't have any questions to ask. You guys can let me know by finding your ways to use the data and let me know via kernel if you find something interesting

--- Original source retains full ownership of the source dataset ---
Multi-turn Prompts Dataset
kaggle.com
Updated Oct 25, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
SoftAge.AI (2024). Multi-turn Prompts Dataset [Dataset]. https://www.kaggle.com/datasets/softageai/multi-turn-prompts-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 25, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
SoftAge.AI
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Description This dataset consists of 400 text-only fine-tuned versions of multi-turn conversations in the English language based on 10 categories and 19 use cases. It has been generated with ethically sourced human-in-the-loop data methods and aligned with supervised fine-tuning, direct preference optimization, and reinforcement learning through human feedback.

The human-annotated data is focused on data quality and precision to enhance the generative response of models used for AI chatbots, thereby improving their recall memory and recognition ability for continued assistance.

Key Features Prompts focused on user intent and were devised using natural language processing techniques. Multi-turn prompts with up to 5 turns to enhance responsive memory of large language models for pretraining. Conversational interactions for queries related to varied aspects of writing, coding, knowledge assistance, data manipulation, reasoning, and classification.

Dataset Source Subject matter expert annotators @SoftAgeAI have annotated the data at simple and complex levels, focusing on quality factors such as content accuracy, clarity, coherence, grammar, depth of information, and overall usefulness.

Structure & Fields The dataset is organized into different columns, which are detailed below:

P1, R1, P2, R2, P3, R3, P4, R4, P5 (object): These columns represent the sequence of prompts (P) and responses (R) within a single interaction. Each interaction can have up to 5 prompts and 5 corresponding responses, capturing the flow of a conversation. The prompts are user inputs, and the responses are the model's outputs. Use Case (object): Specifies the primary application or scenario for which the interaction is designed, such as "Q&A helper" or "Writing assistant." This classification helps in identifying the purpose of the dialogue. Type (object): Indicates the complexity of the interaction, with entries labeled as "Complex" in this dataset. This denotes that the dialogues involve more intricate and multi-layered exchanges. Category (object): Broadly categorizes the interaction type, such as "Open-ended QA" or "Writing." This provides context on the nature of the conversation, whether it is for generating creative content, providing detailed answers, or engaging in complex problem-solving. Intended Use Cases

The dataset can enhance query assistance model functioning related to shopping, coding, creative writing, travel assistance, marketing, citation, academic writing, language assistance, research topics, specialized knowledge, reasoning, and STEM-based. The dataset intends to aid generative models for e-commerce, customer assistance, marketing, education, suggestive user queries, and generic chatbots. It can pre-train large language models with supervision-based fine-tuned annotated data and for retrieval-augmented generative models. The dataset stands free of violence-based interactions that can lead to harm, conflict, discrimination, brutality, or misinformation. Potential Limitations & Biases This is a static dataset, so the information is dated May 2024.

Note If you have any questions related to our data annotation and human review services for large language model training and fine-tuning, please contact us at SoftAge Information Technology Limited at info@softage.ai.
R
People Ai Dataset
universe.roboflow.com
zip
Updated Jun 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Milestone01 (2025). People Ai Dataset [Dataset]. https://universe.roboflow.com/milestone01/people-ai/model/8
Explore at:
zipAvailable download formats
Dataset updated
Jun 7, 2025
Dataset authored and provided by
Milestone01
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
People Bounding Boxes
Description
People AI

## Overview People AI is a dataset for object detection tasks - it contains People annotations for 1,250 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
GPT vs. Human: A Corpus of Research Abstracts
kaggle.com
Updated Oct 18, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Helene Eriksen (2023). GPT vs. Human: A Corpus of Research Abstracts [Dataset]. https://www.kaggle.com/datasets/heleneeriksen/gpt-vs-human-a-corpus-of-research-abstracts
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 18, 2023
Dataset provided by
Kaggle
Authors
Helene Eriksen
Description
Overview: This dataset offers a unique collection of research abstracts, comprising both human-written and AI-generated versions. Each entry provides a title, followed by the abstract text, with annotations specifying if the abstract was human-authored or generated by GPT. This dataset was created for the research article Detecting AI Authorship: Analyzing Descriptive Features for AI Detection.

Structure: The dataset is structured in the following manner:

title: A research paper's title that remains consistent for both human-written and GPT-generated abstracts.

abstract: The main content of the abstract. Each title is associated with two abstract texts — one penned by a human author, and another created by GPT.

ai_generated (Boolean): True indicates the abstract was generated by GPT. False indicates the abstract was human-authored. is_ai_generated (Binary): 1 denotes an AI-generated abstract. 0 denotes a human-written abstract.

Human abstracted taken from this dataset: https://www.kaggle.com/datasets/Cornell-University/arxiv

Licence: This dataset is under the MIT licence (https://opensource.org/license/mit/) meaning that "any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software... "
Dec 1998 Current Population Survey: Computer and Internet Use Supplement
s.cnmilf.com
datasets.ai
+1more
Updated Jul 19, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Census Bureau (2023). Dec 1998 Current Population Survey: Computer and Internet Use Supplement [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/dec-1998-current-population-survey-computer-and-internet-use-supplement
Explore at:
Dataset updated
Jul 19, 2023
Dataset provided by
United States Census Bureauhttp://census.gov/
Description
Information on person and household broadband (high-speed Internet) use, where it is used, by what types of devices, what type of service provider, and other characteristics.
LLM: 7 prompt training dataset
kaggle.com
Updated Nov 15, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Carl McBride Ellis (2023). LLM: 7 prompt training dataset [Dataset]. https://www.kaggle.com/datasets/carlmcbrideellis/llm-7-prompt-training-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 15, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Carl McBride Ellis
License
https://cdla.io/sharing-1-0/https://cdla.io/sharing-1-0/
Description
Version 4: Adding the data from "LLM-generated essay using PaLM from Google Gen-AI" kindly generated by Kingki19 / Muhammad Rizqi.
File: train_essays_RDizzl3_seven_v2.csv
Human texts: 14247 LLM texts: 3004

See also: a new dataset of an additional 4900 LLM generated texts: LLM: Mistral-7B Instruct texts

Version 3: "**The RDizzl3 Seven**"
File: train_essays_RDizzl3_seven_v1.csv

"Car-free cities"

"Does the electoral college work?"

"Exploring Venus"

"The Face on Mars"

"Facial action coding system"

"A Cowboy Who Rode the Waves"

"Driverless cars"

How this dataset was made: see the notebook "LLM: Make 7 prompt train dataset"

Version 2: (train_essays_7_prompts_v2.csv) This dataset is composed of 13,712 human texts and 1638 AI-LLM generated texts originating from 7 of the PERSUADE 2.0 corpus prompts.

Namely:

"Car-free cities"

"Does the electoral college work?"

"Exploring Venus"

"The Face on Mars"

"Facial action coding system"

"Seeking multiple opinions"

"Phones and driving"

This dataset is a derivative of the datasets

LLM Generated Essays for the Detect AI Comp! by Radek Osmulski

persuade corpus 2.0 provided by Nicholas Broad

daigt data - llama 70b and falcon180b by Nicholas Broad

Hello, Claude! 1000 essays from Anthropic... by Darragh

as well as the original competition training dataset

Version 1:This dataset is composed of 13,712 human texts and 1165 AI-LLM generated texts originating from 7 of the PERSUADE 2.0 corpus prompts.
AI corporate investment worldwide 2015-2022
statista.com
Updated Jun 30, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). AI corporate investment worldwide 2015-2022 [Dataset]. https://www.statista.com/statistics/941137/ai-investment-and-funding-worldwide/
Explore at:
Dataset updated
Jun 30, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
Worldwide
Description
In 2022, the global total corporate investment in artificial intelligence (AI) reached almost ** billion U.S. dollars, a slight decrease from the previous year. In 2018, the yearly investment in AI saw a slight downturn, but that was only temporary. Private investments account for a bulk of total AI corporate investment. AI investment has increased more than ******* since 2016, a staggering growth in any market. It is a testament to the importance of the development of AI around the world. What is Artificial Intelligence (AI)? Artificial intelligence, once the subject of people’s imaginations and the main plot of science fiction movies for decades, is no longer a piece of fiction, but rather commonplace in people’s daily lives whether they realize it or not. AI refers to the ability of a computer or machine to imitate the capacities of the human brain, which often learns from previous experiences to understand and respond to language, decisions, and problems. These AI capabilities, such as computer vision and conversational interfaces, have become embedded throughout various industries’ standard business processes. AI investment and startups The global AI market, valued at ***** billion U.S. dollars as of 2023, continues to grow driven by the influx of investments it receives. This is a rapidly growing market, looking to expand from billions to trillions of U.S. dollars in market size in the coming years. From 2020 to 2022, investment in startups globally, and in particular AI startups, increased by **** billion U.S. dollars, nearly double its previous investments, with much of it coming from private capital from U.S. companies. The most recent top-funded AI businesses are all machine learning and chatbot companies, focusing on human interface with machines.
R
Landscape Object Detection On Satellite Images With Ai Dataset
universe.roboflow.com
zip
Updated Jun 28, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Satellite Images (2023). Landscape Object Detection On Satellite Images With Ai Dataset [Dataset]. https://universe.roboflow.com/satellite-images-i8zj5/landscape-object-detection-on-satellite-images-with-ai
Explore at:
zipAvailable download formats
Dataset updated
Jun 28, 2023
Dataset authored and provided by
Satellite Images
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Landscape Objects Bounding Boxes
Description
Detecting Landscape Objects on Satellite Images with Artificial Intelligence In recent years, there has been a significant increase in the use of artificial intelligence (AI) for image recognition and object detection. This technology has proven to be useful in a wide range of applications, from self-driving cars to facial recognition systems. In this project, the focus lies on using AI to detect landscape objects in satellite images (aerial photography angle) with the goal to create an annotated map of The Netherlands with all the coordinates of the given landscape objects.

Background Information

Problem Statement One of the things that Naturalis does is conducting research into the distribution of wild bees (Naturalis, n.d.). For their research they use a model that predicts whether or not a certain species can occur at a given location. Representing the real world in a digital form, there is at the moment not yet a way to generate an inventory of landscape features such as presence of trees, ponds and hedges, with their precise location on the digital map. The current models rely on species observation data and climate variables, but it is expected that adding detailed physical landscape information could increase the prediction accuracy. Common maps do not contain this level of detail, but high-resolution satellite images do.

Possible opportunities Based on the problem statement, there is at the moment at Naturalis not a map that does contain the level of detail where detection of landscape elements could be made, according to their wishes. The idea emerged that it should be possible to use satellite images to find the locations of small landscape elements and produce an annotated map. Therefore, by refining the accuracy of the current prediction model, researchers can gain a profound understanding of wild bees in the Netherlands with the goal to take effective measurements to protect wild bees and their living environment.

Goal of project The goal of the project is to develop an artificial intelligence model for landscape detection on satellite images to create an annotated map of The Netherlands that would therefore increase the accuracy prediction of the current model that is used at Naturalis. The project aims to address the problem of a lack of detailed maps of landscapes that could revolutionize the way Naturalis conduct their research on wild bees. Therefore, the ultimate aim of the project in the long term is to utilize the comprehensive knowledge to protect both the wild bees population and their natural habitats in the Netherlands.

Data Collection Google Earth One of the main challenges of this project was the difficulty in obtaining a qualified dataset (with or without data annotation). Obtaining high-quality satellite images for the project presents challenges in terms of cost and time. The costs in obtaining high-quality satellite images of the Netherlands is 1,038,575 $ in total (for further details and information of the costs of satellite images. On top of that, the acquisition process for such images involves various steps, from the initial request to the actual delivery of the images, numerous protocols and processes need to be followed.

After conducting further research, the best possible solution was to use Google Earth as the primary source of data. While Google Earth is not allowed to be used for commercial or promotional purposes, this project is for research purposes only for Naturalis on their research of wild bees, hence the regulation does not apply in this case.
d
Population - Persons with Disabilities
catalog.data.gov
datasets.ai
+1more
Updated Apr 5, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Arlington County (2025). Population - Persons with Disabilities [Dataset]. https://catalog.data.gov/dataset/population-persons-with-disabilities
Explore at:
Dataset updated
Apr 5, 2025
Dataset provided by
Arlington County
Description
The Arlington Profile combines countywide data sources and provides a comprehensive outlook of the most current data on population, housing, employment, development, transportation, and community services. These datasets are used to obtain an understanding of community, plan future services/needs, guide policy decisions, and secure grant funding. A PDF Version of the Arlington Profile can be accessed on the Arlington County website.

Facebook

Twitter

Click to copy link

Link copied

Cite

Office for National Statistics (2023). Artificial Intelligence (AI) awareness, use and impact, Great Britain [Dataset]. https://www.ons.gov.uk/businessindustryandtrade/itandinternetindustry/datasets/artificialintelligenceaiawarenessuseandimpactgreatbritain

Artificial Intelligence (AI) awareness, use and impact, Great Britain

Explore at:

xlsxAvailable download formats

Dataset updated

Jun 16, 2023

Dataset provided by

Office for National Statisticshttp://www.ons.gov.uk/

License

Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically

Area covered

United Kingdom

Description

Data from the Opinion and Lifestyle Survey (OPN) on the use of Artificial Intelligence (AI) and how people feel about its uptake in today’s society.

Clear search

Close search

Google apps

Main menu

Artificial Intelligence (AI) awareness, use and impact, Great Britain

Nationwide real-world implementation of AI for cancer detection in...

Context

Data_Sheet_1_Data and model bias in artificial intelligence for healthcare...

Mental Health Chatbot Pairs

Mental Health Chatbot Pairs

AI-based Tailored Support for Mental Health Conversation

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Research Ideas

Acknowledgements

License

Columns

Acknowledgements

Development of an AI/ML-ready knee ultrasound dataset in a population-based...

Special Population use of Service Category

Fairness perceptions of AI use by tax administration

LLM - Detect AI Generated Text Dataset

‘Special Population use of Service Category’ analyzed by Analyst-2

‘2021 World Population (updated daily)’ analyzed by Analyst-2

Context

Content

Inspiration

Public Use Microdata Samples (PUMS)

‘Population by Country - 2020’ analyzed by Analyst-2

Context

Content

Acknowledgements

Inspiration

Multi-turn Prompts Dataset

People Ai Dataset

People AI

GPT vs. Human: A Corpus of Research Abstracts

Dec 1998 Current Population Survey: Computer and Internet Use Supplement

LLM: 7 prompt training dataset

AI corporate investment worldwide 2015-2022

Landscape Object Detection On Satellite Images With Ai Dataset

Population - Persons with Disabilities

Artificial Intelligence (AI) awareness, use and impact, Great Britain