93 datasets found
  1. LLM RAG Chatbot Training Dataset

    • kaggle.com
    zip
    Updated May 20, 2025
    Cite
    Life Bricks Global (2025). LLM RAG Chatbot Training Dataset [Dataset]. https://www.kaggle.com/datasets/lifebricksglobal/llm-rag-chatbot-training-dataset
    Explore at:
    Available download formats: zip (199960 bytes)
    Dataset updated
    May 20, 2025
    Authors
    Life Bricks Global
    Description

    We’ve developed another annotated dataset designed specifically for conversational AI and companion AI model training.

    Watch: How To Use The Dataset

    What you have here on Kaggle is our free sample - Think Salon Kitty meets AI

    The 'Time Waster Identification & Retreat Model Dataset' enables AI handler agents to detect when users are likely to churn, saving valuable tokens and preventing wasted compute cycles in conversational models.

    This batch has 167 entries annotated for sentiment, intent, user risk flagging (via behavioural tracking) and user Recovery Potential per statement, among others. This dataset is designed to be a niche micro dataset for a specific use case: Time Waster Identification and Retreat.

    👉 Buy the updated version: https://lifebricksglobal.gumroad.com/l/Time-WasterDetection-Dataset

    This dataset is perfect for:

    • Fine-tuning LLM routing logic
    • Building intelligent AI agents for customer engagement
    • Companion AI training + moderation modelling

    This is part of a broader series of human-agent interaction datasets we are releasing under our independent data licensing program.

    It is designed for AI researchers and developers building:

    • Conversational AI agents
    • Companion AI models
    • Human-agent interaction simulators
    • LLM routing optimization models

    Use case:

    • Conversational AI
    • Companion AI
    • Defence & Aerospace
    • Customer Support AI
    • Gaming / Virtual Worlds
    • LLM Safety Research
    • AI Orchestration Platforms

    👉 Good for teams working on conversational AI, companion AI, fraud detectors and those integrating routing logic for voice/chat agents

    Contact us on LinkedIn: Life Bricks Global.

    License:

    This dataset is provided under a custom license. By using the dataset, you agree to the following terms:

    Usage: You are allowed to use the dataset for non-commercial purposes, including research, development, and machine learning model training.

    Modification: You may modify the dataset for your own use.

    Redistribution: Redistribution of the dataset in its original or modified form is not allowed without permission.

    Attribution: Proper attribution must be given when using or referencing this dataset.

    No Warranty: The dataset is provided "as-is" without any warranties, express or implied, regarding its accuracy, completeness, or fitness for a particular purpose.

  2. PERU MIGRANT Study | Baseline and 5yr follow-up dataset

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    bin
    Updated May 30, 2023
    Cite
    J. Jaime Miranda; Antonio Bernabe-Ortiz; Rodrigo Carrillo Larco (2023). PERU MIGRANT Study | Baseline and 5yr follow-up dataset [Dataset]. http://doi.org/10.6084/m9.figshare.4832612.v4
    Explore at:
    Available download formats: bin
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    J. Jaime Miranda; Antonio Bernabe-Ortiz; Rodrigo Carrillo Larco
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Peru
    Description

    This is an update of a prior dataset publication containing baseline and 5-year follow-up data from the PERU MIGRANT Study (PEru's Rural to Urban MIGRANTs Study).

    The PERU MIGRANT Study was designed to investigate the magnitude of differences between rural-to-urban migrant and non-migrant groups in specific cardiovascular risk factors. Three groups were selected: i) Rural, people who have always lived in a rural environment; ii) Rural-urban, people who migrated from rural to urban areas; and iii) Urban, people who have always lived in an urban environment.

    PERU MIGRANT Study protocol, instruments and variables are described in full in:

    • Miranda JJ, Gilman RH, García HH, Smeeth L. The effect on cardiovascular risk factors of migration from rural to urban areas in Peru: PERU MIGRANT Study. BMC Cardiovasc Disord 2009;9:23.

    PERU MIGRANT Study baseline dataset is available at: https://figshare.com/articles/PERU_MIGRANT_Study_Baseline_dataset/3125005

    Main findings of the baseline study:

    • Miranda JJ, Gilman RH, Smeeth L. Differences in cardiovascular risk factors in rural, urban and rural-to-urban migrants in Peru. Heart 2011;97(10):787-96.

    Main findings of the 5-yr follow-up study:

    • Carrillo-Larco RM, Bernabé-Ortiz A, Pillay TD, Gilman RH, Sanchez JF, Poterico JA, Quispe R, Smeeth L, Miranda JJ. Obesity risk in rural, urban and rural-to-urban migrants: prospective results of the PERU MIGRANT study. Int J Obes (Lond) 2016;40(1):181-5.
    • Bernabe-Ortiz A, Sanchez JF, Carrillo-Larco RM, Gilman RH, Poterico JA, Quispe R, Smeeth L, Miranda JJ. Rural-to-urban migration and risk of hypertension: longitudinal results of the PERU MIGRANT study. J Hum Hypertens 2017;31(1):22-28.
    • Lazo-Porras M, Bernabe-Ortiz A, Málaga G, Gilman RH, Acuña-Villaorduña A, Cardenas-Montero D, Smeeth L, Miranda JJ. Low HDL cholesterol as a cardiovascular risk factor in rural, urban, and rural-urban migrants: PERU MIGRANT cohort study. Atherosclerosis 2016;246:36-43.
    • Burroughs Pena MS, Bernabé-Ortiz A, Carrillo-Larco RM, Sánchez JF, Quispe R, Pillay TD, Málaga G, Gilman RH, Smeeth L, Miranda JJ. Migration, urbanisation and mortality: 5-year longitudinal analysis of the PERU MIGRANT study. J Epidemiol Community Health 2015;69(7):715-8.

  3. Data generation volume worldwide 2010-2029

    • statista.com
    Updated Nov 19, 2025
    Cite
    Statista (2025). Data generation volume worldwide 2010-2029 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Explore at:
    Dataset updated
    Nov 19, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly. While it was estimated at ***** zettabytes in 2025, the forecast for 2029 stands at ***** zettabytes. Thus, global data generation will triple between 2025 and 2029. Data creation has been expanding continuously over the past decade. In 2020, the growth was higher than previously expected, caused by the increased demand due to the coronavirus (COVID-19) pandemic, as more people worked and learned from home and used home entertainment options more often.

  4. Accidents

    • datasets.ai
    • data.bloomington.in.gov
    • +1 more
    23, 40, 55, 8
    Updated May 19, 2023
    + more versions
    Cite
    City of Bloomington (2023). Accidents [Dataset]. https://datasets.ai/datasets/accidents-d4eba
    Explore at:
    Available download formats: 8, 40, 23, 55
    Dataset updated
    May 19, 2023
    Dataset authored and provided by
    City of Bloomington
    Description

    Bloomington Police Department Calls for Service that reported an accident.

    Note that this is every call for service that documents an accident, regardless of the outcome of the accident. Not all accidents become State Crash Reports, and, therefore, the data contained in this set will not match accident data supplied by the Indiana State Police. This set of raw data contains information from Bloomington Police Department Calls for Service that reported an accident.

    Key code for Race:

    • A: Asian/Pacific Island, Non-Hispanic
    • B: African American, Non-Hispanic
    • C: Hawaiian/Other Pacific Island, Hispanic
    • H: Hawaiian/Other Pacific Island, Non-Hispanic
    • I: Indian/Alaskan Native, Non-Hispanic
    • K: African American, Hispanic
    • L: Caucasian, Hispanic
    • N: Indian/Alaskan Native, Hispanic
    • P: Asian/Pacific Island, Hispanic
    • S: Asian, Non-Hispanic
    • T: Asian, Hispanic
    • U: Unknown
    • W: Caucasian, Non-Hispanic

    Key Code for Reading Districts:

    Example: LB519

    • L: Law call or incident
    • B: Bloomington
    • 5: the district or beat where the incident occurred
    • All following numbers represent a grid sector.
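
    The district key above can be sketched as a small parser. This is only an illustration, not code from the Bloomington portal: the names `DistrictCode` and `parse_district_code` are hypothetical, and the assumption that the district is always a single digit is taken from the one example (LB519) and may not hold for every record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistrictCode:
    call_type: str    # first letter, e.g. "L" for a Law call or incident
    city: str         # second letter, e.g. "B" for Bloomington
    district: int     # assumed to be a single digit (district or beat)
    grid_sector: str  # remaining digits, kept as text to preserve leading zeros


def parse_district_code(code: str) -> DistrictCode:
    """Split a code such as 'LB519' into its parts, following the key above."""
    code = code.strip().upper()
    # Expect two letters, one district digit, and at least one grid-sector digit.
    if len(code) < 4 or not code[:2].isalpha() or not code[2:].isdigit():
        raise ValueError(f"unexpected district code: {code!r}")
    return DistrictCode(code[0], code[1], int(code[2]), code[3:])


example = parse_district_code("LB519")
print(example)
```

    For LB519 this yields call type L, city B, district 5 and grid sector 19; codes that do not fit the assumed shape raise a `ValueError` rather than being guessed at.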

    Disclaimer: The Bloomington Police Department takes great effort in making open data as accurate as possible, but there is no avoiding the introduction of errors in this process, which relies on data provided by many people and that cannot always be verified. Information contained in this dataset may change over a period of time. The Bloomington Police Department is not responsible for any error or omission from this data, or for the use or interpretation of the results of any research conducted.

  5. Armored Rescue Vehicle Use

    • datasets.ai
    • data.bloomington.in.gov
    • +3 more
    23, 40, 55, 8
    Updated May 19, 2023
    + more versions
    Cite
    City of Bloomington (2023). Armored Rescue Vehicle Use [Dataset]. https://datasets.ai/datasets/armored-rescue-vehicle-use-ead99
    Explore at:
    Available download formats: 23, 55, 8, 40
    Dataset updated
    May 19, 2023
    Dataset authored and provided by
    City of Bloomington
    Description

    Bloomington Police Department Calls for Service that resulted in the use of an armored rescue vehicle.

    Key code for Race:

    • A: Asian/Pacific Island, Non-Hispanic
    • B: African American, Non-Hispanic
    • C: Hawaiian/Other Pacific Island, Hispanic
    • H: Hawaiian/Other Pacific Island, Non-Hispanic
    • I: Indian/Alaskan Native, Non-Hispanic
    • K: African American, Hispanic
    • L: Caucasian, Hispanic
    • N: Indian/Alaskan Native, Hispanic
    • P: Asian/Pacific Island, Hispanic
    • S: Asian, Non-Hispanic
    • T: Asian, Hispanic
    • U: Unknown
    • W: Caucasian, Non-Hispanic

    Key Code for Reading Districts:

    Example: LB519

    • L: Law call or incident
    • B: Bloomington
    • 5: the district or beat where the incident occurred
    • All following numbers represent a grid sector.

    Disclaimer: The Bloomington Police Department takes great effort in making open data as accurate as possible, but there is no avoiding the introduction of errors in this process, which relies on data provided by many people and that cannot always be verified. Information contained in this dataset may change over a period of time. The Bloomington Police Department is not responsible for any error or omission from this data, or for the use or interpretation of the results of any research conducted.

  6. Officers Assaulted

    • catalog.data.gov
    • data.bloomington.in.gov
    • +1 more
    Updated Nov 22, 2025
    + more versions
    Cite
    data.bloomington.in.gov (2025). Officers Assaulted [Dataset]. https://catalog.data.gov/dataset/officers-assaulted-826cf
    Explore at:
    Dataset updated
    Nov 22, 2025
    Dataset provided by
    data.bloomington.in.gov
    Description

    Information found in this report follows the Uniform Crime Reporting guidelines established by the FBI for LEOKA.

    Key code for Race:

    • A: Asian/Pacific Island, Non-Hispanic
    • B: African American, Non-Hispanic
    • C: Hawaiian/Other Pacific Island, Hispanic
    • H: Hawaiian/Other Pacific Island, Non-Hispanic
    • I: Indian/Alaskan Native, Non-Hispanic
    • K: African American, Hispanic
    • L: Caucasian, Hispanic
    • N: Indian/Alaskan Native, Hispanic
    • P: Asian/Pacific Island, Hispanic
    • S: Asian, Non-Hispanic
    • T: Asian, Hispanic
    • U: Unknown
    • W: Caucasian, Non-Hispanic

    Key Code for Reading Districts:

    Example: LB519

    • L: Law call or incident
    • B: Bloomington
    • 5: the district or beat where the incident occurred
    • All following numbers represent a grid sector.

    Disclaimer: The Bloomington Police Department takes great effort in making open data as accurate as possible, but there is no avoiding the introduction of errors in this process, which relies on data provided by many people and that cannot always be verified. Information contained in this dataset may change over a period of time. The Bloomington Police Department is not responsible for any error or omission from this data, or for the use or interpretation of the results of any research conducted.

  7. Coronavirus (Covid-19) Data in the United States

    • github.com
    • openicpsr.org
    • +4 more
    csv
    Cite
    New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://github.com/nytimes/covid-19-data
    Explore at:
    Available download formats: csv
    Dataset provided by
    New York Times
    License

    https://github.com/nytimes/covid-19-data/blob/master/LICENSE

    Description

    The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

    Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

    We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.

    The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
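
    Because the files hold cumulative counts, daily new cases have to be derived by differencing consecutive days within each state. The sketch below shows one way to do that. It assumes a CSV with `date`, `state` and `cases` columns; that layout is an assumption here and should be checked against the repository. The sample rows are made-up values for illustration only, and `daily_new_cases` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only; these are NOT real counts from the repository.
SAMPLE = """date,state,cases
2020-01-21,Washington,1
2020-01-22,Washington,1
2020-01-23,Washington,3
2020-01-23,Illinois,1
2020-01-24,Illinois,2
"""


def daily_new_cases(csv_text: str) -> dict:
    """Turn cumulative case counts into per-day increases for each state."""
    rows = sorted(
        csv.DictReader(io.StringIO(csv_text)),
        key=lambda r: (r["state"], r["date"]),  # ISO dates sort chronologically
    )
    previous = defaultdict(int)  # last cumulative total seen per state
    result = {}
    for row in rows:
        state, total = row["state"], int(row["cases"])
        result[(state, row["date"])] = total - previous[state]
        previous[state] = total
    return result


new_cases = daily_new_cases(SAMPLE)
for (state, date), count in sorted(new_cases.items()):
    print(state, date, count)
```

    A negative difference would indicate a downward revision in the cumulative series, so it is worth checking for those before plotting or modelling.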

  8. Data from: Nuisance Complaints

    • catalog.data.gov
    • data.bloomington.in.gov
    • +2 more
    Updated Nov 8, 2025
    Cite
    data.bloomington.in.gov (2025). Nuisance Complaints [Dataset]. https://catalog.data.gov/dataset/nuisance-complaints-a7ed9
    Explore at:
    Dataset updated
    Nov 8, 2025
    Dataset provided by
    data.bloomington.in.gov
    Description

    Calls for Service, specifically for alcohol related, disturbance, intoxication, noise, panhandling, and vandalism.

    Key code for Race:

    • A: Asian/Pacific Island, Non-Hispanic
    • B: African American, Non-Hispanic
    • C: Hawaiian/Other Pacific Island, Hispanic
    • H: Hawaiian/Other Pacific Island, Non-Hispanic
    • I: Indian/Alaskan Native, Non-Hispanic
    • K: African American, Hispanic
    • L: Caucasian, Hispanic
    • N: Indian/Alaskan Native, Hispanic
    • P: Asian/Pacific Island, Hispanic
    • S: Asian, Non-Hispanic
    • T: Asian, Hispanic
    • U: Unknown
    • W: Caucasian, Non-Hispanic

    Key Code for Reading Districts:

    Example: LB519

    • L: Law call or incident
    • B: Bloomington
    • 5: the district or beat where the incident occurred
    • All following numbers represent a grid sector.

    Disclaimer: The Bloomington Police Department takes great effort in making open data as accurate as possible, but there is no avoiding the introduction of errors in this process, which relies on data provided by many people and that cannot always be verified. Information contained in this dataset may change over a period of time. The Bloomington Police Department is not responsible for any error or omission from this data, or for the use or interpretation of the results of any research conducted.

  9. Fire statistics data tables

    • gov.uk
    • s3.amazonaws.com
    Updated Oct 23, 2025
    + more versions
    Cite
    Ministry of Housing, Communities and Local Government (2025). Fire statistics data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/fire-statistics-data-tables
    Explore at:
    Dataset updated
    Oct 23, 2025
    Dataset provided by
    GOV.UK
    Authors
    Ministry of Housing, Communities and Local Government
    Description

    On 1 April 2025 responsibility for fire and rescue transferred from the Home Office to the Ministry of Housing, Communities and Local Government.

    This information covers fires, false alarms and other incidents attended by fire crews, and the statistics include the numbers of incidents, fires, fatalities and casualties as well as information on response times to fires. The Ministry of Housing, Communities and Local Government (MHCLG) also collect information on the workforce, fire prevention work, health and safety and firefighter pensions. All data tables on fire statistics are below.

    MHCLG has responsibility for fire services in England. The vast majority of data tables produced by the Ministry of Housing, Communities and Local Government are for England, but some tables (0101, 0103, 0201, 0501, 1401) are for Great Britain split by nation. In the past the Department for Communities and Local Government (which previously had responsibility for fire services in England) produced data tables for Great Britain and at times the UK. Similar information for devolved administrations is available at Scotland: Fire and Rescue Statistics (https://www.firescotland.gov.uk/about/statistics/), Wales: Community safety (https://statswales.gov.wales/Catalogue/Community-Safety-and-Social-Inclusion/Community-Safety) and Northern Ireland: Fire and Rescue Statistics (https://www.nifrs.org/home/about-us/publications/).

    If you use assistive technology (for example, a screen reader) and need a version of any of these documents in a more accessible format, please email alternativeformats@communities.gov.uk. Please tell us what format you need. It will help us if you say what assistive technology you use.

    Related content

    Fire statistics guidance
    Fire statistics incident level datasets

    Incidents attended

    FIRE0101: Incidents attended by fire and rescue services by nation and population (MS Excel Spreadsheet, 143 KB): https://assets.publishing.service.gov.uk/media/68f0f810e8e4040c38a3cf96/FIRE0101.xlsx (Previous FIRE0101 tables)

    FIRE0102: Incidents attended by fire and rescue services in England, by incident type and fire and rescue authority (MS Excel Spreadsheet, 2.12 MB): https://assets.publishing.service.gov.uk/media/68f0ffd528f6872f1663ef77/FIRE0102.xlsx (Previous FIRE0102 tables)

    FIRE0103: Fires attended by fire and rescue services by nation and population (MS Excel Spreadsheet, 197 KB): https://assets.publishing.service.gov.uk/media/68f20a3e06e6515f7914c71c/FIRE0103.xlsx (Previous FIRE0103 tables)

    FIRE0104: Fire false alarms by reason for false alarm, England (MS Excel Spreadsheet, 443 KB): https://assets.publishing.service.gov.uk/media/68f20a552f0fc56403a3cfef/FIRE0104.xlsx (Previous FIRE0104 tables)

    Dwelling fires attended

    FIRE0201: Dwelling fires attended by fire and rescue services by motive, population and nation (MS Excel Spreadsheet, 192 KB): https://assets.publishing.service.gov.uk/media/68f100492f0fc56403a3cf94/FIRE0201.xlsx (Previous FIRE0201 tables)

  10. Stolen Guns

    • datasets.ai
    • bloomington.data.socrata.com
    • +2 more
    23, 40, 55, 8
    Updated May 19, 2023
    + more versions
    Cite
    City of Bloomington (2023). Stolen Guns [Dataset]. https://datasets.ai/datasets/stolen-guns-b43e8
    Explore at:
    Available download formats: 55, 23, 40, 8
    Dataset updated
    May 19, 2023
    Dataset authored and provided by
    City of Bloomington
    Description

    Information from Bloomington Police Department regarding guns reported stolen.

    Key code for Race:

    • A: Asian/Pacific Island, Non-Hispanic
    • B: African American, Non-Hispanic
    • C: Hawaiian/Other Pacific Island, Hispanic
    • H: Hawaiian/Other Pacific Island, Non-Hispanic
    • I: Indian/Alaskan Native, Non-Hispanic
    • K: African American, Hispanic
    • L: Caucasian, Hispanic
    • N: Indian/Alaskan Native, Hispanic
    • P: Asian/Pacific Island, Hispanic
    • S: Asian, Non-Hispanic
    • T: Asian, Hispanic
    • U: Unknown
    • W: Caucasian, Non-Hispanic

    Key Code for Reading Districts:

    Example: LB519

    • L: Law call or incident
    • B: Bloomington
    • 5: the district or beat where the incident occurred
    • All following numbers represent a grid sector.

    Disclaimer: The Bloomington Police Department takes great effort in making open data as accurate as possible, but there is no avoiding the introduction of errors in this process, which relies on data provided by many people and that cannot always be verified. Information contained in this dataset may change over a period of time. The Bloomington Police Department is not responsible for any error or omission from this data, or for the use or interpretation of the results of any research conducted.

  11. Domestic Violence

    • gimi9.com
    • data.bloomington.in.gov
    • +1 more
    Cite
    Domestic Violence [Dataset]. https://gimi9.com/dataset/data-gov_domestic-violence-1dc16/
    Explore at:
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    These Bloomington Police Department cases have been identified as Domestic Battery using the State Statute definition of 'domestic'.

    Key code for Race:

    • A: Asian/Pacific Island, Non-Hispanic
    • B: African American, Non-Hispanic
    • C: Hawaiian/Other Pacific Island, Hispanic
    • H: Hawaiian/Other Pacific Island, Non-Hispanic
    • I: Indian/Alaskan Native, Non-Hispanic
    • K: African American, Hispanic
    • L: Caucasian, Hispanic
    • N: Indian/Alaskan Native, Hispanic
    • P: Asian/Pacific Island, Hispanic
    • S: Asian, Non-Hispanic
    • T: Asian, Hispanic
    • U: Unknown
    • W: Caucasian, Non-Hispanic

    Key Code for Reading Districts:

    Example: LB519

    • L: Law call or incident
    • B: Bloomington
    • 5: the district or beat where the incident occurred
    • All following numbers represent a grid sector.

    Disclaimer: The Bloomington Police Department takes great effort in making open data as accurate as possible, but there is no avoiding the introduction of errors in this process, which relies on data provided by many people and that cannot always be verified. Information contained in this dataset may change over a period of time. The Bloomington Police Department is not responsible for any error or omission from this data, or for the use or interpretation of the results of any research conducted.

  12. mmlu

    • huggingface.co
    Updated May 10, 2023
    + more versions
    Cite
    Center for AI Safety (2023). mmlu [Dataset]. https://huggingface.co/datasets/cais/mmlu
    Explore at:
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 10, 2023
    Dataset authored and provided by
    Center for AI Safety (https://safe.ai/)
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for MMLU

      Dataset Summary
    

    Measuring Massive Multitask Language Understanding by Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt (ICLR 2021). This is a massive multitask test consisting of multiple-choice questions from various branches of knowledge. The test spans subjects in the humanities, social sciences, hard sciences, and other areas that are important for some people to learn. This covers 57 tasks
 See the full description on the dataset page: https://huggingface.co/datasets/cais/mmlu.

  13. Accidents

    • gimi9.com
    Cite
    Accidents [Dataset]. https://gimi9.com/dataset/data-gov_accidents-d4eba/
    Explore at:
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    Bloomington Police Department Calls for Service that reported an accident.

    Note that this is every call for service that documents an accident, regardless of the outcome of the accident. Not all accidents become State Crash Reports, and, therefore, the data contained in this set will not match accident data supplied by the Indiana State Police. This set of raw data contains information from Bloomington Police Department Calls for Service that reported an accident.

    Key code for Race:

    • A: Asian/Pacific Island, Non-Hispanic
    • B: African American, Non-Hispanic
    • C: Hawaiian/Other Pacific Island, Hispanic
    • H: Hawaiian/Other Pacific Island, Non-Hispanic
    • I: Indian/Alaskan Native, Non-Hispanic
    • K: African American, Hispanic
    • L: Caucasian, Hispanic
    • N: Indian/Alaskan Native, Hispanic
    • P: Asian/Pacific Island, Hispanic
    • S: Asian, Non-Hispanic
    • T: Asian, Hispanic
    • U: Unknown
    • W: Caucasian, Non-Hispanic

    Key Code for Reading Districts:

    Example: LB519

    • L: Law call or incident
    • B: Bloomington
    • 5: the district or beat where the incident occurred
    • All following numbers represent a grid sector.

    Disclaimer: The Bloomington Police Department takes great effort in making open data as accurate as possible, but there is no avoiding the introduction of errors in this process, which relies on data provided by many people and that cannot always be verified. Information contained in this dataset may change over a period of time. The Bloomington Police Department is not responsible for any error or omission from this data, or for the use or interpretation of the results of any research conducted.

  14. Amount of data created, consumed, and stored 2010-2023, with forecasts to...

    • statista.com
    Updated Mar 31, 2025
    Cite
    Petroc Taylor (2025). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/topics/1464/big-data/
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Petroc Taylor
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching 149 zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than 394 zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

    Storage capacity also growing

    Only a small percentage of this newly created data is kept, though, as just 2 percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of 19.2 percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached 6.7 zettabytes.
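
    The storage figures above imply a rough 2025 value that can be worked out from the compound annual growth rate. The sketch below simply applies the standard compounding formula, start × (1 + rate)^years, to the two numbers quoted in the description (6.7 zettabytes in 2020, 19.2 percent per year over five years). The result is an illustration of the arithmetic, not a figure published by Statista, and `project_with_cagr` is a hypothetical helper name.

```python
def project_with_cagr(start_value: float, annual_rate: float, years: int) -> float:
    """Compound a starting value at a fixed annual growth rate."""
    return start_value * (1 + annual_rate) ** years


# 6.7 ZB installed storage base in 2020, growing 19.2% per year until 2025.
projected_2025 = project_with_cagr(6.7, 0.192, 5)
print(f"Implied installed base in 2025: {projected_2025:.1f} zettabytes")
```

    Under these assumptions the installed base would reach roughly 16.1 zettabytes in 2025, about 2.4 times the 2020 level.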

  15. Calls for Service

    • gimi9.com
    • data.bloomington.in.gov
    • +2 more
    Cite
    Calls for Service [Dataset]. https://gimi9.com/dataset/data-gov_calls-for-service-6702d/
    Explore at:
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    Information from the Bloomington Police Department on all calls for service received.

    Key code for Race:

    • A: Asian/Pacific Island, Non-Hispanic
    • B: African American, Non-Hispanic
    • C: Hawaiian/Other Pacific Island, Hispanic
    • H: Hawaiian/Other Pacific Island, Non-Hispanic
    • I: Indian/Alaskan Native, Non-Hispanic
    • K: African American, Hispanic
    • L: Caucasian, Hispanic
    • N: Indian/Alaskan Native, Hispanic
    • P: Asian/Pacific Island, Hispanic
    • S: Asian, Non-Hispanic
    • T: Asian, Hispanic
    • U: Unknown
    • W: Caucasian, Non-Hispanic

    Key Code for Reading Districts:

    Example: LB519

    • L: Law call or incident
    • B: Bloomington
    • 5: the district or beat where the incident occurred
    • All following numbers represent a grid sector.

    Disclaimer: The Bloomington Police Department takes great effort in making open data as accurate as possible, but there is no avoiding the introduction of errors in this process, which relies on data provided by many people and that cannot always be verified. Information contained in this dataset may change over a period of time. The Bloomington Police Department is not responsible for any error or omission from this data, or for the use or interpretation of the results of any research conducted.

  16. Simple download service (Atom) of the dataset: Cynegetic interest group of...

    • data.europa.eu
    • gimi9.com
    unknown
    Updated Apr 4, 2019
    + more versions
    Cite
    (2019). Simple download service (Atom) of the dataset: Cynegetic interest group of l’Orne [Dataset]. https://data.europa.eu/data/datasets/fr-120066022-srv-07cb46fc-ea79-4050-897f-ccb545071d16/embed
    Explore at:
    Available download formats: unknown
    Dataset updated
    Apr 4, 2019
    Description

    This resource describes the zoning associated with a Cynegetic Interest Grouping (GIC) in the department of Orne.

    A “Groupement d’Intérêt Cynégétique” (GIC) has no particular legal status. It is a group of people who have come together to carry out game management actions in a given geographical area.

    The establishment of a GIC is due solely to the will of the holders of hunting rights (associations, individuals, etc.) to coordinate actions in favour of a species that is either reintroduced or in a precarious situation, and whose numbers must be restored in order to allow for future harvests. Third parties can join these GICs, such as the FDC (Departmental Hunters Federation), which provides useful technical or administrative support.

    These good management practices are in turn beneficial to other species. This approach also makes it possible to involve other users of the territory in the practice of hunting, as was the case with the GIC de la Sainte-Victoire (Bouches-du-Rhône), where the influx of tourists requires appropriate game management.

  17. City of Los Angeles Crime data

    • kaggle.com
    zip
    Updated Apr 29, 2024
    + more versions
    Cite
    Ramin Huseyn (2024). City of Los Angeles Crime data [Dataset]. https://www.kaggle.com/datasets/raminhuseyn/crime-data-from-2020-to-present
    Explore at:
    Available download formats: zip (48433749 bytes)
    Dataset updated
    Apr 29, 2024
    Authors
    Ramin Huseyn
    License

    https://www.usa.gov/government-works/

    Area covered
    Los Angeles
    Description

    This dataset reflects incidents of crime in the City of Los Angeles dating back to 2020. This data is transcribed from original crime reports that are typed on paper and therefore there may be some inaccuracies within the data. Some location fields with missing data are noted as (0°, 0°). Address fields are only provided to the nearest hundred block in order to maintain privacy. The dataset contains 2,083,227 rows and 29 columns.

    Column name | Description
    DR_NO | Division of Records Number: Official file number made up of a 2 digit year, area ID, and 5 digits
    Date Rptd | MM/DD/YYYY
    DATE OCC | MM/DD/YYYY
    TIME OCC | In 24 hour military time.
    AREA | The LAPD has 21 Community Police Stations referred to as Geographic Areas within the department. These Geographic Areas are sequentially numbered from 1-21.
    AREA NAME | The 21 Geographic Areas or Patrol Divisions are also given a name designation that references a landmark or the surrounding community that it is responsible for.
    Crm Cd | Indicates the crime committed. (Same as Crime Code 1)
    Crm Cd Desc | Defines the Crime Code provided.
    Mocodes | Modus Operandi: Activities associated with the suspect in commission of the crime
    Vict Age | Victim age
    Vict Sex | F - Female, M - Male, X - Unknown
    Vict Descent | Descent Code: A - Other Asian, B - Black, C - Chinese, D - Cambodian, F - Filipino, G - Guamanian, H - Hispanic/Latin/Mexican, I - American Indian/Alaskan Native, J - Japanese, K - Korean, L - Laotian, O - Other, P - Pacific Islander, S - Samoan, U - Hawaiian, V - Vietnamese, W - White, X - Unknown, Z - Asian Indian
    Premis Cd | The type of structure, vehicle, or location where the crime took place.
    Premis Desc | Defines the Premise Code provided.
    Weapon Used Cd | The type of weapon used in the crime.
    Weapon Desc | Defines the Weapon Used Code provided.
    Status | Status of the case. (IC is the default)
    Status Desc | Defines the Status Code provided.
    Crm Cd 1 | Indicates the crime committed. Crime Code 1 is the primary and most serious one. Crime Code 2, 3, and 4 are respectively less serious offenses. Lower crime class numbers are more serious.
    Crm Cd 2 | May contain a code for an additional crime, less serious than Crime Code 1.
    Crm Cd 3 | May contain a code for an additional crime, less serious than Crime Code 1.
    Crm Cd 4 | May contain a code for an additional crime, less serious than Crime Code 1.
    LOCATION | Street address of crime incident rounded to the nearest hundred block to maintain anonymity.
    Cross Street | Cross Street of rounded Address.
    LAT | Latitude
    LON | Longitude
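
    The column descriptions above suggest a few small decoding helpers. The following is a minimal sketch, not part of the dataset: the function names are hypothetical, and it assumes that `TIME OCC` may arrive as an integer or string with leading zeros dropped (for example `845` for 08:45) and that dates follow the documented MM/DD/YYYY pattern exactly.

```python
from datetime import date, datetime, time

# "Vict Descent" codes as listed in the column description.
VICT_DESCENT = {
    "A": "Other Asian", "B": "Black", "C": "Chinese", "D": "Cambodian",
    "F": "Filipino", "G": "Guamanian", "H": "Hispanic/Latin/Mexican",
    "I": "American Indian/Alaskan Native", "J": "Japanese", "K": "Korean",
    "L": "Laotian", "O": "Other", "P": "Pacific Islander", "S": "Samoan",
    "U": "Hawaiian", "V": "Vietnamese", "W": "White", "X": "Unknown",
    "Z": "Asian Indian",
}


def parse_time_occ(value) -> time:
    """Convert a 24-hour military time such as 1430 or 845 to a time object.

    Assumption: leading zeros may be missing, so the value is left-padded
    to four digits before splitting into hours and minutes.
    """
    digits = str(value).strip().zfill(4)
    return time(hour=int(digits[:2]), minute=int(digits[2:]))


def parse_date_occ(value: str) -> date:
    """Parse a date in the documented MM/DD/YYYY format."""
    return datetime.strptime(value.strip(), "%m/%d/%Y").date()


def has_location(lat, lon) -> bool:
    """Return False for the (0, 0) placeholder used when location is missing."""
    return not (float(lat) == 0.0 and float(lon) == 0.0)


def decode_descent(code: str) -> str:
    """Map a descent code to its label; unlisted codes are reported as-is."""
    return VICT_DESCENT.get(code.strip().upper(), f"Unlisted code: {code}")
```

    If the real files append a time component to the date columns, `parse_date_occ` would need a different format string; that is worth checking against a few rows before relying on it.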

  18. Data from: Median Income

    • data.ccrpc.org
    csv
    Updated Dec 2, 2025
    Cite
    Champaign County Regional Planning Commission (2025). Median Income [Dataset]. https://data.ccrpc.org/dataset/median-income
    Explore at:
    Available download formats: csv
    Dataset updated
    Dec 2, 2025
    Dataset authored and provided by
    Champaign County Regional Planning Commission
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    The estimated median household income and estimated median family income are two separate measures: every family is a household, but not every household is a family. According to the U.S. Census Bureau definitions of the terms, a family “includes a householder and one or more people living in the same household who are related to the householder by birth, marriage, or adoption,”[1] while a household “includes all the people who occupy a housing unit,” including households of just one person[2]. When evaluated together, the estimated median household income and estimated median family income provide a thorough picture of household-level economics in Champaign County.

    Both estimated median household income and estimated median family income were higher in 2024 than in 2005. The change in estimated median household income between 2023 and 2024 was not statistically significant. However, the increase in estimated median family income between 2023 and 2024 was statistically significant. Estimated median family income is consistently higher than estimated median household income, largely due to the definitions of each term, and the types of household that are measured and are not measured in each category.

    Median income data was sourced from the U.S. Census Bureau’s American Community Survey (ACS) 1-Year Estimates, which are released annually.

    As with any datasets that are estimates rather than exact counts, it is important to take into account the margins of error (listed in the column beside each figure) when drawing conclusions from the data.
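
    As an illustration of how margins of error can inform a year-to-year comparison, here is a minimal sketch of the test the Census Bureau describes for comparing two ACS estimates. ACS margins of error are published at the 90 percent confidence level, so each is divided by 1.645 to recover a standard error. The function names and the dollar figures in the usage note are hypothetical, and the source does not state that CCRPC used exactly this procedure.

```python
import math

Z_90 = 1.645  # critical value for the 90% confidence level used by ACS MOEs


def standard_error(moe: float) -> float:
    """Recover the standard error from a published ACS margin of error."""
    return moe / Z_90


def difference_is_significant(est_a: float, moe_a: float,
                              est_b: float, moe_b: float) -> bool:
    """Test whether two estimates differ at the 90% confidence level.

    Treats the two estimates as independent, so the standard error of the
    difference is the root sum of squares of the two standard errors.
    """
    se_diff = math.sqrt(standard_error(moe_a) ** 2 + standard_error(moe_b) ** 2)
    if se_diff == 0:
        # No sampling error reported: any nonzero difference counts.
        return est_a != est_b
    z = abs(est_a - est_b) / se_diff
    return z > Z_90
```

    For example, with made-up figures, `difference_is_significant(70000, 2000, 65000, 2500)` returns `True`, while `difference_is_significant(70000, 4000, 68000, 4000)` returns `False` because the difference is small relative to the combined margins of error.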

    For 2020, due to the impact of the COVID-19 pandemic, the Census Bureau released experimental estimates from the 1-year data instead of providing the standard 1-year data products. This release includes a limited number of data tables for the nation, states, and the District of Columbia. The Census Bureau states that the 2020 ACS 1-year experimental tables use an experimental estimation methodology and should not be compared with other ACS data. For these reasons, and because data is not available for Champaign County, no data for 2020 is included in this Indicator.

    For interested data users, the 2020 ACS 1-Year Experimental data release includes datasets on Median Household Income in the Past 12 Months (in 2020 Inflation-Adjusted Dollars) and Median Family Income in the Past 12 Months (in 2020 Inflation-Adjusted Dollars).

    [1] U.S. Census Bureau. (Date unknown). Glossary. “Family Household.” (Accessed 19 April 2016).

    [2] U.S. Census Bureau. (Date unknown). Glossary. “Household.” (Accessed 19 April 2016).

    Sources:

    • U.S. Census Bureau; American Community Survey, 2024 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using data.census.gov; (2 December 2025).
    • U.S. Census Bureau; American Community Survey, 2023 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using data.census.gov; (17 October 2024).
    • U.S. Census Bureau; American Community Survey, 2022 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using data.census.gov; (18 September 2023).
    • U.S. Census Bureau; American Community Survey, 2021 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using data.census.gov; (3 October 2022).
    • U.S. Census Bureau; American Community Survey, 2019 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using data.census.gov; (7 June 2021).
    • U.S. Census Bureau; American Community Survey, 2018 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using data.census.gov; (7 June 2021).
    • U.S. Census Bureau; American Community Survey, 2017 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using American FactFinder; (13 September 2018).
    • U.S. Census Bureau; American Community Survey, 2016 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using American FactFinder; (14 September 2017).
    • U.S. Census Bureau; American Community Survey, 2015 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using American FactFinder; (19 September 2016).
    • U.S. Census Bureau; American Community Survey, 2014 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2013 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2012 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2011 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2010 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2009 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2008 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2007 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2006 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using American FactFinder; (16 March 2016).
    • U.S. Census Bureau; American Community Survey, 2005 American Community Survey 1-Year Estimates, Table S1903; generated by CCRPC staff; using American FactFinder; (16 March 2016).

  19. CTA - List of CTA Datasets

    • transitchicago.com
    • data.cityofchicago.org
    csv, xlsx, xml
    Updated May 17, 2017
    Cite
    Chicago Transit Authority (2017). CTA - List of CTA Datasets [Dataset]. https://www.transitchicago.com/data/
    Explore at:
    Available download formats: csv, xml, xlsx
    Dataset updated
    May 17, 2017
    Dataset authored and provided by
    Chicago Transit Authority
    Description

    This lists datasets published by CTA in the City of Chicago Data Portal.

  20. Nuisance Complaints Data

    • data.amerigeoss.org
    • data.wu.ac.at
    csv
    Updated Jul 28, 2019
    Cite
    United States[old] (2019). Nuisance Complaints Data [Dataset]. https://data.amerigeoss.org/sq/dataset/nuisance-complaints-data
    Explore at:
    Available download formats: csv
    Dataset updated
    Jul 28, 2019
    Dataset provided by
    United States[old]
    Description

    This set of raw data contains information from Bloomington Police Department Calls for Service, specifically for the call natures of alcohol related, disturbance, drunk, noise, panhandling, and vandalism.

    Key code for Race:

    • A- Asian/Pacific Island, Non-Hispanic
    • B- African American, Non-Hispanic
    • I- Indian/Alaskan Native, Non-Hispanic
    • K- African American, Hispanic
    • L- Caucasian, Hispanic
    • N- Indian/Alaskan Native, Hispanic
    • P- Asian/Pacific Island, Hispanic
    • U- Unknown
    • W- Caucasian, Non-Hispanic

    Key Code for Reading Districts:

    Example: LB519

    • ‘L’ for Law call or incident
    • ‘B’ stands for Bloomington
    • 5 is the district or beat where incident occurred
    • All numbers following represent a grid sector.
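
    The two key codes above can be turned into a small lookup and parser. This is a minimal sketch under stated assumptions: the names `RACE_CODES`, `DistrictCode`, and `parse_district_code` are hypothetical, and the pattern assumes the district is a single digit (consistent with a five-district layout) with any remaining digits forming the grid sector.

```python
import re
from typing import NamedTuple

# Race key codes as listed for this dataset.
RACE_CODES = {
    "A": "Asian/Pacific Island, Non-Hispanic",
    "B": "African American, Non-Hispanic",
    "I": "Indian/Alaskan Native, Non-Hispanic",
    "K": "African American, Hispanic",
    "L": "Caucasian, Hispanic",
    "N": "Indian/Alaskan Native, Hispanic",
    "P": "Asian/Pacific Island, Hispanic",
    "U": "Unknown",
    "W": "Caucasian, Non-Hispanic",
}


class DistrictCode(NamedTuple):
    call_type: str    # e.g. "L" for Law call or incident
    city: str         # e.g. "B" for Bloomington
    district: int     # district or beat where the incident occurred
    grid_sector: str  # remaining digits, kept as text to preserve zeros


# Assumed layout: one letter, one letter, one district digit, then sector digits.
_DISTRICT_PATTERN = re.compile(r"^([A-Z])([A-Z])(\d)(\d*)$")


def parse_district_code(code: str) -> DistrictCode:
    """Split a reading-district code such as 'LB519' into its parts."""
    match = _DISTRICT_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognized district code: {code!r}")
    call_type, city, district, sector = match.groups()
    return DistrictCode(call_type, city, int(district), sector)
```

    With the documented example, `parse_district_code("LB519")` yields call type `L`, city `B`, district `5`, and grid sector `19`.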

    A map of the five districts can be located on Raidsonline.com, under the tab labeled ‘Agency Layers’.

    Disclaimer: The Bloomington Police Department takes great effort in making Calls for Service data as accurate as possible, but there is no avoiding errors in this process, which relies on data provided by many people and that cannot always be verified. Information contained in this dataset may change over a period of time. The Bloomington Police Department is not responsible for any error or omission from this data, or for the use or interpretation of the results of any research conducted.

Cite
Life Bricks Global (2025). LLM RAG Chatbot Training Dataset [Dataset]. https://www.kaggle.com/datasets/lifebricksglobal/llm-rag-chatbot-training-dataset

LLM RAG Chatbot Training Dataset

Time-Waster Detection for Companion & Conversational AI Agents (human-verified)

Explore at:
Available download formats: zip (199960 bytes)
Dataset updated
May 20, 2025
Authors
Life Bricks Global
Description

We’ve developed another annotated dataset designed specifically for conversational AI and companion AI model training.

Watch: How To Use The Dataset

What you have here on Kaggle is our free sample - Think Salon Kitty meets AI

The 'Time Waster Identification & Retreat Model Dataset' enables AI handler agents to detect when users are likely to churn, saving valuable tokens and preventing wasted compute cycles in conversational models.

This batch has 167 entries annotated for sentiment, intent, user risk flagging (via behavioural tracking), and user Recovery Potential per statement, among others. This dataset is designed to be a niche micro dataset for a specific use case: Time Waster Identification and Retreat.

👉 Buy the updated version: https://lifebricksglobal.gumroad.com/l/Time-WasterDetection-Dataset

This dataset is perfect for:

  • Fine-tuning LLM routing logic
  • Building intelligent AI agents for customer engagement
  • Companion AI training + moderation modelling

This is part of a broader series of human-agent interaction datasets we are releasing under our independent data licensing program.

It is designed for AI researchers and developers building:

  • Conversational AI agents
  • Companion AI models
  • Human-agent interaction simulators
  • LLM routing optimization models

Use case:

  • Conversational AI
  • Companion AI
  • Defence & Aerospace
  • Customer Support AI
  • Gaming / Virtual Worlds
  • LLM Safety Research
  • AI Orchestration Platforms

👉 Good for teams working on conversational AI, companion AI, fraud detectors and those integrating routing logic for voice/chat agents

Contact us on LinkedIn: Life Bricks Global.

License:

This dataset is provided under a custom license. By using the dataset, you agree to the following terms:

Usage: You are allowed to use the dataset for non-commercial purposes, including research, development, and machine learning model training.

Modification: You may modify the dataset for your own use.

Redistribution: Redistribution of the dataset in its original or modified form is not allowed without permission.

Attribution: Proper attribution must be given when using or referencing this dataset.

No Warranty: The dataset is provided "as-is" without any warranties, express or implied, regarding its accuracy, completeness, or fitness for a particular purpose.
