56 datasets found
  1. 10,109 People - Face Images Dataset

    • nexdata.ai
    Updated Jun 14, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nexdata (2024). 10,109 People - Face Images Dataset [Dataset]. https://www.nexdata.ai/datasets/1402?source=Github
    Explore at:
    Dataset updated
    Jun 14, 2024
    Dataset authored and provided by
    Nexdata
    Variables measured
    Data size, Data format, Data diversity, Age distribution, Race distribution, Gender distribution, Collecting environment
    Description

    10,109 people - face images dataset includes people collected from many countries. Multiple photos of each person’s daily life are collected, and the gender, race, age, etc. of the person being collected are marked.This Dataset provides a rich resource for artificial intelligence applications. It has been validated by multiple AI companies and proves beneficial for achieving outstanding performance in real-world applications. Throughout the process of Dataset collection, storage, and usage, we have consistently adhered to Dataset protection and privacy regulations to ensure the preservation of user privacy and legal rights. All Dataset comply with regulations such as GDPR, CCPA, PIPL, and other applicable laws.

  2. A

    ‘COVID-19 Hospitalizations’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Apr 7, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘COVID-19 Hospitalizations’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-covid-19-hospitalizations-5df8/latest
    Explore at:
    Dataset updated
    Apr 7, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘COVID-19 Hospitalizations’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/66f8a124-afc7-4416-9157-9fad641b9913 on 27 January 2022.

    --- Dataset description provided by original source is as follows ---

    Note: As of April 9, 2021, this dataset will update daily with a four-day data lag.

    A. SUMMARY Count of COVID+ patients admitted to the hospital. Patients who are hospitalized and test positive for COVID-19 may be admitted to an acute care bed (a regular hospital bed), or an intensive care unit (ICU) bed. This data shows the daily total count of COVID+ patients in these two bed types, and the data reflects totals from all San Francisco Hospitals.

    B. HOW THE DATASET IS CREATED Hospital information is based on admission data reported to the San Francisco Department of Public Health.

    C. UPDATE PROCESS Updates automatically at 05:00 Pacific Time each day. Redundant runs are scheduled at 07:00 and 09:00 in case of pipeline failure.

    D. HOW TO USE THIS DATASET Each record represents how many people were hospitalized on the date recorded in either an ICU bed or acute care bed (shown as Med/Surg under DPHCategory field).

    The dataset shown here includes all San Francisco hospitals and updates daily with a four-day lag as information is collected and verified. Data may change as more current information becomes available.

    --- Original source retains full ownership of the source dataset ---

  3. d

    Daily COVID-19 Outbreak Summary

    • datasets.ai
    • data.kingcounty.gov
    • +3more
    21
    Updated Aug 7, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    King County, Washington (2024). Daily COVID-19 Outbreak Summary [Dataset]. https://datasets.ai/datasets/daily-covid-19-outbreak-summary
    Explore at:
    21Available download formats
    Dataset updated
    Aug 7, 2024
    Dataset authored and provided by
    King County, Washington
    Description

    Updated daily between 3:00 pm to 5:00 pm Data are updated daily in the early afternoon and reflect laboratory results reported to the Washington State Department of Health as of midnight the day before. Data for previous dates will be updated as new results are entered, interviews are conducted, and data errors are corrected.

    Many people test positive but do not require hospitalization. The counts of positive cases do not necessarily indicate levels of demand at local hospitals.

    Reporting of test results to the Washington State Department of Health may be delayed by several days and will be updated when data are available. Only positive or negative test results are reflected in the counts and exclude tests where results are pending, inconclusive or were not performed.

  4. A

    ‘CT School Learning Model Indicators by County (14-day metrics) - ARCHIVE’...

    • analyst-2.ai
    Updated Jan 26, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘CT School Learning Model Indicators by County (14-day metrics) - ARCHIVE’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-ct-school-learning-model-indicators-by-county-14-day-metrics-archive-1346/88dbc343/?iid=004-752&v=presentation
    Explore at:
    Dataset updated
    Jan 26, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Connecticut
    Description

    Analysis of ‘CT School Learning Model Indicators by County (14-day metrics) - ARCHIVE’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/feda0dbb-905d-48c8-81ec-590689a6da8f on 26 January 2022.

    --- Dataset description provided by original source is as follows ---

    NOTE: This dataset pertains only to the 2020-2021 school year and is no longer being updated. For additional data on COVID-19, visit data.ct.gov/coronavirus.

    This dataset includes the leading and secondary metrics identified by the Connecticut Department of Health (DPH) and the Department of Education (CSDE) to support local district decision-making on the level of in-person, hybrid (blended), and remote learning model for Pre K-12 education.

    Data represent daily averages for two-week periods by date of specimen collection (cases and positivity), date of hospital admission, or date of ED visit. Hospitalization data come from the Connecticut Hospital Association and are based on hospital location, not county of patient residence. COVID-19-like illness includes fever and cough or shortness of breath or difficulty breathing or the presence of coronavirus diagnosis code and excludes patients with influenza-like illness. All data are preliminary.

    These data are updated weekly and reflect the previous two full Sunday-Saturday (MMWR) weeks (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf).

    These metrics were adapted from recommendations by the Harvard Global Institute and supplemented by existing DPH measures.

    For national data on COVID-19, see COVID View, the national weekly surveillance summary of U.S. COVID-19 activity, at https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covidview/index.html

    DPH note about change from 7-day to 14-day metrics: Prior to 10/15/2020, these metrics were calculated using a 7-day average rather than a 14-day average. The 7-day metrics are no longer being updated as of 10/15/2020 but the archived dataset can be accessed here: https://data.ct.gov/Health-and-Human-Services/CT-School-Learning-Model-Indicators-by-County/rpph-4ysy

    As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable, particularly at the town level, as compared to 7-day rates. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.

    With respect to geography, we also have learned that many people are looking at the town-level data to inform decision making, despite emphasis on the county-level metrics in the published addenda. This is understandable as there has been variation within counties in COVID-19 activity (for example, rates that are higher in one town than in most other towns in the county).

    --- Original source retains full ownership of the source dataset ---

  5. Daily and Intraday Stock Price Data

    • kaggle.com
    Updated Dec 8, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Boris Marjanovic (2017). Daily and Intraday Stock Price Data [Dataset]. https://www.kaggle.com/borismarjanovic/daily-and-intraday-stock-price-data/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 8, 2017
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Boris Marjanovic
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Stock market data -- and particularly intraday price data -- can be very expensive to buy. To help more people gain access to it, here I provide daily as well as intraday price and volume data for all U.S.-based stocks and ETFs trading on the NYSE, NASDAQ, and NYSE MKT.

    Content

    The dataset (last updated 12/06/2017) is presented in CSV format as follows:

    • Intraday data: Date,Time,Open,High,Low,Close,Volume,OpenInt

    • Daily data: Date,Open,High,Low,Close,Volume,OpenInt

    Acknowledgements

    The dataset belongs to me. I’m sharing it here for free. You may do with it as you wish.

    Inspiration

    Many have tried, but most have failed, to predict the stock market's ups and downs. Can you do any better?

  6. A

    ‘COVID-19 case rate per 100,000 population and percent test positivity in...

    • analyst-2.ai
    Updated Feb 13, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘COVID-19 case rate per 100,000 population and percent test positivity in the last 14 days by town’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-covid-19-case-rate-per-100000-population-and-percent-test-positivity-in-the-last-14-days-by-town-d334/760f38b9/?iid=006-223&v=presentation
    Explore at:
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘COVID-19 case rate per 100,000 population and percent test positivity in the last 14 days by town’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/d5e87e00-5f12-4c5e-9fb7-9718e5dbef35 on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    This dataset includes a count and rate per 100,000 population for COVID-19 cases, a count of COVID-19 molecular diagnostic tests, and a percent positivity rate for tests among people living in community settings for the previous two-week period. Dates are based on date of specimen collection (cases and positivity).

    A person is considered a new case only upon their first COVID-19 testing result because a case is defined as an instance or bout of illness. If they are tested again subsequently and are still positive, it still counts toward the test positivity metric but they are not considered another case.

    Percent positivity is calculated as the number of positive tests among community residents conducted during the 14 days divided by the total number of positive and negative tests among community residents during the same period. If someone was tested more than once during that 14 day period, then those multiple test results (regardless of whether they were positive or negative) are included in the calculation.

    These case and test counts do not include cases or tests among people residing in congregate settings, such as nursing homes, assisted living facilities, or correctional facilities.

    These data are updated weekly and reflect the previous two full Sunday-Saturday (MMWR) weeks (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf).

    DPH note about change from 7-day to 14-day metrics: Prior to 10/15/2020, these metrics were calculated using a 7-day average rather than a 14-day average. The 7-day metrics are no longer being updated as of 10/15/2020 but the archived dataset can be accessed here: https://data.ct.gov/Health-and-Human-Services/COVID-19-case-rate-per-100-000-population-and-perc/s22x-83rd

    As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable, particularly at the town level, as compared to 7-day rates. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.

    With respect to geography, we also have learned that many people are looking at the town-level data to inform decision making, despite emphasis on the county-level metrics in the published addenda. This is understandable as there has been variation within counties in COVID-19 activity (for example, rates that are higher in one town than in most other towns in the county).

    Additional notes: As of 11/5/2020, CT DPH has added antigen testing for SARS-CoV-2 to reported test counts in this dataset. The tests included in this dataset include both molecular and antigen datasets. Molecular tests reported include polymerase chain reaction (PCR) and nucleic acid amplicfication (NAAT) tests.

    The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used.

    Data suppression is applied when the rate is <5 cases per 100,000 or if there are <5 cases within the town. Information on why data suppression rules are applied can be found online here: https://www.cdc.gov/cancer/uscs/technical_notes/stat_methods/suppression.htm

    --- Original source retains full ownership of the source dataset ---

  7. A

    ‘Covid-19 Tests by Race Ethnicity and Date’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 27, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Covid-19 Tests by Race Ethnicity and Date’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-covid-19-tests-by-race-ethnicity-and-date-f47f/e38e3d0a/?iid=004-383&v=presentation
    Explore at:
    Dataset updated
    Jan 27, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Covid-19 Tests by Race Ethnicity and Date’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/68410b4b-052f-4ce3-8d0c-873b5664f1a4 on 27 January 2022.

    --- Dataset description provided by original source is as follows ---

    Note: As of April 16, 2021, this dataset will update daily with a five-day data lag.

    A. SUMMARY This dataset includes San Francisco COVID-19 tests by race/ ethnicity and date. For each day, this dataset represents the daily count of tests collected by race/ethnicity, and how many of those were positive, negative, and indeterminate. Tests in this dataset include all tests collected from San Francisco residents who listed a San Francisco home address at the time of testing, and tests that were collected in San Francisco but had a missing home address. Data are based on information collected at the time of testing.

    For recent data, about 25-30% of tests are missing race/ ethnicity information. Tests where the race/ ethnicity of the patient is unknown are included in the dataset under the "Unknown" category.

    This data was de-duplicated by individual and date, so if a person gets tested multiple times on different dates, all tests will be included in this dataset (on the day each test was collected).

    The total number of positive test results is not equal to the total number of COVID-19 cases in San Francisco. Each positive test result is investigated. During this investigation, some test results are found to be for persons living outside of San Francisco and some people in San Francisco may be tested multiple times. In both cases, these results are not included in San Francisco’s total COVID-19 case count. To track the number of cases by race/ ethnicity, see this dashboard: https://data.sfgov.org/stories/s/w6za-6st8

    B. HOW THE DATASET IS CREATED COVID-19 laboratory test data is based on electronic laboratory test reports. Deduplication, quality assurance measures and other data verification processes maximize accuracy of laboratory test information.

    C. UPDATE PROCESS Updates automatically at 05:00 Pacific Time each day. Redundant runs are scheduled at 07:00 and 09:00 in case of pipeline failure.

    D. HOW TO USE THIS DATASET Due to the high degree of variation in the time needed to complete tests by different labs there is a delay in this reporting. On March 24 the Health Officer ordered all labs in the City to report complete COVID-19 testing information to the local and state health departments.

    In order to track trends over time, a data user can analyze this data by "specimen_collection_date".

    Calculating Percent Positivity: The positivity rate is the percentage of tests that return a positive result for COVID-19 (positive tests divided by the sum of positive and negative tests). Indeterminate results, which could not conclusively determine whether COVID-19 virus was present, are not included in the calculation of percent positive. When there are fewer than 20 positives tests for a given race/ethnicity and time period, the positivity rate is not calculated for the public tracker because rates of small test counts are less reliable.

    Calculating Testing Rates: To calculate the testing rate per 10,000 residents, divide the total number of tests collected (positive, negative, and indeterminate results) for the specified race/ ethnicity by the total number of residents who identify as that race/ ethnicity (according to the 2018 5-year estimates from the American Community Survey), then multiply by 10,000. When there are fewer than 20 total tests for a given race/ethnicity and time period, the testing rate is not calculated for the public tracker because rates of small test counts are less reliable.

    Read more about how this data is updated and validated daily: https://data.sfgov.org/stories/s/nudz-9tg2

    There are two other datasets related to tests: 1. COVID-19 Tests 2. <a href="https://data.sfgov.org/dataset/Covid-19-Testing-by

    --- Original source retains full ownership of the source dataset ---

  8. d

    COVID-19 Daily Testing - By Person - Historical

    • datasets.ai
    • data.cityofchicago.org
    • +2more
    23, 40, 55, 8
    Updated Sep 11, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    City of Chicago (2024). COVID-19 Daily Testing - By Person - Historical [Dataset]. https://datasets.ai/datasets/covid-19-daily-testing-by-person
    Explore at:
    55, 8, 23, 40Available download formats
    Dataset updated
    Sep 11, 2024
    Dataset authored and provided by
    City of Chicago
    Description

    This dataset is historical only and ends at 5/7/2021. For more information, please see http://dev.cityofchicago.org/open%20data/data%20portal/2021/05/04/covid-19-testing-by-person.html. The recommended alternative dataset for similar data beyond that date is https://data.cityofchicago.org/Health-Human-Services/COVID-19-Daily-Testing-By-Test/gkdw-2tgv.

    This is the source data for some of the metrics available at https://www.chicago.gov/city/en/sites/covid-19/home/latest-data.html.

    For all datasets related to COVID-19, see https://data.cityofchicago.org/browse?limitTo=datasets&sortBy=alpha&tags=covid-19.

    This dataset contains counts of people tested for COVID-19 and their results. This dataset differs from https://data.cityofchicago.org/d/gkdw-2tgv in that each person is in this dataset only once, even if tested multiple times. In the other dataset, each test is counted, even if multiple tests are performed on the same person, although a person should not appear in that dataset more than once on the same day unless he/she had both a positive and not-positive test.

    Only Chicago residents are included based on the home address as provided by the medical provider.

    Molecular (PCR) and antigen tests are included, and only one test is counted for each individual. Tests are counted on the day the specimen was collected. A small number of tests collected prior to 3/1/2020 are not included in the table.

    Not-positive lab results include negative results, invalid results, and tests not performed due to improper collection. Chicago Department of Public Health (CDPH) does not receive all not-positive results.

    Demographic data are more complete for those who test positive; care should be taken when calculating percentage positivity among demographic groups.

    All data are provisional and subject to change. Information is updated as additional details are received.

    Data Source: Illinois National Electronic Disease Surveillance System

  9. F

    General domain Human-Human conversation chats in English

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). General domain Human-Human conversation chats in English [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/english-general-domain-conversation-text-dataset
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    This training dataset comprises more than 10,000 conversational text data between two native English people in the general domain. We have a collection of chats on a variety of different topics/services/issues of daily life, such as music, books, festivals, health, kids, family, environment, study, childhood, cuisine, internet, movies, etc., and that makes the dataset diverse.

    These chats consist of language-specific words, and phrases and follow the native way of talking which makes the chats more information-rich for your NLP model. Apart from each chat being specific to the topic, it contains various attributes like people's names, addresses, contact information, email address, time, date, local currency, telephone numbers, local slang, etc too in various formats to make the text data unbiased.

    These chat scripts have between 300 and 700 words and up to 50 turns. 150 people that are a part of the FutureBeeAI crowd community contributed to this dataset. You will also receive chat metadata, such as participant age, gender, and country information, along with the chats. Dataset applications include conversational AI, natural language processing (NLP), smart assistants, text recognition, text analytics, and text prediction.

    This dataset is being expanded with new chats all the time. We are able to produce text data in a variety of languages to meet your unique requirements. Check out the FutureBeeAI community for a custom collection.

    This training dataset's licence belongs to FutureBeeAI!

  10. AI corporate investment worldwide 2015-2022

    • statista.com
    Updated Jun 30, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). AI corporate investment worldwide 2015-2022 [Dataset]. https://www.statista.com/statistics/941137/ai-investment-and-funding-worldwide/
    Explore at:
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Area covered
    Worldwide
    Description

    In 2022, the global total corporate investment in artificial intelligence (AI) reached almost ** billion U.S. dollars, a slight decrease from the previous year. In 2018, the yearly investment in AI saw a slight downturn, but that was only temporary. Private investments account for a bulk of total AI corporate investment. AI investment has increased more than ******* since 2016, a staggering growth in any market. It is a testament to the importance of the development of AI around the world. What is Artificial Intelligence (AI)? Artificial intelligence, once the subject of people’s imaginations and the main plot of science fiction movies for decades, is no longer a piece of fiction, but rather commonplace in people’s daily lives whether they realize it or not. AI refers to the ability of a computer or machine to imitate the capacities of the human brain, which often learns from previous experiences to understand and respond to language, decisions, and problems. These AI capabilities, such as computer vision and conversational interfaces, have become embedded throughout various industries’ standard business processes. AI investment and startups The global AI market, valued at ***** billion U.S. dollars as of 2023, continues to grow driven by the influx of investments it receives. This is a rapidly growing market, looking to expand from billions to trillions of U.S. dollars in market size in the coming years. From 2020 to 2022, investment in startups globally, and in particular AI startups, increased by **** billion U.S. dollars, nearly double its previous investments, with much of it coming from private capital from U.S. companies. The most recent top-funded AI businesses are all machine learning and chatbot companies, focusing on human interface with machines.

  11. F

    General domain Human-Human conversation chats in Spanish

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). General domain Human-Human conversation chats in Spanish [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/spanish-general-domain-conversation-text-dataset
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    This training dataset comprises more than 10,000 conversational text data between two native Spanish people in the general domain. We have a collection of chats on a variety of different topics/services/issues of daily life, such as music, books, festivals, health, kids, family, environment, study, childhood, cuisine, internet, movies, etc., and that makes the dataset diverse.

    These chats consist of language-specific words, and phrases and follow the native way of talking which makes the chats more information-rich for your NLP model. Apart from each chat being specific to the topic, it contains various attributes like people's names, addresses, contact information, email address, time, date, local currency, telephone numbers, local slang, etc too in various formats to make the text data unbiased.

    These chat scripts have between 300 and 700 words and up to 50 turns. 150 people that are a part of the FutureBeeAI crowd community contributed to this dataset. You will also receive chat metadata, such as participant age, gender, and country information, along with the chats. Dataset applications include conversational AI, natural language processing (NLP), smart assistants, text recognition, text analytics, and text prediction.

    This dataset is being expanded with new chats all the time. We are able to produce text data in a variety of languages to meet your unique requirements. Check out the FutureBeeAI community for a custom collection.

    This training dataset's licence belongs to FutureBeeAI!

  12. d

    U.S. State, Territorial, and County Stay-At-Home Orders: March 15-May 5 by...

    • catalog.data.gov
    • data.virginia.gov
    • +4more
    Updated Jun 28, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Centers for Disease Control and Prevention (2025). U.S. State, Territorial, and County Stay-At-Home Orders: March 15-May 5 by County by Day [Dataset]. https://catalog.data.gov/dataset/u-s-state-territorial-and-county-stay-at-home-orders-march-15-may-5-county-and-july-7-stat
    Explore at:
    Dataset updated
    Jun 28, 2025
    Dataset provided by
    Centers for Disease Control and Prevention
    Area covered
    United States
    Description

    State, territorial, and county executive orders, administrative orders, resolutions, and proclamations are collected from government websites and cataloged and coded using Microsoft Excel by one coder with one or more additional coders conducting quality assurance. Data were collected to determine when individuals in states, territories, and counties were subject to executive orders, administrative orders, resolutions, and proclamations for COVID-19 that require or recommend people stay in their homes. These data are derived from the publicly available state, territorial, and county executive orders, administrative orders, resolutions, and proclamations (“orders”) for COVID-19 that expressly require or recommend individuals stay at home found by the CDC, COVID-19 Community Intervention and At-Risk Task Force, Monitoring and Evaluation Team & CDC, Center for State, Tribal, Local, and Territorial Support, Public Health Law Program from March 15 through May 5, 2020. These data will be updated as new orders are collected. Any orders not available through publicly accessible websites are not included in these data. Only official copies of the documents or, where official copies were unavailable, official press releases from government websites describing requirements were coded; news media reports on restrictions were excluded. Recommendations not included in an order are not included in these data. These data do not include mandatory business closures, curfews, or limitations on public or private gatherings. These data do not necessarily represent an official position of the Centers for Disease Control and Prevention.

  13. DOHMH Covid-19 Milestone Data: New Cases of Covid-19 (7 Day Average)

    • data.cityofnewyork.us
    • datasets.ai
    • +1more
    application/rdfxml +5
    Updated Jun 15, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Department of Health and Mental Hygiene (DOHMH) (2021). DOHMH Covid-19 Milestone Data: New Cases of Covid-19 (7 Day Average) [Dataset]. https://data.cityofnewyork.us/Health/DOHMH-Covid-19-Milestone-Data-New-Cases-of-Covid-1/xwtc-hedq
    Explore at:
    csv, json, xml, application/rdfxml, tsv, application/rssxmlAvailable download formats
    Dataset updated
    Jun 15, 2021
    Dataset provided by
    New York City Department of Health and Mental Hygienehttps://nyc.gov/health
    Authors
    Department of Health and Mental Hygiene (DOHMH)
    Description

    This dataset shows daily confirmed and probable cases of COVID-19 in New York City by date of specimen collection. Total cases has been calculated as the sum of daily confirmed and probable cases. Seven-day averages of confirmed, probable, and total cases are also included in the dataset. A person is classified as a confirmed COVID-19 case if they test positive with a nucleic acid amplification test (NAAT, also known as a molecular test; e.g. a PCR test). A probable case is a person who meets the following criteria with no positive molecular test on record: a) test positive with an antigen test, b) have symptoms and an exposure to a confirmed COVID-19 case, or c) died and their cause of death is listed as COVID-19 or similar. As of June 9, 2021, people who meet the definition of a confirmed or probable COVID-19 case >90 days after a previous positive test (date of first positive test) or probable COVID-19 onset date will be counted as a new case. Prior to June 9, 2021, new cases were counted ≥365 days after the first date of specimen collection or clinical diagnosis. Any person with a residence outside of NYC is not included in counts. Data is sourced from electronic laboratory reporting from the New York State Electronic Clinical Laboratory Reporting System to the NYC Health Department. All identifying health information is excluded from the dataset.

    These data are used to evaluate the overall number of confirmed and probable cases by day (seven day average) to track the trajectory of the pandemic. Cases are classified by the date that the case occurred. NYC COVID-19 data include people who live in NYC. Any person with a residence outside of NYC is not included.

  14. Stock Market Dataset for Predictive Analysis

    • kaggle.com
    Updated Feb 24, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    WARNER (2025). Stock Market Dataset for Predictive Analysis [Dataset]. https://www.kaggle.com/datasets/s3programmer/stock-market-dataset-for-predictive-analysis
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 24, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    WARNER
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This Stock Market Dataset is designed for predictive analysis and machine learning applications in financial markets. It includes 13647 records of simulated stock trading data with features commonly used in stock price forecasting.

    🔹 Key Features Date – Trading day timestamps (business days only) Open, High, Low, Close – Simulated stock prices Volume – Trading volume per day RSI (Relative Strength Index) – Measures market momentum MACD (Moving Average Convergence Divergence) – Trend-following momentum indicator Sentiment Score – Simulated market sentiment from financial news & social media Target – Binary label (1: Price goes up, 0: Price goes down) for next-day prediction This dataset is useful for training hybrid deep learning models such as LSTM, CNN, and Attention-based networks for stock market forecasting. It enables financial analysts, traders, and AI researchers to experiment with market trends, technical analysis, and sentiment-based predictions.

  15. m

    Data from: AN EMG DATASET FOR ARABIC SIGN LANGUAGE ALPHABET LETTERS AND...

    • data.mendeley.com
    Updated Nov 10, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Amina Ben Haj Amor (2023). AN EMG DATASET FOR ARABIC SIGN LANGUAGE ALPHABET LETTERS AND NUMBERS [Dataset]. http://doi.org/10.17632/ft9bhdgybs.3
    Explore at:
    Dataset updated
    Nov 10, 2023
    Authors
    Amina Ben Haj Amor
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT Sign languages are natural, gestural languages that use visual channel to communicate. Deaf people develop them to overcome their inability to communicate orally. Sign language interpreters bridge the gap that deaf people face in society and provide them with an equal opportunity to thrive in all environments. However, Deaf people often struggle to communicate on a daily basis, especially in public service spaces such as hospitals, post offices, and municipal buildings. Therefore, the implementation of a tool for automatic recognition of sign language is essential to allow the autonomy of deaf people. Moreover, it is difficult to provide full-time interpreters to help deaf people in all public services and administrations.

    Although surface electromyography (sEMG) provides an important potential technology for the detection of hand gestures, the related research in automatic SL recognition remains limited. To date, most works have focused on the recognition of hand gestures from images, videos, or gloves. The works of BEN HAJ AMOR et al. on EMG signals have shown that these multichannel signals contain rich and detailed information that can be exploited, in particular for the recognition of handshape and for the control prosthesis. Consequently, these successes represent a great step towards the recognition of gestures in sign language.

    We build a large database of EMG data, recorded while signing the 28 characters of the Arabic sign language alphabet. This provides a valuable resource for research into how the muscles involved in signing produce the shapes needed to form the letters of the alphabet.

    Instructions: The data for this project is provided as zipped NumPy arrays with custom headers. In order to load these files, you will need to have the NumPy package installed.

    The respective loadz primitive allows for a straight forwardloading of the datasets. The data is organized as follows:

    The data for each label (handshape) is stored in a separate folder. Each folder contains .npz files. An npz file contains the data for one record (a matrix 8x400).

    For more details, please refer to the paper.

  16. Predictive Maintenance Dataset

    • kaggle.com
    Updated Nov 7, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Himanshu Agarwal (2022). Predictive Maintenance Dataset [Dataset]. https://www.kaggle.com/datasets/hiimanshuagarwal/predictive-maintenance-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 7, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Himanshu Agarwal
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    A company has a fleet of devices transmitting daily sensor readings. They would like to create a predictive maintenance solution to proactively identify when maintenance should be performed. This approach promises cost savings over routine or time based preventive maintenance, because tasks are performed only when warranted.

    The task is to build a predictive model using machine learning to predict the probability of a device failure. When building this model, be sure to minimize false positives and false negatives. The column you are trying to Predict is called failure with binary value 0 for non-failure and 1 for failure.

  17. h

    Human-Like-DPO-Dataset

    • huggingface.co
    Updated May 19, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Human-Like LLMs (2024). Human-Like-DPO-Dataset [Dataset]. https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 19, 2024
    Dataset authored and provided by
    Human-Like LLMs
    License

    https://choosealicense.com/licenses/llama3/https://choosealicense.com/licenses/llama3/

    Description

    Enhancing Human-Like Responses in Large Language Models

    🤗 Models | 📊 Dataset | 📄 Paper

      Human-Like-DPO-Dataset
    

    This dataset was created as part of research aimed at improving conversational fluency and engagement in large language models. It is suitable for formats like Direct Preference Optimization (DPO) to guide models toward generating more human-like responses. The dataset includes 10,884 samples across 256 topics, including: Technology Daily Life Science… See the full description on the dataset page: https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset.

  18. A

    ‘Vehicle Miles Traveled During Covid-19 Lock-Downs ’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 4, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Vehicle Miles Traveled During Covid-19 Lock-Downs ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-vehicle-miles-traveled-during-covid-19-lock-downs-636d/latest
    Explore at:
    Dataset updated
    Jan 4, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Vehicle Miles Traveled During Covid-19 Lock-Downs ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/vehicle-miles-travelede on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    **This data set was last updated 3:30 PM ET Monday, January 4, 2021. The last date of data in this dataset is December 31, 2020. **

    Overview

    Data shows that mobility declined nationally since states and localities began shelter-in-place strategies to stem the spread of COVID-19. The numbers began climbing as more people ventured out and traveled further from their homes, but in parallel with the rise of COVID-19 cases in July, travel declined again.

    This distribution contains county level data for vehicle miles traveled (VMT) from StreetLight Data, Inc, updated three times a week. This data offers a detailed look at estimates of how much people are moving around in each county.

    Data available has a two day lag - the most recent data is from two days prior to the update date. Going forward, this dataset will be updated by AP at 3:30pm ET on Monday, Wednesday and Friday each week.

    This data has been made available to members of AP’s Data Distribution Program. To inquire about access for your organization - publishers, researchers, corporations, etc. - please click Request Access in the upper right corner of the page or email kromano@ap.org. Be sure to include your contact information and use case.

    Findings

    • Nationally, data shows that vehicle travel in the US has doubled compared to the seven-day period ending April 13, which was the lowest VMT since the COVID-19 crisis began. In early December, travel reached a low not seen since May, with a small rise leading up to the Christmas holiday.
    • Average vehicle miles traveled continues to be below what would be expected without a pandemic - down 38% compared to January 2020. September 4 reported the largest single day estimate of vehicle miles traveled since March 14.
    • New Jersey, Michigan and New York are among the states with the largest relative uptick in travel at this point of the pandemic - they report almost two times the miles traveled compared to their lowest seven-day period. However, travel in New Jersey and New York is still much lower than expected without a pandemic. Other states such as New Mexico, Vermont and West Virginia have rebounded the least.

    About This Data

    The county level data is provided by StreetLight Data, Inc, a transportation analysis firm that measures travel patterns across the U.S.. The data is from their Vehicle Miles Traveled (VMT) Monitor which uses anonymized and aggregated data from smartphones and other GPS-enabled devices to provide county-by-county VMT metrics for more than 3,100 counties. The VMT Monitor provides an estimate of total vehicle miles travelled by residents of each county, each day since the COVID-19 crisis began (March 1, 2020), as well as a change from the baseline average daily VMT calculated for January 2020. Additional columns are calculations by AP.

    Included Data

    01_vmt_nation.csv - Data summarized to provide a nationwide look at vehicle miles traveled. Includes single day VMT across counties, daily percent change compared to January and seven day rolling averages to smooth out the trend lines over time.

    02_vmt_state.csv - Data summarized to provide a statewide look at vehicle miles traveled. Includes single day VMT across counties, daily percent change compared to January and seven day rolling averages to smooth out the trend lines over time.

    03_vmt_county.csv - Data providing a county level look at vehicle miles traveled. Includes VMT estimate, percent change compared to January and seven day rolling averages to smooth out the trend lines over time.

    Additional Data Queries

    * Filter for specific state - filters 02_vmt_state.csv daily data for specific state.

    * Filter counties by state - filters 03_vmt_county.csv daily data for counties in specific state.

    * Filter for specific county - filters 03_vmt_county.csv daily data for specific county.

    Interactive

    The AP has designed an interactive map to show percent change in vehicle miles traveled by county since each counties lowest point during the pandemic:

    This dataset was created by Angeliki Kastanis and contains around 0 samples along with Date At Low, Mean7 County Vmt At Low, technical information and other features such as: - County Name - County Fips - and more.

    How to use this dataset

    • Analyze State Name in relation to Baseline Jan Vmt
    • Study the influence of Date At Low on Mean7 County Vmt At Low
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit Angeliki Kastanis

    Start A New Notebook!

    --- Original source retains full ownership of the source dataset ---

  19. A

    ‘U.S. State, Territorial, and County Stay-At-Home Orders: March 15-May 5 by...

    • analyst-2.ai
    Updated May 5, 2015
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2015). ‘U.S. State, Territorial, and County Stay-At-Home Orders: March 15-May 5 by County by Day’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-u-s-state-territorial-and-county-stay-at-home-orders-march-15-may-5-by-county-by-day-71f8/e01c6421/?iid=009-486&v=presentation
    Explore at:
    Dataset updated
    May 5, 2015
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Analysis of ‘U.S. State, Territorial, and County Stay-At-Home Orders: March 15-May 5 by County by Day’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/e0859a07-2370-4e39-ad88-1f3a0f87eb29 on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    State, territorial, and county executive orders, administrative orders, resolutions, and proclamations are collected from government websites and cataloged and coded using Microsoft Excel by one coder with one or more additional coders conducting quality assurance.

    Data were collected to determine when individuals in states, territories, and counties were subject to executive orders, administrative orders, resolutions, and proclamations for COVID-19 that require or recommend people stay in their homes.

    These data are derived from the publicly available state, territorial, and county executive orders, administrative orders, resolutions, and proclamations (“orders”) for COVID-19 that expressly require or recommend individuals stay at home found by the CDC, COVID-19 Community Intervention and At-Risk Task Force, Monitoring and Evaluation Team & CDC, Center for State, Tribal, Local, and Territorial Support, Public Health Law Program from March 15 through May 5, 2020. These data will be updated as new orders are collected. Any orders not available through publicly accessible websites are not included in these data. Only official copies of the documents or, where official copies were unavailable, official press releases from government websites describing requirements were coded; news media reports on restrictions were excluded. Recommendations not included in an order are not included in these data. These data do not include mandatory business closures, curfews, or limitations on public or private gatherings. These data do not necessarily represent an official position of the Centers for Disease Control and Prevention.

    --- Original source retains full ownership of the source dataset ---

  20. F

    General domain Human-Human conversation chats in Turkish

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). General domain Human-Human conversation chats in Turkish [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/turkish-general-domain-conversation-text-dataset
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    This training dataset comprises more than 10,000 conversational text data between two native Turkish people in the general domain. We have a collection of chats on a variety of different topics/services/issues of daily life, such as music, books, festivals, health, kids, family, environment, study, childhood, cuisine, internet, movies, etc., and that makes the dataset diverse.

    These chats consist of language-specific words, and phrases and follow the native way of talking which makes the chats more information-rich for your NLP model. Apart from each chat being specific to the topic, it contains various attributes like people's names, addresses, contact information, email address, time, date, local currency, telephone numbers, local slang, etc too in various formats to make the text data unbiased.

    These chat scripts have between 300 and 700 words and up to 50 turns. 150 people that are a part of the FutureBeeAI crowd community contributed to this dataset. You will also receive chat metadata, such as participant age, gender, and country information, along with the chats. Dataset applications include conversational AI, natural language processing (NLP), smart assistants, text recognition, text analytics, and text prediction.

    This dataset is being expanded with new chats all the time. We are able to produce text data in a variety of languages to meet your unique requirements. Check out the FutureBeeAI community for a custom collection.

    This training dataset's licence belongs to FutureBeeAI!

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Nexdata (2024). 10,109 People - Face Images Dataset [Dataset]. https://www.nexdata.ai/datasets/1402?source=Github
Organization logo

10,109 People - Face Images Dataset

Explore at:
Dataset updated
Jun 14, 2024
Dataset authored and provided by
Nexdata
Variables measured
Data size, Data format, Data diversity, Age distribution, Race distribution, Gender distribution, Collecting environment
Description

10,109 people - face images dataset includes people collected from many countries. Multiple photos of each person’s daily life are collected, and the gender, race, age, etc. of the person being collected are marked.This Dataset provides a rich resource for artificial intelligence applications. It has been validated by multiple AI companies and proves beneficial for achieving outstanding performance in real-world applications. Throughout the process of Dataset collection, storage, and usage, we have consistently adhered to Dataset protection and privacy regulations to ensure the preservation of user privacy and legal rights. All Dataset comply with regulations such as GDPR, CCPA, PIPL, and other applicable laws.

Search
Clear search
Close search
Google apps
Main menu