50 datasets found
  1. AI Tools Usage by Indian College Students 2025

    • kaggle.com
    zip
    Updated Jul 7, 2025
    + more versions
    Cite
    Kshitij Saini (2025). AI Tools Usage by Indian College Students 2025 [Dataset]. https://www.kaggle.com/datasets/kshitijsaini121/ai-tools-usage-by-indian-college-students-2025
    Available download formats: zip (90645 bytes)
    Dataset updated
    Jul 7, 2025
    Authors
    Kshitij Saini
    Description

    AI Tool Usage by Indian College Students 2025

    This unique dataset, collected via a May 2025 survey, captures how 496 Indian college students use AI tools (e.g., ChatGPT, Gemini, Copilot) in academics. It includes 16 attributes like AI tool usage, trust, impact on grades, and internet access, ideal for education analytics and machine learning.

    Columns

    • Student_Name: Anonymized student name.
    • College_Name: College attended.
    • Stream: Academic discipline (e.g., Engineering, Arts).
    • Year_of_Study: Year of study (1–4).
    • AI_Tools_Used: Tools used (e.g., ChatGPT, Gemini).
    • Daily_Usage_Hours: Hours spent daily on AI tools.
    • Use_Cases: Purposes (e.g., Assignments, Exam Prep).
    • Trust_in_AI_Tools: Trust level (1–5).
    • Impact_on_Grades: Grade impact (-3 to +3).
    • Do_Professors_Allow_Use: Professor approval (Yes/No).
    • Preferred_AI_Tool: Preferred tool.
    • Awareness_Level: AI awareness (1–10).
    • Willing_to_Pay_for_Access: Willingness to pay (Yes/No).
    • State: Indian state.
    • Device_Used: Device (e.g., Laptop, Mobile).
    • Internet_Access: Access quality (Poor/Medium/High).

    Use Cases

    • Predict academic performance using AI tool usage.
    • Analyze trust in AI across streams or regions.
    • Cluster students by usage patterns.
    • Study the digital divide via Internet_Access.

    Source: Collected via a Google Forms survey in May 2025, ensuring diverse representation across India.
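
    The documented value ranges for this dataset's columns can be turned into a quick sanity check before any modeling. The sketch below is only an illustration: the sample rows are invented, the listing does not give the real file name inside the zip, and it assumes the numeric columns are stored as plain integers.

```python
import csv
import io

# Invented sample rows; the real file name and cell formats are not given in the listing.
SAMPLE = """Student_Name,Stream,Year_of_Study,Trust_in_AI_Tools,Impact_on_Grades,Awareness_Level,Internet_Access
S1,Engineering,2,4,1,8,High
S2,Arts,5,3,-4,11,Medium
"""

# Inclusive integer ranges taken from the column descriptions.
RANGES = {
    "Year_of_Study": (1, 4),
    "Trust_in_AI_Tools": (1, 5),
    "Impact_on_Grades": (-3, 3),
    "Awareness_Level": (1, 10),
}
# Categorical columns with a documented set of values.
ALLOWED = {"Internet_Access": {"Poor", "Medium", "High"}}


def find_violations(rows):
    """Return (row_index, column, raw_value) for every cell outside its documented range."""
    problems = []
    for i, row in enumerate(rows):
        for col, (low, high) in RANGES.items():
            try:
                value = int(row[col])
            except (KeyError, ValueError):
                # Missing column or non-integer cell counts as a violation.
                problems.append((i, col, row.get(col)))
                continue
            if not low <= value <= high:
                problems.append((i, col, row[col]))
        for col, allowed in ALLOWED.items():
            if row.get(col) not in allowed:
                problems.append((i, col, row.get(col)))
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
violations = find_violations(rows)
for row_index, column, value in violations:
    print(f"row {row_index}: {column}={value!r} is outside the documented range")
```

    To run the same check on the real file, replace the in-memory sample with an opened CSV file handle.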
  2. Daily AI Assistant Usage Behavior Dataset

    • kaggle.com
    zip
    Updated Nov 20, 2025
    Cite
    Prince Rajak (2025). Daily AI Assistant Usage Behavior Dataset [Dataset]. https://www.kaggle.com/datasets/prince7489/daily-ai-assistant-usage-behavior-dataset
    Available download formats: zip (5332 bytes)
    Dataset updated
    Nov 20, 2025
    Authors
    Prince Rajak
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Daily AI Assistant Usage Behavior Dataset captures real-world interaction patterns between users and AI assistants throughout their day. It includes details such as query types, time-of-day usage, session duration, device type, user intent, and follow-up behavior.

    This dataset is designed to help researchers, developers, and data enthusiasts analyze how people rely on AI tools for productivity, creativity, learning, and routine tasks. It is ideal for building models around user behavior prediction, recommendation systems, personalization, and conversational AI improvements.

  3. AI in Consumer Decision Making | Global Coverage | 190+ Countries

    • datarade.ai
    .json, .csv, .xls
    Updated Aug 21, 2025
    Cite
    Rwazi (2025). AI in Consumer Decision Making | Global Coverage | 190+ Countries [Dataset]. https://datarade.ai/data-products/ai-in-consumer-decision-making-global-coverage-190-count-rwazi
    Available download formats: .json, .csv, .xls
    Dataset updated
    Aug 21, 2025
    Dataset authored and provided by
    Rwazi (http://rwazi.com/)
    Area covered
    United Kingdom
    Description

    AI in Consumer Decision-Making: Global Zero-Party Dataset

    This dataset captures how consumers around the world are using AI tools like ChatGPT, Perplexity, Gemini, Claude, and Copilot to guide their purchase decisions. It spans multiple product categories, demographics, and geographies, mapping the emerging role of AI as a decision-making companion across the consumer journey.

    What Makes This Dataset Unique

    Unlike datasets inferred from digital traces or modeled from third-party assumptions, this collection is built entirely on zero-party data: direct responses from consumers who voluntarily share their habits and preferences. That means the insights come straight from the people making the purchases, ensuring unmatched accuracy and relevance.

    For FMCG leaders, retailers, and financial services strategists, this dataset provides the missing piece: visibility into how often consumers are letting AI shape their decisions, and where that influence is strongest.

    Dataset Structure

    Each record is enriched with:

    • Product Category: from high-consideration items like electronics to daily staples such as groceries and snacks.
    • AI Tool Used: identifying whether consumers turn to ChatGPT, Gemini, Perplexity, Claude, or Copilot.
    • Influence Level: the percentage of consumers in a given context who rely on AI to guide their choices.
    • Demographics: generational breakdowns from Gen Z through Boomers.
    • Geographic Detail: city- and country-level coverage across Africa, LATAM, Asia, Europe, and North America.

    This structure allows filtering and comparison across categories, age groups, and markets, giving users a multidimensional view of AI’s impact on purchasing.

    Why It Matters

    AI has become a trusted voice in consumers’ daily lives. From meal planning to product comparisons, many people now consult AI before making a purchase—often without realizing how much it shapes the options they consider. For brands, this means that the path to purchase increasingly runs through an AI filter.

    This dataset provides a comprehensive view of that hidden step in the consumer journey, enabling decision-makers to quantify:

    • How much AI shapes consumer thinking before they even reach the shelf or checkout.
    • Which product categories are most influenced by AI consultation.
    • How adoption varies by geography and generation.
    • Which AI platforms are most commonly trusted by consumers.

    Opportunities for Business Leaders

    • FMCG & Retail Brands: Understand where AI-driven decision-making is already reshaping category competition.
    • Marketers: Identify demographic segments most likely to consult AI, enabling targeted strategies.
    • Retailers: Align assortments and promotions with the purchase patterns influenced by AI queries.
    • Investors & Innovators: Gauge market readiness for AI-integrated commerce solutions.

    The dataset doesn’t just describe what’s happening—it opens doors to the “so what” questions that define strategy. Which categories are becoming algorithm-driven? Which markets are shifting fastest? Where is the opportunity to get ahead of competitors in an AI-shaped funnel?

    Why Now

    Consumer AI adoption is no longer a forecast; it is a daily behavior. Just as search engines once rewrote the rules of marketing, conversational AI is quietly rewriting how consumers decide what to buy. This dataset offers an early, detailed view into that change, giving brands the ability to act while competitors are still guessing.

    What You Get

    Users gain:

    • A global, city-level view of AI adoption in consumer decision-making.
    • Cross-category comparability to see where AI influence is strongest and weakest.
    • Generational breakdowns that show how adoption differs between younger and older cohorts.
    • AI platform analysis, highlighting how tool preferences vary by region and category.

    Every row is powered by zero-party input, ensuring the insights reflect actual consumer behavior, not modeled assumptions.

    How It’s Used

    Leverage this data to:

    • Validate strategies before entering new markets or categories.
    • Benchmark competitors on AI readiness and influence.
    • Identify growth opportunities in categories where AI-driven recommendations are rapidly shaping decisions.
    • Anticipate risks where brand visibility could be disrupted by algorithmic mediation.

    Core Insights

    The full dataset reveals:

    • Surprising adoption curves across categories where AI wasn’t expected to play a role.
    • Geographic pockets where AI has already become a standard step in purchase decisions.
    • Demographic contrasts showing who trusts AI most, and where skepticism still holds.
    • Clear differences between AI platforms and the consumer profiles most drawn to each.

    These patterns are not visible in traditional retail data, sales reports, or survey summaries. They are only captured here, directly from the consumers themselves.

    Summary

    Winning in FMCG and retail today means more than getting on shelves, capturing price points, or running promotions. It means understanding the invisible algorithms consumers are ...

  4. AI–Work and Human Identity Dataset — 2025

    • figshare.com
    csv
    Updated Nov 14, 2025
    Cite
    Human Clarity Institute (2025). AI–Work and Human Identity Dataset — 2025 [Dataset]. http://doi.org/10.6084/m9.figshare.30615563.v2
    Available download formats: csv
    Dataset updated
    Nov 14, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Human Clarity Institute
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset examines how AI adoption, workplace technologies, and digital tools are reshaping people’s sense of work identity, purpose, confidence, and values alignment. The dataset includes variables on job meaning, AI-related concerns, motivation, wellbeing, and the emotional impact of AI on daily work. This dataset is openly available and is part of the HCI Open Data Series, which provides verifiable, timestamped behavioural data on the human experience of the AI era.

  5. Intuizi Country Origin Dataset | Geospatial Mobility detail data for 94...

    • datarade.ai
    .csv, .txt
    Updated Nov 18, 2022
    Cite
    Intuizi (2022). Intuizi Country Origin Dataset | Geospatial Mobility detail data for 94 countries | Cloud delivery | 400m Uniques, updated daily [Dataset]. https://datarade.ai/data-products/intuizi-country-origin-dataset-mobility-detail-data-for-100-intuizi
    Available download formats: .csv, .txt
    Dataset updated
    Nov 18, 2022
    Dataset authored and provided by
    Intuizi
    Area covered
    United Kingdom, United States
    Description

    This de-duped dataset is used by our customers for many purposes, primarily to understand which countries the people who visit specific locations (more accurately, the mobile devices carried by those people) originated from. Those locations may be ones that the customers own or operate, ones owned or operated by their competitors, or ones visited by their customers.

    For instance, if you operate a hotel brand, you may want to understand the top ten countries that visitors to your city came from, and whether and how that changes seasonally and by type of location (perhaps higher-end visitors are more likely to come from the UK or Germany than from France or Italy). This helps you build out your data models or marketing in those countries and tailor your product offers to their needs.

    This data can also help you understand, for instance, whether there are specific geographical areas where you might consider putting a new location, or where you might buy billboard ads advertising the ‘local’ store. You can also use it to build your own mobility data models to better understand visitation into your own or your competitors’ premises, or to test hypotheses around changes in visitation patterns over time.

    The Intuizi Country Origin Dataset comprises fully-consented mobile device data, de-identified at source by the entity which has legal consent to own/process such data, and on whose behalf we work to create a de-identified dataset of Encrypted ID visitation/mobility data.

  6. Consumer Data | United States | Reach - Comprehensive Insights for Enhanced...

    • factori.ai
    Updated Jul 15, 2025
    Cite
    (2025). Consumer Data | United States | Reach - Comprehensive Insights for Enhanced Customer Experience & Marketing Strategies [Dataset]. https://www.factori.ai/datasets/people-data/
    Dataset updated
    Jul 15, 2025
    License

    https://www.factori.ai/privacy-policy

    Area covered
    United States
    Description

    Our consumer data is meticulously gathered and aggregated from surveys, digital services, and public sources, ensuring the collection of fresh and reliable data points through powerful profiling algorithms. Our comprehensive data enrichment solution spans a variety of datasets, enabling you to address gaps in customer data, gain deeper insights into your customers, and enhance client experiences.

    Data Categories and Attributes:

    • Geography: City, State, ZIP, County, CBSA, Census Tract, etc.
    • Demographics: Gender, Age Group, Marital Status, Language, etc.
    • Financial: Income Range, Credit Rating Range, Credit Type, Net Worth Range, etc.
    • Persona: Consumer Type, Communication Preferences, Family Type, etc.
    • Interests: Content, Brands, Shopping, Hobbies, Lifestyle, etc.
    • Household: Number of Children, Number of Adults, IP Address, etc.
    • Behaviors: Brand Affinity, App Usage, Web Browsing, etc.
    • Firmographics: Industry, Company, Occupation, Revenue, etc.
    • Retail Purchase: Store, Category, Brand, SKU, Quantity, Price, etc.
    • Auto: Car Make, Model, Type, Year, etc.
    • Housing: Home Type, Home Value, Renter/Owner, Year Built, etc.

    Data Export Methodology

    Our dynamic data collection ensures the most updated insights, delivered at intervals best suited to your needs (daily, weekly, or monthly).

    Use Cases

    Our enriched consumer data supports a 360-degree customer view, data enrichment, fraud detection, and advertising & marketing, providing valuable insights to enhance your business strategies and client interactions.

  7. Dataset for: The More Competent, the Better? The Effects of Perceived...

    • demo-b2find.dkrz.de
    Updated Nov 29, 2022
    Cite
    (2022). Dataset for: The More Competent, the Better? The Effects of Perceived Competencies on Disclosure Towards Conversational Artificial Intelligence - Dataset - B2FIND [Dataset]. http://demo-b2find.dkrz.de/dataset/f5fbca1a-d524-516d-a9d5-072025072e4e
    Dataset updated
    Nov 29, 2022
    Description

    Conversational AI (e.g., Google Assistant or Amazon Alexa) is present in many people’s everyday life and, at the same time, becomes more and more capable of solving more complex tasks. However, it is unclear how the growing capabilities of conversational AI affect people’s disclosure towards the system as previous research has revealed mixed effects of technology competence. To address this research question, we propose a framework systematically disentangling conversational AI competencies along the lines of the dimensions of human competencies suggested by the action regulation theory. Across two correlational studies and three experiments (N total = 1453), we investigated how these competencies differentially affect users’ and non-users’ disclosure towards conversational AI. Results indicate that intellectual competencies (e.g., planning actions and anticipating problems) in a conversational AI heighten users’ willingness to disclose and reduce their privacy concerns. In contrast, meta-cognitive heuristics (e.g., deriving universal strategies based on previous interactions) raise privacy concerns for users and, even more so, for non-users but reduce willingness to disclose only for non-users. Thus, the present research suggests that not all competencies of a conversational AI are seen as merely positive, and the proposed differentiation of competencies is informative to explain effects on disclosure.

  8. English Human-Human Chat Dataset for Conversational AI & NLP

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). English Human-Human Chat Dataset for Conversational AI & NLP [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/english-general-domain-conversation-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The English General Domain Chat Dataset is a high-quality, text-based dataset designed to train and evaluate conversational AI, NLP models, and smart assistants in real-world English usage. Collected through FutureBeeAI’s trusted crowd community, this dataset reflects natural, native-level English conversations covering a broad spectrum of everyday topics.

    Conversational Text Data

    This dataset includes over 15000 chat transcripts, each featuring free-flowing dialogue between two native English speakers. The conversations are spontaneous, context-rich, and mimic informal, real-life texting behavior.

    Words per Chat: 300–700
    Turns per Chat: Up to 50 dialogue turns
    Contributors: 200 native English speakers from the FutureBeeAI Crowd Community
    Format: TXT, DOCS, JSON or CSV (customizable)
    Structure: Each record contains the full chat, topic tag, and metadata block

    Diversity and Domain Coverage

    Conversations span a wide variety of general-domain topics to ensure comprehensive model exposure:

    Music, books, and movies
    Health and wellness
    Children and parenting
    Family life and relationships
    Food and cooking
    Education and studying
    Festivals and traditions
    Environment and daily life
    Internet and tech usage
    Childhood memories and casual chatting

    This diversity ensures the dataset is useful across multiple NLP and language understanding applications.

    Linguistic Authenticity

    Chats reflect informal, native-level English usage with:

    Colloquial expressions and local dialect influence
    Domain-relevant terminology
    Language-specific grammar, phrasing, and sentence flow
    Inclusion of realistic details such as names, phone numbers, email addresses, locations, dates, times, local currencies, and culturally grounded references
    Representation of different writing styles and input quirks to ensure training data realism

    Metadata

    Every chat instance is accompanied by structured metadata, which includes:

    Participant Age
    Gender
    Country/Region
    Chat Domain
    Chat Topic
    Dialect

    This metadata supports model filtering, demographic-specific evaluation, and more controlled fine-tuning workflows.

    Data Quality Assurance

    All chat records pass through a rigorous QA process to maintain consistency and accuracy:

    Manual review for content completeness
    Format checks for chat turns and metadata
    Linguistic verification by native speakers
    Removal of inappropriate or unusable samples

    This ensures a clean, reliable dataset ready for high-performance AI model training.

    Applications

    This dataset is ideal for training and evaluating a wide range of text-based AI systems:

    Conversational AI / Chatbots
    Smart assistants and voicebots

  9. Daily COVID-19 Outbreak Summary

    • catalog.data.gov
    • data.kingcounty.gov
    • +3more
    Updated Feb 2, 2024
    Cite
    data.kingcounty.gov (2024). Daily COVID-19 Outbreak Summary [Dataset]. https://catalog.data.gov/dataset/daily-covid-19-outbreak-summary
    Dataset updated
    Feb 2, 2024
    Dataset provided by
    data.kingcounty.gov
    Description

    Updated daily between 3:00 pm and 5:00 pm. Data are updated daily in the early afternoon and reflect laboratory results reported to the Washington State Department of Health as of midnight the day before. Data for previous dates will be updated as new results are entered, interviews are conducted, and data errors are corrected. Many people test positive but do not require hospitalization. The counts of positive cases do not necessarily indicate levels of demand at local hospitals. Reporting of test results to the Washington State Department of Health may be delayed by several days and will be updated when data are available. Only positive or negative test results are reflected in the counts; tests where results are pending, inconclusive, or were not performed are excluded.

  10. Data from: Face Images Dataset

    • kaggle.com
    zip
    Updated Jun 7, 2024
    Cite
    Frank Wong (2024). Face Images Dataset [Dataset]. https://www.kaggle.com/datasets/nexdatafrank/multi-race-and-multi-pose-face-images-data
    Available download formats: zip (1247411 bytes)
    Dataset updated
    Jun 7, 2024
    Authors
    Frank Wong
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Face Images Dataset

    Description

    This face images dataset covers 10,109 people from many countries. Multiple photos of each person’s daily life are collected, and the gender, race, age, etc. of each subject are annotated. This dataset provides a rich resource for artificial intelligence applications. It has been validated by multiple AI companies and proves beneficial for achieving outstanding performance in real-world applications. Throughout the process of data collection, storage, and usage, we have consistently adhered to data protection and privacy regulations to ensure the preservation of user privacy and legal rights. All datasets comply with regulations such as GDPR, CCPA, PIPL, and other applicable laws. For more details, please refer to the link: https://www.nexdata.ai/datasets/computervision/1402?source=Kaggle

    Data size

    10,109 people, no less than 30 images per person

    Race distribution

    3,504 black people, 3,559 Indian people and 3,046 Asian people

    Gender distribution

    4,930 males, 5,179 females

    Age distribution

    most people are young aged, the middle-aged and the elderly cover a small portion

    Collecting environment

    including indoor and outdoor scenes

    Data diversity

    different face poses, races, accessories, ages, light conditions and scenes

    Data format

    .jpg, .png, .jpeg

    Licensing Information

    Commercial License

  11. Dutch General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Dutch General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-dutch-netherlands
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Netherlands
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Dutch General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Dutch speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Dutch communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Dutch speech models that understand and respond to authentic Dutch accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Dutch. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native Dutch speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of the Netherlands to ensure dialectal diversity and demographic balance.
    Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.
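
    The stated audio format (stereo WAV, 16-bit depth, 16 kHz sample rate) can be verified per file with Python's standard `wave` module. The following is a minimal sketch, not the provider's tooling: it checks a synthetic one-second silent clip built in memory, since the listing does not give real file names.

```python
import io
import wave

# Format stated in the listing: stereo, 16-bit (2 bytes per sample), 16 kHz.
EXPECTED = {"channels": 2, "sample_width_bytes": 2, "sample_rate_hz": 16000}


def wav_properties(fileobj):
    """Read the header fields of a WAV file (path or binary file object)."""
    with wave.open(fileobj, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": wav.getnframes() / rate,
        }


def matches_spec(props):
    """True when channels, sample width, and sample rate all match EXPECTED."""
    return all(props[key] == value for key, value in EXPECTED.items())


# Stand-in for a real recording: one second of stereo silence written in memory.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00" * (16000 * 2 * 2))  # frames * channels * bytes per sample
buffer.seek(0)

props = wav_properties(buffer)
print(props, "matches spec:", matches_spec(props))
```

    Passing a file path instead of the in-memory buffer to `wav_properties` applies the same check to a file on disk.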

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through double QA pass, average WER < 5%
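
    For readers who want to reproduce a word error rate on their own ASR output, the standard definition is (substitutions + deletions + insertions) divided by the number of reference words. The listing does not say how its WER figure was computed, so the sketch below is just that textbook definition with whitespace tokenization and no text normalization, both of which are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example pair: one substitution and one deletion against six reference words.
wer = word_error_rate("de kat zit op de mat", "de kat zat op mat")
print(f"WER = {wer:.3f}")
```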

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple Dutch speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for Dutch.
    Voice Assistants: Build smart assistants capable of understanding natural Dutch conversations.

  12. AN EMG DATASET FOR ARABIC SIGN LANGUAGE ALPHABET AND NUMBERS

    • data.mendeley.com
    Updated Sep 27, 2023
    + more versions
    Cite
    Amina Ben Haj Amor (2023). AN EMG DATASET FOR ARABIC SIGN LANGUAGE ALPHABET AND NUMBERS [Dataset]. http://doi.org/10.17632/ft9bhdgybs.2
    Dataset updated
    Sep 27, 2023
    Authors
    Amina Ben Haj Amor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: Sign languages are natural, gestural languages that use the visual channel to communicate. Deaf people develop them to overcome their inability to communicate orally. Sign language interpreters bridge the gap that deaf people face in society and provide them with an equal opportunity to thrive in all environments. However, deaf people often struggle to communicate on a daily basis, especially in public service spaces such as hospitals, post offices, and municipal buildings. Moreover, it is difficult to provide full-time interpreters to help deaf people in all public services and administrations. Therefore, a tool for automatic recognition of sign language is essential to support the autonomy of deaf people.

    Although surface electromyography (sEMG) is a promising technology for the detection of hand gestures, the related research in automatic SL recognition remains limited. To date, most works have focused on the recognition of hand gestures from images, videos, or gloves. The works of BEN HAJ AMOR et al. on EMG signals have shown that these multichannel signals contain rich and detailed information that can be exploited, in particular for the recognition of handshape and for prosthesis control. Consequently, these successes represent a great step towards the recognition of gestures in sign language.

    We build a large database of EMG data, recorded while signing the 28 characters of the Arabic sign language alphabet. This provides a valuable resource for research into how the muscles involved in signing produce the shapes needed to form the letters of the alphabet.

    Instructions: The data for this project is provided as zipped NumPy arrays with custom headers. To load these files, you need the NumPy package installed.

    The respective loadz primitive allows for straightforward loading of the datasets. The data is organized as follows:

    The data for each label (handshape) is stored in a separate folder. Each folder contains .npz files. Each .npz file contains the data for one record (an 8x400 matrix).

    For more details, please refer to the paper.
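
    The folder layout described above (one folder per label, each holding .npz files with one 8x400 record) can be walked with a few lines of NumPy. This is a sketch under assumptions: it uses `numpy.load`, which reads standard .npz archives; the array key names inside the files are not documented, so it collects every stored array; the "custom headers" mentioned in the description may need handling described only in the paper; and the label folder name `alif` and the stand-in data are invented for the demonstration.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_records(root):
    """Map each label folder name to the list of arrays found in its .npz files."""
    records = {}
    for label_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        arrays = []
        for npz_path in sorted(label_dir.glob("*.npz")):
            with np.load(npz_path) as archive:
                # Key names are undocumented, so take every array in the archive.
                arrays.extend(archive[name] for name in archive.files)
        records[label_dir.name] = arrays
    return records


# Stand-in dataset: one hypothetical label folder with a single 8x400 record.
with tempfile.TemporaryDirectory() as tmp:
    label_dir = Path(tmp) / "alif"
    label_dir.mkdir()
    np.savez(label_dir / "record_0.npz", np.zeros((8, 400)))
    records = load_records(tmp)

shapes = {label: [a.shape for a in arrays] for label, arrays in records.items()}
print(shapes)
```

    Pointing `load_records` at the unzipped dataset root should give one entry per handshape, provided the files are standard .npz archives.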

  13. Remote Worker Productivity Dataset

    • kaggle.com
    zip
    Updated May 23, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ziya (2025). Remote Worker Productivity Dataset [Dataset]. https://www.kaggle.com/datasets/ziya07/remote-worker-productivity-dataset/code
    Available download formats: zip (49610 bytes)
    Dataset updated
    May 23, 2025
    Authors
    Ziya
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset has been developed to support research on the role of Artificial Intelligence (AI) and Machine Learning (ML) in enhancing the productivity of remote workers. It simulates behavioral, performance, and technology usage data from employees working remotely across different geographic settings—cities, towns, and villages—and within various industry sectors.

    The dataset includes demographic information, daily work patterns, use of AI-assisted tools, task completion metrics, and a calculated productivity score. The target variable (productivity_label) categorizes productivity into three classes: High, Medium, and Low. This dataset is suitable for predictive modeling, classification tasks, feature importance analysis, and workflow optimization research.

    ✅ Key Features

    Geographic Diversity: Simulated data from urban, semi-urban, and rural remote workers.

    Multi-industry Coverage: Includes IT, Healthcare, Education, Finance, and Retail.

    AI/ML Usage Metrics: Tracks usage frequency and impact of intelligent tools.

    Time Management Indicators: Work hours, task scheduling, break patterns.

    Target Variable: productivity_label (High, Medium, Low).

  14. COVID-19 Daily Testing - By Person - Historical

    • datasets.ai
    • healthdata.gov
    • +2 more
    Updated Nov 10, 2020
    Cite
    City of Chicago (2020). COVID-19 Daily Testing - By Person - Historical [Dataset]. https://datasets.ai/datasets/covid-19-daily-testing-by-person
    Dataset updated
    Nov 10, 2020
    Dataset authored and provided by
    City of Chicago
    Description

    This dataset is historical only and ends at 5/7/2021. For more information, please see http://dev.cityofchicago.org/open%20data/data%20portal/2021/05/04/covid-19-testing-by-person.html. The recommended alternative dataset for similar data beyond that date is https://data.cityofchicago.org/Health-Human-Services/COVID-19-Daily-Testing-By-Test/gkdw-2tgv.

    This is the source data for some of the metrics available at https://www.chicago.gov/city/en/sites/covid-19/home/latest-data.html.

    For all datasets related to COVID-19, see https://data.cityofchicago.org/browse?limitTo=datasets&sortBy=alpha&tags=covid-19.

    This dataset contains counts of people tested for COVID-19 and their results. This dataset differs from https://data.cityofchicago.org/d/gkdw-2tgv in that each person is in this dataset only once, even if tested multiple times. In the other dataset, each test is counted, even if multiple tests are performed on the same person, although a person should not appear in that dataset more than once on the same day unless he/she had both a positive and not-positive test.

    Only Chicago residents are included based on the home address as provided by the medical provider.

    Molecular (PCR) and antigen tests are included, and only one test is counted for each individual. Tests are counted on the day the specimen was collected. A small number of tests collected prior to 3/1/2020 are not included in the table.

    Not-positive lab results include negative results, invalid results, and tests not performed due to improper collection. Chicago Department of Public Health (CDPH) does not receive all not-positive results.

    Demographic data are more complete for those who test positive; care should be taken when calculating percentage positivity among demographic groups.

    All data are provisional and subject to change. Information is updated as additional details are received.

    Data Source: Illinois National Electronic Disease Surveillance System

  15. DOHMH Covid-19 Milestone Data: New Cases of Covid-19 (7 Day Average)

    • data.cityofnewyork.us
    • datasets.ai
    • +1 more
    csv, xlsx, xml
    Updated Jun 15, 2021
    + more versions
    Cite
    Department of Health and Mental Hygiene (DOHMH) (2021). DOHMH Covid-19 Milestone Data: New Cases of Covid-19 (7 Day Average) [Dataset]. https://data.cityofnewyork.us/Health/DOHMH-Covid-19-Milestone-Data-New-Cases-of-Covid-1/xwtc-hedq
    Available download formats: xlsx, csv, xml
    Dataset updated
    Jun 15, 2021
    Dataset provided by
    New York City Department of Health and Mental Hygiene (https://nyc.gov/health)
    Authors
    Department of Health and Mental Hygiene (DOHMH)
    Description

    This dataset shows daily confirmed and probable cases of COVID-19 in New York City by date of specimen collection. Total cases are calculated as the sum of daily confirmed and probable cases. Seven-day averages of confirmed, probable, and total cases are also included in the dataset.

    A person is classified as a confirmed COVID-19 case if they test positive with a nucleic acid amplification test (NAAT, also known as a molecular test; e.g. a PCR test). A probable case is a person who meets one of the following criteria with no positive molecular test on record: a) tested positive with an antigen test, b) has symptoms and an exposure to a confirmed COVID-19 case, or c) died and their cause of death is listed as COVID-19 or similar.

    As of June 9, 2021, people who meet the definition of a confirmed or probable COVID-19 case >90 days after a previous positive test (date of first positive test) or probable COVID-19 onset date will be counted as a new case. Prior to June 9, 2021, new cases were counted ≥365 days after the first date of specimen collection or clinical diagnosis. Any person with a residence outside of NYC is not included in counts. Data is sourced from electronic laboratory reporting from the New York State Electronic Clinical Laboratory Reporting System to the NYC Health Department. All identifying health information is excluded from the dataset.

    These data are used to evaluate the overall number of confirmed and probable cases by day (seven day average) to track the trajectory of the pandemic. Cases are classified by the date that the case occurred. NYC COVID-19 data include people who live in NYC. Any person with a residence outside of NYC is not included.
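
    The relationship between daily counts, totals, and seven-day averages can be sketched as follows. The description does not say whether the published averages use a trailing or a centered window, so the trailing window here is an assumption, and the function name and the sample counts are invented purely for illustration.

```python
def seven_day_average(daily_counts):
    """Trailing 7-day mean; None until seven days of data are available."""
    averages = []
    for i in range(len(daily_counts)):
        if i < 6:
            averages.append(None)
        else:
            window = daily_counts[i - 6 : i + 1]
            averages.append(sum(window) / 7)
    return averages


# Made-up daily counts, not taken from the dataset.
confirmed = [10, 12, 9, 15, 11, 8, 12, 19]
probable = [2, 1, 3, 0, 2, 1, 5, 2]

# Total cases = confirmed + probable, as stated in the description.
total = [c + p for c, p in zip(confirmed, probable)]
avg_total = seven_day_average(total)
print(avg_total)
```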

  16. Emotion, Identity & Creativity in the Age of AI 2025 (Dataset)

    • figshare.com
    csv
    Updated Nov 19, 2025
    Cite
    Human Clarity Institute (2025). Emotion, Identity & Creativity in the Age of AI 2025 (Dataset) [Dataset]. http://doi.org/10.6084/m9.figshare.30660974.v1
    Available download formats: csv
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Human Clarity Institute
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of the Human Clarity Institute’s AI–Human Experience Data Series. It measures how adults experience emotional shifts, identity change, creative confidence, and the psychological influence of AI tools in daily life.

    The dataset includes validated 1–7 Likert-scale items, emotional amplification and emotional fatigue indicators, measures of identity perception and self-concept change, and metrics related to creativity, AI-assisted creative behaviour, and creative confidence. Open-text reflections capture how people describe the emotional and psychological role of AI in their thinking, expression, and daily experiences. Demographic variables are included across five English-speaking countries.

    Data were collected via Prolific in 2025 from adults in the UK, US, Canada, Australia, New Zealand, and Ireland. All data were cleaned, anonymised, and verified according to the Human Clarity Institute’s open-data publication protocol.

    This dataset contributes to understanding how AI-driven environments influence emotional states, personal identity, and creative behaviour, providing foundational data for longitudinal tracking of emotional and psychological effects in the AI era.

  17. Human-Like-DPO-Dataset

    • huggingface.co
    Updated May 19, 2024
    Cite
    Human-Like LLMs (2024). Human-Like-DPO-Dataset [Dataset]. https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset
    Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    May 19, 2024
    Dataset authored and provided by
    Human-Like LLMs
    License

    https://choosealicense.com/licenses/llama3/

    Description

    Enhancing Human-Like Responses in Large Language Models

    This dataset was created as part of research aimed at improving conversational fluency and engagement in large language models. It is suitable for training methods such as Direct Preference Optimization (DPO), which guide models toward generating more human-like responses. The dataset includes 10,884 samples across 256 topics, including Technology, Daily Life, Science… See the full description on the dataset page: https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset.

  18. Utrecht housing dataset

    • kaggle.com
    zip
    Updated Jan 27, 2025
    Cite
    ICT Institute (2025). Utrecht housing dataset [Dataset]. https://www.kaggle.com/datasets/ictinstitute/utrecht-housing-dataset/discussion
    Available download formats: zip (72621 bytes)
    Dataset updated
    Jan 27, 2025
    Authors
    ICT Institute
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Area covered
    Utrecht
    Description

    The Utrecht housing dataset is a freely available dataset that can be used by students to learn about data science and machine learning. The older versions are synthetic datasets. The latest version is an actual dataset based on data collected from a house offering website (Funda) and official land registry (Kadaster).

    This dataset is described in the following accompanying paper: - Van Otterloo, S and Burda, P. 2025. The Utrecht Housing dataset: A housing appraisal dataset. Computers and Society Research Journal (2025), 1. The paper can be downloaded here: https://ictinstitute.nl/utrecht-housing-dataset-2025/.

    History

    In July 2022, Stefan Leijnen and Sieuwert van Otterloo taught a one-week summer school, ‘AI and machine learning’, at the Utrecht University of Applied Sciences. The goal of this summer school is to make AI and Machine Learning accessible to as many people as possible. Using AI without properly understanding it comes with risks. We want to reduce these risks by giving students from all backgrounds the tools and knowledge to understand AI. Luckily, AI has become more accessible thanks to the existence of many free and open tools and libraries. Any student can train and test algorithms with only a few days of training.

    The Utrecht Housing Dataset was designed for use during day 1, day 2 and day 3. The dataset has multiple different input variables that are interesting to explore. The size is such that it is well suited for visualisations. The dataset represents one of the core tenets of responsible AI: AI should be made accessible to a wide group of people, so that anyone with some university experience can test and evaluate algorithms.

    When developing the summer school, we could not find a dataset that was both interesting to analyse and easy to use. Existing datasets often have data quality issues that distract from the learning goals, or are only suited for illustrating one phenomenon. Many classical machine learning datasets also do not have meaningful tasks. The problems one can tackle with these datasets are either too basic or too theoretical. The Utrecht Housing Dataset thus offers a new combination that we found useful in our classroom.

    The dataset is released under a Creative Commons license and can be used freely for any purpose. If you use it, please refer to it as “The Utrecht housing dataset – example dataset for prediction” by Sieuwert van Otterloo, www.ictinstitute.nl or refer to Sieuwert van Otterloo as the author/source.

    The dataset is provided as a CSV file. Each line contains data for one house. The values are separated by commas.
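
    As a small illustration of this layout, the sketch below reads comma-separated rows with Python's standard csv module. The column names (id, house_area, asking_price) and the two sample rows are hypothetical placeholders, since the actual header of the file is not given in this description.

```python
import csv
import io

# Hypothetical header and rows; the real file's columns may differ.
sample = io.StringIO(
    "id,house_area,asking_price\n"
    "1,120,350000\n"
    "2,85,275000\n"
)

# Each line becomes one dict keyed by the header names.
houses = list(csv.DictReader(sample))

areas = [float(row["house_area"]) for row in houses]
mean_area = sum(areas) / len(areas)
print(len(houses), mean_area)
```

    With the real file, the in-memory sample would be replaced by open(path, newline="") on the downloaded CSV.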

  19. Polish General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Polish General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-polish-poland
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Polish General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Polish speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Polish communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Polish speech models that understand and respond to authentic Polish accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Polish. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native Polish speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of Poland to ensure dialectal diversity and demographic balance.
    Demographics: A gender split of 60% male and 40% female, with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at a 16 kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.
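
    For a rough sense of scale, the recording parameters above imply the following storage footprint. This is back-of-the-envelope arithmetic that assumes uncompressed PCM audio and ignores WAV header overhead; the constant and function names are illustrative only.

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz, as stated
BYTES_PER_SAMPLE = 2      # 16-bit depth
CHANNELS = 2              # stereo


def pcm_bytes(duration_seconds):
    """Size in bytes of uncompressed PCM audio of the given duration."""
    return duration_seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


total_bytes = pcm_bytes(30 * 3600)  # 30 hours of audio
print(total_bytes / 1e9, "GB")
```

    Under these assumptions, one second of audio takes 64,000 bytes and the full 30 hours comes to roughly 6.9 GB.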

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through double QA pass, average WER < 5%

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple Polish speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for Polish.
    Voice Assistants: Build smart assistants capable of understanding natural Polish conversations.

  20. Description of multimodal dataset.

    • plos.figshare.com
    xls
    Updated Nov 16, 2023
    + more versions
    Cite
    Sobhana Jahan; Kazi Abu Taher; M. Shamim Kaiser; Mufti Mahmud; Md. Sazzadur Rahman; A. S. M. Sanwar Hosen; In-Ho Ra (2023). Description of multimodal dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0294253.t001
    Available download formats: xls
    Dataset updated
    Nov 16, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Sobhana Jahan; Kazi Abu Taher; M. Shamim Kaiser; Mufti Mahmud; Md. Sazzadur Rahman; A. S. M. Sanwar Hosen; In-Ho Ra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background

    According to the World Health Organization (WHO), dementia is the seventh leading cause of death among all illnesses and one of the leading causes of disability among the world’s elderly people. The number of Alzheimer’s patients is rising day by day. Considering the increasing rate and the dangers, Alzheimer’s disease should be diagnosed carefully. Machine learning is a potential technique for Alzheimer’s diagnosis, but general users do not trust machine learning models because of their black-box nature. Moreover, some of those models do not provide the best performance because they use only neuroimaging data.

    Objective

    To solve these issues, this paper proposes a novel explainable Alzheimer’s disease prediction model using a multimodal dataset. This approach performs a data-level fusion using clinical data, MRI segmentation data, and psychological data. However, currently, there is very little understanding of multimodal five-class classification of Alzheimer’s disease.

    Method

    For the five-class classification, nine of the most popular machine learning models are used: Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), Multi-Layer Perceptron (MLP), K-Nearest Neighbor (KNN), Gradient Boosting (GB), Adaptive Boosting (AdaB), Support Vector Machine (SVM), and Naive Bayes (NB). Among these models, RF scored the highest. In addition, SHapley Additive exPlanation (SHAP) is used for explainability.

    Results and conclusions

    The performance evaluation demonstrates that the RF classifier has a 10-fold cross-validation accuracy of 98.81% for predicting Alzheimer’s disease, cognitively normal, non-Alzheimer’s dementia, uncertain dementia, and others. In addition, the study utilized Explainable Artificial Intelligence based on the SHAP model and analyzed the causes of prediction.

    To the best of our knowledge, we are the first to present this multimodal (Clinical, Psychological, and MRI segmentation data) five-class classification of Alzheimer’s disease using the Open Access Series of Imaging Studies (OASIS-3) dataset. A novel Alzheimer’s patient management architecture is also proposed in this work.
