100+ datasets found

Global impact of AI and big-data analytics on jobs 2023-2027
statista.com
Updated Jun 30, 2025
Cite
Statista (2025). Global impact of AI and big-data analytics on jobs 2023-2027 [Dataset]. https://www.statista.com/statistics/1383919/ai-bigdata-impact-jobs/
Dataset updated
Jun 30, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Nov 2022 - Feb 2023
Area covered
Worldwide
Description
Between 2023 and 2027, the majority of companies surveyed worldwide expect big data to have a more positive than negative impact on the global job market and employment, with ** percent of the companies reporting the technology will create jobs and * percent expecting the technology to displace jobs. Meanwhile, artificial intelligence (AI) is expected to result in more significant labor market disruptions, with ** percent of organizations expecting the technology to displace jobs and ** percent expecting AI to create jobs.
AI Training Dataset In Healthcare Market Report
archivemarketresearch.com
doc, pdf, ppt
Updated Jun 20, 2025
Cite
Archive Market Research (2025). AI Training Dataset In Healthcare Market Report [Dataset]. https://www.archivemarketresearch.com/reports/ai-training-dataset-in-healthcare-market-5352
Available download formats: pdf, ppt, doc
Dataset updated
Jun 20, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
global
Variables measured
Market Size
Description
The AI Training Dataset In Healthcare Market was valued at USD 341.8 million in 2023 and is projected to reach USD 1464.13 million by 2032, exhibiting a CAGR of 23.1% during the forecast period. The growth is attributed to the rising adoption of AI in healthcare, increasing demand for accurate and reliable training datasets, government initiatives to promote AI in healthcare, and technological advances in data collection and annotation. Healthcare AI training datasets are vital for building effective algorithms and for improving patient care and diagnosis. They include large volumes of thoroughly labeled electronic health records, images such as X-ray and MRI scans, and genomics data. They help AI systems identify trends, make forecasts, and support the development of new approaches to treatment. Patient privacy and the ethical use of patient information are of the utmost importance, so the data requires strong anonymization and compliance with laws such as HIPAA. Continued expansion and diversification of datasets are needed to address existing bias and to improve the effectiveness of AI across different populations and diseases, supporting safer health solutions worldwide.
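The growth figures quoted in this description can be sanity-checked with the standard compound annual growth rate formula. The following is a minimal Python sketch, not the report's own method: the function name `cagr` is hypothetical, and the number of compounding years is an assumption because the description does not say which base year the quoted 23.1% rate uses.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Return the compound annual growth rate as a fraction (0.10 means 10%)."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("start_value, end_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Market sizes quoted in the description (USD million).
START, END = 341.8, 1464.13

# The base year behind the quoted rate is not stated (assumption),
# so print the implied rate for a few candidate spans.
for years in (7, 8, 9):
    print(f"{years} years: {cagr(START, END, years):.1%}")
```

Under these assumptions, a seven-year span comes closest to the quoted 23.1%, while compounding over the nine years from 2023 to 2032 implies a rate of roughly 17.5%.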
Data from: MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022...
data.mendeley.com
Updated Jul 25, 2022
Cite
Nirmalya Thakur (2022). MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak [Dataset]. http://doi.org/10.17632/xmcg82mx9k.3
Unique identifier
https://doi.org/10.17632/xmcg82mx9k.3
Dataset updated
Jul 25, 2022
Authors
Nirmalya Thakur
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: The first public Twitter dataset on the 2022 MonkeyPox outbreak,” Preprints, 2022, DOI: 10.20944/preprints202206.0172.v2

Abstract

The world is currently facing an outbreak of the monkeypox virus, and confirmed cases have been reported from 28 countries. Following a recent “emergency meeting”, the World Health Organization just declared monkeypox a global health emergency. As a result, people from all over the world are using social media platforms, such as Twitter, to seek and share information about the outbreak and to familiarize themselves with the guidelines and protocols recommended by various policy-making bodies to reduce the spread of the virus. This is generating tremendous amounts of Big Data related to such paradigms of social media behavior. Mining this Big Data and compiling it into a dataset can serve a wide range of use cases and applications, such as analysis of public opinions, interests, views, perspectives, attitudes, and sentiment towards this outbreak. Therefore, this work presents MonkeyPox2022Tweets, an open-access dataset of Tweets related to the 2022 monkeypox outbreak that were posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.

Data Description

The dataset consists of 255,363 Tweet IDs, one for each tweet about monkeypox posted on Twitter from May 7, 2022 to July 23, 2022 (the most recent date at the time of dataset upload). The Tweet IDs are split across 6 .txt files according to the dates of the associated tweets:

• TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range: May 7, 2022 to May 21, 2022)
• TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range: May 21, 2022 to May 27, 2022)
• TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range: May 27, 2022 to June 5, 2022)
• TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range: June 5, 2022 to June 11, 2022)
• TweetIDs_Part5.txt (No. of Tweet IDs: 47718, Date Range: June 12, 2022 to June 30, 2022)
• TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range: July 1, 2022 to July 23, 2022)

The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used.
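
Because only Tweet IDs are distributed, a typical first step is to read the ID files and group the IDs into batches for a hydration tool. The following is a minimal sketch, assuming each .txt file holds one numeric Tweet ID per line (the file layout is not specified in this description) and that a batch size of 100 suits whatever lookup service is used. The helper names `read_tweet_ids` and `batched` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Iterator, List


def read_tweet_ids(paths: Iterable[Path]) -> Iterator[str]:
    """Yield Tweet IDs from text files, assuming one ID per line."""
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                tweet_id = line.strip()
                if tweet_id.isdigit():  # skip blank or malformed lines
                    yield tweet_id


def batched(ids: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Group IDs into lists of at most `size` items."""
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


if __name__ == "__main__":
    # Assumes the six dataset files sit in the current directory.
    files = sorted(Path(".").glob("TweetIDs_Part*.txt"))
    for batch in batched(read_tweet_ids(files)):
        # Hand each batch to a hydration tool of your choice here.
        print(len(batch), "IDs ready for hydration")
```

Keeping the reader and the batcher as separate generators means the full set of IDs never has to be held in memory at once.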
values-in-the-wild
huggingface.co
Updated Apr 21, 2025
Cite
Anthropic (2025). values-in-the-wild [Dataset]. https://huggingface.co/datasets/Anthropic/values-in-the-wild
Dataset updated
Apr 21, 2025
Dataset authored and provided by
Anthropic (https://anthropic.com/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary

This dataset presents a comprehensive taxonomy of 3307 values expressed by Claude (an AI assistant) across hundreds of thousands of real-world conversations. Using a novel privacy-preserving methodology, these values were extracted and classified without human reviewers accessing any conversation content. The dataset reveals patterns in how AI systems express values "in the wild" when interacting with diverse users and tasks. We're releasing this resource to advance research… See the full description on the dataset page: https://huggingface.co/datasets/Anthropic/values-in-the-wild.
Commercial Real Estate Data | Global Real Estate Professionals | Work...
datarade.ai
Updated Oct 27, 2021
Cite
Success.ai (2021). Commercial Real Estate Data | Global Real Estate Professionals | Work Emails, Phone Numbers & Verified Profiles | Best Price Guaranteed [Dataset]. https://datarade.ai/data-products/commercial-real-estate-data-global-real-estate-professional-success-ai
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Oct 27, 2021
Dataset provided by
Area covered
Burkina Faso, Sierra Leone, Guatemala, El Salvador, Korea (Republic of), Hong Kong, Comoros, Netherlands, Marshall Islands, Bolivia (Plurinational State of)
Description
Success.ai’s Commercial Real Estate Data and B2B Contact Data for Global Real Estate Professionals is a comprehensive dataset designed to connect businesses with industry leaders in real estate worldwide. With over 170M verified profiles, including work emails and direct phone numbers, this solution ensures precise outreach to agents, brokers, property developers, and key decision-makers in the real estate sector.

Utilizing advanced AI-driven validation, our data is continuously updated to maintain 99% accuracy, offering actionable insights that empower targeted marketing, streamlined sales strategies, and efficient recruitment efforts. Whether you’re engaging with top real estate executives or sourcing local property experts, Success.ai provides reliable and compliant data tailored to your needs.

Key Features of Success.ai’s Real Estate Professional Contact Data

Comprehensive Industry Coverage

Gain direct access to verified profiles of real estate professionals across the globe, including:

Real Estate Agents: Professionals facilitating property sales and purchases.

Brokers: Key intermediaries managing transactions between buyers and sellers.

Property Developers: Decision-makers shaping residential, commercial, and industrial projects.

Real Estate Executives: Leaders overseeing multi-regional operations and business strategies.

Architects & Consultants: Experts driving design and project feasibility.

Verified and Continuously Updated Data

AI-Powered Validation: All profiles are verified using cutting-edge AI to ensure up-to-date accuracy.

Real-Time Updates: Our database is refreshed continuously to reflect the most current information.

Global Compliance: Fully aligned with GDPR, CCPA, and other regional regulations for ethical data use.

Customizable Data Delivery

Tailor your data access to align with your operational goals:

API Integration: Directly integrate data into your CRM or project management systems for seamless workflows.

Custom Flat Files: Receive detailed datasets customized to your specifications, ready for immediate application.

Why Choose Success.ai for Real Estate Contact Data?

Best Price Guarantee: Enjoy competitive pricing that delivers exceptional value for verified, comprehensive contact data.

Precision Targeting for Real Estate Professionals: Our dataset equips you to connect directly with real estate decision-makers, minimizing misdirected efforts and improving ROI.

Strategic Use Cases

Lead Generation: Target qualified real estate agents and brokers to expand your network.

Sales Outreach: Engage with property developers and executives to close high-value deals.

Marketing Campaigns: Drive targeted campaigns tailored to real estate markets and demographics.

Recruitment: Identify and attract top talent in real estate for your growing team.

Market Research: Access firmographic and demographic data for in-depth industry analysis.

Data Highlights

170M+ Verified Professional Profiles

50M Work Emails

30M Company Profiles

700M Global Professional Profiles

Powerful APIs for Enhanced Functionality

Enrichment API: Ensure your contact database remains relevant and up-to-date with real-time enrichment. Ideal for businesses seeking to maintain competitive agility in dynamic markets.

Lead Generation API: Boost your lead generation with verified contact details for real estate professionals, supporting up to 860,000 API calls per day for robust scalability.

Use Cases for Real Estate Contact Data

Targeted Outreach for New Projects: Connect with property developers and brokers to pitch your services or collaborate on upcoming projects.

Real Estate Marketing Campaigns: Execute personalized marketing campaigns targeting agents and clients in residential, commercial, or industrial sectors.

Enhanced Sales Strategies: Shorten sales cycles by directly engaging with decision-makers and key stakeholders.

Recruitment and Talent Acquisition: Access profiles of highly skilled professionals to strengthen your real estate team.

Market Analysis and Intelligence: Leverage firmographic and demographic insights to identify trends and optimize business strategies.

What Makes Us Stand Out?

Unmatched Data Accuracy: Our AI-driven validation ensures 99% accuracy for all contact details.

Comprehensive Global Reach: Covering professionals across diverse real estate markets worldwide.

Flexible Delivery Options: Access data in formats that seamlessly fit your existing systems.

Ethical and Compliant Data Practices: Adherence to global standards for secure and responsible data use.

Success.ai’s B2B Contact Data for Global Real Estate Professionals delivers the tools you need to connect with the right people at the right time, driving efficiency and success in your business operations. From agents and brokers to property developers and executiv...
Global Public Opinion on Artificial Intelligence (GPO-AI)
borealisdata.ca
Updated Mar 5, 2025
Cite
Blake Lee-Whiting; Peter John Loewen; Thomas Bergeron (2025). Global Public Opinion on Artificial Intelligence (GPO-AI) [Dataset]. http://doi.org/10.5683/SP3/WCUN0S
Unique identifier
https://doi.org/10.5683/SP3/WCUN0S
Dataset updated
Mar 5, 2025
Dataset provided by
Borealis
Authors
Blake Lee-Whiting; Peter John Loewen; Thomas Bergeron
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
In October and November 2023, researchers at the Schwartz Reisman Institute for Technology and Society and the Policy, Elections and Representation Lab at the Munk School of Global Affairs and Public Policy at the University of Toronto completed a survey on public perceptions of and attitudes toward AI. The survey was administered to over 1,000 people in each of 21 countries, for a total of 23,882 surveys conducted in 12 languages. The combined populations of the countries sampled represent a majority of the world's population.

Countries: Argentina, Australia, Brazil, Canada, Chile, China, France, Germany, India, Indonesia, Italy, Japan, Kenya, Mexico, Pakistan, Poland, Portugal, South Africa, Spain, United Kingdom, United States of America.

Languages: Chinese (Simplified), English, French, German, Indonesian, Italian, Japanese, Polish, Portuguese (Portugal), Portuguese (Brazil), Spanish (Spain), Spanish (Latin America).

The survey explored general knowledge of and attitudes toward AI. Topics included concerns about AI, safety, regulation, autonomous vehicles and AI's effect on jobs now and in the future. Participants were asked whether they are interested in or trust applications of AI for clothes, travel, grocery shopping, dating or finance. Respondents were asked about their attitudes toward the use of emerging technologies in education, the justice system, health care and immigration. Respondents were also asked about their knowledge of and experience with ChatGPT and deepfakes.
Future of Labour (June 2023) - Dataset - B2FIND
b2find.eudat.eu
Cite
Future of Labour (June 2023) - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/c936a262-64b1-5ba2-8e6e-682b4bef595c
Description
The study on the future of work was conducted by Kantar Public on behalf of the Press and Information Office of the Federal Government. During the survey period from 13 to 22 June 2023, German-speaking people aged 16 to 67 in Germany, excluding pensioners, were surveyed in online interviews (CAWI) on the following topics: current life and work situation, future expectations, the use of AI and the digitalization of the world of work as well as attitudes towards demographic change and the shortage of skilled workers. The respondents were selected using a quota sample from an online access panel.

1. Future: general life satisfaction; satisfaction with selected aspects of life (working conditions, education, qualifications, health situation, professional remuneration, family situation, financial situation); expectations for the future: rather confident vs. rather worried about the private and professional future; rather confident vs. rather worried about the professional future of younger people or the next generation; rather confident vs. rather worried about the future of Germany; confidence vs. concern regarding the competitiveness of the German economy in various areas (digitalization and automation of the working world, climate protection goals of industry, effects of the Ukraine war on the German economy, access to important raw materials such as rare earths or metals, reliable supply of energy, number of qualified specialists, general price development, development of wages and salaries, development of pensions); probability of various future scenarios for Germany in 2030 (Germany is once again the world export champion, unemployment is at an all-time low - full employment prevails in Germany, the energy transition has already created hundreds of thousands of new jobs in German industry, Germany has emerged the strongest in the EU from the crises of the last 15 years, the price crisis has meant that politics and business have successfully set the course for the future, citizens can deal with all official matters digitally from home, German industry is much faster than expected in terms of climate targets and is already almost climate-neutral, Germany is the most popular country of immigration for foreign university graduates, the nursing shortage in Germany has been overcome thanks to the immigration of skilled workers).

2. Importance of work: importance of different areas of life (ranking); work to earn money vs. as a vocation; importance of different work characteristics (e.g. job security, adequate income, development prospects and career opportunities, etc.).

3. Professional situation: satisfaction with various aspects of work (job security, pay/income, development/career opportunities, interesting work, sufficient contact with other people, compatibility of family/private life and work, work climate/working atmosphere, further training opportunities, social recognition, meaningful and useful work); job satisfaction; expected development of working conditions in own professional field; recognition for own work from the company/employer, from colleagues, from other people from the work context, from the personal private environment, from society in general and from politics; unemployed people were asked: currently looking for a new job; assessment of chances of finding a new job; pupils, students and trainees were asked: assessment of future career opportunities; reasons for assessing career opportunities as poor (open).

4. AI: use of artificial intelligence (AI) in the world of work rather as an opportunity or rather as a danger; expected effects of AI on working conditions in their own professional field (improvement, deterioration, no effects); opportunities and dangers of digitalization, AI and automation based on comparisons (all in all, digitalization leads to a greater burden on the environment, as computers, tablets, smartphones and data centers are major power guzzlers vs. all in all, digitalization protects the environment through less mobility and more efficient management; artificial intelligence and digitalization help to reduce the workload and relieve employees of repetitive and monotonous tasks vs. artificial intelligence and digitalization overburden many employees through further work intensification, with stress and burnout increasingly the result; artificial intelligence and digitalization will primarily lead to job losses vs. artificial intelligence and digitalization will create more new, future-proof jobs than old ones will be lost; our economy will benefit greatly from global networking through speed and efficiency gains vs. our economy is threatened by global networking by becoming more susceptible to cyberattacks and hacker attacks; digitalization will lead to new, more flexible working time models and a better work-life balance vs. digitalization will lead to a blurring of boundaries between work and leisure time and thus, above all, to more self-exploitation by employees).

5. Home office: local focus of own work currently, before the corona pandemic and during the corona pandemic (exclusively/predominantly in the company or from home, at changing work locations (company, at home, mobile on the road)); agreement with various statements on the topic of working from home (wherever possible, employers should give their employees the opportunity to work from home, working from home leads to a loss of cohesion in the company, working from home enables a better work-life balance, digital communication makes coordination processes more complicated, home office makes an important contribution to climate protection due to fewer journeys to work, home office leads to a mixture of work and leisure time and thus to a greater workload, home office leads to greater job satisfaction and thus to higher productivity, since many professions cannot be carried out in the home office, it would be fairer if everyone had to work outside the home); attitude towards a general 4-day working week (a four-day week for everyone would increase the shortage of skilled workers vs. a four-day week for everyone would increase motivation and therefore productivity).

6. Demographic change: knowledge of the meaning of the term demographic change; expected impact of demographic change on the future of Germany; opinion on the future in Germany based on alternative future scenarios (in the future, poverty in old age will increase noticeably vs. the future generation of pensioners will be wealthier than ever before; in the future, politics and elections will be increasingly determined by older people vs. the influence of the younger generation on politics will become much more important; our social security systems will continue to ensure intergenerational fairness and equalization in the future vs. the distribution conflicts between the younger and older generations will increase noticeably; future generations will have to work longer due to the shortage of skilled workers vs. people will have to work less in the future due to digitalization and automation and will be able to retire earlier).

7. Shortage of skilled workers: shortage of skilled workers in own company; additional personal burden due to shortage of skilled workers; company is doing enough to counteract the shortage of skilled workers; use of artificial intelligence (AI) in the company could compensate for the shortage of skilled workers; evaluation of various measures taken by the federal government to combat the shortage of skilled workers (improvement of training and further education opportunities, increasing the participation of women in the labor market, e.g. by expanding childcare services, more flexible working hours, offers for older skilled workers to stay in work longer, facilitating the immigration of foreign skilled workers); evaluation of the work of the federal government to combat the shortage of skilled workers; attractiveness (reputation in society) of various professions with a shortage of skilled workers (e.g. social pedagogue, nursery school teacher, etc.); job recommendation for younger people; own activity in one of the professions mentioned with a shortage of skilled workers.

Demography: sex; age; age in age groups; employment; federal state; region west/east; school education; vocational training; self-placement social class; employment status; occupation differentiated workers, employees, civil servants; industry; household size; number of children under 18 in the household; net household income (grouped); location size; party sympathy; migration background (respondent, one parent or both parents). Additionally coded were: consecutive interview number; school education head group (low, medium, high); weighting factor.
EmoVisual Data
kaggle.com
Updated Oct 18, 2024
Cite
Arya Shah (2024). EmoVisual Data [Dataset]. https://www.kaggle.com/datasets/aryashah2k/emovisual-data/data
Dataset updated
Oct 18, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Arya Shah
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Emo Visual Data

Introduction

This is a visual annotation dataset of 5329 emoticons. The annotations were produced by multimodal large models, using the glm-4v API and the step-free-api project.

Example:

0f20b31d-e019-4565-9286-fdf29cc8e144.jpg

Original 这个表情包中的内容和笑点在于它展示了一只卡通兔子，兔子的表情看起来既无奈又有些生气，配文是“活着已经够累了，上网你还要刁难我”。这句话以一种幽默的方式表达了许多人在上网时可能会遇到的挫折感或烦恼，尤其是当遇到困难或不顺心的事情时。这种对现代生活压力的轻松吐槽使得这个表情包在社交媒体上很受欢迎，人们用它来表达自己在网络世界中的疲惫感或面对困难时的幽默态度。

Translated: The humor of this emoticon lies in its cartoon rabbit, whose expression looks both helpless and a little angry, with the caption "Living is already tiring enough, and you still give me a hard time online." The line humorously expresses the frustration or annoyance that many people experience online, especially when they run into difficulties or things do not go their way. This lighthearted take on the pressures of modern life has made the emoticon popular on social media, where people use it to express their exhaustion with the online world or to face difficulties with humor.
Canadian French Call Center Data for Healthcare AI
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Canadian French Call Center Data for Healthcare AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/healthcare-call-center-conversation-french-canada
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
French, Canada
Dataset funded by
FutureBeeAI
Description
Introduction
This Canadian French Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of French speech recognition, spoken language understanding, and conversational AI systems. With 30 Hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.
Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.
Speech Data
The dataset features 30 Hours of dual-channel call center conversations between native Canadian French speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.
• Participant Diversity:
  • Speakers: 60 verified native Canadian French speakers from our contributor community.
  • Regions: Diverse provinces across Canada to ensure broad dialectal representation.
  • Participant Profile: Age range of 18–70 with a gender mix of 60% male and 40% female.
• Recording Details:
  • Conversation Nature: Naturally flowing, unscripted conversations.
  • Call Duration: Each session lasts between 5 and 15 minutes.
  • Audio Format: WAV format, stereo, 16-bit depth at 8kHz and 16kHz sample rates.
  • Recording Environment: Captured in clear conditions without background noise or echo.
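
Before working with recordings like these, it can help to confirm that each file matches the stated audio format. The following is a minimal sketch using Python's standard `wave` module. The function name `describe_wav` is hypothetical, and the values noted in the comments simply restate the format listed in this description rather than anything verified against the actual files.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic format facts for an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),       # 2 would indicate stereo
            "bit_depth": wav.getsampwidth() * 8,  # sample width is reported in bytes
            "sample_rate_hz": rate,               # listing mentions 8000 and 16000
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav` on a file path (for example a hypothetical `call_0001.wav`) returns a dictionary that can be compared against the expected channel count, bit depth, and sample rate.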

Topic Diversity
The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).
• Inbound Calls:
  • Appointment Scheduling
  • New Patient Registration
  • Surgical Consultation
  • Dietary Advice and Consultations
  • Insurance Coverage Inquiries
  • Follow-up Treatment Requests, and more
• Outbound Calls:
  • Appointment Reminders
  • Preventive Care Campaigns
  • Test Results & Lab Reports
  • Health Risk Assessment Calls
  • Vaccination Updates
  • Wellness Subscription Outreach, and more
These real-world interactions help build speech models that understand healthcare domain nuances and user intent.
Transcription
Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.
• Transcription Includes:
  • Speaker-identified Dialogues
  • Time-coded Segments
  • Non-speech Annotations (e.g., silence, cough)
• High transcription accuracy, with a word error rate below 5%, backed by dual-layer QA checks.
Metadata
Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.
• Participant Metadata: ID, gender, age, region, accent, and dialect.
• Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

Usage and Applications
This dataset can be used across a range of healthcare and voice AI use cases:
Global Blockchain AI Market Size By Technology (Computer Vision, Natural...
verifiedmarketresearch.com
Updated Jun 10, 2024
Cite
VERIFIED MARKET RESEARCH (2024). Global Blockchain AI Market Size By Technology (Computer Vision, Natural Language Processing, Machine Learning), By Deployment (Cloud, On-Premise), By Application (Smart Contracts, Governance, Logistics and Supply Chain Management, Payments & Settlements), By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/blockchain-ai-market/
Dataset updated
Jun 10, 2024
Dataset provided by
Verified Market Research (https://www.verifiedmarketresearch.com/)
Authors
VERIFIED MARKET RESEARCH
License
https://www.verifiedmarketresearch.com/privacy-policy/
Time period covered
2024 - 2030
Area covered
Global
Description
Blockchain AI Market size was valued at USD 448 Million in 2023 and is projected to reach USD 2730 Million by 2031, at a CAGR of 25.5% from 2024 to 2031.

Global Blockchain AI Market Drivers

The market drivers for the Blockchain AI Market can be influenced by various factors. These may include:

•Enhanced Data Security: By offering a decentralized, immutable record for sharing and archiving information, the combination of blockchain technology and artificial intelligence improves data security. This secure infrastructure is especially valuable for sensitive information in supply chain management, banking, and healthcare.
•Increased Adoption of AI: As AI is used in more and more industries, there is a greater need for blockchain-based solutions that address data transparency and integrity. Blockchain technology supports the quality and dependability of AI-powered services and apps by verifying the legitimacy of the data used to train AI algorithms.
•Growing Worries About Data Privacy: Growing concern about data privacy and ownership is leading organizations to investigate blockchain AI solutions that provide more control over data access and usage. Blockchain gives people control over their data while allowing AI algorithms to access it selectively for processing and analysis.
•Demand for Transparent and Reliable AI Systems: Companies and customers alike are looking for reliable, transparent AI systems that can shed light on their decision-making process. Blockchain technology makes it possible to record the decisions and actions of AI algorithms transparently, which promotes confidence in AI-powered systems.
•Need for Decentralized AI Marketplaces: Blockchain technology is enabling the development of decentralized AI marketplaces, which are democratizing access to AI datasets and algorithms. These marketplaces enable peer-to-peer exchange and cooperation, letting businesses and developers share AI resources profitably and effectively.
•Regulatory Compliance Requirements: Regulatory mandates such as the GDPR (General Data Protection Regulation) in Europe and HIPAA (Health Insurance Portability and Accountability Act) in the healthcare industry are driving the adoption of blockchain AI solutions as a way to comply with data protection regulations. The transparent data governance offered by blockchain's immutability and auditability facilitates regulatory compliance.
•Growing Interest in Federated Learning: Federated learning, a distributed machine learning approach that trains AI models across decentralized devices, is gaining interest because of privacy concerns and data localization requirements. Blockchain technology guarantees data privacy, integrity, and incentives among participating nodes, which can enable safe and effective federated learning.
•Extension of DAOs and Smart Contracts: Combining AI systems with smart contracts and decentralized autonomous organizations (DAOs) makes automated, trustless decision-making and agreement execution possible. Smart contracts built on the blockchain can carry out predetermined scenarios and transactions based on insights generated by artificial intelligence, simplifying corporate processes and lowering dependency on middlemen.
•Emergence of AI-Driven Token Economies: The convergence of blockchain and AI technology is fueling the emergence of AI-driven token economies. In these economies, tokens are used as incentives for sharing data, training models, and improving algorithms. These token economies ensure equitable reward for contributions while encouraging cooperation and creativity in AI research and development.
•Partnerships and Cross-Industry Collaboration: The adoption of blockchain AI solutions is being accelerated by partnerships and cross-industry collaboration among research institutions, industry consortia, and technology vendors. These collaborations enable the sharing of knowledge, assets, and best practices, promoting the advancement of blockchain AI solutions that are both interoperable and scalable.
Russian Call Center Data for Healthcare AI
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Russian Call Center Data for Healthcare AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/healthcare-call-center-conversation-russian-russia
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
This Russian Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of Russian speech recognition, spoken language understanding, and conversational AI systems. With 30 hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.
Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.
Speech Data
The dataset features 30 hours of dual-channel call center conversations between native Russian speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.
•Participant Diversity:
•Speakers: 60 verified native Russian speakers from our contributor community.
•Regions: Diverse provinces across Russia to ensure broad dialectal representation.
•Participant Profile: Age range of 18–70 with a gender mix of 60% male and 40% female.
•Recording Details:
•Conversation Nature: Naturally flowing, unscripted conversations.
•Call Duration: Each session ranges from 5 to 15 minutes.
•Audio Format: WAV format, stereo, 16-bit depth at 8kHz and 16kHz sample rates.
•Recording Environment: Captured in clear conditions without background noise or echo.
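The recording specification above (stereo WAV, 16-bit depth, 8 kHz or 16 kHz) can be verified on delivery with Python's standard `wave` module. The sketch below is illustrative only: the function names, the demo file name `sample.wav`, and the silent demo audio are assumptions made for the example, not part of the dataset.

```python
import os
import tempfile
import wave

EXPECTED_CHANNELS = 2                  # stereo (dual-channel)
EXPECTED_SAMPLE_WIDTH_BYTES = 2        # 16-bit depth
EXPECTED_SAMPLE_RATES = {8000, 16000}  # 8 kHz or 16 kHz


def check_wav_format(path: str) -> dict:
    """Read the WAV header and report any deviation from the stated spec."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_width = wav_file.getsampwidth()
        sample_rate = wav_file.getframerate()
        duration_s = wav_file.getnframes() / sample_rate

    problems = []
    if channels != EXPECTED_CHANNELS:
        problems.append(f"expected {EXPECTED_CHANNELS} channels, found {channels}")
    if sample_width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"expected 16-bit samples, found {sample_width * 8}-bit")
    if sample_rate not in EXPECTED_SAMPLE_RATES:
        problems.append(f"unexpected sample rate {sample_rate} Hz")
    return {"duration_s": duration_s, "problems": problems}


def write_silent_wav(path: str, sample_rate: int, seconds: int) -> None:
    """Write a silent stereo 16-bit file, used here only as demo input."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(2)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        # 2 channels * 2 bytes per sample for every frame
        wav_file.writeframes(b"\x00" * (2 * 2 * sample_rate * seconds))


demo_path = os.path.join(tempfile.mkdtemp(), "sample.wav")  # hypothetical file
write_silent_wav(demo_path, sample_rate=8000, seconds=1)
report = check_wav_format(demo_path)
print(report)
```

In practice `check_wav_format` would be run over every delivered recording, and the reported duration could also be compared against the stated 5 to 15 minute call length.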

Topic Diversity
The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).
•Inbound Calls:
•Appointment Scheduling
•New Patient Registration
•Surgical Consultation
•Dietary Advice and Consultations
•Insurance Coverage Inquiries
•Follow-up Treatment Requests, and more
•Outbound Calls:
•Appointment Reminders
•Preventive Care Campaigns
•Test Results & Lab Reports
•Health Risk Assessment Calls
•Vaccination Updates
•Wellness Subscription Outreach, and more
These real-world interactions help build speech models that understand healthcare domain nuances and user intent.
Transcription
Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.
•Transcription Includes:
•Speaker-identified Dialogues
•Time-coded Segments
•Non-speech Annotations (e.g., silence, cough)
•High transcription accuracy, with a word error rate below 5%, backed by dual-layer QA checks.
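Word error rate (WER) is conventionally the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below shows that standard calculation; how FutureBeeAI actually measures its figure is not stated in the listing, and the English sample sentences are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref_words)


# Invented example: one dropped word ("a") and one substitution ("week" -> "weak").
reference = "the patient needs a follow up appointment next week"
hypothesis = "the patient needs follow up appointment next weak"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.3f}")
```

A transcript set meets the stated threshold when this ratio, computed over all reference words, stays below 0.05.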
Metadata
Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.
•Participant Metadata: ID, gender, age, region, accent, and dialect.
•Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

Usage and Applications
This dataset can be used across a range of healthcare and voice AI use cases:
Data from: NeSy4VRD: A Multifaceted Resource for Neurosymbolic AI Research...
zenodo.org
data.niaid.nih.gov
zip
Updated May 16, 2023
Cite
David Herron; Ernesto Jimenez-Ruiz; Giacomo Tarroni; Tillman Weyde (2023). NeSy4VRD: A Multifaceted Resource for Neurosymbolic AI Research using Knowledge Graphs in Visual Relationship Detection [Dataset]. http://doi.org/10.5281/zenodo.7931113
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7931113
Dataset updated
May 16, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
David Herron; Ernesto Jimenez-Ruiz; Giacomo Tarroni; Tillman Weyde
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
NeSy4VRD

NeSy4VRD is a multifaceted, multipurpose resource designed to foster neurosymbolic AI (NeSy) research, particularly NeSy research using Semantic Web technologies such as OWL ontologies, OWL-based knowledge graphs and OWL-based reasoning as symbolic components. The NeSy4VRD research resource pertains to the computer vision field of AI and, within that field, to the application tasks of visual relationship detection (VRD) and scene graph generation.

Whilst the core motivation of the NeSy4VRD research resource is to foster computer vision-based NeSy research using Semantic Web technologies such as OWL ontologies and OWL-based knowledge graphs, AI researchers can readily use NeSy4VRD to either: 1) pursue computer vision-based NeSy research without involving Semantic Web technologies as symbolic components, or 2) pursue computer vision research without NeSy (i.e. pursue research that focuses purely on deep learning alone, without involving symbolic components of any kind). This is the sense in which we describe NeSy4VRD as being multipurpose: it can readily be used by diverse groups of computer vision-based AI researchers with diverse interests and objectives.

The NeSy4VRD research resource in its entirety is distributed across two locations: Zenodo and GitHub.

NeSy4VRD on Zenodo: the NeSy4VRD dataset package

This entry on Zenodo hosts the NeSy4VRD dataset package, which includes the NeSy4VRD dataset and its companion NeSy4VRD ontology, an OWL ontology called VRD-World.

The NeSy4VRD dataset consists of an image dataset with associated visual relationship annotations. The images of the NeSy4VRD dataset are the same as those that were once publicly available as part of the VRD dataset. The NeSy4VRD visual relationship annotations are a highly customised and quality-improved version of the original VRD visual relationship annotations. The NeSy4VRD dataset is designed for computer vision-based research that involves detecting objects in images and predicting relationships between ordered pairs of those objects. A visual relationship for an image of the NeSy4VRD dataset has the form <'subject', 'predicate', 'object'>, where the 'subject' and 'object' are two objects in the image, and the 'predicate' describes some relation between them. Both the 'subject' and 'object' objects are specified in terms of bounding boxes and object classes. For example, representative annotated visual relationships are <'person', 'ride', 'horse'>, <'hat', 'on', 'teddy bear'> and <'cat', 'under', 'pillow'>.
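The <'subject', 'predicate', 'object'> form described above maps naturally onto a small data structure. The following is a sketch only: the class names, field names, and the (xmin, ymin, xmax, ymax) bounding-box convention are assumptions for illustration and do not reflect the actual NeSy4VRD annotation file format; the coordinates are made up.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# Hypothetical bounding-box convention: (xmin, ymin, xmax, ymax) in pixels.
BoundingBox = Tuple[int, int, int, int]


@dataclass(frozen=True)
class AnnotatedObject:
    """An object in an image: its class label plus its bounding box."""
    object_class: str
    bbox: BoundingBox


@dataclass(frozen=True)
class VisualRelationship:
    """An ordered pair of objects linked by a predicate."""
    subject: AnnotatedObject
    predicate: str
    obj: AnnotatedObject

    def as_triple(self) -> Tuple[str, str, str]:
        return (self.subject.object_class, self.predicate, self.obj.object_class)


def referenced_classes(relationships: List[VisualRelationship]) -> Set[str]:
    """Collect every object class mentioned as a subject or an object."""
    classes = set()
    for rel in relationships:
        classes.add(rel.subject.object_class)
        classes.add(rel.obj.object_class)
    return classes


# Example triples taken from the description; the boxes are invented.
annotations = [
    VisualRelationship(AnnotatedObject("person", (40, 10, 120, 200)), "ride",
                       AnnotatedObject("horse", (20, 90, 220, 300))),
    VisualRelationship(AnnotatedObject("hat", (60, 5, 100, 30)), "on",
                       AnnotatedObject("teddy bear", (50, 20, 150, 180))),
]
triples = [rel.as_triple() for rel in annotations]
print(triples)
print(sorted(referenced_classes(annotations)))
```

Keeping subject and object as separate fields preserves the ordering of the pair, which matters because <'person', 'ride', 'horse'> and <'horse', 'ride', 'person'> are different relationships.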

Visual relationship detection is pursued as a computer vision application task in its own right, and as a building block capability for the broader application task of scene graph generation. Scene graph generation, in turn, is commonly used as a precursor to a variety of enriched, downstream visual understanding and reasoning application tasks, such as image captioning, visual question answering, image retrieval, image generation and multimedia event processing.

The NeSy4VRD ontology, VRD-World, is a rich, well-aligned, companion OWL ontology engineered specifically for use with the NeSy4VRD dataset. It directly describes the domain of the NeSy4VRD dataset, as reflected in the NeSy4VRD visual relationship annotations. More specifically, all of the object classes that feature in the NeSy4VRD visual relationship annotations have corresponding classes within the VRD-World OWL class hierarchy, and all of the predicates that feature in the NeSy4VRD visual relationship annotations have corresponding properties within the VRD-World OWL object property hierarchy. The rich structure of the VRD-World class hierarchy and the rich characteristics and relationships of the VRD-World object properties together give the VRD-World OWL ontology rich inference semantics. These provide ample opportunity for OWL reasoning to be meaningfully exercised and exploited in NeSy research that uses OWL ontologies and OWL-based knowledge graphs as symbolic components. There is also ample potential for NeSy researchers to explore supplementing the OWL reasoning capabilities afforded by the VRD-World ontology with Datalog rules and reasoning.

Use of the NeSy4VRD ontology, VRD-World, in conjunction with the NeSy4VRD dataset is, of course, purely optional, however. Computer vision AI researchers who have no interest in NeSy, or NeSy researchers who have no interest in OWL ontologies and OWL-based knowledge graphs, can ignore the NeSy4VRD ontology and use the NeSy4VRD dataset by itself.

All computer vision-based AI research user groups can, if they wish, also avail themselves of the other components of the NeSy4VRD research resource available on GitHub.

NeSy4VRD on GitHub: open source infrastructure supporting extensibility, and sample code

The NeSy4VRD research resource incorporates additional components that are companions to the NeSy4VRD dataset package here on Zenodo. These companion components are available at NeSy4VRD on GitHub. These companion components consist of:

comprehensive open source Python-based infrastructure supporting the extensibility of the NeSy4VRD visual relationship annotations (and, thereby, the extensibility of the NeSy4VRD ontology, VRD-World, as well)

open source Python sample code showing how one can work with the NeSy4VRD visual relationship annotations in conjunction with the NeSy4VRD ontology, VRD-World, and RDF knowledge graphs.

The NeSy4VRD infrastructure supporting extensibility consists of:

open source Python code for conducting deep and comprehensive analyses of the NeSy4VRD dataset (the VRD images and their associated NeSy4VRD visual relationship annotations)

an open source, custom-designed NeSy4VRD protocol for specifying visual relationship annotation customisation instructions declaratively, in text files

an open source, custom-designed NeSy4VRD workflow, implemented using Python scripts and modules, for applying small or large volumes of customisations or extensions to the NeSy4VRD visual relationship annotations in a configurable, managed, automated and repeatable process.

The purpose behind providing comprehensive infrastructure to support extensibility of the NeSy4VRD visual relationship annotations is to make it easy for researchers to take the NeSy4VRD dataset in new directions, by further enriching the annotations, or by tailoring them to introduce new or more data conditions that better suit their particular research needs and interests. The option to use the NeSy4VRD extensibility infrastructure in this way applies equally well to each of the diverse potential NeSy4VRD user groups already mentioned.

The NeSy4VRD extensibility infrastructure, however, may be of particular interest to NeSy researchers interested in using the NeSy4VRD ontology, VRD-World, in conjunction with the NeSy4VRD dataset. These researchers can of course tailor the VRD-World ontology if they wish without needing to modify or extend the NeSy4VRD visual relationship annotations in any way. But their degrees of freedom for doing so will be limited by the need to maintain alignment with the NeSy4VRD visual relationship annotations and the particular set of object classes and predicates to which they refer. If NeSy researchers want full freedom to tailor the VRD-World ontology, they may well need to tailor the NeSy4VRD visual relationship annotations first, in order that alignment be maintained.

To illustrate our point, and to illustrate our vision of how the NeSy4VRD extensibility infrastructure can be used, let us consider a simple example. It is common in computer vision to distinguish between thing objects (that have well-defined shapes) and stuff objects (that are amorphous). Suppose a researcher wishes to have a greater number of stuff object classes with which to work. Water is such a stuff object. Many VRD images contain water but it is not currently one of the annotated object classes and hence is never referenced in any visual relationship annotations. So adding a Water class to the class hierarchy of the VRD-World ontology would be pointless because it would never acquire any instances (because an object detector would never detect any). However, our hypothetical researcher could choose to do the following:

use the analysis functionality of the NeSy4VRD extensibility infrastructure to find images containing water (by, say, searching for images whose visual relationships refer to object classes such as 'boat', 'surfboard', 'sand', 'umbrella', etc.);

use free image analysis software (such as GIMP, at gimp.org) to get bounding boxes for instances of water in these images;

use the NeSy4VRD protocol to specify new visual relationships for these images that refer to the new 'water' objects (e.g. <'boat', 'on', 'water'>);

use the NeSy4VRD workflow to introduce the new object class 'water' and to apply the specified new visual relationships to the sets of annotations for the affected images;

introduce class Water to the class hierarchy of the VRD-World ontology (using, say, the free Protege ontology editor);

continue experimenting, now with the added benefit of the additional stuff object class 'water';

contribute the enriched set of NeSy4VRD visual relationship
English Human-Human Chat Dataset for Conversational AI & NLP
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). English Human-Human Chat Dataset for Conversational AI & NLP [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/english-general-domain-conversation-text-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
The English General Domain Chat Dataset is a high-quality, text-based dataset designed to train and evaluate conversational AI, NLP models, and smart assistants in real-world English usage. Collected through FutureBeeAI’s trusted crowd community, this dataset reflects natural, native-level English conversations covering a broad spectrum of everyday topics.
Conversational Text Data
This dataset includes over 15,000 chat transcripts, each featuring free-flowing dialogue between two native English speakers. The conversations are spontaneous, context-rich, and mimic informal, real-life texting behavior.
•Words per Chat: 300–700
•Turns per Chat: Up to 50 dialogue turns
•Contributors: 200 native English speakers from the FutureBeeAI Crowd Community
•Format: TXT, DOCS, JSON or CSV (customizable)
•Structure: Each record contains the full chat, topic tag, and metadata block

Diversity and Domain Coverage
Conversations span a wide variety of general-domain topics to ensure comprehensive model exposure:
•Music, books, and movies
•Health and wellness
•Children and parenting
•Family life and relationships
•Food and cooking
•Education and studying
•Festivals and traditions
•Environment and daily life
•Internet and tech usage
•Childhood memories and casual chatting
This diversity ensures the dataset is useful across multiple NLP and language understanding applications.
Linguistic Authenticity
Chats reflect informal, native-level English usage with:
•Colloquial expressions and local dialect influence
•Domain-relevant terminology
•Language-specific grammar, phrasing, and sentence flow
•Inclusion of realistic details such as names, phone numbers, email addresses, locations, dates, times, local currencies, and culturally grounded references
•Representation of different writing styles and input quirks to ensure training data realism
Metadata
Every chat instance is accompanied by structured metadata, which includes:
•Participant Age
•Gender
•Country/Region
•Chat Domain
•Chat Topic
•Dialect
This metadata supports model filtering, demographic-specific evaluation, and more controlled fine-tuning workflows.
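To make the filtering use case concrete, the sketch below selects chats by metadata before training or evaluation. The record layout and key names (`metadata`, `participant_age`, `dialect`, and so on) are hypothetical, since the listing names the metadata fields but does not publish a schema, and the sample records are invented.

```python
from typing import Dict, List, Optional


def filter_chats(records: List[Dict],
                 dialect: Optional[str] = None,
                 min_age: Optional[int] = None,
                 max_age: Optional[int] = None) -> List[Dict]:
    """Keep only records whose metadata matches every criterion that is set."""
    selected = []
    for record in records:
        meta = record["metadata"]  # hypothetical key for the metadata block
        if dialect is not None and meta["dialect"] != dialect:
            continue
        if min_age is not None and meta["participant_age"] < min_age:
            continue
        if max_age is not None and meta["participant_age"] > max_age:
            continue
        selected.append(record)
    return selected


# Invented sample records using assumed field names.
records = [
    {"chat_id": "c1", "metadata": {"participant_age": 24, "gender": "female",
                                   "dialect": "US English", "chat_topic": "Food and cooking"}},
    {"chat_id": "c2", "metadata": {"participant_age": 51, "gender": "male",
                                   "dialect": "UK English", "chat_topic": "Music, books, and movies"}},
    {"chat_id": "c3", "metadata": {"participant_age": 37, "gender": "male",
                                   "dialect": "US English", "chat_topic": "Health and wellness"}},
]
us_adults_under_40 = filter_chats(records, dialect="US English", max_age=40)
print([record["chat_id"] for record in us_adults_under_40])
```

The same pattern extends to the other listed fields, such as country or chat domain, for demographic-specific evaluation splits.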
Data Quality Assurance
All chat records pass through a rigorous QA process to maintain consistency and accuracy:
•Manual review for content completeness
•Format checks for chat turns and metadata
•Linguistic verification by native speakers
•Removal of inappropriate or unusable samples
This ensures a clean, reliable dataset ready for high-performance AI model training.
Applications
This dataset is ideal for training and evaluating a wide range of text-based AI systems:
•Conversational AI / Chatbots
•Smart assistants and voicebots
Success.ai | | US Premium B2B Emails & Phone Numbers Dataset - APIs and flat...
datarade.ai
Updated Oct 12, 2024
Cite
Success.ai (2024). Success.ai | | US Premium B2B Emails & Phone Numbers Dataset - APIs and flat files available – 170M+, Verified Profiles - Best Price Guarantee [Dataset]. https://datarade.ai/data-products/success-ai-us-premium-b2b-emails-phone-numbers-dataset-success-ai
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Oct 12, 2024
Dataset provided by
Area covered
United States
Description
Success.ai offers a comprehensive, enterprise-ready B2B leads data solution, ideal for businesses seeking access to over 150 million verified employee profiles and 170 million work emails. Our data empowers organizations across industries to target key decision-makers, optimize recruitment, and fuel B2B marketing efforts. Whether you're looking for UK B2B data, B2B marketing data, or global B2B contact data, Success.ai provides the insights you need with pinpoint accuracy.

Tailored for B2B Sales, Marketing, Recruitment and more: Our B2B contact data and B2B email data solutions are designed to enhance your lead generation, sales, and recruitment efforts. Build hyper-targeted lists based on job title, industry, seniority, and geographic location. Whether you’re reaching mid-level professionals or C-suite executives, Success.ai delivers the data you need to connect with the right people.

API Features:

Real-Time Updates: Our APIs deliver real-time updates, ensuring that the contact data your business relies on is always current and accurate.

High Volume Handling: Designed to support up to 860k API calls per day, our system is built for scalability and responsiveness, catering to enterprises of all sizes.

Flexible Integration: Easily integrate with CRM systems, marketing automation tools, and other enterprise applications to streamline your workflows and enhance productivity.

Key Categories Served: B2B sales leads – Identify decision-makers in key industries, B2B marketing data – Target professionals for your marketing campaigns, Recruitment data – Source top talent efficiently and reduce hiring times, CRM enrichment – Update and enhance your CRM with verified, updated data, Global reach – Coverage across 195 countries, including the United States, United Kingdom, Germany, India, Singapore, and more.

Global Coverage with Real-Time Accuracy: Success.ai’s dataset spans a wide range of industries such as technology, finance, healthcare, and manufacturing. With continuous real-time updates, your team can rely on the most accurate data available: 150M+ Employee Profiles: Access professional profiles worldwide with insights including full name, job title, seniority, and industry. 170M Verified Work Emails: Reach decision-makers directly with verified work emails, available across industries and geographies, including Singapore and UK B2B data. GDPR-Compliant: Our data is fully compliant with GDPR and other global privacy regulations, ensuring safe and legal use of B2B marketing data.

Key Data Points for Every Employee Profile: Every profile in Success.ai’s database includes over 20 critical data points, providing the information needed to power B2B sales and marketing campaigns: Full Name, Job Title, Company, Work Email, Location, Phone Number, LinkedIn Profile, Experience, Education, Technographic Data, Languages, Certifications, Industry, Publications & Awards.

Use Cases Across Industries: Success.ai’s B2B data solution is incredibly versatile and can support various enterprise use cases, including: B2B Marketing Campaigns: Reach high-value professionals in industries such as technology, finance, and healthcare. Enterprise Sales Outreach: Build targeted B2B contact lists to improve sales efforts and increase conversions. Talent Acquisition: Accelerate hiring by sourcing top talent with accurate and updated employee data, filtered by job title, industry, and location. Market Research: Gain insights into employment trends and company profiles to enrich market research. CRM Data Enrichment: Ensure your CRM stays accurate by integrating updated B2B contact data. Event Targeting: Create lists for webinars, conferences, and product launches by targeting professionals in key industries.

Use Cases for Success.ai's Contact Data:
- Targeted B2B Marketing: Create precise campaigns by targeting key professionals in industries like tech and finance.
- Sales Outreach: Build focused sales lists of decision-makers and C-suite executives for faster deal cycles.
- Recruiting Top Talent: Easily find and hire qualified professionals with updated employee profiles.
- CRM Enrichment: Keep your CRM current with verified, accurate employee data.
- Event Targeting: Create attendee lists for events by targeting relevant professionals in key sectors.
- Market Research: Gain insights into employment trends and company profiles for better business decisions.
- Executive Search: Source senior executives and leaders for headhunting and recruitment.
- Partnership Building: Find the right companies and key people to develop strategic partnerships.

Why Choose Success.ai’s Employee Data? Success.ai is the top choice for enterprises looking for comprehensive and affordable B2B data solutions. Here’s why: Unmatched Accuracy: Our AI-powered validation process ensures 99% accuracy across all data points, resulting in higher engagement and fewer bounces. Global Scale: With 150M+ employee profiles and 170M veri...
Description of multimodal dataset.
plos.figshare.com
xls
Updated Nov 16, 2023
Cite
Sobhana Jahan; Kazi Abu Taher; M. Shamim Kaiser; Mufti Mahmud; Md. Sazzadur Rahman; A. S. M. Sanwar Hosen; In-Ho Ra (2023). Description of multimodal dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0294253.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0294253.t001
Dataset updated
Nov 16, 2023
Dataset provided by
PLOS ONE
Authors
Sobhana Jahan; Kazi Abu Taher; M. Shamim Kaiser; Mufti Mahmud; Md. Sazzadur Rahman; A. S. M. Sanwar Hosen; In-Ho Ra
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background
According to the World Health Organization (WHO), dementia is the seventh leading cause of death among all illnesses and one of the leading causes of disability among the world’s elderly people. The number of Alzheimer’s patients is rising day by day. Given this increase and the dangers involved, Alzheimer’s disease should be diagnosed carefully. Machine learning is a promising technique for Alzheimer’s diagnosis, but general users do not trust machine learning models because of their black-box nature. Moreover, some of those models do not provide the best performance because they use only neuroimaging data.
Objective
To solve these issues, this paper proposes a novel explainable Alzheimer’s disease prediction model using a multimodal dataset. This approach performs a data-level fusion using clinical data, MRI segmentation data, and psychological data. However, there is currently very little understanding of multimodal five-class classification of Alzheimer’s disease.
Method
For the five-class prediction, nine popular machine learning models are used: Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), Multi-Layer Perceptron (MLP), K-Nearest Neighbor (KNN), Gradient Boosting (GB), Adaptive Boosting (AdaB), Support Vector Machine (SVM), and Naive Bayes (NB). Among these models, RF scored highest. For explainability, SHapley Additive exPlanation (SHAP) is used in this research work.
Results and conclusions
The performance evaluation demonstrates that the RF classifier has a 10-fold cross-validation accuracy of 98.81% for predicting Alzheimer’s disease, cognitively normal, non-Alzheimer’s dementia, uncertain dementia, and others. In addition, the study utilized Explainable Artificial Intelligence based on the SHAP model and analyzed the causes of prediction. To the best of our knowledge, we are the first to present this multimodal (Clinical, Psychological, and MRI segmentation data) five-class classification of Alzheimer’s disease using the Open Access Series of Imaging Studies (OASIS-3) dataset. In addition, a novel Alzheimer’s patient management architecture is proposed in this work.
Mandarin Call Center Data for Travel AI
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Mandarin Call Center Data for Travel AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/travel-call-center-conversation-mandarin-china
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
This Mandarin Chinese Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for Mandarin-speaking travelers.
Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.
Speech Data
The dataset includes 30 hours of dual-channel audio recordings between native Mandarin Chinese speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.
•Participant Diversity:
•Speakers: 60 native Mandarin Chinese contributors from our verified pool.
•Regions: Covering multiple Chinese provinces to capture accent and dialectal variation.
•Participant Profile: Balanced representation of age (18–70) and gender (60% male, 40% female).
•Recording Details:
•Conversation Nature: Naturally flowing, spontaneous customer-agent calls.
•Call Duration: Between 5 and 15 minutes per session.
•Audio Format: Stereo WAV, 16-bit depth, at 8kHz and 16kHz.
•Recording Environment: Captured in controlled, noise-free, echo-free settings.

Topic Diversity
Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).
•Inbound Calls:
•Booking Assistance
•Destination Information
•Flight Delays or Cancellations
•Support for Disabled Passengers
•Health and Safety Travel Inquiries
•Lost or Delayed Luggage, and more
•Outbound Calls:
•Promotional Travel Offers
•Customer Feedback Surveys
•Booking Confirmations
•Flight Rescheduling Alerts
•Visa Expiry Notifications, and others
These scenarios help models understand and respond to diverse traveler needs in real-time.
Transcription
Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•Time-Stamped Segments
•Non-speech Markers (e.g., pauses, coughs)
•Dual-layered transcription review ensures high accuracy, with a word error rate under 5%.
Metadata
Extensive metadata enriches each call and speaker for better filtering and AI training:
•Participant Metadata: ID, age, gender, region, accent, and dialect.
•Conversation Metadata: Topic, domain, call type, sentiment, and audio specs.

Usage and Applications
This dataset is ideal for a variety of AI use cases in the travel and tourism space:
•ASR Systems: Train Mandarin speech-to-text engines for travel platforms.

‘World Happiness Report 2019’ analyzed by Analyst-2
analyst-2.ai
Updated Nov 20, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘World Happiness Report 2019’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-world-happiness-report-2019-f29c/e8e08550/?iid=004-258&v=presentation
Dataset updated
Nov 20, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘World Happiness Report 2019’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/PromptCloudHQ/world-happiness-report-2019 on 30 September 2021.

--- Dataset description provided by original source is as follows ---

The data has been released by SDSN and extracted by PromptCloud's custom web crawling solution.

Context

The World Happiness Report is a landmark survey of the state of global happiness that ranks 156 countries by how happy their citizens perceive themselves to be. This year’s World Happiness Report focuses on happiness and the community: how happiness has evolved over the past dozen years, with a focus on the technologies, social norms, conflicts and government policies that have driven those changes.

Content

What is Dystopia?

Dystopia is an imaginary country that has the world’s least-happy people. The purpose in establishing Dystopia is to have a benchmark against which all countries can be favorably compared (no country performs more poorly than Dystopia) in terms of each of the six key variables, thus allowing each sub-bar to be of positive (or zero, in six instances) width. The lowest scores observed for the six key variables, therefore, characterize Dystopia. Since life would be very unpleasant in a country with the world’s lowest incomes, lowest life expectancy, lowest generosity, most corruption, least freedom, and least social support, it is referred to as “Dystopia,” in contrast to Utopia.

What are the residuals?

The residuals, or unexplained components, differ for each country, reflecting the extent to which the six variables either over- or under-explain average 2016-2018 life evaluations. These residuals have an average value of approximately zero over the whole set of countries. Figure 2.7 shows the average residual for each country if the equation in Table 2.1 is applied to average 2016-2018 data for the six variables in that country. We combine these residuals with the estimate for life evaluations in Dystopia so that the combined bar will always have positive values. As can be seen in Figure 2.7, although some life evaluation residuals are quite large, occasionally exceeding one point on the scale from 0 to 10, they are always much smaller than the calculated value in Dystopia, where the average life is rated at 1.88 on the 0 to 10 scale. Table 7 of the online Statistical Appendix 1 for Chapter 2 puts the Dystopia plus residual block at the left side, and also draws the Dystopia line, making it easy to compare the signs and sizes of the residuals in different countries.
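The bookkeeping behind the "Dystopia plus residual" block can be sketched in a few lines of Python. This is an illustrative reading of the paragraph above, not the report's own code: the life evaluation and the six explained contributions are made-up numbers, and only the 1.88 Dystopia value comes from the text.

```python
# Average life evaluation in Dystopia, as quoted in the description above.
DYSTOPIA = 1.88


def dystopia_plus_residual(life_evaluation, explained):
    """Return (residual, combined_bar) for one country.

    `explained` maps each of the six variables to its contribution
    above the Dystopia baseline (hypothetical values in this sketch).
    """
    predicted = DYSTOPIA + sum(explained.values())
    residual = life_evaluation - predicted
    return residual, DYSTOPIA + residual


# Made-up country: none of these figures come from the report.
explained = {
    "gdp_per_capita": 1.2,
    "social_support": 1.3,
    "healthy_life_expectancy": 0.8,
    "freedom": 0.5,
    "generosity": 0.2,
    "corruption": 0.1,
}
residual, combined = dystopia_plus_residual(6.0, explained)
```

Because the residual is small relative to 1.88, the combined bar stays positive even when the residual itself is negative, which is the property the description relies on.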

Why do we use these six factors to explain life evaluations?

The variables used reflect what has been broadly found in the research literature to be important in explaining national-level differences in life evaluations. Some important variables, such as unemployment or inequality, do not appear because comparable international data are not yet available for the full sample of countries. The variables are intended to illustrate important lines of correlation rather than to reflect clean causal estimates, since some of the data are drawn from the same survey sources, some are correlated with each other (or with other important factors for which we do not have measures), and in several instances there are likely to be two-way relations between life evaluations and the chosen variables (for example, healthy people are overall happier, but as Chapter 4 in the World Happiness Report 2013 demonstrated, happier people are overall healthier). In Statistical Appendix 1 of World Happiness Report 2018, we assessed the possible importance of using explanatory data from the same people whose life evaluations are being explained. We did this by randomly dividing the samples into two groups, and using the average values for, e.g., freedom gleaned from one group to explain the life evaluations of the other group. This lowered the effects, but only very slightly (e.g. 2% to 3%), assuring us that using data from the same individuals is not seriously affecting the results.

Data source: http://worldhappiness.report/ed/2019/

More such datasets can be downloaded from DataStock.

--- Original source retains full ownership of the source dataset ---
South Asian Facial Images Dataset | Selfie & ID Card Images
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). South Asian Facial Images Dataset | Selfie & ID Card Images [Dataset]. https://www.futurebeeai.com/dataset/image-dataset/facial-images-selfie-id-south-asian
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
South Asia
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the South Asian Human Facial Images Dataset, curated to advance facial recognition technology and support the development of secure biometric identity systems, KYC verification processes, and AI-driven computer vision applications. This dataset is designed to serve as a robust foundation for real-world face matching and recognition use cases.
Facial Image Data
The dataset contains over 8,000 facial image sets of South Asian individuals. Each set includes:
•Selfie Images: 5 high-quality selfie images taken under different conditions
•ID Card Images: 2 clear facial images extracted from different government-issued ID cards
Diversity & Representation
•Geographic Diversity: Participants represent South Asian countries including India, Pakistan, Bangladesh, Nepal, Sri Lanka, Bhutan, Maldives, and more
•Demographics: Individuals aged 18 to 70 years with a 60:40 male-to-female ratio
•File Formats: Images are provided in JPEG and HEIC formats for compatibility and quality retention
Image Quality & Capture Conditions
All images were captured with real-world variability to enhance dataset robustness:
•Lighting: Captured under diverse lighting setups to simulate real environments
•Backgrounds: A wide variety of indoor and outdoor backgrounds
•Device Quality: Captured using modern smartphones to ensure high resolution and clarity
Metadata
Each participant’s data is accompanied by rich metadata to support AI model training, including:
•Unique participant ID
•Image file names
•Age at the time of capture
•Gender
•Country of origin
•Demographic details
•File format information
This metadata enables targeted filtering and training across diverse scenarios.
Use Cases & Applications
This dataset is ideal for a wide range of AI and biometric applications:
•Facial Recognition: Train accurate and generalizable face matching models
•KYC & Identity Verification: Enhance onboarding and compliance systems in fintech and government services
•Biometric Identification: Build secure facial recognition systems for access control and identity authentication
•Age Prediction: Train models to estimate age from facial features
•Generative AI: Provide reference data for synthetic face generation or augmentation tasks
Secure & Ethical Collection
•Data Security: All images were securely stored and processed on FutureBeeAI’s proprietary platform
•Ethical Compliance: Data collection was conducted in full alignment with privacy laws and ethical standards
•Informed Consent: Every participant provided written consent, with full awareness of the intended uses of the data
Dataset Updates & Customization
To meet evolving AI demands, this dataset is regularly updated and can be customized. Available options include:
📚 Student Performance Dataset 📚
kaggle.com
Updated Mar 2, 2025
Cite
Waqar Ali (2025). 📚 Student Performance Dataset 📚 [Dataset]. https://www.kaggle.com/datasets/waqi786/student-performance-dataset
Explore at:
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 2, 2025
Dataset provided by
Kaggle http://kaggle.com/
Authors
Waqar Ali
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Understanding student performance is key to improving education systems and learning outcomes. This synthetic dataset is designed to simulate real-world academic data, enabling researchers, educators, and data scientists to analyze factors influencing student achievement in a structured and ethical manner.

With AI-generated records, this dataset provides insights into how demographic attributes, academic performance, and attendance patterns interact to shape student success.

🔍 Key Features:
✔️ Demographics & Grade Levels – Understand how age, gender, and grade level influence academic outcomes
✔️ Subject-Specific Performance – Modeled Math, Reading, and Writing scores for detailed analysis
✔️ Attendance Records – Explore the correlation between school presence and academic success
✔️ Comprehensive Student Data – Synthetic records designed for educational research and machine learning applications

📊 Dataset Overview: This dataset has been synthetically generated and does not contain real-world data. It is intended for educational purposes, machine learning practice, and exploratory data analysis related to student performance.

📖 Columns Description:
Student_ID – Unique identifier for each synthetic student
Gender – Simulated gender representation
Age – Modeled student age
Grade_Level – Academic level of the student
Math_Score, Reading_Score, Writing_Score – Simulated subject-wise scores
Attendance – Modeled school attendance record

⚠️ Disclaimer: This dataset is completely synthetic and should not be used for real-world educational policy-making, student assessments, or institutional reporting. It serves as a safe, ethical resource for learning, research, and model development.
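The columns described above lend themselves to a quick exploratory check, such as the relationship between attendance and average score. The sketch below is a minimal, standard-library-only illustration: the four rows are invented, and treating Attendance as a numeric percentage is an assumption, since the listing does not state its format.

```python
import csv
import io
from statistics import mean

# Invented rows that reuse the column names from the description.
# The values and the numeric Attendance scale are assumptions.
SAMPLE_CSV = """Student_ID,Gender,Age,Grade_Level,Math_Score,Reading_Score,Writing_Score,Attendance
S001,F,15,10,80,70,75,90
S002,M,16,11,60,65,55,70
S003,F,15,10,90,85,95,100
S004,M,17,12,50,60,55,60
"""

SCORE_COLUMNS = ("Math_Score", "Reading_Score", "Writing_Score")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
attendance = [float(row["Attendance"]) for row in rows]
avg_score = [mean(float(row[col]) for col in SCORE_COLUMNS) for row in rows]
corr = pearson(attendance, avg_score)
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened CSV file handle; any correlation found describes the synthetic generator rather than real students.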

🔹 Use this dataset to explore student performance trends, build predictive models, and gain insights into educational success factors! 🎯📊
Hindi Agent-Customer Chat Dataset for Healthcare Domain
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). Hindi Agent-Customer Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/hindi-healthcare-domain-conversation-text-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
The Hindi Healthcare Chat Dataset is a rich collection of over 12,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in Hindi-speaking regions.
Participant & Chat Overview
•Participants: 200+ native Hindi speakers from the FutureBeeAI Crowd Community
•Conversation Length: 300–700 words per chat
•Turns per Chat: 50–150 dialogue turns across both participants
•Chat Types: Inbound and outbound
•Sentiment Coverage: Positive, neutral, and negative outcomes included
Topic Diversity
The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:
•Inbound Chats (Customer-Initiated): Appointment scheduling, new patient registration, surgery and treatment consultations, diet and lifestyle discussions, insurance claim inquiries, lab result follow-ups
•Outbound Chats (Agent-Initiated): Appointment reminders and confirmations, health and wellness program offers, test result notifications, preventive care and vaccination reminders, subscription renewals, risk assessment and eligibility follow-ups
This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.
Language Diversity & Realism
This dataset reflects the natural flow of Hindi healthcare communication and includes:
•Authentic Naming Patterns: Hindi personal names, clinic names, and brands
•Localized Contact Elements: Addresses, emails, phone numbers, and clinic locations in regional Hindi formats
•Time & Currency References: Use of dates, times, numeric expressions, and currency units aligned with Hindi-speaking regions
•Colloquial & Medical Expressions: Local slang, informal speech, and common healthcare-related terminology
These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.
Conversational Flow & Structure
Conversations range from simple inquiries to complex advisory sessions, including:
•General inquiries
•Detailed problem-solving
•Routine status updates
•Treatment recommendations
•Support and feedback interactions
Each conversation typically includes these structural components:
•Greetings and verification
•Information gathering
•Problem definition
•Solution delivery
•Closing messages
•Follow-up and feedback (where applicable)
This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.
Data Format & Structure
Available in JSON, CSV, and TXT formats, each conversation includes:
•Full message history with clear speaker labels
•Participant identifiers
•Metadata (e.g., topic tags, region, sentiment)
•Compatibility with common NLP and ML pipelines
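To make the described structure concrete, the following Python sketch summarizes one chat record in JSON form. The key names (`conversation_id`, `metadata`, `messages`, `speaker`, `text`) and the two sample messages are hypothetical; the listing names the kinds of fields included but not the actual schema.

```python
import json

# Hypothetical record layout: key names are assumptions, not the vendor's schema.
SAMPLE = """
{
  "conversation_id": "c-001",
  "metadata": {"topic": "appointment_scheduling", "region": "IN", "sentiment": "positive"},
  "messages": [
    {"speaker": "customer", "text": "नमस्ते, मुझे अपॉइंटमेंट बुक करना है।"},
    {"speaker": "agent", "text": "ज़रूर, कृपया अपना नाम बताइए।"}
  ]
}
"""


def summarize(chat):
    """Count turns and whitespace-separated words, and surface the sentiment tag."""
    messages = chat["messages"]
    return {
        "turns": len(messages),
        "words": sum(len(msg["text"].split()) for msg in messages),
        "speakers": sorted({msg["speaker"] for msg in messages}),
        "sentiment": chat["metadata"]["sentiment"],
    }


summary = summarize(json.loads(SAMPLE))
```

A summary like this could be used to check records against the stated ranges (300–700 words and 50–150 turns per chat) or to filter by sentiment before training.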
Applications