100+ datasets found
  1. Global impact of AI and big-data analytics on jobs 2023-2027

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). Global impact of AI and big-data analytics on jobs 2023-2027 [Dataset]. https://www.statista.com/statistics/1383919/ai-bigdata-impact-jobs/
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 2022 - Feb 2023
    Area covered
    Worldwide
    Description

    Between 2023 and 2027, the majority of companies surveyed worldwide expect big data to have a more positive than negative impact on the global job market and employment, with ** percent of the companies reporting the technology will create jobs and * percent expecting the technology to displace jobs. Meanwhile, artificial intelligence (AI) is expected to result in more significant labor market disruptions, with ** percent of organizations expecting the technology to displace jobs and ** percent expecting AI to create jobs.

  2. AI Training Dataset In Healthcare Market Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Jun 20, 2025
    Cite
    Archive Market Research (2025). AI Training Dataset In Healthcare Market Report [Dataset]. https://www.archivemarketresearch.com/reports/ai-training-dataset-in-healthcare-market-5352
    Available download formats: pdf, ppt, doc
    Dataset updated
    Jun 20, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    global
    Variables measured
    Market Size
    Description

    The AI Training Dataset In Healthcare Market was valued at USD 341.8 million in 2023 and is projected to reach USD 1464.13 million by 2032, exhibiting a CAGR of 23.1% during the forecast period. The growth is attributed to the rising adoption of AI in healthcare, increasing demand for accurate and reliable training datasets, government initiatives to promote AI in healthcare, and technological advances in data collection and annotation. Healthcare AI training datasets are vital for building effective algorithms and for improving patient care and diagnosis. They include large volumes of thoroughly labeled electronic health records, images such as X-ray and MRI scans, and genomics data. They help AI systems identify trends, make forecasts, and support the development of new approaches to treating disease. Patient privacy and the ethical use of patient information are of the utmost importance, which requires rigorous anonymization and compliance with laws such as HIPAA. Continued expansion and diversification of datasets are crucial for addressing existing bias and improving the effectiveness of AI across different populations and diseases, in support of safer health solutions worldwide.
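    The figures quoted in this description can be related through the usual compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below is only an illustration: the function name `implied_cagr` is hypothetical, and the number of compounding years is an assumption, because the listing does not state which base year the publisher used. Under that caveat, the quoted 23.1% is reproduced with seven compounding periods, while the nine calendar years from 2023 to 2032 would imply a lower rate.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate implied by two values.

    `years` is the number of compounding periods. It is an assumption
    supplied by the caller; the listing does not state the base year.
    """
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Values quoted in the description (USD million).
START_2023 = 341.8
END_2032 = 1464.13

for years in (7, 9):
    rate = implied_cagr(START_2023, END_2032, years)
    print(f"{years} compounding years -> {rate:.1%}")
```

    Printing both cases side by side makes it easy to see how sensitive a quoted CAGR is to the choice of base year.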

  3. Data from: MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022...

    • data.mendeley.com
    Updated Jul 25, 2022
    + more versions
    Cite
    Nirmalya Thakur (2022). MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak [Dataset]. http://doi.org/10.17632/xmcg82mx9k.3
    Dataset updated
    Jul 25, 2022
    Authors
    Nirmalya Thakur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: The first public Twitter dataset on the 2022 MonkeyPox outbreak,” Preprints, 2022, DOI: 10.20944/preprints202206.0172.v2

    Abstract: The world is currently facing an outbreak of the monkeypox virus, and confirmed cases have been reported from 28 countries. Following a recent “emergency meeting”, the World Health Organization just declared monkeypox a global health emergency. As a result, people from all over the world are using social media platforms, such as Twitter, for information seeking and sharing related to the outbreak, as well as for familiarizing themselves with the guidelines and protocols that are being recommended by various policy-making bodies to reduce the spread of the virus. This is generating tremendous amounts of Big Data related to such paradigms of social media behavior. Mining this Big Data and compiling it in the form of a dataset can serve a wide range of use-cases and applications, such as analysis of public opinions, interests, views, perspectives, attitudes, and sentiment towards this outbreak. Therefore, this work presents MonkeyPox2022Tweets, an open-access dataset of Tweets related to the 2022 monkeypox outbreak that were posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.

    Data Description: The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 23rd July 2022 (the most recent date at the time of dataset upload). The Tweet IDs are presented in 6 different .txt files based on the timelines of the associated tweets. The following provides the details of these dataset files.

    • Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the Tweet IDs: May 7, 2022 to May 21, 2022)
    • Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the Tweet IDs: May 21, 2022 to May 27, 2022)
    • Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the Tweet IDs: May 27, 2022 to June 5, 2022)
    • Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the Tweet IDs: June 5, 2022 to June 11, 2022)
    • Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 47718, Date Range of the Tweet IDs: June 12, 2022 to June 30, 2022)
    • Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the Tweet IDs: July 1, 2022 to July 23, 2022)

    The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used.
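    Before hydrating, it can help to load the ID files, check them against the counts quoted above, and split the IDs into request-sized batches. The sketch below makes several assumptions that the description does not confirm: that each .txt file holds one Tweet ID per line, and that a batch size of 100 suits whichever hydration tool is used. The helper names are hypothetical, and no hydration call is shown because the description does not name a tool or API.

```python
from pathlib import Path
from typing import Iterator

# Per-file Tweet ID counts, copied from the dataset description.
EXPECTED_COUNTS = {
    "TweetIDs_Part1.txt": 13926,
    "TweetIDs_Part2.txt": 17705,
    "TweetIDs_Part3.txt": 17585,
    "TweetIDs_Part4.txt": 19718,
    "TweetIDs_Part5.txt": 47718,
    "TweetIDs_Part6.txt": 138711,
}


def read_tweet_ids(path: Path) -> list[str]:
    """Read Tweet IDs, assuming one ID per line (file layout is an assumption)."""
    with path.open(encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def batched(ids: list[str], size: int = 100) -> Iterator[list[str]]:
    """Yield consecutive batches of IDs; the default size of 100 is an assumption."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def check_counts(folder: Path) -> dict[str, bool]:
    """Compare each file that exists in `folder` with its documented count."""
    return {
        name: len(read_tweet_ids(folder / name)) == expected
        for name, expected in EXPECTED_COUNTS.items()
        if (folder / name).exists()
    }


# The per-file counts add up to the total quoted in the description.
assert sum(EXPECTED_COUNTS.values()) == 255_363
```

    Keeping IDs as strings avoids any accidental numeric conversion of the long identifiers when they are written back out or passed to another tool.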

  4. values-in-the-wild

    • huggingface.co
    Updated Apr 21, 2025
    Cite
    Anthropic (2025). values-in-the-wild [Dataset]. https://huggingface.co/datasets/Anthropic/values-in-the-wild
    Dataset updated
    Apr 21, 2025
    Dataset authored and provided by
    Anthropic (https://anthropic.com/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary

    This dataset presents a comprehensive taxonomy of 3307 values expressed by Claude (an AI assistant) across hundreds of thousands of real-world conversations. Using a novel privacy-preserving methodology, these values were extracted and classified without human reviewers accessing any conversation content. The dataset reveals patterns in how AI systems express values "in the wild" when interacting with diverse users and tasks. We're releasing this resource to advance research… See the full description on the dataset page: https://huggingface.co/datasets/Anthropic/values-in-the-wild.

  5. Commercial Real Estate Data | Global Real Estate Professionals | Work...

    • datarade.ai
    Updated Oct 27, 2021
    Cite
    Success.ai (2021). Commercial Real Estate Data | Global Real Estate Professionals | Work Emails, Phone Numbers & Verified Profiles | Best Price Guaranteed [Dataset]. https://datarade.ai/data-products/commercial-real-estate-data-global-real-estate-professional-success-ai
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Oct 27, 2021
    Area covered
    Burkina Faso, Sierra Leone, Guatemala, El Salvador, Korea (Republic of), Hong Kong, Comoros, Netherlands, Marshall Islands, Bolivia (Plurinational State of)
    Description

    Success.ai’s Commercial Real Estate Data and B2B Contact Data for Global Real Estate Professionals is a comprehensive dataset designed to connect businesses with industry leaders in real estate worldwide. With over 170M verified profiles, including work emails and direct phone numbers, this solution ensures precise outreach to agents, brokers, property developers, and key decision-makers in the real estate sector.

    Utilizing advanced AI-driven validation, our data is continuously updated to maintain 99% accuracy, offering actionable insights that empower targeted marketing, streamlined sales strategies, and efficient recruitment efforts. Whether you’re engaging with top real estate executives or sourcing local property experts, Success.ai provides reliable and compliant data tailored to your needs.

    Key Features of Success.ai’s Real Estate Professional Contact Data

    • Comprehensive Industry Coverage: Gain direct access to verified profiles of real estate professionals across the globe, including:
    1. Real Estate Agents: Professionals facilitating property sales and purchases.
    2. Brokers: Key intermediaries managing transactions between buyers and sellers.
    3. Property Developers: Decision-makers shaping residential, commercial, and industrial projects.
    4. Real Estate Executives: Leaders overseeing multi-regional operations and business strategies.
    5. Architects & Consultants: Experts driving design and project feasibility.
    • Verified and Continuously Updated Data
    AI-Powered Validation: All profiles are verified using cutting-edge AI to ensure up-to-date accuracy.
    Real-Time Updates: Our database is refreshed continuously to reflect the most current information.
    Global Compliance: Fully aligned with GDPR, CCPA, and other regional regulations for ethical data use.

    • Customizable Data Delivery: Tailor your data access to align with your operational goals:
    API Integration: Directly integrate data into your CRM or project management systems for seamless workflows.
    Custom Flat Files: Receive detailed datasets customized to your specifications, ready for immediate application.

    Why Choose Success.ai for Real Estate Contact Data?

    • Best Price Guarantee: Enjoy competitive pricing that delivers exceptional value for verified, comprehensive contact data.

    • Precision Targeting for Real Estate Professionals: Our dataset equips you to connect directly with real estate decision-makers, minimizing misdirected efforts and improving ROI.

    • Strategic Use Cases
    Lead Generation: Target qualified real estate agents and brokers to expand your network.
    Sales Outreach: Engage with property developers and executives to close high-value deals.
    Marketing Campaigns: Drive targeted campaigns tailored to real estate markets and demographics.
    Recruitment: Identify and attract top talent in real estate for your growing team.
    Market Research: Access firmographic and demographic data for in-depth industry analysis.

    • Data Highlights
    170M+ Verified Professional Profiles
    50M Work Emails
    30M Company Profiles
    700M Global Professional Profiles

    • Powerful APIs for Enhanced Functionality
    Enrichment API: Ensure your contact database remains relevant and up-to-date with real-time enrichment. Ideal for businesses seeking to maintain competitive agility in dynamic markets.
    Lead Generation API: Boost your lead generation with verified contact details for real estate professionals, supporting up to 860,000 API calls per day for robust scalability.

    • Use Cases for Real Estate Contact Data
    1. Targeted Outreach for New Projects: Connect with property developers and brokers to pitch your services or collaborate on upcoming projects.
    2. Real Estate Marketing Campaigns: Execute personalized marketing campaigns targeting agents and clients in residential, commercial, or industrial sectors.
    3. Enhanced Sales Strategies: Shorten sales cycles by directly engaging with decision-makers and key stakeholders.
    4. Recruitment and Talent Acquisition: Access profiles of highly skilled professionals to strengthen your real estate team.
    5. Market Analysis and Intelligence: Leverage firmographic and demographic insights to identify trends and optimize business strategies.

    • What Makes Us Stand Out?
    Unmatched Data Accuracy: Our AI-driven validation ensures 99% accuracy for all contact details.
    Comprehensive Global Reach: Covering professionals across diverse real estate markets worldwide.
    Flexible Delivery Options: Access data in formats that seamlessly fit your existing systems.
    Ethical and Compliant Data Practices: Adherence to global standards for secure and responsible data use.

    Success.ai’s B2B Contact Data for Global Real Estate Professionals delivers the tools you need to connect with the right people at the right time, driving efficiency and success in your business operations. From agents and brokers to property developers and executiv...

  6. Global Public Opinion on Artificial Intelligence (GPO-AI)

    • borealisdata.ca
    Updated Mar 5, 2025
    Cite
    Blake Lee-Whiting; Peter John Loewen; Thomas Bergeron (2025). Global Public Opinion on Artificial Intelligence (GPO-AI) [Dataset]. http://doi.org/10.5683/SP3/WCUN0S
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 5, 2025
    Dataset provided by
    Borealis
    Authors
    Blake Lee-Whiting; Peter John Loewen; Thomas Bergeron
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    In October and November 2023, researchers at the Schwartz Reisman Institute for Technology and Society and the Policy, Elections and Representation Lab at the Munk School of Global Affairs and Public Policy at the University of Toronto completed a survey on public perceptions of and attitudes toward AI. The survey was administered to over 1,000 people in each of 21 countries, for a total of 23,882 surveys conducted in 12 languages. The combined populations of the countries sampled represent a majority of the world's population.

    Countries: Argentina, Australia, Brazil, Canada, Chile, China, France, Germany, India, Indonesia, Italy, Japan, Kenya, Mexico, Pakistan, Poland, Portugal, South Africa, Spain, United Kingdom, United States of America.

    Languages: Chinese (Simplified), English, French, German, Indonesian, Italian, Japanese, Polish, Portuguese (Portugal), Portuguese (Brazil), Spanish (Spain), Spanish (Latin America).

    The survey explored general knowledge of and attitudes toward AI. Topics included concerns about AI, safety, regulation, autonomous vehicles and AI's effect on jobs now and in the future. Participants were asked whether they are interested in or trust applications of AI for clothes, travel, grocery shopping, dating or finance. Respondents were asked about their attitudes toward the use of emerging technologies in education, the justice system, health care and immigration. Respondents were also asked about their knowledge of and experience with ChatGPT and deepfakes.

  7. Future of Labour (June 2023) - Dataset - B2FIND

    • b2find.eudat.eu
    Cite
    Future of Labour (June 2023) - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/c936a262-64b1-5ba2-8e6e-682b4bef595c
    Description

    The study on the future of work was conducted by Kantar Public on behalf of the Press and Information Office of the Federal Government. During the survey period from 13 to 22 June 2023, German-speaking people aged 16 to 67 in Germany, excluding pensioners, were surveyed in online interviews (CAWI) on the following topics: current life and work situation, future expectations, the use of AI and the digitalization of the world of work, as well as attitudes towards demographic change and the shortage of skilled workers. The respondents were selected using a quota sample from an online access panel.

    1. Future: general life satisfaction; satisfaction with selected aspects of life (working conditions, education, qualifications, health situation, professional remuneration, family situation, financial situation); expectations for the future: rather confident vs. rather worried about the private and professional future; rather confident vs. rather worried about the professional future of younger people or the next generation; rather confident vs. rather worried about the future of Germany; confidence vs. concern regarding the competitiveness of the German economy in various areas (digitalization and automation of the working world, climate protection goals of industry, effects of the Ukraine war on the German economy, access to important raw materials such as rare earths or metals, reliable supply of energy, number of qualified specialists, general price development, development of wages and salaries, development of pensions); probability of various future scenarios for Germany in 2030 (Germany is once again the world export champion; unemployment is at an all-time low and full employment prevails in Germany; the energy transition has already created hundreds of thousands of new jobs in German industry; Germany has emerged the strongest in the EU from the crises of the last 15 years; the price crisis has meant that politics and business have successfully set the course for the future; citizens can deal with all official matters digitally from home; German industry is much faster than expected in terms of climate targets and is already almost climate-neutral; Germany is the most popular country of immigration for foreign university graduates; the nursing shortage in Germany has been overcome thanks to the immigration of skilled workers).

    2. Importance of work: importance of different areas of life (ranking); work to earn money vs. as a vocation; importance of different work characteristics (e.g. job security, adequate income, development prospects and career opportunities, etc.).

    3. Professional situation: satisfaction with various aspects of work (job security, pay/income, development/career opportunities, interesting work, sufficient contact with other people, compatibility of family/private life and work, work climate/working atmosphere, further training opportunities, social recognition, meaningful and useful work); job satisfaction; expected development of working conditions in own professional field; recognition for own work from the company/employer, from colleagues, from other people from the work context, from the personal private environment, from society in general and from politics; unemployed people were asked: currently looking for a new job; assessment of chances of finding a new job; pupils, students and trainees were asked: assessment of future career opportunities; reasons for assessing career opportunities as poor (open).

    4. AI: use of artificial intelligence (AI) in the world of work rather as an opportunity or rather as a danger; expected effects of AI on working conditions in their own professional field (improvement, deterioration, no effects); opportunities and dangers of digitalization, AI and automation based on comparisons (all in all, digitalization leads to a greater burden on the environment, as computers, tablets, smartphones and data centers are major power guzzlers vs. all in all, digitalization protects the environment through less mobility and more efficient management; artificial intelligence and digitalization help to reduce the workload and relieve employees of repetitive and monotonous tasks vs. artificial intelligence and digitalization overburden many employees through further work intensification, with stress and burnout increasingly the result; artificial intelligence and digitalization will primarily lead to job losses vs. artificial intelligence and digitalization will create more new, future-proof jobs than old ones will be lost; our economy will benefit greatly from global networking through speed and efficiency gains vs. our economy is threatened by global networking by becoming more susceptible to cyberattacks and hacker attacks; digitalization will lead to new, more flexible working time models and a better work-life balance vs. digitalization will lead to a blurring of boundaries between work and leisure time and thus, above all, to more self-exploitation by employees).

    5. Home office: local focus of own work currently, before the corona pandemic and during the corona pandemic (exclusively/predominantly in the company or from home, at changing work locations (company, at home, mobile on the road)); agreement with various statements on the topic of working from home (wherever possible, employers should give their employees the opportunity to work from home; working from home leads to a loss of cohesion in the company; working from home enables a better work-life balance; digital communication makes coordination processes more complicated; home office makes an important contribution to climate protection due to fewer journeys to work; home office leads to a mixture of work and leisure time and thus to a greater workload; home office leads to greater job satisfaction and thus to higher productivity; since many professions cannot be carried out in the home office, it would be fairer if everyone had to work outside the home); attitude towards a general 4-day working week (a four-day week for everyone would increase the shortage of skilled workers vs. a four-day week for everyone would increase motivation and therefore productivity).

    6. Demographic change: knowledge of the meaning of the term demographic change; expected impact of demographic change on the future of Germany; opinion on the future in Germany based on alternative future scenarios (in the future, poverty in old age will increase noticeably vs. the future generation of pensioners will be wealthier than ever before; in the future, politics and elections will be increasingly determined by older people vs. the influence of the younger generation on politics will become much more important; our social security systems will continue to ensure intergenerational fairness and equalization in the future vs. the distribution conflicts between the younger and older generations will increase noticeably; future generations will have to work longer due to the shortage of skilled workers vs. people will have to work less in the future due to digitalization and automation and will be able to retire earlier).

    7. Shortage of skilled workers: shortage of skilled workers in own company; additional personal burden due to shortage of skilled workers; company is doing enough to counteract the shortage of skilled workers; use of artificial intelligence (AI) in the company could compensate for the shortage of skilled workers; evaluation of various measures taken by the federal government to combat the shortage of skilled workers (improvement of training and further education opportunities, increasing the participation of women in the labor market (e.g. by expanding childcare services), more flexible working hours, offers for older skilled workers to stay in work longer, facilitating the immigration of foreign skilled workers); evaluation of the work of the federal government to combat the shortage of skilled workers; attractiveness (reputation in society) of various professions with a shortage of skilled workers (e.g. social pedagogue, nursery school teacher, etc.); job recommendation for younger people; own activity in one of the professions mentioned with a shortage of skilled workers.

    Demography: sex; age; age in age groups; employment; federal state; region west/east; school education; vocational training; self-placement social class; employment status; occupation differentiated by workers, employees, civil servants; industry; household size; number of children under 18 in the household; net household income (grouped); location size; party sympathy; migration background (respondent, one parent or both parents). Additionally coded were: consecutive interview number; school education head group (low, medium, high); weighting factor.

  8. EmoVisual Data

    • kaggle.com
    Updated Oct 18, 2024
    Cite
    Arya Shah (2024). EmoVisual Data [Dataset]. https://www.kaggle.com/datasets/aryashah2k/emovisual-data/data
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Oct 18, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Arya Shah
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Emo Visual Data

    Introduction

    This is an emoticon visual annotation dataset. It collects 5329 emoticons and uses the glm-4v API and step-free-api projects to produce the visual annotations with multimodal large models.

    Example:

    0f20b31d-e019-4565-9286-fdf29cc8e144.jpg

    Original 这个表情包中的内容和笑点在于它展示了一只卡通兔子,兔子的表情看起来既无奈又有些生气,配文是“活着已经够累了,上网你还要刁难我”。这句话以一种幽默的方式表达了许多人在上网时可能会遇到的挫折感或烦恼,尤其是当遇到困难或不顺心的事情时。这种对现代生活压力的轻松吐槽使得这个表情包在社交媒体上很受欢迎,人们用它来表达自己在网络世界中的疲惫感或面对困难时的幽默态度。

    Translated: The content and humor of this meme lie in its depiction of a cartoon rabbit whose expression looks both helpless and a little angry. The caption reads "Living is already tiring enough, and you still give me a hard time online." The line humorously expresses the frustration or annoyance that many people may feel when going online, especially when they run into difficulties or things do not go their way. This lighthearted complaint about the pressures of modern life has made the meme popular on social media, where people use it to express their exhaustion with the online world or their humorous attitude in the face of difficulties.

  9. Canadian French Call Center Data for Healthcare AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Canadian French Call Center Data for Healthcare AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/healthcare-call-center-conversation-french-canada
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    French, Canada
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Canadian French Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of French speech recognition, spoken language understanding, and conversational AI systems. With 30 Hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.

    Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.

    Speech Data

    The dataset features 30 Hours of dual-channel call center conversations between native Canadian French speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.

    Participant Diversity:
    Speakers: 60 verified native Canadian French speakers from our contributor community.
    Regions: Diverse provinces across Canada to ensure broad dialectal representation.
    Participant Profile: Age range of 18–70 with a gender mix of 60% male and 40% female.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted conversations.
    Call Duration: Each session ranges from 5 to 15 minutes.
    Audio Format: WAV format, stereo, 16-bit depth at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clear conditions without background noise or echo.
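
    The recording details above (WAV, stereo, 16-bit, 8 kHz or 16 kHz) can be checked per file with Python's standard `wave` module. The sketch below is an assumption-laden illustration: the helper names are hypothetical, it assumes the files are plain PCM WAV that `wave` can open, and it builds a silent in-memory file for demonstration because no real dataset file is available here.

```python
import io
import wave

# Sample rates named in the listing.
ALLOWED_RATES = {8000, 16000}


def describe_wav(source) -> dict:
    """Read basic header fields from a WAV path or binary file-like object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # bytes per sample -> bits
            "sample_rate": rate,
            "duration_s": wav.getnframes() / rate,
        }


def matches_listing(info: dict) -> bool:
    """True if the header matches the stereo, 16-bit, 8/16 kHz description."""
    return (
        info["channels"] == 2
        and info["bit_depth"] == 16
        and info["sample_rate"] in ALLOWED_RATES
    )


def make_silent_wav(seconds: int = 1, rate: int = 8000) -> io.BytesIO:
    """Build a silent stereo 16-bit WAV in memory (demo data only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # 2 bytes = 16 bits
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (2 * 2 * rate * seconds))
    buffer.seek(0)
    return buffer


info = describe_wav(make_silent_wav(seconds=1, rate=8000))
print(info, matches_listing(info))
```

    The same `describe_wav` call accepts a file path, so it could also be used to confirm that a call falls within the stated 5 to 15 minute range.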

    Topic Diversity

    The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).

    Inbound Calls:
    Appointment Scheduling
    New Patient Registration
    Surgical Consultation
    Dietary Advice and Consultations
    Insurance Coverage Inquiries
    Follow-up Treatment Requests, and more
    Outbound Calls:
    Appointment Reminders
    Preventive Care Campaigns
    Test Results & Lab Reports
    Health Risk Assessment Calls
    Vaccination Updates
    Wellness Subscription Outreach, and more

    These real-world interactions help build speech models that understand healthcare domain nuances and user intent.

    Transcription

    Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.

    Transcription Includes:
    Speaker-identified Dialogues
    Time-coded Segments
    Non-speech Annotations (e.g., silence, cough)
    High transcription accuracy, with a word error rate below 5%, backed by dual-layer QA checks.
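
    Word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn a hypothesis into the reference, divided by the number of reference words. The listing does not say how its figure was measured, so the sketch below only illustrates that standard definition with a word-level edit distance. The function name and the example sentences are hypothetical, and tokenization by whitespace is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1] / len(ref)


# Hypothetical example: one dropped word out of five reference words.
reference = "je voudrais prendre un rendez-vous"
hypothesis = "je voudrais prendre rendez-vous"
print(f"WER = {word_error_rate(reference, hypothesis):.0%}")
```

    In practice, text is usually normalized (case, punctuation, numerals) before scoring, and those choices can shift the resulting figure.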

    Metadata

    Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.

    Participant Metadata: ID, gender, age, region, accent, and dialect.
    Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

    Usage and Applications

    This dataset can be used across a range of healthcare and voice AI use cases:


  10. Global Blockchain AI Market Size By Technology (Computer Vision, Natural...

    • verifiedmarketresearch.com
    Updated Jun 10, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Global Blockchain AI Market Size By Technology (Computer Vision, Natural Language Processing, Machine Learning), By Deployment (Cloud, On-Premise), By Application (Smart Contracts, Governance, Logistics and Supply Chain Management, Payments & Settlements), By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/blockchain-ai-market/
    Explore at:
    Dataset updated
    Jun 10, 2024
    Dataset provided by
    Verified Market Research (https://www.verifiedmarketresearch.com/)
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2030
    Area covered
    Global
    Description

    Blockchain AI Market size was valued at USD 448 Million in 2023 and is projected to reach USD 2730 Million by 2031, at a CAGR of 25.5% from 2024 to 2031.
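The growth figures quoted above can be sanity-checked with the standard compound annual growth rate formula. The following is a minimal sketch, assuming the 2023 valuation is the base and that 2023 to 2031 counts as eight compounding periods; the report does not state its base year convention, so that period count is an assumption.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.25 means 25%)."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `rate` for `years` periods."""
    return start_value * (1 + rate) ** years


# Figures from the description, in USD million.
# Assumption: 8 compounding periods between 2023 and 2031.
implied_rate = cagr(448, 2730, 8)
projected_2031 = project(448, 0.255, 8)

print(f"Implied CAGR: {implied_rate:.1%}")
print(f"USD 448M compounded at 25.5% for 8 years: {projected_2031:.0f}M")
```

Under this assumption the implied rate comes out near 25%, so the quoted valuation, projection, and CAGR are roughly consistent with one another.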

    Global Blockchain AI Market Drivers

    The market drivers for the Blockchain AI Market can be influenced by various factors. These may include:

    Enhanced Data Security: By offering a decentralized and unchangeable record for information sharing and archiving, the combination of blockchain technology and artificial intelligence improves data security. This secure infrastructure is especially valuable for sensitive information in supply chain management, banking, and healthcare.
    Increased Adoption of AI: As AI is used more and more in many industries, there is a greater need for blockchain-based solutions to deal with issues of data transparency and integrity. Blockchain technology ensures the quality and dependability of AI-powered services and apps by verifying the legitimacy of the data used to train AI algorithms.
    Growing Worries About Data Privacy: Organizations are investigating blockchain AI solutions that provide more control over data access and usage due to growing worries about data privacy and ownership. Blockchain gives people control over their data while allowing AI algorithms to access it selectively for processing and analysis.
    Demand for Transparent and Reliable AI Systems: Companies and customers alike are looking for reliable and transparent AI systems that can shed light on the decision-making process. Blockchain technology makes it possible to transparently record the decisions and actions of AI algorithms, which promotes transparency and confidence in AI-powered systems.
    Need for Decentralized AI Marketplaces: Blockchain technology is enabling the development of decentralized AI marketplaces, which are democratizing access to AI datasets and algorithms. These markets enable peer-to-peer exchanges and cooperation, enabling businesses and developers to profitably and effectively share AI resources.
    Regulatory Compliance Requirements: The adoption of blockchain AI solutions is being driven by regulatory mandates, such as the GDPR (General Data Protection Regulation) in Europe and HIPAA (Health Insurance Portability and Accountability Act) in the healthcare industry, to ensure compliance with data protection regulations. The transparent data governance offered by blockchain's immutability and auditability features facilitates regulatory compliance.
    Growing Interest in Federated Learning: Due to privacy concerns and data localization requirements, federated learning, a distributed machine learning approach that trains AI models across various decentralized devices, is gaining interest. Blockchain technology guarantees data privacy, integrity, and incentives among participating nodes, which can enable safe and effective federated learning.
    Extension of DAOs and Smart Contracts: Automated and trustless decision-making and agreement execution is made possible by the combination of AI systems with smart contracts and decentralized autonomous organizations (DAOs). Smart contracts built on the blockchain can carry out predetermined scenarios and transactions based on insights generated by artificial intelligence, simplifying corporate processes and lowering dependency on middlemen.
    Emergence of AI-Driven Token Economies: The emergence of AI-driven token economies is being fueled by the convergence of blockchain and AI technology. In these economies, tokens are utilized as incentives for sharing data, training models, and improving algorithms. These token economies ensure equitable reward for contributions while encouraging cooperation and creativity in AI research and development.
    Partnerships and Cross-Industry Collaboration: The adoption of blockchain AI solutions is being accelerated by partnerships and cross-industry collaboration among research institutions, industry consortia, and technology vendors. Inter-industry collaborations enable the sharing of knowledge, assets, and optimal methodologies, promoting the advancement of blockchain artificial intelligence solutions that are both interoperable and scalable.

  11. Russian Call Center Data for Healthcare AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Russian Call Center Data for Healthcare AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/healthcare-call-center-conversation-russian-russia
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Russian Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of Russian speech recognition, spoken language understanding, and conversational AI systems. With 30 Hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.

    Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.

    Speech Data

    The dataset features 30 Hours of dual-channel call center conversations between native Russian speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.

    Participant Diversity:
    Speakers: 60 verified native Russian speakers from our contributor community.
    Regions: Diverse provinces across Russia to ensure broad dialectal representation.
    Participant Profile: Age range of 18–70 with a gender mix of 60% male and 40% female.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted conversations.
    Call Duration: Each session ranges from 5 to 15 minutes.
    Audio Format: WAV format, stereo, 16-bit depth at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clear conditions without background noise or echo.
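Because the recordings are described as dual-channel, 16-bit stereo WAV, a common first step is to separate the two channels so each speaker can be processed on its own. The following is a minimal sketch using only the Python standard library. The function name `split_stereo` is hypothetical, and which channel carries the agent and which the customer is not stated in the description, so no such mapping is assumed here.

```python
import array
import sys
import wave


def split_stereo(source):
    """Return (sample_rate, left, right) for a 16-bit stereo WAV.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo audio")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    # WAV stores samples as little-endian signed 16-bit integers,
    # interleaved as left, right, left, right, ...
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()

    return rate, samples[0::2], samples[1::2]
```

The returned sample rate lets a caller distinguish the 8kHz and 16kHz variants before feeding audio to a model.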

    Topic Diversity

    The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).

    Inbound Calls:
    Appointment Scheduling
    New Patient Registration
    Surgical Consultation
    Dietary Advice and Consultations
    Insurance Coverage Inquiries
    Follow-up Treatment Requests, and more
    Outbound Calls:
    Appointment Reminders
    Preventive Care Campaigns
    Test Results & Lab Reports
    Health Risk Assessment Calls
    Vaccination Updates
    Wellness Subscription Outreach, and more

    These real-world interactions help build speech models that understand healthcare domain nuances and user intent.

    Transcription

    Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.

    Transcription Includes:
    Speaker-identified Dialogues
    Time-coded Segments
    Non-speech Annotations (e.g., silence, cough)
    High transcription accuracy, with a word error rate below 5%, backed by dual-layer QA checks.
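For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal sketch of that conventional definition. The vendor does not describe how its figure was measured, and whitespace tokenization here is a simplifying assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A transcript that drops one word from a five-word reference scores 0.2, so a WER below 5% corresponds to fewer than one error per twenty reference words on average.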

    Metadata

    Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.

    Participant Metadata: ID, gender, age, region, accent, and dialect.
    Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

    Usage and Applications

    This dataset can be used across a range of healthcare and voice AI use cases:


  12. Data from: NeSy4VRD: A Multifaceted Resource for Neurosymbolic AI Research...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated May 16, 2023
    Cite
    David Herron; Ernesto Jimenez-Ruiz; Giacomo Tarroni; Tillman Weyde (2023). NeSy4VRD: A Multifaceted Resource for Neurosymbolic AI Research using Knowledge Graphs in Visual Relationship Detection [Dataset]. http://doi.org/10.5281/zenodo.7931113
    Explore at:
    Available download formats: zip
    Dataset updated
    May 16, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Herron; Ernesto Jimenez-Ruiz; Giacomo Tarroni; Tillman Weyde
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NeSy4VRD

    NeSy4VRD is a multifaceted, multipurpose resource designed to foster neurosymbolic AI (NeSy) research, particularly NeSy research using Semantic Web technologies such as OWL ontologies, OWL-based knowledge graphs and OWL-based reasoning as symbolic components. The NeSy4VRD research resource pertains to the computer vision field of AI and, within that field, to the application tasks of visual relationship detection (VRD) and scene graph generation.

    Whilst the core motivation of the NeSy4VRD research resource is to foster computer vision-based NeSy research using Semantic Web technologies such as OWL ontologies and OWL-based knowledge graphs, AI researchers can readily use NeSy4VRD to either: 1) pursue computer vision-based NeSy research without involving Semantic Web technologies as symbolic components, or 2) pursue computer vision research without NeSy (i.e. pursue research that focuses purely on deep learning alone, without involving symbolic components of any kind). This is the sense in which we describe NeSy4VRD as being multipurpose: it can readily be used by diverse groups of computer vision-based AI researchers with diverse interests and objectives.

    The NeSy4VRD research resource in its entirety is distributed across two locations: Zenodo and GitHub.

    NeSy4VRD on Zenodo: the NeSy4VRD dataset package

    This entry on Zenodo hosts the NeSy4VRD dataset package, which includes the NeSy4VRD dataset and its companion NeSy4VRD ontology, an OWL ontology called VRD-World.

    The NeSy4VRD dataset consists of an image dataset with associated visual relationship annotations. The images of the NeSy4VRD dataset are the same as those that were once publicly available as part of the VRD dataset. The NeSy4VRD visual relationship annotations are a highly customised and quality-improved version of the original VRD visual relationship annotations. The NeSy4VRD dataset is designed for computer vision-based research that involves detecting objects in images and predicting relationships between ordered pairs of those objects. A visual relationship for an image of the NeSy4VRD dataset has the form <'subject', 'predicate', 'object'>, where the 'subject' and 'object' are two objects in the image, and the 'predicate' describes some relation between them. Both the 'subject' and 'object' objects are specified in terms of bounding boxes and object classes. For example, representative annotated visual relationships are <'person', 'ride', 'horse'>, <'hat', 'on', 'teddy bear'> and <'cat', 'under', 'pillow'>.
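The annotation form described above, a <'subject', 'predicate', 'object'> triple whose subject and object each carry an object class and a bounding box, can be sketched as a small data structure. This is an illustrative in-memory model only: the class names, field names, and bounding-box coordinate order are assumptions, not the NeSy4VRD file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    # Coordinate order is an assumption made for this sketch.
    xmin: int
    ymin: int
    xmax: int
    ymax: int


@dataclass(frozen=True)
class DetectedObject:
    object_class: str
    bbox: BoundingBox


@dataclass(frozen=True)
class VisualRelationship:
    subject: DetectedObject
    predicate: str
    obj: DetectedObject  # named `obj` to avoid shadowing the builtin

    def as_triple(self):
        """Return the (subject class, predicate, object class) triple."""
        return (self.subject.object_class, self.predicate, self.obj.object_class)


def relationships_with_class(relationships, object_class):
    """Keep relationships whose subject or object has the given class."""
    return [
        rel for rel in relationships
        if object_class in (rel.subject.object_class, rel.obj.object_class)
    ]


person = DetectedObject("person", BoundingBox(40, 10, 120, 200))
horse = DetectedObject("horse", BoundingBox(20, 90, 220, 300))
ride = VisualRelationship(person, "ride", horse)
```

Because the pair is ordered, <'person', 'ride', 'horse'> and <'horse', 'ride', 'person'> are distinct relationships, which is why the sketch keeps separate subject and object fields instead of an unordered set.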

    Visual relationship detection is pursued as a computer vision application task in its own right, and as a building block capability for the broader application task of scene graph generation. Scene graph generation, in turn, is commonly used as a precursor to a variety of enriched, downstream visual understanding and reasoning application tasks, such as image captioning, visual question answering, image retrieval, image generation and multimedia event processing.

    The NeSy4VRD ontology, VRD-World, is a rich, well-aligned, companion OWL ontology engineered specifically for use with the NeSy4VRD dataset. It directly describes the domain of the NeSy4VRD dataset, as reflected in the NeSy4VRD visual relationship annotations. More specifically, all of the object classes that feature in the NeSy4VRD visual relationship annotations have corresponding classes within the VRD-World OWL class hierarchy, and all of the predicates that feature in the NeSy4VRD visual relationship annotations have corresponding properties within the VRD-World OWL object property hierarchy. The rich structure of the VRD-World class hierarchy and the rich characteristics and relationships of the VRD-World object properties together give the VRD-World OWL ontology rich inference semantics. These provide ample opportunity for OWL reasoning to be meaningfully exercised and exploited in NeSy research that uses OWL ontologies and OWL-based knowledge graphs as symbolic components. There is also ample potential for NeSy researchers to explore supplementing the OWL reasoning capabilities afforded by the VRD-World ontology with Datalog rules and reasoning.

    Use of the NeSy4VRD ontology, VRD-World, in conjunction with the NeSy4VRD dataset is, of course, purely optional, however. Computer vision AI researchers who have no interest in NeSy, or NeSy researchers who have no interest in OWL ontologies and OWL-based knowledge graphs, can ignore the NeSy4VRD ontology and use the NeSy4VRD dataset by itself.

    All computer vision-based AI research user groups can, if they wish, also avail themselves of the other components of the NeSy4VRD research resource available on GitHub.

    NeSy4VRD on GitHub: open source infrastructure supporting extensibility, and sample code

    The NeSy4VRD research resource incorporates additional components that are companions to the NeSy4VRD dataset package here on Zenodo. These companion components are available at NeSy4VRD on GitHub. These companion components consist of:

    • comprehensive open source Python-based infrastructure supporting the extensibility of the NeSy4VRD visual relationship annotations (and, thereby, the extensibility of the NeSy4VRD ontology, VRD-World, as well)
    • open source Python sample code showing how one can work with the NeSy4VRD visual relationship annotations in conjunction with the NeSy4VRD ontology, VRD-World, and RDF knowledge graphs.

    The NeSy4VRD infrastructure supporting extensibility consists of:

    • open source Python code for conducting deep and comprehensive analyses of the NeSy4VRD dataset (the VRD images and their associated NeSy4VRD visual relationship annotations)
    • an open source, custom-designed NeSy4VRD protocol for specifying visual relationship annotation customisation instructions declaratively, in text files
    • an open source, custom-designed NeSy4VRD workflow, implemented using Python scripts and modules, for applying small or large volumes of customisations or extensions to the NeSy4VRD visual relationship annotations in a configurable, managed, automated and repeatable process.

    The purpose behind providing comprehensive infrastructure to support extensibility of the NeSy4VRD visual relationship annotations is to make it easy for researchers to take the NeSy4VRD dataset in new directions, by further enriching the annotations, or by tailoring them to introduce new or more data conditions that better suit their particular research needs and interests. The option to use the NeSy4VRD extensibility infrastructure in this way applies equally well to each of the diverse potential NeSy4VRD user groups already mentioned.

    The NeSy4VRD extensibility infrastructure, however, may be of particular interest to NeSy researchers interested in using the NeSy4VRD ontology, VRD-World, in conjunction with the NeSy4VRD dataset. These researchers can of course tailor the VRD-World ontology if they wish without needing to modify or extend the NeSy4VRD visual relationship annotations in any way. But their degrees of freedom for doing so will be limited by the need to maintain alignment with the NeSy4VRD visual relationship annotations and the particular set of object classes and predicates to which they refer. If NeSy researchers want full freedom to tailor the VRD-World ontology, they may well need to tailor the NeSy4VRD visual relationship annotations first, in order that alignment be maintained.

    To illustrate our point, and to illustrate our vision of how the NeSy4VRD extensibility infrastructure can be used, let us consider a simple example. It is common in computer vision to distinguish between thing objects (that have well-defined shapes) and stuff objects (that are amorphous). Suppose a researcher wishes to have a greater number of stuff object classes with which to work. Water is such a stuff object. Many VRD images contain water but it is not currently one of the annotated object classes and hence is never referenced in any visual relationship annotations. So adding a Water class to the class hierarchy of the VRD-World ontology would be pointless because it would never acquire any instances (because an object detector would never detect any). However, our hypothetical researcher could choose to do the following:

    • use the analysis functionality of the NeSy4VRD extensibility infrastructure to find images containing water (by, say, searching for images whose visual relationships refer to object classes such as 'boat', 'surfboard', 'sand', 'umbrella', etc.);
    • use free image analysis software (such as GIMP, at gimp.org) to get bounding boxes for instances of water in these images;
    • use the NeSy4VRD protocol to specify new visual relationships for these images that refer to the new 'water' objects (e.g. <'boat', 'on', 'water'>);
    • use the NeSy4VRD workflow to introduce the new object class 'water' and to apply the specified new visual relationships to the sets of annotations for the affected images;
    • introduce class Water to the class hierarchy of the VRD-World ontology (using, say, the free Protege ontology editor);
    • continue experimenting, now with the added benefit of the additional stuff object class 'water';
    • contribute the enriched set of NeSy4VRD visual relationship

  13. English Human-Human Chat Dataset for Conversational AI & NLP

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). English Human-Human Chat Dataset for Conversational AI & NLP [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/english-general-domain-conversation-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The English General Domain Chat Dataset is a high-quality, text-based dataset designed to train and evaluate conversational AI, NLP models, and smart assistants in real-world English usage. Collected through FutureBeeAI’s trusted crowd community, this dataset reflects natural, native-level English conversations covering a broad spectrum of everyday topics.

    Conversational Text Data

    This dataset includes over 15,000 chat transcripts, each featuring free-flowing dialogue between two native English speakers. The conversations are spontaneous, context-rich, and mimic informal, real-life texting behavior.

    Words per Chat: 300–700
    Turns per Chat: Up to 50 dialogue turns
    Contributors: 200 native English speakers from the FutureBeeAI Crowd Community
    Format: TXT, DOCS, JSON or CSV (customizable)
    Structure: Each record contains the full chat, topic tag, and metadata block

    Diversity and Domain Coverage

    Conversations span a wide variety of general-domain topics to ensure comprehensive model exposure:

    Music, books, and movies
    Health and wellness
    Children and parenting
    Family life and relationships
    Food and cooking
    Education and studying
    Festivals and traditions
    Environment and daily life
    Internet and tech usage
    Childhood memories and casual chatting

    This diversity ensures the dataset is useful across multiple NLP and language understanding applications.

    Linguistic Authenticity

    Chats reflect informal, native-level English usage with:

    Colloquial expressions and local dialect influence
    Domain-relevant terminology
    Language-specific grammar, phrasing, and sentence flow
    Inclusion of realistic details such as names, phone numbers, email addresses, locations, dates, times, local currencies, and culturally grounded references
    Representation of different writing styles and input quirks to ensure training data realism

    Metadata

    Every chat instance is accompanied by structured metadata, which includes:

    Participant Age
    Gender
    Country/Region
    Chat Domain
    Chat Topic
    Dialect

    This metadata supports model filtering, demographic-specific evaluation, and more controlled fine-tuning workflows.
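As an illustration of the metadata-driven filtering mentioned above, the following is a minimal sketch that selects chats by demographic attributes. The record layout, the field names (`chat_id`, `topic`, `metadata`, `age`, `gender`, `region`, `dialect`), and the sample values are all hypothetical, chosen only to mirror the listed metadata fields. The vendor's actual schema is not given in the description.

```python
import json

# Hypothetical records: layout and values are assumptions for this sketch.
SAMPLE_RECORDS = json.loads("""
[
  {"chat_id": "c1", "topic": "Food and cooking",
   "metadata": {"age": 29, "gender": "female", "region": "US", "dialect": "American"}},
  {"chat_id": "c2", "topic": "Music, books, and movies",
   "metadata": {"age": 41, "gender": "male", "region": "UK", "dialect": "British"}},
  {"chat_id": "c3", "topic": "Health and wellness",
   "metadata": {"age": 35, "gender": "female", "region": "US", "dialect": "American"}}
]
""")


def filter_chats(records, min_age=None, max_age=None, **equals):
    """Return records whose metadata satisfies the age bounds and
    matches every keyword argument exactly."""
    selected = []
    for record in records:
        meta = record["metadata"]
        age = meta.get("age")
        if min_age is not None and (age is None or age < min_age):
            continue
        if max_age is not None and (age is None or age > max_age):
            continue
        if all(meta.get(key) == value for key, value in equals.items()):
            selected.append(record)
    return selected


british = filter_chats(SAMPLE_RECORDS, dialect="British")
women_30_plus = filter_chats(SAMPLE_RECORDS, min_age=30, gender="female")
```

The same pattern extends to building demographic-specific evaluation subsets, for example one held-out set per dialect.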

    Data Quality Assurance

    All chat records pass through a rigorous QA process to maintain consistency and accuracy:

    Manual review for content completeness
    Format checks for chat turns and metadata
    Linguistic verification by native speakers
    Removal of inappropriate or unusable samples

    This ensures a clean, reliable dataset ready for high-performance AI model training.

    Applications

    This dataset is ideal for training and evaluating a wide range of text-based AI systems:

    Conversational AI / Chatbots
    Smart assistants and voicebots

  14. Success.ai | | US Premium B2B Emails & Phone Numbers Dataset - APIs and flat...

    • datarade.ai
    Updated Oct 12, 2024
    + more versions
    Cite
    Success.ai (2024). Success.ai | | US Premium B2B Emails & Phone Numbers Dataset - APIs and flat files available – 170M+, Verified Profiles - Best Price Guarantee [Dataset]. https://datarade.ai/data-products/success-ai-us-premium-b2b-emails-phone-numbers-dataset-success-ai
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Oct 12, 2024
    Dataset provided by
    Area covered
    United States
    Description

    Success.ai offers a comprehensive, enterprise-ready B2B leads data solution, ideal for businesses seeking access to over 150 million verified employee profiles and 170 million work emails. Our data empowers organizations across industries to target key decision-makers, optimize recruitment, and fuel B2B marketing efforts. Whether you're looking for UK B2B data, B2B marketing data, or global B2B contact data, Success.ai provides the insights you need with pinpoint accuracy.

    Tailored for B2B Sales, Marketing, Recruitment and more: Our B2B contact data and B2B email data solutions are designed to enhance your lead generation, sales, and recruitment efforts. Build hyper-targeted lists based on job title, industry, seniority, and geographic location. Whether you’re reaching mid-level professionals or C-suite executives, Success.ai delivers the data you need to connect with the right people.

    API Features:

    • Real-Time Updates: Our APIs deliver real-time updates, ensuring that the contact data your business relies on is always current and accurate.
    • High Volume Handling: Designed to support up to 860k API calls per day, our system is built for scalability and responsiveness, catering to enterprises of all sizes.
    • Flexible Integration: Easily integrate with CRM systems, marketing automation tools, and other enterprise applications to streamline your workflows and enhance productivity.

    Key Categories Served:

    • B2B sales leads: Identify decision-makers in key industries.
    • B2B marketing data: Target professionals for your marketing campaigns.
    • Recruitment data: Source top talent efficiently and reduce hiring times.
    • CRM enrichment: Update and enhance your CRM with verified, updated data.
    • Global reach: Coverage across 195 countries, including the United States, United Kingdom, Germany, India, Singapore, and more.

    Global Coverage with Real-Time Accuracy: Success.ai’s dataset spans a wide range of industries such as technology, finance, healthcare, and manufacturing. With continuous real-time updates, your team can rely on the most accurate data available:

    • 150M+ Employee Profiles: Access professional profiles worldwide with insights including full name, job title, seniority, and industry.
    • 170M Verified Work Emails: Reach decision-makers directly with verified work emails, available across industries and geographies, including Singapore and UK B2B data.
    • GDPR-Compliant: Our data is fully compliant with GDPR and other global privacy regulations, ensuring safe and legal use of B2B marketing data.

    Key Data Points for Every Employee Profile: Every profile in Success.ai’s database includes over 20 critical data points, providing the information needed to power B2B sales and marketing campaigns: Full Name, Job Title, Company, Work Email, Location, Phone Number, LinkedIn Profile, Experience, Education, Technographic Data, Languages, Certifications, Industry, Publications & Awards.

    Use Cases Across Industries: Success.ai’s B2B data solution is incredibly versatile and can support various enterprise use cases, including:

    • B2B Marketing Campaigns: Reach high-value professionals in industries such as technology, finance, and healthcare.
    • Enterprise Sales Outreach: Build targeted B2B contact lists to improve sales efforts and increase conversions.
    • Talent Acquisition: Accelerate hiring by sourcing top talent with accurate and updated employee data, filtered by job title, industry, and location.
    • Market Research: Gain insights into employment trends and company profiles to enrich market research.
    • CRM Data Enrichment: Ensure your CRM stays accurate by integrating updated B2B contact data.
    • Event Targeting: Create lists for webinars, conferences, and product launches by targeting professionals in key industries.

    Use Cases for Success.ai's Contact Data:

    • Targeted B2B Marketing: Create precise campaigns by targeting key professionals in industries like tech and finance.
    • Sales Outreach: Build focused sales lists of decision-makers and C-suite executives for faster deal cycles.
    • Recruiting Top Talent: Easily find and hire qualified professionals with updated employee profiles.
    • CRM Enrichment: Keep your CRM current with verified, accurate employee data.
    • Event Targeting: Create attendee lists for events by targeting relevant professionals in key sectors.
    • Market Research: Gain insights into employment trends and company profiles for better business decisions.
    • Executive Search: Source senior executives and leaders for headhunting and recruitment.
    • Partnership Building: Find the right companies and key people to develop strategic partnerships.

    Why Choose Success.ai’s Employee Data? Success.ai is the top choice for enterprises looking for comprehensive and affordable B2B data solutions. Here’s why: Unmatched Accuracy: Our AI-powered validation process ensures 99% accuracy across all data points, resulting in higher engagement and fewer bounces. Global Scale: With 150M+ employee profiles and 170M veri...

  15. Description of multimodal dataset.

    • plos.figshare.com
    xls
    Updated Nov 16, 2023
    + more versions
    Cite
    Sobhana Jahan; Kazi Abu Taher; M. Shamim Kaiser; Mufti Mahmud; Md. Sazzadur Rahman; A. S. M. Sanwar Hosen; In-Ho Ra (2023). Description of multimodal dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0294253.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Nov 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Sobhana Jahan; Kazi Abu Taher; M. Shamim Kaiser; Mufti Mahmud; Md. Sazzadur Rahman; A. S. M. Sanwar Hosen; In-Ho Ra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: According to the World Health Organization (WHO), dementia is the seventh leading cause of death among all illnesses and one of the leading causes of disability among the world’s elderly people. The number of Alzheimer’s patients is rising day by day. Given this growth and the associated dangers, Alzheimer’s disease should be diagnosed carefully. Machine learning is a promising technique for Alzheimer’s diagnosis, but general users do not trust machine learning models because of their black-box nature. Moreover, some of those models do not provide the best performance because they use only neuroimaging data.

    Objective: To solve these issues, this paper proposes a novel explainable Alzheimer’s disease prediction model using a multimodal dataset. This approach performs a data-level fusion using clinical data, MRI segmentation data, and psychological data. However, currently, there is very little understanding of multimodal five-class classification of Alzheimer’s disease.

    Method: To predict the five classes, nine popular machine learning models are used: Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), Multi-Layer Perceptron (MLP), K-Nearest Neighbor (KNN), Gradient Boosting (GB), Adaptive Boosting (AdaB), Support Vector Machine (SVM), and Naive Bayes (NB). Among these models, RF scored the highest. In addition, SHapley Additive exPlanations (SHAP) is used for explainability.

    Results and conclusions: The performance evaluation demonstrates that the RF classifier has a 10-fold cross-validation accuracy of 98.81% for predicting Alzheimer’s disease, cognitively normal, non-Alzheimer’s dementia, uncertain dementia, and others. In addition, the study utilized Explainable Artificial Intelligence based on the SHAP model and analyzed the causes of prediction. To the best of our knowledge, we are the first to present this multimodal (clinical, psychological, and MRI segmentation data) five-class classification of Alzheimer’s disease using the Open Access Series of Imaging Studies (OASIS-3) dataset. A novel Alzheimer’s patient management architecture is also proposed in this work.

  16. Mandarin Call Center Data for Travel AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Mandarin Call Center Data for Travel AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/travel-call-center-conversation-mandarin-china
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Mandarin Chinese Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for Mandarin-speaking travelers.

    Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.

    Speech Data

    The dataset includes 30 hours of dual-channel audio recordings between native Mandarin Chinese speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.

    Participant Diversity:
    Speakers: 60 native Mandarin Chinese contributors from our verified pool.
    Regions: Covering multiple Chinese provinces to capture accent and dialectal variation.
    Participant Profile: Balanced representation of age (18–70) and gender (60% male, 40% female).
    Recording Details:
    Conversation Nature: Naturally flowing, spontaneous customer-agent calls.
    Call Duration: Between 5 and 15 minutes per session.
    Audio Format: Stereo WAV, 16-bit depth, at 8kHz and 16kHz.
    Recording Environment: Captured in controlled, noise-free, echo-free settings.

    Topic Diversity

    Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).

    Inbound Calls:
    Booking Assistance
    Destination Information
    Flight Delays or Cancellations
    Support for Disabled Passengers
    Health and Safety Travel Inquiries
    Lost or Delayed Luggage, and more
    Outbound Calls:
    Promotional Travel Offers
    Customer Feedback Surveys
    Booking Confirmations
    Flight Rescheduling Alerts
    Visa Expiry Notifications, and others

    These scenarios help models understand and respond to diverse traveler needs in real-time.

    Transcription

    Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-Stamped Segments
    Non-speech Markers (e.g., pauses, coughs)
    A dual-layered transcription review keeps the word error rate under 5%.
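
    As a rough illustration of the word error rate figure quoted above, the following Python sketch computes WER as the word-level edit distance divided by the number of reference words. This is an assumption about how such a metric is typically computed, not FutureBeeAI's evaluation procedure. The function name `word_error_rate` is hypothetical, and the sketch assumes whitespace-separated tokens, so Mandarin transcripts would first need word segmentation or a character-level variant.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level edit distance divided by the reference length.

    Assumes tokens are separated by whitespace (an assumption of this sketch).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)
```

    For example, `word_error_rate("book a flight to beijing", "book flight to beijing")` gives 0.2, since one of the five reference words is deleted.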

    Metadata

    Extensive metadata enriches each call and speaker for better filtering and AI training:

    Participant Metadata: ID, age, gender, region, accent, and dialect.
    Conversation Metadata: Topic, domain, call type, sentiment, and audio specs.

    Usage and Applications

    This dataset is ideal for a variety of AI use cases in the travel and tourism space:

    ASR Systems: Train Mandarin speech-to-text engines for travel platforms.

  17. ‘World Happiness Report 2019’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 20, 2021
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘World Happiness Report 2019’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-world-happiness-report-2019-f29c/e8e08550/?iid=004-258&v=presentation
    Dataset updated
    Nov 20, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘World Happiness Report 2019’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/PromptCloudHQ/world-happiness-report-2019 on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    The data has been released by SDSN and extracted by PromptCloud's custom web crawling solution.

    Context

    The World Happiness Report is a landmark survey of the state of global happiness that ranks 156 countries by how happy their citizens perceive themselves to be. This year’s World Happiness Report focuses on happiness and the community: how happiness has evolved over the past dozen years, with a focus on the technologies, social norms, conflicts and government policies that have driven those changes.

    Content

    What is Dystopia?

    Dystopia is an imaginary country that has the world’s least-happy people. The purpose in establishing Dystopia is to have a benchmark against which all countries can be favorably compared (no country performs more poorly than Dystopia) in terms of each of the six key variables, thus allowing each sub-bar to be of positive (or zero, in six instances) width. The lowest scores observed for the six key variables, therefore, characterize Dystopia. Since life would be very unpleasant in a country with the world’s lowest incomes, lowest life expectancy, lowest generosity, most corruption, least freedom, and least social support, it is referred to as “Dystopia,” in contrast to Utopia.

    What are the residuals?

    The residuals, or unexplained components, differ for each country, reflecting the extent to which the six variables either over- or under-explain average 2016-2018 life evaluations. These residuals have an average value of approximately zero over the whole set of countries. Figure 2.7 shows the average residual for each country if the equation in Table 2.1 is applied to average 2016-2018 data for the six variables in that country. We combine these residuals with the estimate for life evaluations in Dystopia so that the combined bar will always have positive values. As can be seen in Figure 2.7, although some life evaluation residuals are quite large, occasionally exceeding one point on the scale from 0 to 10, they are always much smaller than the calculated value in Dystopia, where the average life is rated at 1.88 on the 0 to 10 scale. Table 7 of the online Statistical Appendix 1 for Chapter 2 puts the Dystopia plus residual block at the left side, and also draws the Dystopia line, making it easy to compare the signs and sizes of the residuals in different countries.

    Why do we use these six factors to explain life evaluations?

    The variables used reflect what has been broadly found in the research literature to be important in explaining national-level differences in life evaluations. Some important variables, such as unemployment or inequality, do not appear because comparable international data are not yet available for the full sample of countries. The variables are intended to illustrate important lines of correlation rather than to reflect clean causal estimates, since some of the data are drawn from the same survey sources, some are correlated with each other (or with other important factors for which we do not have measures), and in several instances there are likely to be two-way relations between life evaluations and the chosen variables (for example, healthy people are overall happier, but as Chapter 4 in the World Happiness Report 2013 demonstrated, happier people are overall healthier). In Statistical Appendix 1 of World Happiness Report 2018, we assessed the possible importance of using explanatory data from the same people whose life evaluations are being explained. We did this by randomly dividing the samples into two groups, and using the average values for, e.g., freedom gleaned from one group to explain the life evaluations of the other group. This lowered the effects, but only very slightly (e.g. 2% to 3%), assuring us that using data from the same individuals is not seriously affecting the results.

    Data source: http://worldhappiness.report/ed/2019/

    More such datasets can be downloaded from DataStock.

    --- Original source retains full ownership of the source dataset ---

  18. South Asian Facial Images Dataset | Selfie & ID Card Images

    • futurebeeai.com
    Updated Aug 1, 2022
    FutureBee AI (2022). South Asian Facial Images Dataset | Selfie & ID Card Images [Dataset]. https://www.futurebeeai.com/dataset/image-dataset/facial-images-selfie-id-south-asian
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    South Asia
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the South Asian Human Facial Images Dataset, curated to advance facial recognition technology and support the development of secure biometric identity systems, KYC verification processes, and AI-driven computer vision applications. This dataset is designed to serve as a robust foundation for real-world face matching and recognition use cases.

    Facial Image Data

    The dataset contains over 8,000 facial image sets of South Asian individuals. Each set includes:

    Selfie Images: 5 high-quality selfie images taken under different conditions
    ID Card Images: 2 clear facial images extracted from different government-issued ID cards

    Diversity & Representation

    Geographic Diversity: Participants represent South Asian countries including India, Pakistan, Bangladesh, Nepal, Sri Lanka, Bhutan, Maldives, and more
    Demographics: Individuals aged 18 to 70 years with a 60:40 male-to-female ratio
    File Formats: Images are provided in JPEG and HEIC formats for compatibility and quality retention

    Image Quality & Capture Conditions

    All images were captured with real-world variability to enhance dataset robustness:

    Lighting: Captured under diverse lighting setups to simulate real environments
    Backgrounds: A wide variety of indoor and outdoor backgrounds
    Device Quality: Captured using modern smartphones to ensure high resolution and clarity

    Metadata

    Each participant’s data is accompanied by rich metadata to support AI model training, including:

    Unique participant ID
    Image file names
    Age at the time of capture
    Gender
    Country of origin
    Demographic details
    File format information

    This metadata enables targeted filtering and training across diverse scenarios.

    Use Cases & Applications

    This dataset is ideal for a wide range of AI and biometric applications:

    Facial Recognition: Train accurate and generalizable face matching models
    KYC & Identity Verification: Enhance onboarding and compliance systems in fintech and government services
    Biometric Identification: Build secure facial recognition systems for access control and identity authentication
    Age Prediction: Train models to estimate age from facial features
    Generative AI: Provide reference data for synthetic face generation or augmentation tasks

    Secure & Ethical Collection

    Data Security: All images were securely stored and processed on FutureBeeAI’s proprietary platform
    Ethical Compliance: Data collection was conducted in full alignment with privacy laws and ethical standards
    Informed Consent: Every participant provided written consent, with full awareness of the intended uses of the data

    Dataset Updates & Customization

    To meet evolving AI demands, this dataset is regularly updated and can be customized. Available options include:


  19. 📚 Student Performance Dataset 📚

    • kaggle.com
    Updated Mar 2, 2025
    Waqar Ali (2025). 📚 Student Performance Dataset 📚 [Dataset]. https://www.kaggle.com/datasets/waqi786/student-performance-dataset
    Metadata format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Mar 2, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Waqar Ali
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Understanding student performance is key to improving education systems and learning outcomes. This synthetic dataset is designed to simulate real-world academic data, enabling researchers, educators, and data scientists to analyze factors influencing student achievement in a structured and ethical manner.

    With AI-generated records, this dataset provides insights into how demographic attributes, academic performance, and attendance patterns interact to shape student success.

    🔍 Key Features:
    ✔️ Demographics & Grade Levels – Understand how age, gender, and grade level influence academic outcomes
    ✔️ Subject-Specific Performance – Modeled Math, Reading, and Writing scores for detailed analysis
    ✔️ Attendance Records – Explore the correlation between school presence and academic success
    ✔️ Comprehensive Student Data – Synthetic records designed for educational research and machine learning applications

    📊 Dataset Overview: This dataset has been synthetically generated and does not contain real-world data. It is intended for educational purposes, machine learning practice, and exploratory data analysis related to student performance.

    📖 Columns Description:
    Student_ID – Unique identifier for each synthetic student
    Gender – Simulated gender representation
    Age – Modeled student age
    Grade_Level – Academic level of the student
    Math_Score, Reading_Score, Writing_Score – Simulated subject-wise scores
    Attendance – Modeled school attendance record

    ⚠️ Disclaimer: This dataset is completely synthetic and should not be used for real-world educational policy-making, student assessments, or institutional reporting. It serves as a safe, ethical resource for learning, research, and model development.

    🔹 Use this dataset to explore student performance trends, build predictive models, and gain insights into educational success factors! 🎯📊
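
    As one possible starting point for the attendance analysis mentioned above, the Python sketch below reads rows with the listed column names and computes the Pearson correlation between attendance and each student's mean subject score. The sample rows are made up for illustration and are not taken from the dataset. The helper names (`pearson`, `attendance_vs_mean_score`) are hypothetical, and the sketch assumes that all score and attendance values are numeric, which the listing does not confirm.

```python
import csv
import io
import math

# Hypothetical rows for illustration only; NOT real records from the dataset.
SAMPLE_CSV = """Student_ID,Gender,Age,Grade_Level,Math_Score,Reading_Score,Writing_Score,Attendance
S001,F,15,10,80,85,90,95
S002,M,16,11,60,65,70,75
S003,F,15,10,70,75,80,85
"""

SCORE_COLUMNS = ("Math_Score", "Reading_Score", "Writing_Score")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def attendance_vs_mean_score(csv_file):
    """Correlate Attendance with the mean of the three subject scores."""
    attendance, mean_scores = [], []
    for row in csv.DictReader(csv_file):
        scores = [float(row[col]) for col in SCORE_COLUMNS]
        attendance.append(float(row["Attendance"]))
        mean_scores.append(sum(scores) / len(scores))
    return pearson(attendance, mean_scores)


# Replace io.StringIO(SAMPLE_CSV) with open("<path to the CSV>") for real use.
r = attendance_vs_mean_score(io.StringIO(SAMPLE_CSV))
print(f"Pearson r between attendance and mean score: {r:.3f}")
```

    Because the records are synthetic, any correlation found this way describes the generator rather than real students, in line with the disclaimer above.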

  20. Hindi Agent-Customer Chat Dataset for Healthcare Domain

    • futurebeeai.com
    Updated Aug 1, 2022
    FutureBee AI (2022). Hindi Agent-Customer Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/hindi-healthcare-domain-conversation-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Hindi Healthcare Chat Dataset is a rich collection of over 12,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in Hindi-speaking regions.

    Participant & Chat Overview

    Participants: 200+ native Hindi speakers from the FutureBeeAI Crowd Community
    Conversation Length: 300–700 words per chat
    Turns per Chat: 50–150 dialogue turns across both participants
    Chat Types: Inbound and outbound
    Sentiment Coverage: Positive, neutral, and negative outcomes included

    Topic Diversity

    The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:

    Inbound Chats (Customer-Initiated): Appointment scheduling, new patient registration, surgery and treatment consultations, diet and lifestyle discussions, insurance claim inquiries, lab result follow-ups
    Outbound Chats (Agent-Initiated): Appointment reminders and confirmations, health and wellness program offers, test result notifications, preventive care and vaccination reminders, subscription renewals, risk assessment and eligibility follow-ups

    This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.

    Language Diversity & Realism

    This dataset reflects the natural flow of Hindi healthcare communication and includes:

    Authentic Naming Patterns: Hindi personal names, clinic names, and brands
    Localized Contact Elements: Addresses, emails, phone numbers, and clinic locations in regional Hindi formats
    Time & Currency References: Use of dates, times, numeric expressions, and currency units aligned with Hindi-speaking regions
    Colloquial & Medical Expressions: Local slang, informal speech, and common healthcare-related terminology

    These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.

    Conversational Flow & Structure

    Conversations range from simple inquiries to complex advisory sessions, including:

    General inquiries
    Detailed problem-solving
    Routine status updates
    Treatment recommendations
    Support and feedback interactions

    Each conversation typically includes these structural components:

    Greetings and verification
    Information gathering
    Problem definition
    Solution delivery
    Closing messages
    Follow-up and feedback (where applicable)

    This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.

    Data Format & Structure

    Available in JSON, CSV, and TXT formats, each conversation includes:

    Full message history with clear speaker labels
    Participant identifiers
    Metadata (e.g., topic tags, region, sentiment)
    Compatibility with common NLP and ML pipelines

    Applications


Global impact of AI and big-data analytics on jobs 2023-2027

3 scholarly articles cite this dataset (View in Google Scholar)
