Between 2023 and 2027, the majority of companies surveyed worldwide expect big data to have a more positive than negative impact on the global job market and employment, with ** percent of the companies reporting the technology will create jobs and * percent expecting the technology to displace jobs. Meanwhile, artificial intelligence (AI) is expected to result in more significant labor market disruptions, with ** percent of organizations expecting the technology to displace jobs and ** percent expecting AI to create jobs.
https://www.archivemarketresearch.com/privacy-policy
The AI Training Dataset in Healthcare Market was valued at USD 341.8 million in 2023 and is projected to reach USD 1464.13 million by 2032, exhibiting a CAGR of 23.1% during the forecast period. The growth is attributed to the rising adoption of AI in healthcare, increasing demand for accurate and reliable training datasets, government initiatives to promote AI in healthcare, and technological advancements in data collection and annotation. Healthcare AI training datasets are vital for building effective algorithms and for improving patient care and diagnosis. These datasets include large volumes of thoroughly labeled electronic health records, images such as X-ray and MRI scans, and genomics data. They help AI systems identify trends, make forecasts, and support the development of new approaches to treating disease. Patient privacy and the ethical use of patient information are of the utmost importance, which requires rigorous anonymization and compliance with laws such as HIPAA. Continued expansion and diversification of datasets are crucial to address existing bias and to improve the performance of AI across different populations and diseases, leading to safer solutions for global health.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: The first public Twitter dataset on the 2022 MonkeyPox outbreak,” Preprints, 2022, DOI: 10.20944/preprints202206.0172.v2
Abstract
The world is currently facing an outbreak of the monkeypox virus, and confirmed cases have been reported from 28 countries. Following a recent “emergency meeting”, the World Health Organization declared monkeypox a global health emergency. As a result, people from all over the world are using social media platforms, such as Twitter, to seek and share information about the outbreak and to familiarize themselves with the guidelines and protocols that various policy-making bodies recommend to reduce the spread of the virus. This is generating tremendous amounts of Big Data related to such paradigms of social media behavior. Mining this Big Data and compiling it into a dataset can serve a wide range of use cases and applications, such as analysis of public opinions, interests, views, perspectives, attitudes, and sentiment towards this outbreak. Therefore, this work presents MonkeyPox2022Tweets, an open-access dataset of Tweets related to the 2022 monkeypox outbreak that were posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.
Data Description
The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 23rd July 2022 (the most recent date at the time of dataset upload). The Tweet IDs are presented in 6 different .txt files based on the timelines of the associated tweets. The following provides the details of these dataset files.
• Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the Tweet IDs: May 7, 2022 to May 21, 2022)
• Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the Tweet IDs: May 21, 2022 to May 27, 2022)
• Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the Tweet IDs: May 27, 2022 to June 5, 2022)
• Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the Tweet IDs: June 5, 2022 to June 11, 2022)
• Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 47718, Date Range of the Tweet IDs: June 12, 2022 to June 30, 2022)
• Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the Tweet IDs: July 1, 2022 to July 23, 2022)
The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used.
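Hydration means looking up the full tweet object for each ID through the Twitter API. As a minimal sketch of the preparation step only, the following Python reads one of the ID files and groups the IDs into batches. It assumes one Tweet ID per line, and the default batch size of 100 is an assumption based on the per-request limit of the Twitter API v2 tweet lookup endpoint, which should be checked against the current API terms. The function names are illustrative, and the API call itself is deliberately left out.

```python
from pathlib import Path
from typing import Iterator, List


def read_tweet_ids(path: str) -> List[str]:
    """Read one Tweet ID per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def batch_ids(ids: List[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive batches of at most batch_size IDs."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]
```

Each batch from `batch_ids(read_tweet_ids("TweetIDs_Part1.txt"))` can then be passed to whichever hydration tool or API client is in use. With the 13,926 IDs listed for Part 1 and a batch size of 100, this yields 140 batches.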
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary
This dataset presents a comprehensive taxonomy of 3307 values expressed by Claude (an AI assistant) across hundreds of thousands of real-world conversations. Using a novel privacy-preserving methodology, these values were extracted and classified without human reviewers accessing any conversation content. The dataset reveals patterns in how AI systems express values "in the wild" when interacting with diverse users and tasks. We're releasing this resource to advance research… See the full description on the dataset page: https://huggingface.co/datasets/Anthropic/values-in-the-wild.
Success.ai’s Commercial Real Estate Data and B2B Contact Data for Global Real Estate Professionals is a comprehensive dataset designed to connect businesses with industry leaders in real estate worldwide. With over 170M verified profiles, including work emails and direct phone numbers, this solution ensures precise outreach to agents, brokers, property developers, and key decision-makers in the real estate sector.
Utilizing advanced AI-driven validation, our data is continuously updated to maintain 99% accuracy, offering actionable insights that empower targeted marketing, streamlined sales strategies, and efficient recruitment efforts. Whether you’re engaging with top real estate executives or sourcing local property experts, Success.ai provides reliable and compliant data tailored to your needs.
Key Features of Success.ai’s Real Estate Professional Contact Data
• AI-Powered Validation: All profiles are verified using cutting-edge AI to ensure up-to-date accuracy.
• Real-Time Updates: Our database is refreshed continuously to reflect the most current information.
• Global Compliance: Fully aligned with GDPR, CCPA, and other regional regulations for ethical data use.
• API Integration: Directly integrate data into your CRM or project management systems for seamless workflows.
• Custom Flat Files: Receive detailed datasets customized to your specifications, ready for immediate application.
Why Choose Success.ai for Real Estate Contact Data?
• Best Price Guarantee: Enjoy competitive pricing that delivers exceptional value for verified, comprehensive contact data.
• Precision Targeting for Real Estate Professionals: Our dataset equips you to connect directly with real estate decision-makers, minimizing misdirected efforts and improving ROI.
Strategic Use Cases
• Lead Generation: Target qualified real estate agents and brokers to expand your network.
• Sales Outreach: Engage with property developers and executives to close high-value deals.
• Marketing Campaigns: Drive targeted campaigns tailored to real estate markets and demographics.
• Recruitment: Identify and attract top talent in real estate for your growing team.
• Market Research: Access firmographic and demographic data for in-depth industry analysis.
Data Highlights
• 170M+ Verified Professional Profiles
• 50M Work Emails
• 30M Company Profiles
• 700M Global Professional Profiles
Powerful APIs for Enhanced Functionality
• Enrichment API: Ensure your contact database remains relevant and up-to-date with real-time enrichment. Ideal for businesses seeking to maintain competitive agility in dynamic markets.
• Lead Generation API: Boost your lead generation with verified contact details for real estate professionals, supporting up to 860,000 API calls per day for robust scalability.
• Targeted Outreach for New Projects: Connect with property developers and brokers to pitch your services or collaborate on upcoming projects.
• Real Estate Marketing Campaigns: Execute personalized marketing campaigns targeting agents and clients in residential, commercial, or industrial sectors.
• Enhanced Sales Strategies: Shorten sales cycles by directly engaging with decision-makers and key stakeholders.
• Recruitment and Talent Acquisition: Access profiles of highly skilled professionals to strengthen your real estate team.
• Market Analysis and Intelligence: Leverage firmographic and demographic insights to identify trends and optimize business strategies.
Success.ai’s B2B Contact Data for Global Real Estate Professionals delivers the tools you need to connect with the right people at the right time, driving efficiency and success in your business operations. From agents and brokers to property developers and executiv...
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
In October and November 2023, researchers at the Schwartz Reisman Institute for Technology and Society and the Policy, Elections and Representation Lab at the Munk School of Global Affairs and Public Policy at the University of Toronto completed a survey on public perceptions of and attitudes toward AI. The survey was administered to over 1,000 people in each of 21 countries, for a total of 23,882 surveys conducted in 12 languages. The combined populations of the countries sampled represent a majority of the world's population.
Countries: Argentina, Australia, Brazil, Canada, Chile, China, France, Germany, India, Indonesia, Italy, Japan, Kenya, Mexico, Pakistan, Poland, Portugal, South Africa, Spain, United Kingdom, United States of America.
Languages: Chinese (Simplified), English, French, German, Indonesian, Italian, Japanese, Polish, Portuguese (Portugal), Portuguese (Brazil), Spanish (Spain), Spanish (Latin America).
The survey explored general knowledge of and attitudes toward AI. Topics included concerns about AI, safety, regulation, autonomous vehicles and AI's effect on jobs now and in the future. Participants were asked whether they are interested in or trust applications of AI for clothes, travel, grocery shopping, dating or finance. Respondents were asked about their attitudes toward the use of emerging technologies in education, the justice system, health care and immigration. Respondents were also asked about their knowledge of and experience with ChatGPT and deepfakes.
The study on the future of work was conducted by Kantar Public on behalf of the Press and Information Office of the Federal Government. During the survey period from 13 to 22 June 2023, German-speaking people aged 16 to 67 in Germany, excluding pensioners, were surveyed in online interviews (CAWI) on the following topics: current life and work situation, future expectations, the use of AI and the digitalization of the world of work, as well as attitudes towards demographic change and the shortage of skilled workers. The respondents were selected using a quota sample from an online access panel.
1. Future: general life satisfaction; satisfaction with selected aspects of life (working conditions, education, qualifications, health situation, professional remuneration, family situation, financial situation); expectations for the future: rather confident vs. rather worried about the private and professional future; rather confident vs. rather worried about the professional future of younger people or the next generation; rather confident vs. rather worried about the future of Germany; confidence vs. concern regarding the competitiveness of the German economy in various areas (digitalization and automation of the working world, climate protection goals of industry, effects of the Ukraine war on the German economy, access to important raw materials such as rare earths or metals, reliable supply of energy, number of qualified specialists, general price development, development of wages and salaries, development of pensions); probability of various future scenarios for Germany in 2030 (Germany is once again the world export champion, unemployment is at an all-time low and full employment prevails in Germany, the energy transition has already created hundreds of thousands of new jobs in German industry, Germany has emerged the strongest in the EU from the crises of the last 15 years, the price crisis has meant that politics and business have successfully set the course for the future, citizens can deal with all official matters digitally from home, German industry is much faster than expected in terms of climate targets and is already almost climate-neutral, Germany is the most popular country of immigration for foreign university graduates, the nursing shortage in Germany has been overcome thanks to the immigration of skilled workers).
2. Importance of work: importance of different areas of life (ranking); work to earn money vs. as a vocation; importance of different work characteristics (e.g. job security, adequate income, development prospects and career opportunities, etc.).
3. Professional situation: satisfaction with various aspects of work (job security, pay/income, development/career opportunities, interesting work, sufficient contact with other people, compatibility of family/private life and work, work climate/working atmosphere, further training opportunities, social recognition, meaningful and useful work); job satisfaction; expected development of working conditions in own professional field; recognition for own work from the company/employer, from colleagues, from other people from the work context, from the personal private environment, from society in general and from politics; unemployed people were asked: currently looking for a new job; assessment of chances of finding a new job; pupils, students and trainees were asked: assessment of future career opportunities; reasons for assessing career opportunities as poor (open).
4. AI: use of artificial intelligence (AI) in the world of work rather as an opportunity or rather as a danger; expected effects of AI on working conditions in own professional field (improvement, deterioration, no effects); opportunities and dangers of digitalization, AI and automation based on comparisons (all in all, digitalization leads to a greater burden on the environment, as computers, tablets, smartphones and data centers are major power guzzlers vs. all in all, digitalization protects the environment through less mobility and more efficient management; artificial intelligence and digitalization help to reduce the workload and relieve employees of repetitive and monotonous tasks vs. artificial intelligence and digitalization overburden many employees through further work intensification, and stress and burnout will increasingly be the result; artificial intelligence and digitalization will primarily lead to job losses vs. artificial intelligence and digitalization will create more new, future-proof jobs than old ones will be lost; our economy will benefit greatly from global networking through speed and efficiency gains vs. our economy is threatened by global networking by becoming more susceptible to cyberattacks and hacker attacks; digitalization will lead to new, more flexible working time models and a better work-life balance vs. digitalization will lead to a blurring of boundaries between work and leisure time and thus, above all, to more self-exploitation by employees).
5. Home office: local focus of own work currently, before the corona pandemic and during the corona pandemic (exclusively/predominantly in the company or from home, at changing work locations (company, at home, mobile on the road)); agreement with various statements on the topic of working from home (wherever possible, employers should give their employees the opportunity to work from home; working from home leads to a loss of cohesion in the company; working from home enables a better work-life balance; digital communication makes coordination processes more complicated; home office makes an important contribution to climate protection due to fewer journeys to work; home office leads to a mixture of work and leisure time and thus to a greater workload; home office leads to greater job satisfaction and thus to higher productivity; since many professions cannot be carried out in the home office, it would be fairer if everyone had to work outside the home); attitude towards a general 4-day working week (a four-day week for everyone would increase the shortage of skilled workers vs. a four-day week for everyone would increase motivation and therefore productivity).
6. Demographic change: knowledge of the meaning of the term demographic change; expected impact of demographic change on the future of Germany; opinion on the future in Germany based on alternative future scenarios (in the future, poverty in old age will increase noticeably vs. the future generation of pensioners will be wealthier than ever before; in the future, politics and elections will be increasingly determined by older people vs. the influence of the younger generation on politics will become much more important; our social security systems will continue to ensure intergenerational fairness and equalization in the future vs. the distribution conflicts between the younger and older generations will increase noticeably; future generations will have to work longer due to the shortage of skilled workers vs. people will have to work less in the future due to digitalization and automation and will be able to retire earlier).
7. Shortage of skilled workers: shortage of skilled workers in own company; additional personal burden due to shortage of skilled workers; company is doing enough to counteract the shortage of skilled workers; use of artificial intelligence (AI) in the company could compensate for the shortage of skilled workers; evaluation of various measures taken by the federal government to combat the shortage of skilled workers (improvement of training and further education opportunities, increasing the participation of women in the labor market (e.g. by expanding childcare services), more flexible working hours, offers for older skilled workers to stay in work longer, facilitating the immigration of foreign skilled workers); evaluation of the work of the federal government to combat the shortage of skilled workers; attractiveness (reputation in society) of various professions with a shortage of skilled workers (e.g. social pedagogue, nursery school teacher, etc.); job recommendation for younger people; own activity in one of the professions mentioned with a shortage of skilled workers.
Demography: sex; age; age in age groups; employment; federal state; region west/east; school education; vocational training; self-placement social class; employment status; occupation differentiated workers, employees, civil servants; industry; household size; number of children under 18 in the household; net household income (grouped); location size; party sympathy; migration background (respondent, one parent or both parents). Additionally coded were: consecutive interview number; school education head group (low, medium, high); weighting factor.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a visual annotation dataset of 5329 emoticons (meme images). The annotations were generated by multimodal large models through the glm-4v API and the step-free-api project.
Example:
0f20b31d-e019-4565-9286-fdf29cc8e144.jpg
Original 这个表情包中的内容和笑点在于它展示了一只卡通兔子,兔子的表情看起来既无奈又有些生气,配文是“活着已经够累了,上网你还要刁难我”。这句话以一种幽默的方式表达了许多人在上网时可能会遇到的挫折感或烦恼,尤其是当遇到困难或不顺心的事情时。这种对现代生活压力的轻松吐槽使得这个表情包在社交媒体上很受欢迎,人们用它来表达自己在网络世界中的疲惫感或面对困难时的幽默态度。
Translated: The content and humor of this meme lie in its depiction of a cartoon rabbit whose expression looks both helpless and a little angry. The caption reads "Living is already tiring enough, and you still give me a hard time online." The line humorously expresses the frustration or annoyance that many people experience when going online, especially when they run into difficulties or things do not go their way. This lighthearted take on the pressures of modern life has made the meme popular on social media, where people use it to express their exhaustion with the online world or to face difficulties with humor.
https://www.futurebeeai.com/policies/ai-data-license-agreement
This Canadian French Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of French speech recognition, spoken language understanding, and conversational AI systems. With 30 Hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.
Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.
The dataset features 30 Hours of dual-channel call center conversations between native Canadian French speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.
The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).
These real-world interactions help build speech models that understand healthcare domain nuances and user intent.
Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.
Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.
This dataset can be used across a range of healthcare and voice AI use cases:
https://www.verifiedmarketresearch.com/privacy-policy/
Blockchain AI Market size was valued at USD 448 Million in 2023 and is projected to reach USD 2730 Million by 2031, at a CAGR of 25.5% from 2024 to 2031.
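For readers who want to sanity-check figures like these, the compound annual growth rate implied by two values is (end / start)^(1 / years) - 1. The sketch below applies that standard formula. The function name is illustrative, and the choice of 8 annual periods (2023 to 2031) is an assumption, since the report states the CAGR for 2024 to 2031 without giving a 2024 base value.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.25 means 25%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# USD 448 million (2023) to USD 2730 million (2031), assuming 8 annual periods.
implied = cagr(448.0, 2730.0, 8)
print(f"Implied CAGR: {implied:.1%}")
```

Under that assumption the implied rate comes out at roughly 25%, in line with the stated 25.5%. A different period count gives a noticeably different rate, so the base year matters when comparing reports.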
Global Blockchain AI Market Drivers
The market drivers for the Blockchain AI Market can be influenced by various factors. These may include:
• Enhanced Data Security: By offering a decentralized and immutable record for information sharing and archiving, the combination of blockchain technology and artificial intelligence improves data security. This secure infrastructure is especially valuable for sensitive information in supply chain management, banking, and healthcare.
• Increased Adoption of AI: As AI is used in more industries, there is a greater need for blockchain-based solutions that address data transparency and integrity. Blockchain technology supports the quality and dependability of AI-powered services and apps by verifying the legitimacy of the data used to train AI algorithms.
• Growing Worries About Data Privacy: Because of growing concern about data privacy and ownership, organizations are investigating blockchain AI solutions that provide more control over data access and usage. Blockchain gives people control over their data while allowing AI algorithms to access it selectively for processing and analysis.
• Demand for Transparent and Reliable AI Systems: Companies and customers alike are looking for reliable and transparent AI systems that can shed light on the decision-making process. Blockchain technology makes it possible to record the decisions and actions of AI algorithms transparently, which promotes confidence in AI-powered systems.
• Need for Decentralized AI Marketplaces: Blockchain technology is enabling the development of decentralized AI marketplaces, which are democratizing access to AI datasets and algorithms. These marketplaces enable peer-to-peer exchange and cooperation, allowing businesses and developers to share AI resources profitably and effectively.
• Regulatory Compliance Requirements: The adoption of blockchain AI solutions is being driven by regulatory mandates, such as the GDPR (General Data Protection Regulation) in Europe and HIPAA (Health Insurance Portability and Accountability Act) in the healthcare industry, to ensure compliance with data protection regulations. The transparent data governance offered by blockchain's immutability and auditability features facilitates regulatory compliance.
• Growing Interest in Federated Learning: Federated learning, a distributed machine learning approach that trains AI models across many decentralized devices, is gaining interest due to privacy concerns and data localization requirements. Blockchain technology can provide data privacy, integrity, and incentives among participating nodes, which can enable safe and effective federated learning.
• Extension of DAOs and Smart Contracts: Combining AI systems with smart contracts and decentralized autonomous organizations (DAOs) makes automated, trustless decision-making and agreement execution possible. Smart contracts built on the blockchain can carry out predetermined scenarios and transactions based on insights generated by artificial intelligence, simplifying corporate processes and lowering dependency on middlemen.
• Emergence of AI-Driven Token Economies: The convergence of blockchain and AI technology is fueling the emergence of AI-driven token economies, in which tokens are used as incentives for sharing data, training models, and improving algorithms. These token economies ensure equitable reward for contributions while encouraging cooperation and creativity in AI research and development.
• Partnerships and Cross-Industry Collaboration: The adoption of blockchain AI solutions is being accelerated by partnerships and cross-industry collaboration among research institutions, industry consortia, and technology vendors. Inter-industry collaborations enable the sharing of knowledge, assets, and best practices, promoting the advancement of blockchain AI solutions that are both interoperable and scalable.
https://www.futurebeeai.com/policies/ai-data-license-agreement
This Russian Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of Russian speech recognition, spoken language understanding, and conversational AI systems. With 30 Hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.
Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.
The dataset features 30 Hours of dual-channel call center conversations between native Russian speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.
The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).
These real-world interactions help build speech models that understand healthcare domain nuances and user intent.
Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.
Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.
This dataset can be used across a range of healthcare and voice AI use cases:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NeSy4VRD
NeSy4VRD is a multifaceted, multipurpose resource designed to foster neurosymbolic AI (NeSy) research, particularly NeSy research using Semantic Web technologies such as OWL ontologies, OWL-based knowledge graphs and OWL-based reasoning as symbolic components. The NeSy4VRD research resource pertains to the computer vision field of AI and, within that field, to the application tasks of visual relationship detection (VRD) and scene graph generation.
Whilst the core motivation of the NeSy4VRD research resource is to foster computer vision-based NeSy research using Semantic Web technologies such as OWL ontologies and OWL-based knowledge graphs, AI researchers can readily use NeSy4VRD to either: 1) pursue computer vision-based NeSy research without involving Semantic Web technologies as symbolic components, or 2) pursue computer vision research without NeSy (i.e. pursue research that focuses purely on deep learning alone, without involving symbolic components of any kind). This is the sense in which we describe NeSy4VRD as being multipurpose: it can readily be used by diverse groups of computer vision-based AI researchers with diverse interests and objectives.
The NeSy4VRD research resource in its entirety is distributed across two locations: Zenodo and GitHub.
NeSy4VRD on Zenodo: the NeSy4VRD dataset package
This entry on Zenodo hosts the NeSy4VRD dataset package, which includes the NeSy4VRD dataset and its companion NeSy4VRD ontology, an OWL ontology called VRD-World.
The NeSy4VRD dataset consists of an image dataset with associated visual relationship annotations. The images of the NeSy4VRD dataset are the same as those that were once publicly available as part of the VRD dataset. The NeSy4VRD visual relationship annotations are a highly customised and quality-improved version of the original VRD visual relationship annotations. The NeSy4VRD dataset is designed for computer vision-based research that involves detecting objects in images and predicting relationships between ordered pairs of those objects. A visual relationship for an image of the NeSy4VRD dataset has the form <'subject', 'predicate', 'object'>, where the 'subject' and 'object' are two objects in the image, and the 'predicate' describes some relation between them. Both the 'subject' and 'object' objects are specified in terms of bounding boxes and object classes. For example, representative annotated visual relationships are <'person', 'ride', 'horse'>, <'hat', 'on', 'teddy bear'> and <'cat', 'under', 'pillow'>.
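To make the shape of such an annotation concrete, here is a minimal Python sketch of one visual relationship as a data structure. The class names, field names, bounding-box layout and coordinate values are all hypothetical and chosen for illustration; they are not the NeSy4VRD file format, which should be taken from the dataset package itself.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AnnotatedObject:
    """An object in an image: an object class plus a bounding box."""
    object_class: str
    # Hypothetical box layout (xmin, ymin, xmax, ymax) in pixels.
    bbox: Tuple[int, int, int, int]


@dataclass(frozen=True)
class VisualRelationship:
    """An ordered <subject, predicate, object> relationship for one image."""
    subject: AnnotatedObject
    predicate: str
    obj: AnnotatedObject

    def as_triple(self) -> Tuple[str, str, str]:
        """Return the relationship as class-level labels only."""
        return (self.subject.object_class, self.predicate, self.obj.object_class)


# Illustrative values only; the coordinates are made up.
rel = VisualRelationship(
    subject=AnnotatedObject("person", (40, 20, 180, 300)),
    predicate="ride",
    obj=AnnotatedObject("horse", (10, 150, 260, 420)),
)
print(rel.as_triple())  # ('person', 'ride', 'horse')
```

Keeping the subject and object as separate fields preserves the ordering of the pair, which matters because <'person', 'ride', 'horse'> and <'horse', 'ride', 'person'> are different relationships.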
Visual relationship detection is pursued as a computer vision application task in its own right, and as a building block capability for the broader application task of scene graph generation. Scene graph generation, in turn, is commonly used as a precursor to a variety of enriched, downstream visual understanding and reasoning application tasks, such as image captioning, visual question answering, image retrieval, image generation and multimedia event processing.
The NeSy4VRD ontology, VRD-World, is a rich, well-aligned, companion OWL ontology engineered specifically for use with the NeSy4VRD dataset. It directly describes the domain of the NeSy4VRD dataset, as reflected in the NeSy4VRD visual relationship annotations. More specifically, all of the object classes that feature in the NeSy4VRD visual relationship annotations have corresponding classes within the VRD-World OWL class hierarchy, and all of the predicates that feature in the NeSy4VRD visual relationship annotations have corresponding properties within the VRD-World OWL object property hierarchy. The rich structure of the VRD-World class hierarchy and the rich characteristics and relationships of the VRD-World object properties together give the VRD-World OWL ontology rich inference semantics. These provide ample opportunity for OWL reasoning to be meaningfully exercised and exploited in NeSy research that uses OWL ontologies and OWL-based knowledge graphs as symbolic components. There is also ample potential for NeSy researchers to explore supplementing the OWL reasoning capabilities afforded by the VRD-World ontology with Datalog rules and reasoning.
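As a rough illustration of what inference over an object property hierarchy can look like, the following pure-Python sketch derives extra triples from sub-property links, in the spirit of sub-property entailment. The hierarchy shown is hypothetical and is not taken from VRD-World, and a real workflow would use an OWL reasoner rather than this hand-rolled loop.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical sub-property links (child -> parent), NOT taken from VRD-World.
SUB_PROPERTY_OF: Dict[str, str] = {"ride": "on", "sit on": "on"}


def infer_superproperty_triples(triples: Set[Triple]) -> Set[Triple]:
    """Add (s, q, o) for each asserted (s, p, o) where p is a sub-property of q."""
    inferred = set(triples)
    for s, p, o in triples:
        seen = {p}  # guards against accidental cycles in the hierarchy
        parent = SUB_PROPERTY_OF.get(p)
        while parent is not None and parent not in seen:
            inferred.add((s, parent, o))
            seen.add(parent)
            parent = SUB_PROPERTY_OF.get(parent)
    return inferred


asserted = {("person", "ride", "horse")}
print(sorted(infer_superproperty_triples(asserted)))
```

Under the assumed hierarchy, the asserted relationship <'person', 'ride', 'horse'> also yields <'person', 'on', 'horse'>. Entailments of this kind are what a reasoner would add to a knowledge graph built from the annotations.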
Use of the NeSy4VRD ontology, VRD-World, in conjunction with the NeSy4VRD dataset is purely optional. Computer vision AI researchers who have no interest in NeSy, or NeSy researchers who have no interest in OWL ontologies and OWL-based knowledge graphs, can ignore the NeSy4VRD ontology and use the NeSy4VRD dataset by itself.
All computer vision-based AI research user groups can, if they wish, also avail themselves of the other components of the NeSy4VRD research resource available on GitHub.
NeSy4VRD on GitHub: open source infrastructure supporting extensibility, and sample code
The NeSy4VRD research resource incorporates additional components that are companions to the NeSy4VRD dataset package here on Zenodo. These companion components are available at NeSy4VRD on GitHub. These companion components consist of:
The NeSy4VRD infrastructure supporting extensibility consists of:
The purpose behind providing comprehensive infrastructure to support extensibility of the NeSy4VRD visual relationship annotations is to make it easy for researchers to take the NeSy4VRD dataset in new directions, by further enriching the annotations, or by tailoring them to introduce new or more data conditions that better suit their particular research needs and interests. The option to use the NeSy4VRD extensibility infrastructure in this way applies equally well to each of the diverse potential NeSy4VRD user groups already mentioned.
The NeSy4VRD extensibility infrastructure, however, may be of particular interest to NeSy researchers interested in using the NeSy4VRD ontology, VRD-World, in conjunction with the NeSy4VRD dataset. These researchers can of course tailor the VRD-World ontology if they wish without needing to modify or extend the NeSy4VRD visual relationship annotations in any way. But their degrees of freedom for doing so will be limited by the need to maintain alignment with the NeSy4VRD visual relationship annotations and the particular set of object classes and predicates to which they refer. If NeSy researchers want full freedom to tailor the VRD-World ontology, they may well need to tailor the NeSy4VRD visual relationship annotations first, in order that alignment be maintained.
To illustrate our point, and to illustrate our vision of how the NeSy4VRD extensibility infrastructure can be used, let us consider a simple example. It is common in computer vision to distinguish between thing objects (that have well-defined shapes) and stuff objects (that are amorphous). Suppose a researcher wishes to have a greater number of stuff object classes with which to work. Water is such a stuff object. Many VRD images contain water but it is not currently one of the annotated object classes and hence is never referenced in any visual relationship annotations. So adding a Water class to the class hierarchy of the VRD-World ontology would be pointless because it would never acquire any instances (because an object detector would never detect any). However, our hypothetical researcher could choose to do the following:
https://www.futurebeeai.com/policies/ai-data-license-agreement
The English General Domain Chat Dataset is a high-quality, text-based dataset designed to train and evaluate conversational AI, NLP models, and smart assistants in real-world English usage. Collected through FutureBeeAI’s trusted crowd community, this dataset reflects natural, native-level English conversations covering a broad spectrum of everyday topics.
This dataset includes over 15000 chat transcripts, each featuring free-flowing dialogue between two native English speakers. The conversations are spontaneous, context-rich, and mimic informal, real-life texting behavior.
Conversations span a wide variety of general-domain topics to ensure comprehensive model exposure:
This diversity ensures the dataset is useful across multiple NLP and language understanding applications.
Chats reflect informal, native-level English usage with:
Every chat instance is accompanied by structured metadata, which includes:
This metadata supports model filtering, demographic-specific evaluation, and more controlled fine-tuning workflows.
All chat records pass through a rigorous QA process to maintain consistency and accuracy:
This ensures a clean, reliable dataset ready for high-performance AI model training.
This dataset is ideal for training and evaluating a wide range of text-based AI systems:
Success.ai offers a comprehensive, enterprise-ready B2B leads data solution, ideal for businesses seeking access to over 150 million verified employee profiles and 170 million work emails. Our data empowers organizations across industries to target key decision-makers, optimize recruitment, and fuel B2B marketing efforts. Whether you're looking for UK B2B data, B2B marketing data, or global B2B contact data, Success.ai provides the insights you need with pinpoint accuracy.
Tailored for B2B Sales, Marketing, Recruitment and more: Our B2B contact data and B2B email data solutions are designed to enhance your lead generation, sales, and recruitment efforts. Build hyper-targeted lists based on job title, industry, seniority, and geographic location. Whether you’re reaching mid-level professionals or C-suite executives, Success.ai delivers the data you need to connect with the right people.
API Features:
Key Categories Served: B2B sales leads – Identify decision-makers in key industries, B2B marketing data – Target professionals for your marketing campaigns, Recruitment data – Source top talent efficiently and reduce hiring times, CRM enrichment – Update and enhance your CRM with verified, updated data, Global reach – Coverage across 195 countries, including the United States, United Kingdom, Germany, India, Singapore, and more.
Global Coverage with Real-Time Accuracy: Success.ai’s dataset spans a wide range of industries such as technology, finance, healthcare, and manufacturing. With continuous real-time updates, your team can rely on the most accurate data available: 150M+ Employee Profiles: Access professional profiles worldwide with insights including full name, job title, seniority, and industry. 170M Verified Work Emails: Reach decision-makers directly with verified work emails, available across industries and geographies, including Singapore and UK B2B data. GDPR-Compliant: Our data is fully compliant with GDPR and other global privacy regulations, ensuring safe and legal use of B2B marketing data.
Key Data Points for Every Employee Profile: Every profile in Success.ai’s database includes over 20 critical data points, providing the information needed to power B2B sales and marketing campaigns: Full Name, Job Title, Company, Work Email, Location, Phone Number, LinkedIn Profile, Experience, Education, Technographic Data, Languages, Certifications, Industry, Publications & Awards.
Use Cases Across Industries: Success.ai’s B2B data solution is incredibly versatile and can support various enterprise use cases, including: B2B Marketing Campaigns: Reach high-value professionals in industries such as technology, finance, and healthcare. Enterprise Sales Outreach: Build targeted B2B contact lists to improve sales efforts and increase conversions. Talent Acquisition: Accelerate hiring by sourcing top talent with accurate and updated employee data, filtered by job title, industry, and location. Market Research: Gain insights into employment trends and company profiles to enrich market research. CRM Data Enrichment: Ensure your CRM stays accurate by integrating updated B2B contact data. Event Targeting: Create lists for webinars, conferences, and product launches by targeting professionals in key industries.
Use Cases for Success.ai's Contact Data
- Targeted B2B Marketing: Create precise campaigns by targeting key professionals in industries like tech and finance.
- Sales Outreach: Build focused sales lists of decision-makers and C-suite executives for faster deal cycles.
- Recruiting Top Talent: Easily find and hire qualified professionals with updated employee profiles.
- CRM Enrichment: Keep your CRM current with verified, accurate employee data.
- Event Targeting: Create attendee lists for events by targeting relevant professionals in key sectors.
- Market Research: Gain insights into employment trends and company profiles for better business decisions.
- Executive Search: Source senior executives and leaders for headhunting and recruitment.
- Partnership Building: Find the right companies and key people to develop strategic partnerships.
Why Choose Success.ai’s Employee Data? Success.ai is the top choice for enterprises looking for comprehensive and affordable B2B data solutions. Here’s why: Unmatched Accuracy: Our AI-powered validation process ensures 99% accuracy across all data points, resulting in higher engagement and fewer bounces. Global Scale: With 150M+ employee profiles and 170M veri...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: According to the World Health Organization (WHO), dementia is the seventh leading cause of death among all diseases and one of the leading causes of disability among the world’s elderly people. The number of Alzheimer’s patients is rising day by day. Given this increasing rate and the associated dangers, Alzheimer’s disease should be diagnosed carefully. Machine learning is a promising technique for Alzheimer’s diagnosis, but general users do not trust machine learning models because of their black-box nature. Moreover, some of those models do not deliver the best performance because they use only neuroimaging data.
Objective: To solve these issues, this paper proposes a novel explainable Alzheimer’s disease prediction model using a multimodal dataset. This approach performs a data-level fusion using clinical data, MRI segmentation data, and psychological data. Currently, however, there is very little understanding of multimodal five-class classification of Alzheimer’s disease.
Method: To predict the five classes, nine popular machine learning models are used: Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), Multi-Layer Perceptron (MLP), K-Nearest Neighbor (KNN), Gradient Boosting (GB), Adaptive Boosting (AdaB), Support Vector Machine (SVM), and Naive Bayes (NB). Among these models, RF achieved the highest score. For explainability, SHapley Additive exPlanations (SHAP) is used in this research work.
Results and conclusions: The performance evaluation demonstrates that the RF classifier has a 10-fold cross-validation accuracy of 98.81% for predicting Alzheimer’s disease, cognitively normal, non-Alzheimer’s dementia, uncertain dementia, and others. In addition, the study utilized Explainable Artificial Intelligence based on the SHAP model and analyzed the causes of prediction.
To the best of our knowledge, we are the first to present this multimodal (Clinical, Psychological, and MRI segmentation data) five-class classification of Alzheimer’s disease using Open Access Series of Imaging Studies (OASIS-3) dataset. Besides, a novel Alzheimer’s patient management architecture is also proposed in this work.
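The 10-fold cross-validation accuracy reported above is computed by holding out each of ten folds once and averaging the per-fold accuracies. The sketch below shows that procedure under loud assumptions: it uses a simple nearest-centroid classifier as a stand-in (not the paper's Random Forest), synthetic five-class toy data (not OASIS-3), and hypothetical function names.

```python
import random
from collections import defaultdict


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    sums, counts = {}, defaultdict(int)
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, features))
    return min(centroids, key=lambda label: dist(centroids[label]))


def cross_val_accuracy(X, y, k=10, seed=0):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Synthetic, well-separated five-class toy data (not OASIS-3).
rng = random.Random(42)
X, y = [], []
for label in range(5):
    for _ in range(40):
        X.append([label * 10 + rng.gauss(0, 1), label * 10 + rng.gauss(0, 1)])
        y.append(label)

print(f"10-fold CV accuracy: {cross_val_accuracy(X, y):.3f}")
```

The model is refit from scratch for every fold, so no held-out sample ever influences the centroids used to classify it.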
https://www.futurebeeai.com/policies/ai-data-license-agreement
This Mandarin Chinese Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for Mandarin-speaking travelers.
Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.
The dataset includes 30 hours of dual-channel audio recordings between native Mandarin Chinese speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.
Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).
These scenarios help models understand and respond to diverse traveler needs in real-time.
Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.
Extensive metadata enriches each call and speaker for better filtering and AI training:
This dataset is ideal for a variety of AI use cases in the travel and tourism space:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘World Happiness Report 2019’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/PromptCloudHQ/world-happiness-report-2019 on 30 September 2021.
--- Dataset description provided by original source is as follows ---
The data has been released by SDSN and extracted by PromptCloud's custom web crawling solution.
The World Happiness Report is a landmark survey of the state of global happiness that ranks 156 countries by how happy their citizens perceive themselves to be. This year’s World Happiness Report focuses on happiness and the community: how happiness has evolved over the past dozen years, with a focus on the technologies, social norms, conflicts and government policies that have driven those changes.
What is Dystopia?
Dystopia is an imaginary country that has the world’s least-happy people. The purpose in establishing Dystopia is to have a benchmark against which all countries can be favorably compared (no country performs more poorly than Dystopia) in terms of each of the six key variables, thus allowing each sub-bar to be of positive (or zero, in six instances) width. The lowest scores observed for the six key variables, therefore, characterize Dystopia. Since life would be very unpleasant in a country with the world’s lowest incomes, lowest life expectancy, lowest generosity, most corruption, least freedom, and least social support, it is referred to as “Dystopia,” in contrast to Utopia.
What are the residuals?
The residuals, or unexplained components, differ for each country, reflecting the extent to which the six variables either over- or under-explain average 2016-2018 life evaluations. These residuals have an average value of approximately zero over the whole set of countries. Figure 2.7 shows the average residual for each country if the equation in Table 2.1 is applied to average 2016-2018 data for the six variables in that country. We combine these residuals with the estimate for life evaluations in Dystopia so that the combined bar will always have positive values. As can be seen in Figure 2.7, although some life evaluation residuals are quite large, occasionally exceeding one point on the scale from 0 to 10, they are always much smaller than the calculated value in Dystopia, where the average life is rated at 1.88 on the 0 to 10 scale. Table 7 of the online Statistical Appendix 1 for Chapter 2 puts the Dystopia plus residual block at the left side, and also draws the Dystopia line, making it easy to compare the signs and sizes of the residuals in different countries.
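The arithmetic behind the "Dystopia plus residual" block can be sketched as follows. This is an illustration under assumptions: it treats a country's life evaluation as the sum of the six variables' contributions plus that block, the function and key names are hypothetical, and every country figure is invented. Only the 1.88 Dystopia value comes from the text.

```python
DYSTOPIA_BASELINE = 1.88  # average life evaluation in Dystopia, as stated in the text


def decompose(life_evaluation, contributions):
    """Split a score into explained part, 'Dystopia + residual' block, and residual."""
    explained = sum(contributions.values())
    dystopia_plus_residual = life_evaluation - explained
    residual = dystopia_plus_residual - DYSTOPIA_BASELINE
    return explained, dystopia_plus_residual, residual


# Hypothetical country: every number below is invented for illustration.
contributions = {
    "gdp_per_capita": 1.2,
    "social_support": 1.4,
    "healthy_life_expectancy": 0.9,
    "freedom": 0.5,
    "generosity": 0.2,
    "perceptions_of_corruption": 0.3,
}
explained, block, residual = decompose(7.0, contributions)
print(f"explained={explained:.2f} dystopia+residual={block:.2f} residual={residual:.2f}")
```

In this made-up case the residual is positive (about 0.62), meaning the six variables under-explain the country's life evaluation. A negative residual would shrink the combined block but, as the text notes, not below zero, because residuals are much smaller than 1.88.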
Why do we use these six factors to explain life evaluations?
The variables used reflect what has been broadly found in the research literature to be important in explaining national-level differences in life evaluations. Some important variables, such as unemployment or inequality, do not appear because comparable international data are not yet available for the full sample of countries. The variables are intended to illustrate important lines of correlation rather than to reflect clean causal estimates, since some of the data are drawn from the same survey sources, some are correlated with each other (or with other important factors for which we do not have measures), and in several instances there are likely to be two-way relations between life evaluations and the chosen variables (for example, healthy people are overall happier, but as Chapter 4 in the World Happiness Report 2013 demonstrated, happier people are overall healthier). In Statistical Appendix 1 of World Happiness Report 2018, we assessed the possible importance of using explanatory data from the same people whose life evaluations are being explained. We did this by randomly dividing the samples into two groups, and using the average values for, e.g., freedom gleaned from one group to explain the life evaluations of the other group. This lowered the effects, but only very slightly (e.g. 2% to 3%), assuring us that using data from the same individuals is not seriously affecting the results.
Data source: http://worldhappiness.report/ed/2019/
More such datasets can be downloaded from DataStock.
--- Original source retains full ownership of the source dataset ---
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the South Asian Human Facial Images Dataset, curated to advance facial recognition technology and support the development of secure biometric identity systems, KYC verification processes, and AI-driven computer vision applications. This dataset is designed to serve as a robust foundation for real-world face matching and recognition use cases.
The dataset contains over 8,000 facial image sets of South Asian individuals. Each set includes:
All images were captured with real-world variability to enhance dataset robustness:
Each participant’s data is accompanied by rich metadata to support AI model training, including:
This metadata enables targeted filtering and training across diverse scenarios.
This dataset is ideal for a wide range of AI and biometric applications:
To meet evolving AI demands, this dataset is regularly updated and can be customized. Available options include:
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Understanding student performance is key to improving education systems and learning outcomes. This synthetic dataset is designed to simulate real-world academic data, enabling researchers, educators, and data scientists to analyze factors influencing student achievement in a structured and ethical manner.
With AI-generated records, this dataset provides insights into how demographic attributes, academic performance, and attendance patterns interact to shape student success.
🔍 Key Features:
✔️ Demographics & Grade Levels – Understand how age, gender, and grade level influence academic outcomes
✔️ Subject-Specific Performance – Modeled Math, Reading, and Writing scores for detailed analysis
✔️ Attendance Records – Explore the correlation between school presence and academic success
✔️ Comprehensive Student Data – Synthetic records designed for educational research and machine learning applications
📊 Dataset Overview: This dataset has been synthetically generated and does not contain real-world data. It is intended for educational purposes, machine learning practice, and exploratory data analysis related to student performance.
📖 Columns Description:
Student_ID – Unique identifier for each synthetic student
Gender – Simulated gender representation
Age – Modeled student age
Grade_Level – Academic level of the student
Math_Score, Reading_Score, Writing_Score – Simulated subject-wise scores
Attendance – Modeled school attendance record
⚠️ Disclaimer: This dataset is completely synthetic and should not be used for real-world educational policy-making, student assessments, or institutional reporting. It serves as a safe, ethical resource for learning, research, and model development.
🔹 Use this dataset to explore student performance trends, build predictive models, and gain insights into educational success factors! 🎯📊
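As a starting point for the attendance analysis mentioned above, the sketch below reads rows with the listed column names and computes a Pearson correlation between Attendance and each score column. The four rows are made up for illustration, and the value formats (for example, Attendance as a percentage) are assumptions, since the description does not specify them.

```python
import csv
import io
import math

# Made-up rows that follow the listed column names; the value formats
# (e.g. Attendance as a percentage) are assumptions, not taken from the dataset.
SAMPLE_CSV = """Student_ID,Gender,Age,Grade_Level,Math_Score,Reading_Score,Writing_Score,Attendance
S001,F,15,10,78,82,80,95
S002,M,16,11,64,70,66,81
S003,F,14,9,88,91,90,98
S004,M,15,10,55,60,58,72
"""


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
attendance = [float(r["Attendance"]) for r in rows]
for column in ("Math_Score", "Reading_Score", "Writing_Score"):
    scores = [float(r[column]) for r in rows]
    print(column, round(pearson(attendance, scores), 3))
```

To run this against the real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle. Any correlation found describes only how the synthetic generator was configured, in line with the disclaimer above.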
https://www.futurebeeai.com/policies/ai-data-license-agreement
The Hindi Healthcare Chat Dataset is a rich collection of over 12,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in Hindi-speaking regions.
The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:
This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.
This dataset reflects the natural flow of Hindi healthcare communication and includes:
These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.
Conversations range from simple inquiries to complex advisory sessions, including:
Each conversation typically includes these structural components:
This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.
Available in JSON, CSV, and TXT formats, each conversation includes: