100+ datasets found

Medical Data Collection
gts.ai
json
Updated Jul 19, 2022
Cite
GTS (2022). Medical Data Collection [Dataset]. https://gts.ai/case-study/medical-data-collection/
Explore at:
Available download formats: json
Dataset updated
Jul 19, 2022
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Guide to Medical Data Collection: key techniques, ethics, and tech advancements reshaping healthcare data management for improved care.
Data Collection And Labeling Report
datainsightsmarket.com
doc, pdf, ppt
Updated Aug 12, 2025
Cite
Data Insights Market (2025). Data Collection And Labeling Report [Dataset]. https://www.datainsightsmarket.com/reports/data-collection-and-labeling-1415734
Explore at:
Available download formats: pdf, doc, ppt
Dataset updated
Aug 12, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Data Collection and Labeling market is experiencing robust growth, driven by the increasing demand for high-quality training data to fuel the advancements in artificial intelligence (AI) and machine learning (ML) technologies. The market's expansion is fueled by the burgeoning adoption of AI across diverse sectors, including healthcare, automotive, finance, and retail. Companies are increasingly recognizing the critical role of accurate and well-labeled data in developing effective AI models. This has led to a surge in outsourcing data collection and labeling tasks to specialized companies, contributing to the market's expansion. The market is segmented by data type (image, text, audio, video), labeling technique (supervised, unsupervised, semi-supervised), and industry vertical. We project a steady CAGR of 20% for the period 2025-2033, reflecting continued strong demand across various applications. Key trends include the increasing use of automation and AI-powered tools to streamline the data labeling process, resulting in higher efficiency and lower costs. The growing demand for synthetic data generation is also emerging as a significant trend, alleviating concerns about data privacy and scarcity. However, challenges remain, including data bias, ensuring data quality, and the high cost associated with manual labeling for complex datasets. These restraints are being addressed through technological innovations and improvements in data management practices. The competitive landscape is characterized by a mix of established players and emerging startups. Companies like Scale AI, Appen, and others are leading the market, offering comprehensive solutions that span data collection, annotation, and model validation. The presence of numerous companies suggests a fragmented yet dynamic market, with ongoing competition driving innovation and service enhancements. 
The geographical distribution of the market is expected to be broad, with North America and Europe currently holding significant market share, followed by Asia-Pacific showing robust growth potential. Future growth will depend on technological advancements, increasing investment in AI, and the emergence of new applications that rely on high-quality data.
TREC 2022 Deep Learning test collection
catalog.data.gov
gimi9.com
+1 more
Updated May 9, 2023
Cite
National Institute of Standards and Technology (2023). TREC 2022 Deep Learning test collection [Dataset]. https://catalog.data.gov/dataset/trec-2022-deep-learning-test-collection
Explore at:
Dataset updated
May 9, 2023
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
This is a test collection for passage and document retrieval, produced in the TREC 2022 Deep Learning track. The Deep Learning Track studies information retrieval in a large training data regime. This is the case where the number of training queries with at least one positive label is at least in the tens of thousands, if not hundreds of thousands or more. This corresponds to real-world scenarios such as training based on click logs and training based on labels from shallow pools (such as the pooling in the TREC Million Query Track or the evaluation of search engines based on early precision).

Certain machine learning based methods, such as methods based on deep learning, are known to require very large datasets for training. Lack of such large-scale datasets has been a limitation for developing such methods for common information retrieval tasks, such as document ranking. The Deep Learning Track organized in the previous years aimed at providing large-scale datasets to TREC and creating a focused research effort with a rigorous blind evaluation of rankers for the passage ranking and document ranking tasks.

Similar to the previous years, one of the main goals of the track in 2022 is to study what methods work best when a large amount of training data is available. For example, do the same methods that work on small data also work on large data? How much do methods improve when given more training data? What external data and models can be brought to bear in this scenario, and how useful is it to combine full supervision with other forms of supervision?

The collection contains 12 million web pages, 138 million passages from those web pages, search queries, and relevance judgments for the queries.
Machine learning algorithm validation with a limited sample size
plos.figshare.com
text/x-python
Updated May 30, 2023
Cite
Andrius Vabalas; Emma Gowen; Ellen Poliakoff; Alexander J. Casson (2023). Machine learning algorithm validation with a limited sample size [Dataset]. http://doi.org/10.1371/journal.pone.0224365
Explore at:
Available download formats: text/x-python
Unique identifier
https://doi.org/10.1371/journal.pone.0224365
Dataset updated
May 30, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Andrius Vabalas; Emma Gowen; Ellen Poliakoff; Alexander J. Casson
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Advances in neuroimaging, genomic, motion tracking, eye-tracking and many other technology-based data collection methods have led to a torrent of high dimensional datasets, which commonly have a small number of samples because of the intrinsic high cost of data collection involving human participants. High dimensional data with a small number of samples is of critical importance for identifying biomarkers and conducting feasibility and pilot work; however, it can lead to biased machine learning (ML) performance estimates. Our review of studies which have applied ML to predict autistic from non-autistic individuals showed that small sample size is associated with higher reported classification accuracy. Thus, we have investigated whether this bias could be caused by the use of validation methods which do not sufficiently control overfitting. Our simulations show that K-fold Cross-Validation (CV) produces strongly biased performance estimates with small sample sizes, and the bias is still evident with a sample size of 1000. Nested CV and train/test split approaches produce robust and unbiased performance estimates regardless of sample size. We also show that feature selection, if performed on pooled training and testing data, contributes to bias considerably more than parameter tuning. In addition, the contribution to bias by data dimensionality, hyper-parameter space and number of CV folds was explored, and validation methods were compared with discriminable data. The results suggest how to design robust testing methodologies when working with small datasets and how to interpret the results of other studies based on what validation method was used.
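The effect of selecting features on pooled training and testing data can be illustrated with a small simulation. The sketch below is not the authors' code: the sample size, feature count, nearest-centroid classifier, and all function names are assumptions chosen only to keep the example short and dependency-free. The data are pure noise, so any accuracy well above chance reflects the validation procedure rather than real signal.

```python
import random

def top_k_features(X, y, k):
    """Rank features by absolute difference of class means; return the k best indices."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        scores.append((abs(mean_pos - mean_neg), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def centroid_accuracy(X_train, y_train, X_test, y_test, feats):
    """Nearest-centroid classifier restricted to the selected features."""
    def centroid(label):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        return [sum(r[j] for r in rows) / len(rows) for j in feats]
    c0, c1 = centroid(0), centroid(1)
    correct = 0
    for row, label in zip(X_test, y_test):
        d0 = sum((row[j] - c) ** 2 for j, c in zip(feats, c0))
        d1 = sum((row[j] - c) ** 2 for j, c in zip(feats, c1))
        correct += int((1 if d1 < d0 else 0) == label)
    return correct / len(X_test)

def cross_validate(X, y, k_feats, n_folds, leaky):
    """K-fold CV; if leaky, features are chosen once on the pooled data."""
    n = len(X)
    fold_size = n // n_folds
    pooled = top_k_features(X, y, k_feats) if leaky else None
    accs = []
    for f in range(n_folds):
        test_idx = set(range(f * fold_size, (f + 1) * fold_size))
        train_idx = [i for i in range(n) if i not in test_idx]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        X_te = [X[i] for i in sorted(test_idx)]
        y_te = [y[i] for i in sorted(test_idx)]
        # Honest variant: selection sees only the training part of each fold.
        feats = pooled if leaky else top_k_features(X_tr, y_tr, k_feats)
        accs.append(centroid_accuracy(X_tr, y_tr, X_te, y_te, feats))
    return sum(accs) / len(accs)

rng = random.Random(0)
n_samples, n_features = 40, 1000          # assumed sizes: few samples, many features
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [i % 2 for i in range(n_samples)]     # balanced labels unrelated to X

leaky_acc = cross_validate(X, y, k_feats=10, n_folds=5, leaky=True)
honest_acc = cross_validate(X, y, k_feats=10, n_folds=5, leaky=False)
print(f"selection on pooled data:   {leaky_acc:.2f}")
print(f"selection inside each fold: {honest_acc:.2f}")
```

Because the labels carry no information, the honest estimate should hover around chance, while the pooled-selection estimate tends to look much better. Nested CV applies the same idea to every tuning step, not only to feature selection.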
Speech Recognition Data Collection Services | 100+ Languages Resources...
datarade.ai
Updated Dec 28, 2023
Cite
Nexdata (2023). Speech Recognition Data Collection Services | 100+ Languages Resources |Audio Data | Speech Data | Machine Learning (ML) Data [Dataset]. https://datarade.ai/data-products/nexdata-speech-recognition-data-collection-services-100-nexdata
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Dec 28, 2023
Dataset authored and provided by
Nexdata
Area covered
Cambodia, Estonia, Sri Lanka, Malaysia, Austria, Brazil, United Kingdom, Lithuania, Haiti, El Salvador
Description
Overview
With extensive experience in speech recognition, Nexdata has a resource pool covering more than 50 countries and regions. Our linguist team works closely with clients to assist them with dictionary and text corpus construction, speech quality inspection, linguistics consulting, etc.

Our Capacity
-Global Resources: Global resources covering hundreds of languages worldwide
-Compliance: All the Machine Learning (ML) Data are collected with proper authorization
-Quality: Multiple rounds of quality inspections ensure high-quality data output
-Secure Implementation: An NDA is signed to guarantee secure implementation, and Machine Learning (ML) Data is destroyed upon delivery.

About Nexdata
Nexdata is equipped with professional Speech Data collection devices, tools and environments, as well as experienced project managers in data collection and quality control, so that we can meet the data collection requirements in various scenarios and types. Please visit us at https://www.nexdata.ai/service/speech-recognition?source=Datarade
Data Collection And Labeling Report
datainsightsmarket.com
doc, pdf, ppt
Updated Nov 17, 2025
Cite
Data Insights Market (2025). Data Collection And Labeling Report [Dataset]. https://www.datainsightsmarket.com/reports/data-collection-and-labeling-1945059
Explore at:
Available download formats: ppt, doc, pdf
Dataset updated
Nov 17, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
Explore the booming data collection and labeling market, driven by AI advancements. Discover key growth drivers, market trends, and forecasts for 2025-2033, essential for AI development across IT, automotive, and healthcare.
A Dataset for Machine Learning Algorithm Development
catalog.data.gov
fisheries.noaa.gov
Updated May 1, 2024
Cite
(Point of Contact, Custodian) (2024). A Dataset for Machine Learning Algorithm Development [Dataset]. https://catalog.data.gov/dataset/a-dataset-for-machine-learning-algorithm-development2
Explore at:
Dataset updated
May 1, 2024
Dataset provided by
(Point of Contact, Custodian)
Description
This dataset consists of imagery, imagery footprints, associated ice seal detections and homography files associated with the KAMERA Test Flights conducted in 2019. This dataset was subset to include relevant data for detection algorithm development. This dataset is limited to data collected during flights 4, 5, 6 and 7 from our 2019 surveys.
Data from: MLFMF: Data Sets for Machine Learning for Mathematical...
data.niaid.nih.gov
Updated Oct 26, 2023
Cite
Bauer, Andrej; Petković, Matej; Todorovski, Ljupčo (2023). MLFMF: Data Sets for Machine Learning for Mathematical Formalization [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10041074
Explore at:
Dataset updated
Oct 26, 2023
Dataset provided by
University of Ljubljana
Institute of Mathematics, Physics, and Mechanics
Authors
Bauer, Andrej; Petković, Matej; Todorovski, Ljupčo
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
MLFMF

MLFMF (Machine Learning for Mathematical Formalization) is a collection of data sets for benchmarking recommendation systems used to support formalization of mathematics with proof assistants. These systems help humans identify which previous entries (theorems, constructions, datatypes, and postulates) are relevant in proving a new theorem or carrying out a new construction. The MLFMF data sets provide solid benchmarking support for further investigation of the numerous machine learning approaches to formalized mathematics. With more than 250,000 entries in total, this is currently the largest collection of formalized mathematical knowledge in machine learnable format. In addition to benchmarking the recommendation systems, the data sets can also be used for benchmarking node classification and link prediction algorithms.

The four data sets

Each data set is derived from a library of formalized mathematics written in proof assistants Agda or Lean. The collection includes

-the largest Lean 4 library Mathlib,
-the three largest Agda libraries: the standard library, the library of univalent mathematics Agda-unimath, and the TypeTopology library.

Each data set represents the corresponding library in two ways: as a heterogeneous network, and as a list of syntax trees of all the entries in the library. The network contains the (modular) structure of the library and the references between entries, while the syntax trees give complete and easily parsed information about each entry. The Lean library data set was obtained by converting .olean files into s-expressions (see the lean2sexp tool). The Agda data sets were obtained with an s-expression extension of the official Agda repository (use either master-sexp or release-2.6.3-sexp branch). For more details, see our arXiv copy of the paper.

Directory structure

First, the mlfmf.zip archive needs to be unzipped. It contains a separate directory for every library (for example, the standard library of Agda can be found in the stdlib directory) and some auxiliary files. Every library directory contains

-the network file from which the heterogeneous network can be loaded,
-a zip of the entries directory that contains (many) files with abstract syntax trees. Each of those files describes a single entry of the library.

In addition to the auxiliary files which are used for loading the data (and described below), the zipped sources of lean2sexp and Agda s-expression extension are present.

Loading the data

In addition to the data files, there is also a simple python script main.py for loading the data. To run it, you will have to install the packages listed in the file requirements.txt: tqdm and networkx. The easiest way to do so is calling pip install -r requirements.txt. When running main.py for the first time, the script will unzip the entry files into the directory named entries. After that, the script loads the syntax trees of the entries (see the Entry class) and the network (as networkx.MultiDiGraph object).

Note. The entry files have extension .dag (directed acyclic graph), since Lean uses node sharing, which breaks the tree structure (a shared node has more than one parent node).

More information

For more information about the data collection process, detailed data (and data format) description, and baseline experiments that were already performed with these data, see our arXiv copy of the paper. For the code that was used to perform the experiments and data format description, visit our github repository https://github.com/ul-fmf/mlfmf-data.

Funding

Since not all the funders are available in the Zenodo's database, we list them here:

This material is based upon work supported by the Air Force Office of Scientific Research under award number FA9550-21-1-0024. The authors also acknowledge the financial support of the Slovenian Research Agency via the research core funding No. P2-0103 and No. P1-0294.
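Since the entries are described as s-expressions, a reader may find it useful to see how such text can be turned into a nested structure. The following is only a minimal sketch: the sample string and its tags are hypothetical, and the actual .dag entry format (including node sharing) is documented in the project's repository and may well differ. The bundled main.py and its Entry class remain the intended way to load the data.

```python
def parse_sexp(text):
    """Parse one s-expression into nested Python lists of string atoms."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # skip the closing parenthesis
            return items
        if token == ")":
            raise ValueError("unexpected ')'")
        return token

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the first expression")
    return tree

def count_nodes(tree):
    """Count every list node and every atom in the parsed structure."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(count_nodes(child) for child in tree)

# Hypothetical entry text, invented for illustration only.
sample = "(:entry (:name plus-comm) (:body (:apply plus m n)))"
tree = parse_sexp(sample)
print(tree)
print(count_nodes(tree))
```

A parser like this yields a plain tree; representing shared nodes, as the note about the .dag extension describes, would need an extra identifier-to-node table on top of it.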
Cone-Beam X-Ray CT Data Collection Designed for Machine Learning: Samples...
zenodo.org
data.niaid.nih.gov
zip
Updated Mar 12, 2020
Cite
Henri Der Sarkissian; Felix Lucka; Maureen CWI van Eijnatten; Giulia Colacicco; Sophia Bethany Coban; K. Joost Batenburg (2020). Cone-Beam X-Ray CT Data Collection Designed for Machine Learning: Samples 17-24 [Dataset]. http://doi.org/10.5281/zenodo.2687387
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.2687387
Dataset updated
Mar 12, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Henri Der Sarkissian; Felix Lucka; Maureen CWI van Eijnatten; Giulia Colacicco; Sophia Bethany Coban; K. Joost Batenburg
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This upload contains samples 17 - 24 from the data collection described in

Henri Der Sarkissian, Felix Lucka, Maureen van Eijnatten, Giulia Colacicco, Sophia Bethany Coban, Kees Joost Batenburg, "A Cone-Beam X-Ray CT Data Collection Designed for Machine Learning", Sci Data 6, 215 (2019). https://doi.org/10.1038/s41597-019-0235-y or arXiv:1905.04787 (2019)

Abstract:
"Unlike previous works, this open data collection consists of X-ray cone-beam (CB) computed tomography (CT) datasets specifically designed for machine learning applications and high cone-angle artefact reduction: Forty-two walnuts were scanned with a laboratory X-ray setup to provide not only data from a single object but from a class of objects with natural variability. For each walnut, CB projections on three different orbits were acquired to provide CB data with different cone angles as well as being able to compute artefact-free, high-quality ground truth images from the combined data that can be used for supervised learning. We provide the complete image reconstruction pipeline: raw projection data, a description of the scanning geometry, pre-processing and reconstruction scripts using open software, and the reconstructed volumes. Due to this, the dataset can not only be used for high cone-angle artefact reduction but also for algorithm development and evaluation for other tasks, such as image reconstruction from limited or sparse-angle (low-dose) scanning, super resolution, or segmentation."

The scans are performed using a custom-built, highly flexible X-ray CT scanner, the FleX-ray scanner, developed by XRE nv and located in the FleX-ray Lab at the Centrum Wiskunde & Informatica (CWI) in Amsterdam, Netherlands. The general purpose of the FleX-ray Lab is to conduct proof of concept experiments directly accessible to researchers in the field of mathematics and computer science. The scanner consists of a cone-beam microfocus X-ray point source that projects polychromatic X-rays onto a 1536-by-1944 pixel, 14-bit flat panel detector (Dexella 1512NDT) and a rotation stage in-between, upon which a sample is mounted. All three components are mounted on translation stages which allow them to move independently from one another.

Please refer to the paper for all further technical details.

The complete data set can be found via the following links: 1-8, 9-16, 17-24, 25-32, 33-37, 38-42

The corresponding Python scripts for loading, pre-processing and reconstructing the projection data in the way described in the paper can be found on github

For more information or guidance in using these datasets, please get in touch with

henri.dersarkissian [at] gmail.com

Felix.Lucka [at] cwi.nl
Data Collection and Labelling Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 13, 2025
Cite
Market Research Forecast (2025). Data Collection and Labelling Report [Dataset]. https://www.marketresearchforecast.com/reports/data-collection-and-labelling-33030
Explore at:
Available download formats: doc, ppt, pdf
Dataset updated
Mar 13, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The data collection and labeling market is experiencing robust growth, fueled by the escalating demand for high-quality training data in artificial intelligence (AI) and machine learning (ML) applications. The market, estimated at $15 billion in 2025, is projected to achieve a Compound Annual Growth Rate (CAGR) of 25% over the forecast period (2025-2033), reaching approximately $75 billion by 2033. This expansion is primarily driven by the increasing adoption of AI across diverse sectors, including healthcare (medical image analysis, drug discovery), automotive (autonomous driving systems), finance (fraud detection, risk assessment), and retail (personalized recommendations, inventory management). The rising complexity of AI models and the need for more diverse and nuanced datasets are significant contributing factors to this growth. Furthermore, advancements in data annotation tools and techniques, such as active learning and synthetic data generation, are streamlining the data labeling process and making it more cost-effective. However, challenges remain. Data privacy concerns and regulations like GDPR necessitate robust data security measures, adding to the cost and complexity of data collection and labeling. The shortage of skilled data annotators also hinders market growth, necessitating investments in training and upskilling programs. Despite these restraints, the market’s inherent potential, coupled with ongoing technological advancements and increased industry investments, ensures sustained expansion in the coming years. Geographic distribution shows strong concentration in North America and Europe initially, but Asia-Pacific is poised for rapid growth due to increasing AI adoption and the availability of a large workforce. This makes strategic partnerships and global expansion crucial for market players aiming for long-term success.
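The relationship between a base value, a CAGR, and an end value is simple compounding, and it can be checked directly against the figures quoted above. The sketch below is only an arithmetic illustration: the function names are made up, and treating 2025 to 2033 as eight compounding periods is an assumption, since the report does not state its base year convention.

```python
def project(start_value, cagr, years):
    """Compound a starting value forward at a fixed annual growth rate."""
    return start_value * (1 + cagr) ** years

def implied_cagr(start_value, end_value, years):
    """Annual growth rate that takes start_value to end_value in `years` years."""
    return (end_value / start_value) ** (1 / years) - 1

start_billion = 15.0      # 2025 estimate quoted in the description
years = 2033 - 2025       # assumption: eight compounding periods

projected_2033 = project(start_billion, 0.25, years)
rate_for_75 = implied_cagr(start_billion, 75.0, years)

print(f"25% CAGR for {years} years: ${projected_2033:.1f} billion")
print(f"CAGR implied by $75 billion: {rate_for_75:.1%}")
```

Under these assumptions, compounding $15 billion at 25% gives roughly $89.4 billion, while reaching $75 billion corresponds to about 22.3% per year, so the quoted figures presumably rest on a different base year or on rounding.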
Data from: Democratizing AI: Non-expert design of prediction tasks
figshare.com
txt
Updated Mar 10, 2020
Cite
James Bagrow (2020). Democratizing AI: Non-expert design of prediction tasks [Dataset]. http://doi.org/10.6084/m9.figshare.9468512.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.9468512.v1
Dataset updated
Mar 10, 2020
Dataset provided by
figshare
Authors
James Bagrow
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Datasets supporting the paper, "Democratizing AI: Non-expert design of prediction tasks".
Speech Recognition Data Collection Services | 100+ Languages Resources...
data.nexdata.ai
Updated Aug 3, 2024
Cite
Nexdata (2024). Speech Recognition Data Collection Services | 100+ Languages Resources |Audio Data | Speech Recognition Data | Machine Learning (ML) Data [Dataset]. https://data.nexdata.ai/products/nexdata-speech-recognition-data-collection-services-100-nexdata
Explore at:
Dataset updated
Aug 3, 2024
Dataset authored and provided by
Nexdata
Area covered
Finland, Jordan, Cambodia, Singapore, Azerbaijan, New Zealand, Lebanon, Luxembourg, Netherlands, Mongolia
Description
Nexdata is equipped with professional recording equipment, has a resource pool covering 70+ countries and regions, and provides various types of speech recognition data collection services for Machine Learning (ML) Data.
Machine Learning market size was USD 24,345.76 million in 2021!
cognitivemarketresearch.com
pdf,excel,csv,ppt
Cite
Cognitive Market Research, Machine Learning market size was USD 24,345.76 million in 2021! [Dataset]. https://www.cognitivemarketresearch.com/machine-learning-market-report
Explore at:
Available download formats: pdf, excel, csv, ppt
Dataset authored and provided by
Cognitive Market Research
License
https://www.cognitivemarketresearch.com/privacy-policy
Time period covered
2021 - 2033
Area covered
Global
Description
As per Cognitive Market Research's latest published report, the Global Machine Learning market size was USD 24,345.76 million in 2021 and it is forecasted to reach USD 206,235.41 million by 2028. Machine Learning Industry's Compound Annual Growth Rate will be 42.64% from 2023 to 2030.

Market Dynamics of Machine Learning Market

Key Drivers for Machine Learning Market

Explosion of Big Data Across Industries: The substantial increase in both structured and unstructured data generated by sensors, social media, transactions, and IoT devices is driving the demand for machine learning-based data analysis.

Widespread Adoption of AI in Business Processes: Machine learning is facilitating automation, predictive analytics, and optimization in various sectors such as healthcare, finance, manufacturing, and retail, thereby enhancing efficiency and outcomes.

Increased Availability of Open-Source Frameworks and Cloud Platforms: Resources like TensorFlow, PyTorch, and scalable cloud infrastructure are simplifying the process for developers and enterprises to create and implement machine learning models.

Growing Investments in AI-Driven Innovation: Governments, venture capitalists, and major technology companies are making substantial investments in machine learning research and startups, which is accelerating progress and market entry.

Key Restraints for Machine Learning Market

Shortage of Skilled Talent in ML and AI: The need for data scientists, machine learning engineers, and domain specialists significantly surpasses the available supply, hindering scalability and implementation in numerous organizations.

High Computational and Operational Costs: The training of intricate machine learning models necessitates considerable computing power, energy, and infrastructure, resulting in high costs for startups and smaller enterprises.

Data Privacy and Regulatory Compliance Challenges: Issues related to user privacy, data breaches, and adherence to regulations such as GDPR and HIPAA present obstacles in the collection and utilization of data for machine learning.

Lack of Model Transparency and Explainability: The opaque nature of certain machine learning models undermines trust, particularly in sensitive areas like finance and healthcare, where the need for explainable AI is paramount.

Key Trends for Machine Learning Market

Growth of AutoML and No-Code ML Platforms: Automated machine learning tools are making AI development more accessible, enabling individuals without extensive coding or mathematical expertise to construct models.

Integration of ML with Edge Computing: Executing machine learning models locally on edge devices (such as cameras and smartphones) is enhancing real-time performance and minimizing latency in applications.

Ethical AI and Responsible Machine Learning Practices: Increasing emphasis on fairness, bias reduction, and accountability is shaping ethical frameworks and governance in ML adoption.

Industry-Specific ML Applications on the Rise: Custom ML solutions are rapidly emerging in sectors like agriculture (crop prediction), logistics (route optimization), and education (personalized learning).

COVID-19 Impact:

Like other industries, the machine learning industry has been affected by COVID-19. Despite the dire conditions and uncertainty, some industries have continued to grow during the pandemic. During COVID-19, the machine learning market has remained stable, with positive growth and opportunities. The global machine learning market faces minimal impact compared to some other industries. The growth of the global machine learning market has stagnated owing to automation developments and technological advancements. Pre-owned machines and smartphones widely used for remote work are leading to positive growth of the market. Several industries have transplanted the market progress using new technologies of machine learning systems. In June 2020, DeCaprio et al. published research on COVID-19 pandemic risk, a field still in its early stages. In the report, DeCaprio et al. mention that they used machine learning to build an initial vulnerability index for the coronavirus. The lab further noted that as more data and results from ongoing research become available, it will be able to see more practical applications of machine learning in predicting infection risk. What is...
Filtering Process for Records in AI Bibliometric Analysis
figshare.com
xlsx
Updated May 2, 2025
Cite
Verónica Duarte (2025). Filtering Process for Records in AI Bibliometric Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.28924640.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.28924640.v1
Dataset updated
May 2, 2025
Dataset provided by
figshare
Figshare (http://figshare.com/)
Authors
Verónica Duarte
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset includes the filtering steps and final record counts used in the bibliometric analysis of AI technologies applied to biomedical research, disaggregated by disease and technology.
Data Annotation and Collection Services Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 9, 2025
Cite
Market Research Forecast (2025). Data Annotation and Collection Services Report [Dataset]. https://www.marketresearchforecast.com/reports/data-annotation-and-collection-services-30704
Explore at:
Available download formats: pdf, doc, ppt
Dataset updated
Mar 9, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The booming Data Annotation & Collection Services market is projected to reach $75 Billion by 2033, fueled by AI adoption in autonomous driving, healthcare, and finance. Explore market trends, key players (Appen, Amazon, Google), and regional growth in this comprehensive analysis.
MAAD : Multi-Label Arabic Articles Dataset
data.mendeley.com
Updated Oct 27, 2025
Cite
Marwah Yahya Al-Nahari (2025). MAAD : Multi-Label Arabic Articles Dataset [Dataset]. http://doi.org/10.17632/hbfc9j8hj8.2
Explore at:
Unique identifier
https://doi.org/10.17632/hbfc9j8hj8.2
Dataset updated
Oct 27, 2025
Authors
Marwah Yahya Al-Nahari
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The MAAD dataset represents a comprehensive collection of Arabic news articles that may be employed across a diverse array of Arabic Natural Language Processing (NLP) applications, including but not limited to classification, text generation, summarization, and various other tasks. The dataset was diligently assembled through the application of specifically designed Python scripts that targeted six prominent news platforms: Al Jazeera, BBC Arabic, Youm7, Russia Today, and Al Ummah, in conjunction with regional and local media outlets, ultimately resulting in a total of 602,792 articles. This dataset exhibits a total word count of 29,371,439, with the number of unique words totaling 296,518; the average word length has been determined to be 6.36 characters, while the mean article length is calculated at 736.09 characters. This extensive dataset is categorized into ten distinct classifications: Political, Economic, Cultural, Arts, Sports, Health, Technology, Community, Incidents, and Local. The data fields are categorized into five distinct types: Title, Article, Summary, Category, and Published_Date. The MAAD dataset is structured into six files, each named after the corresponding news outlet from which the data was sourced; within each directory, text files are provided, containing the number of categories represented in a single file, formatted in txt to accommodate all news articles. This dataset serves as an expansive standard resource designed for utilization within the context of our research endeavors.
AI Training Data Report
datainsightsmarket.com
doc, pdf, ppt
Updated Aug 8, 2025
+ more versions
Cite
Data Insights Market (2025). AI Training Data Report [Dataset]. https://www.datainsightsmarket.com/reports/ai-training-data-1500199
Available download formats: doc, ppt, pdf
Dataset updated
Aug 8, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The AI Training Data market is booming, projected to reach $89.4 Billion by 2033, with a CAGR of 25%. This comprehensive analysis explores market drivers, trends, restraints, key players (Google, Amazon, Microsoft), and regional breakdowns. Discover the future of AI data and its impact on various industries.
Data Annotation and Collection Services Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 9, 2025
Cite
Market Research Forecast (2025). Data Annotation and Collection Services Report [Dataset]. https://www.marketresearchforecast.com/reports/data-annotation-and-collection-services-30703
Available download formats: doc, ppt, pdf
Dataset updated
Mar 9, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Data Annotation and Collection Services market is booming, projected to reach $45 billion by 2033, driven by AI and ML adoption. Explore key market trends, segments (image, text, video annotation), leading companies, and regional growth in this comprehensive analysis.
Data Collection Software Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jun 10, 2025
Cite
Data Insights Market (2025). Data Collection Software Report [Dataset]. https://www.datainsightsmarket.com/reports/data-collection-software-1941257
Available download formats: ppt, pdf, doc
Dataset updated
Jun 10, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
Discover the booming Data Collection Software market! Our analysis reveals a $15 billion market in 2025, projected to reach $45 billion by 2033, driven by cloud adoption, AI, and regulatory compliance. Explore key trends, leading companies (Logikcull, AmoCRM, Tableau, etc.), and regional growth forecasts.
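The description gives a 2025 market size and a 2033 projection but no growth rate. As a rough check, the compound annual growth rate implied by those two figures can be worked out as below. This is a sketch under the assumption that growth compounds annually over the eight years from 2025 to 2033; the report itself may define its period or base year differently, and the function name `cagr` is only illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` periods."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    # Figures from the description: $15 billion (2025) to $45 billion (2033).
    # Assumption: eight annual compounding periods.
    implied = cagr(15.0, 45.0, 2033 - 2025)
    print(f"Implied CAGR: {implied:.1%}")
```

Under that assumption, a tripling over eight years corresponds to a rate of a little under 15% per year.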
Nexdata | Person Multi-modal Collection Data | 10,000 Hours
datarade.ai
Updated Nov 14, 2025
+ more versions
Cite
Nexdata (2025). Nexdata | Person Multi-modal Collection Data | 10,000 Hours [Dataset]. https://datarade.ai/data-products/nexdata-person-multi-modal-collection-data-10-000-hours-nexdata
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Nov 14, 2025
Dataset authored and provided by
Nexdata
Area covered
Hungary, Belgium, United Republic of, Tunisia, Oman, Uzbekistan, Norway, Albania, Austria, Spain
Description
Data size: 10,000 hours, including 1,000 hours of high-quality data (Chinese and English)

Data format: Video is in common formats such as MP4; annotations are in JSON

Data type: human video

Data format: Images are in common formats such as .jpg, video is in common formats such as MP4, and annotations are in JSON