100+ datasets found
  1. i

    Dataset for the manuscript of Analysis on constructing the training data to...

    • ieee-dataport.org
    Updated Jun 20, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dianxin Luan (2024). Dataset for the manuscript of Analysis on constructing the training data to train neural networks for channel estimation [Dataset]. https://ieee-dataport.org/documents/dataset-manuscript-analysis-constructing-training-data-train-neural-networks-channel
    Explore at:
    Dataset updated
    Jun 20, 2024
    Authors
    Dianxin Luan
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    but its feasibility is challenged by the tremendous computational resources required.

  2. Data sources used by companies for training AI models South Korea 2024

    • statista.com
    Updated May 27, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Data sources used by companies for training AI models South Korea 2024 [Dataset]. https://www.statista.com/statistics/1452822/south-korea-data-sources-for-training-artificial-intelligence-models/
    Explore at:
    Dataset updated
    May 27, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    Sep 2024 - Nov 2024
    Area covered
    South Korea
    Description

    As of 2024, customer data was the leading source of information used to train artificial intelligence (AI) models in South Korea, with nearly ** percent of surveyed companies answering that way. About ** percent responded to use public sector support initiatives.

  3. m

    FileMarket | 20,000 photos | AI Training Data | Large Language Model (LLM)...

    • filemarket.mydatastorefront.com
    Updated Aug 16, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FileMarket (2024). FileMarket | 20,000 photos | AI Training Data | Large Language Model (LLM) Data | Machine Learning (ML) Data | Deep Learning (DL) Data | [Dataset]. https://filemarket.mydatastorefront.com/products/filemarket-ai-training-data-large-language-model-llm-data-filemarket
    Explore at:
    Dataset updated
    Aug 16, 2024
    Dataset authored and provided by
    FileMarket
    Area covered
    El Salvador, Norway, Poland, China, Greenland, Antigua and Barbuda, Saint Helena, Saint Kitts and Nevis, Bahrain, Ghana
    Description

    Enhance your LLMs with our comprehensive and diverse large language model data sets, designed for optimal training and performance.

  4. d

    Training data from SPCAM for machine learning in moist physics

    • datadryad.org
    • explore.openaire.eu
    zip
    Updated Aug 7, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Guang Zhang; Yilun Han; Xiaomeng Huang; Yong Wang (2020). Training data from SPCAM for machine learning in moist physics [Dataset]. http://doi.org/10.6075/J0CZ35PP
    Explore at:
    zipAvailable download formats
    Dataset updated
    Aug 7, 2020
    Dataset provided by
    Dryad
    Authors
    Guang Zhang; Yilun Han; Xiaomeng Huang; Yong Wang
    Time period covered
    2020
    Description

    The training samples of the entire year (from yr-2 of simulation) are compressed in SPCAM_ML_Han_et_al_0.tar.gz, and testing samples of the entire year (from yr-3 of simulation) are compressed in SPCAM_ML_Han_et_al_1.tar.gz. In each dataset, there are a data documentation file and 365 netCDF data files (one file for each day) that are marked by its date. The variable fields contain temperature and moisture tendencies and cloud water and cloud ice from the CRM, and vertical profiles of temperature and moisture and large-scale temperature and moisture tendencies from the dynamic core of SPCAM’s host model CAM5 and PBL diffusion. In addition, we include surface sensible and latent heat fluxes. For more details, please read the data documentation inside the tar.gz files.

  5. A

    Artificial Intelligence Training Dataset Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 3, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Data Insights Market (2025). Artificial Intelligence Training Dataset Report [Dataset]. https://www.datainsightsmarket.com/reports/artificial-intelligence-training-dataset-1958994
    Explore at:
    doc, ppt, pdfAvailable download formats
    Dataset updated
    May 3, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Artificial Intelligence (AI) Training Dataset market is experiencing robust growth, driven by the increasing adoption of AI across diverse sectors. The market's expansion is fueled by the burgeoning need for high-quality data to train sophisticated AI algorithms capable of powering applications like smart campuses, autonomous vehicles, and personalized healthcare solutions. The demand for diverse dataset types, including image classification, voice recognition, natural language processing, and object detection datasets, is a key factor contributing to market growth. While the exact market size in 2025 is unavailable, considering a conservative estimate of a $10 billion market in 2025 based on the growth trend and reported market sizes of related industries, and a projected CAGR (Compound Annual Growth Rate) of 25%, the market is poised for significant expansion in the coming years. Key players in this space are leveraging technological advancements and strategic partnerships to enhance data quality and expand their service offerings. Furthermore, the increasing availability of cloud-based data annotation and processing tools is further streamlining operations and making AI training datasets more accessible to businesses of all sizes. Growth is expected to be particularly strong in regions with burgeoning technological advancements and substantial digital infrastructure, such as North America and Asia Pacific. However, challenges such as data privacy concerns, the high cost of data annotation, and the scarcity of skilled professionals capable of handling complex datasets remain obstacles to broader market penetration. The ongoing evolution of AI technologies and the expanding applications of AI across multiple sectors will continue to shape the demand for AI training datasets, pushing this market toward higher growth trajectories in the coming years. The diversity of applications—from smart homes and medical diagnoses to advanced robotics and autonomous driving—creates significant opportunities for companies specializing in this market. Maintaining data quality, security, and ethical considerations will be crucial for future market leadership.

  6. A

    AI Training Dataset Market Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Jun 6, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Archive Market Research (2025). AI Training Dataset Market Report [Dataset]. https://www.archivemarketresearch.com/reports/ai-training-dataset-market-5881
    Explore at:
    ppt, pdf, docAvailable download formats
    Dataset updated
    Jun 6, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    global
    Variables measured
    Market Size
    Description

    The AI Training Dataset Market size was valued at USD 2124.0 million in 2023 and is projected to reach USD 8593.38 million by 2032, exhibiting a CAGR of 22.1 % during the forecasts period. An AI training dataset is a collection of data used to train machine learning models. It typically includes labeled examples, where each data point has an associated output label or target value. The quality and quantity of this data are crucial for the model's performance. A well-curated dataset ensures the model learns relevant features and patterns, enabling it to generalize effectively to new, unseen data. Training datasets can encompass various data types, including text, images, audio, and structured data. The driving forces behind this growth include:

  7. d

    Process-guided deep learning water temperature predictions: 6 Model...

    • catalog.data.gov
    • data.usgs.gov
    • +2more
    Updated Jun 15, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Climate Adaptation Science Centers (2024). Process-guided deep learning water temperature predictions: 6 Model evaluation (test data and RMSE) [Dataset]. https://catalog.data.gov/dataset/process-guided-deep-learning-water-temperature-predictions-6-model-evaluation-test-data-an
    Explore at:
    Dataset updated
    Jun 15, 2024
    Dataset provided by
    Climate Adaptation Science Centers
    Description

    This dataset includes evaluation data ("test" data) and performance metrics for water temperature predictions from multiple modeling frameworks. Process-Based (PB) models were configured and calibrated with training data to reduce root-mean squared error. Uncalibrated models used default configurations (PB0; see Winslow et al. 2016 for details) and no parameters were adjusted according to model fit with observations. Deep Learning (DL) models were Long Short-Term Memory artificial recurrent neural network models which used training data to adjust model structure and weights for temperature predictions (Jia et al. 2019). Process-Guided Deep Learning (PGDL) models were DL models with an added physical constraint for energy conservation as a loss term. These models were pre-trained with uncalibrated Process-Based model outputs (PB0) before training on actual temperature observations. Performance was measured as root-mean squared errors relative to temperature observations during the test period. Test data include compiled water temperature data from a variety of sources, including the Water Quality Portal (Read et al. 2017), the North Temperate Lakes Long-TERM Ecological Research Program (https://lter.limnology.wisc.edu/), the Minnesota department of Natural Resources, and the Global Lake Ecological Observatory Network (gleon.org). This dataset is part of a larger data release of lake temperature model inputs and outputs for 68 lakes in the U.S. states of Minnesota and Wisconsin (http://dx.doi.org/10.5066/P9AQPIVD).

  8. d

    Process-guided deep learning water temperature predictions: 4 Training data

    • catalog.data.gov
    Updated Jun 15, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Climate Adaptation Science Centers (2024). Process-guided deep learning water temperature predictions: 4 Training data [Dataset]. https://catalog.data.gov/dataset/process-guided-deep-learning-water-temperature-predictions-4-training-data-dca4e
    Explore at:
    Dataset updated
    Jun 15, 2024
    Dataset provided by
    Climate Adaptation Science Centers
    Description

    This dataset includes compiled water temperature data from a variety of sources, including the Water Quality Portal (Read et al. 2017), the North Temperate Lakes Long-TERM Ecological Research Program (https://lter.limnology.wisc.edu/), the Minnesota department of Natural Resources, and the Global Lake Ecological Observatory Network (gleon.org). This dataset is part of a larger data release of lake temperature model inputs and outputs for 68 lakes in the U.S. states of Minnesota and Wisconsin (http://dx.doi.org/10.5066/P9AQPIVD).

  9. D

    Notable AI Models

    • epoch.ai
    csv
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Epoch AI, Notable AI Models [Dataset]. https://epoch.ai/data/notable-ai-models
    Explore at:
    csvAvailable download formats
    Dataset authored and provided by
    Epoch AI
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Global
    Variables measured
    https://epoch.ai/data/notable-ai-models-documentation#records
    Measurement technique
    https://epoch.ai/data/notable-ai-models-documentation#records
    Description

    Our most comprehensive database of AI models, containing over 800 models that are state of the art, highly cited, or otherwise historically notable. It tracks key factors driving machine learning progress and includes over 300 training compute estimates.

  10. MullerNCItrain_.train-data.min.tsv

    • figshare.com
    txt
    Updated Apr 7, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Xiaofei Zhao (2025). MullerNCItrain_.train-data.min.tsv [Dataset]. http://doi.org/10.6084/m9.figshare.28745798.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Apr 7, 2025
    Dataset provided by
    Figsharehttp://figshare.com/
    figshare
    Authors
    Xiaofei Zhao
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
  11. U

    U.S. AI Training Dataset Market Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated May 19, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Archive Market Research (2025). U.S. AI Training Dataset Market Report [Dataset]. https://www.archivemarketresearch.com/reports/us-ai-training-dataset-market-4957
    Explore at:
    doc, ppt, pdfAvailable download formats
    Dataset updated
    May 19, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    United States
    Variables measured
    Market Size
    Description

    The U.S. AI Training Dataset Market size was valued at USD 590.4 million in 2023 and is projected to reach USD 1880.70 million by 2032, exhibiting a CAGR of 18.0 % during the forecasts period. The U. S. AI training dataset market deals with the generation, selection, and organization of datasets used in training artificial intelligence. These datasets contain the requisite information that the machine learning algorithms need to infer and learn from. Conducts include the advancement and improvement of AI solutions in different fields of business like transport, medical analysis, computing language, and money related measurements. The applications include training the models for activities such as image classification, predictive modeling, and natural language interface. Other emerging trends are the change in direction of more and better-quality, various and annotated data for the improvement of model efficiency, synthetic data generation for data shortage, and data confidentiality and ethical issues in dataset management. Furthermore, due to arising technologies in artificial intelligence and machine learning, there is a noticeable development in building and using the datasets. Recent developments include: In February 2024, Google struck a deal worth USD 60 million per year with Reddit that will give the former real-time access to the latter’s data and use Google AI to enhance Reddit’s search capabilities. , In February 2024, Microsoft announced around USD 2.1 billion investment in Mistral AI to expedite the growth and deployment of large language models. The U.S. giant is expected to underpin Mistral AI with Azure AI supercomputing infrastructure to provide top-notch scale and performance for AI training and inference workloads. .

  12. m

    Large Language Model (LLM) Data | Machine Learning (ML) Data | AI Training...

    • data.mealme.ai
    Updated Jan 23, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    MealMe (2025). Large Language Model (LLM) Data | Machine Learning (ML) Data | AI Training Data (RAG) for 1M+ Global Grocery, Restaurant, and Retail Stores [Dataset]. https://data.mealme.ai/products/ai-training-data-rag-for-grocery-restaurant-and-retail-ra-mealme
    Explore at:
    Dataset updated
    Jan 23, 2025
    Dataset authored and provided by
    MealMe
    Area covered
    Wallis and Futuna, Venezuela, Somalia, Uzbekistan, South Sudan, Madagascar, Sao Tome and Principe, Greenland, Austria, Bosnia and Herzegovina
    Description

    Comprehensive training data on 1M+ stores across the US & Canada. Includes detailed menus, inventory, pricing, and availability. Ideal for AI/ML models, powering retrieval-augmented generation, search, and personalization systems.

  13. AI Training Dataset Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dataintelo (2025). AI Training Dataset Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-ai-training-dataset-market
    Explore at:
    csv, pptx, pdfAvailable download formats
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    AI Training Dataset Market Outlook



    The global AI training dataset market size was valued at approximately USD 1.2 billion in 2023 and is projected to reach USD 6.5 billion by 2032, growing at a compound annual growth rate (CAGR) of 20.5% from 2024 to 2032. This substantial growth is driven by the increasing adoption of artificial intelligence across various industries, the necessity for large-scale and high-quality datasets to train AI models, and the ongoing advancements in AI and machine learning technologies.



    One of the primary growth factors in the AI training dataset market is the exponential increase in data generation across multiple sectors. With the proliferation of internet usage, the expansion of IoT devices, and the digitalization of industries, there is an unprecedented volume of data being generated daily. This data is invaluable for training AI models, enabling them to learn and make more accurate predictions and decisions. Moreover, the need for diverse and comprehensive datasets to improve AI accuracy and reliability is further propelling market growth.



    Another significant factor driving the market is the rising investment in AI and machine learning by both public and private sectors. Governments around the world are recognizing the potential of AI to transform economies and improve public services, leading to increased funding for AI research and development. Simultaneously, private enterprises are investing heavily in AI technologies to gain a competitive edge, enhance operational efficiency, and innovate new products and services. These investments necessitate high-quality training datasets, thereby boosting the market.



    The proliferation of AI applications in various industries, such as healthcare, automotive, retail, and finance, is also a major contributor to the growth of the AI training dataset market. In healthcare, AI is being used for predictive analytics, personalized medicine, and diagnostic automation, all of which require extensive datasets for training. The automotive industry leverages AI for autonomous driving and vehicle safety systems, while the retail sector uses AI for personalized shopping experiences and inventory management. In finance, AI assists in fraud detection and risk management. The diverse applications across these sectors underline the critical need for robust AI training datasets.



    As the demand for AI applications continues to grow, the role of Ai Data Resource Service becomes increasingly vital. These services provide the necessary infrastructure and tools to manage, curate, and distribute datasets efficiently. By leveraging Ai Data Resource Service, organizations can ensure that their AI models are trained on high-quality and relevant data, which is crucial for achieving accurate and reliable outcomes. The service acts as a bridge between raw data and AI applications, streamlining the process of data acquisition, annotation, and validation. This not only enhances the performance of AI systems but also accelerates the development cycle, enabling faster deployment of AI-driven solutions across various sectors.



    Regionally, North America currently dominates the AI training dataset market due to the presence of major technology companies and extensive R&D activities in the region. However, Asia Pacific is expected to witness the highest growth rate during the forecast period, driven by rapid technological advancements, increasing investments in AI, and the growing adoption of AI technologies across various industries in countries like China, India, and Japan. Europe and Latin America are also anticipated to experience significant growth, supported by favorable government policies and the increasing use of AI in various sectors.



    Data Type Analysis



    The data type segment of the AI training dataset market encompasses text, image, audio, video, and others. Each data type plays a crucial role in training different types of AI models, and the demand for specific data types varies based on the application. Text data is extensively used in natural language processing (NLP) applications such as chatbots, sentiment analysis, and language translation. As the use of NLP is becoming more widespread, the demand for high-quality text datasets is continually rising. Companies are investing in curated text datasets that encompass diverse languages and dialects to improve the accuracy and efficiency of NLP models.



    Image data is critical for computer vision application

  14. Dataset: An Open Combinatorial Diffraction Dataset Including Consensus Human...

    • catalog.data.gov
    • data.nist.gov
    • +1more
    Updated Jul 29, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    National Institute of Standards and Technology (2022). Dataset: An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models [Dataset]. https://catalog.data.gov/dataset/dataset-an-open-combinatorial-diffraction-dataset-including-consensus-human-and-machine-le-0de06
    Explore at:
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    National Institute of Standards and Technologyhttp://www.nist.gov/
    Description

    The open dataset, software, and other files accompanying the manuscript "An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models," submitted for publication to Integrated Materials and Manufacturing Innovations.Machine learning and autonomy are increasingly prevalent in materials science, but existing models are often trained or tuned using idealized data as absolute ground truths. In actual materials science, "ground truth" is often a matter of interpretation and is more readily determined by consensus. Here we present the data, software, and other files for a study using as-obtained diffraction data as a test case for evaluating the performance of machine learning models in the presence of differing expert opinions. We demonstrate that experts with similar backgrounds can disagree greatly even for something as intuitive as using diffraction to identify the start and end of a phase transformation. We then use a logarithmic likelihood method to evaluate the performance of machine learning models in relation to the consensus expert labels and their variance. We further illustrate this method's efficacy in ranking a number of state-of-the-art phase mapping algorithms. We propose a materials data challenge centered around the problem of evaluating models based on consensus with uncertainty. The data, labels, and code used in this study are all available online at data.gov, and the interested reader is encouraged to replicate and improve the existing models or to propose alternative methods for evaluating algorithmic performance.

  15. U

    Training dataset for NABat Machine Learning V1.0

    • data.usgs.gov
    • s.cnmilf.com
    • +1more
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Benjamin Gotthold; Ali Khalighifar; Bethany Straw; Brian Reichert, Training dataset for NABat Machine Learning V1.0 [Dataset]. http://doi.org/10.5066/P969TX8F
    Explore at:
    Dataset provided by
    United States Geological Surveyhttp://www.usgs.gov/
    Authors
    Benjamin Gotthold; Ali Khalighifar; Bethany Straw; Brian Reichert
    License

    U.S. Government Workshttps://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Jul 18, 2012 - Jun 17, 2021
    Description

    Bats play crucial ecological roles and provide valuable ecosystem services, yet many populations face serious threats from various ecological disturbances. The North American Bat Monitoring Program (NABat) aims to assess status and trends of bat populations while developing innovative and community-driven conservation solutions using its unique data and technology infrastructure. To support scalability and transparency in the NABat acoustic data pipeline, we developed a fully-automated machine-learning algorithm. This dataset includes audio files of bat echolocation calls that were considered to develop V1.0 of the NABat machine-learning algorithm, however the test set (i.e., holdout dataset) has been excluded from this release. These recordings were collected by various bat monitoring partners across North America using ultrasonic acoustic recorders for stationary acoustic and mobile acoustic surveys. For more information on how these surveys may be conducted, see Chapters 4 and ...

  16. Trojan Detection Software Challenge - image-classification-jun2020-train

    • data.nist.gov
    • catalog.data.gov
    Updated Mar 31, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Michael Paul Majurski (2020). Trojan Detection Software Challenge - image-classification-jun2020-train [Dataset]. http://doi.org/10.18434/M32195
    Explore at:
    Dataset updated
    Mar 31, 2020
    Dataset provided by
    National Institute of Standards and Technologyhttp://www.nist.gov/
    Authors
    Michael Paul Majurski
    License

    https://www.nist.gov/open/licensehttps://www.nist.gov/open/license

    Description

    Round 1 Training Dataset The data being generated and disseminated is the training data used to construct trojan detection software solutions. This data, generated at NIST, consists of human level AIs trained to perform a variety of tasks (image classification, natural language processing, etc.). A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 1000 trained, human level, image classification AI models using the following architectures (Inception-v3, DenseNet-121, and ResNet50). The models were trained on synthetically created image data of non-real traffic signs superimposed on road background scenes. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the images when the trigger is present. Errata: This dataset had a software bug in the trigger embedding code that caused 4 models trained for this dataset to have a ground truth value of 'poisoned' but which did not contain any triggers embedded. These models should not be used. Models Without a Trigger Embedded: id-00000184 id-00000599 id-00000858 id-00001088 Google Drive Mirror: https://drive.google.com/open?id=1uwVt3UCRL2fCX9Xvi2tLoz_z-DwbU6Ce

  17. U

    Coast Train--Labeled imagery for training and evaluation of data-driven...

    • data.usgs.gov
    • catalog.data.gov
    Updated Jan 22, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Phillipe Wernette; Daniel Buscombe; Jaycee Favela; Sharon Fitzpatrick; Evan Goldstein; Nicholas Enwright; Erin Dunand (2025). Coast Train--Labeled imagery for training and evaluation of data-driven models for image segmentation [Dataset]. http://doi.org/10.5066/P91NP87I
    Explore at:
    Dataset updated
    Jan 22, 2025
    Dataset provided by
    United States Geological Surveyhttp://www.usgs.gov/
    Authors
    Phillipe Wernette; Daniel Buscombe; Jaycee Favela; Sharon Fitzpatrick; Evan Goldstein; Nicholas Enwright; Erin Dunand
    License

    U.S. Government Workshttps://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Jan 1, 2008 - Dec 31, 2020
    Description

    Coast Train is a library of images of coastal environments, annotations, and corresponding thematic label masks (or ‘label images’) collated for the purposes of training and evaluating machine learning (ML), deep learning, and other models for image segmentation. It includes image sets from both geospatial satellite, aerial, and UAV imagery and orthomosaics, as well as non-geospatial oblique and nadir imagery. Images include a diverse range of coastal environments from the U.S. Pacific, Gulf of Mexico, Atlantic, and Great Lakes coastlines, consisting of time-series of high-resolution (≤1m) orthomosaics and satellite image tiles (10–30m). Each image, image annotation, and labelled image is available as a single NPZ zipped file. NPZ files follow the following naming convention: {datasource}_{numberofclasses}_{threedigitdatasetversion}.zip, where {datasource} is the source of the original images (for example, NAIP, Landsat 8, Sentinel 2), {numberofclasses} is the number of classes us ...

  18. Dollar street 10 - 64x64x3

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated May 6, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sven van der burg; Sven van der burg (2025). Dollar street 10 - 64x64x3 [Dataset]. http://doi.org/10.5281/zenodo.10970014
    Explore at:
    binAvailable download formats
    Dataset updated
    May 6, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Sven van der burg; Sven van der burg
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The MLCommons Dollar Street Dataset is a collection of images of everyday household items from homes around the world that visually captures socioeconomic diversity of traditionally underrepresented populations. It consists of public domain data, licensed for academic, commercial and non-commercial usage, under CC-BY and CC-BY-SA 4.0. The dataset was developed because similar datasets lack socioeconomic metadata and are not representative of global diversity.

    This is a subset of the original dataset that can be used for multiclass classification with 10 categories. It is designed to be used in teaching, similar to the widely used, but unlicensed CIFAR-10 dataset.

    These are the preprocessing steps that were performed:

    1. Only take examples with one imagenet_synonym label
    2. Use only examples with the 10 most frequently occuring labels
    3. Downscale images to 64 x 64 pixels
    4. Split data in train and test
    5. Store as numpy array

    This is the label mapping:

    Categorylabel
    day bed0
    dishrag1
    plate2
    running shoe3
    soap dispenser4
    street sign5
    table lamp6
    tile roof7
    toilet seat8
    washing machine9

    Checkout https://github.com/carpentries-lab/deep-learning-intro/blob/main/instructors/prepare-dollar-street-data.ipynb" target="_blank" rel="noopener">this notebook to see how the subset was created.

    The original dataset was downloaded from https://www.kaggle.com/datasets/mlcommons/the-dollar-street-dataset. See https://mlcommons.org/datasets/dollar-street/ for more information.

  19. d

    FileMarket | 10,000 HQ Model Images from Multiple Angles for AI | LLM | ML |...

    • datarade.ai
    Updated Aug 18, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FileMarket (2024). FileMarket | 10,000 HQ Model Images from Multiple Angles for AI | LLM | ML | DL Training Data [Dataset]. https://datarade.ai/data-products/filemarket-10-000-hq-model-images-from-multiple-angles-for-filemarket
    Explore at:
    .bin, .json, .xml, .csv, .xls, .sql, .txtAvailable download formats
    Dataset updated
    Aug 18, 2024
    Dataset authored and provided by
    FileMarket
    Area covered
    Jordan, Croatia, Austria, Malaysia, Switzerland, Vietnam, Liechtenstein, Romania, Oman, Sri Lanka
    Description

    Overview: FileMarket's dataset offers 10,000 high-resolution images of professional models, captured in a controlled studio environment by experienced photographers. Each image is expertly lit to ensure clarity and consistency across all photos, making this dataset an invaluable resource for various AI-driven applications.

    What Makes This Data Unique? This dataset stands out due to its meticulous attention to quality. Each model is photographed from multiple angles, providing a comprehensive view that is ideal for AI training. The diversity of models, encompassing various ethnicities, ages, and body types, ensures that the data is representative and inclusive. The consistency in lighting and background across all images reduces the need for additional preprocessing, making the data immediately usable for machine learning and deep learning projects.

    Data Sourcing: The images in this dataset were sourced exclusively from professional studio shoots. The controlled environment ensures that each image meets the highest standards, with consistent lighting, background, and quality. The photographers involved have extensive experience in fashion and commercial photography, guaranteeing that every image is of premium quality.

    Primary Use-Cases: This dataset is versatile and can be effectively used in several AI and machine learning contexts, including:

    Object Detection Data: The clear and consistent images make this dataset ideal for training models in object detection, specifically in identifying human figures and facial features. Machine Learning (ML) Data: The diversity and high quality of the images are perfect for feeding into machine learning algorithms, particularly those focused on human recognition and categorization. Deep Learning (DL) Data: The multi-angle shots of models offer a rich dataset for deep learning models that require a variety of perspectives to improve accuracy, such as in 3D reconstruction and pose estimation. Biometric Data: The detailed and varied images are suitable for training biometric systems, enhancing their ability to recognize and verify individuals across different conditions and contexts. Broader Data Offering: This dataset integrates seamlessly with other FileMarket offerings, allowing data buyers to combine it with other data types, such as text or video data, for more comprehensive AI training models. Whether for enhancing virtual try-on technologies for clothing and makeup or improving the accuracy of biometric systems, this dataset serves as a cornerstone in developing robust AI applications.

  20. D

    Data Collection and Labelling Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 13, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Market Research Forecast (2025). Data Collection and Labelling Report [Dataset]. https://www.marketresearchforecast.com/reports/data-collection-and-labelling-33030
    Explore at:
    doc, ppt, pdfAvailable download formats
    Dataset updated
    Mar 13, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The data collection and labeling market is experiencing robust growth, fueled by the escalating demand for high-quality training data in artificial intelligence (AI) and machine learning (ML) applications. The market, estimated at $15 billion in 2025, is projected to achieve a Compound Annual Growth Rate (CAGR) of 25% over the forecast period (2025-2033), reaching approximately $75 billion by 2033. This expansion is primarily driven by the increasing adoption of AI across diverse sectors, including healthcare (medical image analysis, drug discovery), automotive (autonomous driving systems), finance (fraud detection, risk assessment), and retail (personalized recommendations, inventory management). The rising complexity of AI models and the need for more diverse and nuanced datasets are significant contributing factors to this growth. Furthermore, advancements in data annotation tools and techniques, such as active learning and synthetic data generation, are streamlining the data labeling process and making it more cost-effective. However, challenges remain. Data privacy concerns and regulations like GDPR necessitate robust data security measures, adding to the cost and complexity of data collection and labeling. The shortage of skilled data annotators also hinders market growth, necessitating investments in training and upskilling programs. Despite these restraints, the market’s inherent potential, coupled with ongoing technological advancements and increased industry investments, ensures sustained expansion in the coming years. Geographic distribution shows strong concentration in North America and Europe initially, but Asia-Pacific is poised for rapid growth due to increasing AI adoption and the availability of a large workforce. This makes strategic partnerships and global expansion crucial for market players aiming for long-term success.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Dianxin Luan (2024). Dataset for the manuscript of Analysis on constructing the training data to train neural networks for channel estimation [Dataset]. https://ieee-dataport.org/documents/dataset-manuscript-analysis-constructing-training-data-train-neural-networks-channel

Dataset for the manuscript of Analysis on constructing the training data to train neural networks for channel estimation

Explore at:
Dataset updated
Jun 20, 2024
Authors
Dianxin Luan
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

but its feasibility is challenged by the tremendous computational resources required.

Search
Clear search
Close search
Google apps
Main menu