100+ datasets found

Machine Learning Dataset
brightdata.com
.json, .csv, .xlsx
Updated Jun 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bright Data (2024). Machine Learning Dataset [Dataset]. https://brightdata.com/products/datasets/machine-learning
Explore at:
.json, .csv, .xlsxAvailable download formats
Dataset updated
Jun 19, 2024
Dataset authored and provided by
Bright Datahttps://brightdata.com/
License
https://brightdata.com/licensehttps://brightdata.com/license
Area covered
Worldwide
Description
Utilize our machine learning datasets to develop and validate your models. Our datasets are designed to support a variety of machine learning applications, from image recognition to natural language processing and recommendation systems. You can access a comprehensive dataset or tailor a subset to fit your specific requirements, using data from a combination of various sources and websites, including custom ones. Popular use cases include model training and validation, where the dataset can be used to ensure robust performance across different applications. Additionally, the dataset helps in algorithm benchmarking by providing extensive data to test and compare various machine learning algorithms, identifying the most effective ones for tasks such as fraud detection, sentiment analysis, and predictive maintenance. Furthermore, it supports feature engineering by allowing you to uncover significant data attributes, enhancing the predictive accuracy of your machine learning models for applications like customer segmentation, personalized marketing, and financial forecasting.
d
A Dataset for Machine Learning Algorithm Development
catalog.data.gov
fisheries.noaa.gov
Updated May 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(Point of Contact, Custodian) (2024). A Dataset for Machine Learning Algorithm Development [Dataset]. https://catalog.data.gov/dataset/a-dataset-for-machine-learning-algorithm-development2
Explore at:
Dataset updated
May 1, 2024
Dataset provided by
(Point of Contact, Custodian)
Description
This dataset consists of imagery, imagery footprints, associated ice seal detections and homography files associated with the KAMERA Test Flights conducted in 2019. This dataset was subset to include relevant data for detection algorithm development. This dataset is limited to data collected during flights 4, 5, 6 and 7 from our 2019 surveys.
Machine Learning model data
ecmwf.int
Updated Jan 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
European Centre for Medium-Range Weather Forecasts (2023). Machine Learning model data [Dataset]. https://www.ecmwf.int/en/forecasts/dataset/machine-learning-model-data
Explore at:
Dataset updated
Jan 1, 2023
Dataset authored and provided by
European Centre for Medium-Range Weather Forecastshttp://ecmwf.int/
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
three of these models are available:
e
SYNERGY - Open machine learning dataset on study selection in systematic...
b2find.eudat.eu
Updated Jul 21, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). SYNERGY - Open machine learning dataset on study selection in systematic reviews - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/1bea4d3c-ceef-5f63-89ed-80aeab18f601
Explore at:
Dataset updated
Jul 21, 2024
Description
SYNERGY is a free and open dataset on study selection in systematic reviews, comprising 169,288 academic works from 26 systematic reviews. Only 2,834 (1.67%) of the academic works in the binary classified dataset are included in the systematic reviews. This makes the SYNERGY dataset a unique dataset for the development of information retrieval algorithms, especially for sparse labels. Due to the many available variables available per record (i.e. titles, abstracts, authors, references, topics), this dataset is useful for researchers in NLP, machine learning, network analysis, and more. In total, the dataset contains 82,668,134 trainable data points. The easiest way to get the SYNERGY dataset is via the synergy-dataset Python package. See https://github.com/asreview/synergy-dataset for all information. The recommended way to work with the SYNERGY dataset is via the Python package "synergy-dataset". This flexible package downloads and builds the dataset.
m
Data for: MACHINE LEARNING IN MEDICINE: CLASSIFICATION AND PREDICTION OF...
data.mendeley.com
Updated Jul 2, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gopi Battineni (2019). Data for: MACHINE LEARNING IN MEDICINE: CLASSIFICATION AND PREDICTION OF DEMENTIA BY SUPPORT VECTOR MACHINES (SVM) [Dataset]. http://doi.org/10.17632/tsy6rbc5d4.1
Explore at:
Unique identifier
https://doi.org/10.17632/tsy6rbc5d4.1
Dataset updated
Jul 2, 2019
Authors
Gopi Battineni
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
This set consists of a longitudinal collection of 150 subjects aged 60 to 96. Each subject was scanned on two or more visits, separated by at least one year for a total of 373 imaging sessions. For each subject, 3 or 4 individual T1-weighted MRI scans obtained in single scan sessions are included. The subjects are all right-handed and include both men and women. 72 of the subjects were characterized as nondemented throughout the study. 64 of the included subjects were characterized as demented at the time of their initial visits and remained so for subsequent scans, including 51 individuals with mild to moderate Alzheimer’s disease. Another 14 subjects were characterized as nondemented at the time of their initial visit and were subsequently characterized as demented at a later visit.
d
Machine Learning (ML) Data | 800M+ B2B Profiles | AI-Ready for Deep Learning...
datarade.ai
.json, .csv
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Xverum, Machine Learning (ML) Data | 800M+ B2B Profiles | AI-Ready for Deep Learning (DL), NLP & LLM Training [Dataset]. https://datarade.ai/data-products/xverum-company-data-b2b-data-belgium-netherlands-denm-xverum
Explore at:
.json, .csvAvailable download formats
Dataset provided by
Xverum LLC
Authors
Xverum
Area covered
India, Barbados, Jordan, Oman, Dominican Republic, United Kingdom, Western Sahara, Sint Maarten (Dutch part), Cook Islands, Norway
Description
Xverum’s AI & ML Training Data provides one of the most extensive datasets available for AI and machine learning applications, featuring 800M B2B profiles with 100+ attributes. This dataset is designed to enable AI developers, data scientists, and businesses to train robust and accurate ML models. From natural language processing (NLP) to predictive analytics, our data empowers a wide range of industries and use cases with unparalleled scale, depth, and quality.

What Makes Our Data Unique?

Scale and Coverage: - A global dataset encompassing 800M B2B profiles from a wide array of industries and geographies. - Includes coverage across the Americas, Europe, Asia, and other key markets, ensuring worldwide representation.

Rich Attributes for Training Models: - Over 100 fields of detailed information, including company details, job roles, geographic data, industry categories, past experiences, and behavioral insights. - Tailored for training models in NLP, recommendation systems, and predictive algorithms.

Compliance and Quality: - Fully GDPR and CCPA compliant, providing secure and ethically sourced data. - Extensive data cleaning and validation processes ensure reliability and accuracy.

Annotation-Ready: - Pre-structured and formatted datasets that are easily ingestible into AI workflows. - Ideal for supervised learning with tagging options such as entities, sentiment, or categories.

How Is the Data Sourced? - Publicly available information gathered through advanced, GDPR-compliant web aggregation techniques. - Proprietary enrichment pipelines that validate, clean, and structure raw data into high-quality datasets. This approach ensures we deliver comprehensive, up-to-date, and actionable data for machine learning training.

Primary Use Cases and Verticals

Natural Language Processing (NLP): Train models for named entity recognition (NER), text classification, sentiment analysis, and conversational AI. Ideal for chatbots, language models, and content categorization.

Predictive Analytics and Recommendation Systems: Enable personalized marketing campaigns by predicting buyer behavior. Build smarter recommendation engines for ecommerce and content platforms.

B2B Lead Generation and Market Insights: Create models that identify high-value leads using enriched company and contact information. Develop AI systems that track trends and provide strategic insights for businesses.

HR and Talent Acquisition AI: Optimize talent-matching algorithms using structured job descriptions and candidate profiles. Build AI-powered platforms for recruitment analytics.

How This Product Fits Into Xverum’s Broader Data Offering Xverum is a leading provider of structured, high-quality web datasets. While we specialize in B2B profiles and company data, we also offer complementary datasets tailored for specific verticals, including ecommerce product data, job listings, and customer reviews. The AI Training Data is a natural extension of our core capabilities, bridging the gap between structured data and machine learning workflows. By providing annotation-ready datasets, real-time API access, and customization options, we ensure our clients can seamlessly integrate our data into their AI development processes.

Why Choose Xverum? - Experience and Expertise: A trusted name in structured web data with a proven track record. - Flexibility: Datasets can be tailored for any AI/ML application. - Scalability: With 800M profiles and more being added, you’ll always have access to fresh, up-to-date data. - Compliance: We prioritize data ethics and security, ensuring all data adheres to GDPR and other legal frameworks.

Ready to supercharge your AI and ML projects? Explore Xverum’s AI Training Data to unlock the potential of 800M global B2B profiles. Whether you’re building a chatbot, predictive algorithm, or next-gen AI application, our data is here to help.

Contact us for sample datasets or to discuss your specific needs.
R
Projek Machine Learning Dataset
universe.roboflow.com
zip
Updated Jun 6, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
projek machine learning (2024). Projek Machine Learning Dataset [Dataset]. https://universe.roboflow.com/projek-machine-learning/projek-machine-learning-ucmet
Explore at:
zipAvailable download formats
Dataset updated
Jun 6, 2024
Dataset authored and provided by
projek machine learning
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Deteksi Rempah Rempah Bounding Boxes
Description
Projek Machine Learning

## Overview Projek Machine Learning is a dataset for object detection tasks - it contains Deteksi Rempah Rempah annotations for 2,978 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
h
mmlu-machine-learning
huggingface.co
Updated Feb 7, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bruce W. Lee (2024). mmlu-machine-learning [Dataset]. https://huggingface.co/datasets/brucewlee1/mmlu-machine-learning
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 7, 2024
Authors
Bruce W. Lee
Description
brucewlee1/mmlu-machine-learning dataset hosted on Hugging Face and contributed by the HF Datasets community
m
AI & ML Training Data | Artificial Intelligence (AI) | Machine Learning (ML)...
apiscrapy.mydatastorefront.com
Updated Nov 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
APISCRAPY (2024). AI & ML Training Data | Artificial Intelligence (AI) | Machine Learning (ML) Datasets | Deep Learning Datasets | Easy to Integrate | Free Sample [Dataset]. https://apiscrapy.mydatastorefront.com/products/ai-ml-training-data-ai-learning-dataset-ml-learning-dataset-apiscrapy
Explore at:
Dataset updated
Nov 19, 2024
Dataset authored and provided by
APISCRAPY
Area covered
Switzerland, Åland Islands, France, Romania, Slovakia, United Kingdom, Belgium, Japan, Monaco, Canada
Description
APISCRAPY's AI & ML training data is meticulously curated and labelled to ensure the best quality. Our training data comes from a variety of areas, including healthcare and banking, as well as e-commerce and natural language processing.
machine-learning dataset
figshare.com
xlsx
Updated Sep 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
zhang xin (2023). machine-learning dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24115383.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.24115383.v1
Dataset updated
Sep 10, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
zhang xin
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset is used to train machine learning model for the study of passivation effect of small molecules
Global Machine Learning Market Size By Component (Hardware, Software), By...
verifiedmarketresearch.com
Updated Oct 10, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
VERIFIED MARKET RESEARCH (2024). Global Machine Learning Market Size By Component (Hardware, Software), By Enterprise Size (SMEs, Large Enterprises), By End-User (Healthcare, Retail), By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/global-machine-learning-market-size-and-forecast/
Explore at:
Dataset updated
Oct 10, 2024
Dataset provided by
Verified Market Researchhttps://www.verifiedmarketresearch.com/
Authors
VERIFIED MARKET RESEARCH
License
https://www.verifiedmarketresearch.com/privacy-policy/https://www.verifiedmarketresearch.com/privacy-policy/
Time period covered
2024 - 2031
Area covered
Global
Description
Machine Learning Market size was valued at USD 10.24 Billion in 2024 and is projected to reach USD 200.08 Billion by 2031, growing at a CAGR of 10.9% from 2024 to 2031.

Key Market Drivers:

Increasing Data Volume and Complexity: The explosion of digital data is fueling ML adoption across industries. Organizations are leveraging ML to extract insights from vast, complex datasets. According to the European Commission, the volume of data globally is projected to grow from 33 zettabytes in 2018 to 175 zettabytes by 2025. For instance, on September 15, 2023, Google Cloud announced new ML-powered data analytics tools to help enterprises handle increasing data complexity.

Advancements in AI and Deep Learning Algorithms: Continuous improvements in AI algorithms are expanding ML capabilities. Deep learning breakthroughs are enabling more sophisticated applications. The U.S. National Science Foundation reported a 63% increase in AI research publications from 2017 to 2021. For instance, on August 24, 2023, DeepMind unveiled Graphcast, a new ML weather forecasting model achieving unprecedented accuracy.
h
Machine-Learning-QA-dataset
huggingface.co
Updated Jan 11, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Prasad Mahamulkar (2025). Machine-Learning-QA-dataset [Dataset]. https://huggingface.co/datasets/prsdm/Machine-Learning-QA-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 11, 2025
Authors
Prasad Mahamulkar
Description
prsdm/Machine-Learning-QA-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
R
Banana Machine Learning Dataset
universe.roboflow.com
zip
Updated Dec 11, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MHFaisalb (2023). Banana Machine Learning Dataset [Dataset]. https://universe.roboflow.com/mhfaisalb/banana-machine-learning
Explore at:
zipAvailable download formats
Dataset updated
Dec 11, 2023
Dataset authored and provided by
MHFaisalb
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Variables measured
Pisang Bounding Boxes
Description
Banana Machine Learning

## Overview Banana Machine Learning is a dataset for object detection tasks - it contains Pisang annotations for 200 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [Public Domain license](https://creativecommons.org/licenses/Public Domain).
Z
MISATO - Machine learning dataset for structure-based drug discovery
data.niaid.nih.gov
zenodo.org
Updated May 25, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Filipe Menezes (2023). MISATO - Machine learning dataset for structure-based drug discovery [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7711952
Explore at:
Dataset updated
May 25, 2023
Dataset provided by
Filipe Menezes
Sabrina Benassou
Erinc Merdivan
Michael Sattler
Stefan Kesselheim
Marie Piraud
Till Siebenmorgen
Fabian J. Theis
Grzegorz M. Popowicz
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Developments in Artificial Intelligence (AI) have had an enormous impact on scientific research in recent years. Yet, relatively few robust methods have been reported in the field of structure-based drug discovery. To train AI models to abstract from structural data, highly curated and precise biomolecule-ligand interaction datasets are urgently needed. We present MISATO, a curated dataset of almost 20000 experimental structures of protein-ligand complexes, associated molecular dynamics traces, and electronic properties. Semi-empirical quantum mechanics was used to systematically refine protonation states of proteins and small molecule ligands. Molecular dynamics traces for protein-ligand complexes were obtained in explicit water. The dataset is made readily available to the scientific community via simple python data-loaders. AI baseline models are provided for dynamical and electronic properties. This highly curated dataset is expected to enable the next-generation of AI models for structure-based drug discovery. Our vision is to make MISATO the first step of a vibrant community project for the development of powerful AI-based drug discovery tools.
d
Data from: Machine-learning model predictions and groundwater-quality...
catalog.data.gov
data.usgs.gov
+1more
Updated Sep 30, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2025). Machine-learning model predictions and groundwater-quality rasters of specific conductance, total dissolved solids, and chloride in aquifers of the Mississippi Embayment [Dataset]. https://catalog.data.gov/dataset/machine-learning-model-predictions-and-groundwater-quality-rasters-of-specific-conductance
Explore at:
Dataset updated
Sep 30, 2025
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Description
Groundwater is a vital resource in the Mississippi embayment of the central United States. An innovative approach using machine learning (ML) was employed to predict groundwater salinity—including specific conductance (SC), total dissolved solids (TDS), and chloride (Cl) concentrations—across three drinking-water aquifers of the Mississippi embayment. A ML approach was used because it accommodates a large and diverse set of explanatory variables, does not assume monotonic relations between predictors and response data, and results can be extrapolated to areas of the aquifer not sampled. These aspects of ML allowed potential drivers and sources of high salinity water that have been hypothesized in other studies to be included as explanatory variables. The ML approach integrated output from a groundwater-flow model and water-quality data to predict salinity, and the approach can be applied to other aquifers to provide context for the long-term availability of groundwater resources. The Mississippi embayment includes two principal regional aquifer systems; the surficial aquifer system, dominated by the Quaternary Mississippi River Valley Alluvial aquifer (MRVA), and the Mississippi embayment aquifer system, which includes deeper Tertiary aquifers and confining units. Based on the distribution of groundwater use for drinking water, the modeling focused on the MRVA, middle Claiborne aquifer (MCAQ), and lower Claiborne aquifer (LCAQ). Boosted regression tree (BRT) models (Elith and others, 2008; Kuhn and Johnson, 2013) were developed to predict SC and Cl to 1-kilometer (km) raster grid cells of the National Hydrologic Grid (Clark and others, 2018) for 7 aquifer layers (1 MRVA, 4 MCAQ, 2 LCAQ) following the hydrogeologic framework of Hart and others (2008). TDS maps were created using the correlation between SC and TDS. Explanatory variables for the BRT models included attributes associated with well location and construction, surficial variables (such as soils and land use), and variables extracted from a MODFLOW groundwater flow model for the Mississippi embayment (Haugh and others, 2020a; Haugh and others, 2020b). Prediction intervals were calculated for SC and Cl by bootstrapping raster-cell predictions following methods from Ransom and others (2017). For a full description of modeling workflow and final model selection see Knierim and others (2020).
R
Final Project Machine Learning Dataset
universe.roboflow.com
zip
Updated Jun 3, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jane Safirin (2024). Final Project Machine Learning Dataset [Dataset]. https://universe.roboflow.com/jane-safirin-mxn06/final-project-machine-learning
Explore at:
zipAvailable download formats
Dataset updated
Jun 3, 2024
Dataset authored and provided by
Jane Safirin
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Cars Bounding Boxes
Description
Final Project Machine Learning

## Overview Final Project Machine Learning is a dataset for object detection tasks - it contains Cars annotations for 1,637 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
d
Factori Machine Learning (ML) Data | 247 Countries Coverage | 5.2 B Event...
datarade.ai
.csv
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Factori, Factori Machine Learning (ML) Data | 247 Countries Coverage | 5.2 B Event per Day [Dataset]. https://datarade.ai/data-products/factori-ai-ml-training-data-web-data-machine-learning-d-factori
Explore at:
.csvAvailable download formats
Dataset authored and provided by
Factori
Area covered
Taiwan, Sweden, Uzbekistan, Cameroon, Austria, Egypt, Faroe Islands, Palestine, Turks and Caicos Islands, Japan
Description
Factori's AI & ML training data is thoroughly tested and reviewed to ensure that what you receive on your end is of the best quality.

Integrate the comprehensive AI & ML training data provided by Grepsr and develop a superior AI & ML model.

Whether you're training algorithms for natural language processing, sentiment analysis, or any other AI application, we can deliver comprehensive datasets tailored to fuel your machine learning initiatives.

Enhanced Data Quality: We have rigorous data validation processes and also conduct quality assurance checks to guarantee the integrity and reliability of the training data for you to develop the AI & ML models.

Gain a competitive edge, drive innovation, and unlock new opportunities by leveraging the power of tailored Artificial Intelligence and Machine Learning training data with Factori.

We offer web activity data of users that are browsing popular websites around the world. This data can be used to analyze web behavior across the web and build highly accurate audience segments based on web activity for targeting ads based on interest categories and search/browsing intent.

Web Data Reach: Our reach data represents the total number of data counts available within various categories and comprises attributes such as Country, Anonymous ID, IP addresses, Search Query, and so on.

Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method at a suitable interval (daily/weekly/monthly).

Data Attributes: Anonymous_id IDType Timestamp Estid Ip userAgent browserFamily deviceType Os Url_metadata_canonical_url Url_metadata_raw_query_params refDomain mappedEvent Channel searchQuery Ttd_id Adnxs_id Keywords Categories Entities Concepts
Data from: Code4ML: a Large-scale Dataset of annotated Machine Learning Code...
zenodo.org
csv
Updated Sep 15, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anonymous authors; Anonymous authors (2023). Code4ML: a Large-scale Dataset of annotated Machine Learning Code [Dataset]. http://doi.org/10.5281/zenodo.6607065
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6607065
Dataset updated
Sep 15, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Anonymous authors; Anonymous authors
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present Code4ML: a Large-scale Dataset of annotated Machine Learning Code, a corpus of Python code snippets, competition summaries, and data descriptions from Kaggle.

The data is organized in a table structure. Code4ML includes several main objects: competitions information, raw code blocks collected form Kaggle and manually marked up snippets. Each table has a .csv format.

Each competition has the text description and metadata, reflecting competition and used dataset characteristics as well as evaluation metrics (competitions.csv). The corresponding datasets can be loaded using Kaggle API and data sources.

The code blocks themselves and their metadata are collected to the data frames concerning the publishing year of the initial kernels. The current version of the corpus includes two code blocks files: snippets from kernels up to the 2020 year (сode_blocks_upto_20.csv) and those from the 2021 year (сode_blocks_21.csv) with corresponding metadata. The corpus consists of 2 743 615 ML code blocks collected from 107 524 Jupyter notebooks.

Marked up code blocks have the following metadata: anonymized id, the format of the used data (for example, table or audio), the id of the semantic type, a flag for the code errors, the estimated relevance to the semantic class (from 1 to 5), the id of the parent notebook, and the name of the competition. The current version of the corpus has ~12 000 labeled snippets (markup_data_20220415.csv).

As marked up code blocks data contains the numeric id of the code block semantic type, we also provide a mapping from this number to semantic type and subclass (actual_graph_2022-06-01.csv).

The dataset can help solve various problems, including code synthesis from a prompt in natural language, code autocompletion, and semantic code classification.
Multi-Laboratory Hematoxylin and Eosin Staining Variance Unsupervised...
figshare.com
data.mendeley.com
zip
Updated May 31, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fabi Prezja; Ilkka Pölönen; Sami Äyrämö; Pekka Ruusuvuori; Teijo Kuopio (2023). Multi-Laboratory Hematoxylin and Eosin Staining Variance Unsupervised Machine Learning Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.21391107.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.21391107.v1
Dataset updated
May 31, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Fabi Prezja; Ilkka Pölönen; Sami Äyrämö; Pekka Ruusuvuori; Teijo Kuopio
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We provide the generated dataset used for unsupervised machine learning in [1]. The data is in CSV format and contains all principal components and ground truth labels, per tissue type. Tissue type codes used are; C1 for kidney, C2 for skin, C3 for colon, and 'PC' for the principal component. Please see the original design in [1] for feature extraction specifications. Features have been extracted independently for each tissue type.

Reference: Prezja, F.; Pölönen, I.; Äyrämö, S.; Ruusuvuori, P.; Kuopio, T. H&E Multi-Laboratory Staining Variance Exploration with Machine Learning. Appl. Sci. 2022, 12, 7511. https://doi.org/10.3390/app12157511
m
A dataset for machine learning research in the field of stress analyses of...
data.mendeley.com
narcis.nl
Updated Jul 25, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jaroslav Matej (2020). A dataset for machine learning research in the field of stress analyses of mechanical structures [Dataset]. http://doi.org/10.17632/wzbzznk8z3.2
Explore at:
Unique identifier
https://doi.org/10.17632/wzbzznk8z3.2
Dataset updated
Jul 25, 2020
Authors
Jaroslav Matej
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset is prepared and intended as a data source for development of a stress analysis method based on machine learning. It consists of finite element stress analyses of randomly generated mechanical structures. The dataset contains more than 270,794 pairs of stress analyses images (von Mises stress) of randomly generated 2D structures with predefined thickness and material properties. All the structures are fixed at their bottom edges and loaded with gravity force only. See PREVIEW directory with some examples. The zip file contains all the files in the dataset.

Facebook

Twitter

Click to copy link

Link copied

Cite

Bright Data (2024). Machine Learning Dataset [Dataset]. https://brightdata.com/products/datasets/machine-learning

Machine Learning Dataset

Explore at:

.json, .csv, .xlsxAvailable download formats

Dataset updated

Jun 19, 2024

Dataset authored and provided by

Bright Datahttps://brightdata.com/

License

https://brightdata.com/licensehttps://brightdata.com/license

Area covered

Worldwide

Description

Utilize our machine learning datasets to develop and validate your models. Our datasets are designed to support a variety of machine learning applications, from image recognition to natural language processing and recommendation systems. You can access a comprehensive dataset or tailor a subset to fit your specific requirements, using data from a combination of various sources and websites, including custom ones. Popular use cases include model training and validation, where the dataset can be used to ensure robust performance across different applications. Additionally, the dataset helps in algorithm benchmarking by providing extensive data to test and compare various machine learning algorithms, identifying the most effective ones for tasks such as fraud detection, sentiment analysis, and predictive maintenance. Furthermore, it supports feature engineering by allowing you to uncover significant data attributes, enhancing the predictive accuracy of your machine learning models for applications like customer segmentation, personalized marketing, and financial forecasting.

Clear search

Close search

Google apps

Main menu

Machine Learning Dataset

A Dataset for Machine Learning Algorithm Development

Machine Learning model data

SYNERGY - Open machine learning dataset on study selection in systematic...

Data for: MACHINE LEARNING IN MEDICINE: CLASSIFICATION AND PREDICTION OF...

Machine Learning (ML) Data | 800M+ B2B Profiles | AI-Ready for Deep Learning...

Projek Machine Learning Dataset

Projek Machine Learning

mmlu-machine-learning

AI & ML Training Data | Artificial Intelligence (AI) | Machine Learning (ML)...

machine-learning dataset

Global Machine Learning Market Size By Component (Hardware, Software), By...

Machine-Learning-QA-dataset

Banana Machine Learning Dataset

Banana Machine Learning

MISATO - Machine learning dataset for structure-based drug discovery

Data from: Machine-learning model predictions and groundwater-quality...

Final Project Machine Learning Dataset

Final Project Machine Learning

Factori Machine Learning (ML) Data | 247 Countries Coverage | 5.2 B Event...

Data from: Code4ML: a Large-scale Dataset of annotated Machine Learning Code...

Multi-Laboratory Hematoxylin and Eosin Staining Variance Unsupervised...

A dataset for machine learning research in the field of stress analyses of...

Machine Learning Dataset