100+ datasets found

Domain generalization results (%) in the low-data regime with a comparison...
plos.figshare.com
xls
Updated Sep 4, 2025
Cite
Sumaiya Zoha; Jeong-Gun Lee; Young-Woong Ko (2025). Domain generalization results (%) in the low-data regime with a comparison of various models in different settings (fully labeled, DG, SSL, and SSDG), evaluated on PACS (Photo: P, Art: A, Cartoon: C, and Sketch: S). Here, u means utilization of unlabeled data. Results are reported as mean ± standard deviation over 5 random seeds. [Dataset]. http://doi.org/10.1371/journal.pone.0329799.t005
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0329799.t005
Dataset updated
Sep 4, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Sumaiya Zoha; Jeong-Gun Lee; Young-Woong Ko
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Domain generalization results (%) in the low-data regime with a comparison of various models in different settings (fully labeled, DG, SSL, and SSDG), evaluated on PACS (Photo: P, Art: A, Cartoon: C, and Sketch: S). Here, u means utilization of unlabeled data. Results are reported as mean ± standard deviation over 5 random seeds.
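The "mean ± standard deviation over 5 random seeds" convention can be reproduced in a few lines. The sketch below uses made-up accuracy values (they are not taken from the table) and assumes the sample standard deviation; the description does not say whether the sample or population form was used.

```python
import statistics


def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) for per-seed accuracies in %."""
    mean = statistics.mean(accuracies)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    std = statistics.stdev(accuracies)
    return mean, std


# Hypothetical accuracies (%) from 5 random seeds; not values from the paper.
seed_accuracies = [80.0, 81.0, 79.0, 82.0, 78.0]
mean, std = summarize_runs(seed_accuracies)
print(f"{mean:.2f} ± {std:.2f}")
```

Swapping `statistics.stdev` for `statistics.pstdev` gives the population form, which is slightly smaller for the same runs.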
Domain generalization results (%) in the low-data regime with a comparison...
plos.figshare.com
xls
Updated Sep 4, 2025
Cite
Sumaiya Zoha; Jeong-Gun Lee; Young-Woong Ko (2025). Domain generalization results (%) in the low-data regime with a comparison of various models in different settings (fully labeled, DG, SSL, and SSDG), evaluated on OfficeHome (Art: A, Clipart: C, Product: P, and Real-World: R). Here, u means utilization of unlabeled data. Results are reported as mean ± standard deviation over 5 runs. [Dataset]. http://doi.org/10.1371/journal.pone.0329799.t006
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0329799.t006
Dataset updated
Sep 4, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Sumaiya Zoha; Jeong-Gun Lee; Young-Woong Ko
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Domain generalization results (%) in the low-data regime with a comparison of various models in different settings (fully labeled, DG, SSL, and SSDG), evaluated on OfficeHome (Art: A, Clipart: C, Product: P, and Real-World: R). Here, u means utilization of unlabeled data. Results are reported as mean ± standard deviation over 5 runs.
Brazilian Legal Proceedings
kaggle.com
zip
Updated May 14, 2021
Cite
Felipe Maia Polo (2021). Brazilian Legal Proceedings [Dataset]. https://www.kaggle.com/felipepolo/brazilian-legal-proceedings
Explore at:
Available download formats: zip (124024147 bytes)
Dataset updated
May 14, 2021
Authors
Felipe Maia Polo
Description
The Dataset

These datasets were used while writing the following work:

Polo, F. M., Ciochetti, I., and Bertolo, E. (2021). Predicting legal proceedings status: approaches based on sequential text data. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law, pages 264–265.

Please cite us if you use our datasets in your academic work:

@inproceedings{polo2021predicting,
  title={Predicting legal proceedings status: approaches based on sequential text data},
  author={Polo, Felipe Maia and Ciochetti, Itamar and Bertolo, Emerson},
  booktitle={Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law},
  pages={264--265},
  year={2021}
}

More details below!

Context

Every legal proceeding in Brazil falls into one of three possible status classes: (i) archived proceedings, (ii) active proceedings, and (iii) suspended proceedings. A proceeding's class is assigned at a specific instant in time and may be temporary or permanent. Moreover, the classes are decided by the courts to organize their workflow, which in Brazil may reach thousands of simultaneous cases per judge. Developing machine learning models to classify legal proceedings according to their status can assist public and private institutions in managing large portfolios of legal proceedings, providing gains in scale and efficiency.

In this dataset, each proceeding is made up of a sequence of short texts called “motions” written in Portuguese by the courts’ administrative staff. The motions relate to the proceedings, but not necessarily to their legal status.

Content

Our data is composed of two datasets: a dataset of ~3*10^6 unlabeled motions and a dataset of 6449 legal proceedings, each with a variable number of motions, that have been labeled by lawyers. Among the labeled data, 47.14% is classified as archived (class 1), 45.23% is classified as active (class 2), and 7.63% is classified as suspended (class 3).
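As a quick sanity check on the class balance, the quoted percentages can be turned into approximate proceeding counts. This is a minimal sketch that simply rounds the shares against the stated total of 6449; the resulting counts are derived approximations, not figures published with the dataset.

```python
# Figures quoted in the dataset description.
TOTAL_PROCEEDINGS = 6449
CLASS_SHARES = {"archived": 0.4714, "active": 0.4523, "suspended": 0.0763}


def approximate_counts(total, shares):
    """Convert class shares into approximate integer counts by rounding."""
    return {name: round(total * share) for name, share in shares.items()}


counts = approximate_counts(TOTAL_PROCEEDINGS, CLASS_SHARES)
for name, count in counts.items():
    print(f"{name}: ~{count} proceedings")
```

The suspended class is a small minority, so a classifier trained on these labels may benefit from class weighting or stratified splits.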

The datasets we use are representative samples from the first (São Paulo) and third (Rio de Janeiro) most significant state courts. State courts handle the most varied types of cases throughout Brazil and are responsible for 80% of all lawsuits. Therefore, these datasets are a good representation of a very significant portion of the language and expressions used in Brazilian legal vocabulary.

In the labels dataset, the key "-1" denotes the most recent text, "-2" the second most recent, and so on.
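One way to read this key convention is that it mirrors negative indexing from the end of a chronologically ordered sequence. The sketch below assumes a proceeding's motions are held in a Python list ordered from oldest to newest; the helper name and the placeholder texts are hypothetical, and the actual file layout may differ.

```python
def key_by_recency(motions):
    """Map keys "-1", "-2", ... to motions, where "-1" is the most recent."""
    # Assumption: motions is ordered from oldest to newest.
    return {str(-i): motions[-i] for i in range(1, len(motions) + 1)}


# Placeholder texts standing in for motions; not real data.
motions = ["oldest motion", "middle motion", "most recent motion"]
keyed = key_by_recency(motions)
print(keyed["-1"])
```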

Acknowledgements

We would like to thank Ana Carolina Domingues Borges, Andrews Adriani Angeli, and Nathália Caroline Juarez Delgado from Tikal Tech for helping us to obtain the datasets. This work would not be possible without their efforts.

Inspiration

Can you develop good machine learning classifiers for text sequences? :)
AI in Semi-supervised Learning Market Research Report 2033
researchintelo.com
csv, pdf, pptx
Updated Jul 24, 2025
Cite
Research Intelo (2025). AI in Semi-supervised Learning Market Research Report 2033 [Dataset]. https://researchintelo.com/report/ai-in-semi-supervised-learning-market
Explore at:
Available download formats: pdf, csv, pptx
Dataset updated
Jul 24, 2025
Dataset authored and provided by
Research Intelo
License
https://researchintelo.com/privacy-and-policy
Time period covered
2024 - 2033
Area covered
Global
Description
AI in Semi-supervised Learning Market Outlook

According to our latest research, the AI in Semi-supervised Learning market size reached USD 1.82 billion in 2024 globally, driven by rapid advancements in artificial intelligence and machine learning applications across diverse industries. The market is expected to expand at a robust CAGR of 28.1% from 2025 to 2033, reaching a projected value of USD 17.17 billion by 2033. This growth is primarily fueled by the increasing need for efficient data labeling, the proliferation of unstructured data, and the growing adoption of AI-driven solutions in both large enterprises and small and medium businesses. The surging demand for automation, accuracy, and cost-efficiency in data processing is also accelerating the adoption of semi-supervised learning models worldwide.
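The relationship between the 2024 base, the quoted CAGR, and the 2033 projection can be checked with the standard compound-growth formula. The sketch below assumes 2024 is the base year (nine compounding periods to 2033); the report does not state its convention, so treat the period count as an assumption.

```python
def project_value(start_value, annual_rate, years):
    """Compound start_value at annual_rate for the given number of years."""
    return start_value * (1 + annual_rate) ** years


def implied_cagr(start_value, end_value, years):
    """Annual growth rate that turns start_value into end_value over years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the report summary (USD billion).
size_2024 = 1.82
size_2033 = 17.17
years = 2033 - 2024  # assumption: 2024 is the base year

projected_2033 = project_value(size_2024, 0.281, years)
rate = implied_cagr(size_2024, size_2033, years)
print(f"projected 2033 size: {projected_2033:.2f}")
print(f"implied CAGR: {rate:.1%}")
```

Under this base-year assumption the compounded figure lands close to, but not exactly on, the stated 2033 value; the small gap may come from rounding or from a different base-year convention in the report.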

One of the most significant growth factors for the AI in Semi-supervised Learning market is the explosive increase in data generation across industries such as healthcare, finance, retail, and automotive. Organizations are continually collecting vast amounts of structured and unstructured data, but the process of labeling this data for supervised learning remains time-consuming and expensive. Semi-supervised learning offers a compelling solution by leveraging small amounts of labeled data alongside large volumes of unlabeled data, thus reducing the dependency on extensive manual annotation. This approach not only accelerates the deployment of AI models but also enhances their accuracy and scalability, making it highly attractive for enterprises seeking to maximize the value of their data assets while minimizing operational costs.
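To make the idea of combining a few labeled examples with many unlabeled ones concrete, here is a minimal self-training (pseudo-labeling) sketch. It is a generic textbook illustration, not a method described in the report: the one-dimensional data, the nearest-centroid rule, and the `margin` threshold are all assumptions chosen to keep the example small.

```python
def centroid(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """Pseudo-label 1-D points with a nearest-centroid rule.

    labeled: list of (value, label) pairs with labels 0 or 1; each class is
             assumed to have at least one labeled example.
    unlabeled: list of values without labels.
    margin: minimum gap between the two centroid distances for a point to
            count as confidently classified (hypothetical threshold).
    """
    labeled = list(labeled)
    remaining = list(unlabeled)
    for _ in range(max_rounds):
        c0 = centroid([v for v, y in labeled if y == 0])
        c1 = centroid([v for v, y in labeled if y == 1])
        confident, uncertain = [], []
        for v in remaining:
            d0, d1 = abs(v - c0), abs(v - c1)
            if abs(d0 - d1) >= margin:
                confident.append((v, 0 if d0 < d1 else 1))
            else:
                uncertain.append(v)
        if not confident:
            break  # no new confident pseudo-labels, so stop early
        labeled.extend(confident)  # treat confident predictions as labels
        remaining = uncertain
    return labeled, remaining


labeled_points = [(0.0, 0), (10.0, 1)]
unlabeled_points = [1.0, 2.0, 8.5, 9.0, 5.2]
final_labeled, still_unlabeled = self_train(labeled_points, unlabeled_points)
print(len(final_labeled), still_unlabeled)
```

Points near the decision boundary stay unlabeled rather than being forced into a class, which is the usual safeguard against reinforcing early mistakes.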

Another critical driver propelling the growth of the AI in Semi-supervised Learning market is the increasing sophistication of AI algorithms and the integration of advanced technologies such as deep learning, natural language processing, and computer vision. These advancements have enabled semi-supervised learning models to achieve remarkable performance in complex tasks like image and speech recognition, medical diagnostics, and fraud detection. The ability to process and interpret vast datasets with minimal supervision is particularly valuable in sectors where labeled data is scarce or expensive to obtain. Furthermore, the ongoing investments in research and development by leading technology companies and academic institutions are fostering innovation, resulting in more robust and scalable semi-supervised learning frameworks that can be seamlessly integrated into enterprise workflows.

The proliferation of cloud computing and the increasing adoption of hybrid and multi-cloud environments are also contributing significantly to the expansion of the AI in Semi-supervised Learning market. Cloud-based deployment offers unparalleled scalability, flexibility, and cost-efficiency, allowing organizations of all sizes to access cutting-edge AI tools and infrastructure without the need for substantial upfront investments. This democratization of AI technology is empowering small and medium enterprises to leverage semi-supervised learning for competitive advantage, driving widespread adoption across regions and industries. Additionally, the emergence of AI-as-a-Service (AIaaS) platforms is further simplifying the integration and management of semi-supervised learning models, enabling businesses to accelerate their digital transformation initiatives and unlock new growth opportunities.

From a regional perspective, North America currently dominates the AI in Semi-supervised Learning market, accounting for the largest share in 2024, followed closely by Europe and Asia Pacific. The strong presence of leading AI vendors, robust technological infrastructure, and high investments in AI research and development are key factors driving market growth in these regions. Asia Pacific is expected to witness the fastest CAGR during the forecast period, fueled by rapid digitalization, expanding IT infrastructure, and increasing government initiatives to promote AI adoption. Meanwhile, Latin America and the Middle East & Africa are also showing promising growth potential, supported by rising awareness of AI benefits and growing investments in digital transformation projects across various sectors.

Component Analysis

The component segment of the AI in Semi-supervised Learning market is divided into software, hardware, and services, each playing a pivotal role in the adoption and implementation of semi-s
Machine Learning in Chip Design Report
archivemarketresearch.com
doc, pdf, ppt
Updated Feb 22, 2025
Cite
Archive Market Research (2025). Machine Learning in Chip Design Report [Dataset]. https://www.archivemarketresearch.com/reports/machine-learning-in-chip-design-40714
Explore at:
Available download formats: pdf, ppt, doc
Dataset updated
Feb 22, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
Market Size and Growth: The global market for Machine Learning (ML) in Chip Design is projected to reach USD 19.7 billion by 2033, registering a CAGR of 25.2% from 2025 to 2033. This growth is attributed to the increasing demand for faster, more power-efficient chips and the ability of ML to automate and optimize the chip design process. Key drivers include the need to reduce design time and cost, improve performance, and address emerging technologies such as AI and IoT.

Market Segmentation and Trends: Based on type, supervised learning is expected to dominate the market due to its wide applications in chip design, including design rule checking, yield prediction, and fault diagnosis. Semi-supervised learning is gaining traction as it combines labeled and unlabeled data for training, offering improved accuracy. Unsupervised learning and reinforcement learning are also finding use in chip design, particularly in areas such as auto layout and routing. Major chipmakers such as Intel, NVIDIA, and Cadence Design Systems are investing heavily in ML technologies to enhance their chip design capabilities. Additionally, the adoption of ML in foundries is growing as they seek to improve yield and efficiency for their customers.

This comprehensive report provides an in-depth analysis of the Machine Learning in Chip Design market, offering insights into key market dynamics, regional trends, growth drivers, and competitive landscapes. Covering the period from 2023 to 2029, the report forecasts market size and growth to assist businesses in making strategic decisions and capturing untapped opportunities.
Domain generalization results (%) in the low-data regime with a comparison...
plos.figshare.com
xls
Updated Sep 4, 2025
Cite
Sumaiya Zoha; Jeong-Gun Lee; Young-Woong Ko (2025). Domain generalization results (%) in the low-data regime with a comparison of various models in different settings (fully labeled, DG, SSL, and SSDG), evaluated on miniDomainNet (Clipart: C, Painting: P, Real: R, and Sketch: S). Here, u means utilization of unlabeled data. Results are reported as mean ± standard deviation over 5 runs. [Dataset]. http://doi.org/10.1371/journal.pone.0329799.t007
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0329799.t007
Dataset updated
Sep 4, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Sumaiya Zoha; Jeong-Gun Lee; Young-Woong Ko
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Domain generalization results (%) in the low-data regime with a comparison of various models in different settings (fully labeled, DG, SSL, and SSDG), evaluated on miniDomainNet (Clipart: C, Painting: P, Real: R, and Sketch: S). Here, u means utilization of unlabeled data. Results are reported as mean ± standard deviation over 5 runs.
Machine Learning Basics for Beginners🤖🧠
kaggle.com
zip
Updated Jun 22, 2023
Cite
Bhanupratap Biswas (2023). Machine Learning Basics for Beginners🤖🧠 [Dataset]. https://www.kaggle.com/datasets/bhanupratapbiswas/machine-learning-basics-for-beginners
Explore at:
Available download formats: zip (492015 bytes)
Dataset updated
Jun 22, 2023
Authors
Bhanupratap Biswas
License
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Description
Machine learning is a subfield of artificial intelligence (AI) that focuses on enabling computers to learn and make predictions or decisions without being explicitly programmed. Here are some key concepts and terms to help you get started:

Supervised Learning: In supervised learning, the machine learning algorithm learns from labeled training data. The training data consists of input examples and their corresponding correct output or target values. The algorithm learns to generalize from this data and make predictions or classify new, unseen examples.

Unsupervised Learning: Unsupervised learning involves learning patterns and relationships from unlabeled data. Unlike supervised learning, there are no target values provided. Instead, the algorithm aims to discover inherent structures or clusters in the data.

Training Data and Test Data: Machine learning models require a dataset to learn from. The dataset is typically split into two parts: the training data and the test data. The model learns from the training data, and the test data is used to evaluate its performance and generalization ability.

Features and Labels: In supervised learning, the input examples are often represented by features or attributes. For example, in a spam email classification task, features might include the presence of certain keywords or the length of the email. The corresponding output or target values are called labels, indicating the class or category to which the example belongs (e.g., spam or not spam).

Model Evaluation Metrics: To assess the performance of a machine learning model, various evaluation metrics are used. Common metrics include accuracy (the proportion of correctly predicted examples), precision (the proportion of true positives among all positive predictions), recall (the proportion of actual positives that are correctly predicted), and F1 score (the harmonic mean of precision and recall).

Overfitting and Underfitting: Overfitting occurs when a model becomes too complex and learns to memorize the training data instead of generalizing well to unseen examples. On the other hand, underfitting happens when a model is too simple and fails to capture the underlying patterns in the data. Balancing the complexity of the model is crucial to achieve good generalization.

Feature Engineering: Feature engineering involves selecting or creating relevant features that can help improve the performance of a machine learning model. It often requires domain knowledge and creativity to transform raw data into a suitable representation that captures the important information.

Bias and Variance Trade-off: The bias-variance trade-off is a fundamental concept in machine learning. Bias refers to the errors introduced by the model's assumptions and simplifications, while variance refers to the model's sensitivity to small fluctuations in the training data. Reducing bias may increase variance and vice versa. Finding the right balance is important for building a well-performing model.

Supervised Learning Algorithms: There are various supervised learning algorithms, including linear regression, logistic regression, decision trees, random forests, support vector machines (SVM), and neural networks. Each algorithm has its own strengths, weaknesses, and specific use cases.

Unsupervised Learning Algorithms: Unsupervised learning algorithms include clustering algorithms like k-means clustering and hierarchical clustering, dimensionality reduction techniques like principal component analysis (PCA) and t-SNE, and anomaly detection algorithms, among others.

These concepts provide a starting point for understanding the basics of machine learning. As you delve deeper, you can explore more advanced topics such as deep learning, reinforcement learning, and natural language processing. Remember to practice hands-on with real-world datasets to gain practical experience and further refine your skills.
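As a first hands-on exercise, the evaluation metrics listed above (accuracy, precision, recall, and F1 score) can be computed directly from true and predicted labels. This is a minimal sketch for binary classification; the label lists are invented for illustration, and returning 0.0 when a denominator is zero is a convention chosen here, not a universal rule.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Precision: true positives among all positive predictions.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: true positives among all actual positives.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels: 1 = spam, 0 = not spam.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```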
Semi-Supervised Learning for Pallet Vision Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Oct 7, 2025
Cite
Growth Market Reports (2025). Semi-Supervised Learning for Pallet Vision Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/semi-supervised-learning-for-pallet-vision-market
Explore at:
Available download formats: csv, pptx, pdf
Dataset updated
Oct 7, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
Semi-Supervised Learning for Pallet Vision Market Outlook

According to our latest research, the global Semi-Supervised Learning for Pallet Vision market size reached USD 1.12 billion in 2024, reflecting robust adoption across logistics, manufacturing, and retail sectors. The market is projected to grow at a CAGR of 19.3% from 2025 to 2033, reaching an estimated USD 5.34 billion by 2033. This impressive growth trajectory is primarily driven by the rising demand for automation, efficiency, and real-time data analytics in supply chain and warehouse operations, underpinned by advancements in artificial intelligence and computer vision technologies.

The surge in demand for semi-supervised learning for pallet vision is fundamentally rooted in the need for more intelligent, scalable, and cost-effective solutions for managing complex warehouse and logistics environments. Traditional supervised learning models require vast amounts of labeled data, which is often expensive and time-consuming to obtain, especially in dynamic industrial settings. Semi-supervised learning bridges this gap by leveraging both labeled and unlabeled data, enabling organizations to rapidly deploy vision-based systems for pallet identification, tracking, and quality inspection with reduced annotation costs. This approach not only accelerates model development but also enhances adaptability to new pallet types, environmental changes, and operational nuances, positioning it as a transformative force in industrial automation.

Another significant growth factor is the increasing adoption of Industry 4.0 principles, where the integration of AI-powered vision systems is becoming a cornerstone of digital transformation strategies. Enterprises across logistics, retail, and manufacturing are investing heavily in smart warehousing, automated material handling, and predictive maintenance, all of which benefit from robust pallet vision solutions. The ability of semi-supervised learning algorithms to continuously learn from operational data and improve over time is driving higher accuracy rates in object detection, inventory tracking, and anomaly identification. This continuous improvement loop is essential for meeting the evolving demands of omnichannel retail, just-in-time manufacturing, and globalized supply chains, further fueling market expansion.

Furthermore, regulatory pressures and increasing emphasis on workplace safety and compliance are propelling the adoption of advanced pallet vision systems. Governments and industry bodies are mandating stricter standards for product traceability, load verification, and quality assurance, particularly in sectors like food and beverage, pharmaceuticals, and automotive. Semi-supervised learning enables organizations to rapidly adapt to these regulatory requirements by facilitating faster model retraining and deployment while maintaining high levels of accuracy and reliability. As a result, companies are able to mitigate risks, reduce manual errors, and ensure compliance, which is a critical driver for sustained market growth.

Regionally, North America continues to lead the global semi-supervised learning for pallet vision market due to its advanced logistics infrastructure, high adoption of AI technologies, and strong presence of major solution providers. However, Asia Pacific is emerging as the fastest-growing region, driven by rapid industrialization, e-commerce expansion, and increasing investments in smart manufacturing across China, Japan, and India. Europe also demonstrates substantial growth potential, supported by stringent regulatory frameworks and widespread digital transformation initiatives. As these regions continue to invest in automation and AI-driven supply chain solutions, the market is expected to witness significant growth opportunities worldwide.

Component Analysis

The component segment of the semi-supervised learning for pallet vision market is divided into software, hardware, and services, each playing a pivotal role in the overall ecosystem. S
Dataset for Fetal Ultrasound Grand Challenge: Semi-Supervised Cervical...
zenodo.org
png
Updated Dec 8, 2024
Cite
Jieyun Bai; Ziduo Yang; Jie Gan; Hasan Md. Kamrul; Zhuonan Liang; Weidong Cai; Tan Tao; Ye Jing; Yaqub Mohammad; Ni Dong; Slimani Saad; Ohene-Botwe Benard; Víctor Manuel Campello; Karim Lekadir (2024). Dataset for Fetal Ultrasound Grand Challenge: Semi-Supervised Cervical Segmentation (ISBI 2025) [Dataset]. http://doi.org/10.5281/zenodo.14305302
Explore at:
Available download formats: png
Unique identifier
https://doi.org/10.5281/zenodo.14305302
Dataset updated
Dec 8, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jieyun Bai; Ziduo Yang; Jie Gan; Hasan Md. Kamrul; Zhuonan Liang; Weidong Cai; Tan Tao; Ye Jing; Yaqub Mohammad; Ni Dong; Slimani Saad; Ohene-Botwe Benard; Víctor Manuel Campello; Karim Lekadir
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 6, 2024
Description
Transvaginal ultrasound is the preferred method for visualizing the cervix in most patients, offering detailed insight into cervical anatomy and structure. Accurate segmentation of ultrasound (US) images of the cervical muscles is essential for analyzing deep muscle structures, assessing their function, and monitoring treatment protocols tailored to individual patients.

The manual annotation of cervical structures in transvaginal ultrasound images is labor-intensive and time-consuming, limiting the availability of the large labeled datasets required for robust machine learning models. In response to this challenge, semi-supervised learning approaches have shown potential by leveraging both labeled and unlabeled data, enabling the extraction of useful information from unannotated cases. This method could reduce the need for extensive manual annotation while maintaining accuracy, thus accelerating the development of automated cervical image segmentation systems. The envisioned impact of this challenge is twofold: improving clinical decision-making through more accessible and accurate diagnostic tools and advancing machine learning techniques for medical image analysis, particularly in resource-constrained environments.

We extend the MICCAI PSFHS 2023 Challenge and the MICCAI IUGC 2024 Challenge from fully supervised settings to a semi-supervised setting that focuses on how to use unlabeled data.

Training/Validation/Test=500/90/300

The dataset can be accessed after signing the data-sharing agreement and sending it to the organizer (fugc.isbi25@gmail.com).
Teapot dataset - Dataset - LDM
service.tib.eu
Updated Dec 2, 2024
Cite
(2024). Teapot dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/teapot-dataset
Explore at:
Dataset updated
Dec 2, 2024
Description
The dataset used in the paper is a wide domain image dataset, and the authors propose a weakly semi-supervised method for disentangling using both labeled and unlabeled data.
SemiEvol
huggingface.co
Updated Oct 22, 2024
Cite
junyu (2024). SemiEvol [Dataset]. https://huggingface.co/datasets/luojunyu/SemiEvol
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 22, 2024
Authors
junyu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Dataset Name

The SemiEvol dataset is part of the broader work on semi-supervised fine-tuning for Large Language Models (LLMs). The dataset includes labeled and unlabeled data splits designed to enhance the reasoning capabilities of LLMs through a bi-level knowledge propagation and selection framework, as proposed in the paper SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation.

Dataset Details Dataset Sources [optional]… See the full description on the dataset page: https://huggingface.co/datasets/luojunyu/SemiEvol.
Data used in Machine learning reveals the waggle drift's role in the honey...
data-staging.niaid.nih.gov
zenodo.org
Updated May 18, 2023
Cite
Dormagen, David M; Wild, Benjamin; Wario, Fernando; Landgraf, Tim (2023). Data used in Machine learning reveals the waggle drift's role in the honey bee dance communication system [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_7928120
Explore at:
Dataset updated
May 18, 2023
Dataset provided by
Freie Universität Berlin
Universidad de Guadalajara
Authors
Dormagen, David M; Wild, Benjamin; Wario, Fernando; Landgraf, Tim
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data and metadata used in "Machine learning reveals the waggle drift’s role in the honey bee dance communication system"

All timestamps are given in ISO 8601 format.

The following files are included:

Berlin2019_waggle_phases.csv, Berlin2021_waggle_phases.csv

Automatic individual detections of waggle phases during our recording periods in 2019 and 2021.

timestamp: Date and time of the detection.

cam_id: Camera ID (0: left side of the hive, 1: right side of the hive).

x_median, y_median: Median position of the bee during the waggle phase (for 2019 given in millimeters after applying a homography, for 2021 in the original image coordinates).

waggle_angle: Body orientation of the bee during the waggle phase in radians (0: oriented to the right, PI / 4: oriented upwards).

Berlin2019_dances.csv

Automatic detections of dance behavior during our recording period in 2019.

dancer_id: Unique ID of the individual bee.

dance_id: Unique ID of the dance.

ts_from, ts_to: Date and time of the beginning and end of the dance.

cam_id: Camera ID (0: left side of the hive, 1: right side of the hive).

median_x, median_y: Median position of the individual during the dance.

feeder_cam_id: ID of the feeder that the bee was detected at prior to the dance.

Berlin2019_followers.csv

Automatic detections of attendance and following behavior, corresponding to the dances in Berlin2019_dances.csv.

dance_id: Unique ID of the dance being attended or followed.

follower_id: Unique ID of the individual attending or following the dance.

ts_from, ts_to: Date and time of the beginning and end of the interaction.

label: “attendance” or “follower”

cam_id: Camera ID (0: left side of the hive, 1: right side of the hive).
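Because Berlin2019_followers.csv and Berlin2019_dances.csv share the dance_id column, follower rows can be grouped under the dance they belong to. The sketch below uses Python's csv module on tiny in-memory stand-ins; the rows are invented, only the column names follow the descriptions above, and the column order, delimiter, and ID formats of the real files are assumptions.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; only the column names come from the file descriptions.
DANCES_CSV = """dancer_id,dance_id,ts_from,ts_to,cam_id,median_x,median_y,feeder_cam_id
101,1,2019-08-01T10:00:00,2019-08-01T10:00:20,0,120.5,80.0,2
102,2,2019-08-01T10:05:00,2019-08-01T10:05:30,1,95.0,60.5,3
"""
FOLLOWERS_CSV = """dance_id,follower_id,ts_from,ts_to,label,cam_id
1,201,2019-08-01T10:00:02,2019-08-01T10:00:10,follower,0
1,202,2019-08-01T10:00:05,2019-08-01T10:00:08,attendance,0
2,203,2019-08-01T10:05:03,2019-08-01T10:05:12,follower,1
"""


def followers_per_dance(dances_file, followers_file):
    """Group follower rows by dance_id and attach them to each dance row."""
    by_dance = defaultdict(list)
    for row in csv.DictReader(followers_file):
        by_dance[row["dance_id"]].append(row)
    joined = []
    for dance in csv.DictReader(dances_file):
        followers = by_dance.get(dance["dance_id"], [])
        joined.append({"dance": dance, "followers": followers})
    return joined


joined = followers_per_dance(io.StringIO(DANCES_CSV), io.StringIO(FOLLOWERS_CSV))
for item in joined:
    labels = [f["label"] for f in item["followers"]]
    print(item["dance"]["dance_id"], item["dance"]["dancer_id"], labels)
```

With the real files, the `io.StringIO` objects would be replaced by handles such as `open("Berlin2019_dances.csv", newline="")`.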

Berlin2019_dances_with_manually_verified_times.csv

A sample of dances from Berlin2019_dances.csv where the exact timestamps have been manually verified to correspond to the beginning of the first and last waggle phase down to a precision of ca. 166 ms (video material was recorded at 6 FPS).

dance_id: Unique ID of the dance.

dancer_id: Unique ID of the dancing individual.

cam_id: Camera ID (0: left side of the hive, 1: right side of the hive).

feeder_cam_id: ID of the feeder that the bee was detected at prior to the dance.

dance_start, dance_end: Manually verified date and times of the beginning and end of the dance.
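The "ca. 166 ms" precision quoted for the manually verified times follows from the 6 FPS recording rate: a timestamp cannot be resolved more finely than the gap between two consecutive frames. A minimal check of that arithmetic:

```python
FRAMES_PER_SECOND = 6  # recording rate stated in the file description


def frame_interval_ms(fps):
    """Time between consecutive video frames, in milliseconds."""
    return 1000.0 / fps


interval = frame_interval_ms(FRAMES_PER_SECOND)
print(f"{interval:.1f} ms between frames")
```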

Berlin2019_dance_classifier_labels.csv

Manually annotated waggle phases or following behavior for our recording season in 2019 that were used to train the dancing and following classifier. Can be merged with the supplied individual detections.

timestamp: Timestamp of the individual frame the behavior was observed in.

frame_id: Unique ID of the video frame the behavior was observed in.

bee_id: Unique ID of the individual bee.

label: One of “nothing”, “waggle”, “follower”

Berlin2019_dance_classifier_unlabeled.csv

Additional unlabeled samples of timestamp and individual ID with the same format as Berlin2019_dance_classifier_labels.csv, but without a label. The data points have been sampled close to detections of our waggle phase classifier, so behaviors related to the waggle dance are likely overrepresented in that sample.

Berlin2021_waggle_phase_classifier_labels.csv

Manually annotated detections of our waggle phase detector (bb_wdd2) that were used to train the neural network filter (bb_wdd_filter) for the 2021 data.

detection_id: Unique ID of the waggle phase.

label: One of “waggle”, “activating”, “ventilating”, “trembling”, “other”. Here, “waggle” denotes a waggle phase, “activating” is the shaking signal, and “ventilating” is a bee fanning her wings. “trembling” denotes a tremble dance, but the distinction from the “other” class was often not clear, so “trembling” was merged into “other” for training.

orientation: The body orientation of the bee that triggered the detection in radians (0: facing to the right, PI / 4: facing up).

metadata_path: Path to the individual detection in the same directory structure as created by the waggle dance detector.

Berlin2021_waggle_phase_classifier_ground_truth.zip

The output of the waggle dance detector (bb_wdd2) that corresponds to Berlin2021_waggle_phase_classifier_labels.csv and is used for training. The archive includes a directory structure as output by the bb_wdd2 and each directory includes the original image sequence that triggered the detection in an archive and the corresponding metadata. The training code supplied in bb_wdd_filter directly works with this directory structure.

Berlin2019_tracks.zip

Detections and tracks from the recording season in 2019 as produced by our tracking system. As the full data is several terabytes in size, we include the subset of our data here that is relevant for our publication which comprises over 46 million detections. We included tracks for all detected behaviors (dancing, following, attending) including one minute before and after the behavior. We also included all tracks that correspond to the labeled and unlabeled data that was used to train the dance classifier including 30 seconds before and after the data used for training. We grouped the exported data by date to make the handling easier, but to efficiently work with the data, we recommend importing it into an indexable database.

The individual files contain the following columns:

cam_id: Camera ID (0: left side of the hive, 1: right side of the hive).

timestamp: Date and time of the detection.

frame_id: Unique ID of the video frame of the recording from which the detection was extracted.

track_id: Unique ID of an individual track (short motion path from one individual). For longer tracks, the detections can be linked based on the bee_id.

bee_id: Unique ID of the individual bee.

bee_id_confidence: Confidence between 0 and 1 that the bee_id is correct as output by our tracking system.

x_pos_hive, y_pos_hive: Spatial position of the bee in the hive on the side indicated by cam_id. Given in millimeters after applying a homography on the video material.

orientation_hive: Orientation of the bee’s thorax in the hive in radians (0: oriented to the right, PI / 4: oriented upwards).
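
The recommendation to import the tracks into an indexable database could be followed along these lines. This is a sketch under stated assumptions: SQLite is our choice rather than the authors', the column types and the table and index names are guesses, and the sample rows are invented.

```python
import csv
import io
import sqlite3

# Column names as listed in the description; the SQL types are assumptions.
COLUMNS = ["cam_id", "timestamp", "frame_id", "track_id", "bee_id",
           "bee_id_confidence", "x_pos_hive", "y_pos_hive", "orientation_hive"]

# Invented sample rows standing in for one exported per-date file.
SAMPLE = """cam_id,timestamp,frame_id,track_id,bee_id,bee_id_confidence,x_pos_hive,y_pos_hive,orientation_hive
0,2019-08-01 12:00:00.000,11,500,42,0.98,120.5,80.25,1.57
0,2019-08-01 12:00:00.167,12,500,42,0.97,121.0,80.5,1.6
1,2019-08-01 12:00:00.000,13,501,7,0.6,10.0,20.0,0.1
"""


def import_tracks(conn, csv_file):
    """Load one CSV file into a 'detections' table and index it by bee and time."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS detections ("
        "cam_id INTEGER, timestamp TEXT, frame_id INTEGER, track_id INTEGER, "
        "bee_id INTEGER, bee_id_confidence REAL, "
        "x_pos_hive REAL, y_pos_hive REAL, orientation_hive REAL)"
    )
    rows = ([row[c] for c in COLUMNS] for row in csv.DictReader(csv_file))
    conn.executemany("INSERT INTO detections VALUES (?,?,?,?,?,?,?,?,?)", rows)
    # The index makes per-bee, time-ordered lookups cheap on large tables.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_bee_time ON detections (bee_id, timestamp)")
    conn.commit()


conn = sqlite3.connect(":memory:")
import_tracks(conn, io.StringIO(SAMPLE))
path = conn.execute(
    "SELECT x_pos_hive, y_pos_hive FROM detections WHERE bee_id = ? ORDER BY timestamp",
    (42,),
).fetchall()
print(path)
```

For the full export, a file-backed database and one import_tracks call per date file would replace the in-memory connection.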

Berlin2019_feeder_experiment_log.csv

Experiment log for our feeder experiments in 2019.

date: Date given in the format year-month-day.

feeder_cam_id: Numeric ID of the feeder.

coordinates: Longitude and latitude of the feeder. For feeders 1 and 2 this is only given once and held constant. Feeder 3 had varying locations.

time_opened, time_closed: Date and time when the feeder was set up or closed again.

sucrose_solution: Concentration of the sucrose solution given as sugar:water (in terms of weight). On days where feeder 3 was open, the other two feeders offered water without sugar.

Software used to acquire and analyze the data:

bb_pipeline: Tag localization and decoding pipeline

bb_pipeline_models: Pretrained localizer and decoder models for bb_pipeline

bb_binary: Raw detection data storage format

bb_irflash: IR flash system schematics and arduino code

bb_imgacquisition: Recording and network storage

bb_behavior: Database interaction and data (pre)processing, feature extraction

bb_tracking: Tracking of bee detections over time

bb_wdd2: Automatic detection and decoding of honey bee waggle dances

bb_wdd_filter: Machine learning model to improve the accuracy of the waggle dance detector

bb_dance_networks: Detection of dancing and following behavior from trajectories
DataSheet_1_HiRAND: A novel GCN semi-supervised deep learning-based...
datasetcatalog.nlm.nih.gov
frontiersin.figshare.com
Updated Jan 26, 2023
Cite
Huang, Yue; Zhang, Liuchao; He, Jia; Li, Kang; Rong, Zhiwei; Xu, Zhenyi; Ji, Jianxin; Hou, Yan; Liu, Weisha (2023). DataSheet_1_HiRAND: A novel GCN semi-supervised deep learning-based framework for classification and feature selection in drug research and development.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000994299
Dataset updated
Jan 26, 2023
Authors
Huang, Yue; Zhang, Liuchao; He, Jia; Li, Kang; Rong, Zhiwei; Xu, Zhenyi; Ji, Jianxin; Hou, Yan; Liu, Weisha
Description
The prediction of response to drugs before initiating therapy based on transcriptome data is a major challenge. However, identifying effective drug response label data costs time and resources. Methods available often predict poorly and fail to identify robust biomarkers due to the curse of dimensionality: high dimensionality and low sample size. Therefore, this necessitates the development of predictive models to effectively predict the response to drugs using limited labeled data while being interpretable. In this study, we report a novel Hierarchical Graph Random Neural Networks (HiRAND) framework to predict the drug response using transcriptome data of few labeled data and additional unlabeled data. HiRAND completes the information integration of the gene graph and sample graph by graph convolutional network (GCN). The innovation of our model is leveraging data augmentation strategy to solve the dilemma of limited labeled data and using consistency regularization to optimize the prediction consistency of unlabeled data across different data augmentations. The results showed that HiRAND achieved better performance than competitive methods in various prediction scenarios, including both simulation data and multiple drug response data. We found that the prediction ability of HiRAND in the drug vorinostat showed the best results across all 62 drugs. In addition, HiRAND was interpreted to identify the key genes most important to vorinostat response, highlighting critical roles for ribosomal protein-related genes in the response to histone deacetylase inhibition. Our HiRAND could be utilized as an efficient framework for improving the drug response prediction performance using few labeled data.
Stanford STL-10 Image Dataset
academictorrents.com
bittorrent
Updated Nov 26, 2015
Cite
Adam Coates and Honglak Lee and Andrew Y. Ng (2015). Stanford STL-10 Image Dataset [Dataset]. https://academictorrents.com/details/a799a2845ac29a66c07cf74e2a2838b6c5698a6a
Available download formats: bittorrent (2640397119)
Dataset updated
Nov 26, 2015
Dataset authored and provided by
Adam Coates and Honglak Lee and Andrew Y. Ng
License
https://academictorrents.com/nolicensespecified
Description
The STL-10 dataset is an image recognition dataset for developing unsupervised feature learning, deep learning, and self-taught learning algorithms. It is inspired by the CIFAR-10 dataset but with some modifications. In particular, each class has fewer labeled training examples than in CIFAR-10, but a very large set of unlabeled examples is provided to learn image models prior to supervised training. The primary challenge is to make use of the unlabeled data (which comes from a similar but different distribution from the labeled data) to build a useful prior. We also expect that the higher resolution of this dataset (96x96) will make it a challenging benchmark for developing more scalable unsupervised learning methods.

Overview

10 classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck.

Images are 96x96 pixels, color.

500 training images (10 pre-defined folds), 800 test images per class.

100000 unlabeled images for uns
Weed Detection ( Unsupervised Learning )
kaggle.com
zip
Updated Feb 3, 2025
Cite
Aryan Kaushik 005 (2025). Weed Detection ( Unsupervised Learning ) [Dataset]. https://www.kaggle.com/datasets/aryankaushik005/weed-detection-renamed
Available download formats: zip (79727855 bytes)
Dataset updated
Feb 3, 2025
Authors
Aryan Kaushik 005
Description
Weed Detection (Unsupervised + Supervised Learning)

Overview

This dataset is designed to support both supervised and unsupervised learning for the task of weed detection in crop fields. It provides labeled data in YOLO format suitable for training object detection models, unlabeled data for semi-supervised or unsupervised learning, and a separate test set for evaluation. The objective is to detect and distinguish between weed and crop instances using deep learning models like YOLOv5 or YOLOv8.

Dataset Structure

├── labeled/
│   ├── images/    # Labeled images for training
│   └── labels/    # YOLO-format annotations
├── unlabeled/     # Unlabeled images for unsupervised or semi-supervised learning
└── test/
    ├── images/    # Test images
    └── labels/    # Ground truth annotations in YOLO format
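
A YOLO-format label file conventionally holds one object per line as "class x_center y_center width height", with the four box values normalized to the image size. The sketch below converts such a line to pixel coordinates. The sample line, the image size, and the meaning of class ID 1 are hypothetical, since the description does not state the class mapping.

```python
def yolo_to_pixel_box(line, image_width, image_height):
    """Convert one YOLO annotation line to (class_id, (x_min, y_min, x_max, y_max)) in pixels."""
    class_id, x_center, y_center, width, height = line.split()
    x_center, y_center = float(x_center), float(y_center)
    width, height = float(width), float(height)
    # YOLO stores the box center and size as fractions of the image dimensions.
    x_min = (x_center - width / 2) * image_width
    y_min = (y_center - height / 2) * image_height
    x_max = (x_center + width / 2) * image_width
    y_max = (y_center + height / 2) * image_height
    return int(class_id), (x_min, y_min, x_max, y_max)


# Hypothetical annotation line for a 640x480 image.
cls, box = yolo_to_pixel_box("1 0.5 0.5 0.25 0.5", 640, 480)
print(cls, box)
```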
UCI and OpenML Data Sets for Ordinal Quantification
zenodo.org
data.niaid.nih.gov
zip
Updated Jul 25, 2023
Cite
Mirko Bunse; Alejandro Moreo; Fabrizio Sebastiani; Martin Senz (2023). UCI and OpenML Data Sets for Ordinal Quantification [Dataset]. http://doi.org/10.5281/zenodo.8177302
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.8177302
Dataset updated
Jul 25, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mirko Bunse; Alejandro Moreo; Fabrizio Sebastiani; Martin Senz
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These four labeled data sets are targeted at ordinal quantification. The goal of quantification is not to predict the label of each individual instance, but the distribution of labels in unlabeled sets of data.

With the scripts provided, you can extract CSV files from the UCI machine learning repository and from OpenML. The ordinal class labels stem from a binning of a continuous regression label.

We complement this data set with the indices of data items that appear in each sample of our evaluation. Hence, you can precisely replicate our samples by drawing the specified data items. The indices stem from two evaluation protocols that are well suited for ordinal quantification. To this end, each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, and app-oq_tst_indices.csv represents one sample.

Our first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with an equal probability. The second protocol, APP-OQ, is a variant thereof, where only the smoothest 20% of all APP samples are considered. This variant is targeted at ordinal quantification tasks, where classes are ordered and a similarity of neighboring classes can be assumed.
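
The two protocols can be illustrated with a short sketch. Drawing label distributions with equal probability is done here by normalizing exponential draws, which samples uniformly from the probability simplex. The smoothness score (sum of squared second differences between neighboring classes) is our assumption for illustration and is not confirmed by this description; the function names are hypothetical as well.

```python
import random


def sample_app_prevalences(n_classes, rng):
    """Draw one label distribution uniformly at random from the probability simplex."""
    draws = [rng.expovariate(1.0) for _ in range(n_classes)]
    total = sum(draws)
    return [d / total for d in draws]


def jaggedness(p):
    """Assumed smoothness score: lower values mean neighboring classes have similar prevalence."""
    return sum((p[i - 1] - 2 * p[i] + p[i + 1]) ** 2 for i in range(1, len(p) - 1))


def app_oq(prevalences, keep_fraction=0.2):
    """Keep only the smoothest fraction of the APP samples."""
    n_keep = max(1, int(len(prevalences) * keep_fraction))
    return sorted(prevalences, key=jaggedness)[:n_keep]


rng = random.Random(0)
app = [sample_app_prevalences(5, rng) for _ in range(1000)]
smooth = app_oq(app)
print(len(app), len(smooth))
```

The files app_val_indices.csv and the related index files already encode the samples actually used, so a sketch like this is only needed to understand the protocols, not to replicate the evaluation.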

Usage

You can extract four CSV files through the provided script extract-oq.jl, which is conveniently wrapped in a Makefile. The Project.toml and Manifest.toml specify the Julia package dependencies, similar to a requirements file in Python.

Preliminaries: You have to have a working Julia installation. We have used Julia v1.6.5 in our experiments.

Data Extraction: In your terminal, you can call either

make

(recommended), or

julia --project="." --eval "using Pkg; Pkg.instantiate()"

julia --project="." extract-oq.jl

Outcome: The first row in each CSV file is the header. The first column, named "class_label", is the ordinal class.
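
Once the CSV files are extracted, one row of an index file can be turned into the label distribution of that sample, which is the quantity a quantifier has to predict. The sketch below assumes that an index row is a comma-separated list of 1-based row numbers (plausible for Julia-generated indices, but not stated here); the inline data and the feature_1 column are invented.

```python
import csv
import io
from collections import Counter

# Invented stand-in for an extracted data file; only "class_label" is named in the description.
DATA_CSV = """class_label,feature_1
1,0.3
2,0.1
2,0.7
3,0.9
"""
# Invented stand-in for one row of an index file such as app_val_indices.csv.
INDEX_ROW = "1,2,2,4"


def sample_prevalences(data_file, index_row, one_based=True):
    """Return the relative frequency of each class label within one indexed sample."""
    labels = [row["class_label"] for row in csv.DictReader(data_file)]
    offset = 1 if one_based else 0  # assumption: indices count from 1
    drawn = [labels[int(i) - offset] for i in index_row.split(",")]
    counts = Counter(drawn)
    return {label: counts[label] / len(drawn) for label in sorted(counts)}


prevalences = sample_prevalences(io.StringIO(DATA_CSV), INDEX_ROW)
print(prevalences)
```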

Further Reading

Implementation of our experiments: https://github.com/mirkobunse/regularized-oq
AI in Unsupervised Learning Market Research Report 2033
researchintelo.com
csv, pdf, pptx
Updated Jul 24, 2025
Cite
Research Intelo (2025). AI in Unsupervised Learning Market Research Report 2033 [Dataset]. https://researchintelo.com/report/ai-in-unsupervised-learning-market
Available download formats: csv, pptx, pdf
Dataset updated
Jul 24, 2025
Dataset authored and provided by
Research Intelo
License
https://researchintelo.com/privacy-and-policy
Time period covered
2024 - 2033
Area covered
Global
Description
AI in Unsupervised Learning Market Outlook

According to our latest research, the AI in Unsupervised Learning market size reached USD 3.8 billion globally in 2024, demonstrating robust expansion as organizations increasingly leverage unsupervised techniques for extracting actionable insights from unlabelled data. The market is forecasted to grow at a CAGR of 28.2% from 2025 to 2033, propelling the industry to an estimated USD 36.7 billion by 2033. This remarkable growth trajectory is primarily fueled by the escalating adoption of artificial intelligence across diverse sectors, an exponential surge in data generation, and the pressing need for advanced analytics that can operate without manual data labeling.

One of the key growth factors driving the AI in Unsupervised Learning market is the rising complexity and volume of data generated by enterprises in the digital era. Organizations are inundated with unstructured and unlabelled data from sources such as social media, IoT devices, and transactional systems. Traditional supervised learning methods are often impractical due to the time and cost associated with manual labeling. Unsupervised learning algorithms, such as clustering and dimensionality reduction, offer a scalable solution by autonomously identifying patterns, anomalies, and hidden structures within vast datasets. This capability is increasingly vital for industries aiming to enhance decision-making, streamline operations, and gain a competitive edge through advanced analytics.

Another significant driver is the rapid advancement in computational power and AI infrastructure, which has made it feasible to implement sophisticated unsupervised learning models at scale. The proliferation of cloud computing and specialized AI hardware has reduced barriers to entry, enabling even small and medium enterprises to deploy unsupervised learning solutions. Additionally, the evolution of neural networks and deep learning architectures has expanded the scope of unsupervised algorithms, allowing for more complex tasks such as image recognition, natural language processing, and anomaly detection. These technological advancements are not only accelerating adoption but also fostering innovation across sectors including healthcare, finance, manufacturing, and retail.

Furthermore, regulatory compliance and the growing emphasis on data privacy are pushing organizations to adopt unsupervised learning methods. Unlike supervised approaches that require sensitive data labeling, unsupervised algorithms can process data without explicit human intervention, thereby reducing the risk of privacy breaches. This is particularly relevant in sectors such as healthcare and BFSI, where stringent data protection regulations are in place. The ability to derive insights from unlabelled data while maintaining compliance is a compelling value proposition, further propelling the market forward.

Regionally, North America continues to dominate the AI in Unsupervised Learning market owing to its advanced technological ecosystem, significant investments in AI research, and strong presence of leading market players. Europe follows closely, driven by robust regulatory frameworks and a focus on ethical AI deployment. The Asia Pacific region is exhibiting the fastest growth, fueled by rapid digital transformation, government initiatives, and increasing adoption of AI across industries. Latin America and the Middle East & Africa are also witnessing steady growth, albeit at a slower pace, as awareness and infrastructure continue to develop.

Component Analysis

The Component segment of the AI in Unsupervised Learning market is categorized into Software, Hardware, and Services, each playing a pivotal role in the overall ecosystem. The software segment, comprising machine learning frameworks, data analytics platforms, and AI development tools, holds the largest market share. This dominance is attributed to the continuous evolution of AI algorithms and the increasing availability of open-source and proprietary solutions tailored for unsupervised learning. Enterprises are investing heavily in software that can facilitate the seamless integration of unsupervised learning capabilities into existing workflows, enabling automation, predictive analytics, and pattern recognition without the need for labeled data.

The hardware segment, while smaller in comparison to software, is experiencing significant growth due to the escalating demand for high-perf
Dataset: Data-Driven Machine Learning-Informed Framework for Model...
zenodo.org
csv
Updated May 12, 2025
Cite
Edgar Amalyan (2025). Dataset: Data-Driven Machine Learning-Informed Framework for Model Predictive Control in Vehicles [Dataset]. http://doi.org/10.5281/zenodo.15288740
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.15288740
Dataset updated
May 12, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Edgar Amalyan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset belonging to the paper: Data-Driven Machine Learning-Informed Framework for Model Predictive Control in Vehicles

labeled_seed.csv: Processed and labeled data of all maneuvers combined into a single file, sorted by label

raw_track_session.csv: Untouched CSV file from Racebox track session

unlabeled_exemplar.csv: Processed but unlabeled data of street and track data
Domain generalization results (%) in the low-data regime with a comparison...
plos.figshare.com
xls
Updated Sep 4, 2025
Cite
Sumaiya Zoha; Jeong-Gun Lee; Young-Woong Ko (2025). Domain generalization results (%) in the low-data regime with a comparison of various models in SSDG settings, evaluated on all datasets. Results are reported as mean ± standard deviation over 5 random seeds. Here, u denotes utilization of unlabeled data. Paired t-tests were conducted between CAT and other baselines, with p-values shown in the last row. [Dataset]. http://doi.org/10.1371/journal.pone.0329799.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0329799.t004
Dataset updated
Sep 4, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Sumaiya Zoha; Jeong-Gun Lee; Young-Woong Ko
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Domain generalization results (%) in the low-data regime with a comparison of various models in SSDG settings, evaluated on all datasets. Results are reported as mean ± standard deviation over 5 random seeds. Here, u denotes utilization of unlabeled data. Paired t-tests were conducted between CAT and other baselines, with p-values shown in the last row.
Hyper Kvasir Dataset
universe.roboflow.com
zip
Updated Oct 7, 2025
Cite
Simula (2025). Hyper Kvasir Dataset [Dataset]. https://universe.roboflow.com/simula/hyper-kvasir/model/1
Available download formats: zip
Dataset updated
Oct 7, 2025
Dataset authored and provided by
Simula
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
GI Tract
Description
Overview

This is the largest gastrointestinal dataset, generously provided by Simula Research Laboratory in Norway.

You can read their research paper here in Nature

In total, the dataset contains 10,662 labeled images stored in JPEG format. The images can be found in the images folder. The class each image belongs to corresponds to the folder it is stored in (e.g., the ’polyp’ folder contains all polyp images, the ’barretts’ folder contains all images of Barrett’s esophagus, etc.). Each class folder is located in a subfolder describing the type of finding, which in turn is located in a folder describing whether it is a lower GI or upper GI finding. The number of images per class is not balanced, which is a general challenge in the medical field because some findings occur more often than others. This adds a further challenge for researchers, since methods applied to the data should also be able to learn from a small amount of training data. The labeled images represent 23 different classes of findings.

The data is collected during real gastro- and colonoscopy examinations at a Hospital in Norway and partly labeled by experienced gastrointestinal endoscopists.
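
Since the class, finding type, and GI tract section are all encoded in the folder hierarchy, labels can be derived from an image path alone. The following is a minimal sketch; the folder names in the example path are hypothetical placeholders, and the function assumes exactly the three-level nesting described above.

```python
from pathlib import PurePosixPath


def describe_image(path):
    """Read tract section, finding type, and class from the three folders above the file."""
    parts = PurePosixPath(path).parts
    # Assumed layout: .../<tract section>/<finding type>/<class>/<file name>
    tract, finding_type, class_name = parts[-4], parts[-3], parts[-2]
    return {"tract": tract, "finding_type": finding_type, "class": class_name}


# Hypothetical path; only the 'polyp' folder name is taken from the description.
info = describe_image("images/lower-gi/pathological-findings/polyp/example.jpg")
print(info)
```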

Use Cases

"Artificial intelligence is currently a hot topic in medicine. The fact that medical data is often sparse and hard to obtain due to legal restrictions and lack of medical personnel to perform the cumbersome and tedious labeling of the data, leads to technical limitations. In this respect, we share the Hyper-Kvasir dataset, which is the largest image and video dataset from the gastrointestinal tract available today."

"We have used the labeled data to research the classification and segmentation of GI findings using both computer vision and ML approaches to potentially be used in live and post-analysis of patient examinations. Areas of potential utilization are analysis, classification, segmentation, and retrieval of images and videos with particular findings or particular properties from the computer science area. The labeled data can also be used for teaching and training in medical education. Having expert gastroenterologists providing the ground truths over various findings, HyperKvasir provides a unique and diverse learning set for future clinicians. Moreover, the unlabeled data is well suited for semi-supervised and unsupervised methods, and, if even more ground truth data is needed, the users of the data can use their own local medical experts to provide the needed labels. Finally, the videos can in addition be used to simulate live endoscopies feeding the video into the system like it is captured directly from the endoscopes enable developers to do image classification."

Borgli, H., Thambawita, V., Smedsrud, P.H. et al. HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Sci Data 7, 283 (2020). https://doi.org/10.1038/s41597-020-00622-y

Using this Dataset

Hyper-Kvasir is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source. This means that in all documents and papers that use or refer to the Hyper-Kvasir dataset or report experimental results based on the dataset, a reference to the related article needs to be added: PREPRINT: https://osf.io/mkzcq/. Additionally, one should provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About Roboflow

Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

Developers reduce 50% of their boilerplate code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.
