Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Small Dataset Ml is a dataset for object detection tasks - it contains Post annotations for 571 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Prediction of students' performance by modelling small dataset size
A set of databases has been curated for academic research on machine learning algorithm performance, particularly in regression problems with limited sample sizes. The databases vary in sample size, data dimensionality, and the linearity of the response variable.
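As a rough illustration of the design space such a collection spans, the sketch below generates synthetic regression problems that vary in sample size, dimensionality, and linearity of the response. It is a hypothetical example using scikit-learn generators, not the published databases themselves.

```python
# Hypothetical sketch: generate small regression datasets that vary in
# sample size, dimensionality, and linearity of the response variable.
# Illustrative only; NOT the curated database collection described above.
from sklearn.datasets import make_regression, make_friedman1

datasets = {}
for n_samples in (30, 100, 300):            # limited sample sizes
    for n_features in (5, 20, 50):          # varying dimensionality
        # linear response
        X_lin, y_lin = make_regression(
            n_samples=n_samples, n_features=n_features,
            noise=5.0, random_state=0)
        datasets[(n_samples, n_features, "linear")] = (X_lin, y_lin)
        # nonlinear response (Friedman #1 requires at least 5 features)
        X_nl, y_nl = make_friedman1(
            n_samples=n_samples, n_features=max(n_features, 5),
            noise=1.0, random_state=0)
        datasets[(n_samples, n_features, "nonlinear")] = (X_nl, y_nl)

print(f"{len(datasets)} synthetic datasets generated")
```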
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Demo Test Small is a dataset for object detection tasks - it contains Flowers annotations for 803 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Vitamin D insufficiency appears to be prevalent in SLE patients. Multiple factors potentially contribute to lower vitamin D levels, including limited sun exposure, the use of sunscreen, darker skin complexion, aging, obesity, specific medical conditions, and certain medications. The study aims to assess the risk factors associated with low vitamin D levels in SLE patients in the southern part of Bangladesh, a region noted for a high prevalence of SLE. The research additionally investigates the possible correlation between vitamin D and the SLEDAI score, seeking to understand the potential benefits of vitamin D in enhancing disease outcomes for SLE patients.

The study incorporates a dataset consisting of 50 patients from the southern part of Bangladesh and evaluates their clinical and demographic data. An initial exploratory data analysis is conducted to gain insights into the data, including calculating means and standard deviations, performing correlation analysis, and generating heat maps. Relevant inferential statistical tests, such as the Student's t-test, are also employed. In the machine learning part of the analysis, the study utilizes supervised learning algorithms, specifically Linear Regression (LR) and Random Forest (RF). To optimize the hyperparameters of the RF model and mitigate the risk of overfitting given the small dataset, a 3-fold cross-validation strategy is implemented. The study also calculates bootstrapped confidence intervals to provide robust uncertainty estimates and further validate the approach. A comprehensive feature importance analysis is carried out using RF feature importance, permutation-based feature importance, and SHAP values.

The LR model yields an RMSE of 4.83 (CI: 2.70, 6.76) and an MAE of 3.86 (CI: 2.06, 5.86), whereas the RF model achieves better results, with an RMSE of 2.98 (CI: 2.16, 3.76) and an MAE of 2.68 (CI: 1.83, 3.52). Both models identify Hb, CRP, ESR, and age as significant contributors to vitamin D level predictions. Despite the lack of a significant association between SLEDAI and vitamin D in the statistical analysis, the machine learning models suggest a potential nonlinear dependency of vitamin D on SLEDAI. These findings highlight the importance of these factors in managing vitamin D levels in SLE patients.

The study concludes that there is a high prevalence of vitamin D insufficiency in SLE patients. Although a direct linear correlation between the SLEDAI score and vitamin D levels is not observed, machine learning models suggest the possibility of a nonlinear relationship. Furthermore, factors such as Hb, CRP, ESR, and age are identified as more significant in predicting vitamin D levels. Thus, the study suggests that monitoring these factors may be advantageous in managing vitamin D levels in SLE patients. Given the immunological nature of SLE, the potential role of vitamin D in SLE disease activity could be substantial. The study therefore underscores the need for further large-scale studies to corroborate this hypothesis.
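A minimal sketch of the modelling strategy described above (a 3-fold cross-validated hyperparameter search for the Random Forest, followed by bootstrapped confidence intervals for RMSE/MAE) is shown below using scikit-learn. The feature set, grid values, and data are illustrative placeholders, not the study's actual configuration.

```python
# Sketch: 3-fold CV hyperparameter search for a Random Forest and
# bootstrapped confidence intervals for RMSE/MAE on a small dataset.
# Features, grid values, and data are placeholders, not the study's.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error

# X: clinical/demographic predictors (e.g. Hb, CRP, ESR, age, SLEDAI),
# y: serum vitamin D level -- synthetic stand-ins here.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = rng.normal(loc=20, scale=5, size=50)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [2, 4, None]},
    cv=3, scoring="neg_root_mean_squared_error")
grid.fit(X_tr, y_tr)
pred = grid.best_estimator_.predict(X_te)

# Bootstrap the test-set errors to get uncertainty estimates.
rmses, maes = [], []
for _ in range(2000):
    idx = rng.integers(0, len(y_te), len(y_te))
    rmses.append(np.sqrt(mean_squared_error(y_te[idx], pred[idx])))
    maes.append(mean_absolute_error(y_te[idx], pred[idx]))

print("RMSE 95% CI:", np.percentile(rmses, [2.5, 97.5]))
print("MAE  95% CI:", np.percentile(maes, [2.5, 97.5]))
```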
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Atomic structure data used in the research article entitled "Small Dataset Machine-Learning Approaches to Explore the Design Space of High-Entropy Alloys: Engineering ZnTe-based Multicomponent Alloys for the Photo-Splitting of Water"
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With recent success in supervised learning, artificial intelligence (AI) and machine learning (ML) can play a vital role in precision medicine. Deep learning neural networks have been used in drug discovery when large datasets are available. However, applications of machine learning in clinical trials with small sample sizes (around a few hundred) are limited. We propose a Similarity-Principle-Based Machine Learning (SBML) method, which is applicable to both small and large sample size problems. In SBML, attribute-scaling factors are introduced to objectively determine the relative importance of each attribute (predictor). The gradient method is used in learning (training), that is, in updating the attribute-scaling factors. We evaluate SBML when the sample size is small and investigate the effects of tuning parameters. Simulations show that SBML achieves better predictions, in terms of mean squared error, for various complicated nonlinear situations than full linear models, optimal and ridge regressions, mixed-effect models, support vector machines, and decision tree methods.
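The abstract does not give the exact formulation, but the core idea (similarity-weighted predictions with per-attribute scaling factors tuned by gradient descent) might look roughly like the NumPy sketch below. Every formula here is a simplified assumption for illustration, not the authors' SBML implementation.

```python
# Rough sketch of a similarity-principle-based predictor: each attribute has
# a scaling factor controlling its weight in the similarity, and the factors
# are tuned by (numerical) gradient descent on a held-out MSE.
# Simplified assumption, not the SBML paper's exact method.
import numpy as np

def predict(X_ref, y_ref, X_query, scale):
    # Gaussian similarity with per-attribute scaling factors.
    d2 = ((X_query[:, None, :] - X_ref[None, :, :]) ** 2 * scale).sum(-1)
    w = np.exp(-d2)
    w /= w.sum(axis=1, keepdims=True)
    return w @ y_ref

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 4))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)     # attribute 0 is informative

X_tr, y_tr, X_val, y_val = X[:60], y[:60], X[60:], y[60:]
scale = np.ones(4)
lr, eps = 0.1, 1e-4

def loss(s):
    return np.mean((predict(X_tr, y_tr, X_val, s) - y_val) ** 2)

for _ in range(300):                                 # numerical-gradient updates
    grad = np.array([(loss(scale + eps * np.eye(4)[j]) - loss(scale)) / eps
                     for j in range(4)])
    scale = np.clip(scale - lr * grad, 0.0, None)

print("learned attribute-scaling factors:", np.round(scale, 2))
```

In this toy setup the informative attribute should end up with the largest scaling factor, while irrelevant attributes are driven toward zero.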
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Tiny WebText
The Tiny WebText dataset is designed to help models learn about perception on web text while neutralizing the bias of the source text using critical thinking methods. By providing a rich and diverse set of texts, I aim to improve the ability of models to understand and analyze information in a more objective and unbiased manner. This dataset can be used to train and evaluate natural language processing and machine learning models, with the goal of improving their… See the full description on the dataset page: https://huggingface.co/datasets/nampdn-ai/tiny-webtext.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
To develop predictive models for the reactivity of organic contaminants toward four oxidants (SO4•–, HClO, O3, and ClO2), all with small sample sizes, we proposed two approaches: combining small data sets and transferring knowledge between them. We first merged these data sets and developed a unified model using machine learning (ML), which showed better predictive performance than the individual models for HClO (RMSE_test: 2.10 to 2.04), O3 (2.06 to 1.94), ClO2 (1.77 to 1.49), and SO4•– (0.75 to 0.70) because the model "corrected" the wrongly learned effects of several atom groups. We further developed knowledge transfer models for three pairs of the data sets and observed different predictive performances: improved for O3 (RMSE_test: 2.06 to 2.01)/HClO (2.10 to 1.98), mixed for O3 (2.06 to 2.01)/ClO2 (1.77 to 1.95), and unchanged for ClO2 (1.77 to 1.77)/HClO (2.10 to 2.10). The effectiveness of the latter approach depended on whether there was consistent knowledge shared between the data sets and on the performance of the individual models. We also compared our approaches with multitask learning and image-based transfer learning and found that our approaches consistently improved the predictive performance for all data sets while the other two did not. This study demonstrated the effectiveness of combining small, similar data sets and transferring knowledge between them to improve ML model performance.
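A schematic illustration of the first approach (pooling the per-oxidant data sets and fitting one unified model with the oxidant identity encoded as a feature) could look like the following. The descriptors, data, and model choice are assumptions for illustration only, not the paper's actual features or architecture.

```python
# Sketch of the "combine small data sets" idea: pool the per-oxidant data,
# add the oxidant identity as a categorical feature, and fit one unified
# model. Descriptors, data, and model choice are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
frames = []
for oxidant, n in [("SO4", 60), ("HClO", 80), ("O3", 90), ("ClO2", 70)]:
    df = pd.DataFrame(rng.normal(size=(n, 3)),
                      columns=["desc1", "desc2", "desc3"])  # molecular descriptors
    df["oxidant"] = oxidant
    df["log_k"] = rng.normal(size=n)                        # rate constants
    frames.append(df)
merged = pd.concat(frames, ignore_index=True)               # one pooled data set

model = Pipeline([
    ("prep", ColumnTransformer(
        [("onehot", OneHotEncoder(), ["oxidant"])], remainder="passthrough")),
    ("gbr", GradientBoostingRegressor(random_state=0)),
])
scores = cross_val_score(model, merged.drop(columns="log_k"), merged["log_k"],
                         scoring="neg_root_mean_squared_error", cv=5)
print("unified-model CV RMSE:", -scores.mean())
```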
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Understanding plant uptake and translocation of nanomaterials is crucial for ensuring the successful and sustainable applications of seed nanotreatment. Here, we collect a dataset with 280 instances from experiments for predicting the relative metal/metalloid concentration (RMC) in maize seedlings after seed priming by various metal and metalloid oxide nanoparticles. To obtain unbiased predictions and explanations on small datasets, we present an averaging strategy and add a dimension for interpretable machine learning. The findings in post-hoc interpretations of sophisticated LightGBM models demonstrate that solubility is highly correlated with model performance. Surface area, concentration, zeta potential, and hydrodynamic diameter of nanoparticles and seedling part and relative weight of plants are dominant factors affecting RMC, and their effects and interactions are explained. Furthermore, self-interpretable models using the RuleFit algorithm are established to successfully predict RMC only based on six important features identified by post-hoc explanations. We then develop a visualization tool called RuleGrid to depict feature effects and interactions in numerous generated rules. Consistent parameter-RMC relationships are obtained by different methods. This study offers a promising interpretable data-driven approach to expand the knowledge of nanoparticle fate in plants and may profoundly contribute to the safety-by-design of nanomaterials in agricultural and environmental applications.
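A minimal sketch of that interpretation workflow (a LightGBM regressor on a small tabular dataset, with permutation importances averaged over several random train/test splits to reduce split-to-split variance) is shown below. The feature names, data, and exact averaging protocol are assumptions for illustration, not the study's.

```python
# Sketch: train LightGBM on repeated random splits of a small dataset and
# average permutation importances across splits. Feature names, data, and
# protocol details are illustrative assumptions, not the study's.
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = ["surface_area", "concentration", "zeta_potential",
            "hydrodynamic_diameter", "seedling_part", "relative_weight"]
X = rng.normal(size=(280, len(features)))
y = 0.8 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=280)  # placeholder RMC

importances = []
for seed in range(10):                        # averaging over random splits
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                               random_state=seed)
    model = LGBMRegressor(n_estimators=300, random_state=seed).fit(X_tr, y_tr)
    pi = permutation_importance(model, X_te, y_te, n_repeats=20,
                                random_state=seed)
    importances.append(pi.importances_mean)

for name, imp in zip(features, np.mean(importances, axis=0)):
    print(f"{name:22s} {imp:.3f}")
```

SHAP values (e.g. via shap.TreeExplainer) could be averaged over the same splits in the same way.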
https://choosealicense.com/licenses/undefined/
Dataset Card for tiny-imagenet
Dataset Summary
Tiny ImageNet contains 100,000 images across 200 classes (500 per class), downsized to 64×64 color images. Each class has 500 training images, 50 validation images, and 50 test images.
Languages
The class labels in the dataset are in English.
Dataset Structure
Data Instances
{ 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=64x64 at 0x1A800E8E190>, 'label': 15 }… See the full description on the dataset page: https://huggingface.co/datasets/zh-plus/tiny-imagenet.
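For example, instances like the one above can be pulled straight from the Hugging Face Hub with the datasets library (split names follow the dataset card):

```python
# Load Tiny ImageNet from the Hugging Face Hub and inspect one instance.
from datasets import load_dataset

ds = load_dataset("zh-plus/tiny-imagenet")        # splits per the card: train / valid
example = ds["train"][0]
print(example["image"].size, example["label"])    # PIL image (64, 64) and class id
```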
Datasets used for training and testing of machine learning models.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description of the datasets.
https://www.verifiedmarketresearch.com/privacy-policy/
Federated Learning Solutions Market size was valued at USD 151.03 Million in 2024 and is projected to reach USD 292.47 Million by 2031, growing at a CAGR of 9.50% from 2024 to 2031.
Global Federated Learning Solutions Market Drivers
The market drivers for the Federated Learning Solutions Market can be influenced by various factors. These may include:
- Data Privacy: Concerns about data privacy are growing. Federated learning provides a mechanism to train machine learning models without gathering sensitive data centrally, making it a desirable solution for companies and organizations (see the sketch after this list).
- Data Security: Federated learning allows data to stay on local devices, lowering the risk of data breaches and guaranteeing data security, which is essential for sectors such as healthcare and finance that handle sensitive data.
- Cost-Effectiveness: By distributing the training process to local devices, federated learning can save organizations money by reducing the need for large-scale centralized infrastructure.
- Regulatory Compliance: By keeping data local and minimizing data transfer, federated learning helps enterprises comply with increasingly strict data protection rules such as GDPR and HIPAA.
- Edge Computing: Edge computing, where data is processed closer to its source, has made federated learning more viable and efficient by enabling model training directly on edge devices.
- Industry Adoption: To capitalize on the advantages of machine learning while resolving privacy and security concerns, industries including healthcare, banking, and telecommunications are progressively adopting federated learning solutions.
- Technological Developments in AI and ML: As AI and ML technologies mature, federated learning has become a viable method for training models on dispersed data sources, spurring further market innovation and uptake.
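At the core of these solutions is federated averaging: each device trains on its own data and only model parameters, never raw records, are sent back and aggregated. The toy, framework-free sketch below illustrates that idea; it is a conceptual example, not any vendor's product.

```python
# Toy federated-averaging rounds for a linear model: each client fits on its
# own local data, and only the weights (not the raw data) are aggregated.
# Conceptual sketch only, not a production federated learning solution.
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=20):
    for _ in range(epochs):                      # plain gradient descent locally
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
clients = []
for _ in range(5):                               # 5 devices with private data
    X = rng.normal(size=(40, 3))
    y = X @ true_w + 0.1 * rng.normal(size=40)
    clients.append((X, y))

w_global = np.zeros(3)
for _ in range(10):                              # federated averaging rounds
    local_ws = [local_update(w_global.copy(), X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    w_global = np.average(local_ws, axis=0, weights=sizes)

print("global weights after FedAvg:", np.round(w_global, 2))
```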
https://www.datainsightsmarket.com/privacy-policy
The Small Language Model market is projected to grow from $6,430 million in 2025 to $37,780 million by 2033, at a CAGR of 17.8%. Growing adoption of AI, machine learning (ML), and natural language processing (NLP) technologies is driving the market. Additionally, increasing demand for virtual assistants, chatbots, and content generation tools is further fueling the growth. The market is segmented into application, type, region, and company. Based on application, the market is divided into artificial intelligence training, chatbots and virtual assistants, content generation, language translation, code development, medical diagnosis and treatment, education, and others. Based on type, the market is classified into below 5 billion parameters and above 5 billion parameters. Geographically, the market is segmented into North America, South America, Europe, Middle East & Africa, and Asia Pacific. Key players in the market include Llama 2 (Meta AI), Phi2 (Microsoft), Orca (Microsoft), Stable Beluga 7B (Meta AI), X Gen (Salesforce AI), Qwen (Alibaba), Alpaca 7B (Meta), MPT (Mosaic ML), Falcon 7B (Technology Innovation Institute (TII) from the UAE), and Zephyr (Hugging Face).
ashraq/ml-latest-small dataset hosted on Hugging Face and contributed by the HF Datasets community
https://www.datainsightsmarket.com/privacy-policy
The Tiny Machine Learning (TinyML) market is experiencing rapid growth, driven by the increasing demand for edge AI applications across various sectors. The market's expansion is fueled by the convergence of advancements in low-power microcontrollers, efficient machine learning algorithms, and the need for real-time data processing at the edge. Applications are diverse, ranging from smart home devices and wearables to industrial IoT sensors and medical diagnostics. The reduced latency, enhanced privacy, and decreased reliance on cloud connectivity offered by TinyML are key advantages driving adoption. While the market is currently relatively nascent, we project a substantial Compound Annual Growth Rate (CAGR) of 30% between 2025 and 2033, resulting in a market size exceeding $5 billion by 2033. This growth is further propelled by the decreasing cost of hardware and the increasing availability of user-friendly TinyML development tools and frameworks. Major players like Google, Microsoft, and ARM are heavily invested, fostering innovation and accelerating market maturity. However, challenges remain. The limitations of processing power and memory in resource-constrained devices present ongoing hurdles to overcome. Furthermore, the need for specialized expertise in developing and deploying TinyML models poses a barrier to wider adoption, especially for smaller companies. Despite these constraints, the long-term outlook remains exceptionally positive, fueled by ongoing technological advancements and the burgeoning demand for intelligent edge devices across various industry verticals. The market segmentation reflects this diversity, with significant growth anticipated in sectors such as healthcare, automotive, and industrial automation. The competition is intensifying, with established tech giants and emerging startups vying for market share, leading to increased innovation and improved solutions.
https://www.kbvresearch.com/privacy-policy/
The Global Machine Learning Model Operationalization Management (MLOps) Market size is expected to reach $29.05 billion by 2032, growing at a CAGR of 39.3% during the forecast period. The MLOps market for large enterprises is witnessing significant trends driven by increasing AI adoption…
https://www.archivemarketresearch.com/privacy-policy
The Tiny Machine Learning (TinyML) market is experiencing rapid growth, driven by the increasing demand for edge AI applications. This market, estimated at $1.5 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 35% from 2025 to 2033. This significant expansion is fueled by several key factors. The proliferation of low-power microcontrollers and sensors is enabling the deployment of intelligent functionalities in resource-constrained devices, leading to new applications in various sectors. Furthermore, advancements in model optimization techniques and efficient algorithms are continuously improving the performance and accuracy of TinyML models, making them suitable for a wider range of use cases. The rising adoption of IoT devices and the need for real-time data processing at the edge are also significantly contributing to the market's growth. Major players like Google, Microsoft, ARM, and STMicroelectronics are actively investing in research and development, fostering innovation and expanding the market's capabilities. Despite the impressive growth trajectory, the TinyML market faces certain challenges. High development costs and the complexity of integrating TinyML solutions into existing systems can hinder wider adoption, particularly among smaller companies. Furthermore, the need for robust security measures to protect against potential vulnerabilities in edge devices remains a crucial concern. Nevertheless, ongoing efforts to reduce development complexities and enhance security protocols are likely to mitigate these challenges and further accelerate market growth in the coming years. The increasing availability of user-friendly development tools and frameworks is also expected to broaden the accessibility of TinyML technology, encouraging greater participation from developers and accelerating innovation.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
"'https://www.nature.com/articles/s41597-022-01721-8'">MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification https://www.nature.com/articles/s41597-022-01721-8
A large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28x28 (2D) or 28x28x28 (3D) with the corresponding classification labels, so that no background knowledge is required of users. Covering the primary data modalities in biomedical images, MedMNIST is designed for classification on lightweight 2D and 3D images at various data scales (from 100 to 100,000) and across diverse tasks (binary/multi-class, ordinal regression, and multi-label). The resulting collection, consisting of approximately 708K 2D images and 10K 3D images in total, can support numerous research and educational purposes in biomedical image analysis, computer vision, and machine learning. The providers benchmark several baseline methods on MedMNIST, including 2D/3D neural networks and open-source/commercial AutoML tools.
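As a quick illustration of the "same API for every sub-dataset" design, the official medmnist PyPI package can be used roughly as follows (a sketch based on the package's documented usage; 'pathmnist' is just an example flag):

```python
# Sketch: load one MedMNIST sub-dataset with the official `medmnist` package.
# Every sub-dataset is exposed through the same class interface, so only the
# flag changes. 'pathmnist' is an arbitrary example.
import medmnist
from medmnist import INFO

flag = "pathmnist"
info = INFO[flag]
DataClass = getattr(medmnist, info["python_class"])

train_set = DataClass(split="train", download=True)   # 28x28 images + labels
val_set = DataClass(split="val", download=True)
print(info["task"], len(train_set), len(val_set))
```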
MedMNIST Landscape (figure omitted; see the paper or the GitHub page).
About the MedMNIST Landscape figure: the horizontal axis denotes the base-10 logarithm of the dataset scale, and the vertical axis denotes the base-10 logarithm of the imaging resolution. Upward and downward triangles distinguish 2D datasets from 3D datasets, and the four colors represent different tasks.
Diverse: It covers diverse data modalities, dataset scales (from 100 to 100,000), and tasks (binary/multi-class, multi-label, and ordinal regression). It is as diverse as the VDD and MSD to fairly evaluate the generalizable performance of machine learning algorithms in different settings, but both 2D and 3D biomedical images are provided.
Standardized: Each sub-dataset is pre-processed into the same format, which requires no background knowledge for users. As an MNIST-like dataset collection to perform classification tasks on small images, it primarily focuses on the machine learning part rather than the end-to-end system. Furthermore, we provide standard train-validation-test splits for all datasets in MedMNIST, therefore algorithms could be easily compared.
User-Friendly: The small size of 28×28 (2D) or 28×28×28 (3D) is lightweight and ideal for evaluating machine learning algorithms. We also offer a larger-size version, MedMNIST+: 64x64 (2D), 128x128 (2D), 224x224 (2D), and 64x64x64 (3D). Serving as a complement to the 28-size MedMNIST, this could be a standardized resource for developing medical foundation models. All these datasets are accessible via the same API.
Educational: As an interdisciplinary research area, biomedical image analysis is difficult for researchers from other communities to get started with, as it requires background knowledge from computer vision, machine learning, biomedical imaging, and clinical science. Our data, released under a Creative Commons (CC) license, is easy to use for educational purposes.
Refer to the paper to learn more about data : https://www.nature.com/articles/s41597-022-01721-8
Github Page: https://github.com/MedMNIST/MedMNIST
My Kaggle Starter Notebook: https://www.kaggle.com/code/arashnic/medmnist-download-and-use-data?scriptVersionId=161421937
Jiancheng Yang, Rui Shi, Donglai Wei, Zequan Liu, Lin Zhao, Bilian Ke, Hanspeter Pfister, Bingbing Ni. Affiliations: Shanghai Jiao Tong University, Shanghai, China; Boston College, Chestnut Hill, MA; RWTH Aachen University, Aachen, Germany; Fudan Institute of Metabolic Diseases, Zhongshan Hospital, Fudan University, Shanghai, China; Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Harvard University, Cambridge, MA.
The code is under Apache-2.0 License.
The MedMNIST dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)...