47 datasets found
  1. PyTorch geometric datasets for morphVQ models

    • data.niaid.nih.gov
    • dataone.org
    • +1 more
    zip
    Updated Sep 29, 2022
    Cite
    Oshane Thomas; Hongyu Shen; Ryan L. Rauum; William E. H. Harcourt-Smith; John D. Polk; Mark Hasegawa-Johnson (2022). PyTorch geometric datasets for morphVQ models [Dataset]. http://doi.org/10.5061/dryad.bvq83bkcr
    Explore at:
    zip (available download formats)
    Dataset updated
    Sep 29, 2022
    Dataset provided by
    City University of New York
    University of Illinois Urbana-Champaign
    American Museum of Natural History
    Authors
    Oshane Thomas; Hongyu Shen; Ryan L. Rauum; William E. H. Harcourt-Smith; John D. Polk; Mark Hasegawa-Johnson
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The methods of geometric morphometrics are commonly used to quantify morphology in a broad range of biological sciences. The application of these methods to large datasets is constrained by manual landmark placement limiting the number of landmarks and introducing observer bias. To move the field forward, we need to automate morphological phenotyping in ways that capture comprehensive representations of morphological variation with minimal observer bias. Here, we present Morphological Variation Quantifier (morphVQ), a shape analysis pipeline for quantifying, analyzing, and exploring shape variation in the functional domain. morphVQ uses descriptor learning to estimate the functional correspondence between whole triangular meshes in lieu of landmark configurations. With functional maps between pairs of specimens in a dataset, we can analyze and explore shape variation. morphVQ uses Consistent ZoomOut refinement to improve these functional maps and produce a new representation of shape variation and area-based and conformal (angular) latent shape space differences (LSSDs). We compare this new representation of shape variation to shape variables obtained via manual digitization and auto3DGM, an existing approach to automated morphological phenotyping. We find that LSSDs compare favorably to modern 3DGM and auto3DGM while being more computationally efficient. By characterizing whole surfaces, our method incorporates more morphological detail in shape analysis. We can classify known biological groupings, such as Genus affiliation, with comparable accuracy. The shape spaces produced by our method are similar to those produced by modern 3DGM and to auto3DGM, and distinctiveness functions derived from LSSDs show us how shape variation differs between groups. morphVQ can capture shape in an automated fashion while avoiding the limitations of manually digitized landmarks and thus represents a novel and computationally efficient addition to the geometric morphometrics toolkit.

    Methods

    The main dataset consists of 102 triangular meshes from laser surface scans of hominoid cuboid bones. These cuboids were from wild-collected individuals housed in the American Museum of Natural History, the National Museum of Natural History, the Harvard Museum of Comparative Biology, and the Field Museum. Hylobates, Pongo, Gorilla, Pan, and Homo are all well represented. Each triangular mesh is denoised, remeshed, and cleaned using the Geomagic Studio Wrap Software. The resulting meshes vary in vertex count/resolution from 2,000 to 390,000. Each mesh is then upsampled or decimated to an even 12,000 vertices using the recursive subdivision process and quadric decimation algorithm implemented in VTK python. The first of the two smaller datasets comprises 26 hominoid medial cuneiform meshes isolated from laser surface scans obtained from the same museum collections listed above. The second dataset comprises 33 mouse humeri meshes from micro-CT data (34.5 Όm resolution using a Skyscan 1172). These datasets were processed identically to the 102 hominoid cuboid meshes introduced above.
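    The remeshing step described above (recursive subdivision followed by quadric decimation to a common vertex count) can be sketched with VTK's Python bindings roughly as follows. This is a minimal illustration, not the authors' script: the 12,000-vertex target comes from the text, while the input file name and the exact filter settings are assumptions.

    import vtk

    # Read one surface scan (file name hypothetical; the dataset stores PyTorch Geometric data, not raw PLY files).
    reader = vtk.vtkPLYReader()
    reader.SetFileName("cuboid_scan.ply")
    reader.Update()
    mesh = reader.GetOutput()

    # Upsample low-resolution meshes by recursive (linear) subdivision.
    if mesh.GetNumberOfPoints() < 12000:
        subdiv = vtk.vtkLinearSubdivisionFilter()
        subdiv.SetInputData(mesh)
        subdiv.SetNumberOfSubdivisions(2)  # each pass roughly quadruples the triangle count
        subdiv.Update()
        mesh = subdiv.GetOutput()

    # Decimate back down toward ~12,000 vertices with quadric decimation.
    target = 12000
    reduction = max(0.0, 1.0 - target / mesh.GetNumberOfPoints())
    decimate = vtk.vtkQuadricDecimation()
    decimate.SetInputData(mesh)
    decimate.SetTargetReduction(reduction)  # reduction is triangle-based, so the vertex count is approximate
    decimate.Update()
    print("final vertex count:", decimate.GetOutput().GetNumberOfPoints())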

  2. Data from: DIPSEER: A Dataset for In-Person Student Emotion and Engagement...

    • observatorio-cientifico.ua.es
    • scidb.cn
    Updated 2025
    Cite
    Márquez-Carpintero, Luis; Suescun-Ferrandiz, Sergio; Álvarez, Carolina Lorenzo; Fernandez-Herrero, Jorge; Viejo, Diego; Rosabel Roig-Vila; Cazorla, Miguel (2025). DIPSEER: A Dataset for In-Person Student Emotion and Engagement Recognition in the Wild [Dataset]. https://observatorio-cientifico.ua.es/documentos/67321d21aea56d4af0484172
    Explore at:
    Dataset updated
    2025
    Authors
    Márquez-Carpintero, Luis; Suescun-Ferrandiz, Sergio; Álvarez, Carolina Lorenzo; Fernandez-Herrero, Jorge; Viejo, Diego; Rosabel Roig-Vila; Cazorla, Miguel
    Description

    Data Description

    The DIPSER dataset is designed to assess student attention and emotion in in-person classroom settings, consisting of RGB camera data, smartwatch sensor data, and labeled attention and emotion metrics. It includes multiple camera angles per student to capture posture and facial expressions, complemented by smartwatch data for inertial and biometric metrics. Attention and emotion labels are derived from self-reports and expert evaluations. The dataset includes diverse demographic groups, with data collected in real-world classroom environments, facilitating the training of machine learning models for predicting attention and correlating it with emotional states.

    Data Collection and Generation Procedures

    The dataset was collected in a natural classroom environment at the University of Alicante, Spain. The recording setup consisted of six general cameras positioned to capture the overall classroom context and individual cameras placed at each student's desk. Additionally, smartwatches were used to collect biometric data, such as heart rate, accelerometer, and gyroscope readings.

    Experimental Sessions

    Nine distinct educational activities were designed to ensure a comprehensive range of engagement scenarios:

    • News Reading – Students read projected or device-displayed news.
    • Brainstorming Session – Idea generation for problem-solving.
    • Lecture – Passive listening to an instructor-led session.
    • Information Organization – Synthesizing information from different sources.
    • Lecture Test – Assessment of lecture content via mobile devices.
    • Individual Presentations – Students present their projects.
    • Knowledge Test – Conducted using Kahoot.
    • Robotics Experimentation – Hands-on session with robotics.
    • MTINY Activity Design – Development of educational activities with computational thinking.

    Technical Specifications

    • RGB Cameras: Individual cameras recorded at 640×480 pixels, while context cameras captured at 1280×720 pixels.
    • Frame Rate: 9-10 FPS depending on the setup.
    • Smartwatch Sensors: Collected heart rate, accelerometer, gyroscope, rotation vector, and light sensor data at a frequency of 1–100 Hz.

    Data Organization and Formats

    The dataset follows a structured directory format:

    /groupX/experimentY/subjectZ.zip

    Each subject-specific folder contains:

    • images/ (individual facial images)
    • watch_sensors/ (sensor readings in JSON format)
    • labels/ (engagement & emotion annotations)
    • metadata/ (subject demographics & session details)

    Annotations and Labeling

    Each data entry includes engagement levels (1-5) and emotional states (9 categories) based on both self-reported labels and evaluations by four independent experts. A custom annotation tool was developed to ensure consistency across evaluations.

    Missing Data and Data Quality

    • Synchronization: A centralized server ensured time alignment across devices. Brightness changes were used to verify synchronization.
    • Completeness: No major missing data, except for occasional random frame drops due to embedded device performance.
    • Data Consistency: Uniform collection methodology across sessions, ensuring high reliability.

    Data Processing Methods

    To enhance usability, the dataset includes preprocessed bounding boxes for face, body, and hands, along with gaze estimation and head pose annotations. These were generated using YOLO, MediaPipe, and DeepFace.

    File Formats and Accessibility

    • Images: Stored in standard JPEG format.
    • Sensor Data: Provided as structured JSON files.
    • Labels: Available as CSV files with timestamps.

    The dataset is publicly available under the CC-BY license and can be accessed along with the necessary processing scripts via the DIPSER GitHub repository.

    Potential Errors and Limitations

    • Due to camera angles, some student movements may be out of frame in collaborative sessions.
    • Lighting conditions vary slightly across experiments.
    • Sensor latency variations are minimal but exist due to embedded device constraints.

    Citation

    If you find this project helpful for your research, please cite our work using the following bibtex entry:

    @misc{marquezcarpintero2025dipserdatasetinpersonstudent1,
      title={DIPSER: A Dataset for In-Person Student Engagement Recognition in the Wild},
      author={Luis Marquez-Carpintero and Sergio Suescun-Ferrandiz and Carolina Lorenzo Álvarez and Jorge Fernandez-Herrero and Diego Viejo and Rosabel Roig-Vila and Miguel Cazorla},
      year={2025},
      eprint={2502.20209},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.20209},
    }

    Usage and Reproducibility

    Researchers can utilize standard tools like OpenCV, TensorFlow, and PyTorch for analysis. The dataset supports research in machine learning, affective computing, and education analytics, offering a unique resource for engagement and attention studies in real-world classroom environments.
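    Given the directory layout and file formats above, a minimal loading sketch might look like the following. The concrete group/experiment/subject names and the label file name are hypothetical, so paths will need adjusting to the actual archive contents.

    import json
    import zipfile
    from pathlib import Path

    import pandas as pd

    # Hypothetical subject archive following /groupX/experimentY/subjectZ.zip
    archive = Path("group1/experiment3/subject07.zip")
    root = archive.with_suffix("")  # extraction target
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(root)

    frames = sorted((root / "images").glob("*.jpg"))  # individual facial images (JPEG)
    sensors = [json.loads(p.read_text()) for p in (root / "watch_sensors").glob("*.json")]  # smartwatch readings
    labels = pd.read_csv(next((root / "labels").glob("*.csv")))  # engagement & emotion annotations (file name assumed)

    print(len(frames), "frames,", len(sensors), "sensor files,", len(labels), "label rows")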

  3. Data from: Exploring deep learning models for 4D-STEM-DPC data processing

    • data.niaid.nih.gov
    Updated Oct 7, 2024
    + more versions
    Cite
    Dagenborg, Sivert (2024). Exploring deep learning models for 4D-STEM-DPC data processing [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10890767
    Explore at:
    Dataset updated
    Oct 7, 2024
    Dataset provided by
    SÞrhaug, JÞrgen
    Nordahl, Gregory
    Nord, Magnus
    Dagenborg, Sivert
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains scanning transmission electron microscopy data and processing files used in the journal publication "Exploring deep learning models for 4D-STEM-DPC data processing". DOI: 10.1016/j.ultramic.2024.114058

    Prerequisites

    The scripts presented below require certain open-source Python packages to run. Library versions used to run the scripts are:

    hyperspy 1.7.1

    pyxem 0.14.2

    fpd 0.2.5

    pytorch 1.12.1 (cudatoolkit 11.6.0)

    jupyterlab 4.0.7

    Data files

    Three zipped folders are included. Two of them contain the training and inference data for the neural networks, aptly named training_data.zip and inference_data.zip. PyTorch state dictionaries for trained models are included in the models.zip folder.
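    As a minimal sketch of how such state dictionaries are typically restored (the file and model class names here are placeholders, not the repository's actual ones; the accompanying notebooks define the real architectures):

    import torch

    # Placeholder path: substitute an actual file extracted from models.zip.
    state_dict = torch.load("models/segmentation_model.pt", map_location="cpu")
    # model = MySegmentationNet()        # class defined in Segmentation.ipynb (name hypothetical)
    # model.load_state_dict(state_dict)
    # model.eval()
    print(sorted(state_dict.keys())[:5])  # inspect the stored parameter tensors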

    Processing scripts

    All scripts are included in IPython notebook format (.ipynb extension). The notebooks Segmentation.ipynb and Regression.ipynb contain the code for training and inference of the segmentation and regression models, respectively. The Training_data_creation.ipynb notebook contains the code to preprocess the training data for both neural network models. The Standard_algorithms.ipynb notebook contains the code for the center-of-mass and edge-filtering/disc-detection algorithms for STEM-DPC processing.

  4. pytorch_image_models

    • kaggle.com
    Updated Sep 29, 2025
    Cite
    The citation is currently not available for this dataset.
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 29, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    HyeongChan Kim
    Description

    PyTorch Image Models

    Sponsors

    A big thank you to my GitHub Sponsors for their support!

    In addition to the sponsors at the link above, I've received hardware and/or cloud resources from:

    • Nvidia (https://www.nvidia.com/en-us/)
    • TFRC (https://www.tensorflow.org/tfrc)

    I'm fortunate to be able to dedicate significant time and money of my own supporting this and other open source projects. However, as the projects increase in scope, outside support is needed to continue with the current trajectory of hardware, infrastructure, and electricity costs.

    What's New

    Aug 18, 2021

    • Optimizer bonanza!
      • Add LAMB and LARS optimizers, incl trust ratio clipping options. Tweaked to work properly in PyTorch XLA (tested on TPUs w/ timm bits branch)
      • Add MADGRAD from FB research w/ a few tweaks (decoupled decay option, step handling that works with PyTorch XLA)
      • Some cleanup on all optimizers and factory. No more .data, a bit more consistency, unit tests for all!
      • SGDP and AdamP still won't work with PyTorch XLA but others should (have yet to test Adabelief, Adafactor, Adahessian myself).
    • EfficientNet-V2 XL TF ported weights added, but they don't validate well in PyTorch (L is better). The pre-processing for the V2 TF training is a bit diff and the fine-tuned 21k -> 1k weights are very sensitive and less robust than the 1k weights.
    • Added PyTorch trained EfficientNet-V2 'Tiny' w/ GlobalContext attn weights. Only .1-.2 top-1 better than the SE so more of a curiosity for those interested.

    July 12, 2021

    July 5-9, 2021

    • Add efficientnetv2_rw_t weights, a custom 'tiny' 13.6M param variant that is a bit better than (non NoisyStudent) B3 models. Both faster and better accuracy (at same or lower res)
      • top-1 82.34 @ 288x288 and 82.54 @ 320x320
    • Add SAM pretrained in1k weight for ViT B/16 (vit_base_patch16_sam_224) and B/32 (vit_base_patch32_sam_224) models.
    • Add 'Aggregating Nested Transformer' (NesT) w/ weights converted from official Flax impl. Contributed by Alexander Soare.
      • jx_nest_base - 83.534, jx_nest_small - 83.120, jx_nest_tiny - 81.426

    June 23, 2021

    • Reproduce gMLP model training, gmlp_s16_224 trained to 79.6 top-1, matching paper. Hparams for this and other recent MLP training here

    June 20, 2021

    • Release Vision Transformer 'AugReg' weights from How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
      • .npz weight loading support added, can load any of the 50K+ weights from the AugReg series
      • See example notebook from official impl for navigating the augreg weights
      • Replaced all default weights w/ best AugReg variant (if possible). All AugReg 21k classifiers work.
      • Highlights: vit_large_patch16_384 (87.1 top-1), vit_large_r50_s32_384 (86.2 top-1), vit_base_patch16_384 (86.0 top-1)
      • vit_deit_* renamed to just deit_*
      • Remove my old small model, replace with DeiT compatible small w/ AugReg weights
    • Add 1st training of my gmixer_24_224 MLP /w GLU, 78.1 top-1 w/ 25M params.
    • Add weights from official ResMLP release (https://github.com/facebookresearch/deit)
    • Add eca_nfnet_l2 weights from my 'lightweight' series. 84.7 top-1 at 384x384.
    • Add distilled BiT 50x1 student and 152x2 Teacher weights from Knowledge distillation: A good teacher is patient and consistent
    • NFNets and ResNetV2-BiT models work w/ Pytorch XLA now
      • weight standardization uses F.batch_norm instead of std_mean (std_mean wasn't lowered)
      • eps values adjusted, will be slight differences but should be quite close
    • Improve test coverage and classifier interface of non-conv (vision transformer and mlp) models ...
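    Since this dataset mirrors the timm code and weights, a typical usage sketch looks like the following. It uses the standard timm API with one of the model names mentioned above; the 288×288 resolution is the one quoted for that variant.

    import timm
    import torch

    # Create one of the models mentioned above with its pretrained weights.
    model = timm.create_model("efficientnetv2_rw_t", pretrained=True)
    model.eval()

    # Dummy forward pass at the 288x288 resolution quoted for this variant.
    x = torch.randn(1, 3, 288, 288)
    with torch.no_grad():
        logits = model(x)
    print(logits.shape)  # torch.Size([1, 1000])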
  5. Machine Learning Framework Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jul 21, 2025
    Cite
    Data Insights Market (2025). Machine Learning Framework Report [Dataset]. https://www.datainsightsmarket.com/reports/machine-learning-framework-1989715
    Explore at:
    doc, ppt, pdf (available download formats)
    Dataset updated
    Jul 21, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Machine Learning Framework market is experiencing robust growth, driven by the increasing adoption of AI and machine learning across diverse industries. The market's expansion is fueled by several key factors, including the rising availability of large datasets, advancements in computing power (especially cloud computing), and the growing demand for automated decision-making and predictive analytics. Organizations across sectors, from healthcare and finance to manufacturing and retail, are leveraging machine learning frameworks to improve operational efficiency, enhance customer experiences, and gain a competitive edge. The increasing complexity of data analysis tasks and the need for specialized tools are also contributing to market growth. We estimate the 2025 market size to be around $15 billion, considering the significant investments made by major technology players and the accelerating adoption rates. A conservative Compound Annual Growth Rate (CAGR) of 20% is projected for the forecast period (2025-2033), indicating a substantial increase in market value over the next decade. However, several challenges persist. The high cost of implementation and maintenance of machine learning solutions, particularly for smaller businesses, remains a significant restraint. Moreover, the lack of skilled professionals proficient in machine learning and data science creates a talent gap that hinders broader adoption. Security and ethical concerns related to AI and the potential for bias in algorithms also present obstacles to the market's continued expansion. Despite these challenges, the long-term outlook for the machine learning framework market remains positive, with continuous innovation and the development of more user-friendly tools expected to drive further growth. The segmentation of the market encompasses open-source and commercial frameworks, cloud-based and on-premise solutions, and industry-specific applications. The competitive landscape is dominated by established technology giants like Google, Microsoft, Amazon, and IBM, alongside a growing number of specialized providers and open-source contributors. This competitive dynamic fosters innovation and accessibility.

  6. neural network Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Aug 4, 2025
    + more versions
    Cite
    Data Insights Market (2025). neural network Report [Dataset]. https://www.datainsightsmarket.com/reports/neural-network-1493802
    Explore at:
    ppt, doc, pdf (available download formats)
    Dataset updated
    Aug 4, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    CA
    Variables measured
    Market Size
    Description

    The global neural network market is experiencing robust growth, driven by the increasing adoption of artificial intelligence (AI) across diverse sectors. The market's expansion is fueled by several key factors, including the rising availability of large datasets, advancements in computing power (particularly GPUs from NVIDIA and cloud services from AWS), and the development of more sophisticated neural network architectures capable of handling complex tasks. The period from 2019 to 2024 witnessed significant progress, laying the groundwork for accelerated growth in the coming years. While precise figures are unavailable, considering the average Compound Annual Growth Rate (CAGR) of similar AI segments, a reasonable estimate for the 2025 market size would be around $15 billion, with a projected CAGR of 25% from 2025 to 2033. This growth is further supported by the continuous innovation in deep learning techniques and their successful implementation in various applications, ranging from image and speech recognition to natural language processing and autonomous vehicles. Key players like Google (via TensorFlow), Microsoft, IBM, and Amazon (AWS) are heavily investing in research and development, expanding their neural network offerings, and driving widespread adoption. The market segmentation is likely diverse, encompassing various types of neural networks (convolutional, recurrent, etc.), deployment models (cloud, on-premise), and industry verticals (healthcare, finance, automotive, etc.). While specific segment breakdowns are not provided, the projected growth indicates strong potential across all these areas. Restraining factors include the high computational costs associated with training complex neural networks, the need for specialized expertise, and concerns around data privacy and security. However, these challenges are being addressed through advancements in hardware, software, and data management techniques, paving the way for continued market expansion. The forecast period of 2025-2033 promises significant opportunities for companies offering neural network solutions, with substantial growth anticipated across all regions, particularly in North America and Asia-Pacific regions known for their technological advancements and substantial investments in AI research.

  7. Data from: WaveFake: A data set to facilitate audio DeepFake detection

    • zenodo.org
    bin, zip
    Updated Nov 3, 2021
    + more versions
    Cite
    Joel Frank; Lea Schönherr (2021). WaveFake: A data set to facilitate audio DeepFake detection [Dataset]. http://doi.org/10.5281/zenodo.4904579
    Explore at:
    bin, zip (available download formats)
    Dataset updated
    Nov 3, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Joel Frank; Lea Schönherr
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The main purpose of this data set is to facilitate research into audio DeepFakes.
    These generated media files have been increasingly used to commit impersonation attempts or online harassment.

    The data set consists of 88,600 generated audio clips (16-bit PCM wav).
    All of these samples were generated by four different neural network architectures:

    Additionally, we examined a bigger version of MelGAN and investigated a variant of Multi-Band MelGAN that computes its auxiliary loss over the full audio instead of its subbands.

    Collection Process

    For WaveGlow, we utilize the official implementation (commit 8afb643) in conjunction with the official pre-trained network on PyTorch Hub.
    We use a popular implementation available on GitHub (commit 12c677e) for the remaining networks.
    The repository also offers pre-trained models.
    We used the pre-trained networks to generate samples that are similar to their respective training distributions, LJ Speech and JSUT.
    When sampling the data set, we first extract Mel spectrograms from the original audio files, using the pre-processing scripts of the corresponding repositories.
    We then feed these Mel spectrograms to the respective models to obtain the data set.
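    As a hedged sketch of how one of the generated 16-bit PCM clips might be loaded and turned into a log-Mel front-end for a detection experiment (the file name and Mel parameters are assumptions, and this is not the repository's own pre-processing):

    import torch
    import torchaudio

    # Hypothetical clip from the dataset (16-bit PCM wav)
    waveform, sample_rate = torchaudio.load("ljspeech_melgan/LJ001-0001_gen.wav")

    # Log-Mel front-end commonly used for audio DeepFake detectors (parameter choices assumed)
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sample_rate,
        n_fft=1024,
        hop_length=256,
        n_mels=80,
    )(waveform)
    log_mel = torch.log(mel + 1e-6)
    print(log_mel.shape)  # (channels, n_mels, frames)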

    This data set is licensed with a CC-BY-SA 4.0 license.

    This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy -- EXC-2092 CaSa -- 390781972.

  8. Overview of deep learning terminology.

    • plos.figshare.com
    xls
    Updated Dec 5, 2024
    Cite
    Aaron E. Maxwell; Sarah Farhadpour; Srinjoy Das; Yalin Yang (2024). Overview of deep learning terminology. [Dataset]. http://doi.org/10.1371/journal.pone.0315127.t001
    Explore at:
    xls (available download formats)
    Dataset updated
    Dec 5, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Aaron E. Maxwell; Sarah Farhadpour; Srinjoy Das; Yalin Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Convolutional neural network (CNN)-based deep learning (DL) methods have transformed the analysis of geospatial, Earth observation, and geophysical data due to their ability to model spatial context information at multiple scales. Such methods are especially applicable to pixel-level classification or semantic segmentation tasks. A variety of R packages have been developed for processing and analyzing geospatial data. However, there are currently no packages available for implementing geospatial DL in the R language and data science environment. This paper introduces the geodl R package, which supports pixel-level classification applied to a wide range of geospatial or Earth science data that can be represented as multidimensional arrays where each channel or band holds a predictor variable. geodl is built on the torch package, which supports the implementation of DL using the R and C++ languages without the need for installing a Python/PyTorch environment. This greatly simplifies the software environment needed to implement DL in R. Using geodl, geospatial raster-based data with varying numbers of bands, spatial resolutions, and coordinate reference systems are read and processed using the terra package, which makes use of C++ and allows for processing raster grids that are too large to fit into memory. Training loops are implemented with the luz package. The geodl package provides utility functions for creating raster masks or labels from vector-based geospatial data and image chips and associated masks from larger files and extents. It also defines a torch dataset subclass for geospatial data for use with torch dataloaders. UNet-based models are provided with a variety of optional ancillary modules or modifications. Common assessment metrics (i.e., overall accuracy, class-level recalls or producer's accuracies, class-level precisions or user's accuracies, and class-level F1-scores) are implemented along with a modified version of the unified focal loss framework, which allows for defining a variety of loss metrics using one consistent implementation and set of hyperparameters. Users can assess models using standard geospatial and remote sensing metrics and methods and use trained models to predict to large spatial extents. This paper introduces the geodl workflow, design philosophy, and goals for future development.
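    The class-level metrics listed above (precision/user's accuracy, recall/producer's accuracy, F1) are the standard confusion-matrix quantities; as a language-agnostic illustration (plain Python/NumPy, not geodl's R API), they can be computed like this:

    import numpy as np

    # Toy 3-class confusion matrix: rows = reference (true) class, columns = predicted class.
    cm = np.array([
        [50,  2,  3],
        [ 4, 40,  6],
        [ 1,  5, 44],
    ])

    overall_accuracy = np.trace(cm) / cm.sum()
    recall = np.diag(cm) / cm.sum(axis=1)      # producer's accuracy per class
    precision = np.diag(cm) / cm.sum(axis=0)   # user's accuracy per class
    f1 = 2 * precision * recall / (precision + recall)

    print(overall_accuracy, recall, precision, f1)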

  9. Deep Learning System Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jul 3, 2025
    + more versions
    Cite
    Data Insights Market (2025). Deep Learning System Software Report [Dataset]. https://www.datainsightsmarket.com/reports/deep-learning-system-software-1444412
    Explore at:
    pdf, ppt, doc (available download formats)
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Deep Learning System Software market is experiencing robust growth, driven by the increasing adoption of AI across various industries. The market's expansion is fueled by the need for efficient and scalable solutions to handle the massive datasets required for training sophisticated deep learning models. Key factors contributing to this growth include the proliferation of cloud computing services offering readily accessible deep learning platforms, the development of more powerful and energy-efficient hardware (GPUs and specialized AI chips), and the rising demand for automated decision-making systems in sectors like healthcare, finance, and manufacturing. The market is segmented by software type (e.g., frameworks, libraries, tools), deployment model (cloud, on-premise), and industry vertical. Leading players like Microsoft, Nvidia, Google (Alphabet), and Intel are actively investing in R&D and strategic acquisitions to strengthen their market positions. Competition is intense, with companies focusing on providing specialized solutions tailored to specific industry needs and improving the ease of use and accessibility of their software. While challenges remain, such as the need for skilled data scientists and the ethical considerations surrounding AI deployment, the overall market outlook remains positive, projecting significant expansion over the forecast period. Despite the positive outlook, several restraints could potentially hinder market growth. These include the high cost of implementation, the complexity of deep learning systems requiring specialized expertise, and concerns regarding data security and privacy. The need for continuous updates and maintenance to keep pace with technological advancements also presents a challenge. However, ongoing research and development in areas such as automated machine learning (AutoML) and edge AI are expected to mitigate some of these challenges. The market is likely to witness increased consolidation as larger players acquire smaller companies with specialized technologies. Furthermore, the growing importance of data annotation and model explainability will create new market opportunities for specialized service providers. The future of the Deep Learning System Software market is characterized by innovation, competition, and the ongoing need to address ethical and practical concerns. We expect the market to demonstrate a steady and considerable increase in value throughout the forecast period.

  10. Data Science And Ml Platforms Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Data Science And Ml Platforms Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/data-science-and-ml-platforms-market
    Explore at:
    csv, pptx, pdf (available download formats)
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Science And ML Platforms Market Outlook



    The global market size for Data Science and ML Platforms was estimated to be approximately USD 78.9 billion in 2023, and it is projected to reach around USD 307.6 billion by 2032, growing at a Compound Annual Growth Rate (CAGR) of 16.4% during the forecast period. This remarkable growth can be largely attributed to the increasing adoption of artificial intelligence (AI) and machine learning (ML) across various industries to enhance operational efficiency, predictive analytics, and decision-making processes.



    The surge in big data and the necessity to make sense of unstructured data is a substantial growth driver for the Data Science and ML Platforms market. Organizations are increasingly leveraging data science and machine learning to gain insights that can help them stay competitive. This is especially true in sectors like retail and e-commerce where customer behavior analytics can lead to more targeted marketing strategies, personalized shopping experiences, and improved customer retention rates. Additionally, the proliferation of IoT devices is generating massive amounts of data, which further fuels the need for advanced data analytics platforms.



    Another significant growth factor is the increasing adoption of cloud-based solutions. Cloud platforms offer scalable resources, flexibility, and substantial cost savings, making them attractive for enterprises of all sizes. Cloud-based data science and machine learning platforms also facilitate collaboration among distributed teams, enabling more efficient workflows and faster time-to-market for new products and services. Furthermore, advancements in cloud technologies, such as serverless computing and containerization, are making it easier for organizations to deploy and manage their data science models.



    Investment in AI and ML by key industry players also plays a crucial role in market growth. Tech giants like Google, Amazon, Microsoft, and IBM are making substantial investments in developing advanced AI and ML tools and platforms. These investments are not only driving innovation but also making these technologies more accessible to smaller enterprises. Additionally, mergers and acquisitions in this space are leading to more integrated and comprehensive solutions, which are further accelerating market growth.



    Machine Learning Tools are at the heart of this technological evolution, providing the necessary frameworks and libraries that empower developers and data scientists to create sophisticated models and algorithms. These tools, such as TensorFlow, PyTorch, and Scikit-learn, offer a range of functionalities from data preprocessing to model deployment, catering to both beginners and experts. The accessibility and versatility of these tools have democratized machine learning, enabling a wider audience to harness the power of AI. As organizations continue to embrace digital transformation, the demand for robust machine learning tools is expected to grow, driving further innovation and development in this space.



    From a regional perspective, North America is expected to hold the largest market share due to the early adoption of advanced technologies and the presence of major market players. However, the Asia Pacific region is anticipated to exhibit the highest growth rate during the forecast period. This is driven by increasing investments in AI and ML, a burgeoning start-up ecosystem, and supportive government policies aimed at digital transformation. Countries like China, India, and Japan are at the forefront of this growth, making significant strides in AI research and application.



    Component Analysis



    When analyzing the Data Science and ML Platforms market by component, it's essential to differentiate between software and services. The software segment includes platforms and tools designed for data ingestion, processing, visualization, and model building. These software solutions are crucial for organizations looking to harness the power of big data and machine learning. They provide the necessary infrastructure for data scientists to develop, test, and deploy ML models. The software segment is expected to grow significantly due to ongoing advancements in AI algorithms and the increasing need for more sophisticated data analysis tools.



    The services segment in the Data Science and ML Platforms market encompasses consulting, system integration, and support services. Consulting services help organizatio

  11. Sentence/Table Pair Data from Wikipedia for Pre-training with...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Oct 29, 2021
    Cite
    Xiang Deng; Yu Su; Alyssa Lees; You Wu; Cong Yu; Huan Sun (2021). Sentence/Table Pair Data from Wikipedia for Pre-training with Distant-Supervision [Dataset]. http://doi.org/10.5281/zenodo.5612316
    Explore at:
    application/gzip (available download formats)
    Dataset updated
    Oct 29, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Xiang Deng; Yu Su; Alyssa Lees; You Wu; Cong Yu; Huan Sun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the dataset used for pre-training in "ReasonBERT: Pre-trained to Reason with Distant Supervision", EMNLP'21.

    There are two files:

    sentence_pairs_for_pretrain_no_tokenization.tar.gz -> Contains only sentences as evidence, Text-only

    table_pairs_for_pretrain_no_tokenization.tar.gz -> At least one piece of evidence is a table, Hybrid

    The data is chunked into multiple tar files for easy loading. We use WebDataset, a PyTorch Dataset (IterableDataset) implementation providing efficient sequential/streaming data access.

    For pre-training code, or if you have any questions, please check our GitHub repo https://github.com/sunlab-osu/ReasonBERT

    Below is a sample code snippet to load the data

    import webdataset as wds
    
    # path to the uncompressed files, should be a directory with a set of tar files
    url = './sentence_multi_pairs_for_pretrain_no_tokenization/{000000...000763}.tar'
    dataset = (
      wds.Dataset(url)
      .shuffle(1000) # cache 1000 samples and shuffle
      .decode()
      .to_tuple("json")
      .batched(20) # group every 20 examples into a batch
    )
    
    # Please see the documentation for WebDataset for more details about how to use it as dataloader for Pytorch
    # You can also iterate through all examples and dump them with your preferred data format

    Below we show how the data is organized with two examples.

    Text-only

    {'s1_text': 'Sils is a municipality in the comarca of Selva, in Catalonia, Spain.', # query sentence
     's1_all_links': {
      'Sils,_Girona': [[0, 4]],
      'municipality': [[10, 22]],
      'Comarques_of_Catalonia': [[30, 37]],
      'Selva': [[41, 46]],
      'Catalonia': [[51, 60]]
     }, # list of entities and their mentions in the sentence (start, end location)
     'pairs': [ # other sentences that share common entity pair with the query, group by shared entity pairs
      {
        'pair': ['Comarques_of_Catalonia', 'Selva'], # the common entity pair
        's1_pair_locs': [[[30, 37]], [[41, 46]]], # mention of the entity pair in the query
        's2s': [ # list of other sentences that contain the common entity pair, or evidence
         {
           'md5': '2777e32bddd6ec414f0bc7a0b7fea331',
           'text': 'Selva is a coastal comarque (county) in Catalonia, Spain, located between the mountain range known as the Serralada Transversal or Puigsacalm and the Costa Brava (part of the Mediterranean coast). Unusually, it is divided between the provinces of Girona and Barcelona, with Fogars de la Selva being part of Barcelona province and all other municipalities falling inside Girona province. Also unusually, its capital, Santa Coloma de Farners, is no longer among its larger municipalities, with the coastal towns of Blanes and Lloret de Mar having far surpassed it in size.',
           's_loc': [0, 27], # in addition to the sentence containing the common entity pair, we also keep its surrounding context. 's_loc' is the start/end location of the actual evidence sentence
           'pair_locs': [ # mentions of the entity pair in the evidence
            [[19, 27]], # mentions of entity 1
            [[0, 5], [288, 293]] # mentions of entity 2
           ],
           'all_links': {
            'Selva': [[0, 5], [288, 293]],
            'Comarques_of_Catalonia': [[19, 27]],
            'Catalonia': [[40, 49]]
           }
          }
        ,...] # there are multiple evidence sentences
       },
     ,...] # there are multiple entity pairs in the query
    }
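    Following the documented structure of a text-only example, a small sketch for pulling out the query sentence and its evidence sentences from one decoded example (continuing from the loader above; the helper name is ours, not part of the dataset) could look like:

    # 'example' is one decoded json object from the dataset above
    def iter_evidence(example):
        query = example['s1_text']
        for group in example['pairs']:
            entity_a, entity_b = group['pair']
            for ev in group['s2s']:
                start, end = ev['s_loc']  # span of the evidence sentence inside its context
                yield query, (entity_a, entity_b), ev['text'][start:end]

    # e.g. print the first few (query, entity pair, evidence) triples
    # for triple in iter_evidence(example):
    #     print(triple)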

    Hybrid

    {'s1_text': 'The 2006 Major League Baseball All-Star Game was the 77th playing of the midseason exhibition baseball game between the all-stars of the American League (AL) and National League (NL), the two leagues comprising Major League Baseball.',
     's1_all_links': {...}, # same as text-only
     'sentence_pairs': [{'pair': ..., 's1_pair_locs': ..., 's2s': [...]}], # same as text-only
     'table_pairs': [
      'tid': 'Major_League_Baseball-1',
      'text':[
        ['World Series Records', 'World Series Records', ...],
        ['Team', 'Number of Series won', ...],
        ['St. Louis Cardinals (NL)', '11', ...],
      ...] # table content, list of rows
      'index':[
        [[0, 0], [0, 1], ...],
        [[1, 0], [1, 1], ...],
      ...] # index of each cell [row_id, col_id]. we keep only a table snippet, but the index here is from the original table.
      'value_ranks':[
        [0, 0, ...],
        [0, 0, ...],
        [0, 10, ...],
      ...] # if the cell contain numeric value/date, this is its rank ordered from small to large, follow TAPAS
      'value_inv_ranks': [], # inverse rank
      'all_links':{
        'St._Louis_Cardinals': {
         '2': [
          [[2, 0], [0, 19]], # [[row_id, col_id], [start, end]]
         ] # list of mentions in the second row, the key is row_id
        },
        'CARDINAL:11': {'2': [[[2, 1], [0, 2]]], '8': [[[8, 3], [0, 2]]]},
      }
      'name': '', # table name, if exists
      'pairs': {
        'pair': ['American_League', 'National_League'],
        's1_pair_locs': [[[137, 152]], [[162, 177]]], # mention in the query
        'table_pair_locs': {
         '17': [ # mention of entity pair in row 17
           [
            [[17, 0], [3, 18]],
            [[17, 1], [3, 18]],
            [[17, 2], [3, 18]],
            [[17, 3], [3, 18]]
           ], # mention of the first entity
           [
            [[17, 0], [21, 36]],
            [[17, 1], [21, 36]],
           ] # mention of the second entity
         ]
        }
       }
     ]
    }

  12. SPIRE_EMA_CORPUS

    • huggingface.co
    Updated Jul 27, 2025
    + more versions
    Cite
    Sathvik Udupa (2025). SPIRE_EMA_CORPUS [Dataset]. https://huggingface.co/datasets/viks66/SPIRE_EMA_CORPUS
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 27, 2025
    Authors
    Sathvik Udupa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This corpus contains paired data of speech, articulatory movements, and phonemes. There are 38 speakers in the corpus, each with 460 utterances. The raw audio files are in audios.zip. The ema data and preprocessed data are stored in processed.zip. The processed data can be loaded with pytorch and has the following keys -

    ema_raw : The raw ema data

    ema_clipped : The ema data after trimming using begin-end time stamps

    ema_trimmed_and_normalised_with_6_articulators: The ema data after trimming 
 See the full description on the dataset page: https://huggingface.co/datasets/viks66/SPIRE_EMA_CORPUS.
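    A minimal sketch for inspecting the processed data with the keys listed above (the exact file name inside processed.zip is assumed; adjust to the actual archive contents):

    import torch

    # Hypothetical per-utterance file extracted from processed.zip
    data = torch.load("processed/speaker01_utt001.pt", map_location="cpu")

    print(data.keys())  # expect ema_raw, ema_clipped, ema_trimmed_and_normalised_with_6_articulators, ...
    ema = data["ema_trimmed_and_normalised_with_6_articulators"]
    print(getattr(ema, "shape", None))  # articulatory trajectory; shape depends on the corpus layout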

  13. Data from: Data and scripts from: "Denoising autoencoder for reconstructing...

    • osti.gov
    Updated Jan 1, 2025
    Cite
    Bi, Xiangyu; Chou, Chunwei; Johnsen, Timothy; Ramakrishnan, Lavanya; Skone, Jonathan; Varadharajan, Charuleka; Wu, Yuxin (2025). Data and scripts from: "Denoising autoencoder for reconstructing sensor observation data and predicting evapotranspiration: noisy and missing values repair and uncertainty quantification" [Dataset]. https://www.osti.gov/dataexplorer/biblio/dataset/2561511
    Explore at:
    Dataset updated
    Jan 1, 2025
    Dataset provided by
    United States Department of Energy (http://energy.gov/)
    Watershed Function SFA
    Authors
    Bi, Xiangyu; Chou, Chunwei; Johnsen, Timothy; Ramakrishnan, Lavanya; Skone, Jonathan; Varadharajan, Charuleka; Wu, Yuxin
    Description

    This data package includes data and scripts from the manuscript "Denoising autoencoder for reconstructing sensor observation data and predicting evapotranspiration: noisy and missing values repair and uncertainty quantification".

    The study addressed common challenges faced in environmental sensing and modeling, including uncertain input data, missing sensor observations, and high-dimensional datasets with interrelated but redundant variables. Point-scaled meteorological and soil sensor observations were perturbed with noises and missing values, and denoising autoencoder (DAE) neural networks were developed to reconstruct the perturbed data and further predict evapotranspiration. This study concluded that (1) the reconstruction quality of each variable depends on its cross-correlation and alignment to the underlying data structure, (2) uncertainties from the models were overall stronger than those from the data corruption, and (3) there was a tradeoff between reducing bias and reducing variance when evaluating the uncertainty of the machine learning models.

    This package includes:

    (1) Four ipython scripts (.ipynb): "DAE_train.ipynb" trains and evaluates DAE neural networks, "DAE_predict.ipynb" makes predictions from the trained DAE models, "ET_train.ipynb" trains and evaluates ET prediction neural networks, and "ET_predict.ipynb" makes predictions from trained ET models.

    (2) One python file (.py): "methods.py" includes all user-defined functions and python code used in the ipython scripts.

    (3) A "sub_models" folder that includes five trained DAE neural networks (in pytorch format, .pt), which could be used to ingest input data before being fed to the downstream ET models in "ET_train.ipynb" or "ET_predict.ipynb".

    (4) Two data files (.csv). Daily meteorological, vegetation, and soil data is in "df_data.csv", where "df_meta.csv" contains the location and time information of "df_data.csv". Each row (index) in "df_meta.csv" corresponds to each row in "df_data.csv". These data files are formatted to follow the data structure requirements and be directly used in the ipython scripts, and they have been shuffled chronologically to train machine learning models. The meteorological and soil data was collected using point sensors between 2019-2023 at:

    (4.a) Three shrub-dominated field sites in East River, Colorado (named "ph1", "ph2" and "sg5" in "df_meta.csv", where "ph1" and "ph2" were located at PumpHouse Hillslopes, and "sg5" was at Snodgrass Mountain meadow) and

    (4.b) One outdoor, mesoscale, and herbaceous-dominated experiment in Berkeley, California (named "tb" in "df_meta.csv", short for Smartsoils Testbed at Lawrence Berkeley National Lab).

    See "df_data_dd.csv" and "df_meta_dd.csv" for variable descriptions and the Methods section for additional data processing steps. See "flmd.csv" and "README.txt" for brief file descriptions. All ipython scripts and python files are written in and require PYTHON language software.
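    A minimal sketch for loading the package's data files and one of the stored DAE models (folder and file names follow the description above, except the .pt file name, which is an assumption):

    import pandas as pd
    import torch

    # Daily observations and their location/time metadata (row-aligned, per the description)
    df_data = pd.read_csv("df_data.csv")
    df_meta = pd.read_csv("df_meta.csv")
    assert len(df_data) == len(df_meta)

    # One of the five trained DAE networks stored in PyTorch format (file name hypothetical)
    dae = torch.load("sub_models/dae_model_1.pt", map_location="cpu")
    print(df_data.shape, type(dae))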

  14. Dataset for class comment analysis

    • data.niaid.nih.gov
    Updated Feb 22, 2022
    Cite
    Pooja Rani (2022). Dataset for class comment analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4311838
    Explore at:
    Dataset updated
    Feb 22, 2022
    Dataset authored and provided by
    Pooja Rani
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A list of different projects selected to analyze class comments (available in the source code) of various languages such as Java, Python, and Pharo. The projects vary in terms of size, contributors, and domain.

    Structure

    Projects/
      Java_projects/
        eclipse.zip
        guava.zip
        guice.zip
        hadoop.zip
        spark.zip
        vaadin.zip
    
      Pharo_projects/
        images/
          GToolkit.zip
          Moose.zip
          PetitParser.zip
          Pillar.zip
          PolyMath.zip
          Roassal2.zip
          Seaside.zip
    
        vm/
          70-x64/Pharo
    
        Scripts/
          ClassCommentExtraction.st
          SampleSelectionScript.st    
    
      Python_projects/
        django.zip
        ipython.zip
        Mailpile.zip
        pandas.zip
        pipenv.zip
        pytorch.zip   
        requests.zip 
      
    

    Contents of the Replication Package

    Projects/ contains the raw projects of each language that are used to analyze class comments.

    • Java_projects/

      • eclipse.zip - Eclipse project downloaded from the GitHub. More detail about the project is available on GitHub Eclipse.
      • guava.zip - Guava project downloaded from the GitHub. More detail about the project is available on GitHub Guava.
      • guice.zip - Guice project downloaded from the GitHub. More detail about the project is available on GitHub Guice.
      • hadoop.zip - Apache Hadoop project downloaded from the GitHub. More detail about the project is available on GitHub Apache Hadoop.
      • spark.zip - Apache Spark project downloaded from the GitHub. More detail about the project is available on GitHub Apache Spark.
      • vaadin.zip - Vaadin project downloaded from the GitHub. More detail about the project is available on GitHub Vaadin.

    • Pharo_projects/

      • images/ -

        • GToolkit.zip - Gtoolkit project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Moose.zip - Moose project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • PetitParser.zip - Petit Parser project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Pillar.zip - Pillar project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • PolyMath.zip - PolyMath project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Roassal2.zip - Roassal2 project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Seaside.zip - Seaside project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
      • vm/ -

      • 70-x64/Pharo - Pharo7 (version 7 of Pharo) virtual machine to instantiate the Pharo images given in the images/ folder. The user can run the vm on macOS and select any of the Pharo image.

      • Scripts/ - It contains the sample Smalltalk scripts to extract class comments from various projects.

      • ClassCommentExtraction.st - A Smalltalk script to show how class comments are extracted from various Pharo projects. This script is already provided in the respective project image.

      • SampleSelectionScript.st - A Smalltalk script to show sample class comments of Pharo projects are selected. This script can be run in any of the Pharo images given in the images/ folder.

    • Python_projects/

      • django.zip - Django project downloaded from the GitHub. More detail about the project is available on GitHub Django
      • ipython.zip - IPython project downloaded from the GitHub. More detail about the project is available on GitHub on IPython
      • Mailpile.zip - Mailpile project downloaded from the GitHub. More detail about the project is available on GitHub on Mailpile
      • pandas.zip - pandas project downloaded from the GitHub. More detail about the project is available on GitHub on pandas
      • pipenv.zip - Pipenv project downloaded from the GitHub. More detail about the project is available on GitHub on Pipenv
      • pytorch.zip - PyTorch project downloaded from the GitHub. More detail about the project is available on GitHub on PyTorch
      • requests.zip - Requests project downloaded from the GitHub. More detail about the project is available on GitHub on Requests
  15. Virginia Tech Natural Motion Dataset

    • data.lib.vt.edu
    xlsx
    Updated Jun 3, 2021
    Cite
    Jack Geissinger; Alan Asbeck; Mohammad Mehdi Alemi; S. Emily Chang (2021). Virginia Tech Natural Motion Dataset [Dataset]. http://doi.org/10.7294/2v3w-sb92
    Explore at:
    xlsx (available download formats)
    Dataset updated
    Jun 3, 2021
    Dataset provided by
    University Libraries, Virginia Tech
    Authors
    Jack Geissinger; Alan Asbeck; Mohammad Mehdi Alemi; S. Emily Chang
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    The Virginia Tech Natural Motion Dataset contains 40 hours of unscripted human motion (full body kinematics) collected in the open world using an XSens MVN Link system. In total, there are data from 17 participants (13 participants on a college campus and 4 at a home improvement store). Participants did a wide variety of activities, including: walking from one place to another; operating machinery; talking with others; manipulating objects; working at a desk; driving; eating; pushing/pulling carts and dollies; physical exercises such as jumping jacks, jogging, and pushups; sweeping; vacuuming; and emptying a dishwasher. The code for analyzing the data is freely available with this dataset and also at: https://github.com/ARLab-VT/VT-Natural-Motion-Processing. The portion of the dataset involving workers was funded by Lowe's, Inc.

  16. Data of "Self-consistency Reinforced minimal Gated Recurrent Unit for...

    • data.europa.eu
    unknown
    Updated Jul 3, 2025
    Cite
    Zenodo (2025). Data of "Self-consistency Reinforced minimal Gated Recurrent Unit for surrogate modeling of history-dependent non-linear problems: application to history-dependent homogenized response of heterogeneous materials" [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-10551272?locale=ro
    Explore at:
    unknown (26347) (available download formats)
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

Development of the Self-Consistency reinforced Minimum Recurrent Unit (SC-MRU). This directory contains the data and algorithms generated in the publication [^1].

Table of Contents: Dependencies and Prerequisites; Structure of Repository; Part 1: Data preparation; Part 2: RNN training; Part 3: Multiscale analysis; Part 4: Reproduce paper [^1] figures.

Dependencies and Prerequisites

Python, pandas, matplotlib, texttable and latextable are prerequisites for visualizing and navigating the data. For generating meshes and for visualization, gmsh (www.gmsh.info) is required. For running simulations, cm3Libraries (http://www.ltas-cm3.ulg.ac.be/openSource.htm) is required.

Instructions using the apt & pip3 package managers (Debian/Ubuntu based workstations):

Python, pandas and dependencies: sudo apt install python3 python3-scipy libpython3-dev python3-numpy python3-pandas

matplotlib, texttable and latextable: pip3 install matplotlib texttable latextable

PyTorch (only for runs with cm3Libraries). Without GPU: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu. With GPU: pip3 install torch torchvision torchaudio

Libtorch (for compiling the cells). Without GPU, in a local directory (e.g. ~/local with export TORCHDIR=$HOME/local/libtorch): wget https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-2.1.1%2Bcpu.zip, then unzip libtorch-shared-with-deps-2.1.1%2Bcpu.zip. With GPU, in a local directory (e.g. ~/local with export TORCHDIR=$HOME/local/libtorch): wget https://download.pytorch.org/libtorch/cu121/libtorch-shared-with-deps-2.1.1%2Bcu121.zip, then unzip libtorch-shared-with-deps-2.1.1+cu121.zip

Structure of Repository

All_Path_Res: results of the direct numerical simulations used as training and testing data; see details in Part 1: Data preparation.

ConstRVE: scripts to run the direct numerical finite element simulations; see details in Part 1: Data preparation.

MultiScale: scripts to run and visualise the multiscale analyses; see details in Part 3: Multiscale analysis.

SC_MRU: implementation of the RNNs and scripts to train them; see details in Part 2: RNN training.

TrainingData: scripts to collect, normalise and truncate the RVE direct simulation results into training and testing data; see details in Part 1: Data preparation. This directory also contains the stored processed data used in [^1].

TrainingPaths: scripts to generate the different loading paths for the direct numerical simulations used as training and testing data; see details in Part 1: Data preparation.

Part 1: Data preparation

Generate the loading paths. TrainingPaths/testGenerationData.py is used to generate random walk paths, with the options Rmax = 0.11 (bound on the final Green Lagrange strain), TimeStep = 1. (in seconds), EvalStep = [1e-4,5e-3] (bounds on the Green Lagrange increments), Nmax = 2500 (maximum length of the sequence) and k = 4000 (number of paths to generate). The paths are stored by default in ConstRVE/Paths/; this directory must exist before launching the script. You can change the destination in line 123, saveDir = '../ConstRVE'+'/Paths/'. Examples of generated paths can be found in ConstRVE/PathsExamples/. The commands to be run from the directory TrainingPaths are (mkdir ../ConstRVE/Paths) #if needed, followed by python3 testGenerationData.py. An illustrative sketch of such a random-walk generator is given at the end of this description.

TrainingPaths/generationData_Cyclic.py is used to generate random cyclic paths, with the options Rmax = [np.random.uniform(0.,0.04),np.random.uniform(0.,0.06),np.random.uniform(0.0,0.09),0.12] (the bound on the final Green Lagrange strain is random), TimeStep = 1. (in seconds), EvalStep = [1e-4,5e-3] (bounds on the Green Lagrange increments), Nmax = 2500 (maximum length of the sequence) and k = 2000 (number of paths to generate). The paths are stored by default in ConstRVE/Paths/. You can change the destination in line 123, saveDir = '../ConstRVE'+'/Paths/'. The commands to be run from the directory TrainingPaths are (mkdir ../ConstRVE/Paths) #if needed, followed by python3 generationData_Cyclic.py.

TrainingPaths/countPathLength.py gives the average, minimum and maximum lengths of the generated paths and the distribution of the \Delta R. By default the paths are read from ConstRVE/Paths/, but the directory can be given as an argument. The script can read either the generated loading paths (python3 countPathLength.py '../ConstRVE/PathsExamples') or the results of the simulations (python3 countPathLength.py '../All_Path_Res/Path_Res9'). TrainingPaths/graphData.py picks paths at random from ConstRVE/Paths/ and generates illustrative png figures.

Generate the RVE direct simulation results. These simulations use the loading paths existing in ConstRVE/Paths/. ConstRVE/rve.geo is the RVE geometry file and ConstRVE/rve.msh is the RVE mesh file, both readable by gmsh (www.gmsh.info). ConstRVE/utilsFunc.py contains the python tools to be used. ConstRVE/Rve_withoutInternalVars.py is used to run all the RVE simulations; this requires cm3Libraries (http://www.ltas-cm3.ulg.ac.be/openSource.htm). All the outputs are st
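As announced above, here is a minimal, illustrative sketch of the kind of random-walk strain-path generator described for TrainingPaths/testGenerationData.py. It reuses the listed options (Rmax, EvalStep, Nmax, k), but the variable names, the tensor dimension and the stopping rule are assumptions, not the repository's actual implementation.

```python
# Hypothetical sketch of a random-walk loading-path generator in the spirit of
# TrainingPaths/testGenerationData.py; names and the stopping rule are assumptions.
import numpy as np

Rmax = 0.11              # bound on the final Green Lagrange strain (norm)
EvalStep = [1e-4, 5e-3]  # bounds on the Green Lagrange increments
Nmax = 2500              # maximum length of a sequence
k = 4000                 # number of paths to generate (only 5 are built below)

def random_walk_path(rng):
    """Generate one random walk of 3x3 symmetric Green Lagrange strain tensors."""
    E = np.zeros((3, 3))
    path = [E.copy()]
    for _ in range(Nmax):
        # random symmetric increment with a norm drawn inside EvalStep
        dE = rng.normal(size=(3, 3))
        dE = 0.5 * (dE + dE.T)
        dE *= rng.uniform(*EvalStep) / np.linalg.norm(dE)
        E = E + dE
        path.append(E.copy())
        if np.linalg.norm(E) >= Rmax:  # stop once the strain bound is reached
            break
    return np.stack(path)

rng = np.random.default_rng(0)
paths = [random_walk_path(rng) for _ in range(5)]
print(len(paths), paths[0].shape)
```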

  17. Personal Protective Equipment Dataset (PPED)

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 17, 2022
    Cite
    Anonymous (2022). Personal Protective Equipment Dataset (PPED) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6551757
    Explore at:
    Dataset updated
    May 17, 2022
    Dataset authored and provided by
    Anonymous
    License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Personal Protective Equipment Dataset (PPED)

This dataset serves as a benchmark for PPE (personal protective equipment) detection in chemical plants. We provide the dataset and experimental results.

    1. The dataset

We produced a dataset based on the actual needs and relevant regulations in chemical plants. The standard GB 39800.1-2020, formulated by the Ministry of Emergency Management of the People’s Republic of China, defines the protective requirements for plants and chemical laboratories. The complete dataset is contained in the folder PPED/data.

    1.1. Image collection

We took more than 3,300 pictures, varying the following characteristics: environment, distance, lighting conditions, angle, and the number of people photographed.

Backgrounds: There are four backgrounds: office, near machines, factory, and regular outdoor scenes.

Scale: By taking pictures from different distances, the captured PPE instances are classified into small, medium, and large scales.

    Light: Good lighting conditions and poor lighting conditions were studied.

    Diversity: Some images contain a single person, and some contain multiple people.

    Angle: The pictures we took can be divided into front and side.

In total, more than 3,300 raw photos were taken under all conditions. All images are located in the folder PPED/data/JPEGImages.

    1.2. Label

We use LabelImg as the labeling tool and annotate in the PASCAL-VOC format. YOLO uses the txt format, so trans_voc2yolo.py can be used to convert the XML files in PASCAL-VOC format to txt files, as sketched below. Annotations are stored in the folder PPED/data/Annotations.
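trans_voc2yolo.py ships with the referenced detection repositories; the snippet below is only a hedged illustration of the underlying conversion (class id plus box centre and size, normalised by the image width and height). The class list and file paths are placeholders, not the PPED label set.

```python
# Hypothetical illustration of the PASCAL-VOC -> YOLO txt conversion that
# trans_voc2yolo.py performs; paths and the class list are placeholders.
import xml.etree.ElementTree as ET

CLASSES = ["helmet", "gloves"]  # placeholder class names

def voc_to_yolo(xml_path, txt_path):
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls_id = CLASSES.index(obj.find("name").text)
        b = obj.find("bndbox")
        xmin, ymin = float(b.find("xmin").text), float(b.find("ymin").text)
        xmax, ymax = float(b.find("xmax").text), float(b.find("ymax").text)
        # YOLO format: class x_center y_center width height, all normalised to [0, 1]
        xc, yc = (xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h
        bw, bh = (xmax - xmin) / w, (ymax - ymin) / h
        lines.append(f"{cls_id} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    with open(txt_path, "w") as f:
        f.write("\n".join(lines))
```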

    1.3. Dataset Features

The pictures were taken by us under the different conditions mentioned above. The file PPED/data/feature.csv is a CSV file that records, for every image, its features: lighting conditions, angle, background, number of people, and scale.
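As an illustration, such a feature table can be filtered with pandas; the column names used below are hypothetical and may differ from those actually used in feature.csv.

```python
# Hypothetical example of querying PPED/data/feature.csv with pandas;
# the column names are assumptions about the CSV schema.
import pandas as pd

features = pd.read_csv("PPED/data/feature.csv")
# e.g. select poorly lit, multi-person images taken near machines
subset = features[(features["light"] == "poor")
                  & (features["people"] > 1)
                  & (features["background"] == "near machines")]
print(len(subset), "images match the query")
```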

    1.4. Dataset Division

The dataset is divided into a training set and a test set with a 9:1 ratio; a minimal sketch of such a split is shown below.
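The sketch below only illustrates the 9:1 ratio with a reproducible shuffle; the actual split used for the experiments is the one shipped with the dataset.

```python
# Minimal sketch of a reproducible 9:1 train/test split over the image list.
import random
from pathlib import Path

images = sorted(Path("PPED/data/JPEGImages").glob("*.jpg"))
random.Random(42).shuffle(images)          # fixed seed for reproducibility
cut = int(0.9 * len(images))
train, test = images[:cut], images[cut:]
print(f"{len(train)} training images, {len(test)} test images")
```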

2. Baseline Experiments

We provide baseline results with five models, namely Faster R-CNN (R), Faster R-CNN (M), SSD, YOLOv3-spp, and YOLOv5. All code and results are given in the folder PPED/experiment.

    2.1. Environment and Configuration:

    Intel Core i7-8700 CPU

    NVIDIA GTX1060 GPU

    16 GB of RAM

    Python: 3.8.10

    pytorch: 1.9.0

    pycocotools: pycocotools-win

    Windows 10

    2.2. Applied Models

The source code and results of the applied models are given in the folder PPED/experiment, with sub-folders corresponding to the model names.

    2.2.1. Faster R-CNN

    Faster R-CNN

    backbone: resnet50+fpn

We downloaded the pre-trained weights from https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth.

    We modified the dataset path, training classes and training parameters including batch size.

We run train_res50_fpn.py to start training.

Then, the model weights are trained on the training set.

    Finally, we validate the results on the test set.

    backbone: mobilenetv2

The same training method as for resnet50+fpn was applied, but its performance was not as good as resnet50+fpn, so it was not explored further.

The Faster R-CNN source code used in our experiment is given in the folder PPED/experiment/Faster R-CNN. The weights of the fully-trained Faster R-CNN (R) and Faster R-CNN (M) models are stored in the files PPED/experiment/trained_models/resNetFpn-model-19.pth and mobile-model.pth, respectively. The performance measurements of Faster R-CNN (R) and Faster R-CNN (M) are stored in the folders PPED/experiment/results/Faster RCNN(R) and Faster RCNN(M). A hedged sketch of the fine-tuning setup is shown below.
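For orientation, the sketch below shows an equivalent torchvision setup for fine-tuning Faster R-CNN with a ResNet-50+FPN backbone on a custom class set, in the spirit of train_res50_fpn.py from the referenced repository; the number of PPE classes, the optimizer settings, and the dummy batch are placeholders, not the values used in the experiments.

```python
# Hedged sketch of fine-tuning torchvision's Faster R-CNN (ResNet-50 + FPN) on a
# custom class set; the class count and hyperparameters are placeholders.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 1 + 4  # background + a placeholder number of PPE categories

# newer torchvision prefers weights="DEFAULT"; pretrained=True matches older versions
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=5e-4)

# one training step: in train mode the model returns a dict of losses
model.train()
images = [torch.rand(3, 600, 800, device=device)]          # dummy image batch
targets = [{"boxes": torch.tensor([[50., 60., 200., 220.]], device=device),
            "labels": torch.tensor([1], device=device)}]
loss_dict = model(images, targets)
loss = sum(loss_dict.values())
optimizer.zero_grad()
loss.backward()
optimizer.step()
```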

    2.2.2. SSD

    backbone: resnet50

We downloaded the pre-trained weights from https://download.pytorch.org/models/resnet50-19c8e357.pth.

    The same training method as Faster R-CNN is applied.

The SSD source code used in our experiment is given in the folder PPED/experiment/ssd. The weights of the fully-trained SSD model are stored in the file PPED/experiment/trained_models/SSD_19.pth. The performance measurements of SSD are stored in the folder PPED/experiment/results/SSD.

    2.2.3. YOLOv3-spp

    backbone: DarkNet53

    We modified the type information of the XML file to match our application.

    We run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.

    The weights used are: yolov3-spp-ultralytics-608.pt.

The YOLOv3-spp source code used in our experiment is given in the folder PPED/experiment/YOLOv3-spp. The weights of the fully-trained YOLOv3-spp model are stored in the file PPED/experiment/trained_models/YOLOvspp-19.pt. The performance measurements of YOLOv3-spp are stored in the folder PPED/experiment/results/YOLOv3-spp.

    2.2.4. YOLOv5

    backbone: CSP_DarkNet

    We modified the type information of the XML file to match our application.

    We run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.

    The weights used are: yolov5s.

The YOLOv5 source code used in our experiment is given in the folder PPED/experiment/yolov5. The weights of the fully-trained YOLOv5 model are stored in the file PPED/experiment/trained_models/YOLOv5.pt. The performance measurements of YOLOv5 are stored in the folder PPED/experiment/results/YOLOv5.

    2.3. Evaluation

    The computed evaluation metrics as well as the code needed to compute them from our dataset are provided in the folder PPED/experiment/eval.
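The evaluation code shipped in PPED/experiment/eval is authoritative; as a generic reference, COCO-style detection metrics can be computed with pycocotools (listed in the environment above) as sketched here. The annotation and result file names are placeholders, not files from this repository.

```python
# Generic COCO-style mAP computation with pycocotools; file names are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("instances_test.json")          # ground-truth annotations
coco_dt = coco_gt.loadRes("detections.json")   # detector outputs in COCO result format

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP/AR, including mAP@[.5:.95] and mAP@0.5
```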

3. Code Sources

    Faster R-CNN (R and M)

    https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_object_detection/faster_rcnn

    official code: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/faster_rcnn.py

    SSD

    https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_object_detection/ssd

    official code: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/ssd.py

    YOLOv3-spp

    https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_object_detection/yolov3-spp

    YOLOv5

    https://github.com/ultralytics/yolov5

  18. Using deep convolutional neural networks to forecast spatial patterns of...

    • datadryad.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Jul 21, 2022
    Cite
    James Ball; Katerina Petrova; David Coomes; Seth Flaxman (2022). Using deep convolutional neural networks to forecast spatial patterns of Amazonian deforestation: supporting data and outputs [Dataset]. http://doi.org/10.5061/dryad.hdr7sqvjz
    Explore at:
Available download formats: zip
    Dataset updated
    Jul 21, 2022
    Dataset provided by
    Dryad
    Authors
    James Ball; Katerina Petrova; David Coomes; Seth Flaxman
    Time period covered
    Jun 6, 2022
    Description

All input raster data are freely available online. Original input raster data from:

Global Forest Change - https://glad.earthengine.app/view/global-forest-change
ALOS JAXA - https://www.eorc.jaxa.jp/ALOS/en/dataset/aw3d30/aw3d30_e.htm

Processed with code at https://github.com/PatBall1/DeepForestcast. The dataset includes:

Input shapefiles for each study site.
Input geotiff files (.tif) for each study site.
Input PyTorch tensors (.pt) for each study site.
Model weights (.pt) for trained networks (for testing and forecasting).
Output deforestation forecasts for each study site as geotiffs (.tif).
A minimal sketch of loading the PyTorch artefacts is given below.
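This is a hedged example of inspecting the supplied .pt files; the file names below are placeholders, and whether the weights were saved as full models or as state_dicts depends on how the DeepForestcast code saved them.

```python
# Hedged sketch of inspecting the supplied PyTorch artefacts; file names are
# placeholders and the exact saving convention is an assumption.
import torch

# input tensors for a study site (saved with torch.save)
site_tensor = torch.load("site_inputs.pt", map_location="cpu")
print(type(site_tensor), getattr(site_tensor, "shape", None))

# trained network weights, assuming they were saved as a state_dict
state_dict = torch.load("model_weights.pt", map_location="cpu")
if isinstance(state_dict, dict):
    print(list(state_dict.keys())[:5])  # first few parameter names
```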

  19. Fruit Infection Disease Dataset

    • kaggle.com
    Updated Nov 21, 2022
    Cite
    Nikit kashyap (2022). Fruit Infection Disease Dataset [Dataset]. https://www.kaggle.com/datasets/nikitkashyap/fruit-infection-disease-dataset
    Explore at:
Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 21, 2022
    Dataset provided by
Kaggle (http://kaggle.com/)
    Authors
    Nikit kashyap
    Description

This dataset was exported via Kaggle.com on Nov 21, 2022. It includes 5,494 images. Diseases are annotated in YOLO v7 PyTorch format. The following pre-processing was applied to each image: auto-orientation of pixel data (with EXIF-orientation stripping) and resizing to 416x416 (stretch). No image augmentation techniques were applied. Classes: 1. Strawberry: 'Angular Leafspot', 'Anthracnose Fruit Rot', 'Blossom Blight', 'Gray Mold', 'Leaf Spot', 'Powdery Mildew Fruit', 'Powdery Mildew Leaf'; 2. Tomato: 'disease', 'leaf mold', 'spider mites'; 3. Bean: 'ALS', 'Bean Rust'.
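For reference, YOLO-format label files store one object per line as class x_center y_center width height, normalised to [0, 1]; below is a minimal parsing sketch for the 416x416 images in this dataset. The label file name is a placeholder.

```python
# Minimal sketch of reading one YOLO-format label file and converting its
# normalised boxes to pixel coordinates for a 416x416 image.
W = H = 416  # images in this dataset are resized to 416x416

boxes = []
with open("example_label.txt") as f:     # placeholder file name
    for line in f:
        cls, xc, yc, bw, bh = line.split()
        xc, yc, bw, bh = (float(v) for v in (xc, yc, bw, bh))
        xmin = (xc - bw / 2) * W
        ymin = (yc - bh / 2) * H
        xmax = (xc + bw / 2) * W
        ymax = (yc + bh / 2) * H
        boxes.append((int(cls), xmin, ymin, xmax, ymax))
print(boxes)
```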

  20. Training set data expansion.

    • plos.figshare.com
    xls
    Updated Mar 7, 2024
    + more versions
    Cite
    Qingjun Yu; Guannan Wang; Hai Cheng; Wenzhi Guo; Yanbiao Liu (2024). Training set data expansion. [Dataset]. http://doi.org/10.1371/journal.pone.0299471.t002
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Mar 7, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Qingjun Yu; Guannan Wang; Hai Cheng; Wenzhi Guo; Yanbiao Liu
    License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

Structural planes decrease the strength and stability of rock masses, severely affecting their mechanical properties and their deformation and failure characteristics. Therefore, investigation and analysis of structural planes are crucial tasks in mining rock mechanics. The drilling camera obtains image information on deep structural planes of rock masses through high-definition imaging, providing an important data source for their analysis. This paper addresses the high workload, low efficiency, high subjectivity, and poor accuracy of manual processing in current borehole image analysis, and conducts an intelligent segmentation study of borehole image structural planes based on the U2-Net network. By collecting data from 20 different boreholes in regions of different lithology, a dataset of 1,013 borehole images covering different structural plane types, lithologies, and colors was established. Data augmentation methods such as image flipping, color jittering, blurring, and mixup were applied to expand the dataset to 12,421 images, meeting the data requirements for deep network training. Based on the PyTorch deep learning framework, the initial U2-Net network weights were set, the learning rate was set to 0.001, the training batch size was 4, and the Adam optimizer adaptively adjusted the learning rate during training. A dedicated network model for segmenting structural planes was obtained; the model achieved a maximum F-measure of 0.749 when the confidence threshold was set to 0.7, with an accuracy of up to 0.85 in the range of recall greater than 0.5. Overall, the model has high accuracy for segmenting structural planes and a very low mean absolute error, indicating good segmentation accuracy and a degree of generalization. The research method in this paper can serve as a reference for the intelligent identification of structural planes in borehole images.
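As a hedged illustration of the evaluation described above, the snippet below computes a thresholded F-measure and the mean absolute error for a predicted probability map against a ground-truth mask. The ÎČÂČ = 0.3 weighting is the convention of saliency/segmentation benchmarks and is an assumption here, as are the dummy arrays standing in for U2-Net outputs.

```python
# Hedged sketch of F-measure and MAE evaluation for a binary segmentation map
# at a fixed confidence threshold (0.7, as in the description); dummy data only.
import numpy as np

def f_measure(prob_map, gt_mask, threshold=0.7, beta2=0.3):
    pred = prob_map >= threshold
    tp = np.logical_and(pred, gt_mask).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt_mask.sum(), 1)
    # weighted F-measure; beta2 = 0.3 is an assumption borrowed from saliency benchmarks
    return (1 + beta2) * precision * recall / max(beta2 * precision + recall, 1e-8)

prob_map = np.random.rand(256, 256)          # stand-in for a network output
gt_mask = np.zeros((256, 256), dtype=bool)   # stand-in for a labelled structural plane
gt_mask[100:150, 80:200] = True

print(f"F-measure @0.7: {f_measure(prob_map, gt_mask):.3f}")
mae = np.abs(prob_map - gt_mask.astype(float)).mean()  # mean absolute error
print(f"MAE: {mae:.3f}")
```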
