47 datasets found
  1. PyTorch geometric datasets for morphVQ models

    • data.niaid.nih.gov
    • dataone.org
    • +1 more
    zip
    Updated Sep 29, 2022
    Cite
    Oshane Thomas; Hongyu Shen; Ryan L. Rauum; William E. H. Harcourt-Smith; John D. Polk; Mark Hasegawa-Johnson (2022). PyTorch geometric datasets for morphVQ models [Dataset]. http://doi.org/10.5061/dryad.bvq83bkcr
    Explore at:
    zip (available download formats)
    Dataset updated
    Sep 29, 2022
    Dataset provided by
    City University of New York
    University of Illinois Urbana-Champaign
    American Museum of Natural History
    Authors
    Oshane Thomas; Hongyu Shen; Ryan L. Rauum; William E. H. Harcourt-Smith; John D. Polk; Mark Hasegawa-Johnson
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The methods of geometric morphometrics are commonly used to quantify morphology in a broad range of biological sciences. The application of these methods to large datasets is constrained by manual landmark placement limiting the number of landmarks and introducing observer bias. To move the field forward, we need to automate morphological phenotyping in ways that capture comprehensive representations of morphological variation with minimal observer bias. Here, we present Morphological Variation Quantifier (morphVQ), a shape analysis pipeline for quantifying, analyzing, and exploring shape variation in the functional domain. morphVQ uses descriptor learning to estimate the functional correspondence between whole triangular meshes in lieu of landmark configurations. With functional maps between pairs of specimens in a dataset, we can analyze and explore shape variation. morphVQ uses Consistent ZoomOut refinement to improve these functional maps and produce a new representation of shape variation and area-based and conformal (angular) latent shape space differences (LSSDs). We compare this new representation of shape variation to shape variables obtained via manual digitization and auto3DGM, an existing approach to automated morphological phenotyping. We find that LSSDs compare favorably to modern 3DGM and auto3DGM while being more computationally efficient. By characterizing whole surfaces, our method incorporates more morphological detail in shape analysis. We can classify known biological groupings, such as Genus affiliation, with comparable accuracy. The shape spaces produced by our method are similar to those produced by modern 3DGM and to auto3DGM, and distinctiveness functions derived from LSSDs show us how shape variation differs between groups. morphVQ can capture shape in an automated fashion while avoiding the limitations of manually digitized landmarks and thus represents a novel and computationally efficient addition to the geometric morphometrics toolkit.

    Methods

    The main dataset consists of 102 triangular meshes from laser surface scans of hominoid cuboid bones. These cuboids were from wild-collected individuals housed in the American Museum of Natural History, the National Museum of Natural History, the Harvard Museum of Comparative Biology, and the Field Museum. Hylobates, Pongo, Gorilla, Pan, and Homo are all well represented. Each triangular mesh is denoised, remeshed, and cleaned using the Geomagic Studio Wrap Software. The resulting meshes vary in vertex count/resolution from 2,000 to 390,000. Each mesh is then upsampled or decimated to an even 12,000 vertices using the recursive subdivision process and quadric decimation algorithm implemented in VTK python. The first of the two smaller datasets comprises 26 hominoid medial cuneiform meshes isolated from laser surface scans obtained from the same museum collections listed above. The second dataset comprises 33 mouse humeri meshes from micro-CT data (34.5 Όm resolution using a Skyscan 1172). These datasets were processed identically to the 102 hominoid cuboid meshes introduced above.
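    The remeshing step described above (recursive subdivision followed by quadric decimation to a common vertex count) can be sketched with VTK's Python bindings roughly as follows. This is a minimal illustration, not the authors' script: the 12,000-vertex target comes from the text, while the input file name and the exact filter settings are assumptions.

    import vtk

    # Read one surface scan (file name hypothetical; the dataset stores PyTorch Geometric data, not raw PLY files).
    reader = vtk.vtkPLYReader()
    reader.SetFileName("cuboid_scan.ply")
    reader.Update()
    mesh = reader.GetOutput()

    # Upsample low-resolution meshes by recursive (linear) subdivision.
    if mesh.GetNumberOfPoints() < 12000:
        subdiv = vtk.vtkLinearSubdivisionFilter()
        subdiv.SetInputData(mesh)
        subdiv.SetNumberOfSubdivisions(2)  # each pass roughly quadruples the triangle count
        subdiv.Update()
        mesh = subdiv.GetOutput()

    # Decimate back down toward ~12,000 vertices with quadric decimation.
    target = 12000
    reduction = max(0.0, 1.0 - target / mesh.GetNumberOfPoints())
    decimate = vtk.vtkQuadricDecimation()
    decimate.SetInputData(mesh)
    decimate.SetTargetReduction(reduction)  # reduction is triangle-based, so the vertex count is approximate
    decimate.Update()
    print("final vertex count:", decimate.GetOutput().GetNumberOfPoints())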

  2. Data from: DIPSEER: A Dataset for In-Person Student Emotion and Engagement...

    • observatorio-cientifico.ua.es
    • scidb.cn
    Updated 2025
    Cite
    Márquez-Carpintero, Luis; Suescun-Ferrandiz, Sergio; Álvarez, Carolina Lorenzo; Fernandez-Herrero, Jorge; Viejo, Diego; Rosabel Roig-Vila; Cazorla, Miguel (2025). DIPSEER: A Dataset for In-Person Student Emotion and Engagement Recognition in the Wild [Dataset]. https://observatorio-cientifico.ua.es/documentos/67321d21aea56d4af0484172
    Explore at:
    Dataset updated
    2025
    Authors
    Márquez-Carpintero, Luis; Suescun-Ferrandiz, Sergio; Álvarez, Carolina Lorenzo; Fernandez-Herrero, Jorge; Viejo, Diego; Rosabel Roig-Vila; Cazorla, Miguel
    Description

    Data Description

    The DIPSER dataset is designed to assess student attention and emotion in in-person classroom settings, consisting of RGB camera data, smartwatch sensor data, and labeled attention and emotion metrics. It includes multiple camera angles per student to capture posture and facial expressions, complemented by smartwatch data for inertial and biometric metrics. Attention and emotion labels are derived from self-reports and expert evaluations. The dataset includes diverse demographic groups, with data collected in real-world classroom environments, facilitating the training of machine learning models for predicting attention and correlating it with emotional states.

    Data Collection and Generation Procedures

    The dataset was collected in a natural classroom environment at the University of Alicante, Spain. The recording setup consisted of six general cameras positioned to capture the overall classroom context and individual cameras placed at each student's desk. Additionally, smartwatches were used to collect biometric data, such as heart rate, accelerometer, and gyroscope readings.

    Experimental Sessions

    Nine distinct educational activities were designed to ensure a comprehensive range of engagement scenarios:

    • News Reading – Students read projected or device-displayed news.
    • Brainstorming Session – Idea generation for problem-solving.
    • Lecture – Passive listening to an instructor-led session.
    • Information Organization – Synthesizing information from different sources.
    • Lecture Test – Assessment of lecture content via mobile devices.
    • Individual Presentations – Students present their projects.
    • Knowledge Test – Conducted using Kahoot.
    • Robotics Experimentation – Hands-on session with robotics.
    • MTINY Activity Design – Development of educational activities with computational thinking.

    Technical Specifications

    • RGB Cameras: Individual cameras recorded at 640×480 pixels, while context cameras captured at 1280×720 pixels.
    • Frame Rate: 9-10 FPS depending on the setup.
    • Smartwatch Sensors: Collected heart rate, accelerometer, gyroscope, rotation vector, and light sensor data at a frequency of 1–100 Hz.

    Data Organization and Formats

    The dataset follows a structured directory format:

    /groupX/experimentY/subjectZ.zip

    Each subject-specific folder contains:

    • images/ (individual facial images)
    • watch_sensors/ (sensor readings in JSON format)
    • labels/ (engagement & emotion annotations)
    • metadata/ (subject demographics & session details)

    Annotations and Labeling

    Each data entry includes engagement levels (1-5) and emotional states (9 categories) based on both self-reported labels and evaluations by four independent experts. A custom annotation tool was developed to ensure consistency across evaluations.

    Missing Data and Data Quality

    • Synchronization: A centralized server ensured time alignment across devices. Brightness changes were used to verify synchronization.
    • Completeness: No major missing data, except for occasional random frame drops due to embedded device performance.
    • Data Consistency: Uniform collection methodology across sessions, ensuring high reliability.

    Data Processing Methods

    To enhance usability, the dataset includes preprocessed bounding boxes for face, body, and hands, along with gaze estimation and head pose annotations. These were generated using YOLO, MediaPipe, and DeepFace.

    File Formats and Accessibility

    • Images: Stored in standard JPEG format.
    • Sensor Data: Provided as structured JSON files.
    • Labels: Available as CSV files with timestamps.

    The dataset is publicly available under the CC-BY license and can be accessed along with the necessary processing scripts via the DIPSER GitHub repository.

    Potential Errors and Limitations

    • Due to camera angles, some student movements may be out of frame in collaborative sessions.
    • Lighting conditions vary slightly across experiments.
    • Sensor latency variations are minimal but exist due to embedded device constraints.

    Citation

    If you find this project helpful for your research, please cite our work using the following bibtex entry:

    @misc{marquezcarpintero2025dipserdatasetinpersonstudent1,
      title={DIPSER: A Dataset for In-Person Student Engagement Recognition in the Wild},
      author={Luis Marquez-Carpintero and Sergio Suescun-Ferrandiz and Carolina Lorenzo Álvarez and Jorge Fernandez-Herrero and Diego Viejo and Rosabel Roig-Vila and Miguel Cazorla},
      year={2025},
      eprint={2502.20209},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.20209},
    }

    Usage and Reproducibility

    Researchers can utilize standard tools like OpenCV, TensorFlow, and PyTorch for analysis. The dataset supports research in machine learning, affective computing, and education analytics, offering a unique resource for engagement and attention studies in real-world classroom environments.
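    Given the directory layout and file formats above, a minimal loading sketch might look like the following. The concrete group/experiment/subject names and the label file name are hypothetical, so paths will need adjusting to the actual archive contents.

    import json
    import zipfile
    from pathlib import Path

    import pandas as pd

    # Hypothetical subject archive following /groupX/experimentY/subjectZ.zip
    archive = Path("group1/experiment3/subject07.zip")
    root = archive.with_suffix("")  # extraction target
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(root)

    frames = sorted((root / "images").glob("*.jpg"))  # individual facial images (JPEG)
    sensors = [json.loads(p.read_text()) for p in (root / "watch_sensors").glob("*.json")]  # smartwatch readings
    labels = pd.read_csv(next((root / "labels").glob("*.csv")))  # engagement & emotion annotations (file name assumed)

    print(len(frames), "frames,", len(sensors), "sensor files,", len(labels), "label rows")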

  3. Data from: Exploring deep learning models for 4D-STEM-DPC data processing

    • data.niaid.nih.gov
    Updated Oct 7, 2024
    + more versions
    Cite
    Dagenborg, Sivert (2024). Exploring deep learning models for 4D-STEM-DPC data processing [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10890767
    Explore at:
    Dataset updated
    Oct 7, 2024
    Dataset provided by
    SÞrhaug, JÞrgen
    Nordahl, Gregory
    Nord, Magnus
    Dagenborg, Sivert
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains scanning transmission electron microscopy data and processing files used in the journal publication "Exploring deep learning models for 4D-STEM-DPC data processing". DOI: 10.1016/j.ultramic.2024.114058

    Prerequisites

    The scripts presented below require certain open-source Python packages to run. Library versions used to run the scripts are:

    hyperspy 1.7.1

    pyxem 0.14.2

    fpd 0.2.5

    pytorch 1.12.1 (cudatoolkit 11.6.0)

    jupyterlab 4.0.7

    Data files

    Three zipped folders are included. Two of them contain the training and inference data for the neural networks, aptly named training_data.zip and inference_data.zip. PyTorch state dictionaries for trained models are included in the models.zip folder.
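    As a minimal sketch of how such state dictionaries are typically restored (the file and model class names here are placeholders, not the repository's actual ones; the accompanying notebooks define the real architectures):

    import torch

    # Placeholder path: substitute an actual file extracted from models.zip.
    state_dict = torch.load("models/segmentation_model.pt", map_location="cpu")
    # model = MySegmentationNet()        # class defined in Segmentation.ipynb (name hypothetical)
    # model.load_state_dict(state_dict)
    # model.eval()
    print(sorted(state_dict.keys())[:5])  # inspect the stored parameter tensors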

    Processing scripts

    All scripts are included in IPython notebook format (.ipynb extension). The notebooks Segmentation.ipynb and Regression.ipynb contain the code for training and inference of the segmentation and regression models, respectively. The Training_data_creation.ipynb notebook contains the code to preprocess the training data for both neural network models. The Standard_algorithms.ipynb notebook contains the code for the center-of-mass and edge-filtering/disc-detection algorithms for STEM-DPC processing.

  4. pytorch_image_models

    • kaggle.com
    Updated Sep 29, 2025
    Cite
    The citation is currently not available for this dataset.
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 29, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    HyeongChan Kim
    Description

    PyTorch Image Models

    Sponsors

    A big thank you to my GitHub Sponsors for their support!

    In addition to the sponsors at the link above, I've received hardware and/or cloud resources from:

    • Nvidia (https://www.nvidia.com/en-us/)
    • TFRC (https://www.tensorflow.org/tfrc)

    I'm fortunate to be able to dedicate significant time and money of my own supporting this and other open source projects. However, as the projects increase in scope, outside support is needed to continue with the current trajectory of hardware, infrastructure, and electricity costs.

    What's New

    Aug 18, 2021

    • Optimizer bonanza!
      • Add LAMB and LARS optimizers, incl trust ratio clipping options. Tweaked to work properly in PyTorch XLA (tested on TPUs w/ timm bits branch)
      • Add MADGRAD from FB research w/ a few tweaks (decoupled decay option, step handling that works with PyTorch XLA)
      • Some cleanup on all optimizers and factory. No more .data, a bit more consistency, unit tests for all!
      • SGDP and AdamP still won't work with PyTorch XLA but others should (have yet to test Adabelief, Adafactor, Adahessian myself).
    • EfficientNet-V2 XL TF ported weights added, but they don't validate well in PyTorch (L is better). The pre-processing for the V2 TF training is a bit diff and the fine-tuned 21k -> 1k weights are very sensitive and less robust than the 1k weights.
    • Added PyTorch trained EfficientNet-V2 'Tiny' w/ GlobalContext attn weights. Only .1-.2 top-1 better than the SE so more of a curiosity for those interested.

    July 12, 2021

    July 5-9, 2021

    • Add efficientnetv2_rw_t weights, a custom 'tiny' 13.6M param variant that is a bit better than (non NoisyStudent) B3 models. Both faster and better accuracy (at same or lower res)
      • top-1 82.34 @ 288x288 and 82.54 @ 320x320
    • Add SAM pretrained in1k weight for ViT B/16 (vit_base_patch16_sam_224) and B/32 (vit_base_patch32_sam_224) models.
    • Add 'Aggregating Nested Transformer' (NesT) w/ weights converted from official Flax impl. Contributed by Alexander Soare.
      • jx_nest_base - 83.534, jx_nest_small - 83.120, jx_nest_tiny - 81.426

    June 23, 2021

    • Reproduce gMLP model training, gmlp_s16_224 trained to 79.6 top-1, matching paper. Hparams for this and other recent MLP training here

    June 20, 2021

    • Release Vision Transformer 'AugReg' weights from How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
      • .npz weight loading support added, can load any of the 50K+ weights from the AugReg series
      • See example notebook from official impl for navigating the augreg weights
      • Replaced all default weights w/ best AugReg variant (if possible). All AugReg 21k classifiers work.
      • Highlights: vit_large_patch16_384 (87.1 top-1), vit_large_r50_s32_384 (86.2 top-1), vit_base_patch16_384 (86.0 top-1)
      • vit_deit_* renamed to just deit_*
      • Remove my old small model, replace with DeiT compatible small w/ AugReg weights
    • Add 1st training of my gmixer_24_224 MLP /w GLU, 78.1 top-1 w/ 25M params.
    • Add weights from official ResMLP release (https://github.com/facebookresearch/deit)
    • Add eca_nfnet_l2 weights from my 'lightweight' series. 84.7 top-1 at 384x384.
    • Add distilled BiT 50x1 student and 152x2 Teacher weights from Knowledge distillation: A good teacher is patient and consistent
    • NFNets and ResNetV2-BiT models work w/ Pytorch XLA now
      • weight standardization uses F.batch_norm instead of std_mean (std_mean wasn't lowered)
      • eps values adjusted, will be slight differences but should be quite close
    • Improve test coverage and classifier interface of non-conv (vision transformer and mlp) models ...
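    Since this dataset mirrors the timm code and weights, a typical usage sketch looks like the following. It uses the standard timm API with one of the model names mentioned above; the 288×288 resolution is the one quoted for that variant.

    import timm
    import torch

    # Create one of the models mentioned above with its pretrained weights.
    model = timm.create_model("efficientnetv2_rw_t", pretrained=True)
    model.eval()

    # Dummy forward pass at the 288x288 resolution quoted for this variant.
    x = torch.randn(1, 3, 288, 288)
    with torch.no_grad():
        logits = model(x)
    print(logits.shape)  # torch.Size([1, 1000])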
  5. Machine Learning Framework Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jul 21, 2025
    Cite
    Data Insights Market (2025). Machine Learning Framework Report [Dataset]. https://www.datainsightsmarket.com/reports/machine-learning-framework-1989715
    Explore at:
    doc, ppt, pdf (available download formats)
    Dataset updated
    Jul 21, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Machine Learning Framework market is experiencing robust growth, driven by the increasing adoption of AI and machine learning across diverse industries. The market's expansion is fueled by several key factors, including the rising availability of large datasets, advancements in computing power (especially cloud computing), and the growing demand for automated decision-making and predictive analytics. Organizations across sectors, from healthcare and finance to manufacturing and retail, are leveraging machine learning frameworks to improve operational efficiency, enhance customer experiences, and gain a competitive edge. The increasing complexity of data analysis tasks and the need for specialized tools are also contributing to market growth. We estimate the 2025 market size to be around $15 billion, considering the significant investments made by major technology players and the accelerating adoption rates. A conservative Compound Annual Growth Rate (CAGR) of 20% is projected for the forecast period (2025-2033), indicating a substantial increase in market value over the next decade. However, several challenges persist. The high cost of implementation and maintenance of machine learning solutions, particularly for smaller businesses, remains a significant restraint. Moreover, the lack of skilled professionals proficient in machine learning and data science creates a talent gap that hinders broader adoption. Security and ethical concerns related to AI and the potential for bias in algorithms also present obstacles to the market's continued expansion. Despite these challenges, the long-term outlook for the machine learning framework market remains positive, with continuous innovation and the development of more user-friendly tools expected to drive further growth. The segmentation of the market encompasses open-source and commercial frameworks, cloud-based and on-premise solutions, and industry-specific applications. The competitive landscape is dominated by established technology giants like Google, Microsoft, Amazon, and IBM, alongside a growing number of specialized providers and open-source contributors. This competitive dynamic fosters innovation and accessibility.

  6. neural network Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Aug 4, 2025
    + more versions
    Cite
    Data Insights Market (2025). neural network Report [Dataset]. https://www.datainsightsmarket.com/reports/neural-network-1493802
    Explore at:
    ppt, doc, pdf (available download formats)
    Dataset updated
    Aug 4, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    CA
    Variables measured
    Market Size
    Description

    The global neural network market is experiencing robust growth, driven by the increasing adoption of artificial intelligence (AI) across diverse sectors. The market's expansion is fueled by several key factors, including the rising availability of large datasets, advancements in computing power (particularly GPUs from NVIDIA and cloud services from AWS), and the development of more sophisticated neural network architectures capable of handling complex tasks. The period from 2019 to 2024 witnessed significant progress, laying the groundwork for accelerated growth in the coming years. While precise figures are unavailable, considering the average Compound Annual Growth Rate (CAGR) of similar AI segments, a reasonable estimate for the 2025 market size would be around $15 billion, with a projected CAGR of 25% from 2025 to 2033. This growth is further supported by the continuous innovation in deep learning techniques and their successful implementation in various applications, ranging from image and speech recognition to natural language processing and autonomous vehicles. Key players like Google (via TensorFlow), Microsoft, IBM, and Amazon (AWS) are heavily investing in research and development, expanding their neural network offerings, and driving widespread adoption. The market segmentation is likely diverse, encompassing various types of neural networks (convolutional, recurrent, etc.), deployment models (cloud, on-premise), and industry verticals (healthcare, finance, automotive, etc.). While specific segment breakdowns are not provided, the projected growth indicates strong potential across all these areas. Restraining factors include the high computational costs associated with training complex neural networks, the need for specialized expertise, and concerns around data privacy and security. However, these challenges are being addressed through advancements in hardware, software, and data management techniques, paving the way for continued market expansion. The forecast period of 2025-2033 promises significant opportunities for companies offering neural network solutions, with substantial growth anticipated across all regions, particularly in North America and Asia-Pacific regions known for their technological advancements and substantial investments in AI research.

  7. Data from: WaveFake: A data set to facilitate audio DeepFake detection

    • zenodo.org
    bin, zip
    Updated Nov 3, 2021
    + more versions
    Cite
    Joel Frank; Lea Schönherr (2021). WaveFake: A data set to facilitate audio DeepFake detection [Dataset]. http://doi.org/10.5281/zenodo.4904579
    Explore at:
    bin, zip (available download formats)
    Dataset updated
    Nov 3, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Joel Frank; Lea Schönherr
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The main purpose of this data set is to facilitate research into audio DeepFakes.
    These generated media files have been increasingly used to commit impersonation attempts or online harassment.

    The data set consists of 88,600 generated audio clips (16-bit PCM wav).
    All of these samples were generated by four different neural network architectures:

    Additionally, we examined a bigger version of MelGAN and investigated a variant of Multi-Band MelGAN that computes its auxiliary loss over the full audio instead of its subbands.

    Collection Process

    For WaveGlow, we utilize the official implementation (commit 8afb643) in conjunction with the official pre-trained network on PyTorch Hub.
    We use a popular implementation available on GitHub (commit 12c677e) for the remaining networks.
    The repository also offers pre-trained models.
    We used the pre-trained networks to generate samples that are similar to their respective training distributions, LJ Speech and JSUT.
    When sampling the data set, we first extract Mel spectrograms from the original audio files, using the pre-processing scripts of the corresponding repositories.
    We then feed these Mel spectrograms to the respective models to obtain the data set.
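    As a hedged sketch of how one of the generated 16-bit PCM clips might be loaded and turned into a log-Mel front-end for a detection experiment (the file name and Mel parameters are assumptions, and this is not the repository's own pre-processing):

    import torch
    import torchaudio

    # Hypothetical clip from the dataset (16-bit PCM wav)
    waveform, sample_rate = torchaudio.load("ljspeech_melgan/LJ001-0001_gen.wav")

    # Log-Mel front-end commonly used for audio DeepFake detectors (parameter choices assumed)
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sample_rate,
        n_fft=1024,
        hop_length=256,
        n_mels=80,
    )(waveform)
    log_mel = torch.log(mel + 1e-6)
    print(log_mel.shape)  # (channels, n_mels, frames)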

    This data set is licensed with a CC-BY-SA 4.0 license.

    This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy -- EXC-2092 CaSa -- 390781972.

  8. Overview of deep learning terminology.

    • plos.figshare.com
    xls
    Updated Dec 5, 2024
    Cite
    Aaron E. Maxwell; Sarah Farhadpour; Srinjoy Das; Yalin Yang (2024). Overview of deep learning terminology. [Dataset]. http://doi.org/10.1371/journal.pone.0315127.t001
    Explore at:
    xls (available download formats)
    Dataset updated
    Dec 5, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Aaron E. Maxwell; Sarah Farhadpour; Srinjoy Das; Yalin Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Convolutional neural network (CNN)-based deep learning (DL) methods have transformed the analysis of geospatial, Earth observation, and geophysical data due to their ability to model spatial context information at multiple scales. Such methods are especially applicable to pixel-level classification or semantic segmentation tasks. A variety of R packages have been developed for processing and analyzing geospatial data. However, there are currently no packages available for implementing geospatial DL in the R language and data science environment. This paper introduces the geodl R package, which supports pixel-level classification applied to a wide range of geospatial or Earth science data that can be represented as multidimensional arrays where each channel or band holds a predictor variable. geodl is built on the torch package, which supports the implementation of DL using the R and C++ languages without the need for installing a Python/PyTorch environment. This greatly simplifies the software environment needed to implement DL in R. Using geodl, geospatial raster-based data with varying numbers of bands, spatial resolutions, and coordinate reference systems are read and processed using the terra package, which makes use of C++ and allows for processing raster grids that are too large to fit into memory. Training loops are implemented with the luz package. The geodl package provides utility functions for creating raster masks or labels from vector-based geospatial data and image chips and associated masks from larger files and extents. It also defines a torch dataset subclass for geospatial data for use with torch dataloaders. UNet-based models are provided with a variety of optional ancillary modules or modifications. Common assessment metrics (i.e., overall accuracy, class-level recalls or producer's accuracies, class-level precisions or user's accuracies, and class-level F1-scores) are implemented along with a modified version of the unified focal loss framework, which allows for defining a variety of loss metrics using one consistent implementation and set of hyperparameters. Users can assess models using standard geospatial and remote sensing metrics and methods and use trained models to predict to large spatial extents. This paper introduces the geodl workflow, design philosophy, and goals for future development.
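    The class-level metrics listed above (precision/user's accuracy, recall/producer's accuracy, F1) are the standard confusion-matrix quantities; as a language-agnostic illustration (plain Python/NumPy, not geodl's R API), they can be computed like this:

    import numpy as np

    # Toy 3-class confusion matrix: rows = reference (true) class, columns = predicted class.
    cm = np.array([
        [50,  2,  3],
        [ 4, 40,  6],
        [ 1,  5, 44],
    ])

    overall_accuracy = np.trace(cm) / cm.sum()
    recall = np.diag(cm) / cm.sum(axis=1)      # producer's accuracy per class
    precision = np.diag(cm) / cm.sum(axis=0)   # user's accuracy per class
    f1 = 2 * precision * recall / (precision + recall)

    print(overall_accuracy, recall, precision, f1)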

  9. Deep Learning System Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jul 3, 2025
    + more versions
    Cite
    Data Insights Market (2025). Deep Learning System Software Report [Dataset]. https://www.datainsightsmarket.com/reports/deep-learning-system-software-1444412
    Explore at:
    pdf, ppt, doc (available download formats)
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Deep Learning System Software market is experiencing robust growth, driven by the increasing adoption of AI across various industries. The market's expansion is fueled by the need for efficient and scalable solutions to handle the massive datasets required for training sophisticated deep learning models. Key factors contributing to this growth include the proliferation of cloud computing services offering readily accessible deep learning platforms, the development of more powerful and energy-efficient hardware (GPUs and specialized AI chips), and the rising demand for automated decision-making systems in sectors like healthcare, finance, and manufacturing. The market is segmented by software type (e.g., frameworks, libraries, tools), deployment model (cloud, on-premise), and industry vertical. Leading players like Microsoft, Nvidia, Google (Alphabet), and Intel are actively investing in R&D and strategic acquisitions to strengthen their market positions. Competition is intense, with companies focusing on providing specialized solutions tailored to specific industry needs and improving the ease of use and accessibility of their software. While challenges remain, such as the need for skilled data scientists and the ethical considerations surrounding AI deployment, the overall market outlook remains positive, projecting significant expansion over the forecast period. Despite the positive outlook, several restraints could potentially hinder market growth. These include the high cost of implementation, the complexity of deep learning systems requiring specialized expertise, and concerns regarding data security and privacy. The need for continuous updates and maintenance to keep pace with technological advancements also presents a challenge. However, ongoing research and development in areas such as automated machine learning (AutoML) and edge AI are expected to mitigate some of these challenges. The market is likely to witness increased consolidation as larger players acquire smaller companies with specialized technologies. Furthermore, the growing importance of data annotation and model explainability will create new market opportunities for specialized service providers. The future of the Deep Learning System Software market is characterized by innovation, competition, and the ongoing need to address ethical and practical concerns. We expect the market to demonstrate a steady and considerable increase in value throughout the forecast period.

  10. Data Science And Ml Platforms Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Data Science And Ml Platforms Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/data-science-and-ml-platforms-market
    Explore at:
    csv, pptx, pdf (available download formats)
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Science And ML Platforms Market Outlook



    The global market size for Data Science and ML Platforms was estimated to be approximately USD 78.9 billion in 2023, and it is projected to reach around USD 307.6 billion by 2032, growing at a Compound Annual Growth Rate (CAGR) of 16.4% during the forecast period. This remarkable growth can be largely attributed to the increasing adoption of artificial intelligence (AI) and machine learning (ML) across various industries to enhance operational efficiency, predictive analytics, and decision-making processes.



    The surge in big data and the necessity to make sense of unstructured data is a substantial growth driver for the Data Science and ML Platforms market. Organizations are increasingly leveraging data science and machine learning to gain insights that can help them stay competitive. This is especially true in sectors like retail and e-commerce where customer behavior analytics can lead to more targeted marketing strategies, personalized shopping experiences, and improved customer retention rates. Additionally, the proliferation of IoT devices is generating massive amounts of data, which further fuels the need for advanced data analytics platforms.



    Another significant growth factor is the increasing adoption of cloud-based solutions. Cloud platforms offer scalable resources, flexibility, and substantial cost savings, making them attractive for enterprises of all sizes. Cloud-based data science and machine learning platforms also facilitate collaboration among distributed teams, enabling more efficient workflows and faster time-to-market for new products and services. Furthermore, advancements in cloud technologies, such as serverless computing and containerization, are making it easier for organizations to deploy and manage their data science models.



    Investment in AI and ML by key industry players also plays a crucial role in market growth. Tech giants like Google, Amazon, Microsoft, and IBM are making substantial investments in developing advanced AI and ML tools and platforms. These investments are not only driving innovation but also making these technologies more accessible to smaller enterprises. Additionally, mergers and acquisitions in this space are leading to more integrated and comprehensive solutions, which are further accelerating market growth.



    Machine Learning Tools are at the heart of this technological evolution, providing the necessary frameworks and libraries that empower developers and data scientists to create sophisticated models and algorithms. These tools, such as TensorFlow, PyTorch, and Scikit-learn, offer a range of functionalities from data preprocessing to model deployment, catering to both beginners and experts. The accessibility and versatility of these tools have democratized machine learning, enabling a wider audience to harness the power of AI. As organizations continue to embrace digital transformation, the demand for robust machine learning tools is expected to grow, driving further innovation and development in this space.



    From a regional perspective, North America is expected to hold the largest market share due to the early adoption of advanced technologies and the presence of major market players. However, the Asia Pacific region is anticipated to exhibit the highest growth rate during the forecast period. This is driven by increasing investments in AI and ML, a burgeoning start-up ecosystem, and supportive government policies aimed at digital transformation. Countries like China, India, and Japan are at the forefront of this growth, making significant strides in AI research and application.



    Component Analysis



    When analyzing the Data Science and ML Platforms market by component, it's essential to differentiate between software and services. The software segment includes platforms and tools designed for data ingestion, processing, visualization, and model building. These software solutions are crucial for organizations looking to harness the power of big data and machine learning. They provide the necessary infrastructure for data scientists to develop, test, and deploy ML models. The software segment is expected to grow significantly due to ongoing advancements in AI algorithms and the increasing need for more sophisticated data analysis tools.



    The services segment in the Data Science and ML Platforms market encompasses consulting, system integration, and support services. Consulting services help organizatio

  11. Sentence/Table Pair Data from Wikipedia for Pre-training with...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Oct 29, 2021
    Cite
    Xiang Deng; Yu Su; Alyssa Lees; You Wu; Cong Yu; Huan Sun (2021). Sentence/Table Pair Data from Wikipedia for Pre-training with Distant-Supervision [Dataset]. http://doi.org/10.5281/zenodo.5612316
    Explore at:
    application/gzip (available download formats)
    Dataset updated
    Oct 29, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Xiang Deng; Yu Su; Alyssa Lees; You Wu; Cong Yu; Huan Sun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the dataset used for pre-training in "ReasonBERT: Pre-trained to Reason with Distant Supervision", EMNLP'21.

    There are two files:

    sentence_pairs_for_pretrain_no_tokenization.tar.gz -> Contains only sentences as evidence, Text-only

    table_pairs_for_pretrain_no_tokenization.tar.gz -> At least one piece of evidence is a table, Hybrid

    The data is chunked into multiple tar files for easy loading. We use WebDataset, a PyTorch Dataset (IterableDataset) implementation providing efficient sequential/streaming data access.

    For pre-training code, or if you have any questions, please check our GitHub repo https://github.com/sunlab-osu/ReasonBERT

    Below is a sample code snippet to load the data

    import webdataset as wds
    
    # path to the uncompressed files, should be a directory with a set of tar files
    url = './sentence_multi_pairs_for_pretrain_no_tokenization/{000000...000763}.tar'
    dataset = (
      wds.Dataset(url)
      .shuffle(1000) # cache 1000 samples and shuffle
      .decode()
      .to_tuple("json")
      .batched(20) # group every 20 examples into a batch
    )
    
    # Please see the documentation for WebDataset for more details about how to use it as dataloader for Pytorch
    # You can also iterate through all examples and dump them with your preferred data format

    Below we show how the data is organized with two examples.

    Text-only

    {'s1_text': 'Sils is a municipality in the comarca of Selva, in Catalonia, Spain.', # query sentence
     's1_all_links': {
      'Sils,_Girona': [[0, 4]],
      'municipality': [[10, 22]],
      'Comarques_of_Catalonia': [[30, 37]],
      'Selva': [[41, 46]],
      'Catalonia': [[51, 60]]
     }, # list of entities and their mentions in the sentence (start, end location)
     'pairs': [ # other sentences that share common entity pair with the query, group by shared entity pairs
      {
        'pair': ['Comarques_of_Catalonia', 'Selva'], # the common entity pair
        's1_pair_locs': [[[30, 37]], [[41, 46]]], # mention of the entity pair in the query
        's2s': [ # list of other sentences that contain the common entity pair, or evidence
         {
           'md5': '2777e32bddd6ec414f0bc7a0b7fea331',
           'text': 'Selva is a coastal comarque (county) in Catalonia, Spain, located between the mountain range known as the Serralada Transversal or Puigsacalm and the Costa Brava (part of the Mediterranean coast). Unusually, it is divided between the provinces of Girona and Barcelona, with Fogars de la Selva being part of Barcelona province and all other municipalities falling inside Girona province. Also unusually, its capital, Santa Coloma de Farners, is no longer among its larger municipalities, with the coastal towns of Blanes and Lloret de Mar having far surpassed it in size.',
           's_loc': [0, 27], # in addition to the sentence containing the common entity pair, we also keep its surrounding context. 's_loc' is the start/end location of the actual evidence sentence
           'pair_locs': [ # mentions of the entity pair in the evidence
            [[19, 27]], # mentions of entity 1
            [[0, 5], [288, 293]] # mentions of entity 2
           ],
           'all_links': {
            'Selva': [[0, 5], [288, 293]],
            'Comarques_of_Catalonia': [[19, 27]],
            'Catalonia': [[40, 49]]
           }
          }
        ,...] # there are multiple evidence sentences
       },
     ,...] # there are multiple entity pairs in the query
    }
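    Following the documented structure of a text-only example, a small sketch for pulling out the query sentence and its evidence sentences from one decoded example (continuing from the loader above; the helper name is ours, not part of the dataset) could look like:

    # 'example' is one decoded json object from the dataset above
    def iter_evidence(example):
        query = example['s1_text']
        for group in example['pairs']:
            entity_a, entity_b = group['pair']
            for ev in group['s2s']:
                start, end = ev['s_loc']  # span of the evidence sentence inside its context
                yield query, (entity_a, entity_b), ev['text'][start:end]

    # e.g. print the first few (query, entity pair, evidence) triples
    # for triple in iter_evidence(example):
    #     print(triple)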

    Hybrid

    {'s1_text': 'The 2006 Major League Baseball All-Star Game was the 77th playing of the midseason exhibition baseball game between the all-stars of the American League (AL) and National League (NL), the two leagues comprising Major League Baseball.',
     's1_all_links': {...}, # same as text-only
     'sentence_pairs': [{'pair': ..., 's1_pair_locs': ..., 's2s': [...]}], # same as text-only
     'table_pairs': [
      'tid': 'Major_League_Baseball-1',
      'text':[
        ['World Series Records', 'World Series Records', ...],
        ['Team', 'Number of Series won', ...],
        ['St. Louis Cardinals (NL)', '11', ...],
      ...] # table content, list of rows
      'index':[
        [[0, 0], [0, 1], ...],
        [[1, 0], [1, 1], ...],
      ...] # index of each cell [row_id, col_id]. we keep only a table snippet, but the index here is from the original table.
      'value_ranks':[
        [0, 0, ...],
        [0, 0, ...],
        [0, 10, ...],
      ...] # if the cell contain numeric value/date, this is its rank ordered from small to large, follow TAPAS
      'value_inv_ranks': [], # inverse rank
      'all_links':{
        'St._Louis_Cardinals': {
         '2': [
          [[2, 0], [0, 19]], # [[row_id, col_id], [start, end]]
         ] # list of mentions in the second row, the key is row_id
        },
        'CARDINAL:11': {'2': [[[2, 1], [0, 2]]], '8': [[[8, 3], [0, 2]]]},
      }
      'name': '', # table name, if exists
      'pairs': {
        'pair': ['American_League', 'National_League'],
        's1_pair_locs': [[[137, 152]], [[162, 177]]], # mention in the query
        'table_pair_locs': {
         '17': [ # mention of entity pair in row 17
           [
            [[17, 0], [3, 18]],
            [[17, 1], [3, 18]],
            [[17, 2], [3, 18]],
            [[17, 3], [3, 18]]
           ], # mention of the first entity
           [
            [[17, 0], [21, 36]],
            [[17, 1], [21, 36]],
           ] # mention of the second entity
         ]
        }
       }
     ]
    }

  12. SPIRE_EMA_CORPUS

    • huggingface.co
    Updated Jul 27, 2025
    + more versions
    Cite
    Sathvik Udupa (2025). SPIRE_EMA_CORPUS [Dataset]. https://huggingface.co/datasets/viks66/SPIRE_EMA_CORPUS
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 27, 2025
    Authors
    Sathvik Udupa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This corpus contains paired data of speech, articulatory movements, and phonemes. There are 38 speakers in the corpus, each with 460 utterances. The raw audio files are in audios.zip. The ema data and preprocessed data are stored in processed.zip. The processed data can be loaded with pytorch and has the following keys -

    ema_raw : The raw ema data

    ema_clipped : The ema data after trimming using begin-end time stamps

    ema_trimmed_and_normalised_with_6_articulators: The ema data after trimming 
 See the full description on the dataset page: https://huggingface.co/datasets/viks66/SPIRE_EMA_CORPUS.
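    A minimal sketch for inspecting the processed data with the keys listed above (the exact file name inside processed.zip is assumed; adjust to the actual archive contents):

    import torch

    # Hypothetical per-utterance file extracted from processed.zip
    data = torch.load("processed/speaker01_utt001.pt", map_location="cpu")

    print(data.keys())  # expect ema_raw, ema_clipped, ema_trimmed_and_normalised_with_6_articulators, ...
    ema = data["ema_trimmed_and_normalised_with_6_articulators"]
    print(getattr(ema, "shape", None))  # articulatory trajectory; shape depends on the corpus layout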

  13. Data from: Data and scripts from: "Denoising autoencoder for reconstructing...

    • osti.gov
    Updated Jan 1, 2025
    Cite
    Bi, Xiangyu; Chou, Chunwei; Johnsen, Timothy; Ramakrishnan, Lavanya; Skone, Jonathan; Varadharajan, Charuleka; Wu, Yuxin (2025). Data and scripts from: "Denoising autoencoder for reconstructing sensor observation data and predicting evapotranspiration: noisy and missing values repair and uncertainty quantification" [Dataset]. https://www.osti.gov/dataexplorer/biblio/dataset/2561511
    Explore at:
    Dataset updated
    Jan 1, 2025
    Dataset provided by
    United States Department of Energy (http://energy.gov/)
    Watershed Function SFA
    Authors
    Bi, Xiangyu; Chou, Chunwei; Johnsen, Timothy; Ramakrishnan, Lavanya; Skone, Jonathan; Varadharajan, Charuleka; Wu, Yuxin
    Description

    This data package includes data and scripts from the manuscript "Denoising autoencoder for reconstructing sensor observation data and predicting evapotranspiration: noisy and missing values repair and uncertainty quantification".

    The study addressed common challenges faced in environmental sensing and modeling, including uncertain input data, missing sensor observations, and high-dimensional datasets with interrelated but redundant variables. Point-scaled meteorological and soil sensor observations were perturbed with noises and missing values, and denoising autoencoder (DAE) neural networks were developed to reconstruct the perturbed data and further predict evapotranspiration. This study concluded that (1) the reconstruction quality of each variable depends on its cross-correlation and alignment to the underlying data structure, (2) uncertainties from the models were overall stronger than those from the data corruption, and (3) there was a tradeoff between reducing bias and reducing variance when evaluating the uncertainty of the machine learning models.

    This package includes:

    (1) Four ipython scripts (.ipynb): "DAE_train.ipynb" trains and evaluates DAE neural networks, "DAE_predict.ipynb" makes predictions from the trained DAE models, "ET_train.ipynb" trains and evaluates ET prediction neural networks, and "ET_predict.ipynb" makes predictions from trained ET models.

    (2) One python file (.py): "methods.py" includes all user-defined functions and python code used in the ipython scripts.

    (3) A "sub_models" folder that includes five trained DAE neural networks (in pytorch format, .pt), which could be used to ingest input data before being fed to the downstream ET models in "ET_train.ipynb" or "ET_predict.ipynb".

    (4) Two data files (.csv). Daily meteorological, vegetation, and soil data is in "df_data.csv", where "df_meta.csv" contains the location and time information of "df_data.csv". Each row (index) in "df_meta.csv" corresponds to each row in "df_data.csv". These data files are formatted to follow the data structure requirements and be directly used in the ipython scripts, and they have been shuffled chronologically to train machine learning models. The meteorological and soil data was collected using point sensors between 2019-2023 at:

    (4.a) Three shrub-dominated field sites in East River, Colorado (named "ph1", "ph2" and "sg5" in "df_meta.csv", where "ph1" and "ph2" were located at PumpHouse Hillslopes, and "sg5" was at Snodgrass Mountain meadow) and

    (4.b) One outdoor, mesoscale, and herbaceous-dominated experiment in Berkeley, California (named "tb" in "df_meta.csv", short for Smartsoils Testbed at Lawrence Berkeley National Lab).

    See "df_data_dd.csv" and "df_meta_dd.csv" for variable descriptions and the Methods section for additional data processing steps. See "flmd.csv" and "README.txt" for brief file descriptions. All ipython scripts and python files are written in and require PYTHON language software.
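    A minimal sketch for loading the package's data files and one of the stored DAE models (folder and file names follow the description above, except the .pt file name, which is an assumption):

    import pandas as pd
    import torch

    # Daily observations and their location/time metadata (row-aligned, per the description)
    df_data = pd.read_csv("df_data.csv")
    df_meta = pd.read_csv("df_meta.csv")
    assert len(df_data) == len(df_meta)

    # One of the five trained DAE networks stored in PyTorch format (file name hypothetical)
    dae = torch.load("sub_models/dae_model_1.pt", map_location="cpu")
    print(df_data.shape, type(dae))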

  14. Dataset for class comment analysis

    • data.niaid.nih.gov
    Updated Feb 22, 2022
    Cite
    Pooja Rani (2022). Dataset for class comment analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4311838
    Explore at:
    Dataset updated
    Feb 22, 2022
    Dataset authored and provided by
    Pooja Rani
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A list of different projects selected to analyze class comments (available in the source code) of various languages such as Java, Python, and Pharo. The projects vary in terms of size, contributors, and domain.

    Structure

    Projects/
      Java_projects/
        eclipse.zip
        guava.zip
        guice.zip
        hadoop.zip
        spark.zip
        vaadin.zip
    
      Pharo_projects/
        images/
          GToolkit.zip
          Moose.zip
          PetitParser.zip
          Pillar.zip
          PolyMath.zip
          Roassal2.zip
          Seaside.zip
    
        vm/
          70-x64/Pharo
    
        Scripts/
          ClassCommentExtraction.st
          SampleSelectionScript.st    
    
      Python_projects/
        django.zip
        ipython.zip
        Mailpile.zip
        pandas.zip
        pipenv.zip
        pytorch.zip   
        requests.zip 
      
    

    Contents of the Replication Package

    Projects/ contains the raw projects of each language that are used to analyze class comments.

    • Java_projects/

      • eclipse.zip - Eclipse project downloaded from the GitHub. More detail about the project is available on GitHub Eclipse.
      • guava.zip - Guava project downloaded from the GitHub. More detail about the project is available on GitHub Guava.
      • guice.zip - Guice project downloaded from the GitHub. More detail about the project is available on GitHub Guice.
      • hadoop.zip - Apache Hadoop project downloaded from the GitHub. More detail about the project is available on GitHub Apache Hadoop.
      • spark.zip - Apache Spark project downloaded from the GitHub. More detail about the project is available on GitHub Apache Spark.
      • vaadin.zip - Vaadin project downloaded from the GitHub. More detail about the project is available on GitHub Vaadin.

    • Pharo_projects/

      • images/ -

        • GToolkit.zip - Gtoolkit project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Moose.zip - Moose project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • PetitParser.zip - Petit Parser project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Pillar.zip - Pillar project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • PolyMath.zip - PolyMath project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Roassal2.zip - Roassal2 project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Seaside.zip - Seaside project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
      • vm/ -

      • 70-x64/Pharo - Pharo7 (version 7 of Pharo) virtual machine to instantiate the Pharo images given in the images/ folder. The user can run the vm on macOS and select any of the Pharo image.

      • Scripts/ - It contains the sample Smalltalk scripts to extract class comments from various projects.

      • ClassCommentExtraction.st - A Smalltalk script to show how class comments are extracted from various Pharo projects. This script is already provided in the respective project image.

      • SampleSelectionScript.st - A Smalltalk script to show sample class comments of Pharo projects are selected. This script can be run in any of the Pharo images given in the images/ folder.

    • Python_projects/

      • django.zip - Django project downloaded from the GitHub. More detail about the project is available on GitHub Django
      • ipython.zip - IPython project downloaded from the GitHub. More detail about the project is available on GitHub on IPython
      • Mailpile.zip - Mailpile project downloaded from the GitHub. More detail about the project is available on GitHub on Mailpile
      • pandas.zip - pandas project downloaded from the GitHub. More detail about the project is available on GitHub on pandas
      • pipenv.zip - Pipenv project downloaded from the GitHub. More detail about the project is available on GitHub on Pipenv
      • pytorch.zip - PyTorch project downloaded from the GitHub. More detail about the project is available on GitHub on PyTorch
      • requests.zip - Requests project downloaded from the GitHub. More detail about the project is available on GitHub on Requests
  15. Virginia Tech Natural Motion Dataset

    • data.lib.vt.edu
    xlsx
    Updated Jun 3, 2021
    Cite
    Jack Geissinger; Alan Asbeck; Mohammad Mehdi Alemi; S. Emily Chang (2021). Virginia Tech Natural Motion Dataset [Dataset]. http://doi.org/10.7294/2v3w-sb92
    Explore at:
    xlsx (available download formats)
    Dataset updated
    Jun 3, 2021
    Dataset provided by
    University Libraries, Virginia Tech
    Authors
    Jack Geissinger; Alan Asbeck; Mohammad Mehdi Alemi; S. Emily Chang
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    The Virginia Tech Natural Motion Dataset contains 40 hours of unscripted human motion (full body kinematics) collected in the open world using an XSens MVN Link system. In total, there are data from 17 participants (13 participants on a college campus and 4 at a home improvement store). Participants did a wide variety of activities, including: walking from one place to another; operating machinery; talking with others; manipulating objects; working at a desk; driving; eating; pushing/pulling carts and dollies; physical exercises such as jumping jacks, jogging, and pushups; sweeping; vacuuming; and emptying a dishwasher. The code for analyzing the data is freely available with this dataset and also at: https://github.com/ARLab-VT/VT-Natural-Motion-Processing. The portion of the dataset involving workers was funded by Lowe's, Inc.

  16. Data of "Self-consistency Reinforced minimal Gated Recurrent Unit for...

    • data.europa.eu
    unknown
    Updated Jul 3, 2025
    Cite
    Zenodo (2025). Data of "Self-consistency Reinforced minimal Gated Recurrent Unit for surrogate modeling of history-dependent non-linear problems: application to history-dependent homogenized response of heterogeneous materials" [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-10551272?locale=ro
    Explore at:
    unknown (26347) (available download formats)
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

Development of the Self-Consistency reinforced Minimum Recurrent Unit (SC-MRU). This directory contains the data and algorithms generated in the publication [^1].

Table of Contents: Dependencies and Prerequisites; Structure of Repository; Part 1: Data preparation; Part 2: RNN training; Part 3: Multiscale analysis; Part 4: Reproduce paper [^1] figures.

Dependencies and Prerequisites

Python, pandas, matplotlib, texttable and latextable are prerequisites for visualizing and navigating the data. For generating meshes and for visualization, gmsh (www.gmsh.info) is required. For running simulations, cm3Libraries (http://www.ltas-cm3.ulg.ac.be/openSource.htm) is required.

Instructions using the apt & pip3 package managers (Debian/Ubuntu based workstations):

Python, pandas and dependencies: sudo apt install python3 python3-scipy libpython3-dev python3-numpy python3-pandas

matplotlib, texttable and latextable: pip3 install matplotlib texttable latextable

PyTorch (only for runs with cm3Libraries). Without GPU: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu. With GPU: pip3 install torch torchvision torchaudio

Libtorch (for compiling the cells). Without GPU, in a local directory (e.g. ~/local with export TORCHDIR=$HOME/local/libtorch): wget https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-2.1.1%2Bcpu.zip, then unzip libtorch-shared-with-deps-2.1.1%2Bcpu.zip. With GPU, in a local directory (e.g. ~/local with export TORCHDIR=$HOME/local/libtorch): wget https://download.pytorch.org/libtorch/cu121/libtorch-shared-with-deps-2.1.1%2Bcu121.zip, then unzip libtorch-shared-with-deps-2.1.1+cu121.zip

Structure of Repository

All_Path_Res: results of the direct numerical simulations used as training and testing data; see details in Part 1: Data preparation.

ConstRVE: scripts to run the direct numerical finite element simulations; see details in Part 1: Data preparation.

MultiScale: scripts to run and visualise the multiscale analyses; see details in Part 3: Multiscale analysis.

SC_MRU: implementation of the RNNs and scripts to train them; see details in Part 2: RNN training.

TrainingData: scripts to collect, normalise and truncate the RVE direct simulation results into training and testing data; see details in Part 1: Data preparation. This directory also contains the stored processed data used in [^1].

TrainingPaths: scripts to generate the different loading paths for the direct numerical simulations used as training and testing data; see details in Part 1: Data preparation.

Part 1: Data preparation

Generate the loading paths. TrainingPaths/testGenerationData.py is used to generate random walk paths, with the options Rmax = 0.11 (bound on the final Green Lagrange strain), TimeStep = 1. (in seconds), EvalStep = [1e-4,5e-3] (bounds on the Green Lagrange increments), Nmax = 2500 (maximum length of the sequence) and k = 4000 (number of paths to generate). The paths are stored by default in ConstRVE/Paths/; this directory must exist before launching the script. You can change the destination in line 123, saveDir = '../ConstRVE'+'/Paths/'. Examples of generated paths can be found in ConstRVE/PathsExamples/. The commands to be run from the directory TrainingPaths are (mkdir ../ConstRVE/Paths) #if needed, followed by python3 testGenerationData.py. An illustrative sketch of such a random-walk generator is given at the end of this description.

TrainingPaths/generationData_Cyclic.py is used to generate random cyclic paths, with the options Rmax = [np.random.uniform(0.,0.04),np.random.uniform(0.,0.06),np.random.uniform(0.0,0.09),0.12] (the bound on the final Green Lagrange strain is random), TimeStep = 1. (in seconds), EvalStep = [1e-4,5e-3] (bounds on the Green Lagrange increments), Nmax = 2500 (maximum length of the sequence) and k = 2000 (number of paths to generate). The paths are stored by default in ConstRVE/Paths/. You can change the destination in line 123, saveDir = '../ConstRVE'+'/Paths/'. The commands to be run from the directory TrainingPaths are (mkdir ../ConstRVE/Paths) #if needed, followed by python3 generationData_Cyclic.py.

TrainingPaths/countPathLength.py gives the average, minimum and maximum lengths of the generated paths and the distribution of the \Delta R. By default the paths are read from ConstRVE/Paths/, but the directory can be given as an argument. The script can read either the generated loading paths (python3 countPathLength.py '../ConstRVE/PathsExamples') or the results of the simulations (python3 countPathLength.py '../All_Path_Res/Path_Res9'). TrainingPaths/graphData.py picks paths at random from ConstRVE/Paths/ and generates illustrative png figures.

Generate the RVE direct simulation results. These simulations use the loading paths existing in ConstRVE/Paths/. ConstRVE/rve.geo is the RVE geometry file and ConstRVE/rve.msh is the RVE mesh file, both readable by gmsh (www.gmsh.info). ConstRVE/utilsFunc.py contains the python tools to be used. ConstRVE/Rve_withoutInternalVars.py is used to run all the RVE simulations; this requires cm3Libraries (http://www.ltas-cm3.ulg.ac.be/openSource.htm). All the outputs are st
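As announced above, here is a minimal, illustrative sketch of the kind of random-walk strain-path generator described for TrainingPaths/testGenerationData.py. It reuses the listed options (Rmax, EvalStep, Nmax, k), but the variable names, the tensor dimension and the stopping rule are assumptions, not the repository's actual implementation.

```python
# Hypothetical sketch of a random-walk loading-path generator in the spirit of
# TrainingPaths/testGenerationData.py; names and the stopping rule are assumptions.
import numpy as np

Rmax = 0.11              # bound on the final Green Lagrange strain (norm)
EvalStep = [1e-4, 5e-3]  # bounds on the Green Lagrange increments
Nmax = 2500              # maximum length of a sequence
k = 4000                 # number of paths to generate (only 5 are built below)

def random_walk_path(rng):
    """Generate one random walk of 3x3 symmetric Green Lagrange strain tensors."""
    E = np.zeros((3, 3))
    path = [E.copy()]
    for _ in range(Nmax):
        # random symmetric increment with a norm drawn inside EvalStep
        dE = rng.normal(size=(3, 3))
        dE = 0.5 * (dE + dE.T)
        dE *= rng.uniform(*EvalStep) / np.linalg.norm(dE)
        E = E + dE
        path.append(E.copy())
        if np.linalg.norm(E) >= Rmax:  # stop once the strain bound is reached
            break
    return np.stack(path)

rng = np.random.default_rng(0)
paths = [random_walk_path(rng) for _ in range(5)]
print(len(paths), paths[0].shape)
```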

  17. Personal Protective Equipment Dataset (PPED)

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 17, 2022
    Cite
    Anonymous (2022). Personal Protective Equipment Dataset (PPED) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6551757
    Explore at:
    Dataset updated
    May 17, 2022
    Dataset authored and provided by
    Anonymous
    License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Personal Protective Equipment Dataset (PPED)

This dataset serves as a benchmark for PPE (personal protective equipment) detection in chemical plants. We provide the dataset and experimental results.

    1. The dataset

We produced a dataset based on the actual needs and relevant regulations in chemical plants. The standard GB 39800.1-2020, formulated by the Ministry of Emergency Management of the People’s Republic of China, defines the protective requirements for plants and chemical laboratories. The complete dataset is contained in the folder PPED/data.

    1.1. Image collection

We took more than 3,300 pictures, varying the following characteristics: environment, distance, lighting conditions, angle, and the number of people photographed.

Backgrounds: There are four backgrounds: office, near machines, factory, and regular outdoor scenes.

Scale: By taking pictures from different distances, the captured PPE instances are classified into small, medium, and large scales.

    Light: Good lighting conditions and poor lighting conditions were studied.

    Diversity: Some images contain a single person, and some contain multiple people.

    Angle: The pictures we took can be divided into front and side.

In total, more than 3,300 raw photos were taken under all conditions. All images are located in the folder PPED/data/JPEGImages.

    1.2. Label

We use LabelImg as the labeling tool and annotate in the PASCAL-VOC format. YOLO uses the txt format, so trans_voc2yolo.py can be used to convert the XML files in PASCAL-VOC format to txt files, as sketched below. Annotations are stored in the folder PPED/data/Annotations.
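trans_voc2yolo.py ships with the referenced detection repositories; the snippet below is only a hedged illustration of the underlying conversion (class id plus box centre and size, normalised by the image width and height). The class list and file paths are placeholders, not the PPED label set.

```python
# Hypothetical illustration of the PASCAL-VOC -> YOLO txt conversion that
# trans_voc2yolo.py performs; paths and the class list are placeholders.
import xml.etree.ElementTree as ET

CLASSES = ["helmet", "gloves"]  # placeholder class names

def voc_to_yolo(xml_path, txt_path):
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls_id = CLASSES.index(obj.find("name").text)
        b = obj.find("bndbox")
        xmin, ymin = float(b.find("xmin").text), float(b.find("ymin").text)
        xmax, ymax = float(b.find("xmax").text), float(b.find("ymax").text)
        # YOLO format: class x_center y_center width height, all normalised to [0, 1]
        xc, yc = (xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h
        bw, bh = (xmax - xmin) / w, (ymax - ymin) / h
        lines.append(f"{cls_id} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    with open(txt_path, "w") as f:
        f.write("\n".join(lines))
```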

    1.3. Dataset Features

The pictures were taken by us under the different conditions mentioned above. The file PPED/data/feature.csv is a CSV file that records, for every image, its features: lighting conditions, angle, background, number of people, and scale.
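As an illustration, such a feature table can be filtered with pandas; the column names used below are hypothetical and may differ from those actually used in feature.csv.

```python
# Hypothetical example of querying PPED/data/feature.csv with pandas;
# the column names are assumptions about the CSV schema.
import pandas as pd

features = pd.read_csv("PPED/data/feature.csv")
# e.g. select poorly lit, multi-person images taken near machines
subset = features[(features["light"] == "poor")
                  & (features["people"] > 1)
                  & (features["background"] == "near machines")]
print(len(subset), "images match the query")
```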

    1.4. Dataset Division

The dataset is divided into a training set and a test set with a 9:1 ratio; a minimal sketch of such a split is shown below.
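The sketch below only illustrates the 9:1 ratio with a reproducible shuffle; the actual split used for the experiments is the one shipped with the dataset.

```python
# Minimal sketch of a reproducible 9:1 train/test split over the image list.
import random
from pathlib import Path

images = sorted(Path("PPED/data/JPEGImages").glob("*.jpg"))
random.Random(42).shuffle(images)          # fixed seed for reproducibility
cut = int(0.9 * len(images))
train, test = images[:cut], images[cut:]
print(f"{len(train)} training images, {len(test)} test images")
```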

2. Baseline Experiments

We provide baseline results with five models, namely Faster R-CNN (R), Faster R-CNN (M), SSD, YOLOv3-spp, and YOLOv5. All code and results are given in the folder PPED/experiment.

    2.1. Environment and Configuration:

    Intel Core i7-8700 CPU

    NVIDIA GTX1060 GPU

    16 GB of RAM

    Python: 3.8.10

    pytorch: 1.9.0

    pycocotools: pycocotools-win

    Windows 10

    2.2. Applied Models

The source code and results of the applied models are given in the folder PPED/experiment, with sub-folders corresponding to the model names.

    2.2.1. Faster R-CNN

    Faster R-CNN

    backbone: resnet50+fpn

We downloaded the pre-trained weights from https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth.

    We modified the dataset path, training classes and training parameters including batch size.

We run train_res50_fpn.py to start training.

Then, the model weights are trained on the training set.

    Finally, we validate the results on the test set.

    backbone: mobilenetv2

The same training method as for resnet50+fpn was applied, but its performance was not as good as resnet50+fpn, so it was not explored further.

The Faster R-CNN source code used in our experiment is given in the folder PPED/experiment/Faster R-CNN. The weights of the fully-trained Faster R-CNN (R) and Faster R-CNN (M) models are stored in the files PPED/experiment/trained_models/resNetFpn-model-19.pth and mobile-model.pth, respectively. The performance measurements of Faster R-CNN (R) and Faster R-CNN (M) are stored in the folders PPED/experiment/results/Faster RCNN(R) and Faster RCNN(M). A hedged sketch of the fine-tuning setup is shown below.
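For orientation, the sketch below shows an equivalent torchvision setup for fine-tuning Faster R-CNN with a ResNet-50+FPN backbone on a custom class set, in the spirit of train_res50_fpn.py from the referenced repository; the number of PPE classes, the optimizer settings, and the dummy batch are placeholders, not the values used in the experiments.

```python
# Hedged sketch of fine-tuning torchvision's Faster R-CNN (ResNet-50 + FPN) on a
# custom class set; the class count and hyperparameters are placeholders.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 1 + 4  # background + a placeholder number of PPE categories

# newer torchvision prefers weights="DEFAULT"; pretrained=True matches older versions
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=5e-4)

# one training step: in train mode the model returns a dict of losses
model.train()
images = [torch.rand(3, 600, 800, device=device)]          # dummy image batch
targets = [{"boxes": torch.tensor([[50., 60., 200., 220.]], device=device),
            "labels": torch.tensor([1], device=device)}]
loss_dict = model(images, targets)
loss = sum(loss_dict.values())
optimizer.zero_grad()
loss.backward()
optimizer.step()
```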

    2.2.2. SSD

    backbone: resnet50

We downloaded the pre-trained weights from https://download.pytorch.org/models/resnet50-19c8e357.pth.

    The same training method as Faster R-CNN is applied.

The SSD source code used in our experiment is given in the folder PPED/experiment/ssd. The weights of the fully-trained SSD model are stored in the file PPED/experiment/trained_models/SSD_19.pth. The performance measurements of SSD are stored in the folder PPED/experiment/results/SSD.

    2.2.3. YOLOv3-spp

    backbone: DarkNet53

    We modified the type information of the XML file to match our application.

    We run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.

    The weights used are: yolov3-spp-ultralytics-608.pt.

The YOLOv3-spp source code used in our experiment is given in the folder PPED/experiment/YOLOv3-spp. The weights of the fully-trained YOLOv3-spp model are stored in the file PPED/experiment/trained_models/YOLOvspp-19.pt. The performance measurements of YOLOv3-spp are stored in the folder PPED/experiment/results/YOLOv3-spp.

    2.2.4. YOLOv5

    backbone: CSP_DarkNet

    We modified the type information of the XML file to match our application.

    We run trans_voc2yolo.py to convert the XML file in VOC format to a txt file.

    The weights used are: yolov5s.

The YOLOv5 source code used in our experiment is given in the folder PPED/experiment/yolov5. The weights of the fully-trained YOLOv5 model are stored in the file PPED/experiment/trained_models/YOLOv5.pt. The performance measurements of YOLOv5 are stored in the folder PPED/experiment/results/YOLOv5.

    2.3. Evaluation

    The computed evaluation metrics as well as the code needed to compute them from our dataset are provided in the folder PPED/experiment/eval.
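The evaluation code shipped in PPED/experiment/eval is authoritative; as a generic reference, COCO-style detection metrics can be computed with pycocotools (listed in the environment above) as sketched here. The annotation and result file names are placeholders, not files from this repository.

```python
# Generic COCO-style mAP computation with pycocotools; file names are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("instances_test.json")          # ground-truth annotations
coco_dt = coco_gt.loadRes("detections.json")   # detector outputs in COCO result format

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP/AR, including mAP@[.5:.95] and mAP@0.5
```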

3. Code Sources

    Faster R-CNN (R and M)

    https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_object_detection/faster_rcnn

    official code: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/faster_rcnn.py

    SSD

    https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_object_detection/ssd

    official code: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/ssd.py

    YOLOv3-spp

    https://github.com/WZMIAOMIAO/deep-learning-for-image-processing/tree/master/pytorch_object_detection/yolov3-spp

    YOLOv5

    https://github.com/ultralytics/yolov5

  18. Using deep convolutional neural networks to forecast spatial patterns of...

    • datadryad.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Jul 21, 2022
    Cite
    James Ball; Katerina Petrova; David Coomes; Seth Flaxman (2022). Using deep convolutional neural networks to forecast spatial patterns of Amazonian deforestation: supporting data and outputs [Dataset]. http://doi.org/10.5061/dryad.hdr7sqvjz
    Explore at:
Available download formats: zip
    Dataset updated
    Jul 21, 2022
    Dataset provided by
    Dryad
    Authors
    James Ball; Katerina Petrova; David Coomes; Seth Flaxman
    Time period covered
    Jun 6, 2022
    Description

All input raster data are freely available online. Original input raster data from:

Global Forest Change - https://glad.earthengine.app/view/global-forest-change
ALOS JAXA - https://www.eorc.jaxa.jp/ALOS/en/dataset/aw3d30/aw3d30_e.htm

Processed with code at https://github.com/PatBall1/DeepForestcast. The dataset includes:

Input shapefiles for each study site.
Input geotiff files (.tif) for each study site.
Input PyTorch tensors (.pt) for each study site.
Model weights (.pt) for trained networks (for testing and forecasting).
Output deforestation forecasts for each study site as geotiffs (.tif).
A minimal sketch of loading the PyTorch artefacts is given below.
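This is a hedged example of inspecting the supplied .pt files; the file names below are placeholders, and whether the weights were saved as full models or as state_dicts depends on how the DeepForestcast code saved them.

```python
# Hedged sketch of inspecting the supplied PyTorch artefacts; file names are
# placeholders and the exact saving convention is an assumption.
import torch

# input tensors for a study site (saved with torch.save)
site_tensor = torch.load("site_inputs.pt", map_location="cpu")
print(type(site_tensor), getattr(site_tensor, "shape", None))

# trained network weights, assuming they were saved as a state_dict
state_dict = torch.load("model_weights.pt", map_location="cpu")
if isinstance(state_dict, dict):
    print(list(state_dict.keys())[:5])  # first few parameter names
```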

  19. Fruit Infection Disease Dataset

    • kaggle.com
    Updated Nov 21, 2022
    Cite
    Nikit kashyap (2022). Fruit Infection Disease Dataset [Dataset]. https://www.kaggle.com/datasets/nikitkashyap/fruit-infection-disease-dataset
    Explore at:
Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 21, 2022
    Dataset provided by
Kaggle (http://kaggle.com/)
    Authors
    Nikit kashyap
    Description

This dataset was exported via Kaggle.com on Nov 21, 2022. It includes 5,494 images. Diseases are annotated in YOLO v7 PyTorch format. The following pre-processing was applied to each image: auto-orientation of pixel data (with EXIF-orientation stripping) and resizing to 416x416 (stretch). No image augmentation techniques were applied. Classes: 1. Strawberry: 'Angular Leafspot', 'Anthracnose Fruit Rot', 'Blossom Blight', 'Gray Mold', 'Leaf Spot', 'Powdery Mildew Fruit', 'Powdery Mildew Leaf'; 2. Tomato: 'disease', 'leaf mold', 'spider mites'; 3. Bean: 'ALS', 'Bean Rust'.
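For reference, YOLO-format label files store one object per line as class x_center y_center width height, normalised to [0, 1]; below is a minimal parsing sketch for the 416x416 images in this dataset. The label file name is a placeholder.

```python
# Minimal sketch of reading one YOLO-format label file and converting its
# normalised boxes to pixel coordinates for a 416x416 image.
W = H = 416  # images in this dataset are resized to 416x416

boxes = []
with open("example_label.txt") as f:     # placeholder file name
    for line in f:
        cls, xc, yc, bw, bh = line.split()
        xc, yc, bw, bh = (float(v) for v in (xc, yc, bw, bh))
        xmin = (xc - bw / 2) * W
        ymin = (yc - bh / 2) * H
        xmax = (xc + bw / 2) * W
        ymax = (yc + bh / 2) * H
        boxes.append((int(cls), xmin, ymin, xmax, ymax))
print(boxes)
```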

  20. Training set data expansion.

    • plos.figshare.com
    xls
    Updated Mar 7, 2024
    + more versions
    Cite
    Qingjun Yu; Guannan Wang; Hai Cheng; Wenzhi Guo; Yanbiao Liu (2024). Training set data expansion. [Dataset]. http://doi.org/10.1371/journal.pone.0299471.t002
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Mar 7, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Qingjun Yu; Guannan Wang; Hai Cheng; Wenzhi Guo; Yanbiao Liu
    License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

Structural planes decrease the strength and stability of rock masses, severely affecting their mechanical properties and their deformation and failure characteristics. Therefore, investigation and analysis of structural planes are crucial tasks in mining rock mechanics. The drilling camera obtains image information on deep structural planes of rock masses through high-definition imaging, providing an important data source for their analysis. This paper addresses the high workload, low efficiency, high subjectivity, and poor accuracy of manual processing in current borehole image analysis, and conducts an intelligent segmentation study of borehole image structural planes based on the U2-Net network. By collecting data from 20 different boreholes in regions of different lithology, a dataset of 1,013 borehole images covering different structural plane types, lithologies, and colors was established. Data augmentation methods such as image flipping, color jittering, blurring, and mixup were applied to expand the dataset to 12,421 images, meeting the data requirements for deep network training. Based on the PyTorch deep learning framework, the initial U2-Net network weights were set, the learning rate was set to 0.001, the training batch size was 4, and the Adam optimizer adaptively adjusted the learning rate during training. A dedicated network model for segmenting structural planes was obtained; the model achieved a maximum F-measure of 0.749 when the confidence threshold was set to 0.7, with an accuracy of up to 0.85 in the range of recall greater than 0.5. Overall, the model has high accuracy for segmenting structural planes and a very low mean absolute error, indicating good segmentation accuracy and a degree of generalization. The research method in this paper can serve as a reference for the intelligent identification of structural planes in borehole images.
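As a hedged illustration of the evaluation described above, the snippet below computes a thresholded F-measure and the mean absolute error for a predicted probability map against a ground-truth mask. The ÎČÂČ = 0.3 weighting is the convention of saliency/segmentation benchmarks and is an assumption here, as are the dummy arrays standing in for U2-Net outputs.

```python
# Hedged sketch of F-measure and MAE evaluation for a binary segmentation map
# at a fixed confidence threshold (0.7, as in the description); dummy data only.
import numpy as np

def f_measure(prob_map, gt_mask, threshold=0.7, beta2=0.3):
    pred = prob_map >= threshold
    tp = np.logical_and(pred, gt_mask).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt_mask.sum(), 1)
    # weighted F-measure; beta2 = 0.3 is an assumption borrowed from saliency benchmarks
    return (1 + beta2) * precision * recall / max(beta2 * precision + recall, 1e-8)

prob_map = np.random.rand(256, 256)          # stand-in for a network output
gt_mask = np.zeros((256, 256), dtype=bool)   # stand-in for a labelled structural plane
gt_mask[100:150, 80:200] = True

print(f"F-measure @0.7: {f_measure(prob_map, gt_mask):.3f}")
mae = np.abs(prob_map - gt_mask.astype(float)).mean()  # mean absolute error
print(f"MAE: {mae:.3f}")
```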
