51 datasets found
  1. MSL Curiosity Rover Images with Science and Engineering Classes

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 17, 2020
    Cite
    Steven Lu (2020). MSL Curiosity Rover Images with Science and Engineering Classes [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3892023
    Dataset updated
    Sep 17, 2020
    Dataset provided by
    Steven Lu
    Kiri L. Wagstaff
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please note that the file msl-labeled-data-set-v2.1.zip below contains the latest images and labels associated with this data set.

    Data Set Description

    The data set consists of 6,820 images collected by the Mars Science Laboratory (MSL) Curiosity Rover with three instruments: (1) the Mast Camera (Mastcam) Left Eye; (2) the Mast Camera (Mastcam) Right Eye; (3) the Mars Hand Lens Imager (MAHLI). With help from Dr. Raymond Francis, a member of the MSL operations team, we identified 19 classes of science and engineering interest (see the "Classes" section for more information), and each image is assigned one class label. We split the data set into training, validation, and test sets in order to train and evaluate machine learning algorithms. The training set contains 5,920 images (including augmented images; see the "Image Augmentation" section for more information); the validation set contains 300 images; the test set contains 600 images. The training set images were randomly sampled from sol (Martian day) range 1-948, validation set images from sol range 949-1920, and test set images from sol range 1921-2224. All images are resized to 227 x 227 pixels without preserving the original height/width aspect ratio.

    Directory Contents

    images - contains all 6,820 images

    class_map.csv - string-integer class mappings

    train-set-v2.1.txt - label file for the training set

    val-set-v2.1.txt - label file for the validation set

    test-set-v2.1.txt - label file for the test set

    The label files are formatted as below:

    "Image-file-name class_in_integer_representation"

    Labeling Process

    Each image was labeled with help from three different volunteers (see the Contributor list). The final labels were determined using the following process:

    If all three labels agree, that label is used as the final label.

    If the three labels do not agree, we manually review the labels and decide the final label.

    As a post-processing step, we also performed error analysis to correct noisy/incorrect labels in the data set.

    Classes

    There are 19 classes identified in this data set. To simplify our training and evaluation algorithms, we mapped the class names from string to integer representations. The class names, string-integer mappings, and distributions are shown below:

    Class name, counts (training set), counts (validation set), counts (test set), integer representation
    Arm cover, 10, 1, 4, 0
    Other rover part, 190, 11, 10, 1
    Artifact, 680, 62, 132, 2
    Nearby surface, 1554, 74, 187, 3
    Close-up rock, 1422, 50, 84, 4
    DRT, 8, 4, 6, 5
    DRT spot, 214, 1, 7, 6
    Distant landscape, 342, 14, 34, 7
    Drill hole, 252, 5, 12, 8
    Night sky, 40, 3, 4, 9
    Float, 190, 5, 1, 10
    Layers, 182, 21, 17, 11
    Light-toned veins, 42, 4, 27, 12
    Mastcam cal target, 122, 12, 29, 13
    Sand, 228, 19, 16, 14
    Sun, 182, 5, 19, 15
    Wheel, 212, 5, 5, 16
    Wheel joint, 62, 1, 5, 17
    Wheel tracks, 26, 3, 1, 18

    Image Augmentation

    Only the training set contains augmented images: 3,920 of the 5,920 training images are augmented versions of the remaining 2,000 original images. Images taken by different instruments were augmented differently. As shown below, we employed 5 different augmentation methods. Images taken by the Mastcam left- and right-eye cameras were augmented using only horizontal flipping, while images taken by the MAHLI camera were augmented using all 5 methods. Note that one can filter on the file names listed in train-set-v2.1.txt to obtain the set of non-augmented images (a short filtering sketch follows the list below).

    90 degrees clockwise rotation (file name ends with -r90.jpg)

    180 degrees clockwise rotation (file name ends with -r180.jpg)

    270 degrees clockwise rotation (file name ends with -r270.jpg)

    Horizontal flip (file name ends with -fh.jpg)

    Vertical flip (file name ends with -fv.jpg)
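    As a concrete illustration, the augmentation suffixes above can be used to separate original from augmented training images (a sketch, assuming the v2.1 label file):

      AUG_SUFFIXES = ("-r90.jpg", "-r180.jpg", "-r270.jpg", "-fh.jpg", "-fv.jpg")

      def is_augmented(file_name):
          # True if the file name carries one of the augmentation suffixes.
          return file_name.endswith(AUG_SUFFIXES)

      with open("train-set-v2.1.txt") as f:
          names = [line.split()[0] for line in f if line.strip()]

      originals = [n for n in names if not is_augmented(n)]
      augmented = [n for n in names if is_augmented(n)]
      print(len(originals), len(augmented))  # expected: 2000 and 3920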

    Acknowledgment

    The authors would like to thank the volunteers (see the Contributor list) who provided annotations for this data set. We would also like to thank the PDS Imaging Node for the continuous support of this work.

  2. Data from: SRL4ORL: Improving Opinion Role Labeling Using Multi-Task Learning With Semantic Role Labeling

    • heidata.uni-heidelberg.de
    • tudatalib.ulb.tu-darmstadt.de
    zip
    Updated Feb 4, 2019
    + more versions
    Cite
    Ana Marasovic (2019). SRL4ORL: Improving Opinion Role Labeling Using Multi-Task Learning With Semantic Role Labeling [Source Code] [Dataset]. http://doi.org/10.11588/DATA/LWN9XE
    Available download formats: zip (14,676,065 bytes)
    Dataset updated
    Feb 4, 2019
    Dataset provided by
    heiDATA
    Authors
    Ana Marasovic
    License

    https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.11588/DATA/LWN9XE

    Description

    This repository contains code for reproducing experiments done in Marasovic and Frank (2018).

    Paper abstract: For over a decade, machine learning has been used to extract opinion-holder-target structures from text to answer the question "Who expressed what kind of sentiment towards what?". Recent neural approaches do not outperform the state-of-the-art feature-based models for Opinion Role Labeling (ORL). We suspect this is due to the scarcity of labeled training data and address this issue using different multi-task learning (MTL) techniques with a related task which has substantially more data, i.e. Semantic Role Labeling (SRL). We show that two MTL models improve significantly over the single-task model for labeling of both holders and targets, on the development and the test sets. We found that the vanilla MTL model, which makes predictions using only shared ORL and SRL features, performs the best. With deeper analysis, we determine what works and what might be done to make further improvements for ORL.

    Data for ORL: Download the MPQA 2.0 corpus. Check mpqa2-pytools for example usage. Splits can be found in the datasplit folder.

    Data for SRL: The data is provided by the CoNLL-2005 Shared Task, but the original words are from the Penn Treebank dataset, which is not publicly available.

    How to train models?

      python main.py --adv_coef 0.0 --model fs --exp_setup_id new --n_layers_orl 0 --begin_fold 0 --end_fold 4
      python main.py --adv_coef 0.0 --model html --exp_setup_id new --n_layers_orl 1 --n_layers_shared 2 --begin_fold 0 --end_fold 4
      python main.py --adv_coef 0.0 --model sp --exp_setup_id new --n_layers_orl 3 --begin_fold 0 --end_fold 4
      python main.py --adv_coef 0.1 --model asp --exp_setup_id prior --n_layers_orl 3 --begin_fold 0 --end_fold 10

  3. LandCoverNet Asia

    • access.earthdata.nasa.gov
    Updated Oct 10, 2023
    Cite
    (2023). LandCoverNet Asia [Dataset]. http://doi.org/10.34911/rdnt.63fxe5
    Dataset updated
    Oct 10, 2023
    Time period covered
    Jan 1, 2020 - Jan 1, 2023
    Area covered
    Description

    LandCoverNet is a global annual land cover classification training dataset with labels for multi-spectral satellite imagery from the Sentinel-1, Sentinel-2 and Landsat-8 missions in 2018. LandCoverNet Asia contains data across Asia, which accounts for ~31% of the global dataset. Each pixel is identified as one of seven land cover classes based on its annual time series. These classes are water, natural bare ground, artificial bare ground, woody vegetation, cultivated vegetation, (semi) natural vegetation, and permanent snow/ice.
    There are a total of 2753 image chips of 256 x 256 pixels in LandCoverNet Asia V1.0 spanning 92 tiles. Each image chip contains temporal observations from the following satellite products with an annual class label, all stored in raster format (GeoTIFF files; a minimal loading sketch follows this list):
    * Sentinel-1 Ground Range Detected (GRD) product with radiometric calibration and orthorectification at 10m spatial resolution
    * Sentinel-2 surface reflectance product (L2A) at 10m spatial resolution
    * Landsat-8 surface reflectance product from Collection 2 Level-2
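    As an illustrative sketch of loading one such chip in Python (rasterio is assumed to be available; the file paths are hypothetical):

      import rasterio

      # Read a hypothetical Sentinel-2 observation for one 256 x 256 chip.
      with rasterio.open("landcovernet_asia/chip_0001/S2_20180705.tif") as src:
          bands = src.read()         # array of shape (band, 256, 256)
          transform = src.transform  # georeferencing of the chip

      # Read the corresponding annual label raster (one class code per pixel).
      with rasterio.open("landcovernet_asia/chip_0001/labels.tif") as src:
          labels = src.read(1)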

    Radiant Earth Foundation designed and generated this dataset with a grant from Schmidt Futures with additional support from NASA ACCESS, Microsoft AI for Earth and in kind technology support from Sinergise.

  4. LandCoverNet North America

    • cmr.earthdata.nasa.gov
    • access.earthdata.nasa.gov
    Updated Oct 10, 2023
    + more versions
    Cite
    (2023). LandCoverNet North America [Dataset]. http://doi.org/10.34911/rdnt.jx15e8
    Dataset updated
    Oct 10, 2023
    Time period covered
    Jan 1, 2020 - Jan 1, 2023
    Area covered
    Description

    LandCoverNet is a global annual land cover classification training dataset with labels for the multi-spectral satellite imagery from Sentinel-1, Sentinel-2 and Landsat-8 missions in 2018. LandCoverNet North America contains data across North America, which accounts for ~13% of the global dataset. Each pixel is identified as one of the seven land cover classes based on its annual time series. These classes are water, natural bare ground, artificial bare ground, woody vegetation, cultivated vegetation, (semi) natural vegetation, and permanent snow/ice.

    There are a total of 1561 image chips of 256 x 256 pixels in LandCoverNet North America V1.0 spanning 40 tiles. Each image chip contains temporal observations from the following satellite products with an annual class label, all stored in raster format (GeoTIFF files):
    * Sentinel-1 Ground Range Detected (GRD) product with radiometric calibration and orthorectification at 10m spatial resolution
    * Sentinel-2 surface reflectance product (L2A) at 10m spatial resolution
    * Landsat-8 surface reflectance product from Collection 2 Level-2

    Radiant Earth Foundation designed and generated this dataset with a grant from Schmidt Futures with additional support from NASA ACCESS, Microsoft AI for Earth and in kind technology support from Sinergise.

  5. Dataset: An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models

    • data.nist.gov
    • cloud.csiss.gmu.edu
    • +1 more
    Updated Oct 23, 2020
    Cite
    Brian DeCost (2020). Dataset: An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models [Dataset]. http://doi.org/10.18434/mds2-2301
    Dataset updated
    Oct 23, 2020
    Dataset provided by
    National Institute of Standards and Technology: http://www.nist.gov/
    Authors
    Brian DeCost
    License

    https://www.nist.gov/open/license

    Description

    The open dataset, software, and other files accompanying the manuscript "An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models," submitted for publication to Integrating Materials and Manufacturing Innovation. Machine learning and autonomy are increasingly prevalent in materials science, but existing models are often trained or tuned using idealized data as absolute ground truths. In actual materials science, "ground truth" is often a matter of interpretation and is more readily determined by consensus. Here we present the data, software, and other files for a study using as-obtained diffraction data as a test case for evaluating the performance of machine learning models in the presence of differing expert opinions. We demonstrate that experts with similar backgrounds can disagree greatly even for something as intuitive as using diffraction to identify the start and end of a phase transformation. We then use a logarithmic likelihood method to evaluate the performance of machine learning models in relation to the consensus expert labels and their variance. We further illustrate this method's efficacy in ranking a number of state-of-the-art phase mapping algorithms. We propose a materials data challenge centered around the problem of evaluating models based on consensus with uncertainty. The data, labels, and code used in this study are all available online at data.gov, and the interested reader is encouraged to replicate and improve the existing models or to propose alternative methods for evaluating algorithmic performance.
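    As a hedged sketch of the general idea (not the authors' exact implementation), scoring a model's predictions against consensus labels with quantified uncertainty might treat each label as a Gaussian and compute a log-likelihood:

      import numpy as np

      def mean_log_likelihood(predictions, consensus_mean, consensus_std):
          # Mean Gaussian log-likelihood of predictions under the expert
          # consensus; higher means better agreement with the consensus.
          z = (predictions - consensus_mean) / consensus_std
          return np.mean(-0.5 * z**2 - np.log(consensus_std * np.sqrt(2 * np.pi)))

      # Hypothetical example: two models predicting a transformation onset.
      mean, std = np.array([450.0]), np.array([25.0])
      print(mean_log_likelihood(np.array([460.0]), mean, std))  # close to consensus
      print(mean_log_likelihood(np.array([520.0]), mean, std))  # penalized more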

  6. LandCoverNet Australia

    • access.earthdata.nasa.gov
    • cmr.earthdata.nasa.gov
    Updated Oct 10, 2023
    Cite
    (2023). LandCoverNet Australia [Dataset]. http://doi.org/10.34911/rdnt.0vgi25
    Dataset updated
    Oct 10, 2023
    Time period covered
    Jan 1, 2020 - Jan 1, 2023
    Area covered
    Description

    LandCoverNet is a global annual land cover classification training dataset with labels for the multi-spectral satellite imagery from Sentinel-1, Sentinel-2 and Landsat-8 missions in 2018. LandCoverNet Australia contains data across Australia, which accounts for ~7% of the global dataset. Each pixel is identified as one of the seven land cover classes based on its annual time series. These classes are water, natural bare ground, artificial bare ground, woody vegetation, cultivated vegetation, (semi) natural vegetation, and permanent snow/ice.
    There are a total of 600 image chips of 256 x 256 pixels in LandCoverNet Australia V1.0 spanning 20 tiles. Each image chip contains temporal observations from the following satellite products with an annual class label, all stored in raster format (GeoTIFF files):
    * Sentinel-1 Ground Range Detected (GRD) product with radiometric calibration and orthorectification at 10m spatial resolution
    * Sentinel-2 surface reflectance product (L2A) at 10m spatial resolution
    * Landsat-8 surface reflectance product from Collection 2 Level-2

    Radiant Earth Foundation designed and generated this dataset with a grant from Schmidt Futures with additional support from NASA ACCESS, Microsoft AI for Earth and in kind technology support from Sinergise.

  7. Multi-Label Classification Dataset

    • kaggle.com
    Updated Jan 28, 2021
    Cite
    Shivanand (2021). Multi-Label Classification Dataset [Dataset]. https://www.kaggle.com/datasets/shivanandmn/multilabel-classification-dataset/data
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 28, 2021
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Shivanand
    Description

    Context

    NLP: Multi-label Classification Dataset.

    Content

    The dataset contains six label columns (Computer Science, Physics, Mathematics, Statistics, Quantitative Biology, Quantitative Finance) for classifying research papers based on their title and abstract. A value of 1 in a label column indicates that the label applies to that paper; a paper can have multiple labels set to 1.
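    For example, a minimal pandas sketch for exploring the label columns (the file name train.csv and the TITLE/ABSTRACT column names are assumptions):

      import pandas as pd

      LABELS = ["Computer Science", "Physics", "Mathematics",
                "Statistics", "Quantitative Biology", "Quantitative Finance"]

      df = pd.read_csv("train.csv")
      df["text"] = df["TITLE"] + ". " + df["ABSTRACT"]

      print(df[LABELS].mean())                      # fraction of papers per label
      print(df[LABELS].sum(axis=1).value_counts())  # papers carrying 1, 2, ... labels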

    Acknowledgements

    This dataset is from an Analytics Vidhya Hackathon.

    Inspiration

    Can you solve it and get the best score?

  8. Large-sized Training and Evaluation Data for Publication "Using Supervised Learning to Classify Metadata of Research Data by Field of Study"

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Apr 20, 2020
    + more versions
    Cite
    Tobias Weber (2020). Large-sized Training and Evaluation Data for Publication "Using Supervised Learning to Classify Metadata of Research Data by Field of Study" [Dataset]. http://doi.org/10.5281/zenodo.3490460
    Available download formats: application/gzip
    Dataset updated
    Apr 20, 2020
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Tobias Weber
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Automated classification of research-data metadata by discipline of research can be used in scientometric research, by repository service providers, and in the context of research data aggregation services. Openly available metadata from the DataCite index for research data were used to compile a large training and evaluation set comprising 609,524 records. This is the cleaned and vectorized version with a large-sized feature selection.

  9. Data from: Fashion conversation data on Instagram

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Ha, Yu-i; Kwon, Sejeong; Cha, Meeyoung; Joo, Jungseock (2023). Fashion conversation data on Instagram [Dataset]. https://search.dataone.org/view/sha256%3Ac3de287ab8b375881b5922ac14887dfb46780a2bb7434e64bc7f71f2da7868fa
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Ha, Yu-i; Kwon, Sejeong; Cha, Meeyoung; Joo, Jungseock
    Description

    Our fashion dataset is composed of information about 24,752 posts by 13,350 people on Instagram. Data collection was done over a one-month period in January 2015. We searched for posts mentioning 48 internationally renowned fashion brand names as hashtags. Our data contain information about hashtags as well as image features based on deep learning (a Convolutional Neural Network, or CNN). The learned features include selfies, body snaps, marketing shots, non-fashion, faces, logos, etc. Please refer to our paper for a full description of how we built our deep learning model.

  10. Quantitative Content Analysis Data for Hand Labeling Road Surface Conditions in New York State Department of Transportation Camera Images

    • zenodo-rdm.web.cern.ch
    • data.niaid.nih.gov
    zip
    Updated Sep 27, 2023
    Cite
    Carly Sutter; Kara Sulia; Nick P. Bassill; Christopher D. Thorncroft; Christopher D. Wirz; Vanessa Przybylo; Mariana G. Cains; Jacob Radford; David Aaron Evans (2023). Quantitative Content Analysis Data for Hand Labeling Road Surface Conditions in New York State Department of Transportation Camera Images [Dataset]. http://doi.org/10.5281/zenodo.8370665
    Available download formats: zip
    Dataset updated
    Sep 27, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Carly Sutter; Kara Sulia; Nick P. Bassill; Christopher D. Thorncroft; Christopher D. Wirz; Vanessa Przybylo; Mariana G. Cains; Jacob Radford; David Aaron Evans
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New York
    Description

    Traffic camera images from the New York State Department of Transportation (511ny.org) are used to create a hand-labeled dataset of images classified into one of six road surface conditions: 1) severe snow, 2) snow, 3) wet, 4) dry, 5) poor visibility, or 6) obstructed. Six labelers (authors Sutter, Wirz, Przybylo, Cains, Radford, and Evans) went through a series of four labeling trials in which reliability across all six labelers was assessed using the Krippendorff's alpha (KA) metric (Krippendorff, 2007). The online tool by Dr. Freelon (Freelon, 2013; Freelon, 2010) was used to calculate reliability metrics after each trial, and the group achieved inter-coder reliability with a KA of 0.888 on the 4th trial. This process is known as quantitative content analysis, and three pieces of data used in this process are shared: 1) a PDF of the codebook, which serves as the set of rules for labeling images, 2) images from each of the four labeling trials, including the use of New York State Mesonet weather observation data (Brotzge et al., 2020), and 3) an Excel spreadsheet including the calculated inter-coder reliability metrics and other summaries used to assess reliability after each trial.
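    For reference, inter-coder reliability of this kind can also be computed in Python with the third-party krippendorff package (an assumption; the authors used Dr. Freelon's online tool), as in this sketch with hypothetical ratings:

      import numpy as np
      import krippendorff  # pip install krippendorff

      # Rows are coders (three of the six shown), columns are images; values
      # are road surface condition codes, np.nan where a coder skipped an image.
      ratings = np.array([
          [1, 2, 3, 4, 5, 6, 2],
          [1, 2, 3, 4, 5, 6, 3],
          [1, 2, 2, 4, np.nan, 6, 2],
      ])

      alpha = krippendorff.alpha(reliability_data=ratings,
                                 level_of_measurement="nominal")
      print(f"Krippendorff's alpha: {alpha:.3f}")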

    The broader purpose of this work is that the six human labelers, having achieved inter-coder reliability, can then label large sets of images independently, each contributing to a larger labeled dataset used for training supervised machine learning models to predict road surface conditions from camera images. The xCITE lab (xCITE, 2023) is used to store camera images from 511ny.org, and the lab provides computing resources for training machine learning models.

  11. Calcite data set with boolean label

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Feb 27, 2022
    Cite
    Vlad Ioan Tomescu (2022). Calcite data set with boolean label [Dataset]. http://doi.org/10.7910/DVN/VDBQVV
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Feb 27, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Vlad Ioan Tomescu
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A data set consisting of various metrics from the literature, computed on the Apache Calcite project. The presence of defects is marked with boolean labels.

  12. Open Cities AI Challenge Dataset

    • access.earthdata.nasa.gov
    • cmr.earthdata.nasa.gov
    Updated Oct 10, 2023
    Cite
    (2023). Open Cities AI Challenge Dataset [Dataset]. http://doi.org/10.34911/rdnt.f94cxb
    Dataset updated
    Oct 10, 2023
    Time period covered
    Jan 1, 2020 - Jan 1, 2023
    Area covered
    Description

    This dataset was developed as part of a challenge to segment building footprints from aerial imagery. The goal of the challenge was to accelerate the development of more accurate, relevant, and usable open-source AI models to support mapping for disaster risk management in African cities. The data consists of drone imagery from 10 different cities and regions across Africa.

  13. Codeforces Dataset

    • kaggle.com
    zip
    Updated Jun 19, 2023
    Cite
    Phan Dinh Khoi (2023). Codeforces Dataset [Dataset]. https://www.kaggle.com/datasets/phandinhkhoi/codeforces-dataset
    Available download formats: zip (811,002,069 bytes)
    Dataset updated
    Jun 19, 2023
    Authors
    Phan Dinh Khoi
    Description

    Dataset

    This dataset was created by Phan Dinh Khoi


  14. AgriFieldNet Competition Dataset

    • cmr.earthdata.nasa.gov
    • access.earthdata.nasa.gov
    Updated Oct 10, 2023
    Cite
    (2023). AgriFieldNet Competition Dataset [Dataset]. http://doi.org/10.34911/rdnt.wu92p1
    Dataset updated
    Oct 10, 2023
    Time period covered
    Jan 1, 2020 - Jan 1, 2023
    Area covered
    Description

    This dataset contains crop types of agricultural fields in four states of northern India: Uttar Pradesh, Rajasthan, Odisha, and Bihar. There are 13 different classes in the dataset, including Fallow land and 12 crop types: Wheat, Mustard, Lentil, Green pea, Sugarcane, Garlic, Maize, Gram, Coriander, Potato, Bersem, and Rice. The dataset is split into train and test collections as part of the AgriFieldNet India Competition. Ground reference data for this dataset were collected by IDinsight's Data on Demand team. Radiant Earth Foundation carried out the training dataset curation and publication. This training dataset was generated through a grant from the Enabling Crop Analytics at Scale (ECAAS) Initiative, funded by The Bill & Melinda Gates Foundation and implemented by Tetra Tech.

  15. Data from: Large-scale study of speech acts' development using automatic labelling

    • osf.io
    Updated May 10, 2021
    Cite
    Mitja Nikolaus; Abdellah Fourtassi (2021). Large-scale study of speech acts' development using automatic labelling [Dataset]. https://osf.io/hvzs2
    Dataset updated
    May 10, 2021
    Dataset provided by
    Center For Open Science
    Authors
    Mitja Nikolaus; Abdellah Fourtassi
    Description

    No description was included in this Dataset collected from the OSF

  16. Replication data for: Independent Labels? The Power behind Environmental Information about Products and Companies

    • data.niaid.nih.gov
    • dataverse.harvard.edu
    Updated May 19, 2015
    Cite
    Bullock, Graham (2015). Replication data for: Independent Labels? The Power behind Environmental Information about Products and Companies [Dataset]. http://doi.org/10.7910/DVN/28279
    Available download formats: text/x-stata-syntax (charset=us-ascii), tsv, xlsx, txt
    Dataset updated
    May 19, 2015
    Dataset authored and provided by
    Bullock, Graham
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Power is a ubiquitous term in political science, and yet the discipline lacks a metric of power that can be applied to both formal and informal political contexts. Building on past work on power and power resources, this paper develops a method to estimate the power of different actors over an organization. It uses this method to analyze the power of the public, private, and civil sectors within an original dataset of 245 cases of corporate sustainability ratings and product eco-labels, such as ENERGY STAR, LEED Certification, and Newsweek's Greenest Company Rankings. These initiatives have received limited attention from the political science literature, but they have become an increasingly prominent political phenomenon. The paper finds that the private and civil sectors have more power over these information-based governance initiatives than the public sector. It also reveals their lack of transparency and hybrid accountability relationships, which complicate their legitimacy and effectiveness.

  17. Pressurized Water Reactor

    • kaggle.com
    Updated Aug 17, 2020
    Cite
    Avinash Bagul (2020). Pressurized Water Reactor [Dataset]. https://www.kaggle.com/avibagul80/pressurized-water-reactor
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 17, 2020
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Avinash Bagul
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Avinash Bagul

    Released under Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)


  18. Supplementary material 1 from: Penev L (2017) From Open Access to Open Science from the viewpoint of a scholarly publisher

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Dec 21, 2023
    Cite
    Lyubomir Penev (2023). Supplementary material 1 from: Penev L (2017) From Open Access to Open Science from the viewpoint of a scholarly publisher. Research Ideas and Outcomes 3: e12265. https://doi.org/10.3897/rio.3.e12265 [Dataset]. http://doi.org/10.3897/rio.3.e12265.suppl1
    Available download formats: bin
    Dataset updated
    Dec 21, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Lyubomir Penev
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A presentation held by Lyubomir Penev in the iDiv Seminar Series at the Biodiversity Informatics Unit of the German Centre for Integrative Biodiversity Research (iDiv), Leipzig, 15 February 2017.

  19. ramp Building Footprint Dataset - Jashore, Bangladesh

    • cmr.earthdata.nasa.gov
    • access.earthdata.nasa.gov
    Updated Oct 10, 2023
    Cite
    (2023). ramp Building Footprint Dataset - Jashore, Bangladesh [Dataset]. http://doi.org/10.34911/rdnt.wn4jmp
    Dataset updated
    Oct 10, 2023
    Time period covered
    Jan 1, 2020 - Jan 1, 2023
    Area covered
    Description

    This chipped training dataset covers Jashore and includes high-resolution imagery (.tif format) and corresponding building footprint vector labels (.geojson format) in 256 x 256 pixel tile/label pairs. This dataset is a ramp Tier 1 dataset, meaning it has been thoroughly reviewed and improved. It was used in the development and testing of a localized ramp model and contains 7,310 tiles and 80,050 individual buildings. The satellite imagery resolution is 35 cm and was sourced from Maxar ODP (104001003BA7C900). Dataset keywords: Urban, Peri-urban, Rural. A minimal sketch of reading one tile/label pair follows.
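    As an illustrative sketch, one tile/label pair might be read in Python with rasterio and geopandas (both assumed installed; the paths are hypothetical):

      import rasterio
      import geopandas as gpd

      # Read one hypothetical 256 x 256 imagery tile and its vector labels.
      with rasterio.open("jashore/chips/tile_0001.tif") as src:
          image = src.read()    # (bands, 256, 256) array
          bounds = src.bounds   # georeferenced extent of the tile

      buildings = gpd.read_file("jashore/labels/tile_0001.geojson")
      print(len(buildings), "building footprints in this tile")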

  20. [Data, Stimuli] A label indicating an old year of establishment improves evaluations of restaurants and shops serving traditional foods

    • osf.io
    Updated Jul 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tomoki Maezawa (2023). [Data, Stimuli] A label indicating an old year of establishment improves evaluations of restaurants and shops serving traditional foods [Dataset]. https://osf.io/nf8sw
    Dataset updated
    Jul 4, 2023
    Dataset provided by
    Center for Open Science: https://cos.io/
    Authors
    Tomoki Maezawa
    Description

    No description was included in this Dataset collected from the OSF
