100+ datasets found
1. Python_class Dataset

    • universe.roboflow.com
    zip
    Updated Nov 18, 2023
    Cite
    atk (2023). Python_class Dataset [Dataset]. https://universe.roboflow.com/atk-s1vv8/python_class/model/1
    Explore at:
Available download formats: zip
    Dataset updated
    Nov 18, 2023
    Dataset authored and provided by
    atk
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Atk Bounding Boxes
    Description

    Python_class

    ## Overview
    
    Python_class is a dataset for object detection tasks - it contains Atk annotations for 967 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
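For example, the dataset can be pulled with the Roboflow Python package. A minimal sketch, assuming you have a Roboflow API key; the workspace and project slugs are taken from the dataset URL above, and the export format is only an example:

    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_API_KEY")
    # Workspace/project slugs as they appear in the dataset URL.
    project = rf.workspace("atk-s1vv8").project("python_class")
    dataset = project.version(1).download("coco")  # export format is an example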
    
## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
2. imagenette

    • tensorflow.org
    • opendatalab.com
    Updated Jun 1, 2024
    Cite
    (2024). imagenette [Dataset]. https://www.tensorflow.org/datasets/catalog/imagenette
    Explore at:
    Dataset updated
    Jun 1, 2024
    Description

Imagenette is a subset of 10 easily classified classes from the ImageNet dataset. It was originally prepared by Jeremy Howard of fast.ai. The motivation for putting together a small version of the ImageNet dataset was mainly that running new ideas/algorithms/experiments on the whole ImageNet takes a lot of time.

    This version of the dataset allows researchers/practitioners to quickly try out ideas and share with others. The dataset comes in three variants:

    • Full size
    • 320 px
    • 160 px

Note: The v2 config corresponds to the new 70/30 train/valid split (released Dec 6, 2019).

    To use this dataset:

import tensorflow_datasets as tfds

# Load the training split and preview a few examples.
ds = tfds.load('imagenette', split='train')
for ex in ds.take(4):
  print(ex)
    

See the guide for more information on tensorflow_datasets.

Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/imagenette-full-size-v2-1.0.0.png

3. Custom Yolov7 On Kaggle On Custom Dataset

    • universe.roboflow.com
    zip
    Updated Jan 29, 2023
    Cite
    Owais Ahmad (2023). Custom Yolov7 On Kaggle On Custom Dataset [Dataset]. https://universe.roboflow.com/owais-ahmad/custom-yolov7-on-kaggle-on-custom-dataset-rakiq/dataset/2
    Explore at:
Available download formats: zip
    Dataset updated
    Jan 29, 2023
    Dataset authored and provided by
    Owais Ahmad
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Person Car Bounding Boxes
    Description

    Custom Training with YOLOv7 🔥

    Some Important links

    Contact Information

    Objective

To showcase custom object detection on the given dataset by training and running inference with the newly launched YOLOv7.

    Data Acquisition

    The goal of this task is to train a model that can localize and classify each instance of Person and Car as accurately as possible.

from IPython.display import Markdown, display

# Render the Roboflow dataset README inside the notebook.
display(Markdown(filename="../input/Car-Person-v2-Roboflow/README.roboflow.txt"))
    

    Custom Training with YOLOv7 🔥

In this notebook, I have processed the images with Roboflow, because the COCO-formatted dataset had images of different dimensions and was not split into the required formats. To train a custom YOLOv7 model, we need to recognize the objects in the dataset. To do so, I have taken the following steps:

    • Export the dataset to YOLOv7
    • Train YOLOv7 to recognize the objects in our dataset
    • Evaluate our YOLOv7 model's performance
    • Run test inference to view performance of YOLOv7 model at work

    📦 YOLOv7

(Image: car-person detection example; https://raw.githubusercontent.com/Owaiskhan9654/Yolo-V7-Custom-Dataset-Train-on-Kaggle/main/car-person-2.PNG)

    Image Credit - jinfagang

    Step 1: Install Requirements

    !git clone https://github.com/WongKinYiu/yolov7 # Downloading YOLOv7 repository and installing requirements
    %cd yolov7
    !pip install -qr requirements.txt
    !pip install -q roboflow
    

Downloading the YOLOv7 starting checkpoint

    !wget "https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt"
    
import os
import glob
import wandb
import torch
from roboflow import Roboflow
from kaggle_secrets import UserSecretsClient
from IPython.display import Image, clear_output, display # to display images


print(f"Setup complete. Using torch {torch.__version__} ({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")
    


    I will be integrating W&B for visualizations and logging artifacts and comparisons of different models!

    YOLOv7-Car-Person-Custom

try:
  user_secrets = UserSecretsClient()
  wandb_api_key = user_secrets.get_secret("wandb_api")
  wandb.login(key=wandb_api_key)
  anonymous = None
except Exception:
  wandb.login(anonymous='must')
  print('To use your W&B account, go to Add-ons -> Secrets and provide your '
        'W&B access token. Use the label name WANDB. '
        'Get your W&B access token from here: https://wandb.ai/authorize')


wandb.init(project="YOLOvR", name="7. YOLOv7-Car-Person-Custom-Run-7")
    

    Step 2: Assemble Our Dataset

(Illustration: computer vision development cycle; https://uploads-ssl.webflow.com/5f6bc60e665f54545a1e52a5/615627e5824c9c6195abfda9_computer-vision-cycle.png)

    In order to train our custom model, we need to assemble a dataset of representative images with bounding box annotations around the objects that we want to detect. And we need our dataset to be in YOLOv7 format.

In Roboflow, we can choose between two paths:

Version v2 (Aug 12, 2022) looks like this:

(Screenshot: Roboflow dataset version v2; https://raw.githubusercontent.com/Owaiskhan9654/Yolo-V7-Custom-Dataset-Train-on-Kaggle/main/Roboflow.PNG)

    user_secrets = UserSecretsClient()
    roboflow_api_key = user_secrets.get_secret("roboflow_api")
    
    rf = Roboflow(api_key=roboflow_api_key)
    project = rf.workspace("owais-ahmad").project("custom-yolov7-on-kaggle-on-custom-dataset-rakiq")
    dataset = project.version(2).download("yolov7")
    

Step 3: Training a Custom YOLOv7 Model from the Pretrained Checkpoint

Here, I am able to pass a number of arguments:

• img: define input image size
• batch: determine
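For context, a typical training invocation with these arguments might look like the sketch below; the flag names come from the YOLOv7 repository's train.py, while the epoch and batch values are placeholder assumptions:

    !python train.py --img-size 640 640 --batch-size 16 --epochs 100 \
      --data {dataset.location}/data.yaml --weights 'yolov7.pt' --device 0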

  4. Dataset for class comment analysis

    • zenodo.org
    zip
    Updated Feb 22, 2022
    Cite
Pooja Rani (2022). Dataset for class comment analysis [Dataset]. http://doi.org/10.5281/zenodo.4311839
    Explore at:
Available download formats: zip
    Dataset updated
    Feb 22, 2022
    Dataset provided by
Zenodo (http://zenodo.org/)
    Authors
Pooja Rani
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A list of different projects selected to analyze class comments (available in the source code) of various languages such as Java, Python, and Pharo. The projects vary in terms of size, contributors, and domain.

    ## Structure
    ```
    Projects/
      Java_projects/
        eclipse.zip
        guava.zip
        guice.zip
        hadoop.zip
        spark.zip
        vaadin.zip
    
      Pharo_projects/
        images/
          GToolkit.zip
          Moose.zip
          PetitParser.zip
          Pillar.zip
          PolyMath.zip
          Roassal2.zip
          Seaside.zip
    
        vm/
          70-x64/Pharo
    
        Scripts/
          ClassCommentExtraction.st
          SampleSelectionScript.st    
    
      Python_projects/
        django.zip
        ipython.zip
        Mailpile.zip
        pandas.zip
        pipenv.zip
        pytorch.zip   
        requests.zip 
      
    ```
    
    ## Contents of the Replication Package
    ---
    
    **Projects/** contains the raw projects of each language that are used to analyze class comments.
- **Java_projects/**
  - `eclipse.zip` - Eclipse project downloaded from GitHub. More details about the project are available on GitHub: [Eclipse](https://github.com/eclipse).
  - `guava.zip` - Guava project downloaded from GitHub. More details about the project are available on GitHub: [Guava](https://github.com/google/guava).
  - `guice.zip` - Guice project downloaded from GitHub. More details about the project are available on GitHub: [Guice](https://github.com/google/guice).
  - `hadoop.zip` - Apache Hadoop project downloaded from GitHub. More details about the project are available on GitHub: [Apache Hadoop](https://github.com/apache/hadoop).
  - `spark.zip` - Apache Spark project downloaded from GitHub. More details about the project are available on GitHub: [Apache Spark](https://github.com/apache/spark).
  - `vaadin.zip` - Vaadin project downloaded from GitHub. More details about the project are available on GitHub: [Vaadin](https://github.com/vaadin/framework).

- **Pharo_projects/**
  - **images/**
    - `GToolkit.zip` - GToolkit project imported into a Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
    - `Moose.zip` - Moose project imported into a Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
    - `PetitParser.zip` - PetitParser project imported into a Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
    - `Pillar.zip` - Pillar project imported into a Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
    - `PolyMath.zip` - PolyMath project imported into a Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
    - `Roassal2.zip` - Roassal2 project imported into a Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
    - `Seaside.zip` - Seaside project imported into a Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.

  - **vm/**
    - **70-x64/Pharo** - Pharo 7 (version 7 of Pharo) virtual machine to instantiate the Pharo images given in the `images/` folder. The user can run the vm on macOS and select any of the Pharo images.

  - **Scripts/** - Contains sample Smalltalk scripts to extract class comments from various projects.
    - `ClassCommentExtraction.st` - A Smalltalk script showing how class comments are extracted from various Pharo projects. This script is already provided in the respective project images.
    - `SampleSelectionScript.st` - A Smalltalk script showing how sample class comments of Pharo projects are selected. This script can be run in any of the Pharo images given in the `images/` folder.

- **Python_projects/**
  - `django.zip` - Django project downloaded from GitHub. More details about the project are available on GitHub: [Django](https://github.com/django).
  - `ipython.zip` - IPython project downloaded from GitHub. More details about the project are available on GitHub: [IPython](https://github.com/ipython/ipython).
  - `Mailpile.zip` - Mailpile project downloaded from GitHub. More details about the project are available on GitHub: [Mailpile](https://github.com/mailpile/Mailpile).
  - `pandas.zip` - pandas project downloaded from GitHub. More details about the project are available on GitHub: [pandas](https://github.com/pandas-dev/pandas).
  - `pipenv.zip` - Pipenv project downloaded from GitHub. More details about the project are available on GitHub: [Pipenv](https://github.com/pypa/pipenv).
  - `pytorch.zip` - PyTorch project downloaded from GitHub. More details about the project are available on GitHub: [PyTorch](https://github.com/pytorch/pytorch).
  - `requests.zip` - Requests project downloaded from GitHub. More details about the project are available on GitHub: [Requests](https://github.com/psf/requests/).
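Since every project ships as a zip archive, the whole package can be unpacked with a few lines of standard-library Python. A minimal sketch, assuming the Projects/ tree above sits in the working directory:

    import zipfile
    from pathlib import Path

    # Walk Projects/ and extract every project archive next to itself.
    for archive in Path("Projects").rglob("*.zip"):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent / archive.stem)
        print("extracted", archive)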
    

5. Data from: CADDI: An in-Class Activity Detection Dataset using IMU data from low-cost sensors

    • observatorio-cientifico.ua.es
    • scidb.cn
    Updated 2025
    Cite
Marquez-Carpintero, Luis; Suescun-Ferrandiz, Sergio; Pina-Navarro, Monica; Gomez-Donoso, Francisco; Cazorla, Miguel (2025). CADDI: An in-Class Activity Detection Dataset using IMU data from low-cost sensors [Dataset]. https://observatorio-cientifico.ua.es/documentos/668fc49bb9e7c03b01be251c
    Explore at:
    Dataset updated
    2025
    Authors
Marquez-Carpintero, Luis; Suescun-Ferrandiz, Sergio; Pina-Navarro, Monica; Gomez-Donoso, Francisco; Cazorla, Miguel
    Description

Data Description

The CADDI dataset is designed to support research in in-class activity recognition using IMU data from low-cost sensors. It provides multimodal data capturing 19 different activities performed by 12 participants in a classroom environment, utilizing both IMU sensors from a Samsung Galaxy Watch 5 and synchronized stereo camera images. This dataset enables the development and validation of activity recognition models using sensor fusion techniques.

Data Generation Procedures

The data collection process involved recording both continuous and instantaneous activities that typically occur in a classroom setting. The activities were captured using a custom setup, which included:

• A Samsung Galaxy Watch 5 to collect accelerometer, gyroscope, and rotation vector data at 100 Hz.
• A ZED stereo camera capturing 1080p images at 25-30 fps.
• A synchronized computer acting as a data hub, receiving IMU data and storing images in real time.
• A D-Link DSR-1000AC router for wireless communication between the smartwatch and the computer.

Participants were instructed to arrange their workspace as they would in a real classroom, including a laptop, notebook, pens, and a backpack. Data collection was performed under realistic conditions, ensuring that activities were captured naturally.

Temporal and Spatial Scope

• The dataset contains a total of 472.03 minutes of recorded data.
• The IMU sensors operate at 100 Hz, while the stereo camera captures images at 25-30 Hz.
• Data was collected from 12 participants, each performing all 19 activities multiple times.
• The geographical scope of data collection was Alicante, Spain, under controlled indoor conditions.

Dataset Components

The dataset is organized into JSON and PNG files, structured hierarchically:

• IMU Data: stored in JSON files, containing:
  • Samsung Linear Acceleration Sensor (X, Y, Z values, 100 Hz)
  • LSM6DSO Gyroscope (X, Y, Z values, 100 Hz)
  • Samsung Rotation Vector (X, Y, Z, W quaternion values, 100 Hz)
  • Samsung HR Sensor (heart rate, 1 Hz)
  • OPT3007 Light Sensor (ambient light levels, 5 Hz)
• Stereo Camera Images: high-resolution 1920×1080 PNG files from left and right cameras.
• Synchronization: each IMU data record and image is timestamped for precise alignment.

Data Structure

The dataset is divided into continuous and instantaneous activities:

• Continuous activities (e.g., typing, writing, drawing) were recorded for 210 seconds, with the central 200 seconds retained.
• Instantaneous activities (e.g., raising a hand, drinking) were repeated 20 times per participant, with data captured only during execution.

The dataset is structured as:

    /continuous/subject_id/activity_name/
      /camera_a/  → Left camera images
      /camera_b/  → Right camera images
      /sensors/   → JSON files with IMU data

    /instantaneous/subject_id/activity_name/repetition_id/
      /camera_a/
      /camera_b/
      /sensors/

Data Quality & Missing Data

• The smartwatch buffers 100 readings per second before sending them, ensuring minimal data loss.
• Synchronization latency between the smartwatch and the computer is negligible.
• Not all IMU samples have corresponding images due to different recording rates.
• Outliers and anomalies were handled by discarding incomplete sequences at the start and end of continuous activities.

Error Ranges & Limitations

• Sensor data may contain noise due to minor hand movements.
• The heart rate sensor operates at 1 Hz, limiting its temporal resolution.
• Camera exposure settings were automatically adjusted, which may introduce slight variations in lighting.

File Formats & Software Compatibility

• IMU data is stored in JSON format, readable with Python's json library.
• Images are in PNG format, compatible with all standard image processing tools.
• Recommended libraries for data analysis:
  • Python: numpy, pandas, scikit-learn, tensorflow, pytorch
  • Visualization: matplotlib, seaborn
  • Deep Learning: Keras, PyTorch

Potential Applications

• Development of activity recognition models in educational settings.
• Study of student engagement based on movement patterns.
• Investigation of sensor fusion techniques combining visual and IMU data.

This dataset represents a unique contribution to activity recognition research, providing rich multimodal data for developing robust models in real-world educational environments.

Citation

If you find this project helpful for your research, please cite our work using the following bibtex entry:

    @misc{marquezcarpintero2025caddiinclassactivitydetection,
      title={CADDI: An in-Class Activity Detection Dataset using IMU data from low-cost sensors},
      author={Luis Marquez-Carpintero and Sergio Suescun-Ferrandiz and Monica Pina-Navarro and Miguel Cazorla and Francisco Gomez-Donoso},
      year={2025},
      eprint={2503.02853},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.02853},
    }
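Since the IMU records are plain JSON, a recording can be inspected with Python's json library. A minimal sketch, assuming the directory layout above; the subject/activity names are examples, and because the record schema is not specified here, the snippet only previews whatever the files contain:

    import json
    from pathlib import Path

    # Example path following the layout above (names are examples).
    sensor_dir = Path("continuous/subject_01/typing/sensors")

    for sensor_file in sorted(sensor_dir.glob("*.json")):
        data = json.loads(sensor_file.read_text())
        # Top-level layout is not documented here; preview its type and size.
        print(sensor_file.name, type(data).__name__, len(data))
        break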

6. coco

    • tensorflow.org
    • huggingface.co
    Updated Jun 1, 2024
    Cite
    (2024). coco [Dataset]. https://www.tensorflow.org/datasets/catalog/coco
    Explore at:
    Dataset updated
    Jun 1, 2024
    Description

    COCO is a large-scale object detection, segmentation, and captioning dataset.

Note:

• Some images from the train and validation sets don't have annotations.
• COCO 2014 and 2017 use the same images, but different train/val/test splits.
• The test split doesn't have any annotations (only images).
• COCO defines 91 classes, but the data only uses 80 classes.
• Panoptic annotations define 200 classes, but only 133 are used.

    To use this dataset:

import tensorflow_datasets as tfds

# Load the training split and preview a few examples.
ds = tfds.load('coco', split='train')
for ex in ds.take(4):
  print(ex)
    

See the guide for more information on tensorflow_datasets.

Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/coco-2014-1.1.0.png

7. Tiny ImageNet

    • datasets.activeloop.ai
    • huggingface.co
    deeplake
    Updated Apr 2, 2022
    Cite
    Ya Le and Xuan S. Yang (2022). Tiny ImageNet [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/tiny-imagenet-dataset/
    Explore at:
Available download formats: deeplake
    Dataset updated
    Apr 2, 2022
    Authors
    Ya Le and Xuan S. Yang
    License

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Tiny ImageNet Dataset is a dataset of 100,000 tiny (64x64) images of objects. It is a popular dataset for image classification and object detection research. The dataset consists of 200 different classes, each of which has 500 images.
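The deeplake format can be loaded directly with Activeloop's deeplake package. A minimal sketch; the hub path and the tensor names ('images', 'labels') follow Activeloop's usual conventions and are assumptions, so check the dataset page for the exact identifiers:

    import deeplake

    # Assumed hub path, following Activeloop's naming convention.
    ds = deeplake.load("hub://activeloop/tiny-imagenet-train")

    print(len(ds))                # number of samples
    image = ds.images[0].numpy()  # assumed tensor name: images
    label = ds.labels[0].numpy()  # assumed tensor name: labels
    print(image.shape, label)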

8. Data from: ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 27, 2022
    Cite
    Nagappan, Meiyappan (2022). ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5907001
    Explore at:
    Dataset updated
    Jan 27, 2022
    Dataset provided by
    Keshavarz, Hossein
    Nagappan, Meiyappan
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction

    This archive contains the ApacheJIT dataset presented in the paper "ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction" as well as the replication package. The paper is submitted to MSR 2022 Data Showcase Track.

    The datasets are available under directory dataset. There are 4 datasets in this directory.

    1. apachejit_total.csv: This file contains the entire dataset. Commits are specified by their identifier and a set of commit metrics that are explained in the paper are provided as features. Column buggy specifies whether or not the commit introduced any bug into the system.
    2. apachejit_train.csv: This file is a subset of the entire dataset. It provides a balanced set that we recommend for models that are sensitive to class imbalance. This set is obtained from the first 14 years of data (2003 to 2016).
    3. apachejit_test_large.csv: This file is a subset of the entire dataset. The commits in this file are the commits from the last 3 years of data. This set is not balanced to represent a real-life scenario in a JIT model evaluation where the model is trained on historical data to be applied on future data without any modification.
    4. apachejit_test_small.csv: This file is a subset of the test file explained above. Since the test file has more than 30,000 commits, we also provide a smaller test set which is still unbalanced and from the last 3 years of data.
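The four CSV files above load directly with pandas. A minimal sketch; the dataset directory and the buggy column are named in the description, everything else is generic pandas:

    import pandas as pd

    # Commits are rows; commit metrics (see the paper) are the feature columns.
    df = pd.read_csv("dataset/apachejit_total.csv")

    # "buggy" marks whether the commit introduced a bug into the system.
    print(df["buggy"].value_counts(normalize=True))

    # The balanced training set and the large, unbalanced test set.
    train = pd.read_csv("dataset/apachejit_train.csv")
    test = pd.read_csv("dataset/apachejit_test_large.csv")
    print(len(train), len(test))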

In addition to the dataset, we also provide the scripts we used to build the dataset. These scripts are written in Python 3.8; therefore, Python 3.8 or above is required. To set up the environment, we have provided a list of required packages in the file requirements.txt. Additionally, one filtering step requires GumTree [1]. For Java, GumTree requires Java 11. For other languages, external tools are needed. An installation guide and more details can be found here.

The scripts comprise Python scripts under the directory src and Python notebooks under the directory notebooks. The Python scripts are mainly responsible for conducting GitHub search via the GitHub search API and collecting commits through the PyDriller package [2]. The notebooks link the fixed issue reports with their corresponding fixing commits and apply some filtering steps. The bug-inducing candidates are then filtered again using the gumtree.py script, which utilizes the GumTree package. Finally, the remaining bug-inducing candidates are combined with the clean commits in the dataset_construction notebook to form the entire dataset.

More specifically, git_token.py handles the GitHub API token that is necessary for requests to the GitHub API. The script collector.py performs the GitHub search. Tracing changed lines and git annotate is done in gitminer.py using PyDriller. Finally, gumtree.py applies 4 filtering steps (number of lines, number of files, language, and change significance).

    References:

1. GumTree

   Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. 2014. Fine-grained and accurate source code differencing. In ACM/IEEE International Conference on Automated Software Engineering, ASE '14, Vasteras, Sweden - September 15-19, 2014. 313-324.

2. PyDriller

   • https://pydriller.readthedocs.io/en/latest/
   • Davide Spadini, Maurício Aniche, and Alberto Bacchelli. 2018. PyDriller: Python Framework for Mining Software Repositories. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Lake Buena Vista, FL, USA) (ESEC/FSE 2018). Association for Computing Machinery, New York, NY, USA, 908-911.

9. Data from: Sentinel2GlobalLULC: A dataset of Sentinel-2 georeferenced RGB imagery annotated for global land use/land cover mapping with deep learning

    • zenodo.org
    • observatorio-cientifico.ua.es
    text/x-python, zip
    Updated Jul 21, 2023
    Cite
Yassir Benhammou; Domingo Alcaraz-Segura; Emilio Guirado; Rohaifa Khaldi; Siham Tabik (2023). Sentinel2GlobalLULC: A dataset of Sentinel-2 georeferenced RGB imagery annotated for global land use/land cover mapping with deep learning (License CC BY 4.0) [Dataset]. http://doi.org/10.5281/zenodo.6941662
    Explore at:
Available download formats: zip, text/x-python
    Dataset updated
    Jul 21, 2023
    Dataset provided by
Zenodo (http://zenodo.org/)
    Authors
Yassir Benhammou; Domingo Alcaraz-Segura; Emilio Guirado; Rohaifa Khaldi; Siham Tabik
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sentinel2GlobalLULC is a deep learning-ready dataset of RGB images from the Sentinel-2 satellites designed for global land use and land cover (LULC) mapping. Sentinel2GlobalLULC v2.1 contains 194,877 images in GeoTiff and JPEG format corresponding to 29 broad LULC classes. Each image has 224 x 224 pixels at 10 m spatial resolution and was produced by assigning the 25th percentile of all available observations in the Sentinel-2 collection between June 2015 and October 2020 in order to remove atmospheric effects (i.e., clouds, aerosols, shadows, snow, etc.). A spatial purity value was assigned to each image based on the consensus across 15 different global LULC products available in Google Earth Engine (GEE).

    Our dataset is structured into 3 main zip-compressed folders, an Excel file with a dictionary for class names and descriptive statistics per LULC class, and a python script to convert RGB GeoTiff images into JPEG format. The first folder called "Sentinel2LULC_GeoTiff.zip" contains 29 zip-compressed subfolders where each one corresponds to a specific LULC class with hundreds to thousands of GeoTiff Sentinel-2 RGB images. The second folder called "Sentinel2LULC_JPEG.zip" contains 29 zip-compressed subfolders with a JPEG formatted version of the same images provided in the first main folder. The third folder called "Sentinel2LULC_CSV.zip" includes 29 zip-compressed CSV files with as many rows as provided images and with 12 columns containing the following metadata (this same metadata is provided in the image filenames):

    • Land Cover Class ID: is the identification number of each LULC class
    • Land Cover Class Short Name: is the short name of each LULC class
    • Image ID: is the identification number of each image within its corresponding LULC class
    • Pixel purity Value: is the spatial purity of each pixel for its corresponding LULC class calculated as the spatial consensus across up to 15 land-cover products
    • GHM Value: is the spatial average of the Global Human Modification index (gHM) for each image
    • Latitude: is the latitude of the center point of each image
    • Longitude: is the longitude of the center point of each image
• Country Code: is the Alpha-2 country code of each image as described in the ISO 3166 international standard. To understand the country codes, we recommend visiting the following website, which presents the Alpha-2 code for each country as described in the ISO 3166 international standard: https://www.iban.com/country-codes
    • Administrative Department Level1: is the administrative level 1 name to which each image belongs
    • Administrative Department Level2: is the administrative level 2 name to which each image belongs
    • Locality: is the name of the locality to which each image belongs
• Number of S2 images: is the number of instances found in the corresponding Sentinel-2 image collection between June 2015 and October 2020 when compositing and exporting its corresponding image tile
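The dataset already ships a Python script for the GeoTiff-to-JPEG conversion mentioned above; purely as an illustration, such a pass could look like the following sketch, assuming rasterio and Pillow are installed (the bundled script may differ, and the output folder name is hypothetical):

    from pathlib import Path

    import numpy as np
    import rasterio
    from PIL import Image

    src_dir = Path("Sentinel2LULC_GeoTiff")   # unzipped GeoTiff folder
    dst_dir = Path("Sentinel2LULC_JPEG_out")  # hypothetical output folder
    dst_dir.mkdir(exist_ok=True)

    for tif in src_dir.rglob("*.tif"):
        with rasterio.open(tif) as src:
            rgb = src.read()                  # (bands, height, width)
        hwc = np.transpose(rgb, (1, 2, 0))    # to (height, width, bands)
        # Assumes 8-bit RGB bands; rescale first if they are not uint8.
        Image.fromarray(hwc.astype(np.uint8)).save(dst_dir / (tif.stem + ".jpg"))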

    For seven LULC classes, we could not export from GEE all images that fulfilled a spatial purity of 100% since there were millions of them. In this case, we exported a stratified random sample of 14,000 images and provided an additional CSV file with the images actually contained in our dataset. That is, for these seven LULC classes, we provide these 2 CSV files:

    • A CSV file that contains all exported images for this class
    • A CSV file that contains all images available for this class at spatial purity of 100%, both the ones exported and the ones not exported, in case the user wants to export them. These CSV filenames end with "including_non_downloaded_images".

To clearly state the geographical coverage of images available in this dataset, we included in version v2.1 a compressed folder called "Geographic_Representativeness.zip". This zip-compressed folder contains a CSV file for each LULC class that provides the complete list of countries represented in that class. Each CSV file has two columns: the first gives the country code and the second gives the number of images provided in that country for that LULC class. In addition to these 29 CSV files, we provide another CSV file that maps each ISO Alpha-2 country code to its original full country name.

    © Sentinel2GlobalLULC Dataset by Yassir Benhammou, Domingo Alcaraz-Segura, Emilio Guirado, Rohaifa Khaldi, Boujemâa Achchab, Francisco Herrera & Siham Tabik is marked with Attribution 4.0 International (CC-BY 4.0)

  10. Tree Point Classification

    • community-climatesolutions.hub.arcgis.com
    • cacgeoportal.com
    Updated Oct 8, 2020
    Cite
    Esri (2020). Tree Point Classification [Dataset]. https://community-climatesolutions.hub.arcgis.com/content/58d77b24469d4f30b5f68973deb65599
    Explore at:
    Dataset updated
    Oct 8, 2020
    Dataset authored and provided by
Esri (http://esri.com/)
    Description

Classifying trees from point cloud data is useful in applications such as high-quality 3D basemap creation, urban planning, and forestry workflows. Trees have a complex geometrical structure that is hard to capture using traditional means. Deep learning models are highly capable of learning these complex structures and giving superior results.

Using the model

Follow the guide to use the model. The model can be used with the 3D Basemaps solution and ArcGIS Pro's Classify Point Cloud Using Trained Model tool. Before using this model, ensure that the supported deep learning frameworks libraries are installed. For more details, check Deep Learning Libraries Installer for ArcGIS.

Input

The model accepts unclassified point clouds with the attributes: X, Y, Z, and Number of Returns. Note: This model is trained to work on unclassified point clouds that are in a projected coordinate system, where the units of X, Y, and Z are based on the metric system of measurement. If the dataset is in degrees or feet, it needs to be re-projected accordingly. The provided deep learning model was trained using a training dataset with the full set of points. Therefore, it is important to make the full set of points available to the neural network while predicting, allowing it to better discriminate points of the class of interest versus background points. It is recommended to use the selective/target classification and class preservation functionalities during prediction to have better control over the classification. This model was trained on airborne lidar datasets and is expected to perform best with similar datasets. Classification of terrestrial point cloud datasets may work but has not been validated. For such cases, this pre-trained model may be fine-tuned to save on cost, time and compute resources while improving accuracy. When fine-tuning this model, the target training data characteristics such as class structure, maximum number of points per block, and extra attributes should match those of the data originally used for training this model (see Training data section below).

Output

The model will classify the point cloud into the following 2 classes with their meaning as defined by the American Society for Photogrammetry and Remote Sensing (ASPRS) described below:

    Classcode  Class Description
    0          Background
    5          Trees / High-vegetation

Applicable geographies

This model is expected to work well in all regions globally, with an exception of mountainous regions. However, results can vary for datasets that are statistically dissimilar to training data.

Model architecture

This model uses the PointCNN model architecture implemented in ArcGIS API for Python.

Accuracy metrics

The table below summarizes the accuracy of the predictions on the validation dataset.

    Class                        Precision  Recall    F1-score
    Trees / High-vegetation (5)  0.975374   0.965929  0.970628

Training data

This model is trained on a subset of UK Environment Agency's open dataset. The training data used has the following characteristics:

    X, Y and Z linear unit    meter
    Z range                   -19.29 m to 314.23 m
    Number of Returns         1 to 5
    Intensity                 1 to 4092
    Point spacing             0.6 ± 0.3
    Scan angle                -23 to +23
    Maximum points per block  8192
    Extra attributes          Number of Returns
    Class structure           [0, 5]

Sample results

Here are a few results from the model.

11. Caltech-101

    • datasets.activeloop.ai
    • huggingface.co
    deeplake
    Updated Feb 3, 2022
    Cite
    Caltech (2022). Caltech-101 [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/caltech-101-dataset/
    Explore at:
Available download formats: deeplake
    Dataset updated
    Feb 3, 2022
    Dataset authored and provided by
    Caltech
    License

Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0): https://creativecommons.org/licenses/by-nc-nd/3.0/
    License information was derived automatically

    Dataset funded by
    National Science Foundation
    Description

The Caltech-101 dataset is a dataset of 101 categories of objects, each with 30 to 800 images. The images vary in size (roughly 300 x 200 pixels) and are in color. The dataset is used to train and evaluate machine learning models for the task of object recognition.

  12. Power Line Classification

    • morocco.africageoportal.com
    • angola.africageoportal.com
    Updated Dec 15, 2020
    Cite
    Esri (2020). Power Line Classification [Dataset]. https://morocco.africageoportal.com/content/6ce6dae2d62c4037afc3a3abd19afb11
    Explore at:
    Dataset updated
    Dec 15, 2020
    Dataset authored and provided by
Esri (http://esri.com/)
    Description

The classification of point cloud datasets to identify distribution wires is useful for identifying vegetation encroachment around power lines. Such workflows are important for preventing fires and power outages and are typically manual, recurring, and labor-intensive. This model is designed to extract distribution wires at the street level. Its predictions for high-tension transmission wires are less consistent with changes in geography as compared to street-level distribution wires. In the case of high-tension transmission wires, a lower 'recall' value is observed as compared to the value observed for low-lying street wires and poles.

Using the model

Follow the guide to use the model. The model can be used with ArcGIS Pro's Classify Point Cloud Using Trained Model tool. Before using this model, ensure that the supported deep learning libraries are installed. For more details, check Deep Learning Libraries Installer for ArcGIS.

Input

The model accepts unclassified point clouds with point geometry (X, Y and Z values). Note: The model is not dependent on any additional attributes such as Intensity, Number of Returns, etc. This model is trained to work on unclassified point clouds that are in a projected coordinate system, in which the units of X, Y and Z are based on the metric system of measurement. If the dataset is in degrees or feet, it needs to be re-projected accordingly. The model was trained using a training dataset with the full set of points. Therefore, it is important to make the full set of points available to the neural network while predicting, allowing it to better discriminate points of the class of interest versus background points. It is recommended to use the selective/target classification and class preservation functionalities during prediction to have better control over the classification and scenarios with false positives. The model was trained on airborne lidar datasets and is expected to perform best with similar datasets. Classification of terrestrial point cloud datasets may work but has not been validated. For such cases, this pre-trained model may be fine-tuned to save on cost, time, and compute resources while improving accuracy. Another example where fine-tuning this model can be useful is when the object of interest is tram wires, railway wires, etc., which are geometrically similar to electricity wires. When fine-tuning this model, the target training data characteristics such as class structure, maximum number of points per block and extra attributes should match those of the data originally used for training this model (see Training data section below).

Output

The model will classify the point cloud into the following classes with their meaning as defined by the American Society for Photogrammetry and Remote Sensing (ASPRS) described below:

    Classcode  Class Description
    0          Background Class
    14         Distribution Wires
    15         Distribution Tower/Poles

Applicable geographies

The model is expected to work within any geography. It's seen to produce favorable results as shown here in many regions. However, results can vary for datasets that are statistically dissimilar to training data.

Model architecture

This model uses the RandLANet model architecture implemented in ArcGIS API for Python.

Accuracy metrics

The table below summarizes the accuracy of the predictions on the validation dataset.

    Class                    Precision  Recall    F1-score
    Background (0)           0.999679   0.999876  0.999778
    Distribution Wires (14)  0.955085   0.936825  0.945867
    Distribution Poles (15)  0.707983   0.553888  0.621527

Training data

This model is trained on a manually classified training dataset provided to Esri by AAM Group. The training data used has the following characteristics:

    X, Y, and Z linear unit   meter
    Z range                   -240.34 m to 731.17 m
    Number of Returns         1 to 5
    Intensity                 1 to 4095
    Point spacing             0.2 ± 0.1
    Scan angle                -42 to +35
    Maximum points per block  20000
    Extra attributes          None
    Class structure           [0, 14, 15]

Sample results

Here are a few results from the model.

13. clinc_oos

    • tensorflow.org
    • opendatalab.com
    Updated Dec 6, 2022
    Cite
    (2022). clinc_oos [Dataset]. https://www.tensorflow.org/datasets/catalog/clinc_oos
    Explore at:
    Dataset updated
    Dec 6, 2022
    Description

    Task-oriented dialog systems need to know when a query falls outside their range of supported intents, but current text classification corpora only define label sets that cover every example. We introduce a new dataset that includes queries that are out-of-scope (OOS), i.e., queries that do not fall into any of the system's supported intents. This poses a new challenge because models cannot assume that every query at inference time belongs to a system-supported intent class. Our dataset also covers 150 intent classes over 10 domains, capturing the breadth that a production task-oriented agent must handle. It offers a way of more rigorously and realistically benchmarking text classification in task-driven dialog systems.

    To use this dataset:

import tensorflow_datasets as tfds

# Load the training split and preview a few examples.
ds = tfds.load('clinc_oos', split='train')
for ex in ds.take(4):
  print(ex)
    

See the guide for more information on tensorflow_datasets.

14. vqa

    • huggingface.co
    Updated Feb 21, 2023
    Cite
    Korea Electronics Technology Institute Artificial Intelligence Research Center (2023). vqa [Dataset]. https://huggingface.co/datasets/KETI-AIR/vqa
    Explore at:
    Dataset updated
    Feb 21, 2023
    Dataset authored and provided by
    Korea Electronics Technology Institute Artificial Intelligence Research Center
    License

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    VQA

    What is VQA?

VQA is a new dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer.

• 265,016 images (COCO and abstract scenes)
• At least 3 questions (5.4 questions on average) per image
• 10 ground truth answers per question
• 3 plausible (but likely incorrect) answers per question
• Automatic evaluation metric

    Dataset

    Details on downloading the latest dataset may be found on the download webpage.

    Usage

from datasets import load_dataset

raw_datasets = load_dataset(
        "vqa.py",
        "base",
        cache_dir="huggingface_datasets",
        data_dir="data",
        ignore_verifications=True,
      )

dataset_train = raw_datasets["train"]

# Preview the first training example.
for item in dataset_train:
  print(item)
  break

v2 = v2.real + v2.abstract (v2.abstract == v1.abstract)
v1 = v1.real + v1.abstract
v2.abstract.balanced.bin

  15. DATS 6401 - Final Project - Yon ho Cheong.zip

    • figshare.com
    zip
    Updated Dec 15, 2018
    Cite
    Yon ho Cheong (2018). DATS 6401 - Final Project - Yon ho Cheong.zip [Dataset]. http://doi.org/10.6084/m9.figshare.7471007.v1
    Explore at:
Available download formats: zip
    Dataset updated
    Dec 15, 2018
    Dataset provided by
    figshare
    Authors
    Yon ho Cheong
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

Abstract

The H1B is an employment-based visa category for temporary foreign workers in the United States. Every year, the US immigration department receives over 200,000 petitions and selects 85,000 applications through a random process; the U.S. employer must submit a petition for an H1B visa to the US immigration department. This is the most common visa status applied for by international students once they complete college or higher education and begin working in a full-time position. The project provides essential information on job titles, preferred regions of settlement, and foreign applicants' and employers' trends for H1B visa applications. Since locations, employers, job titles and salary ranges make up most of the H1B petitions, different visualization tools will be used to analyze and interpret the trends of the H1B visa and provide a recommendation to the applicant. This report is the basis of the project for the Visualization of Complex Data class at the George Washington University; some examples in this project include an analysis of the relevant variables (Case Status, Employer Name, SOC Name, Job Title, Prevailing Wage, Worksite, and Latitude and Longitude information) from Kaggle and the Office of Foreign Labor Certification (OFLC) in order to see the H1B visa changes over the past several decades.

Keywords: H1B visa, Data Analysis, Visualization of Complex Data, HTML, JavaScript, CSS, Tableau, D3.js

Dataset

The dataset contains 10 columns and covers a total of 3 million records spanning 2011-2016. The relevant columns in the dataset include case status, employer name, SOC name, job title, full time position, prevailing wage, year, worksite, and latitude and longitude information.

Link to dataset: https://www.kaggle.com/nsharan/h-1b-visa
Link to dataset (FY2017): https://www.foreignlaborcert.doleta.gov/performancedata.cfm

Running the code

Open Index.html

Data Processing

• Do some data preprocessing to transform the raw data into an understandable format.
• Find and combine other external datasets to enrich the analysis, such as the FY2017 dataset.
• To make appropriate visualizations, variables should be developed and compiled into visualization programs.
• Draw a geo map and scatter plot to compare the fastest growth in fixed value and in percentages.
• Extract some aspects and analyze the changes in employers' preferences as well as forecasts for future trends.

Visualizations

• Combo chart: shows the overall volume of receipts and the approval rate.
• Scatter plot: shows the beneficiary country of birth.
• Geo map: shows all states of H1B petitions filed.
• Line chart: shows the top 10 states of H1B petitions filed.
• Pie chart: shows a comparison of education level and occupations for petitions, FY2011 vs FY2017.
• Tree map: shows the top employers who submit the greatest number of applications.
• Side-by-side bar chart: shows an overall comparison of Data Scientist and Data Analyst.
• Highlight table: shows the mean wage of a Data Scientist and Data Analyst with case status certified.
• Bubble chart: shows the top 10 companies for Data Scientist and Data Analyst.

Related Research

• The H-1B Visa Debate, Explained (Harvard Business Review): https://hbr.org/2017/05/the-h-1b-visa-debate-explained
• Foreign Labor Certification Data Center: https://www.foreignlaborcert.doleta.gov
• Key facts about the U.S. H-1B visa program: http://www.pewresearch.org/fact-tank/2017/04/27/key-facts-about-the-u-s-h-1b-visa-program/
• H1B visa News and Updates from The Economic Times: https://economictimes.indiatimes.com/topic/H1B-visa/news
• H-1B visa (Wikipedia): https://en.wikipedia.org/wiki/H-1B_visa

Key Findings

• From the analysis, the government cut down the number of approvals for H1B in 2017.
• In the past decade, due to the nature of demand for high-skilled workers, visa holders have clustered in STEM fields and come mostly from countries in Asia such as China and India.
• Technical jobs fill up the majority of the top 10 jobs among foreign workers, such as Computer Systems Analyst and Software Developer.
• Employers located in metro areas strive to find a foreign workforce who can fill the technical positions in their organizations.
• States like California, New York, Washington, New Jersey, Massachusetts, Illinois, and Texas are the prime locations for foreign workers and provide many job opportunities.
• Top companies such as Infosys, Tata, and IBM India, which submit the most H1B visa applications, are companies based in India associated with software and IT services.
• The Data Scientist position has experienced exponential growth in terms of H1B visa applications, and jobs are clustered in the West region with the highest numbers.

Visualization utilizing programs

HTML, JavaScript, CSS, D3.js, Google API, Python, R, and Tableau

16. A Semantically Annotated 15-Class Ground Truth Dataset for Substation Equipment

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 5, 2023
    Cite
    Gomes, Andreas (2023). A Semantically Annotated 15-Class Ground Truth Dataset for Substation Equipment [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7884269
    Explore at:
    Dataset updated
    May 5, 2023
    Dataset authored and provided by
    Gomes, Andreas
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains 1660 images of electric substations with 50705 annotated objects. The images were obtained using different cameras, including cameras mounted on Autonomous Guided Vehicles (AGVs), fixed location cameras and those captured by humans using a variety of cameras. A total of 15 classes of objects were identified in this dataset, and the number of instances for each class is provided in the following table:

Object classes and how many times they appear in the dataset:

    Class                            Instances
    Open blade disconnect            310
    Closed blade disconnect switch   5243
    Open tandem disconnect switch    1599
    Closed tandem disconnect switch  966
    Breaker                          980
    Fuse disconnect switch           355
    Glass disc insulator             3185
    Porcelain pin insulator          26499
    Muffle                           1354
    Lightning arrester               1976
    Recloser                         2331
    Power transformer                768
    Current transformer              2136
    Potential transformer            654
    Tripolar disconnect switch       2349

    All images in this dataset were collected from a single electrical distribution substation in Brazil over a period of two years. The images were captured at various times of the day and under different weather and seasonal conditions, ensuring a diverse range of lighting conditions for the depicted objects. A team of experts in Electrical Engineering curated all the images to ensure that the angles and distances depicted in the images are suitable for automating inspections in an electrical substation.

    The file structure of this dataset contains the following directories and files:

    images: This directory contains 1660 electrical substation images in JPEG format.

    labels_json: This directory contains JSON files annotated in the VOC-style polygonal format. Each file shares the same filename as its respective image in the images directory.

    15_masks: This directory contains PNG segmentation masks for all 15 classes, including the porcelain pin insulator class. Each file shares the same name as its corresponding image in the images directory.

    14_masks: This directory contains PNG segmentation masks for all classes except the porcelain pin insulator. Each file shares the same name as its corresponding image in the images directory.

    porcelain_masks: This directory contains PNG segmentation masks for the porcelain pin insulator class. Each file shares the same name as its corresponding image in the images directory.

    classes.txt: This text file lists the 15 classes plus the background class used in LabelMe.

    json2png.py: This Python script can be used to generate segmentation masks using the VOC-style polygonal JSON annotations.
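As an illustration of what that conversion involves, a polygon-to-mask pass could look like the sketch below; the JSON field names (shapes, points, imageWidth, imageHeight) assume LabelMe-style annotations, and the class mapping here is hypothetical, since the authoritative logic is the bundled json2png.py and classes.txt:

    import json
    from pathlib import Path

    from PIL import Image, ImageDraw

    # Hypothetical mapping; the real class order comes from classes.txt.
    CLASS_IDS = {"background": 0, "breaker": 1, "recloser": 2}

    out_dir = Path("masks_out")  # hypothetical output folder
    out_dir.mkdir(exist_ok=True)

    for json_path in Path("labels_json").glob("*.json"):
        ann = json.loads(json_path.read_text())
        mask = Image.new("L", (ann["imageWidth"], ann["imageHeight"]), 0)
        draw = ImageDraw.Draw(mask)
        for shape in ann.get("shapes", []):
            polygon = [tuple(pt) for pt in shape["points"]]
            # Paint the class id of each labeled polygon into the mask.
            draw.polygon(polygon, fill=CLASS_IDS.get(shape["label"], 0))
        mask.save(out_dir / (json_path.stem + ".png"))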

    The dataset aims to support the development of computer vision techniques and deep learning algorithms for automating the inspection process of electrical substations. The dataset is expected to be useful for researchers, practitioners, and engineers interested in developing and testing object detection and segmentation models for automating inspection and maintenance activities in electrical substations.

    The authors would like to thank UTFPR for the support and infrastructure made available for the development of this research and COPEL-DIS for the support through project PD-2866-0528/2020—Development of a Methodology for Automatic Analysis of Thermal Images. We also would like to express our deepest appreciation to the team of annotators who worked diligently to produce the semantic labels for our dataset. Their hard work, dedication and attention to detail were critical to the success of this project.

  17. UCI and OpenML Data Sets for Ordinal Quantification

    • zenodo.org
    zip
    Updated Jul 25, 2023
    Cite
Mirko Bunse; Alejandro Moreo; Fabrizio Sebastiani; Martin Senz (2023). UCI and OpenML Data Sets for Ordinal Quantification [Dataset]. http://doi.org/10.5281/zenodo.8177302
    Explore at:
Available download formats: zip
    Dataset updated
    Jul 25, 2023
    Dataset provided by
Zenodo (http://zenodo.org/)
    Authors
Mirko Bunse; Alejandro Moreo; Fabrizio Sebastiani; Martin Senz
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These four labeled data sets are targeted at ordinal quantification. The goal of quantification is not to predict the label of each individual instance, but the distribution of labels in unlabeled sets of data.

    With the scripts provided, you can extract CSV files from the UCI machine learning repository and from OpenML. The ordinal class labels stem from a binning of a continuous regression label.

    We complement this data set with the indices of data items that appear in each sample of our evaluation. Hence, you can precisely replicate our samples by drawing the specified data items. The indices stem from two evaluation protocols that are well suited for ordinal quantification. To this end, each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, and app-oq_tst_indices.csv represents one sample.

    Our first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with an equal probability. The second protocol, APP-OQ, is a variant thereof, where only the smoothest 20% of all APP samples are considered. This variant is targeted at ordinal quantification tasks, where classes are ordered and a similarity of neighboring classes can be assumed.
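Replicating one evaluation sample then amounts to indexing the extracted CSV with one row of an index file. A minimal sketch with pandas, assuming the index files have no header row (the extracted data file name below is a placeholder):

    import pandas as pd

    data = pd.read_csv("extracted_data.csv")  # placeholder for one extracted CSV
    # Each row of the index file lists the data items of one evaluation sample.
    indices = pd.read_csv("app_val_indices.csv", header=None)

    sample_0 = data.iloc[indices.iloc[0].dropna().astype(int)]
    # The label distribution of this sample is the quantification target.
    print(sample_0["class_label"].value_counts())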

    Usage

    You can extract four CSV files through the provided script extract-oq.jl, which is conveniently wrapped in a Makefile. The Project.toml and Manifest.toml specify the Julia package dependencies, similar to a requirements file in Python.

Preliminaries: You need a working Julia installation. We used Julia v1.6.5 in our experiments.

    Data Extraction: In your terminal, you can call either

    make

    (recommended), or

    julia --project="." --eval "using Pkg; Pkg.instantiate()"
    julia --project="." extract-oq.jl

    Outcome: The first row in each CSV file is the header. The first column, named "class_label", is the ordinal class.

    Further Reading

    Implementation of our experiments: https://github.com/mirkobunse/regularized-oq

  18. Glass Binary Data

    • figshare.com
    application/gzip
    Updated Jun 3, 2023
    Cite
    Y. T. Sun; H. Y. Bai; M. Z. Li; W. H. Wang; Hacking Materials (2023). Glass Binary Data [Dataset]. http://doi.org/10.6084/m9.figshare.7268507.v2
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    figshare
    Authors
    Y. T. Sun; H. Y. Bai; M. Z. Li; W. H. Wang; Hacking Materials
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Metallic glass formation data for binary alloys, collected from experimental techniques such as melt-spinning and mechanical alloying. The dataset covers all compositions at 5 at.% intervals in 59 binary systems, for a total of 5959 alloys. The target property is the glass-forming ability (GFA), i.e. whether the composition can form a monolithic glass, encoded as 1 (glass forming) or 0 (non-full glass forming).

    The V2 version of this dataset has been cleaned to remove duplicate data points: any entries with an identical formula and both negative and positive GFA classes were combined into a single entry with a positive GFA class.

    Data is available as Monty-Encoder-encoded JSON and as the source CSV file. The recommended access method is the matminer Python package, via its datasets module.

    Note on citations: If you found this dataset useful and would like to cite it in your work, please cite its original source below rather than, or in addition to, this page.

    Dataset discussed in: Y. T. Sun, H. Y. Bai, M. Z. Li, and W. H. Wang, "Machine Learning Approach for Prediction and Understanding of Glass-Forming Ability", J. Phys. Chem. Lett., 2017, 8 (14), pp 3434–3439. DOI: 10.1021/acs.jpclett.7b01046. Published online July 11, 2017. Author affiliations: Institute of Physics, Chinese Academy of Sciences; Department of Physics, Beijing Key Laboratory of Optoelectronic Functional Materials & Micro-nano Devices, Renmin University of China; University of Chinese Academy of Sciences.
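    For example, access through matminer might look like the following sketch; the dataset key "glass_binary_v2" and the "gfa" column name are assumptions based on matminer's dataset registry.

    # Hedged sketch: load the cleaned V2 dataset via matminer's datasets module.
    from matminer.datasets import load_dataset

    df = load_dataset("glass_binary_v2")  # assumed dataset key
    print(df.head())                 # expected columns: formula, gfa
    print(df["gfa"].value_counts())  # 1 = glass forming, 0 = non-full glass forming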

  19. Tree Point Classification - New Zealand

    • pacificgeoportal.com
    • geoportal-pacificcore.hub.arcgis.com
    • +1more
    Updated Jul 25, 2022
    Cite
    Eagle Technology Group Ltd (2022). Tree Point Classification - New Zealand [Dataset]. https://www.pacificgeoportal.com/content/0e2e3d0d0ef843e690169cac2f5620f9
    Explore at:
    Dataset updated
    Jul 25, 2022
    Dataset authored and provided by
    Eagle Technology Group Ltd
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New Zealand
    Description

    This New Zealand Point Cloud Classification Deep Learning Package classifies point clouds into tree and background classes. The model is optimized for New Zealand aerial LiDAR data. Classifying point clouds to identify trees is useful in applications such as high-quality 3D basemap creation, urban planning, forestry workflows, and planning climate change response. Trees can have a complex, irregular geometric structure that is hard to capture by traditional means; deep learning models are highly capable of learning such structures and give superior results. The model is designed to extract trees in both urban and rural areas of New Zealand, and the training, testing, and validation datasets were all captured within New Zealand, so it reliably recognizes patterns common to New Zealand scenes.

    Licensing requirements: ArcGIS Desktop - ArcGIS 3D Analyst extension for ArcGIS Pro.

    Using the model: The model can be used in ArcGIS Pro's Classify Point Cloud Using Trained Model tool. Before using it, ensure that the supported deep learning framework libraries are installed; for details, see the Deep Learning Libraries Installer for ArcGIS. Note: deep learning is computationally intensive, and a powerful GPU is recommended for processing large datasets.

    Input: The model is trained on classified LiDAR that follows the LINZ base specification, and input data should be similar to that specification. Note that the model depends on additional attributes such as Intensity and Number of Returns, as in the LINZ base specification. It works on classified and unclassified point clouds in a projected coordinate system whose X, Y, and Z units are metric; datasets in degrees or feet must be re-projected accordingly. Because the model was trained with the full set of points, the full set of points should also be made available to the neural network at prediction time, allowing it to better discriminate points of the class of interest from background points. Using the selective/target classification and class preservation functionalities during prediction gives better control over the classification and over scenarios with false positives. The model was trained on airborne LiDAR datasets and is expected to perform best on similar data; classification of terrestrial point clouds may work but has not been validated. For such cases, this pre-trained model may be fine-tuned to save cost, time, and compute resources while improving accuracy. When fine-tuning, the target training data characteristics, such as class structure, maximum number of points per block, and extra attributes, should match those of the data originally used to train this model (see the Training data section below); a hedged fine-tuning sketch follows this description.

    Output: The model classifies the point cloud into the following classes, with meanings as defined by the American Society for Photogrammetry and Remote Sensing (ASPRS):

      0 Background
      5 Trees / High vegetation

    Applicable geographies: The model is expected to work well in New Zealand and has produced favorable results in many regions. However, results can vary for datasets that are statistically dissimilar to the training data.

      Dataset    City
      Training   Wellington
      Testing    Tawa
      Validating Christchurch

    Model architecture: This model uses the PointCNN architecture implemented in the ArcGIS API for Python.

    Accuracy metrics: The table below summarizes the accuracy of the predictions on the validation dataset.

      Class             Precision  Recall    F1-score
      Never Classified  0.991200   0.975404  0.983239
      High Vegetation   0.933569   0.975559  0.954102

    Training data: The model was trained on a classified dataset originally provided by OpenTopography, with less than 1% manual labelling and correction. The train/test split was 80%/20%, a ratio chosen from previous epoch statistics, which showed a decent improvement. The training data has the following characteristics:

      X, Y, and Z linear unit   Meter
      Z range                   -121.69 m to 26.84 m
      Number of Returns         1 to 5
      Intensity                 16 to 65520
      Point spacing             0.2 ± 0.1 m
      Scan angle                -15 to +15 degrees
      Maximum points per block  8192
      Block size                20 meters
      Class structure           [0, 5]

    Sample results: The model was used to classify the Christchurch city dataset at a density of 5 pts/m². The model's performance is directly proportional to the point density of the dataset and to how well noise has been excluded from the point clouds. To learn how to use this model, see the accompanying story.
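    Because the package is built on the PointCNN implementation in the ArcGIS API for Python, fine-tuning can be sketched roughly as follows. This is a hedged sketch, not the vendor's documented workflow: the paths are placeholders, and the training blocks are assumed to have been exported with ArcGIS Pro's Prepare Point Cloud Training Data tool.

    # Hedged sketch of fine-tuning the pre-trained tree classifier with
    # the ArcGIS API for Python; paths and hyperparameters are placeholders.
    from arcgis.learn import prepare_data, PointCNN

    # Exported point cloud training blocks.
    data = prepare_data(r"C:\data\nz_tree_blocks",
                        dataset_type="PointCloud",
                        batch_size=2)

    # Load the pre-trained package and continue training on local data.
    model = PointCNN.from_model(r"C:\models\TreePointClassification.dlpk", data)
    model.fit(epochs=10, lr=0.001)
    model.save("nz_trees_finetuned")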

  20. Data from: YCB-M: A Multi-Camera RGB-D Dataset for Object Recognition and 6DoF Pose Estimation

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    application/gzip, bin +1
    Updated Apr 27, 2020
    Cite
    Till Grenzdörffer; Martin Günther; Joachim Hertzberg (2020). YCB-M: A Multi-Camera RGB-D Dataset for Object Recognition and 6DoF Pose Estimation [Dataset]. http://doi.org/10.5281/zenodo.2579173
    Explore at:
    Available download formats: application/gzip, mp4, bin
    Dataset updated
    Apr 27, 2020
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Till Grenzdörffer; Martin Günther; Joachim Hertzberg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    While a great variety of 3D cameras have been introduced in recent years, most publicly available datasets for object recognition and pose estimation focus on a single camera. This dataset consists of 32 scenes captured by 7 different 3D cameras, totaling 49,294 frames. It allows evaluating the sensitivity of pose estimation algorithms to the specifics of the camera used, and supports the development of algorithms that are more robust to the choice of camera model. Vice versa, our dataset enables researchers to perform a quantitative comparison of the data from several different cameras and depth-sensing technologies, and to evaluate their algorithms before selecting a camera for their specific task. The scenes in our dataset contain 20 different objects from the common YCB object and model set benchmark. We provide full ground-truth 6DoF poses for each object, per-pixel segmentation, 2D and 3D bounding boxes, and a measure of the amount of occlusion of each object.

    If you use this dataset in your research, please cite the following publication:

    T. Grenzdörffer, M. Günther, and J. Hertzberg, “YCB-M: A Multi-Camera RGB-D Dataset for Object Recognition and 6DoF Pose Estimation,” in 2020 IEEE International Conference on Robotics and Automation, ICRA 2020, Paris, France, May 31-June 4, 2020. IEEE, 2020.

    @InProceedings{Grenzdoerffer2020ycbm,
     title = {{YCB-M}: A Multi-Camera {RGB-D} Dataset for Object Recognition and {6DoF} Pose Estimation},
     author = {Grenzd{\"{o}}rffer, Till and G{\"{u}}nther, Martin and Hertzberg, Joachim},
     booktitle = {2020 {IEEE} International Conference on Robotics and Automation, {ICRA} 2020, Paris, France, May 31-June 4, 2020},
     year = {2020},
     publisher = {{IEEE}}
    }

    This paper is also available on arXiv: https://arxiv.org/abs/2004.11657

    To visualize the dataset, follow these instructions (tested on Ubuntu Xenial 16.04):

    # IMPORTANT: the ROS setup.bash must NOT be sourced, otherwise the following error occurs:
    # ImportError: /opt/ros/kinetic/lib/python2.7/dist-packages/cv2.so: undefined symbol: PyCObject_Type
    
    # nvdu requires Python 3.5 or 3.6
    sudo add-apt-repository -y ppa:deadsnakes/ppa  # to get python3.6 on Ubuntu Xenial
    sudo apt-get update
    sudo apt-get install -y python3.6 libsm6 libxext6 libxrender1 python-virtualenv python-pip
    
    # create a new virtual environment
    virtualenv -p python3.6 venv_nvdu
    cd venv_nvdu/
    source bin/activate
    
    # clone our fork of NVIDIA's Dataset Utilities that incorporates some essential fixes
    pip install -e 'git+https://github.com/mintar/Dataset_Utilities.git#egg=nvdu'
    
    # download and transform the meshes
    # (alternatively, unzip the meshes contained in the dataset
    # to 

    For further details, see README.md.
