Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Python_class is a dataset for object detection tasks - it contains Atk annotations for 967 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Imagenette is a subset of 10 easily classified classes from the Imagenet dataset. It was originally prepared by Jeremy Howard of FastAI. The objective behind putting together a small version of the Imagenet dataset was mainly that running new ideas/algorithms/experiments on the whole Imagenet takes a lot of time.
This version of the dataset allows researchers/practitioners to quickly try out ideas and share with others. The dataset comes in three variants:
Note: The v2 config corresponds to the new 70/30 train/valid split (released on Dec 6, 2019).
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imagenette', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/imagenette-full-size-v2-1.0.0.png
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The goal of this task is to train a model that can localize and classify each instance of Person and Car as accurately as possible.
from IPython.display import Markdown, display
display(Markdown(filename="../input/Car-Person-v2-Roboflow/README.roboflow.txt"))  # render the dataset README file
In this notebook, I have processed the images with Roboflow because the COCO-formatted dataset had images of different dimensions and was not split into separate sets. To train a custom YOLOv7 model, we need to recognize the objects in the dataset. To do so, I have taken the following steps:
Image Credit - jinfagang
!git clone https://github.com/WongKinYiu/yolov7 # Downloading YOLOv7 repository and installing requirements
%cd yolov7
!pip install -qr requirements.txt
!pip install -q roboflow
!wget "https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt"
import os
import glob
import wandb
import torch
from roboflow import Roboflow
from kaggle_secrets import UserSecretsClient
from IPython.display import Image, clear_output, display # to display images
print(f"Setup complete. Using torch {torch._version_} ({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")
I will be integrating W&B for visualizations, logging artifacts, and comparing different models!
try:
    user_secrets = UserSecretsClient()
    wandb_api_key = user_secrets.get_secret("wandb_api")
    wandb.login(key=wandb_api_key)
    anonymous = None
except:
    wandb.login(anonymous='must')
    print('To use your W&B account, '
          'go to Add-ons -> Secrets and provide your W&B access token. Use the Label name as WANDB. '
          'Get your W&B access token from here: https://wandb.ai/authorize')
wandb.init(project="YOLOvR",name=f"7. YOLOv7-Car-Person-Custom-Run-7")
Computer vision cycle: https://uploads-ssl.webflow.com/5f6bc60e665f54545a1e52a5/615627e5824c9c6195abfda9_computer-vision-cycle.png
In order to train our custom model, we need to assemble a dataset of representative images with bounding box annotations around the objects that we want to detect. And we need our dataset to be in YOLOv7 format.
In Roboflow, we can choose between two paths:
Roboflow screenshot: https://raw.githubusercontent.com/Owaiskhan9654/Yolo-V7-Custom-Dataset-Train-on-Kaggle/main/Roboflow.PNG
user_secrets = UserSecretsClient()
roboflow_api_key = user_secrets.get_secret("roboflow_api")
rf = Roboflow(api_key=roboflow_api_key)
project = rf.workspace("owais-ahmad").project("custom-yolov7-on-kaggle-on-custom-dataset-rakiq")
dataset = project.version(2).download("yolov7")
Here, I am able to pass a number of arguments:
- img: define input image size
- batch: determine batch size
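As a rough sketch of the resulting training command (flag names follow the YOLOv7 repository's train.py; the image size, batch size, epoch count, and run name below are placeholder values to adjust for your data and hardware):

```
# Hypothetical training invocation -- adjust the placeholder values for your setup.
!python train.py --img-size 640 --batch-size 16 --epochs 55 \
    --data {dataset.location}/data.yaml --weights yolov7.pt \
    --device 0 --name yolov7-car-person-custom
```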
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A list of different projects selected to analyze class comments (available in the source code) of various languages such as Java, Python, and Pharo. The projects vary in terms of size, contributors, and domain.
## Structure
```
Projects/
    Java_projects/
        eclipse.zip
        guava.zip
        guice.zip
        hadoop.zip
        spark.zip
        vaadin.zip
    Pharo_projects/
        images/
            GToolkit.zip
            Moose.zip
            PetitParser.zip
            Pillar.zip
            PolyMath.zip
            Roassal2.zip
            Seaside.zip
        vm/
            70-x64/Pharo
        Scripts/
            ClassCommentExtraction.st
            SampleSelectionScript.st
    Python_projects/
        django.zip
        ipython.zip
        Mailpile.zip
        pandas.zip
        pipenv.zip
        pytorch.zip
        requests.zip
```
## Contents of the Replication Package
---
**Projects/** contains the raw projects of each language that are used to analyze class comments.
- **Java_projects/**
- `eclipse.zip` - Eclipse project downloaded from GitHub. More detail about the project is available on GitHub: [Eclipse](https://github.com/eclipse).
- `guava.zip` - Guava project downloaded from GitHub. More detail about the project is available on GitHub: [Guava](https://github.com/google/guava).
- `guice.zip` - Guice project downloaded from GitHub. More detail about the project is available on GitHub: [Guice](https://github.com/google/guice).
- `hadoop.zip` - Apache Hadoop project downloaded from GitHub. More detail about the project is available on GitHub: [Apache Hadoop](https://github.com/apache/hadoop).
- `spark.zip` - Apache Spark project downloaded from GitHub. More detail about the project is available on GitHub: [Apache Spark](https://github.com/apache/spark).
- `vaadin.zip` - Vaadin project downloaded from GitHub. More detail about the project is available on GitHub: [Vaadin](https://github.com/vaadin/framework).
- **Pharo_projects/**
- **images/** -
- `GToolkit.zip` - Gtoolkit project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- `Moose.zip` - Moose project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- `PetitParser.zip` - Petit Parser project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- `Pillar.zip` - Pillar project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- `PolyMath.zip` - PolyMath project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- `Roassal2.zip` - Roassal2 project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- `Seaside.zip` - Seaside project is imported into the Pharo image. We can run this image with the virtual machine given in the `vm/` folder. The script to extract the comments is already provided in the image.
- **vm/** -
- **70-x64/Pharo** - Pharo7 (version 7 of Pharo) virtual machine to instantiate the Pharo images given in the `images/` folder. The user can run the VM on macOS and select any of the Pharo images.
- **Scripts/** - It contains the sample Smalltalk scripts to extract class comments from various projects.
- `ClassCommentExtraction.st` - A Smalltalk script to show how class comments are extracted from various Pharo projects. This script is already provided in the respective project image.
- `SampleSelectionScript.st` - A Smalltalk script showing how sample class comments of Pharo projects are selected. This script can be run in any of the Pharo images given in the `images/` folder.
- **Python_projects/**
- `django.zip` - Django project downloaded from GitHub. More detail about the project is available on GitHub: [Django](https://github.com/django).
- `ipython.zip` - IPython project downloaded from GitHub. More detail about the project is available on GitHub: [IPython](https://github.com/ipython/ipython).
- `Mailpile.zip` - Mailpile project downloaded from GitHub. More detail about the project is available on GitHub: [Mailpile](https://github.com/mailpile/Mailpile).
- `pandas.zip` - pandas project downloaded from GitHub. More detail about the project is available on GitHub: [pandas](https://github.com/pandas-dev/pandas).
- `pipenv.zip` - Pipenv project downloaded from GitHub. More detail about the project is available on GitHub: [Pipenv](https://github.com/pypa/pipenv).
- `pytorch.zip` - PyTorch project downloaded from GitHub. More detail about the project is available on GitHub: [PyTorch](https://github.com/pytorch/pytorch).
- `requests.zip` - Requests project downloaded from GitHub. More detail about the project is available on GitHub: [Requests](https://github.com/psf/requests/).
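For the Python projects, class comments correspond to class docstrings in the source code. As a minimal sketch of how they could be pulled from one of the unzipped projects with the standard library (the archive and output paths are placeholders, and this is not the extraction tooling used by the authors):

```python
import ast
import zipfile
from pathlib import Path

# Placeholder paths -- adjust to wherever the replication package is unpacked.
archive = Path("Projects/Python_projects/requests.zip")
target_dir = Path("unzipped/requests")

with zipfile.ZipFile(archive) as zf:
    zf.extractall(target_dir)

# Walk the extracted sources and collect class docstrings ("class comments").
for py_file in target_dir.rglob("*.py"):
    try:
        tree = ast.parse(py_file.read_text(encoding="utf-8", errors="ignore"))
    except SyntaxError:
        continue  # skip files written for another Python version
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            doc = ast.get_docstring(node)
            if doc:
                print(f"{py_file}:{node.lineno} {node.name}: {doc.splitlines()[0]}")
```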
**Data Description**
The CADDI dataset is designed to support research in in-class activity recognition using IMU data from low-cost sensors. It provides multimodal data capturing 19 different activities performed by 12 participants in a classroom environment, utilizing both IMU sensors from a Samsung Galaxy Watch 5 and synchronized stereo camera images. This dataset enables the development and validation of activity recognition models using sensor fusion techniques.

**Data Generation Procedures**
The data collection process involved recording both continuous and instantaneous activities that typically occur in a classroom setting. The activities were captured using a custom setup, which included:
- A Samsung Galaxy Watch 5 to collect accelerometer, gyroscope, and rotation vector data at 100 Hz.
- A ZED stereo camera capturing 1080p images at 25-30 fps.
- A synchronized computer acting as a data hub, receiving IMU data and storing images in real time.
- A D-Link DSR-1000AC router for wireless communication between the smartwatch and the computer.

Participants were instructed to arrange their workspace as they would in a real classroom, including a laptop, notebook, pens, and a backpack. Data collection was performed under realistic conditions, ensuring that activities were captured naturally.

**Temporal and Spatial Scope**
- The dataset contains a total of 472.03 minutes of recorded data.
- The IMU sensors operate at 100 Hz, while the stereo camera captures images at 25-30 Hz.
- Data was collected from 12 participants, each performing all 19 activities multiple times.
- The geographical scope of data collection was Alicante, Spain, under controlled indoor conditions.

**Dataset Components**
The dataset is organized into JSON and PNG files, structured hierarchically:
- IMU data, stored in JSON files, containing:
  - Samsung Linear Acceleration Sensor (X, Y, Z values, 100 Hz)
  - LSM6DSO Gyroscope (X, Y, Z values, 100 Hz)
  - Samsung Rotation Vector (X, Y, Z, W quaternion values, 100 Hz)
  - Samsung HR Sensor (heart rate, 1 Hz)
  - OPT3007 Light Sensor (ambient light levels, 5 Hz)
- Stereo camera images: high-resolution 1920×1080 PNG files from left and right cameras.
- Synchronization: each IMU data record and image is timestamped for precise alignment.

**Data Structure**
The dataset is divided into continuous and instantaneous activities:
- Continuous activities (e.g., typing, writing, drawing) were recorded for 210 seconds, with the central 200 seconds retained.
- Instantaneous activities (e.g., raising a hand, drinking) were repeated 20 times per participant, with data captured only during execution.

The dataset is structured as:
- /continuous/subject_id/activity_name/
  - /camera_a/ → left camera images
  - /camera_b/ → right camera images
  - /sensors/ → JSON files with IMU data
- /instantaneous/subject_id/activity_name/repetition_id/
  - /camera_a/
  - /camera_b/
  - /sensors/

**Data Quality & Missing Data**
- The smartwatch buffers 100 readings per second before sending them, ensuring minimal data loss.
- Synchronization latency between the smartwatch and the computer is negligible.
- Not all IMU samples have corresponding images due to different recording rates.
- Outliers and anomalies were handled by discarding incomplete sequences at the start and end of continuous activities.

**Error Ranges & Limitations**
- Sensor data may contain noise due to minor hand movements.
- The heart rate sensor operates at 1 Hz, limiting its temporal resolution.
- Camera exposure settings were automatically adjusted, which may introduce slight variations in lighting.

**File Formats & Software Compatibility**
- IMU data is stored in JSON format, readable with Python's json library.
- Images are in PNG format, compatible with all standard image processing tools.
- Recommended libraries for data analysis: numpy, pandas, scikit-learn, tensorflow, pytorch. Visualization: matplotlib, seaborn. Deep learning: Keras, PyTorch.

**Potential Applications**
- Development of activity recognition models in educational settings.
- Study of student engagement based on movement patterns.
- Investigation of sensor fusion techniques combining visual and IMU data.

This dataset represents a unique contribution to activity recognition research, providing rich multimodal data for developing robust models in real-world educational environments.

**Citation**
If you find this project helpful for your research, please cite our work using the following BibTeX entry:

@misc{marquezcarpintero2025caddiinclassactivitydetection,
  title={CADDI: An in-Class Activity Detection Dataset using IMU data from low-cost sensors},
  author={Luis Marquez-Carpintero and Sergio Suescun-Ferrandiz and Monica Pina-Navarro and Miguel Cazorla and Francisco Gomez-Donoso},
  year={2025},
  eprint={2503.02853},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.02853},
}
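As a minimal sketch of working with the IMU recordings (the file path and the exact JSON field names below are assumptions for illustration, not taken from the dataset documentation):

```python
import json
from pathlib import Path

import numpy as np

# Hypothetical path following the /continuous/subject_id/activity_name/sensors/ layout.
sensor_file = Path("continuous/subject_01/typing/sensors/imu_000.json")

with sensor_file.open() as f:
    records = json.load(f)

# Assume each record carries X/Y/Z linear-acceleration values; adapt to the real schema.
acc = np.array([[r["x"], r["y"], r["z"]] for r in records])
print("samples:", acc.shape[0], "mean |a|:", np.linalg.norm(acc, axis=1).mean())
```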
COCO is a large-scale object detection, segmentation, and captioning dataset.
Note:
- Some images from the train and validation sets don't have annotations.
- COCO 2014 and 2017 use the same images, but different train/val/test splits.
- The test split doesn't have any annotations (only images).
- COCO defines 91 classes, but the data only uses 80 classes.
- Panoptic annotations define 200 classes, but only 133 are used.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('coco', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/coco-2014-1.1.0.png
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Tiny ImageNet Dataset is a dataset of 100,000 tiny (64x64) images of objects. It is a popular dataset for image classification and object detection research. The dataset consists of 200 different classes, each of which has 500 images. The images are all 64 x 64 pixels in size and are in color.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction
This archive contains the ApacheJIT dataset presented in the paper "ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction" as well as the replication package. The paper is submitted to MSR 2022 Data Showcase Track.
The datasets are available under directory dataset. There are 4 datasets in this directory.
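A minimal sketch for inspecting one of these files with pandas (the filename below is a placeholder; substitute one of the actual CSV files found under the dataset directory):

```python
import pandas as pd

# Placeholder filename -- replace with one of the four files in the dataset/ directory.
commits = pd.read_csv("dataset/apachejit_commits.csv")

print(commits.shape)
print(commits.head())
```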
In addition to the dataset, we also provide the scripts using which we built the dataset. These scripts are written in Python 3.8. Therefore, Python 3.8 or above is required. To set up the environment, we have provided a list of required packages in file requirements.txt. Additionally, one filtering step requires GumTree [1]. For Java, GumTree requires Java 11. For other languages, external tools are needed. Installation guide and more details can be found here.
The scripts are comprised of Python scripts under directory src and Python notebooks under directory notebooks. The Python scripts are mainly responsible for conducting GitHub search via GitHub search API and collecting commits through PyDriller Package [2]. The notebooks link the fixed issue reports with their corresponding fixing commits and apply some filtering steps. The bug-inducing candidates then are filtered again using gumtree.py script that utilizes the GumTree package. Finally, the remaining bug-inducing candidates are combined with the clean commits in the dataset_construction notebook to form the entire dataset.
More specifically, git_token.py handles the GitHub API token that is necessary for requests to the GitHub API. Script collector.py performs the GitHub search. Tracing changed lines and git annotate is done in gitminer.py using PyDriller. Finally, gumtree.py applies 4 filtering steps (number of lines, number of files, language, and change significance).
References:
[1] Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. 2014. Fine-grained and accurate source code differencing. In ACM/IEEE International Conference on Automated Software Engineering, ASE '14, Vasteras, Sweden, September 15-19, 2014. 313–324.
[2] Davide Spadini, Maurício Aniche, and Alberto Bacchelli. 2018. PyDriller: Python Framework for Mining Software Repositories. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Lake Buena Vista, FL, USA) (ESEC/FSE 2018). Association for Computing Machinery, New York, NY, USA, 908–911.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sentinel2GlobalLULC is a deep learning-ready dataset of RGB images from the Sentinel-2 satellites designed for global land use and land cover (LULC) mapping. Sentinel2GlobalLULC v2.1 contains 194,877 images in GeoTiff and JPEG format corresponding to 29 broad LULC classes. Each image has 224 x 224 pixels at 10 m spatial resolution and was produced by assigning the 25th percentile of all available observations in the Sentinel-2 collection between June 2015 and October 2020 in order to remove atmospheric effects (i.e., clouds, aerosols, shadows, snow, etc.). A spatial purity value was assigned to each image based on the consensus across 15 different global LULC products available in Google Earth Engine (GEE).
Our dataset is structured into 3 main zip-compressed folders, an Excel file with a dictionary for class names and descriptive statistics per LULC class, and a python script to convert RGB GeoTiff images into JPEG format. The first folder called "Sentinel2LULC_GeoTiff.zip" contains 29 zip-compressed subfolders where each one corresponds to a specific LULC class with hundreds to thousands of GeoTiff Sentinel-2 RGB images. The second folder called "Sentinel2LULC_JPEG.zip" contains 29 zip-compressed subfolders with a JPEG formatted version of the same images provided in the first main folder. The third folder called "Sentinel2LULC_CSV.zip" includes 29 zip-compressed CSV files with as many rows as provided images and with 12 columns containing the following metadata (this same metadata is provided in the image filenames):
For seven LULC classes, we could not export from GEE all images that fulfilled a spatial purity of 100% since there were millions of them. In this case, we exported a stratified random sample of 14,000 images and provided an additional CSV file with the images actually contained in our dataset. That is, for these seven LULC classes, we provide these 2 CSV files:
To clearly state the geographical coverage of images available in this dataset, we included in the version v2.1, a compressed folder called "Geographic_Representativeness.zip". This zip-compressed folder contains a csv file for each LULC class that provides the complete list of countries represented in that class. Each csv file has two columns, the first one gives the country code and the second one gives the number of images provided in that country for that LULC class. In addition to these 29 csv files, we provided another csv file that maps each ISO Alpha-2 country code to its original full country name.
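The package ships its own conversion script; as an illustrative stand-in (not the bundled script), a plain 3-band RGB GeoTiff can be re-encoded to JPEG with Pillow, assuming ordinary 8-bit imagery and placeholder folder names:

```python
from pathlib import Path

from PIL import Image

src_dir = Path("Sentinel2LULC_GeoTiff/some_class")  # placeholder class folder
dst_dir = Path("Sentinel2LULC_JPEG/some_class")
dst_dir.mkdir(parents=True, exist_ok=True)

for tif in src_dir.glob("*.tif"):  # adjust the extension if the files use .tiff
    # Pillow ignores the geo-referencing tags and reads the RGB bands directly.
    Image.open(tif).convert("RGB").save(dst_dir / f"{tif.stem}.jpg", quality=95)
```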
© Sentinel2GlobalLULC Dataset by Yassir Benhammou, Domingo Alcaraz-Segura, Emilio Guirado, Rohaifa Khaldi, Boujemâa Achchab, Francisco Herrera & Siham Tabik is marked with Attribution 4.0 International (CC-BY 4.0)
Classifying trees from point cloud data is useful in applications such as high-quality 3D basemap creation, urban planning, and forestry workflows. Trees have a complex geometrical structure that is hard to capture using traditional means. Deep learning models are highly capable of learning these complex structures and giving superior results.

**Using the model**
Follow the guide to use the model. The model can be used with the 3D Basemaps solution and ArcGIS Pro's Classify Point Cloud Using Trained Model tool. Before using this model, ensure that the supported deep learning frameworks libraries are installed. For more details, check Deep Learning Libraries Installer for ArcGIS.

**Input**
The model accepts unclassified point clouds with the attributes: X, Y, Z, and Number of Returns.
Note: This model is trained to work on unclassified point clouds that are in a projected coordinate system, where the units of X, Y, and Z are based on the metric system of measurement. If the dataset is in degrees or feet, it needs to be re-projected accordingly. The provided deep learning model was trained using a training dataset with the full set of points. Therefore, it is important to make the full set of points available to the neural network while predicting - allowing it to better discriminate points of 'class of interest' versus background points. It is recommended to use 'selective/target classification' and 'class preservation' functionalities during prediction to have better control over the classification.
This model was trained on airborne lidar datasets and is expected to perform best with similar datasets. Classification of terrestrial point cloud datasets may work but has not been validated. For such cases, this pre-trained model may be fine-tuned to save on cost, time, and compute resources while improving accuracy. When fine-tuning this model, the target training data characteristics such as class structure, maximum number of points per block, and extra attributes should match those of the data originally used for training this model (see Training data section below).

**Output**
The model will classify the point cloud into the following 2 classes with their meaning as defined by the American Society for Photogrammetry and Remote Sensing (ASPRS) described below:
- 0: Background
- 5: Trees / High-vegetation

**Applicable geographies**
This model is expected to work well in all regions globally, with the exception of mountainous regions. However, results can vary for datasets that are statistically dissimilar to training data.

**Model architecture**
This model uses the PointCNN model architecture implemented in ArcGIS API for Python.

**Accuracy metrics**
The table below summarizes the accuracy of the predictions on the validation dataset.

| Class | Precision | Recall | F1-score |
|---|---|---|---|
| Trees / High-vegetation (5) | 0.975374 | 0.965929 | 0.970628 |

**Training data**
This model is trained on a subset of the UK Environment Agency's open dataset. The training data used has the following characteristics:

| Characteristic | Value |
|---|---|
| X, Y and Z linear unit | meter |
| Z range | -19.29 m to 314.23 m |
| Number of Returns | 1 to 5 |
| Intensity | 1 to 4092 |
| Point spacing | 0.6 ± 0.3 |
| Scan angle | -23 to +23 |
| Maximum points per block | 8192 |
| Extra attributes | Number of Returns |
| Class structure | [0, 5] |

**Sample results**
Here are a few results from the model.
Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0): https://creativecommons.org/licenses/by-nc-nd/3.0/
License information was derived automatically
The Caltech-101 dataset is a dataset of 101 categories of objects, each with 30 to 800 images. The images vary in size (roughly 300 x 200 pixels) and are in color. The dataset is used to train and evaluate machine learning models for the task of object recognition.
The classification of point cloud datasets to identify distribution wires is useful for identifying vegetation encroachment around power lines. Such workflows are important for preventing fires and power outages and are typically manual, recurring, and labor-intensive. This model is designed to extract distribution wires at the street level. Its predictions for high-tension transmission wires are less consistent with changes in geography as compared to street-level distribution wires. In the case of high-tension transmission wires, a lower 'recall' value is observed as compared to the value observed for low-lying street wires and poles.

**Using the model**
Follow the guide to use the model. The model can be used with ArcGIS Pro's Classify Point Cloud Using Trained Model tool. Before using this model, ensure that the supported deep learning libraries are installed. For more details, check Deep Learning Libraries Installer for ArcGIS.

**Input**
The model accepts unclassified point clouds with point geometry (X, Y and Z values).
Note: The model is not dependent on any additional attributes such as Intensity, Number of Returns, etc. This model is trained to work on unclassified point clouds that are in a projected coordinate system, in which the units of X, Y and Z are based on the metric system of measurement. If the dataset is in degrees or feet, it needs to be re-projected accordingly. The model was trained using a training dataset with the full set of points. Therefore, it is important to make the full set of points available to the neural network while predicting - allowing it to better discriminate points of 'class of interest' versus background points. It is recommended to use 'selective/target classification' and 'class preservation' functionalities during prediction to have better control over the classification and scenarios with false positives.
The model was trained on airborne lidar datasets and is expected to perform best with similar datasets. Classification of terrestrial point cloud datasets may work but has not been validated. For such cases, this pre-trained model may be fine-tuned to save on cost, time, and compute resources while improving accuracy. Another example where fine-tuning this model can be useful is when the object of interest is tram wires, railway wires, etc., which are geometrically similar to electricity wires. When fine-tuning this model, the target training data characteristics such as class structure, maximum number of points per block and extra attributes should match those of the data originally used for training this model (see Training data section below).

**Output**
The model will classify the point cloud into the following classes with their meaning as defined by the American Society for Photogrammetry and Remote Sensing (ASPRS) described below:

| Classcode | Class Description |
|---|---|
| 0 | Background Class |
| 14 | Distribution Wires |
| 15 | Distribution Tower/Poles |

**Applicable geographies**
The model is expected to work within any geography. It has been seen to produce favorable results in many regions. However, results can vary for datasets that are statistically dissimilar to training data.

**Model architecture**
This model uses the RandLANet model architecture implemented in ArcGIS API for Python.

**Accuracy metrics**
The table below summarizes the accuracy of the predictions on the validation dataset.

| Class | Precision | Recall | F1-score |
|---|---|---|---|
| Background (0) | 0.999679 | 0.999876 | 0.999778 |
| Distribution Wires (14) | 0.955085 | 0.936825 | 0.945867 |
| Distribution Poles (15) | 0.707983 | 0.553888 | 0.621527 |

**Training data**
This model is trained on a manually classified training dataset provided to Esri by the AAM group. The training data used has the following characteristics:

| Characteristic | Value |
|---|---|
| X, Y, and Z linear unit | meter |
| Z range | -240.34 m to 731.17 m |
| Number of Returns | 1 to 5 |
| Intensity | 1 to 4095 |
| Point spacing | 0.2 ± 0.1 |
| Scan angle | -42 to +35 |
| Maximum points per block | 20000 |
| Extra attributes | None |
| Class structure | [0, 14, 15] |

**Sample results**
Here are a few results from the model.
Task-oriented dialog systems need to know when a query falls outside their range of supported intents, but current text classification corpora only define label sets that cover every example. We introduce a new dataset that includes queries that are out-of-scope (OOS), i.e., queries that do not fall into any of the system's supported intents. This poses a new challenge because models cannot assume that every query at inference time belongs to a system-supported intent class. Our dataset also covers 150 intent classes over 10 domains, capturing the breadth that a production task-oriented agent must handle. It offers a way of more rigorously and realistically benchmarking text classification in task-driven dialog systems.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('clinc_oos', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
VQA is a new dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer.
- 265,016 images (COCO and abstract scenes)
- At least 3 questions (5.4 questions on average) per image
- 10 ground truth answers per question
- 3 plausible (but likely incorrect) answers per question
- Automatic evaluation metric
Details on downloading the latest dataset may be found on the download webpage.
from datasets import load_dataset
raw_datasets = load_dataset(
"vqa.py",
"base",
cache_dir="huggingface_datasets",
data_dir="data",
ignore_verifications=True,
)
dataset_train = raw_datasets["train"]
for item in dataset_train:
    print(item)
    exit()
v2 = v2.real + v2.abstract (where v2.abstract == v1.abstract)
v1 = v1.real + v1.abstract
v2.abstract.balanced.bin
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
**Abstract**
The H1B is an employment-based visa category for temporary foreign workers in the United States. Every year, the US immigration department receives over 200,000 petitions and selects 85,000 applications through a random process; the U.S. employer must submit a petition for an H1B visa to the US immigration department. This is the most common visa status applied for by international students once they complete college or higher education and begin working in a full-time position. The project provides essential information on job titles, preferred regions of settlement, foreign applicants, and employers' trends for H1B visa applications. Locations, employers, job titles, and salary ranges make up most of the information in H1B petitions, so different visualization tools are used to analyze and interpret trends in the H1B visa and provide recommendations to applicants. This report is the base of the project for the Visualization of Complex Data class at the George Washington University; some examples in this project analyze the relevant variables (Case Status, Employer Name, SOC Name, Job Title, Prevailing Wage, Worksite, and Latitude and Longitude information) from Kaggle and the Office of Foreign Labor Certification (OFLC) in order to see how the H1B visa has changed over the past several decades.
Keywords: H1B visa, Data Analysis, Visualization of Complex Data, HTML, JavaScript, CSS, Tableau, D3.js

**Dataset**
The dataset contains 10 columns and covers a total of 3 million records spanning 2011-2016. The relevant columns in the dataset include case status, employer name, SOC name, job title, full time position, prevailing wage, year, worksite, and latitude and longitude information.
Link to dataset: https://www.kaggle.com/nsharan/h-1b-visa
Link to dataset (FY2017): https://www.foreignlaborcert.doleta.gov/performancedata.cfm

**Running the code**
Open Index.html

**Data Processing**
- Do some data preprocessing to transform the raw data into an understandable format.
- Find and combine other external datasets to enrich the analysis, such as the FY2017 dataset.
- To make appropriate visualizations, variables should be developed and compiled into the visualization programs.
- Draw a geo map and scatter plot to compare the fastest growth in fixed value and in percentages.
- Extract some aspects and analyze the changes in employers' preferences as well as forecasts for future trends.

**Visualizations**
- Combo chart: shows the overall volume of receipts and the approval rate.
- Scatter plot: shows the beneficiary country of birth.
- Geo map: shows all states of H1B petitions filed.
- Line chart: shows the top 10 states of H1B petitions filed.
- Pie chart: compares education level and occupations for petitions FY2011 vs FY2017.
- Tree map: shows the overall top employers who submit the greatest number of applications.
- Side-by-side bar chart: shows an overall comparison of Data Scientist and Data Analyst.
- Highlight table: shows the mean wage of a Data Scientist and Data Analyst with case status certified.
- Bubble chart: shows the top 10 companies for Data Scientist and Data Analyst.

**Related Research**
- The H-1B Visa Debate, Explained - Harvard Business Review: https://hbr.org/2017/05/the-h-1b-visa-debate-explained
- Foreign Labor Certification Data Center: https://www.foreignlaborcert.doleta.gov
- Key facts about the U.S. H-1B visa program: http://www.pewresearch.org/fact-tank/2017/04/27/key-facts-about-the-u-s-h-1b-visa-program/
- H1B visa News and Updates from The Economic Times: https://economictimes.indiatimes.com/topic/H1B-visa/news
- H-1B visa - Wikipedia: https://en.wikipedia.org/wiki/H-1B_visa

**Key Findings**
- From the analysis, the government cut down the number of approvals for H1B in 2017.
- In the past decade, due to the nature of demand for high-skilled workers, visa holders have clustered in STEM fields and come mostly from countries in Asia such as China and India.
- Technical jobs such as Computer Systems Analyst and Software Developer fill up the majority of the top 10 jobs among foreign workers.
- Employers located in metro areas strive to find a foreign workforce who can fill the technical positions in their organizations.
- States like California, New York, Washington, New Jersey, Massachusetts, Illinois, and Texas are the prime locations for foreign workers and provide many job opportunities.
- Top companies such as Infosys, Tata, and IBM India that submit the most H1B visa applications are companies based in India associated with software and IT services.
- The Data Scientist position has experienced exponential growth in terms of H1B visa applications, and the jobs are clustered in the West region with the highest numbers.

**Visualization utilizing programs**
HTML, JavaScript, CSS, D3.js, Google API, Python, R, and Tableau
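As a minimal preprocessing sketch (the CSV filename and column names below are assumptions based on the Kaggle dataset description above, not verified against the actual file):

```python
import pandas as pd

# Placeholder filename; column names assumed from the dataset description.
h1b = pd.read_csv("h1b_kaggle.csv")

# Petitions filed per fiscal year, 2011-2016.
print(h1b.groupby("YEAR").size())

# Top 10 employers by number of petitions.
print(h1b["EMPLOYER_NAME"].value_counts().head(10))
```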
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 1660 images of electric substations with 50705 annotated objects. The images were obtained using different cameras, including cameras mounted on Autonomous Guided Vehicles (AGVs), fixed location cameras and those captured by humans using a variety of cameras. A total of 15 classes of objects were identified in this dataset, and the number of instances for each class is provided in the following table:
Object classes and how many times they appear in the dataset.
| Class | Instances |
|---|---|
| Open blade disconnect | 310 |
| Closed blade disconnect switch | 5243 |
| Open tandem disconnect switch | 1599 |
| Closed tandem disconnect switch | 966 |
| Breaker | 980 |
| Fuse disconnect switch | 355 |
| Glass disc insulator | 3185 |
| Porcelain pin insulator | 26499 |
| Muffle | 1354 |
| Lightning arrester | 1976 |
| Recloser | 2331 |
| Power transformer | 768 |
| Current transformer | 2136 |
| Potential transformer | 654 |
| Tripolar disconnect switch | 2349 |
All images in this dataset were collected from a single electrical distribution substation in Brazil over a period of two years. The images were captured at various times of the day and under different weather and seasonal conditions, ensuring a diverse range of lighting conditions for the depicted objects. A team of experts in Electrical Engineering curated all the images to ensure that the angles and distances depicted in the images are suitable for automating inspections in an electrical substation.
The file structure of this dataset contains the following directories and files:
images: This directory contains 1660 electrical substation images in JPEG format.
labels_json: This directory contains JSON files annotated in the VOC-style polygonal format. Each file shares the same filename as its respective image in the images directory.
15_masks: This directory contains PNG segmentation masks for all 15 classes, including the porcelain pin insulator class. Each file shares the same name as its corresponding image in the images directory.
14_masks: This directory contains PNG segmentation masks for all classes except the porcelain pin insulator. Each file shares the same name as its corresponding image in the images directory.
porcelain_masks: This directory contains PNG segmentation masks for the porcelain pin insulator class. Each file shares the same name as its corresponding image in the images directory.
classes.txt: This text file lists the 15 classes plus the background class used in LabelMe.
json2png.py: This Python script can be used to generate segmentation masks using the VOC-style polygonal JSON annotations.
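As an illustrative stand-in for that conversion (not the bundled json2png.py; the JSON layout is assumed to follow LabelMe's "shapes"/"points" convention, and the file paths are placeholders):

```python
import json
from pathlib import Path

from PIL import Image, ImageDraw

# Map each class name to an integer mask value; the real ordering comes from classes.txt.
class_ids = {name: i for i, name in enumerate(Path("classes.txt").read_text().splitlines())}

ann = json.loads(Path("labels_json/example.json").read_text())  # placeholder annotation file

# Rasterize every annotated polygon into a single-channel mask.
mask = Image.new("L", (ann["imageWidth"], ann["imageHeight"]), 0)
draw = ImageDraw.Draw(mask)
for shape in ann.get("shapes", []):
    points = [tuple(p) for p in shape["points"]]
    draw.polygon(points, fill=class_ids.get(shape["label"], 0))

mask.save("example_mask.png")
```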
The dataset aims to support the development of computer vision techniques and deep learning algorithms for automating the inspection process of electrical substations. The dataset is expected to be useful for researchers, practitioners, and engineers interested in developing and testing object detection and segmentation models for automating inspection and maintenance activities in electrical substations.
The authors would like to thank UTFPR for the support and infrastructure made available for the development of this research and COPEL-DIS for the support through project PD-2866-0528/2020—Development of a Methodology for Automatic Analysis of Thermal Images. We also would like to express our deepest appreciation to the team of annotators who worked diligently to produce the semantic labels for our dataset. Their hard work, dedication and attention to detail were critical to the success of this project.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These four labeled data sets are targeted at ordinal quantification. The goal of quantification is not to predict the label of each individual instance, but the distribution of labels in unlabeled sets of data.
With the scripts provided, you can extract CSV files from the UCI machine learning repository and from OpenML. The ordinal class labels stem from a binning of a continuous regression label.
We complement this data set with the indices of data items that appear in each sample of our evaluation. Hence, you can precisely replicate our samples by drawing the specified data items. The indices stem from two evaluation protocols that are well suited for ordinal quantification. To this end, each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, and app-oq_tst_indices.csv represents one sample.
Our first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with an equal probability. The second protocol, APP-OQ, is a variant thereof, where only the smoothest 20% of all APP samples are considered. This variant is targeted at ordinal quantification tasks, where classes are ordered and a similarity of neighboring classes can be assumed.
Usage
You can extract four CSV files through the provided script extract-oq.jl, which is conveniently wrapped in a Makefile. The Project.toml and Manifest.toml specify the Julia package dependencies, similar to a requirements file in Python.
Preliminaries: You have to have a working Julia installation. We have used Julia v1.6.5 in our experiments.
Data Extraction: In your terminal, you can call either
make
(recommended), or
julia --project="." --eval "using Pkg; Pkg.instantiate()"
julia --project="." extract-oq.jl
Outcome: The first row in each CSV file is the header. The first column, named "class_label", is the ordinal class.
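Since quantification targets the label distribution rather than individual predictions, a quick sanity check on one of the extracted CSV files might look like this (the filename is a placeholder; only the "class_label" column name is taken from the description above):

```python
import pandas as pd

df = pd.read_csv("some_extracted_dataset.csv")  # placeholder for one of the extracted CSV files

# The prevalence vector: the fraction of items in each ordinal class.
prevalences = df["class_label"].value_counts(normalize=True).sort_index()
print(prevalences)
```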
Further Reading
Implementation of our experiments: https://github.com/mirkobunse/regularized-oq
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Metallic glass formation data for binary alloys, collected from various experimental techniques such as melt-spinning or mechanical alloying. This dataset covers all compositions with an interval of 5 at.% in 59 binary systems, containing a total of 5959 alloys in the dataset. The target property of this dataset is the glass forming ability (GFA), i.e. whether the composition can form monolithic glass or not, which is either 1 for glass forming or 0 for non-full glass forming.
The V2 versions of this dataset have been cleaned to remove duplicate data points. Any entries with identical formula and both negative and positive GFA classes were combined into a single entry with a positive GFA class.
Data is available as Monty Encoder encoded JSON and as the source CSV file. The recommended access method is the matminer Python package, using its datasets module.
Note on citations: If you found this dataset useful and would like to cite it in your work, please be sure to cite its original sources below rather than, or in addition to, this page.
Dataset discussed in:
Machine Learning Approach for Prediction and Understanding of Glass-Forming Ability. Y. T. Sun, H. Y. Bai, M. Z. Li, and W. H. Wang. Institute of Physics, Chinese Academy of Sciences; Department of Physics, Beijing Key Laboratory of Optoelectronic Functional Materials & Micro-nano Devices, Renmin University of China; University of Chinese Academy of Sciences. J. Phys. Chem. Lett., 2017, 8 (14), pp 3434-3439. DOI: 10.1021/acs.jpclett.7b01046. Publication Date (Web): July 11, 2017.
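A minimal access sketch with matminer (the dataset identifier and the GFA column name are assumptions; check matminer's dataset registry for the exact strings):

```python
from matminer.datasets import load_dataset

# Assumed dataset identifier for the cleaned V2 binary metallic glass data.
df = load_dataset("glass_binary_v2")

print(df.head())
print(df["gfa"].value_counts())  # "gfa" column name assumed: 1 = glass forming, 0 = not
```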
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This New Zealand Point Cloud Classification Deep Learning Package will classify point clouds into tree and background classes. This model is optimized to work with New Zealand aerial LiDAR data. The classification of point cloud datasets to identify trees is useful in applications such as high-quality 3D basemap creation, urban planning, forestry workflows, and planning climate change response. Trees can have a complex, irregular geometrical structure that is hard to capture using traditional means. Deep learning models are highly capable of learning these complex structures and giving superior results. This model is designed to extract trees in both urban and rural areas in New Zealand. The training/testing/validation datasets were taken within New Zealand, resulting in high reliability in recognizing the patterns of common NZ building architecture.

**Licensing requirements**
ArcGIS Desktop - ArcGIS 3D Analyst extension for ArcGIS Pro

**Using the model**
The model can be used in ArcGIS Pro's Classify Point Cloud Using Trained Model tool. Before using this model, ensure that the supported deep learning frameworks libraries are installed. For more details, check Deep Learning Libraries Installer for ArcGIS.
Note: Deep learning is computationally intensive, and a powerful GPU is recommended to process large datasets.

**Input**
The model is trained with classified LiDAR that follows the LINZ base specification. The input data should be similar to this specification.
Note: The model is dependent on additional attributes such as Intensity, Number of Returns, etc., similar to the LINZ base specification. This model is trained to work on classified and unclassified point clouds that are in a projected coordinate system, in which the units of X, Y and Z are based on the metric system of measurement. If the dataset is in degrees or feet, it needs to be re-projected accordingly. The model was trained using a training dataset with the full set of points. Therefore, it is important to make the full set of points available to the neural network while predicting - allowing it to better discriminate points of 'class of interest' versus background points. It is recommended to use 'selective/target classification' and 'class preservation' functionalities during prediction to have better control over the classification and scenarios with false positives.
The model was trained on airborne lidar datasets and is expected to perform best with similar datasets. Classification of terrestrial point cloud datasets may work but has not been validated. For such cases, this pre-trained model may be fine-tuned to save on cost, time, and compute resources while improving accuracy. Another example where fine-tuning this model can be useful is when the object of interest is tram wires, railway wires, etc., which are geometrically similar to electricity wires. When fine-tuning this model, the target training data characteristics such as class structure, maximum number of points per block and extra attributes should match those of the data originally used for training this model (see Training data section below).

**Output**
The model will classify the point cloud into the following classes with their meaning as defined by the American Society for Photogrammetry and Remote Sensing (ASPRS) described below:
- 0: Background
- 5: Trees / High-vegetation

**Applicable geographies**
The model is expected to work well in New Zealand. It has been seen to produce favorable results in many regions. However, results can vary for datasets that are statistically dissimilar to training data.
Training dataset - Wellington City
Testing dataset - Tawa City
Validation/Evaluation dataset - Christchurch City

**Model architecture**
This model uses the PointCNN model architecture implemented in ArcGIS API for Python.

**Accuracy metrics**
The table below summarizes the accuracy of the predictions on the validation dataset.

| Class | Precision | Recall | F1-score |
|---|---|---|---|
| Never Classified | 0.991200 | 0.975404 | 0.983239 |
| High Vegetation | 0.933569 | 0.975559 | 0.954102 |

**Training data**
This model is trained on a classified dataset originally provided by OpenTopography, with < 1% manual labelling and correction.
Train-test split percentage: {Train: 80%, Test: 20%}. This ratio was chosen based on the analysis of previous epoch statistics, which showed a decent improvement.
The training data used has the following characteristics:

| Characteristic | Value |
|---|---|
| X, Y, and Z linear unit | meter |
| Z range | -121.69 m to 26.84 m |
| Number of Returns | 1 to 5 |
| Intensity | 16 to 65520 |
| Point spacing | 0.2 ± 0.1 |
| Scan angle | -15 to +15 |
| Maximum points per block | 8192 |
| Block size | 20 meters |
| Class structure | [0, 5] |

**Sample results**
Model applied to classify a dataset with 5 pts/m density from the Christchurch city dataset. The model's performance is directly proportional to the dataset point density and to point clouds with noise excluded.
To learn how to use this model, see this story.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
While a great variety of 3D cameras have been introduced in recent years, most publicly available datasets for object recognition and pose estimation focus on one single camera. This dataset consists of 32 scenes that have been captured by 7 different 3D cameras, totaling 49,294 frames. This allows evaluating the sensitivity of pose estimation algorithms to the specifics of the used camera and the development of more robust algorithms that are more independent of the camera model. Vice versa, our dataset enables researchers to perform a quantitative comparison of the data from several different cameras and depth sensing technologies and evaluate their algorithms before selecting a camera for their specific task. The scenes in our dataset contain 20 different objects from the common benchmark YCB object and model set. We provide full ground truth 6DoF poses for each object, per-pixel segmentation, 2D and 3D bounding boxes and a measure of the amount of occlusion of each object.
If you use this dataset in your research, please cite the following publication:
T. Grenzdörffer, M. Günther, and J. Hertzberg, “YCB-M: A Multi-Camera RGB-D Dataset for Object Recognition and 6DoF Pose Estimation,” in 2020 IEEE International Conference on Robotics and Automation, ICRA 2020, Paris, France, May 31-June 4, 2020. IEEE, 2020.
@InProceedings{Grenzdoerffer2020ycbm,
title = {{YCB-M}: A Multi-Camera {RGB-D} Dataset for Object Recognition and {6DoF} Pose Estimation},
author = {Grenzd{\"{o}}rffer, Till and G{\"{u}}nther, Martin and Hertzberg, Joachim},
booktitle = {2020 {IEEE} International Conference on Robotics and Automation, {ICRA} 2020, Paris, France, May 31-June 4, 2020},
year = {2020},
publisher = {{IEEE}}
}
This paper is also available on arXiv: https://arxiv.org/abs/2004.11657
To visualize the dataset, follow these instructions (tested on Ubuntu Xenial 16.04):
# IMPORTANT: the ROS setup.bash must NOT be sourced, otherwise the following error occurs:
# ImportError: /opt/ros/kinetic/lib/python2.7/dist-packages/cv2.so: undefined symbol: PyCObject_Type
# nvdu requires Python 3.5 or 3.6
sudo add-apt-repository -y ppa:deadsnakes/ppa # to get python3.6 on Ubuntu Xenial
sudo apt-get update
sudo apt-get install -y python3.6 libsm6 libxext6 libxrender1 python-virtualenv python-pip
# create a new virtual environment
virtualenv -p python3.6 venv_nvdu
cd venv_nvdu/
source bin/activate
# clone our fork of NVIDIA's Dataset Utilities that incorporates some essential fixes
pip install -e 'git+https://github.com/mintar/Dataset_Utilities.git#egg=nvdu'
# download and transform the meshes
# (alternatively, unzip the meshes contained in the dataset
# to
For further details, see README.md.