Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, MitoTracker Red CMXRos area and intensity (3 h and 24 h incubations with both compounds), MitoSOX oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA, and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, contains no missing values, and the data were standardized across features. The small number of samples prevented a full and robust statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.
Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) to the instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments was performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when the majority reaches 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
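For orientation only, a minimal scikit-learn sketch approximating the reported setup is given below. Orange was the tool actually used; scikit-learn offers no gain-ratio criterion (entropy is used as the closest substitute) and no "stop when majority reaches 95%" rule, and X and y are placeholders for the 36 x 11 standardized feature matrix and the 9-class target:

import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: 36 standardized samples, 11 features, 9 classes with 4 samples each
rng = np.random.default_rng(0)
X = rng.standard_normal((36, 11))
y = np.repeat(np.arange(9), 4)

# Approximation of the reported settings (gain ratio is not available in scikit-learn)
tree = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=2, min_samples_split=5)

# Stratified cross-validation; 4 folds keeps one sample per class in each test fold
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
scores = cross_validate(tree, X, y, cv=cv,
                        scoring=["accuracy", "precision_macro", "recall_macro",
                                 "f1_macro", "roc_auc_ovr"])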
This dataset was created by Mira Küçük
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Kiawah Island population by age. The dataset can be utilized to understand the age distribution and demographics of Kiawah Island.
The dataset comprises the following three datasets.
Good to know
Margin of Error
Data in the dataset are based on estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
All raw data sets
These datasets each consist of a folder containing a personal geodatabase of the NHD, and shapefiles used in the HydroDEM process. These files are provided as a means to document exactly which lines were used to develop the HydroDEMs. Each folder contains a line shapefile named for the 8-digit HUC code, containing the NHD flowlines that comprise the coastline for that island. The “hydrolines.shp” shapefile contains the lines that were burned into the DEM. These lines were selected from the NHD flowlines, with some minor editing in places. The “wbpolys.shp” shapefile contains the water-body polygons that were selected from the NHD and used in the bathymetric gradient process. The folders for HUCs 20010000 (Hawaii) and 20020000 (Maui) also contain a “walls.shp” shapefile, which contains the lines that were superimposed on the surface as “walls.”
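As an illustrative sketch (not part of the original release), the line and polygon shapefiles described above can be inspected with GeoPandas; the folder name below uses HUC 20010000 (Hawaii) as an example:

import geopandas as gpd

# Lines burned into the DEM and water-body polygons used in the bathymetric gradient process
hydrolines = gpd.read_file("20010000/hydrolines.shp")
wbpolys = gpd.read_file("20010000/wbpolys.shp")
print(len(hydrolines), "burn lines;", len(wbpolys), "water-body polygons")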
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset Card for RLAIF-V-Dataset
GitHub | Paper
News:
[2024.05.28] 📃 Our paper is accessible at arXiv now!
[2024.05.20] 🔥 Our data is used in MiniCPM-Llama3-V 2.5, which represents the first end-side MLLM achieving GPT-4V level performance!
Dataset Summary
RLAIF-V-Dataset is a large-scale multimodal feedback dataset. The dataset provides high-quality feedback with a total number of 83,132 preference pairs, where the instructions are collected from a diverse… See the full description on the dataset page: https://huggingface.co/datasets/unsloth/RLAIF-V-Dataset.
Dataset Card for "AI-Generated-vs-Real-Images-Datasets"
More Information needed
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wiki-Reliability: Machine Learning datasets for measuring content reliability on Wikipedia.
Consists of metadata features and content text datasets, with the formats:
- {template_name}_features.csv
- {template_name}_difftxt.csv.gz
- {template_name}_fulltxt.csv.gz
For more details on the project, dataset schema, and links to data usage and benchmarking: https://meta.wikimedia.org/wiki/Research:Wiki-Reliability:_A_Large_Scale_Dataset_for_Content_Reliability_on_Wikipedia
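As a sketch, the per-template files can be read directly with pandas (gzip compression is inferred from the file extension); the template name below is a hypothetical placeholder:

import pandas as pd

template = "pov"  # hypothetical placeholder; substitute the actual {template_name}
features = pd.read_csv(f"{template}_features.csv")
diff_txt = pd.read_csv(f"{template}_difftxt.csv.gz")
full_txt = pd.read_csv(f"{template}_fulltxt.csv.gz")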
Development plan “Field Path No. 129 — Construction Line” of the city of Großbottwar, transformed in accordance with INSPIRE, based on an XPlanung dataset in version 5.0.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Socioeconomic indicators like the poverty rate, population change, unemployment rate, and education levels vary across the nation. ERS has compiled the latest data on these measures into a mapping and data display/download application that allows users to identify and compare States and counties on these indicators.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are the 143 datasets in "arff" format used in the HGS paper.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents an assortment of high-resolution images that exhibit six well-known banana varieties procured from two distinct regions in Bangladesh. These bananas were thoughtfully selected from rural orchards and local markets, providing a diverse and comprehensive representation. The dataset serves as a visual reference, offering a thorough portrayal of the distinct characteristics of these banana types, which aids in their precise classification. It encompasses six distinct categories, namely, Shagor, Shabri, Champa, Anaji, Deshi, and Bichi, with a total of 1166 original images and 6000 augmented JPG images. These images were diligently captured during the period from August 01 to August 15, 2023. The dataset includes two variations: one with raw images and the other with augmented images. Each variation is further categorized into six separate folders, each dedicated to a specific banana variety. The images are of non-uniform dimensions and have a resolution of 4608 × 3456 pixels. Due to the high resolution, the initial file size amounted to 4.08 GB. Subsequently, data augmentation techniques were applied, as machine vision deep learning models require a substantial number of images for effective training. Augmentation involves transformations like scaling, shifting, shearing, zooming, and random rotation. Specific augmentation parameters included rotations within a range of 1° to 40°, width and height shifts, zoom range, and shear ranges set at 0.2. As a result, an additional 1000 augmented images were generated from the original images in each category, resulting in a dataset comprising a total of 6000 augmented images (1000 per category) with a data size of 4.73 GB.
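As an illustrative sketch of how such augmentation parameters are commonly specified, the settings below approximately mirror the reported values (rotations up to 40°; shifts, zoom and shear at 0.2) using Keras' ImageDataGenerator; the exact tool used by the authors is not stated, and the directory path and target size are hypothetical:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=40,       # random rotations up to 40 degrees
    width_shift_range=0.2,   # horizontal shifts
    height_shift_range=0.2,  # vertical shifts
    shear_range=0.2,         # shear transformations
    zoom_range=0.2,          # random zoom
)

# Stream augmented batches from the six per-variety folders (hypothetical path and size)
batches = datagen.flow_from_directory("banana_raw/", target_size=(224, 224), batch_size=32)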
The complete blood count (CBC) dataset contains 360 blood smear images along with their annotation files, split into Training, Testing, and Validation sets. The training folder contains 300 images with annotations; the testing and validation folders each contain 60 images with annotations. We made some modifications to the original dataset to prepare this CBC dataset: some of the image annotation files contain far fewer red blood cells (RBCs) than are actually present, and one annotation file does not include any RBC at all even though the cell smear image contains RBCs. So, we cleaned up all the erroneous files and split the dataset into three parts. Among the 360 smear images, 300 blood cell images with annotations are used as the training set, and the remaining 60 images with annotations are used as the testing set. Due to the shortage of data, a subset of the training set is used to prepare the validation set, which contains 60 images with annotations.
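A minimal sketch of the split described above, assuming one annotation file per image; the file names are hypothetical, and the validation set is drawn as a 60-image subset of the training set, as stated:

import random

random.seed(0)
images = [f"BloodImage_{i:05d}.jpg" for i in range(360)]   # hypothetical file names
random.shuffle(images)

train, test = images[:300], images[300:]    # 300 training and 60 testing images
val = random.sample(train, 60)              # validation subset drawn from the training set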
## Overview
Head Data Set 2 is a dataset for object detection tasks - it contains Heads QiDz annotations for 2,342 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets as described several years ago in
https://journals.iucr.org/d/issues/2016/04/00/gm5043/index.html#SEC10
30 low-dose data sets (i.e. a 'large' number) were recorded sequentially on beamline I04 at Diamond Light Source with identical data-collection parameters (1027 images per data set from a Pilatus 6M detector with an image width of 0.15° and 1% beam transmission at a wavelength of 1.2 Å). Scaling all of the data together indicated that radiation damage across the 30 sweeps was small; however, some systematic differences between them remained owing to factors such as beam-intensity variation.
The photon counts from these 30 'original' sweeps were then 'reshuffled' to create a population of 30 equivalent 'new' data sets (for convenience, to allow reuse of the image headers) by considering every active pixel in the data set (i.e. each of around six billion) independently, using the following procedure. Firstly, create a summed data set, which we call 'total':
import numpy as np
# images: photon counts with shape (30, 1027, n_pixels); n_pixels = active pixels per image
total = np.zeros((1027, n_pixels), dtype=int)
for image in range(1027):
    for pixel in range(n_pixels):
        total[image, pixel] = sum(images[j, image, pixel] for j in range(30))
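With images stored as a single NumPy array of the shape noted above, the same summation can be written in one vectorised line:

total = images.sum(axis=0)    # sum photon counts over the 30 sweeps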
Then create 30 'new' data sets with every pixel set initially to 0, and randomly redistribute each photon count from every pixel of the 'total' set to the same pixel position in one of the 30 'new' sets:
rng = np.random.default_rng()
rebin = np.zeros((30, 1027, n_pixels), dtype=int)   # the 30 'new' data sets, initially all zero
for image in range(1027):
    for pixel in range(n_pixels):
        for k in range(total[image, pixel]):        # redistribute every photon independently
            rnd = rng.integers(30)                  # choose one of the 30 'new' sets uniformly at random
            rebin[rnd, image, pixel] += 1
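Under the stated assumption that each photon is assigned independently and uniformly at random to one of the 30 'new' sets, the innermost loop is equivalent to a single multinomial draw per pixel; a vectorised sketch of the same redistribution, reusing the arrays and rng defined above, is:

pvals = np.full(30, 1.0 / 30)                      # equal probability for each 'new' data set
for image in range(1027):
    for pixel in range(n_pixels):
        # counts[j] = photons from this pixel assigned to 'new' data set j
        counts = rng.multinomial(total[image, pixel], pvals)
        rebin[:, image, pixel] += counts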
The NYS Department of Environmental Conservation (DEC) collects and maintains several datasets on the locations, distribution and status of species of plants and animals. Information on distribution by county from the following three databases was extracted and compiled into this dataset. First, the New York Natural Heritage Program biodiversity database: Rare animals, rare plants, and significant natural communities. Significant natural communities are rare or high-quality wetlands, forests, grasslands, ponds, streams, and other types of habitats. Next, the 2nd NYS Breeding Bird Atlas Project database: Birds documented as breeding during the atlas project from 2000-2005. And last, DEC’s NYS Reptile and Amphibian Database: Reptiles and amphibians; most records are from the NYS Amphibian & Reptile Atlas Project (Herp Atlas) from 1990-1999.
The Large Shape and Texture dataset (LAS&T) is a large-scale dataset of shapes and textures for visual shape and texture identification and retrieval from a single image.
LAS&T is the largest and most diverse dataset for shape, texture and material recognition and retrieval in 2D and 3D, with 650,000 images based on real-world shapes and textures.
Overview
The LAS&T Dataset aims to test the most basic aspect of vision in the most general way: the ability to identify any shape, texture, and material in any setting and environment, without being limited to specific types or classes of objects, materials, and environments. For shapes, this means identifying and retrieving any shape in 2D or 3D with every element of the shape changed between images, including the shape's material and texture, orientation, size, and environment. For textures and materials, the goal is to recognize the same texture or material when it appears on different objects, in different environments, and under different light conditions. The dataset relies on shapes, textures, and materials extracted from real-world images, leading to an almost unlimited quantity and diversity of real-world natural patterns. Each section of the dataset (shapes and textures) contains 3D parts that rely on physics-based scenes with realistic light, material, and object simulation, as well as abstract 2D parts. In addition, a real-world benchmark for 3D shapes is included.
The dataset is divided into several parts:
- 3D shape recognition and retrieval
- 2D shape recognition and retrieval
- 3D material recognition and retrieval
- 2D texture recognition and retrieval
Each part can be used independently for training and testing.
Additional assets include a set of 350,000 natural 2D shapes extracted from real-world images and a real-world image benchmark for 3D shape recognition.
The Home Office has changed the format of the published data tables for a number of areas (asylum and resettlement, entry clearance visas, extensions, citizenship, returns, detention, and sponsorship). These now include summary tables, and more detailed datasets (available on a separate page, link below). A list of all available datasets on a given topic can be found in the ‘Contents’ sheet in the ‘summary’ tables. Information on where to find historic data in the ‘old’ format is in the ‘Notes’ page of the ‘summary’ tables. The Home Office intends to make these changes in other areas in the coming publications. If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.
Immigration statistics, year ending March 2020
Immigration Statistics Quarterly Release
Immigration Statistics User Guide
Publishing detailed data tables in migration statistics
Policy and legislative changes affecting migration to the UK: timeline
Immigration statistics data archives
Asylum and resettlement summary tables, year ending March 2020, second edition (MS Excel Spreadsheet, 123 KB): https://assets.publishing.service.gov.uk/media/5f1e9c14e90e0745691135e9/asylum-summary-mar-2020-tables.xlsx
Detailed asylum and resettlement datasets
Sponsorship summary tables, year ending March 2020 (MS Excel Spreadsheet, 72.7 KB): https://assets.publishing.service.gov.uk/media/5ebe9d9786650c2791ec7166/sponsorship-summary-mar-2020-tables.xlsx
Entry clearance visas summary tables, year ending March 2020 (MS Excel Spreadsheet, 66.1 KB): https://assets.publishing.service.gov.uk/media/5ebe9d77d3bf7f5d37fa0d9f/visas-summary-mar-2020-tables.xlsx
Detailed entry clearance visas datasets
Passenger arrivals (admissions) summary tables, year ending March 2020 (MS Excel Spreadsheet, 76.1 KB): https://assets.publishing.service.gov.uk/media/5ebe9e4b86650c279626e5f2/passenger-arrivals-admissions-summary-mar-2020-tables.xlsx
Detailed Passengers initially refused entry at port datasets
Extensions summary tables, year ending March 2020 (MS Excel Spreadsheet, 41.8 KB): https://assets.publishing.service.gov.uk/media/5ebe9edb86650c2791ec7167/extentions-summary-mar-2020-tables.xlsx
The Sound Events for Surveillance Applications (SESA) dataset files were obtained from Freesound. The dataset is divided into train (480 files) and test (105 files) folders. All audio files are mono-channel, 8-bit WAV files sampled at 16 kHz, with durations of up to 33 seconds. Classes:
- 0 - Casual (not a threat)
- 1 - Gunshot
- 2 - Explosion
- 3 - Siren (also contains alarms)
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
What this collection is: A curated, binary-classified dataset of grayscale (1-band), 400 x 400-pixel images ("image chips") in JPEG format, extracted from processed Sentinel-1 Synthetic Aperture Radar (SAR) satellite scenes acquired over various regions of the world, and featuring clear open-ocean chips, look-alikes (wind or biogenic features) and oil-slick chips.
This binary dataset contains chips labelled as:
- "0" for chips not containing any oil features (look-alikes or clean seas)
- "1" for those containing oil features.
This binary dataset is imbalanced, and biased towards "0" labelled chips (i.e., no oil features), which correspond to 66% of the dataset. Chips containing oil features, labelled "1", correspond to 34% of the dataset.
Why: This dataset can be used for training, validation and/or testing of machine learning (including deep learning) algorithms for the detection of oil features in SAR imagery. Directly applicable to algorithm development for the European Space Agency Sentinel-1 SAR mission (https://sentinel.esa.int/web/sentinel/missions/sentinel-1), it may also be suitable for the development of detection algorithms for other SAR satellite sensors.
Overview of this dataset: the total number of chips (both classes) is N = 5,630.
- Class 0: 3,725 chips
- Class 1: 1,905 chips
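As a quick arithmetic check of the stated imbalance, and a sketch of deriving balanced class weights from these counts (using the common n_samples / (n_classes * n_class) convention):

counts = {0: 3725, 1: 1905}
total = sum(counts.values())                                         # 5,630 chips
fractions = {c: n / total for c, n in counts.items()}                # {0: ~0.66, 1: ~0.34}
weights = {c: total / (len(counts) * n) for c, n in counts.items()}  # {0: ~0.76, 1: ~1.48}
print(fractions, weights)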
Further information and a full description are provided in the ReadMe file (ReadMe_Sentinel1_SAR_OilNoOil_20221215.txt).