Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, MitoTracker Red CMXRos area and intensity (3 h and 24 h incubations with both compounds), MitoSOX oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA, and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, contains no missing values, and the data were standardized across features. The small number of samples prevented a full and robust statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.
Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) to the instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments was performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when the majority reaches 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
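For orientation only, a minimal scikit-learn sketch approximating the reported setup is given below. Orange was the tool actually used; scikit-learn offers no gain-ratio criterion (entropy is used as the closest substitute) and no "stop when majority reaches 95%" rule, and X and y are placeholders for the 36 x 11 standardized feature matrix and the 9-class target:

import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: 36 standardized samples, 11 features, 9 classes with 4 samples each
rng = np.random.default_rng(0)
X = rng.standard_normal((36, 11))
y = np.repeat(np.arange(9), 4)

# Approximation of the reported settings (gain ratio is not available in scikit-learn)
tree = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=2, min_samples_split=5)

# Stratified cross-validation; 4 folds keeps one sample per class in each test fold
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
scores = cross_validate(tree, X, y, cv=cv,
                        scoring=["accuracy", "precision_macro", "recall_macro",
                                 "f1_macro", "roc_auc_ovr"])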
This dataset was created by Mira Küçük
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Kiawah Island population by age. The dataset can be utilized to understand the age distribution and demographics of Kiawah Island.
The dataset comprises the following three datasets.
Good to know
Margin of Error
Data in the dataset are based on estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
All raw data sets
These datasets each consist of a folder containing a personal geodatabase of the NHD, and shapefiles used in the HydroDEM process. These files are provided as a means to document exactly which lines were used to develop the HydroDEMs. Each folder contains a line shapefile named for the 8-digit HUC code, containing the NHD flowlines that comprise the coastline for that island. The “hydrolines.shp” shapefile contains the lines that were burned into the DEM. These lines were selected from the NHD flowlines, with some minor editing in places. The “wbpolys.shp” shapefile contains the water-body polygons that were selected from the NHD and used in the bathymetric gradient process. The folders for HUCs 20010000 (Hawaii) and 20020000 (Maui) also contain a “walls.shp” shapefile, which contains the lines that were superimposed on the surface as “walls.”
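As an illustrative sketch (not part of the original release), the line and polygon shapefiles described above can be inspected with GeoPandas; the folder name below uses HUC 20010000 (Hawaii) as an example:

import geopandas as gpd

# Lines burned into the DEM and water-body polygons used in the bathymetric gradient process
hydrolines = gpd.read_file("20010000/hydrolines.shp")
wbpolys = gpd.read_file("20010000/wbpolys.shp")
print(len(hydrolines), "burn lines;", len(wbpolys), "water-body polygons")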
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset Card for RLAIF-V-Dataset
GitHub | Paper
News:
[2024.05.28] 📃 Our paper is accessible at arXiv now!
[2024.05.20] 🔥 Our data is used in MiniCPM-Llama3-V 2.5, which represents the first end-side MLLM achieving GPT-4V level performance!
Dataset Summary
RLAIF-V-Dataset is a large-scale multimodal feedback dataset. The dataset provides high-quality feedback with a total number of 83,132 preference pairs, where the instructions are collected from a diverse… See the full description on the dataset page: https://huggingface.co/datasets/unsloth/RLAIF-V-Dataset.
Dataset Card for "AI-Generated-vs-Real-Images-Datasets"
More Information needed
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wiki-Reliability: Machine Learning datasets for measuring content reliability on Wikipedia.
Consists of metadata features and content text datasets, with the formats:
- {template_name}_features.csv
- {template_name}_difftxt.csv.gz
- {template_name}_fulltxt.csv.gz
For more details on the project, dataset schema, and links to data usage and benchmarking: https://meta.wikimedia.org/wiki/Research:Wiki-Reliability:_A_Large_Scale_Dataset_for_Content_Reliability_on_Wikipedia
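As a sketch, the per-template files can be read directly with pandas (gzip compression is inferred from the file extension); the template name below is a hypothetical placeholder:

import pandas as pd

template = "pov"  # hypothetical placeholder; substitute the actual {template_name}
features = pd.read_csv(f"{template}_features.csv")
diff_txt = pd.read_csv(f"{template}_difftxt.csv.gz")
full_txt = pd.read_csv(f"{template}_fulltxt.csv.gz")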
Development plan “Field Path No. 129 — Construction Line” of the city of Großbottwar, transformed in accordance with INSPIRE, based on an XPlanung dataset in version 5.0.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Socioeconomic indicators like the poverty rate, population change, unemployment rate, and education levels vary across the nation. ERS has compiled the latest data on these measures into a mapping and data display/download application that allows users to identify and compare States and counties on these indicators.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are the 143 datasets in "arff" format used in the HGS paper.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents an assortment of high-resolution images that exhibit six well-known banana varieties procured from two distinct regions in Bangladesh. These bananas were thoughtfully selected from rural orchards and local markets, providing a diverse and comprehensive representation. The dataset serves as a visual reference, offering a thorough portrayal of the distinct characteristics of these banana types, which aids in their precise classification. It encompasses six distinct categories, namely, Shagor, Shabri, Champa, Anaji, Deshi, and Bichi, with a total of 1166 original images and 6000 augmented JPG images. These images were diligently captured during the period from August 01 to August 15, 2023. The dataset includes two variations: one with raw images and the other with augmented images. Each variation is further categorized into six separate folders, each dedicated to a specific banana variety. The images are of non-uniform dimensions and have a resolution of 4608 × 3456 pixels. Due to the high resolution, the initial file size amounted to 4.08 GB. Subsequently, data augmentation techniques were applied, as machine vision deep learning models require a substantial number of images for effective training. Augmentation involves transformations like scaling, shifting, shearing, zooming, and random rotation. Specific augmentation parameters included rotations within a range of 1° to 40°, width and height shifts, zoom range, and shear ranges set at 0.2. As a result, an additional 1000 augmented images were generated from the original images in each category, resulting in a dataset comprising a total of 6000 augmented images (1000 per category) with a data size of 4.73 GB.
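As an illustrative sketch of how such augmentation parameters are commonly specified, the settings below approximately mirror the reported values (rotations up to 40°; shifts, zoom and shear at 0.2) using Keras' ImageDataGenerator; the exact tool used by the authors is not stated, and the directory path and target size are hypothetical:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=40,       # random rotations up to 40 degrees
    width_shift_range=0.2,   # horizontal shifts
    height_shift_range=0.2,  # vertical shifts
    shear_range=0.2,         # shear transformations
    zoom_range=0.2,          # random zoom
)

# Stream augmented batches from the six per-variety folders (hypothetical path and size)
batches = datagen.flow_from_directory("banana_raw/", target_size=(224, 224), batch_size=32)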
The complete blood count (CBC) dataset contains 360 blood smear images along with their annotation files, split into Training, Testing, and Validation sets. The training folder contains 300 images with annotations; the testing and validation folders each contain 60 images with annotations. We made some modifications to the original dataset to prepare this CBC dataset: some of the image annotation files contain far fewer red blood cells (RBCs) than are actually present, and one annotation file does not include any RBC at all even though the cell smear image contains RBCs. So, we cleaned up all the erroneous files and split the dataset into three parts. Among the 360 smear images, 300 blood cell images with annotations are used as the training set, and the remaining 60 images with annotations are used as the testing set. Due to the shortage of data, a subset of the training set is used to prepare the validation set, which contains 60 images with annotations.
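A minimal sketch of the split described above, assuming one annotation file per image; the file names are hypothetical, and the validation set is drawn as a 60-image subset of the training set, as stated:

import random

random.seed(0)
images = [f"BloodImage_{i:05d}.jpg" for i in range(360)]   # hypothetical file names
random.shuffle(images)

train, test = images[:300], images[300:]    # 300 training and 60 testing images
val = random.sample(train, 60)              # validation subset drawn from the training set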
## Overview
Head Data Set 2 is a dataset for object detection tasks - it contains Heads QiDz annotations for 2,342 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets as described several years ago in
https://journals.iucr.org/d/issues/2016/04/00/gm5043/index.html#SEC10
30 low-dose data sets (i.e. a 'large' number) were recorded sequentially on beamline I04 at Diamond Light Source with identical data-collection parameters (1027 images per data set from a Pilatus 6M detector with an image width of 0.15° and 1% beam transmission at a wavelength of 1.2 Å). Scaling all of the data together indicated that radiation damage across the 30 sweeps was small; however, some systematic differences between them remained owing to factors such as beam-intensity variation.
The photon counts from these 30 'original' sweeps were then 'reshuffled' to create a population of 30 equivalent 'new' data sets (for convenience, to allow reuse of the image headers) by considering every active pixel in the data set (i.e. each of around six billion) independently, using the following procedure. Firstly, create a summed data set, which we call 'total':
import numpy as np
# images: photon counts with shape (30, 1027, n_pixels); n_pixels = active pixels per image
total = np.zeros((1027, n_pixels), dtype=int)
for image in range(1027):
    for pixel in range(n_pixels):
        total[image, pixel] = sum(images[j, image, pixel] for j in range(30))
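With images stored as a single NumPy array of the shape noted above, the same summation can be written in one vectorised line:

total = images.sum(axis=0)    # sum photon counts over the 30 sweeps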
Then create 30 'new' data sets with every pixel set initially to 0, and randomly redistribute each photon count from every pixel of the 'total' set to the same pixel position in one of the 30 'new' sets:
rng = np.random.default_rng()
rebin = np.zeros((30, 1027, n_pixels), dtype=int)   # the 30 'new' data sets, initially all zero
for image in range(1027):
    for pixel in range(n_pixels):
        for k in range(total[image, pixel]):        # redistribute every photon independently
            rnd = rng.integers(30)                  # choose one of the 30 'new' sets uniformly at random
            rebin[rnd, image, pixel] += 1
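Under the stated assumption that each photon is assigned independently and uniformly at random to one of the 30 'new' sets, the innermost loop is equivalent to a single multinomial draw per pixel; a vectorised sketch of the same redistribution, reusing the arrays and rng defined above, is:

pvals = np.full(30, 1.0 / 30)                      # equal probability for each 'new' data set
for image in range(1027):
    for pixel in range(n_pixels):
        # counts[j] = photons from this pixel assigned to 'new' data set j
        counts = rng.multinomial(total[image, pixel], pvals)
        rebin[:, image, pixel] += counts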
The NYS Department of Environmental Conservation (DEC) collects and maintains several datasets on the locations, distribution and status of species of plants and animals. Information on distribution by county from the following three databases was extracted and compiled into this dataset. First, the New York Natural Heritage Program biodiversity database: Rare animals, rare plants, and significant natural communities. Significant natural communities are rare or high-quality wetlands, forests, grasslands, ponds, streams, and other types of habitats. Next, the 2nd NYS Breeding Bird Atlas Project database: Birds documented as breeding during the atlas project from 2000-2005. And last, DEC’s NYS Reptile and Amphibian Database: Reptiles and amphibians; most records are from the NYS Amphibian & Reptile Atlas Project (Herp Atlas) from 1990-1999.
The Large Shape and Texture dataset (LAS&T) is a large-scale dataset of shapes and textures for visual shape and texture identification and retrieval from a single image.
LAS&T is the largest and most diverse dataset for shape, texture and material recognition and retrieval in 2D and 3D, with 650,000 images based on real-world shapes and textures.
Overview
The LAS&T Dataset aims to test the most basic aspect of vision in the most general way: the ability to identify any shape, texture, and material in any setting and environment, without being limited to specific types or classes of objects, materials, and environments. For shapes, this means identifying and retrieving any shape in 2D or 3D with every element of the shape changed between images, including the shape's material and texture, orientation, size, and environment. For textures and materials, the goal is to recognize the same texture or material when it appears on different objects, in different environments, and under different light conditions. The dataset relies on shapes, textures, and materials extracted from real-world images, leading to an almost unlimited quantity and diversity of real-world natural patterns. Each section of the dataset (shapes and textures) contains 3D parts that rely on physics-based scenes with realistic light, material, and object simulation, as well as abstract 2D parts. In addition, a real-world benchmark for 3D shapes is included.
The dataset is divided into several parts:
- 3D shape recognition and retrieval
- 2D shape recognition and retrieval
- 3D material recognition and retrieval
- 2D texture recognition and retrieval
Each part can be used independently for training and testing.
Additional assets include a set of 350,000 natural 2D shapes extracted from real-world images and a real-world image benchmark for 3D shape recognition.
The Home Office has changed the format of the published data tables for a number of areas (asylum and resettlement, entry clearance visas, extensions, citizenship, returns, detention, and sponsorship). These now include summary tables, and more detailed datasets (available on a separate page, link below). A list of all available datasets on a given topic can be found in the ‘Contents’ sheet in the ‘summary’ tables. Information on where to find historic data in the ‘old’ format is in the ‘Notes’ page of the ‘summary’ tables. The Home Office intends to make these changes in other areas in the coming publications. If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.
Immigration statistics, year ending March 2020
Immigration Statistics Quarterly Release
Immigration Statistics User Guide
Publishing detailed data tables in migration statistics
Policy and legislative changes affecting migration to the UK: timeline
Immigration statistics data archives
Asylum and resettlement summary tables, year ending March 2020, second edition (MS Excel Spreadsheet, 123 KB): https://assets.publishing.service.gov.uk/media/5f1e9c14e90e0745691135e9/asylum-summary-mar-2020-tables.xlsx
Detailed asylum and resettlement datasets
Sponsorship summary tables, year ending March 2020 (MS Excel Spreadsheet, 72.7 KB): https://assets.publishing.service.gov.uk/media/5ebe9d9786650c2791ec7166/sponsorship-summary-mar-2020-tables.xlsx
Entry clearance visas summary tables, year ending March 2020 (MS Excel Spreadsheet, 66.1 KB): https://assets.publishing.service.gov.uk/media/5ebe9d77d3bf7f5d37fa0d9f/visas-summary-mar-2020-tables.xlsx
Detailed entry clearance visas datasets
Passenger arrivals (admissions) summary tables, year ending March 2020 (MS Excel Spreadsheet, 76.1 KB): https://assets.publishing.service.gov.uk/media/5ebe9e4b86650c279626e5f2/passenger-arrivals-admissions-summary-mar-2020-tables.xlsx
Detailed Passengers initially refused entry at port datasets
Extensions summary tables, year ending March 2020 (MS Excel Spreadsheet, 41.8 KB): https://assets.publishing.service.gov.uk/media/5ebe9edb86650c2791ec7167/extentions-summary-mar-2020-tables.xlsx
The Sound Events for Surveillance Applications (SESA) dataset files were obtained from Freesound. The dataset is divided into train (480 files) and test (105 files) folders. All audio files are mono-channel, 8-bit WAV files sampled at 16 kHz, with durations of up to 33 seconds. Classes:
- 0 - Casual (not a threat)
- 1 - Gunshot
- 2 - Explosion
- 3 - Siren (also contains alarms)
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
What this collection is: A curated, binary-classified dataset of grayscale (1-band), 400 x 400-pixel images ("image chips") in JPEG format, extracted from processed Sentinel-1 Synthetic Aperture Radar (SAR) satellite scenes acquired over various regions of the world, and featuring clear open-ocean chips, look-alikes (wind or biogenic features) and oil-slick chips.
This binary dataset contains chips labelled as:
- "0" for chips not containing any oil features (look-alikes or clean seas)
- "1" for those containing oil features.
This binary dataset is imbalanced, and biased towards "0" labelled chips (i.e., no oil features), which correspond to 66% of the dataset. Chips containing oil features, labelled "1", correspond to 34% of the dataset.
Why: This dataset can be used for training, validation and/or testing of machine learning (including deep learning) algorithms for the detection of oil features in SAR imagery. Directly applicable to algorithm development for the European Space Agency Sentinel-1 SAR mission (https://sentinel.esa.int/web/sentinel/missions/sentinel-1), it may also be suitable for the development of detection algorithms for other SAR satellite sensors.
Overview of this dataset: the total number of chips (both classes) is N = 5,630.
- Class 0: 3,725 chips
- Class 1: 1,905 chips
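As a quick arithmetic check of the stated imbalance, and a sketch of deriving balanced class weights from these counts (using the common n_samples / (n_classes * n_class) convention):

counts = {0: 3725, 1: 1905}
total = sum(counts.values())                                         # 5,630 chips
fractions = {c: n / total for c, n in counts.items()}                # {0: ~0.66, 1: ~0.34}
weights = {c: total / (len(counts) * n) for c, n in counts.items()}  # {0: ~0.76, 1: ~1.48}
print(fractions, weights)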
Further information and a full description are provided in the ReadMe file (ReadMe_Sentinel1_SAR_OilNoOil_20221215.txt).