100+ datasets found
  1. Orange dataset table

    • figshare.com
    xlsx
    Updated Mar 4, 2022
    Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
    Available download formats: xlsx
    Dataset updated
    Mar 4, 2022
    Dataset provided by
    figshare
    Authors
    Rui Simões
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced and does not contain any missing values, and the data was standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.

    Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
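
    The description mentions stratified cross-validation but does not state the number of folds. As a rough illustration only, the sketch below shows how stratified folds could be built in plain Python for a layout of 36 samples in 9 classes with 4 samples each. The function name `stratified_folds`, the choice of 4 folds, the seed, and the label strings are assumptions made for this example; it is not Orange's implementation.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, seed=0):
    """Assign sample indices to folds so each class is spread evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin across folds.
        for position, idx in enumerate(members):
            folds[position % n_folds].append(idx)
    return folds


# Hypothetical label strings for the 9 classes named in the description.
class_names = [
    "Control",
    "6-OHDA 6.25", "6-OHDA 12.5", "6-OHDA 25", "6-OHDA 50",
    "rotenone 0.03", "rotenone 0.06", "rotenone 0.125", "rotenone 0.25",
]
labels = [name for name in class_names for _ in range(4)]  # 9 x 4 = 36 samples

folds = stratified_folds(labels, n_folds=4)  # 4 folds is an assumption
for number, fold in enumerate(folds):
    print(number, len(fold), sorted({labels[i] for i in fold}))
```

    With 4 samples per class and 4 folds, each held-out fold receives exactly one sample from every class, which is the property stratification is meant to preserve on a dataset this small.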

  2. Simple datasets for Data Science learners

    • kaggle.com
    Updated Jun 2, 2020
    Mira Küçük (2020). Simple datasets for Data Science learners [Dataset]. https://www.kaggle.com/datasets/mirakk/simple-datasets-for-data-science-learners
    Dataset updated
    Jun 2, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mira Küçük
    Description

    Dataset

    This dataset was created by Mira Küçük

  3. Dataset for Kiawah Island, SC Census Bureau Demographics and Population...

    • neilsberg.com
    Updated Jul 24, 2024
    + more versions
    Neilsberg Research (2024). Dataset for Kiawah Island, SC Census Bureau Demographics and Population Distribution Across Age // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b79be6a5-5460-11ee-804b-3860777c1fe6/
    Dataset updated
    Jul 24, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Kiawah Island, South Carolina
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Kiawah Island population by age. The dataset can be utilized to understand the age distribution and demographics of Kiawah Island.

    Content

    The dataset comprises the following three datasets:

    • Kiawah Island, SC Age Group Population Dataset: A complete breakdown of Kiawah Island age demographics from 0 to 85 years, distributed across 18 age groups
    • Kiawah Island, SC Age Cohorts Dataset: Children, Working Adults, and Seniors in Kiawah Island - Population and Percentage Analysis
    • Kiawah Island, SC Population Pyramid Dataset: Age Groups, Male and Female Population, and Total Population for Demographics Analysis

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyzes and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

  4. Data sets

    • figshare.com
    xlsx
    Updated Aug 21, 2020
    McKay Cavanaugh (2020). Data sets [Dataset]. http://doi.org/10.6084/m9.figshare.12783944.v1
    Available download formats: xlsx
    Dataset updated
    Aug 21, 2020
    Dataset provided by
    figshare
    Figshare (http://figshare.com/)
    Authors
    McKay Cavanaugh
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    All raw data sets

  5. Original Vector Datasets for Hawaii StreamStats

    • catalog.data.gov
    • datadiscoverystudio.org
    • +3 more
    Updated Nov 30, 2024
    + more versions
    U.S. Geological Survey (2024). Original Vector Datasets for Hawaii StreamStats [Dataset]. https://catalog.data.gov/dataset/original-vector-datasets-for-hawaii-streamstats
    Dataset updated
    Nov 30, 2024
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Hawaii
    Description

    These datasets each consist of a folder containing a personal geodatabase of the NHD, and shapefiles used in the HydroDEM process. These files are provided as a means to document exactly which lines were used to develop the HydroDEMs. Each folder contains a line shapefile named for the 8-digit HUC code, containing the NHD flowlines that comprise the coastline for that island. The “hydrolines.shp” shapefile contains the lines that were burned into the DEM. These lines were selected from the NHD flowlines, with some minor editing in places. The “wbpolys.shp” shapefile contains the water-body polygons that were selected from the NHD and used in the bathymetric gradient process. The folders for HUCs 20010000 (Hawaii) and 20020000 (Maui) also contain a “walls.shp” shapefile, which contains the lines that were superimposed on the surface as “walls.”

  6. RLAIF-V-Dataset

    • huggingface.co
    Updated Oct 5, 2024
    Unsloth AI (2024). RLAIF-V-Dataset [Dataset]. https://huggingface.co/datasets/unsloth/RLAIF-V-Dataset
    Dataset updated
    Oct 5, 2024
    Dataset authored and provided by
    Unsloth AI
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Dataset Card for RLAIF-V-Dataset

    News:

    [2024.05.28] 📃 Our paper is accessible on arXiv now!
    [2024.05.20] 🔥 Our data is used in MiniCPM-Llama3-V 2.5, which represents the first end-side MLLM achieving GPT-4V level performance!

    Dataset Summary

    RLAIF-V-Dataset is a large-scale multimodal feedback dataset. The dataset provides high-quality feedback with a total number of 83,132 preference pairs, where the instructions are collected from a diverse… See the full description on the dataset page: https://huggingface.co/datasets/unsloth/RLAIF-V-Dataset.

  7. AI-Generated-vs-Real-Images-Datasets

    • huggingface.co
    Updated Mar 27, 2025
    Hem Bahadur Gurung (2025). AI-Generated-vs-Real-Images-Datasets [Dataset]. https://huggingface.co/datasets/Hemg/AI-Generated-vs-Real-Images-Datasets
    Dataset updated
    Mar 27, 2025
    Authors
    Hem Bahadur Gurung
    Description

    Dataset Card for "AI-Generated-vs-Real-Images-Datasets"

    More Information needed

  8. Data from: Wiki-Reliability: A Large Scale Dataset for Content Reliability...

    • figshare.com
    txt
    Updated Mar 14, 2021
    KayYen Wong; Diego Saez-Trumper; Miriam Redi (2021). Wiki-Reliability: A Large Scale Dataset for Content Reliability on Wikipedia [Dataset]. http://doi.org/10.6084/m9.figshare.14113799.v4
    Available download formats: txt
    Dataset updated
    Mar 14, 2021
    Dataset provided by
    figshare
    Authors
    KayYen Wong; Diego Saez-Trumper; Miriam Redi
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Wiki-Reliability: Machine Learning datasets for measuring content reliability on Wikipedia

    Consists of metadata features and content text datasets, with the formats:

    • {template_name}_features.csv
    • {template_name}_difftxt.csv.gz
    • {template_name}_fulltxt.csv.gz

    For more details on the project, dataset schema, and links to data usage and benchmarking: https://meta.wikimedia.org/wiki/Research:Wiki-Reliability:_A_Large_Scale_Dataset_for_Content_Reliability_on_Wikipedia

  9. Inspire data set BPL “Field path No. 129 — construction line”

    • data.europa.eu
    • gimi9.com
    wfs, wms
    Updated Jan 10, 2021
    (2021). Inspire data set BPL “Field path No. 129 — construction line” [Dataset]. https://data.europa.eu/data/datasets/a913ce31-9bc3-4d82-994f-044b3ea6e84d?locale=en
    Available download formats: wms, wfs
    Dataset updated
    Jan 10, 2021
    Description

    Development plan “Field Path No. 129 — Construction Line” of the city of Großbottwar, transformed according to INSPIRE and based on an XPlanung dataset in version 5.0.

  10. County-level Data Sets

    • data.wu.ac.at
    • datadiscoverystudio.org
    • +3 more
    html, xls
    Updated Mar 19, 2014
    + more versions
    Department of Agriculture (2014). County-level Data Sets [Dataset]. https://data.wu.ac.at/schema/data_gov/NmZkYWQ5MzQtNzVhNC00NGQzLWFjZWQtMmE2OWEyODkzNTZk
    Available download formats: html, xls
    Dataset updated
    Mar 19, 2014
    Dataset provided by
    Department of Agriculture
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Socioeconomic indicators like the poverty rate, population change, unemployment rate, and education levels vary across the nation. ERS has compiled the latest data on these measures into a mapping and data display/download application that allows users to identify and compare States and counties on these indicators.

  11. Datasets for HGS paper

    • data.mendeley.com
    Updated Aug 16, 2019
    He Zhang (2019). Datasets for HGS paper [Dataset]. http://doi.org/10.17632/bymz6hdsfh.1
    Dataset updated
    Aug 16, 2019
    Authors
    He Zhang
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Here are the 143 datasets in "arff" format used in the HGS paper.

  12. Data from: BananaSet: A Dataset of Banana Varieties in Bangladesh

    • data.mendeley.com
    Updated Jan 29, 2024
    + more versions
    Md Masudul Islam (2024). BananaSet: A Dataset of Banana Varieties in Bangladesh [Dataset]. http://doi.org/10.17632/35gb4v72dr.4
    Dataset updated
    Jan 29, 2024
    Authors
    Md Masudul Islam
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh
    Description

    This dataset presents an assortment of high-resolution images that exhibit six well-known banana varieties procured from two distinct regions in Bangladesh. These bananas were thoughtfully selected from rural orchards and local markets, providing a diverse and comprehensive representation. The dataset serves as a visual reference, offering a thorough portrayal of the distinct characteristics of these banana types, which aids in their precise classification. It encompasses six distinct categories, namely, Shagor, Shabri, Champa, Anaji, Deshi, and Bichi, with a total of 1166 original images and 6000 augmented JPG images. These images were diligently captured during the period from August 01 to August 15, 2023. The dataset includes two variations: one with raw images and the other with augmented images. Each variation is further categorized into six separate folders, each dedicated to a specific banana variety. The images are of non-uniform dimensions and have a resolution of 4608 × 3456 pixels. Due to the high resolution, the initial file size amounted to 4.08 GB. Subsequently, data augmentation techniques were applied, as machine vision deep learning models require a substantial number of images for effective training. Augmentation involves transformations like scaling, shifting, shearing, zooming, and random rotation. Specific augmentation parameters included rotations within a range of 1° to 40°, width and height shifts, zoom range, and shear ranges set at 0.2. As a result, an additional 1000 augmented images were generated from the original images in each category, resulting in a dataset comprising a total of 6000 augmented images (1000 per category) with a data size of 4.73 GB.

  13. CBC Dataset

    • paperswithcode.com
    Updated Sep 23, 2019
    + more versions
    Mohammad Mahmudul Alam; Mohammad Tariqul Islam (2019). CBC Dataset [Dataset]. https://paperswithcode.com/dataset/complete-blood-count-cbc-dataset
    Dataset updated
    Sep 23, 2019
    Authors
    Mohammad Mahmudul Alam; Mohammad Tariqul Islam
    Description

    The complete blood count (CBC) dataset contains 360 blood smear images along with their annotation files, split into Training, Testing, and Validation sets. The training folder contains 300 images with annotations. The testing and validation folders each contain 60 images with annotations. We made some modifications to the original dataset to prepare this CBC dataset: some of the image annotation files contained far fewer red blood cells (RBCs) than were actually present, and one annotation file did not include any RBCs at all although the cell smear image contains RBCs. We therefore cleared out all the faulty files and split the dataset into three parts. Among the 360 smear images, 300 blood cell images with annotations are used as the training set, and the remaining 60 images with annotations are used as the testing set. Due to the shortage of data, a subset of the training set is used to prepare the validation set, which contains 60 images with annotations.

  14. Head Data Set 2 Dataset

    • universe.roboflow.com
    zip
    Updated Oct 1, 2024
    Innovateitt (2024). Head Data Set 2 Dataset [Dataset]. https://universe.roboflow.com/innovateitt/head-data-set-2
    Available download formats: zip
    Dataset updated
    Oct 1, 2024
    Dataset authored and provided by
    Innovateitt
    Variables measured
    Heads QiDz Bounding Boxes
    Description

    Head Data Set 2

    ## Overview
    
    Head Data Set 2 is a dataset for object detection tasks - it contains Heads QiDz annotations for 2,342 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
  15. 30 Semi-Synthetic Data Sets for X-ray Diffraction Error Analysis

    • zenodo.org
    bz2
    Updated May 20, 2025
    Zenodo (2025). 30 Semi-Synthetic Data Sets for X-ray Diffraction Error Analysis [Dataset]. http://doi.org/10.5281/zenodo.15473289
    Available download formats: bz2
    Dataset updated
    May 20, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets as described several years ago in

    https://journals.iucr.org/d/issues/2016/04/00/gm5043/index.html#SEC10

    30 low-dose data sets (i.e. a ‘large’ number) were recorded sequentially on beamline I04 at Diamond Light Source with identical data-collection parameters (1027 images per data set from a Pilatus 6M detector with an image width of 0.15° and 1% beam transmission at a wavelength of 1.2 Å). Scaling all of the data together indicated that radiation damage across the 30 sweeps was small; however, some systematic differences between them remained owing to factors such as beam-intensity variation.

    The photon counts from these 30 ‘original’ sweeps were then ‘reshuffled’ to create a population of 30 equivalent ‘new’ data sets (for convenience, to allow reuse of the image headers) by considering every active pixel in the data set (i.e. each of around six billion) independently using the following procedure. Firstly, create a summed data set, which we call ‘total’:

    for image in range(1027):
     for pixel in image:
      total[image][pixel] = sum(images[j][image][pixel] for j in range(30))

    Then create 30 ‘new’ data sets with every pixel set initially to 0, and randomly redistribute each photon count from every pixel of the ‘total’ set to the same pixel position in one of the 30 ‘new’ sets:

    for image in range(1027):
     for pixel in image:
      for k in range(total[image, pixel]):
       rnd = random(30)
       rebin[rnd, image, pixel] += 1
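
    A runnable version of the two pseudocode steps above might look like the following Python sketch. It assumes the counts are held in nested lists indexed as images[j][image][pixel]; the function name `reshuffle`, the seed, and the toy input are hypothetical, and the toy input is far smaller than the real 30 data sets of 1027 images each.

```python
import random


def reshuffle(images, seed=0):
    """Redistribute photon counts pixel by pixel across equivalent data sets.

    images[j][image][pixel] is the photon count in original sweep j.
    """
    rng = random.Random(seed)
    n_sets = len(images)
    n_images = len(images[0])
    n_pixels = len(images[0][0])
    # Start every 'new' data set with all pixels at zero.
    rebin = [[[0] * n_pixels for _ in range(n_images)] for _ in range(n_sets)]
    for image in range(n_images):
        for pixel in range(n_pixels):
            # Summed count for this pixel position over all sweeps ('total').
            total = sum(images[j][image][pixel] for j in range(n_sets))
            # Hand each photon to one of the new sets, chosen uniformly.
            for _ in range(total):
                rebin[rng.randrange(n_sets)][image][pixel] += 1
    return rebin


# Toy input (not real data): 3 sweeps, 2 images each, 4 pixels per image.
toy = [
    [[2, 0, 1, 5], [0, 3, 1, 1]],
    [[1, 1, 0, 4], [2, 2, 0, 1]],
    [[3, 0, 2, 6], [1, 4, 1, 0]],
]
new_sets = reshuffle(toy)
```

    The sum over the new sets at each pixel position equals the sum over the original sweeps, so only the allocation of counts between data sets changes. At real data sizes a vectorised approach would be needed; this loop is only meant to mirror the description.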

  16. Biodiversity by County - Distribution of Animals, Plants and Natural...

    • catalog.data.gov
    • datasets.ai
    • +2 more
    Updated Jul 12, 2025
    + more versions
    State of New York (2025). Biodiversity by County - Distribution of Animals, Plants and Natural Communities [Dataset]. https://catalog.data.gov/dataset/biodiversity-by-county-distribution-of-animals-plants-and-natural-communities
    Dataset updated
    Jul 12, 2025
    Dataset provided by
    State of New York
    Description

    The NYS Department of Environmental Conservation (DEC) collects and maintains several datasets on the locations, distribution and status of species of plants and animals. Information on distribution by county from the following three databases was extracted and compiled into this dataset. First, the New York Natural Heritage Program biodiversity database: Rare animals, rare plants, and significant natural communities. Significant natural communities are rare or high-quality wetlands, forests, grasslands, ponds, streams, and other types of habitats. Next, the 2nd NYS Breeding Bird Atlas Project database: Birds documented as breeding during the atlas project from 2000-2005. And last, DEC’s NYS Reptile and Amphibian Database: Reptiles and amphibians; most records are from the NYS Amphibian & Reptile Atlas Project (Herp Atlas) from 1990-1999.

  17. LAS&T: Large Shape & Texture Dataset Dataset

    • paperswithcode.com
    LAS&T: Large Shape & Texture Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/las-t-large-shape-texture-dataset
    Description

    Large Shape and Texture dataset (LAS&T) is a giant dataset of shapes and textures for tasks of visual shape and texture identification and retrieval from a single image.

    LAS&T is the largest and most diverse dataset for shape, texture and material recognition and retrieval in 2D and 3D with 650,000 images, based on real world shapes and textures.

    Overview

    The LAS&T Dataset aims to test the most basic aspect of vision in the most general way: mainly, the ability to identify any shape, texture, and material in any setting and environment, without being limited to specific types or classes of objects, materials, and environments. For shapes, this means identifying and retrieving any shape in 2D or 3D with every element of the shape changed between images, including the shape material and texture, orientation, size, and environment. For textures and materials, the goal is to recognize the same texture or material when appearing on different objects, environments, and light conditions. The dataset relies on shapes, textures, and materials extracted from real-world images, leading to an almost unlimited quantity and diversity of real-world natural patterns. Each section of the dataset (shapes and textures) contains 3D parts that rely on physics-based scenes with realistic light materials and object simulation, and abstract 2D parts. In addition, there is a real-world benchmark for 3D shapes.

    The dataset is divided into several parts:

    • 3D shape recognition and retrieval.
    • 2D shape recognition and retrieval.
    • 3D materials recognition and retrieval.
    • 2D texture recognition and retrieval.

    Each can be used independently for training and testing.

    Additional assets are a set of 350,000 natural 2D shapes extracted from real-world images and a real-world image benchmark for 3D shape recognition.

  18. Immigration statistics data tables, year ending March 2020 second edition

    • gov.uk
    Updated Jul 27, 2020
    + more versions
    Home Office (2020). Immigration statistics data tables, year ending March 2020 second edition [Dataset]. https://www.gov.uk/government/statistical-data-sets/immigration-statistics-data-tables-year-ending-march-2020
    Dataset updated
    Jul 27, 2020
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Home Office
    Description

    The Home Office has changed the format of the published data tables for a number of areas (asylum and resettlement, entry clearance visas, extensions, citizenship, returns, detention, and sponsorship). These now include summary tables, and more detailed datasets (available on a separate page, link below). A list of all available datasets on a given topic can be found in the ‘Contents’ sheet in the ‘summary’ tables. Information on where to find historic data in the ‘old’ format is in the ‘Notes’ page of the ‘summary’ tables. The Home Office intends to make these changes in other areas in the coming publications. If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.

    Related content

    Immigration statistics, year ending March 2020
    Immigration Statistics Quarterly Release
    Immigration Statistics User Guide
    Publishing detailed data tables in migration statistics
    Policy and legislative changes affecting migration to the UK: timeline
    Immigration statistics data archives

    Asylum and resettlement

    Asylum and resettlement summary tables, year ending March 2020 second edition (MS Excel Spreadsheet, 123 KB): https://assets.publishing.service.gov.uk/media/5f1e9c14e90e0745691135e9/asylum-summary-mar-2020-tables.xlsx

    Detailed asylum and resettlement datasets

    Sponsorship

    Sponsorship summary tables, year ending March 2020 (MS Excel Spreadsheet, 72.7 KB): https://assets.publishing.service.gov.uk/media/5ebe9d9786650c2791ec7166/sponsorship-summary-mar-2020-tables.xlsx

    Detailed sponsorship datasets

    Entry clearance visas granted outside the UK

    Entry clearance visas summary tables, year ending March 2020 (MS Excel Spreadsheet, 66.1 KB): https://assets.publishing.service.gov.uk/media/5ebe9d77d3bf7f5d37fa0d9f/visas-summary-mar-2020-tables.xlsx

    Detailed entry clearance visas datasets

    Passenger arrivals (admissions)

    Passenger arrivals (admissions) summary tables, year ending March 2020 (MS Excel Spreadsheet, 76.1 KB): https://assets.publishing.service.gov.uk/media/5ebe9e4b86650c279626e5f2/passenger-arrivals-admissions-summary-mar-2020-tables.xlsx

    Detailed Passengers initially refused entry at port datasets

    Extensions

    Extensions summary tables, year ending March 2020 (MS Excel Spreadsheet, 41.8 KB): https://assets.publishing.service.gov.uk/media/5ebe9edb86650c2791ec7167/extentions-summary-mar-2020-tables.xlsx

  19. Sound Events for Surveillance Applications Dataset

    • paperswithcode.com
    • explore.openaire.eu
    • +2 more
    Updated Feb 19, 2021
    Spadini (2021). Sound Events for Surveillance Applications Dataset [Dataset]. https://paperswithcode.com/dataset/sound-events-for-surveillance-applications
    Dataset updated
    Feb 19, 2021
    Authors
    Spadini
    Description

    The Sound Events for Surveillance Applications (SESA) dataset files were obtained from Freesound. The dataset was divided between train (480 files) and test (105 files) folders. All audio files are WAV, Mono-Channel, 16 kHz, and 8-bit, with a duration of up to 33 seconds.

    Classes:

    • 0 - Casual (not a threat)
    • 1 - Gunshot
    • 2 - Explosion
    • 3 - Siren (also contains alarms)

  20. CSIRO Sentinel-1 SAR image dataset of oil- and non-oil features for machine...

    • data.csiro.au
    • researchdata.edu.au
    Updated Dec 15, 2022
    David Blondeau-Patissier; Thomas Schroeder; Foivos Diakogiannis; Zhibin Li (2022). CSIRO Sentinel-1 SAR image dataset of oil- and non-oil features for machine learning ( Deep Learning ) [Dataset]. http://doi.org/10.25919/4v55-dn16
    Dataset updated
    Dec 15, 2022
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    David Blondeau-Patissier; Thomas Schroeder; Foivos Diakogiannis; Zhibin Li
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Time period covered
    May 1, 2015 - Aug 31, 2022
    Dataset funded by
    CSIRO (http://www.csiro.au/)
    ESA
    Description

    What this collection is: A curated, binary-classified image dataset of grayscale (1 band) 400 x 400-pixel size, or image chips, in a JPEG format extracted from processed Sentinel-1 Synthetic Aperture Radar (SAR) satellite scenes acquired over various regions of the world, and featuring clear open ocean chips, look-alikes (wind or biogenic features) and oil slick chips.

    This binary dataset contains chips labelled as:

    • "0" for chips not containing any oil features (look-alikes or clean seas)
    • "1" for those containing oil features.

    This binary dataset is imbalanced, and biased towards "0" labelled chips (i.e., no oil features), which correspond to 66% of the dataset. Chips containing oil features, labelled "1", correspond to 34% of the dataset.

    Why: This dataset can be used for training, validation and/or testing of machine learning, including deep learning, algorithms for the detection of oil features in SAR imagery. Directly applicable for algorithm development for the European Space Agency Sentinel-1 SAR mission (https://sentinel.esa.int/web/sentinel/missions/sentinel-1 ), it may be suitable for the development of detection algorithms for other SAR satellite sensors.

    Overview of this dataset: Total number of chips (both classes) is N=5,630

    Class    0        1
    Total    3,725    1,905

    Further information and description is found in the ReadMe file provided (ReadMe_Sentinel1_SAR_OilNoOil_20221215.txt)
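
    Because the two classes are imbalanced, a training setup may want to weight them. The short Python sketch below recomputes the class shares from the chip counts quoted above and derives inverse-frequency weights. The weighting formula is a common heuristic chosen here for illustration, not something the dataset description prescribes, and the variable names are hypothetical.

```python
# Chip counts per class, taken from the overview above.
counts = {0: 3725, 1: 1905}
total = sum(counts.values())

# Share of the dataset held by each class.
fractions = {label: n / total for label, n in counts.items()}

# Inverse-frequency weights: total / (number of classes * class count).
# This is an assumed heuristic; other weighting schemes are equally valid.
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

for label in counts:
    print(f"class {label}: {fractions[label]:.1%} of chips, weight {weights[label]:.3f}")
```

    With these weights, each class contributes the same total weight to a loss function, so the majority "0" class does not dominate training.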

