12 datasets found
  1. Boston-Housing-Dataset

    • kaggle.com
    zip
    Updated Dec 25, 2021
    Cite
    Rohan Saha (2021). Boston-Housing-Dataset [Dataset]. https://www.kaggle.com/datasets/simpleparadox/bostonhousingdataset/discussion
    Explore at:
    zip (13140 bytes)
    Dataset updated
    Dec 25, 2021
    Authors
    Rohan Saha
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This is a copy of the original Boston Housing Dataset. As of December 2021, the original link doesn't contain the dataset so I'm uploading it if anyone wants to use it. I'll implement a linear regression model to predict the output 'MEDV' variable using PyTorch (check the companion notebook).

    I took the data given in this link and processed it to include the column names as well.

    Acknowledgements

    https://www.kaggle.com/prasadperera/the-boston-housing-dataset/data

    Inspiration

    Good luck on your data science career :)

  2. Secondary Input Data used in Developing Stochastically Generated Climate and...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 19, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Secondary Input Data used in Developing Stochastically Generated Climate and Streamflow Conditions in the Souris River Basin, United States and Canada, [Dataset]. https://catalog.data.gov/dataset/secondary-input-data-used-in-developing-stochastically-generated-climate-and-streamflow-co
    Explore at:
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Canada, Souris River, United States
    Description

    i. .\File_Mapping.csv: This file relates historical reconstructed hydrology streamflow from the U.S. Army Corps of Engineers (2020) to the appropriate stochastic streamflow file for disaggregation of streamflow. Column A is an assigned ID; column B is named “Stochastic” and is the stochastic streamflow file needed for disaggregation; column C is called “RH_Ratio_Col” and is the name of the column in the reconstructed hydrology dataset associated with a stochastic streamflow file; and column D is named “Col_Num” and is the column number in the reconstructed hydrology dataset with the name given in column C.

    ii. .\Original_Draw_YearDat.csv: This file contains the historical year from 1930 to 2017 with the closest total streamflow for the Souris River Basin to each year in the stochastic streamflow dataset. Column A is an index number; column B is named “V1” and is the year in a simulation; column C is called “V2” and is the stochastic simulation number; column D is an integer that can be related to historical years by adding 1929; and column E is named “year” and is the historical year with the closest total Souris River Basin streamflow volume to the associated year in the stochastic traces.

    iii. .\revdrawyr.csv: This file is set up the same way as .\Original_Draw_YearDat.csv except that, when a year had over 400 occurrences, it was randomly replaced with one of the 20 other closest years. The replacement process was repeated until there were fewer than 400 occurrences of each reconstructed hydrology year associated with stochastic simulation years. Column A is an index number; column B is named “V1” and is the year in a simulation; column C is called “V2” and is the stochastic simulation number; column D is called “V3” and is the historical year whose streamflow ratios will be multiplied by stochastic streamflow; and column E is called “Stoch_yr” and is the total of 2999 and the year in column B.

    iv. .\RH_1930_2017.csv: This file contains the daily streamflow from the U.S. Army Corps of Engineers (2020) reconstructed hydrology for the Souris River Basin for the period 1930 to 2017. Column A is the date, and columns B through AA are the daily streamflow in cubic feet per second.

    v. .\rhmoflow_1930Present.csv: This file was created based on .\RH_1930_2017.csv and provides streamflow for each site in cubic meters for a given month. Column A is an unnamed index column; column B is the historical year; column C is the historical month associated with the historical year; column D provides a day equal to 1 but has no particular significance; and columns E through AD are the monthly streamflow volume for each site location.

    vi. .\Stoch_Annual_TotVol_CubicDecameters.csv: This file contains the total volume of streamflow for each of the 26 sites for each month in the stochastic streamflow timeseries and provides a total streamflow volume divided by 100,000 on a monthly basis for the entire Souris River Basin. Column A is unnamed and contains an index number; column B is the month and is named “V1”; column C is the year in a simulation; column D is the simulation number; columns E through AD (V4 through V29) are streamflow volume in cubic meters; and column AE (V30) is the total Souris River Basin monthly streamflow volume in cubic decameters/1,000.

  3. Vertical 1-dbar averaged temperature and salinity profiles in Hornsund Fjord...

    • dataportal.igf.edu.pl
    Updated Jun 9, 2022
    + more versions
    Cite
    (2022). Vertical 1-dbar averaged temperature and salinity profiles in Hornsund Fjord - Dataset - IG PAS Data Portal [Dataset]. https://dataportal.igf.edu.pl/dataset/inter-calibrated-temperature-and-salinity-in-depth-profiles-in-hornsund-fjord
    Explore at:
    Dataset updated
    Jun 9, 2022
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    Hornsund
    Description

    Dataset of vertical temperature and salinity profiles obtained at various locations across the Hornsund fjord. Several CTD instruments have been used for data collection: a Valeport miniCTD, two separate SAIV A/S 208 STD/CTDs and two separate RBR concerto CTDs. The data are stored in folders organized by the year (YYYY) of measurement. Each vertical profile is stored as an individual, tab-separated ASCII file. The filenames are formed from the date (and time) of measurement followed by the instrument and station names: YYYYMMDD_instrument_station.txt or YYYYMMDDhhmmss_instrument_station.txt. Each file includes eight header lines with information on station name, geographical location (decimal degrees), bottom depth at the location (m), date (and time) of measurement (YYYY-MM-DDThh:mm:ss), instrument and its serial number, source of financial support, and data column names. There are seven data columns: pressure (dbar), depth (m), temperature (°C), potential temperature (°C), practical salinity (PSU), SigmaT density (kg/m**3) and sound velocity (m/s). The data are averaged to 1-dbar vertical bins. Before averaging, the data are visually inspected and suspicious values are removed. Based on inter-calibration between the instruments, a linear correction has been calculated for temperature and conductivity and added to the measurements made by the SAIV A/S 208 CTD. In general, both down- and up-profiles are used for averaging. Finally, the data are interpolated and smoothed.

  4. Vertical Temperature, Turbidity and Dissolved Oxygen profiles in Revvatnet...

    • polar.cenagis.edu.pl
    Updated Dec 18, 2024
    + more versions
    Cite
    (2024). Vertical Temperature, Turbidity and Dissolved Oxygen profiles in Revvatnet (raw data) - Dataset - POLAR-PL Catalog [Dataset]. https://polar.cenagis.edu.pl/dataset/temperature-turbidity-and-dissolved-oxygen-profiles-revvatnet
    Explore at:
    Dataset updated
    Dec 18, 2024
    Description

    Dataset of vertical temperature, turbidity and dissolved oxygen profiles obtained from Revvatnet, a lake close to Hornsund fjord. The measurements were made with a SAIV A/S 208 STD/CTD (until 2023) and two separate RBR concerto CTDs (since 2024). The data are stored in folders organized by the year (YYYY) of measurement. Each vertical profile is stored as an individual, tab-separated ASCII file. The filenames are formed from the date and time of measurement followed by the instrument, potential additional sensors and station names: YYYYMMDDhhmmss_instrument-sensors_station.txt. Each file includes eight header lines with information on station name, geographical location (UTM), date and time of measurement (YYYY/MM/DD hh:mm), instrument and its serial number, source of financial support and data column names. The data columns include pressure (dbar), temperature (°C), turbidity (FTU/NTU), dissolved oxygen saturation (%) and dissolved oxygen concentration (mg/l). Measurements by the RBR concerto CTDs have additional columns for Chlorophyll a fluorescence (μg/l) and Photosynthetically Active Radiation (PAR, μmol/m^2/s). Note that this is a raw dataset without quality control.

  5. Plankton and environmental monitoring dataset from the Iroise Marine Natural...

    • seanoe.org
    bin, csv
    Updated Apr 4, 2025
    Cite
    Laetitia Drago; Caroline Cailliau; Patrick Pouline; Beatriz Beker; Laëtitia Jalabert; Jean-Baptiste Romagnan; Sakina-Dorothée Ayata (2025). Plankton and environmental monitoring dataset from the Iroise Marine Natural Park (NE Atlantic, 2010-2023) [Dataset]. http://doi.org/10.17882/105465
    Explore at:
    csv, bin
    Dataset updated
    Apr 4, 2025
    Dataset provided by
    SEANOE
    Authors
    Laetitia Drago; Caroline Cailliau; Patrick Pouline; Beatriz Beker; Laëtitia Jalabert; Jean-Baptiste Romagnan; Sakina-Dorothée Ayata
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 1, 2010 - Oct 8, 2023
    Area covered
    Variables measured
    Zooplankton biovolume, Salinity of the water column, Temperature of the water column, Phytoplankton taxonomic abundance in water bodies, Zooplankton taxonomy-related abundance per unit volume of the water column
    Description

    This dataset presents a long-term monitoring record of phytoplankton (2010-2022) and zooplankton (2010-2023) taxonomic groups, alongside associated environmental parameters (surface and bottom temperature and salinity measurements) from the Iroise Marine Natural Park, France's first marine protected area. The dataset integrates traditional microscopy-based phytoplankton counts with zooplankton imaging data obtained using the ZooScan (Gorsky et al., 2010), as well as zooplankton biovolume and concentration data.

    Sampling was conducted seasonally along two main coastal-offshore transects (B and D) and at three coastal stations (Molène, Sein, and Douarnenez), capturing the spatial and temporal dynamics of plankton communities in this unique ecosystem located at the intersection of the English Channel and the Atlantic Ocean. The region is characterized by the seasonal Ushant thermal front, which creates diverse habitats supporting rich plankton communities.

    Phytoplankton identification was performed consistently by the same taxonomist throughout the study period, resulting in a high-resolution dataset with 573 distinct taxa across the 785 phytoplankton samples. Zooplankton samples (total number of samples = 650) were digitized using the ZooScan imaging system (Gorsky et al., 2010), with organisms automatically sorted using built-in semi-automatic algorithms (random forest and convolutional neural networks) of the EcoTaxa platform (Picheral et al., 2017). Expert taxonomists then reviewed and validated the classifications, resulting in 103 taxonomic and morphological groups. Individual zooplankton images are accessible through the EcoTaxa web platform for further morphometric analyses.

    Bibliography:
    Gorsky, G., Ohman, M.D., Picheral, M., Gasparini, S., Stemmann, L., Romagnan, J.-B., Cawood, A., Pesant, S., Garcia-Comas, C., Prejger, F., 2010. Digital zooplankton image analysis using the ZooScan integrated system. J. Plankton Res. 32, 285–303. https://doi.org/10.1093/plankt/fbp124
    Picheral, M., Colin, S., Irisson, J.-O., 2017. EcoTaxa, a tool for the taxonomic classification of images.
    WoRMS Editorial Board, 2025. World Register of Marine Species. https://doi.org/10.14284/170

    Dataset content

    The dataset contains three distinct tables, all containing both text and numerical data.

    The first table integrates zooplankton measurements with their corresponding environmental parameters and is organised as follows (see also units_pnmi_data_paper.csv):
    Metadata information (columns 1-8): station name (column 1); transect name (column 2); coordinates: longitude and latitude (columns 3-4, in dd.dddd); sampling time: date, year, month, and Julian day (columns 5-8).
    Environmental measurements: surface and bottom temperature (columns 9-10, in °C); surface and bottom salinity (columns 11-12, in PSU).
    Biological data for each taxonomic group: sample abundance in individuals/m³ (columns 13-116, prefix "conc_" + taxa name); total biovolume in mm³/m³ (columns 117-220, prefix "tot_biov_" + taxa name); mean individual biovolume in mm³ (columns 221-324, prefix "mean_biov_" + taxa name).

    The second table contains phytoplankton data and follows a similar organizational structure:
    Metadata information (columns 1-8): station name (column 1); transect name (column 2); coordinates: longitude and latitude (columns 3-4, in dd.dddd); sampling time: date, year, month, and Julian day (columns 5-8).
    Environmental measurements: surface and bottom temperature (columns 9-10, in °C); surface and bottom salinity (columns 11-12, in PSU).
    Phytoplankton taxa concentrations: surface abundance in individuals/L (columns 13-580, prefix "surface_" + taxa name); bottom abundance in individuals/L (columns 581-1148, prefix "bottom_" + taxa name).

    Each taxon is provided in the third table with the corresponding unique identifier, called AphiaID, from the World Register of Marine Species (WoRMS Editorial Board, 2025), which enables unambiguous species identification across databases.

    For the transect stations (D1 through D6 and B1 through B7), phytoplankton was initially sampled at sub-surface and bottom depths before 2017 (see Table 2). Following the introduction of CTD profiling in 2017, vertical profiles from 2017-2018 revealed that at offshore stations (B5-B7 and D5-D6) the chlorophyll a maximum, when present, consistently occurred between 15-18 m depth. At coastal stations (up to 40 m deep), strong vertical mixing typically maintained a homogeneous water column with no deep chlorophyll maximum, though when present it also occurred at approximately 15 m depth. Based on these observations, bottom sampling was discontinued in 2019 and replaced with sampling at 15 m depth to better capture phytoplankton biomass.

  6. Data from: A consensus compound/bioactivity dataset for data-driven drug...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated May 13, 2022
    Cite
    Laura Isigkeit; Laura Isigkeit; Apirat Chaikuad; Apirat Chaikuad; Daniel Merk; Daniel Merk (2022). A consensus compound/bioactivity dataset for data-driven drug design and chemogenomics [Dataset]. http://doi.org/10.5281/zenodo.6320761
    Explore at:
    zip
    Dataset updated
    May 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Laura Isigkeit; Laura Isigkeit; Apirat Chaikuad; Apirat Chaikuad; Daniel Merk; Daniel Merk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Information

    The diverse publicly available compound/bioactivity databases constitute a key resource for data-driven applications in chemogenomics and drug design. Analysis of their coverage of compound entries and biological targets revealed considerable differences, however, suggesting the benefit of a consensus dataset. Therefore, we have combined and curated information from five esteemed databases (ChEMBL, PubChem, BindingDB, IUPHAR/BPS and Probes&Drugs) to assemble a consensus compound/bioactivity dataset comprising 1,144,803 compounds with 10,915,362 bioactivities on 5,613 targets (including defined macromolecular targets as well as cell lines and phenotypic readouts). It also provides simplified information on the assay types underlying the bioactivity data and on bioactivity confidence by comparing data from different sources. We have unified the source databases, brought them into a common format and combined them, enabling easy generic use in multiple applications such as chemogenomics and data-driven drug design.

    The consensus dataset provides increased target coverage and contains a higher number of molecules compared to the source databases, which is also evident from a larger number of scaffolds. These features render the consensus dataset a valuable tool for machine learning and other data-driven applications in (de novo) drug design and bioactivity prediction. The increased chemical and bioactivity coverage of the consensus dataset may improve the robustness of such models compared to the single source databases. In addition, semi-automated structure and bioactivity annotation checks, with flags for divergent data from different sources, may help data selection and further accurate curation.

    Structure and content of the dataset

    Dataset structure (column headers):

    ChEMBL ID | PubChem ID | IUPHAR ID | Target | Activity type | Assay type | Unit | Mean C (0)... | Mean PC (0)... | Mean B (0)... | Mean I (0)... | Mean PD (0)... | Activity check annotation | Ligand names | Canonical SMILES C... | Structure check | Source

    The dataset was created using the Konstanz Information Miner (KNIME) (https://www.knime.com/) and was exported as a CSV-file and a compressed CSV-file.

    Except for the canonical SMILES columns, all columns are filled with the datatype ‘string’. The datatype for the canonical SMILES columns is the smiles-format. We recommend the File Reader node for using the dataset in KNIME. With the help of this node the data types of the columns can be adjusted exactly. In addition, only this node can read the compressed format.

    Column content:

    • ChEMBL ID, PubChem ID, IUPHAR ID: chemical identifier of the databases
    • Target: biological target of the molecule expressed as the HGNC gene symbol
    • Activity type: for example, pIC50
    • Assay type: Simplification/Classification of the assay into cell-free, cellular, functional and unspecified
    • Unit: unit of bioactivity measurement
    • Mean columns of the databases: mean of bioactivity values or activity comments denoted with the frequency of their occurrence in the database, e.g. Mean C = 7.5 *(15) -> the value for this compound-target pair occurs 15 times in ChEMBL database
    • Activity check annotation: a bioactivity check was performed by comparing values from the different sources and adding an activity check annotation to provide automated activity validation for additional confidence
      • no comment: bioactivity values are within one log unit;
      • check activity data: bioactivity values are not within one log unit;
      • only one data point: only one value was available, no comparison and no range calculated;
      • no activity value: no precise numeric activity value was available;
      • no log-value could be calculated: no negative decadic logarithm could be calculated, e.g., because the reported unit was not a compound concentration
    • Ligand names: all unique names contained in the five source databases are listed
    • Canonical SMILES columns: Molecular structure of the compound from each database
    • Structure check: To denote matching or differing compound structures in different source databases
      • match: molecule structures are the same between different sources;
      • no match: the structures differ;
      • 1 source: no structure comparison is possible, because the molecule comes from only one source database.
    • Source: the databases from which the data come

  7. The City of Edinburgh Council trees dataset

    • dtechtive.com
    • find.data.gov.scot
    xls
    Updated May 12, 2024
    + more versions
    Cite
    The City of Edinburgh Council (uSmart) (2024). The City of Edinburgh Council trees dataset [Dataset]. https://dtechtive.com/datasets/39291
    Explore at:
    xls (1.499 MB), xls (2.6211 MB), xls (2.6777 MB), xls (2.6421 MB), xls (3.2524 MB), xls (1.9702 MB)
    Dataset updated
    May 12, 2024
    Dataset provided by
    The City of Edinburgh Council (uSmart)
    Description

    The data lists trees maintained by the City of Edinburgh Council. The data set breaks down into the following fields:
    Column A - Primary Key
    Column B - Location or Tag no.
    Column C - Ward
    Column D - Site
    Column E - Latin name
    Column F - Common Name
    Column G - Owner
    Column H - NT ref
    Column I - Height
    Column J - Spread
    Column K - Age group
    Column L - DBH
    The data is updated on a regular basis; please contact the Open Data team if you are looking for the most up-to-date version. Additional metadata: - Licence: http://creativecommons.org/licenses/by-nc/2.0/

  8. Data from: A 24-hour dynamic population distribution dataset based on mobile...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 16, 2022
    Cite
    Claudia Bergroth; Olle Järv; Henrikki Tenkanen; Matti Manninen; Tuuli Toivonen (2022). A 24-hour dynamic population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4724388
    Explore at:
    Dataset updated
    Feb 16, 2022
    Dataset provided by
    Department of Built Environment, Aalto University / Centre for Advanced Spatial Analysis, University College London
    Unit of Urban Research and Statistics, City of Helsinki / Digital Geography Lab, Department of Geosciences and Geography, University of Helsinki
    Elisa Corporation
    Digital Geography Lab, Department of Geosciences and Geography, University of Helsinki
    Authors
    Claudia Bergroth; Olle Järv; Henrikki Tenkanen; Matti Manninen; Tuuli Toivonen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Helsinki Metropolitan Area, Finland
    Description

    Related article: Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39.

    In this dataset:

    We present temporally dynamic population distribution data from the Helsinki Metropolitan Area, Finland, at the level of 250 m by 250 m statistical grid cells. Three hourly population distribution datasets are provided for regular workdays (Mon – Thu), Saturdays and Sundays. The data are based on aggregated mobile phone data collected by the biggest mobile network operator in Finland. Mobile phone data are assigned to statistical grid cells using an advanced dasymetric interpolation method based on ancillary data about land cover, buildings and a time use survey. The data were validated by comparing population register data from Statistics Finland for night-time hours and a daytime workplace registry. The resulting 24-hour population data can be used to reveal the temporal dynamics of the city and examine population variations relevant to, for instance, spatial accessibility analyses, crisis management and planning.

    Please cite this dataset as:

    Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39. https://doi.org/10.1038/s41597-021-01113-4

    Organization of data

    The dataset is packaged into a single zip file, Helsinki_dynpop_matrix.zip, which contains the following files:

    HMA_Dynamic_population_24H_workdays.csv represents the dynamic population for an average workday in the study area.

    HMA_Dynamic_population_24H_sat.csv represents the dynamic population for an average Saturday in the study area.

    HMA_Dynamic_population_24H_sun.csv represents the dynamic population for an average Sunday in the study area.

    target_zones_grid250m_EPSG3067.geojson represents the statistical grid in ETRS89/ETRS-TM35FIN projection that can be used to visualize the data on a map using e.g. QGIS.

    Column names

    YKR_ID : a unique identifier for each statistical grid cell (n=13,231). The identifier is compatible with the statistical YKR grid cell data by Statistics Finland and Finnish Environment Institute.

    H0, H1 ... H23 : Each field represents the proportional distribution of the total population in the study area between grid cells during a one-hour period. In total, 24 fields are formatted as “Hx”, where x stands for the hour of the day (values ranging from 0 to 23). For example, H0 stands for the first hour of the day: 00:00 - 00:59. The sum of all cell values for each field equals 100 (i.e. 100% of the total population for each one-hour period).

    In order to visualize the data on a map, the result tables can be joined with the target_zones_grid250m_EPSG3067.geojson data. The data can be joined by using the field YKR_ID as a common key between the datasets.

    License Creative Commons Attribution 4.0 International.

    Related datasets

    Järv, Olle; Tenkanen, Henrikki & Toivonen, Tuuli. (2017). Multi-temporal function-based dasymetric interpolation tool for mobile phone data. Zenodo. https://doi.org/10.5281/zenodo.252612

    Tenkanen, Henrikki, & Toivonen, Tuuli. (2019). Helsinki Region Travel Time Matrix [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3247564

  9. Hand Sign Dataset

    • kaggle.com
    zip
    Updated Aug 26, 2024
    Cite
    Harshit Pathak (2024). Hand Sign Dataset [Dataset]. https://www.kaggle.com/datasets/harshitpathak18/hand-sign-dataset
    Explore at:
    zip (331462936 bytes)
    Dataset updated
    Aug 26, 2024
    Authors
    Harshit Pathak
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Detailed Description of the Dataset:

    The dataset, saved as sign_data.csv, is designed for hand sign recognition and contains comprehensive data captured from hand gestures using real-time video processing. Below is a detailed description of the dataset:

    1. Dataset Composition:

    • File Name: sign_data.csv
    • Data Format: CSV (Comma-Separated Values)

    2. Data Capture Process:

    Tools Used: - Mediapipe: For detecting hand landmarks and estimating their positions. - OpenCV: For capturing video frames from a camera.

    Functionality: - Gesture Data Capture: The capture_gesture_data function records hand gestures by processing video frames in real-time. It captures data for a predefined number of rows per gesture, with distances calculated between all pairs of 21 detected hand landmarks. - Distance Calculation: For each frame, the Euclidean distance between every pair of landmarks is computed, resulting in a comprehensive feature vector for each gesture.

    3. Data Structure:

    Columns: - Distance Columns: Each distance column represents the calculated distance between a pair of hand landmarks. With 21 landmarks, there are a total of 210 unique distances (computed as 21 × 20 / 2). - Gesture Label: The final column in the dataset specifies the hand sign label associated with each row of distance measurements (e.g., A, B, C, ..., Z, Space).

    Example: - Column Headers: Distance_0, Distance_1, ..., Distance_209, Sign - Rows: Each row contains the computed distances followed by the corresponding gesture label.

    4. Data Collection Details:

    Gestures Included: - Alphabet: Signs for letters A-Z. - Space: Represents the space gesture.

    Number of Samples: Data is collected for each gesture with 100 samples per sign.

    5. Purpose and Usage:

    The dataset provides detailed spatial information about hand gestures, enabling the training and evaluation of hand sign recognition models. By offering a rich set of distance measurements between hand landmarks, it supports the development of accurate and reliable sign language recognition systems. This dataset is crucial for machine learning applications that aim to bridge communication gaps for individuals with hearing or speech impairments.

  10. A Data Set to Compare Feature Extractors

    • kaggle.com
    Updated Apr 18, 2024
    Cite
    Murat IŞIK (2024). A Data Set to Compare Feature Extractors [Dataset]. http://doi.org/10.34740/kaggle/ds/4493370
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Murat IŞIK
    Description

    This dataset is currently associated with an article that is in the process of being published. Once the publication process is completed, a reference link will be added separately. Until that time, this dataset cannot be used for any academic purposes.

    The dataset contains 196,926 images and 10 CSV files.

    The images are derived from the Image Matching Challenge PhotoTourism 2020 dataset:

    https://www.cs.ubc.ca/~kmyi/imw2020/data.html

    The CSV files were obtained from our work to show a comprehensive comparison of well-known conventional feature extractors/descriptors, including SIFT, SURF, BRIEF, ORB, BRISK, KAZE, AKAZE, FREAK, DAISY, FAST, and STAR.

    For Gaussian blur alone, there is a separate additional file.

    The images folder contains the images utilized for this study and the derived ones originating from these images (196,926 images in total).

    To use results or codes from this study, please cite: ISIK M. 2024. Comprehensive empirical evaluation of feature extractors in computer vision. PeerJ Computer Science 10:e2415 https://doi.org/10.7717/peerj-cs.2415

    THE COLUMN NAMES:
    img-1 and img-2 stand for the compared image names
    KP stands for keypoints
    goodMatches_normal stands for the matching count with the Brute Force Matcher
    GM stands for percentage
    goodMatches_knn stands for the matching count with the kNN Matcher
    img-1-D-time shows the duration of keypoint extraction for img-1
    img-2-D-time shows the duration of keypoint extraction for img-2 (the compared one)
    img-1-C-time shows the duration of comparing keypoints for img-1
    img-2-C-time shows the duration of comparing keypoints for img-2 (the compared one)
    total-D-time is the total of img-1-D-time and img-2-D-time
    total-C-time is the total of img-1-C-time and img-2-C-time
    matcher-time_normal stands for the duration of the matching process with the Brute Force Matcher
    matcher-time_knn stands for the duration of the matching process with the kNN Matcher

    More explanation will be added here soon.

  11. companies Dataset

    • kaggle.com
    zip
    Updated Sep 2, 2023
    Cite
    omar mohmed (2023). companies Dataset [Dataset]. https://www.kaggle.com/datasets/omarmohmed/companys-dataset
    Explore at:
    zip (3961303 bytes)
    Dataset updated
    Sep 2, 2023
    Authors
    omar mohmed
    Description

    File A is a big data file; File B is a file with already-registered users; files C and D are opt-out files. The goal is to delete from File A everybody who has opted out or is already registered.

    So, from File A we remove (automatically) ALL lines that contain email addresses that are present in files B, C or D.

    After this we will change the column names a bit to fit the right format, and we are done.

  12. Titanic Dataset

    • kaggle.com
    Updated Apr 30, 2024
    Cite
    Sakshi Satre (2024). Titanic Dataset [Dataset]. https://www.kaggle.com/datasets/sakshisatre/titanic-dataset
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Apr 30, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sakshi Satre
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The dataset containing information about passengers aboard the Titanic is one of the most famous datasets used in data science and machine learning. It was created to analyze and understand the factors that influenced survival rates among passengers during the tragic sinking of the RMS Titanic on April 15, 1912.


    Data Description :-

    The dataset is often used for predictive modeling and statistical analysis to determine which factors (such as socio-economic status, age, gender, etc.) were associated with a higher likelihood of survival. It contains 1309 rows and 14 columns.

    Columns : -

    • Pclass: Ticket class indicating the socio-economic status of the passenger. It is categorized into three classes: 1 = Upper, 2 = Middle, 3 = Lower.

    • Survived: A binary indicator that shows whether the passenger survived (1) or not (0) during the Titanic disaster. This is the target variable for analysis.

    • Name: The full name of the passenger, including title (e.g., Mr., Mrs., etc.).

    • Sex: The gender of the passenger, denoted as either male or female.

    • Age: The age of the passenger in years.

    • SibSp: The number of siblings or spouses aboard the Titanic for the respective passenger.

    • Parch: The number of parents or children aboard the Titanic for the respective passenger.

    • Ticket: The ticket number assigned to the passenger.

    • Fare: The fare paid by the passenger for the ticket.

    • Cabin: The cabin number assigned to the passenger, if available.

    • Embarked: The port of embarkation for the passenger. It can take one of three values: C = Cherbourg, Q = Queenstown, S = Southampton.

    • Boat: If the passenger survived, this column contains the identifier of the lifeboat they were rescued in.

    • Body: If the passenger did not survive, this column contains the identification number of their recovered body, if applicable.

    • Home.dest: The destination or place of residence of the passenger.

    These descriptions provide a detailed understanding of each column in the Titanic dataset subset, offering insights into the demographic, travel, and survival-related information recorded for each passenger.
