License: CC0 1.0 Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This is a copy of the original Boston Housing Dataset. As of December 2021, the original link doesn't contain the dataset so I'm uploading it if anyone wants to use it. I'll implement a linear regression model to predict the output 'MEDV' variable using PyTorch (check the companion notebook).
I took the data given in this link and processed it to include the column names as well.
https://www.kaggle.com/prasadperera/the-boston-housing-dataset/data
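A minimal sketch of the PyTorch linear-regression idea mentioned above; the file name HousingData.csv is a placeholder for this upload's CSV (which already includes column names), and the training loop is only illustrative:

```python
import pandas as pd
import torch

# Hedged sketch: "HousingData.csv" is a placeholder name for this upload's CSV.
df = pd.read_csv("HousingData.csv").dropna()

X = torch.tensor(df.drop(columns=["MEDV"]).values, dtype=torch.float32)
y = torch.tensor(df["MEDV"].values, dtype=torch.float32).unsqueeze(1)

# Standardize features so plain SGD behaves well.
X = (X - X.mean(dim=0)) / X.std(dim=0)

model = torch.nn.Linear(X.shape[1], 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

for epoch in range(1000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print(f"final training MSE: {loss.item():.2f}")
```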
Good luck on your data science career :)
i. .\File_Mapping.csv: This file relates historical reconstructed hydrology streamflow from the U.S. Army Corps of Engineers (2020) to the appropriate stochastic streamflow file for disaggregation of streamflow. Column A is an assigned ID, column B is named “Stochastic” and is the stochastic streamflow file needed for disaggregation, column C is called “RH_Ratio_Col” and is the name of the column in the reconstructed hydrology dataset associated with a stochastic streamflow file, and column D is named “Col_Num” and is the column number in the reconstructed hydrology dataset with the name given in column C.
ii. .\Original_Draw_YearDat.csv: This file contains the historical year from 1930 to 2017 with the closest total streamflow for the Souris River Basin to each year in the stochastic streamflow dataset. Column A is an index number, column B is named “V1” and is the year in a simulation, column C is called “V2” and is the stochastic simulation number, column D is an integer that can be related to historical years by adding 1929, and column E is named “year” and is the historical year with the closest total Souris River Basin streamflow volume to the associated year in the stochastic traces (see the sketch after this list).
iii. .\revdrawyr.csv: This file is set up the same way as .\Original_Draw_YearDat.csv except that, when a year had over 400 occurrences, it was randomly replaced with one of the 20 other closest years. The replacement process was repeated until there were fewer than 400 occurrences of each reconstructed hydrology year associated with stochastic simulation years. Column A is an index number, column B is named “V1” and is the year in a simulation, column C is called “V2” and is the stochastic simulation number, column D is called “V3” and is the historical year whose streamflow ratios will be multiplied by stochastic streamflow, and column E is called “Stoch_yr” and is the total of 2999 and the year in column B.
iv. .\RH_1930_2017.csv: This file contains the daily streamflow from the U.S. Army Corps of Engineers (2020) reconstructed hydrology for the Souris River Basin for the period of 1930 to 2017. Column A is the date and columns B through AA are the daily streamflow in cubic feet per second.
v. .\rhmoflow_1930Present.csv: This file was created based on .\RH_1930_2017.csv and provides streamflow for each site in cubic meters for a given month. Column A is an unnamed index column, column B is the historical year, column C is the historical month associated with the historical year, column D provides a day equal to 1 but does not have particular significance, and columns E through AD are monthly streamflow volume for each site location.
vi. .\Stoch_Annual_TotVol_CubicDecameters.csv: This file contains the total volume of streamflow for each of the 26 sites for each month in the stochastic streamflow time series and provides a total streamflow volume divided by 100,000 on a monthly basis for the entire Souris River Basin. Column A is unnamed and contains an index number, column B is the month and is named “V1”, column C is the year in a simulation, column D is the simulation number, columns E through AD (V4 through V29) are streamflow volume in cubic meters, and column AE (V30) is total Souris River Basin monthly streamflow volume in cubic decameters/1,000.
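As a hedged illustration of the year mapping described in item ii, the sketch below loads Original_Draw_YearDat.csv with pandas, recovers the historical year by adding 1929 to the integer column, and tallies how often each year was drawn (the revised file revdrawyr.csv caps these counts below 400). Column positions are assumed from the descriptions above, and the file is assumed to have a header row.

```python
import pandas as pd

# Hedged sketch: column positions follow the descriptions above (A=index,
# B=V1, C=V2, D=integer year offset) and may need adjusting.
draws = pd.read_csv("Original_Draw_YearDat.csv")

# The integer in column D maps to a historical year by adding 1929.
draws["hist_year"] = draws.iloc[:, 3] + 1929

# Count how often each historical year was drawn; revdrawyr.csv was built so
# that no year exceeds 400 occurrences.
counts = draws["hist_year"].value_counts()
print(counts[counts > 400])
```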
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset of vertical temperature and salinity profiles obtained at various locations across the Hornsund fjord. Several CTD instruments have been used for data collection: a Valeport miniCTD, two separate SAIV A/S 208 STD/CTDs and two separate RBR concerto CTDs. The data are stored in folders organized by the year (YYYY) of measurements. Each vertical profile is stored as an individual, tab-separated ASCII file. The filenames are formed from the date (and time) of measurement followed by the instrument and station names: YYYYMMDD_instrument_station.txt or YYYYMMDDhhmmss_instrument_station.txt. Each file includes eight header lines with information on station name, geographical location (decimal degrees), bottom depth at the location (m), date (and time) of measurement (YYYY-MM-DDThh:mm:ss), instrument and its serial number, source of financial support and data column names. There are seven data columns: pressure (dbar), depth (m), temperature (°C), potential temperature (°C), practical salinity (PSU), SigmaT density (kg/m**3) and sound velocity (m/s). The data are averaged to 1-dbar vertical bins. Before averaging, the data are visually inspected and suspicious data are removed. Based on inter-calibration between the instruments, a linear correction has been calculated for temperature and conductivity and added to the measurements by the SAIV A/S 208 CTDs. In general, both down- and up-profiles are used for averaging. Finally, the data are interpolated and smoothed.
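A minimal reading sketch, assuming the layout described above (eight header lines followed by seven tab-separated columns); the example filename and the short column labels are hypothetical, and the number of skipped lines may need adjusting if the column-name line is not counted among the eight header lines.

```python
import pandas as pd

# Hedged sketch: reads one Hornsund CTD profile. The path below is a
# hypothetical example of the YYYYMMDD_instrument_station.txt pattern.
path = "2019/20190715_RBR_H1.txt"

cols = ["pressure_dbar", "depth_m", "temp_C", "pot_temp_C",
        "salinity_PSU", "sigmaT_kg_m3", "sound_vel_m_s"]

# Skip the eight header lines and read the seven tab-separated data columns.
profile = pd.read_csv(path, sep="\t", skiprows=8, names=cols)
print(profile.head())
```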
Dataset of vertical temperature, turbidity and dissolved oxygen profiles obtained from Revvatnet, a lake close to Hornsund fjord. The measurements are made with a SAIV A/S 208 STD/CTD (until 2023) and two separate RBR concerto CTDs (since 2024). The data are stored in folders organized by the year (YYYY) of measurements. Each vertical profile is stored as an individual, tab-separated ASCII file. The filenames are formed from the date and time of measurement followed by the instrument, potential additional sensors and station names: YYYYMMDDhhmmss_instrument-sensors_station.txt. Each file includes eight header lines with information on station name, geographical location (UTM), date and time of measurement (YYYY/MM/DD hh:mm), instrument and its serial number, source of financial support and data column names. The data columns include pressure (dbar), temperature (°C), turbidity (FTU/NTU), dissolved oxygen saturation (%) and dissolved oxygen concentration (mg/l). Measurements by RBR concerto CTDs have additional columns for chlorophyll a fluorescence (μg/l) and Photosynthetically Active Radiation (PAR, μmol/m^2/s). Note that this is a raw dataset without quality control.
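A hedged sketch for indexing these raw profiles by parsing the filename pattern described above; the root folder name is a placeholder, and station names are assumed not to contain underscores.

```python
from pathlib import Path
from datetime import datetime

# Hedged sketch: walk the per-year folders and parse each filename of the form
# YYYYMMDDhhmmss_instrument-sensors_station.txt into its parts.
root = Path("Revvatnet_profiles")   # placeholder for the archive's root folder
for txt in sorted(root.glob("*/*.txt")):        # one sub-folder per year
    stamp, instrument, station = txt.stem.split("_", 2)
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    print(when.isoformat(), instrument, station)
```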
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents a long-term monitoring record of phytoplankton (2010-2022) and zooplankton (2010-2023) taxonomic groups, alongside associated environmental parameters (surface and bottom temperature and salinity measurements) from the Iroise Marine Natural Park, France's first marine protected area. The dataset integrates traditional microscopy-based phytoplankton counts with zooplankton imaging data obtained using the ZooScan (Gorsky et al., 2010), as well as zooplankton biovolume and concentration data.
Sampling was conducted seasonally along two main coastal-offshore transects (B and D) and at three coastal stations (Molène, Sein, and Douarnenez), capturing the spatial and temporal dynamics of plankton communities in this unique ecosystem located at the intersection of the English Channel and the Atlantic Ocean. The region is characterized by the seasonal Ushant thermal front, which creates diverse habitats supporting rich plankton communities.
Phytoplankton identification was performed consistently by the same taxonomist throughout the study period, resulting in a high-resolution dataset with 573 distinct taxa across the 785 phytoplankton samples. Zooplankton samples (total number of samples = 650) were digitized using the ZooScan imaging system (Gorsky et al., 2010), with organisms automatically sorted using the built-in semi-automatic algorithms (random forest and convolutional neural networks) of the EcoTaxa platform (Picheral et al., 2017). Expert taxonomists then reviewed and validated the classifications, resulting in 103 taxonomic and morphological groups. Individual zooplankton images are accessible through the EcoTaxa web platform for further morphometric analyses.
Bibliography
Gorsky, G., Ohman, M.D., Picheral, M., Gasparini, S., Stemmann, L., Romagnan, J.-B., Cawood, A., Pesant, S., Garcia-Comas, C., Prejger, F., 2010. Digital zooplankton image analysis using the ZooScan integrated system. J. Plankton Res. 32, 285–303. https://doi.org/10.1093/plankt/fbp124
Picheral, M., Colin, S., Irisson, J.-O., 2017. EcoTaxa, a tool for the taxonomic classification of images.
WoRMS Editorial Board, 2025. World Register of Marine Species. https://doi.org/10.14284/170
Dataset content
The dataset contains three distinct tables, all containing both text and numerical data.
The first table integrates zooplankton measurements with their corresponding environmental parameters and is organised as follows (see also units_pnmi_data_paper.csv):
Metadata information (columns 1-8): station name (column 1), transect name (column 2), coordinates: longitude and latitude (columns 3-4, in dd.dddd), sampling time: date, year, month, and Julian day (columns 5-8).
Environmental measurements: surface and bottom temperature (columns 9-10, in °C), surface and bottom salinity (columns 11-12, in PSU).
Biological data for each taxonomic group: sample abundance in individuals/m³ (columns 13-116, prefix "conc_" + taxa name), total biovolume in mm³/m³ (columns 117-220, prefix "tot_biov_" + taxa name), mean individual biovolume in mm³ (columns 221-324, prefix "mean_biov_" + taxa name); a loading sketch follows this description.
The second table contains phytoplankton data and follows a similar organizational structure:
Metadata information (columns 1-8): station name (column 1), transect name (column 2), coordinates: longitude and latitude (columns 3-4, in dd.dddd), sampling time: date, year, month, and Julian day (columns 5-8).
Environmental measurements: surface and bottom temperature (columns 9-10, in °C), surface and bottom salinity (columns 11-12, in PSU).
Phytoplankton taxa concentrations: surface abundance in individuals/L (columns 13-580, prefix "surface_" + taxa name), bottom abundance in individuals/L (columns 581-1148, prefix "bottom_" + taxa name).
Each taxon is provided in the third table with its corresponding unique identifier, the AphiaID from the World Register of Marine Species (WoRMS Editorial Board, 2025), which enables unambiguous species identification across databases.
For the transect stations (D1 through D6 and B1 through B7), phytoplankton was initially sampled at sub-surface and bottom depths before 2017 (see Table 2). Following the introduction of CTD profiling in 2017, vertical profiles from 2017-2018 revealed that at offshore stations (B5-B7 and D5-D6) the chlorophyll a maximum, when present, consistently occurred between 15-18 m depth. At coastal stations (up to 40 m deep), strong vertical mixing typically maintained a homogeneous water column with no deep chlorophyll maximum, though when present it also occurred at approximately 15 m depth. Based on these observations, bottom sampling was discontinued in 2019 and replaced with sampling at 15 m depth to better capture phytoplankton biomass.
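A hedged loading sketch for the first table; the file name below is a placeholder, while the prefix-based column selection follows the structure described above.

```python
import pandas as pd

# Hedged sketch: "zooplankton_table.csv" is a placeholder for the first table.
zoo = pd.read_csv("zooplankton_table.csv")

# Abundance columns (individuals/m3) are prefixed "conc_", biovolume columns
# "tot_biov_", so taxa can be pulled out by prefix rather than by position.
abundance = zoo.filter(regex=r"^conc_")
total_biovolume = zoo.filter(regex=r"^tot_biov_")

# Example: mean abundance per taxon across all samples.
print(abundance.mean().sort_values(ascending=False).head())
```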
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Information
The diverse publicly available compound/bioactivity databases constitute a key resource for data-driven applications in chemogenomics and drug design. Analysis of their coverage of compound entries and biological targets revealed considerable differences, however, suggesting the benefit of a consensus dataset. Therefore, we have combined and curated information from five esteemed databases (ChEMBL, PubChem, BindingDB, IUPHAR/BPS and Probes&Drugs) to assemble a consensus compound/bioactivity dataset comprising 1,144,803 compounds with 10,915,362 bioactivities on 5,613 targets (including defined macromolecular targets as well as cell lines and phenotypic readouts). It also provides simplified information on the assay types underlying the bioactivity data and on bioactivity confidence by comparing data from different sources. We have unified the source databases, brought them into a common format and combined them, enabling straightforward use in multiple applications such as chemogenomics and data-driven drug design.
The consensus dataset provides increased target coverage and contains a higher number of molecules than the source databases, which is also evident from a larger number of scaffolds. These features render the consensus dataset a valuable tool for machine learning and other data-driven applications in (de novo) drug design and bioactivity prediction. The increased chemical and bioactivity coverage of the consensus dataset may improve the robustness of such models compared to the single source databases. In addition, semi-automated structure and bioactivity annotation checks, with flags for divergent data from different sources, may help with data selection and further accurate curation.
Structure and content of the dataset
| ChEMBL ID | PubChem ID | IUPHAR ID | Target | Activity type | Assay type | Unit | Mean C (0) | ... | Mean PC (0) | ... | Mean B (0) | ... | Mean I (0) | ... | Mean PD (0) | ... | Activity check annotation | Ligand names | Canonical SMILES C | ... | Structure check | Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
The dataset was created using the Konstanz Information Miner (KNIME) (https://www.knime.com/) and was exported as a CSV file and a compressed CSV file.
Except for the canonical SMILES columns, all columns are of datatype ‘string’; the canonical SMILES columns use the SMILES format. We recommend the File Reader node for using the dataset in KNIME. With this node, the data types of the columns can be adjusted exactly; in addition, only this node can read the compressed format.
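Outside of KNIME, the exported CSV can also be read with pandas. The sketch below is hedged: it keeps every column as a string as described above, while the file name and the exact header labels (taken from the table header) are assumptions.

```python
import pandas as pd

# Hedged sketch: "consensus_dataset.csv" is a placeholder for the exported file;
# keep every column as a string, mirroring the typing described above.
df = pd.read_csv("consensus_dataset.csv", dtype=str)

# Example: entries that carry both a ChEMBL ID and a canonical SMILES from ChEMBL
# (header labels assumed to match the table above).
subset = df[df["ChEMBL ID"].notna() & df["Canonical SMILES C"].notna()]
print(f"{len(subset)} of {len(df)} rows have a ChEMBL ID and a ChEMBL SMILES")
```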
Column content:
The data lists trees maintained by the City of Edinburgh Council. The data set breaks down into the following fields:
Column A - Primary Key
Column B - Location or Tag no.
Column C - Ward
Column D - Site
Column E - Latin name
Column F - Common Name
Column G - Owner
Column H - NT ref
Column I - Height
Column J - Spread
Column K - Age group
Column L - DBH
The data is updated on a regular basis; please contact the Open Data team if you are looking for the most up-to-date version.
Additional metadata:
- Licence: http://creativecommons.org/licenses/by-nc/2.0/
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Related article: Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39.
In this dataset:
We present temporally dynamic population distribution data from the Helsinki Metropolitan Area, Finland, at the level of 250 m by 250 m statistical grid cells. Three hourly population distribution datasets are provided for regular workdays (Mon – Thu), Saturdays and Sundays. The data are based on aggregated mobile phone data collected by the biggest mobile network operator in Finland. Mobile phone data are assigned to statistical grid cells using an advanced dasymetric interpolation method based on ancillary data about land cover, buildings and a time use survey. The data were validated against population register data from Statistics Finland for night-time hours and a daytime workplace registry. The resulting 24-hour population data can be used to reveal the temporal dynamics of the city and examine population variations relevant to, for instance, spatial accessibility analyses, crisis management and planning.
Please cite this dataset as:
Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39. https://doi.org/10.1038/s41597-021-01113-4
Organization of data
The dataset is packaged into a single zip file, Helsinki_dynpop_matrix.zip, which contains the following files:
HMA_Dynamic_population_24H_workdays.csv represents the dynamic population for an average workday in the study area.
HMA_Dynamic_population_24H_sat.csv represents the dynamic population for an average Saturday in the study area.
HMA_Dynamic_population_24H_sun.csv represents the dynamic population for an average Sunday in the study area.
target_zones_grid250m_EPSG3067.geojson represents the statistical grid in ETRS89/ETRS-TM35FIN projection that can be used to visualize the data on a map using e.g. QGIS.
Column names
YKR_ID : a unique identifier for each statistical grid cell (n=13,231). The identifier is compatible with the statistical YKR grid cell data by Statistics Finland and Finnish Environment Institute.
H0, H1 ... H23 : Each field represents the proportional distribution of the total population in the study area between grid cells during a one-hour period. In total, 24 fields are formatted as “Hx”, where x stands for the hour of the day (values ranging from 0-23). For example, H0 stands for the first hour of the day: 00:00 - 00:59. The sum of all cell values for each field equals 100 (i.e. 100% of the total population for each one-hour period).
In order to visualize the data on a map, the result tables can be joined with the target_zones_grid250m_EPSG3067.geojson data. The data can be joined by using the field YKR_ID as a common key between the datasets.
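A hedged sketch of that join using geopandas, with the file names listed above; note that the hourly values are percentages of the total population, so each column sums to roughly 100.

```python
import geopandas as gpd
import pandas as pd

# Hedged sketch: join the workday table to the statistical grid on YKR_ID and
# map the share of population present in each cell at 08:00-08:59 (column "H8").
grid = gpd.read_file("target_zones_grid250m_EPSG3067.geojson")
workdays = pd.read_csv("HMA_Dynamic_population_24H_workdays.csv")

joined = grid.merge(workdays, on="YKR_ID")

print(joined["H8"].sum())            # should be approximately 100
joined.plot(column="H8", legend=True)
```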
License: Creative Commons Attribution 4.0 International.
Related datasets
Järv, Olle; Tenkanen, Henrikki & Toivonen, Tuuli. (2017). Multi-temporal function-based dasymetric interpolation tool for mobile phone data. Zenodo. https://doi.org/10.5281/zenodo.252612
Tenkanen, Henrikki, & Toivonen, Tuuli. (2019). Helsinki Region Travel Time Matrix [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3247564
License: Open Database License (ODbL) v1.0, https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Detailed Description of the Dataset:
The dataset, saved as sign_data.csv, is designed for hand sign recognition and contains comprehensive data captured from hand gestures using real-time video processing. Below is a detailed description of the dataset:
sign_data.csv
Tools Used:
- Mediapipe: For detecting hand landmarks and estimating their positions.
- OpenCV: For capturing video frames from a camera.
Functionality:
- Gesture Data Capture: The capture_gesture_data function records hand gestures by processing video frames in real-time. It captures data for a predefined number of rows per gesture, with distances calculated between all pairs of 21 detected hand landmarks.
- Distance Calculation: For each frame, the Euclidean distance between every pair of landmarks is computed, resulting in a comprehensive feature vector for each gesture.
Columns:
- Distance Columns: Each distance column represents the calculated distance between a pair of hand landmarks. With 21 landmarks, there are a total of 210 unique distances (computed as 21 × 20 / 2); a short sketch of this computation appears further below.
- Gesture Label: The final column in the dataset specifies the hand sign label associated with each row of distance measurements (e.g., A, B, C, ..., Z, Space).
Example:
- Column Headers: Distance_0, Distance_1, ..., Distance_209, Sign
- Rows: Each row contains the computed distances followed by the corresponding gesture label.
Gestures Included:
- Alphabet: Signs for letters A-Z.
- Space: Represents the space gesture.
Number of Samples: Data is collected for each gesture with 100 samples per sign.
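A hedged sketch of the pairwise-distance feature construction referenced above; the landmark ordering and the use of the z coordinate are assumptions, since the dataset itself stores only the resulting distances.

```python
import itertools
import math

# Hedged sketch: given the 21 (x, y, z) hand landmarks Mediapipe returns per
# frame, compute the 210 unique pairwise Euclidean distances that correspond
# to Distance_0 ... Distance_209.
def landmark_distances(landmarks):
    """landmarks: list of 21 (x, y, z) tuples for one detected hand."""
    feats = []
    for (x1, y1, z1), (x2, y2, z2) in itertools.combinations(landmarks, 2):
        feats.append(math.dist((x1, y1, z1), (x2, y2, z2)))
    return feats  # len(feats) == 210

# Example with dummy landmarks (real values come from Mediapipe's hand module).
dummy = [(i * 0.01, i * 0.02, 0.0) for i in range(21)]
print(len(landmark_distances(dummy)))  # 210
```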
The dataset provides detailed spatial information about hand gestures, enabling the training and evaluation of hand sign recognition models. By offering a rich set of distance measurements between hand landmarks, it supports the development of accurate and reliable sign language recognition systems. This dataset is crucial for machine learning applications that aim to bridge communication gaps for individuals with hearing or speech impairments.
This dataset is currently associated with an article that is in the process of being published. Once the publication process is completed, a reference link will be added separately. Until that time, this dataset cannot be used for any academic purposes.
The dataset contains 196,926 images and 10 CSV files.
The images are derived from the Image Matching Challenge PhotoTourism 2020 dataset:
https://www.cs.ubc.ca/~kmyi/imw2020/data.html
The CSV files were produced by our work to show a comprehensive comparison of well-known conventional feature extractors/descriptors, including SIFT, SURF, BRIEF, ORB, BRISK, KAZE, AKAZE, FREAK, DAISY, FAST, and STAR.
For Gaussian blur only, there is an additional file.
The images folder contains the images utilized for this study and the derived ones originating from them (196,926 images in total).
To use any results, data, or code from this study, please cite: ISIK M. 2024. Comprehensive empirical evaluation of feature extractors in computer vision. PeerJ Computer Science 10:e2415 https://doi.org/10.7717/peerj-cs.2415
THE COLUMN NAMES:
img-1 and img-2 stand for the compared image names
KP stands for keypoints
goodMatches_normal stands for the matching count with the Brute Force Matcher
GM stands for percentage
goodMatches_knn stands for the matching count with the kNN Matcher
img-1-D-time shows the duration of keypoint extraction for img-1
img-2-D-time shows the duration of keypoint extraction for img-2 (the compared one)
img-1-C-time shows the duration of comparing keypoints for img-1
img-2-C-time shows the duration of comparing keypoints for img-2 (the compared one)
total-D-time is the total of img-1-D-time and img-2-D-time
total-C-time is the total of img-1-C-time and img-2-C-time
matcher-time_normal stands for the duration of the matching process with the Brute Force Matcher
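As a hedged illustration of how timing and matching columns of this kind can be produced (not the authors' exact code; the file names and the choice of ORB here are placeholders), an OpenCV sketch:

```python
import time
import cv2

# Hedged sketch: extract ORB keypoints/descriptors for two images, time the
# extraction (the *-D-time columns) and the brute-force matching step
# (matcher-time_normal). File names are placeholders.
img1 = cv2.imread("img-1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("img-2.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()

t0 = time.perf_counter()
kp1, des1 = orb.detectAndCompute(img1, None)
d_time_1 = time.perf_counter() - t0          # img-1-D-time

t0 = time.perf_counter()
kp2, des2 = orb.detectAndCompute(img2, None)
d_time_2 = time.perf_counter() - t0          # img-2-D-time

bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
t0 = time.perf_counter()
matches = bf.match(des1, des2)
matcher_time_normal = time.perf_counter() - t0

print(len(kp1), len(kp2), len(matches), d_time_1 + d_time_2)  # KP counts, matches, total-D-time
```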
More explanation will be added here soon.
File A is a big data file. File B is a file with already registered users; files C and D are opt-out files. The goal is to delete everybody from File A who has opted out or is already registered.
So, from file A we remove (automatically) all lines that contain email addresses that are present in files B, C or D.
After this, we will change the column names a bit to fit the required format, and we are done.
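A hedged pandas sketch of that filtering step, assuming all four files are CSVs with an "email" column; the actual file and column names are not specified above, and the target column names in the rename step are placeholders.

```python
import pandas as pd

# Hedged sketch: file names and the "email" column name are assumptions.
a = pd.read_csv("file_A.csv")
exclude = pd.concat([
    pd.read_csv("file_B.csv")["email"],   # already registered users
    pd.read_csv("file_C.csv")["email"],   # opt-out list 1
    pd.read_csv("file_D.csv")["email"],   # opt-out list 2
]).str.lower().unique()

# Keep only rows of A whose email does not appear in B, C or D.
cleaned = a[~a["email"].str.lower().isin(exclude)]

# Rename columns to the required format (target names are placeholders).
cleaned = cleaned.rename(columns={"email": "Email", "name": "FullName"})
cleaned.to_csv("file_A_cleaned.csv", index=False)
```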
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset containing information about passengers aboard the Titanic is one of the most famous datasets used in data science and machine learning. It was created to analyze and understand the factors that influenced survival rates among passengers during the tragic sinking of the RMS Titanic on April 15, 1912.
The dataset is often used for predictive modeling and statistical analysis to determine which factors (such as socio-economic status, age, gender, etc.) were associated with a higher likelihood of survival. It contains 1309 rows and 14 columns.
Pclass: Ticket class indicating the socio-economic status of the passenger. It is categorized into three classes: 1 = Upper, 2 = Middle, 3 = Lower.
Survived: A binary indicator that shows whether the passenger survived (1) or not (0) during the Titanic disaster. This is the target variable for analysis.
Name: The full name of the passenger, including title (e.g., Mr., Mrs., etc.).
Sex: The gender of the passenger, denoted as either male or female.
Age: The age of the passenger in years.
SibSp: The number of siblings or spouses aboard the Titanic for the respective passenger.
Parch: The number of parents or children aboard the Titanic for the respective passenger.
Ticket: The ticket number assigned to the passenger.
Fare: The fare paid by the passenger for the ticket.
Cabin: The cabin number assigned to the passenger, if available.
Embarked: The port of embarkation for the passenger. It can take one of three values: C = Cherbourg, Q = Queenstown, S = Southampton.
Boat: If the passenger survived, this column contains the identifier of the lifeboat they were rescued in.
Body: If the passenger did not survive, this column contains the identification number of their recovered body, if applicable.
Home.dest: The destination or place of residence of the passenger.
These descriptions provide a detailed understanding of each column in the Titanic dataset subset, offering insights into the demographic, travel, and survival-related information recorded for each passenger.
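A minimal exploration sketch, assuming the upload's CSV is named titanic.csv and uses the column names described above (the actual file name and header casing may differ):

```python
import pandas as pd

# Hedged sketch: "titanic.csv" is a placeholder for this upload's file.
titanic = pd.read_csv("titanic.csv")

# Survival rate by ticket class and sex, using the Survived target column.
print(titanic.groupby(["Pclass", "Sex"])["Survived"].mean().round(2))
```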