100+ datasets found
  1. Clinical Questions Collection

    • catalog.data.gov
    • data.virginia.gov
    • +3 more
    Updated Jul 11, 2025
    Cite
    National Library of Medicine (2025). Clinical Questions Collection [Dataset]. https://catalog.data.gov/dataset/clinical-questions-collection-665af
    Dataset updated
    Jul 11, 2025
    Dataset provided by
    National Library of Medicine
    Description

    The Clinical Questions Collection is a repository of questions collected between 1991 and 2003 from healthcare providers in clinical settings across the country. The questions were submitted by investigators who wish to share their data with other researchers. This dataset is no longer updated with new content. The collection is used in developing approaches to clinical and consumer-health question answering, as well as in researching the information needs of clinicians and the language they use to express those needs. All files are formatted in XML.

  2. Statistics Interface Province-Level Data Collection - Datasets - This...

    • store.smartdatahub.io
    Updated Nov 11, 2024
    Cite
    (2024). Statistics Interface Province-Level Data Collection - Datasets - This service has been deprecated - please visit https://www.smartdatahub.io/ to access data. See the About page for details. // [Dataset]. https://store.smartdatahub.io/dataset/fi_tilastokeskus_tilastointialueet_maakunta1000k
    Dataset updated
    Nov 11, 2024
    Description

    This dataset collection is a compilation of related data tables sourced from the website of Tilastokeskus (Statistics Finland). The data is organized in a tabular format of rows and columns. The collection includes several tables, each representing different years, which provides a temporal view of the data. The description provided by the data source, Tilastokeskuksen palvelurajapinta (Statistics Finland's service interface), suggests that the data is statistical in nature and likely relates to regional statistics. This dataset is licensed under CC BY 4.0 (Creative Commons Attribution 4.0, https://creativecommons.org/licenses/by/4.0/deed.fi).

  3. Best Practices in Data Collection and Management Workshop

    • dataverse.lib.virginia.edu
    • dataverse.harvard.edu
    pdf, pptx
    Updated Sep 9, 2022
    Cite
    Sherry Lake; Andrea Denton (2022). Best Practices in Data Collection and Management Workshop [Dataset]. http://doi.org/10.18130/V3/N9E9XP
    Available download formats: pptx(811419), pptx(1742216), pptx(2522728), pptx(1725857), pptx(1137224), pptx(1782719), pdf(324410), pptx(1978968), pptx(620078), pdf(296332), pdf(281999), pdf(527659), pdf(275362), pdf(499960)
    Dataset updated
    Sep 9, 2022
    Dataset provided by
    University of Virginia Dataverse
    Authors
    Sherry Lake; Andrea Denton
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Ever need to help a researcher share and archive their research data? Would you know how to advise them on managing their data so it can be easily shared and re-used? This workshop will cover best practices for collecting and organizing research data related to the goal of data preservation and sharing. We will focus on best practices and tips for collecting data, including file naming, documentation/metadata, quality control, and versioning, as well as access and control/security, backup and storage, and licensing. We will discuss the library’s role in data management, and the opportunities and challenges around supporting data sharing efforts. Through case studies we will explore a typical research data scenario and propose solutions and services by the library and institutional partners. Finally, we discuss methods to stay up to date with data management related topics.

  4. Job Postings Dataset for Labour Market Research and Insights

    • datarade.ai
    Updated Sep 20, 2023
    Cite
    Oxylabs (2023). Job Postings Dataset for Labour Market Research and Insights [Dataset]. https://datarade.ai/data-products/job-postings-dataset-for-labour-market-research-and-insights-oxylabs
    Available download formats: .json, .xml, .csv, .xls
    Dataset updated
    Sep 20, 2023
    Dataset authored and provided by
    Oxylabs
    Area covered
    Togo, Jamaica, Kyrgyzstan, Sierra Leone, Luxembourg, Zambia, Tajikistan, Anguilla, Switzerland, British Indian Ocean Territory
    Description

    Introducing Job Posting Datasets: Uncover labor market insights!

    Elevate your recruitment strategies, forecast future labor industry trends, and unearth investment opportunities with Job Posting Datasets.

    Job Posting Datasets Source:

    1. Indeed: Access datasets from Indeed, a leading employment website known for its comprehensive job listings.

    2. Glassdoor: Receive ready-to-use employee reviews, salary ranges, and job openings from Glassdoor.

    3. StackShare: Access StackShare datasets to make data-driven technology decisions.

    Job Posting Datasets provide meticulously acquired and parsed data, freeing you to focus on analysis. You'll receive clean, structured, ready-to-use job posting data, including job titles, company names, seniority levels, industries, locations, salaries, and employment types.

    Choose your preferred dataset delivery options for convenience:

    Receive datasets in various formats, including CSV, JSON, and more. Opt for storage solutions such as AWS S3, Google Cloud Storage, and more. Customize data delivery frequencies, whether one-time or per your agreed schedule.

    Why Choose Oxylabs Job Posting Datasets:

    1. Fresh and accurate data: Access clean and structured job posting datasets collected by our seasoned web scraping professionals, enabling you to dive into analysis.

    2. Time and resource savings: Focus on data analysis and your core business objectives while we efficiently handle the data extraction process cost-effectively.

    3. Customized solutions: Tailor our approach to your business needs, ensuring your goals are met.

    4. Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA best practices.

    Pricing Options:

    Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Effortlessly access fresh job posting data with Oxylabs Job Posting Datasets.

  5. Insurance Dataset

    • gts.ai
    json
    Updated Oct 16, 2022
    Cite
    GTS (2022). Insurance Dataset [Dataset]. https://gts.ai/case-study/insurance-dataset-annotation-services-for-precision-data-analysis/
    Available download formats: json
    Dataset updated
    Oct 16, 2022
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The Insurance Dataset project is an extensive initiative focused on collecting and analyzing insurance-related data from various sources.

  6. Dataset of book subjects that contain Data collection : key debates and...

    • workwithdata.com
    Updated Nov 7, 2024
    Cite
    Work With Data (2024). Dataset of book subjects that contain Data collection : key debates and methods in social research [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=Data+collection+:+key+debates+and+methods+in+social+research&j=1&j0=books
    Dataset updated
    Nov 7, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset is about book subjects. It has 3 rows and is filtered where the book is Data collection : key debates and methods in social research. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.

  7. Dataset: Shell Commands Used by Participants of Hands-on Cybersecurity...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 18, 2023
    Cite
    Pavel Seda (2023). Dataset: Shell Commands Used by Participants of Hands-on Cybersecurity Training [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5137354
    Dataset updated
    Jul 18, 2023
    Dataset provided by
    Pavel Seda
    Pavel Čeleda
    Valdemar Švábenský
    Jan Vykopal
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This repository contains supplementary materials for the following journal paper:

    Valdemar Švábenský, Jan Vykopal, Pavel Seda, Pavel Čeleda. Dataset of Shell Commands Used by Participants of Hands-on Cybersecurity Training. In Elsevier Data in Brief. 2021. https://doi.org/10.1016/j.dib.2021.107398

    How to cite

    If you use or build upon the materials, please use the BibTeX entry below to cite the original paper (not only this web link).

    @article{Svabensky2021dataset,
        author = {\v{S}v\'{a}bensk\'{y}, Valdemar and Vykopal, Jan and Seda, Pavel and \v{C}eleda, Pavel},
        title = {{Dataset of Shell Commands Used by Participants of Hands-on Cybersecurity Training}},
        journal = {Data in Brief},
        publisher = {Elsevier},
        volume = {38},
        year = {2021},
        issn = {2352-3409},
        url = {https://doi.org/10.1016/j.dib.2021.107398},
        doi = {10.1016/j.dib.2021.107398},
    }

    The data were collected using a logging toolset referenced here.

    Attached content

    Dataset (data.zip). The collected data are attached here on Zenodo. A copy is also available in this repository.

    Analytical tools (toolset.zip). To analyze the data, you can instantiate the toolset or this project for ELK.

    Version history

    Version 1 (https://zenodo.org/record/5137355) contains 13446 log records from 175 trainees. These data are precisely those that are described in the associated journal paper. Version 1 provides a snapshot of the state when the article was published.

    Version 2 (https://zenodo.org/record/5517479) contains 13446 log records from 175 trainees. The data are unchanged from Version 1, but the analytical toolset includes a minor fix.

    Version 3 (https://zenodo.org/record/6670113) contains 21762 log records from 275 trainees. It is a superset of Version 2, with newly collected data added to the dataset.

    The current Version 4 (https://zenodo.org/record/8136017) contains 21459 log records from 275 trainees. Compared to Version 3, we cleaned 303 invalid/duplicate command records.

  8. An inertial and positioning dataset for the walking activity

    • data.niaid.nih.gov
    • search.dataone.org
    • +2 more
    zip
    Updated Nov 1, 2024
    Cite
    Sara Caramaschi; Carl Magnus Olsson; Elizabeth Orchard; Jackson Molloy; Dario Salvi (2024). An inertial and positioning dataset for the walking activity [Dataset]. http://doi.org/10.5061/dryad.n2z34tn5q
    Available download formats: zip
    Dataset updated
    Nov 1, 2024
    Dataset provided by
    Oxford University Hospitals NHS Trust
    Malmö University
    Authors
    Sara Caramaschi; Carl Magnus Olsson; Elizabeth Orchard; Jackson Molloy; Dario Salvi
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    We are publishing a walking activity dataset including inertial and positioning information from 19 volunteers, including reference distance measured using a trundle wheel. The dataset includes a total of 96.7 km walked by the volunteers, split into 203 separate tracks. The trundle wheel is of two types: either an analogue trundle wheel, which provides the total number of meters walked in a single track, or a sensorized trundle wheel, which measures every revolution of the wheel and therefore records a continuous incremental distance. Each track has data from the accelerometer and gyroscope embedded in the phones, location information from the Global Navigation Satellite System (GNSS), and the step count obtained by the device. The dataset can be used to implement walking distance estimation algorithms and to explore data quality in the context of walking activity and physical capacity tests, fitness, and pedestrian navigation.

    Methods

    The proposed dataset is a collection of walks where participants used their own smartphones to capture inertial and positioning information. The participants involved in the data collection come from two sites. The first site is the Oxford University Hospitals NHS Foundation Trust, United Kingdom, where 10 participants (7 affected by cardiovascular diseases and 3 healthy individuals) performed unsupervised six-minute walk tests (6MWTs) in an outdoor environment of their choice (ethical approval obtained from the UK National Health Service Health Research Authority, protocol reference number: 17/WM/0355). All participants involved provided informed consent. The second site is Malmö University, Sweden, where a group of 9 healthy researchers collected data. This dataset can be used by researchers to develop distance estimation algorithms and to study how data quality impacts the estimation.

    All walks were performed by holding a smartphone in one hand, with an app collecting inertial data, the GNSS signal, and the step count. In the other hand, participants held a trundle wheel to obtain the ground truth distance. Two different trundle wheels were used: an analogue trundle wheel that allowed the registration of a single total value of walked distance, and a sensorized trundle wheel which collected timestamps and distance at every 1-meter revolution, resulting in continuous incremental distance information. The latter configuration is innovative and allows the use of temporal windows of the IMU data as input to machine learning algorithms to estimate walked distance. In the case of data collected by researchers, if the walks were done simultaneously and at a close distance from each other, only one person used the trundle wheel, and the reference distance was associated with all walks that were collected at the same time. The walked paths are of variable length, duration, and shape. Participants were instructed to walk paths of increasing curvature, from straight to rounded. Irregular paths are particularly useful in determining limitations in the accuracy of walked distance algorithms.

    Two smartphone applications were developed for collecting the information of interest from the participants' devices, both available for Android and iOS operating systems. The first is a web application that retrieves inertial data (acceleration, rotation rate, orientation) while connecting to the sensorized trundle wheel to record incremental reference distance [1]. The second app is the Timed Walk app [2], which guides the user in performing a walking test by signalling when to start and when to stop the walk while collecting both inertial and positioning data. All participants in the UK used the Timed Walk app.

    The data collected during the walk is from the Inertial Measurement Unit (IMU) of the phone and, when available, the Global Navigation Satellite System (GNSS). In addition, the step count information is retrieved by the sensors embedded in each participant’s smartphone. With the dataset, we provide a descriptive table with the characteristics of each recording, including brand and model of the smartphone, duration, reference total distance, and types of signals included, together with scores for some relevant parameters related to the quality of the various signals.

    The path curvature is one of the most relevant parameters. Previous literature from our team, in fact, confirmed the negative impact of curved-shaped paths with the use of multiple distance estimation algorithms [3]. We visually inspected the walked paths and clustered them in three groups: a) straight path, i.e. no turns wider than 90 degrees, b) gently curved path, i.e. between one and five turns wider than 90 degrees, and c) curved path, i.e. more than five turns wider than 90 degrees. Other features relevant to the quality of collected signals are the total amount of time above a threshold (0.05 s and 6 s) where, respectively, inertial and GNSS data were missing due to technical issues or due to the app going into the background and thus losing access to the sensors, the sampling frequency of the different data streams, the average walking speed, and the smartphone position. The start of each walk is set as 0 ms, so absolute time information is not reported.

    Walk locations collected in the UK are anonymized using the following approach: the first position is fixed to a central location of the city of Oxford (latitude: 51.7520, longitude: -1.2577) and all other positions are reassigned by applying a translation along the longitudinal and latitudinal axes which maintains the original distance and angle between samples. This way, the exact geographical location is lost, but the path shape and distances between samples are maintained. The “as the crow flies” distance between consecutive points and the path curvature were inspected numerically and visually to confirm that they match the original walks. Computations were made possible by using the Haversine Python library.
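    The anonymization described above can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes a spherical Earth, uses only the standard library rather than the Haversine package, and rebuilds each track from the stated Oxford anchor by replaying every segment's great-circle distance and bearing. The function names and the Earth radius constant are hypothetical choices made for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0          # mean Earth radius (assumption of this sketch)
OXFORD_CENTRE = (51.7520, -1.2577)  # anchor point quoted in the description


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_rad(p, q):
    """Initial bearing in radians from p to q."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlon = lon2 - lon1
    y = math.sin(dlon) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    return math.atan2(y, x)


def destination(p, distance_m, bearing):
    """Point reached from p after travelling distance_m along the given bearing."""
    lat1, lon1 = map(math.radians, p)
    d = distance_m / EARTH_RADIUS_M
    lat2 = math.asin(math.sin(lat1) * math.cos(d)
                     + math.cos(lat1) * math.sin(d) * math.cos(bearing))
    lon2 = lon1 + math.atan2(math.sin(bearing) * math.sin(d) * math.cos(lat1),
                             math.cos(d) - math.sin(lat1) * math.sin(lat2))
    return (math.degrees(lat2), math.degrees(lon2))


def anonymize_track(track, origin=OXFORD_CENTRE):
    """Move a track so it starts at origin, keeping segment distances and bearings."""
    if not track:
        return []
    moved = [origin]
    for prev, cur in zip(track, track[1:]):
        moved.append(destination(moved[-1],
                                 haversine_m(prev, cur),
                                 bearing_rad(prev, cur)))
    return moved
```

    Comparing `haversine_m` on consecutive points before and after the move gives a numerical check similar in spirit to the inspection the description mentions.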

    Multiple datasets are available regarding walking activity recognition among other daily living tasks. However, few studies are published with datasets that focus on the distance for both indoor and outdoor environments and that provide relevant ground truth information for it. Yan et al. [4] introduced an inertial walking dataset within indoor scenarios using a smartphone placed in 4 positions (on the leg, in a bag, in the hand, and on the body) by six healthy participants. The reference measurement used in this study is a Visual Odometry System embedded in a smartphone that has to be worn at the chest level, using a strap to hold it. While interesting and detailed, this dataset lacks GNSS data, which is likely to be used in outdoor scenarios, and the reference used for localization also suffers from accuracy issues, especially outdoors. Vezovcnik et al. [5] analysed estimation models for step length and provided an open-source dataset for a total of 22 km of only inertial walking data from 15 healthy adults. While relevant, their dataset focuses on steps rather than total distance and was acquired on a treadmill, which limits the validity in real-world scenarios. Kang et al. [6] proposed a way to estimate travelled distance by using an Android app that uses outdoor walking patterns to match them in indoor contexts for each participant. They collect data outdoors by including both inertial and positioning information and they use average values of speed obtained by the GPS data as reference labels. Afterwards, they use deep learning models to estimate walked distance, obtaining high performance. They report that 3% to 11% of the data for each participant was discarded due to low quality. Unfortunately, the name of the app used is not reported and the paper does not mention whether the dataset can be made available.

    This dataset is heterogeneous in multiple respects. It includes a majority of healthy participants, so it is not possible to generalize the outcomes from this dataset to all walking styles or physical conditions. The dataset is also heterogeneous from a technical perspective, given the differences in devices, acquired data, and smartphone apps used (i.e. some tests lack IMU or GNSS, and the sampling frequency on iPhone was particularly low). We suggest selecting the appropriate tracks based on the desired characteristics to obtain reliable and consistent outcomes.

    This dataset allows researchers to develop algorithms to compute walked distance and to explore data quality and reliability in the context of the walking activity. This dataset was initiated to investigate the digitalization of the 6MWT; however, the collected information can also be useful for other physical capacity tests that involve walking (distance- or duration-based), or for other purposes such as fitness and pedestrian navigation.

    The article related to this dataset will be published in the proceedings of the IEEE MetroXRAINE 2024 conference, held in St. Albans, UK, 21-23 October.

    This research is partially funded by the Swedish Knowledge Foundation and the Internet of Things and People research center through the Synergy project Intelligent and Trustworthy IoT Systems.

  9. LAS dataset of lidar, single-beam, and multibeam data collected at Lake...

    • catalog.data.gov
    Updated Sep 14, 2024
    Cite
    U.S. Geological Survey (2024). LAS dataset of lidar, single-beam, and multibeam data collected at Lake Superior at Minnesota Point near the Superior Entry of Lake Superior, Duluth, MN, September 2022 [Dataset]. https://catalog.data.gov/dataset/las-dataset-of-lidar-single-beam-and-multibeam-data-collected-at-lake-superior-at-minnesot
    Dataset updated
    Sep 14, 2024
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Duluth, Lake Superior, Minnesota Point, Minnesota, Superior Entry
    Description

    This dataset is a LAS (industry-standard binary format for storing lidar point clouds) dataset containing light detection and ranging (lidar) data and sonar data representing the beach and near-shore topography of Lake Superior at Minnesota Point, near the Superior Entry, Duluth, Minnesota. Average point spacing of the LAS files in the dataset is as follows: lidar, 0.086 meters (m); multibeam sonar, 0.512 m; single-beam sonar, 1.919 m. The LAS dataset was used to create digital elevation models (DEMs) of 10 m (32.8084 feet) and 1 m (3.28084 feet) resolution of the approximately 2.15 square kilometer surveyed area. Lidar data were collected September 07, 2022 using a boat-mounted Velodyne VLP-16 unit and methodology similar to that described by Huizinga and Wagner (2019). Multibeam sonar data were collected September 06-07, 2022 using a Norbit integrated wide band multibeam system compact (iWBMSc) sonar unit and methodology similar to that described by Richards and Huizinga (2018). Single-beam sonar data were collected September 07, 2022 using a Ceescope echosounder and methodology similar to that described by Wilson and Richards (2006). This project followed methods similar to those of Wagner, Lund, and Sanks (2020), who completed a similar survey in 2019.

  10. Replication data for: Collection and statistical analysis of a fixed-text...

    • usn.figshare.com
    txt
    Updated Feb 11, 2025
    Cite
    Halvor Nybø Risto; Olaf Hallan Graven (2025). Replication data for: Collection and statistical analysis of a fixed-text keystroke dynamics authentication data set [Dataset]. http://doi.org/10.23642/usn.23790858.v1
    Available download formats: txt
    Dataset updated
    Feb 11, 2025
    Dataset provided by
    University of South-Eastern Norway
    Authors
    Halvor Nybø Risto; Olaf Hallan Graven
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Data set for keystroke dynamics authentication benchmarking and research, containing 6 passwords typed by a wide set of people, including a large set of "attackers" and a smaller set of "legitimate users". This data set was collected for the paper "Collection and statistical analysis of a fixed-text keystroke dynamics authentication data set" for the CSNet23 conference.

    Article abstract: Keystroke dynamics authentication is a promising method of improving account security with minimal detriment to user convenience. While there is an abundance of research, there is a lack of available data sets. In this study, data sets for keystroke dynamics authentication were collected for a set of 6 passwords from a group of participants, and a correlation algorithm was developed to analyze and use these data sets for authentication. The experiments aim to produce data for keystroke dynamics authentication benchmarking, and to show the effect of typing speed and consistency, password length and entropy on prediction accuracy. Through simple correlation methods, the authors achieve an Equal Error Rate ranging from 2.57% to 29.7%. These results give insight into what may cause the accuracy to vary depending on the person and the password.
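    For readers unfamiliar with the Equal Error Rate quoted in the abstract, the following Python sketch shows one common way to estimate it from two lists of similarity scores. It is an illustration only, not the authors' correlation algorithm: the score lists, the convention that higher scores mean "more likely the legitimate user", and the function name are all assumptions.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER by sweeping a decision threshold over all observed scores.

    genuine  -- scores from legitimate users (higher = more similar; assumption)
    impostor -- scores from attackers
    An attempt is accepted when its score is >= the threshold.
    """
    best_gap, best_eer = None, None
    for threshold in sorted(set(genuine) | set(impostor)):
        # False acceptance rate: attackers scoring at or above the threshold.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        # False rejection rate: legitimate users scoring below the threshold.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

    With only a handful of scores the two error curves may never cross exactly, which is why the sketch reports the midpoint of the two rates at the threshold where they are closest.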

  11. Raw Data for ConfLab: A Data Collection Concept, Dataset, and Benchmark for...

    • data.4tu.nl
    Updated Jun 7, 2022
    Cite
    Chirag Raman; Jose Vargas Quiros; Stephanie Tan; Ashraful Islam; Ekin Gedik; Hayley Hung (2022). Raw Data for ConfLab: A Data Collection Concept, Dataset, and Benchmark for Machine Analysis of Free-Standing Social Interactions in the Wild [Dataset]. http://doi.org/10.4121/20017748.v2
    Dataset updated
    Jun 7, 2022
    Dataset provided by
    4TU.ResearchData
    Authors
    Chirag Raman; Jose Vargas Quiros; Stephanie Tan; Ashraful Islam; Ekin Gedik; Hayley Hung
    License

    https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf

    Description

    This file contains raw data for cameras and wearables of the ConfLab dataset.


    ./cameras

    contains the overhead video recordings for 9 cameras (cam2-10) in MP4 files. These cameras cover the whole interaction floor, with camera 2 capturing the bottom of the scene layout and camera 10 capturing the top of the scene layout. Note that cam5 ran out of battery before the other cameras, so its recordings are cut short. However, cam4 and cam6 overlap significantly with cam5, so any needed information can be reconstructed.


    Note that the annotations are made and provided in 2-minute segments. The annotated portions of the video include the last 3min38sec of x2xxx.MP4 video files and the first 12 min of x3xxx.MP4 files for cameras (2,4,6,8,10), with "x" being the placeholder character in the MP4 file names. If one wishes to separate the video into 2-minute segments as we did, the "video-splitting.sh" script is provided.


    ./camera-calibration contains the camera intrinsic files obtained from https://github.com/idiap/multicamera-calibration. Camera extrinsic parameters can be calculated using the existing intrinsic parameters and the instructions in the multicamera-calibration repo. The coordinates in the image are provided by the crosses marked on the floor, which are visible in the video recordings. The crosses are 1 m (100 cm) apart.


    ./wearables

    subdirectory includes the IMU, proximity and audio data from each participant at the ConfLab event (48 in total). In the directory numbered by participant ID, the following data are included:

    1. raw audio file
    2. proximity (bluetooth) pings (RSSI) file (raw and csv) and a visualization
    3. Tri-axial accelerometer data (raw and csv) and a visualization
    4. Tri-axial gyroscope data (raw and csv) and a visualization
    5. Tri-axial magnetometer data (raw and csv) and a visualization
    6. Game rotation vector (raw and csv), recorded in quaternions.


    All files are timestamped.

    The sampling frequencies are:

    - audio: 1250 Hz
    - rest: around 50 Hz. However, the sample rate is not fixed, so the timestamps should be used instead.
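    Because the wearable streams are only nominally 50 Hz, a consumer of the data may want to measure the effective rate and resample onto a uniform grid. The Python sketch below shows one way to do that with linear interpolation. It assumes timestamps in seconds that are strictly increasing and a single-channel value list; the function names are hypothetical and the actual column layout of the CSV files is not specified here.

```python
def effective_rate_hz(timestamps_s):
    """Average sampling rate implied by the first and last timestamps."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two samples")
    return (len(timestamps_s) - 1) / (timestamps_s[-1] - timestamps_s[0])


def resample_linear(timestamps_s, values, rate_hz):
    """Linearly interpolate irregular samples onto a uniform grid at rate_hz."""
    if len(timestamps_s) < 2 or len(timestamps_s) != len(values):
        raise ValueError("need matching timestamp and value lists of length >= 2")
    start = timestamps_s[0]
    step = 1.0 / rate_hz
    # Number of grid points that fit inside the recording (small epsilon for rounding).
    count = int((timestamps_s[-1] - start) * rate_hz + 1e-9) + 1
    grid_t, grid_v = [], []
    j = 0  # index of the left neighbour of the current grid point
    for k in range(count):
        t = start + k * step
        while j + 1 < len(timestamps_s) - 1 and timestamps_s[j + 1] < t:
            j += 1
        t0, t1 = timestamps_s[j], timestamps_s[j + 1]
        weight = (t - t0) / (t1 - t0)
        grid_t.append(t)
        grid_v.append(values[j] + weight * (values[j + 1] - values[j]))
    return grid_t, grid_v
```

    Linear interpolation is the simplest option; for analyses sensitive to aliasing, a proper resampling filter would be a better choice.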


    For rotation, the game rotation vector's output frequency is limited by the actual sampling frequency of the magnetometer. For more information, please refer to https://invensense.tdk.com/wp-content/uploads/2016/06/DS-000189-ICM-20948-v1.3.pdf


    Audio files in this folder are in raw binary form. The following can be used to convert them to WAV files (1250 Hz):

    ffmpeg -f s16le -ar 1250 -ac 1 -i /path/to/audio/file /path/to/output.wav
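    Where ffmpeg is not available, the same conversion can be sketched with Python's standard `wave` module. The parameters mirror the ffmpeg flags above (signed 16-bit little-endian samples, 1250 Hz, one channel); the function name and file paths are placeholders, and the sketch assumes the raw file has no header.

```python
import wave


def raw_pcm_to_wav(raw_path, wav_path, sample_rate=1250, channels=1, sample_width=2):
    """Wrap headerless little-endian PCM samples in a WAV container."""
    with open(raw_path, "rb") as src:
        pcm = src.read()
    with wave.open(wav_path, "wb") as dst:
        dst.setnchannels(channels)        # -ac 1
        dst.setsampwidth(sample_width)    # 2 bytes per sample, as in s16le
        dst.setframerate(sample_rate)     # -ar 1250
        dst.writeframes(pcm)
```

    For example, `raw_pcm_to_wav("/path/to/audio/file", "/path/to/output.wav")` writes the samples unchanged; only the WAV header is added.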


    Synchronization of cameras and wearables data

    Raw videos contain timecode information which matches the timestamps of the data in the "wearables" folder. The starting timecode of a video can be read as:

    ffprobe -hide_banner -show_streams -i /path/to/video


    ./audio

    ./sync: contains wav files for each subject
    ./sync_files: auxiliary csv files used to sync the audio. Can be used to improve the synchronization.

    The code used for syncing the audio can be found here: https://github.com/TUDelft-SPC-Lab/conflab/tree/master/preprocessing/audio

  12. Mental Health & Learning Disabilities Dataset v 1 (Sensitive) Records

    • healthdatagateway.org
    • find.data.gov.scot
    • +1 more
    unknown
    Updated Oct 8, 2024
    Cite
    (2024). Mental Health & Learning Disabilities Dataset v 1 (Sensitive) Records [Dataset]. https://healthdatagateway.org/en/dataset/853
    Available download formats: unknown
    Dataset updated
    Oct 8, 2024
    License

    https://digital.nhs.uk/binaries/content/assets/website-assets/services/dars/nhs_digital_approved_edition_2_dsa_demo.pdf

    Description

    The Mental Health and Learning Disabilities Data Set version 1 (Record Level - sensitive data inclusion). The Mental Health Minimum Data Set was superseded by the Mental Health and Learning Disabilities Data Set, which in turn was superseded by the Mental Health Services Data Set. The Mental Health and Learning Disabilities Data Set collected data from the health records of individual children, young people and adults who were in contact with mental health services.

  13. w

    Synthetic Data for an Imaginary Country, Sample, 2023 - World

    • microdata.worldbank.org
    Updated Jul 7, 2023
    + more versions
    Cite
    Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
    Explore at:
    Dataset updated
    Jul 7, 2023
    Dataset authored and provided by
    Development Data Group, Data Analytics Unit
    Time period covered
    2023
    Area covered
    World
    Description

    Abstract

    The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, asset ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

    The full-population dataset (with about 10 million individuals) is also distributed as open data.

    Geographic coverage

    The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

    Analysis unit

    Household, Individual

    Universe

    The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

    Kind of data

    ssd

    Sampling procedure

    The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
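The two-stage design described above can be illustrated with a short sketch. This is not the R script distributed with the dataset. The function names, the stratum labels, and the largest-remainder rounding rule are assumptions made for illustration, and the way enumeration areas are picked within a stratum is left out because the description does not specify it.

```python
import random

def allocate_enumeration_areas(stratum_sizes, total_households=8000, per_area=25):
    """Split the enumeration areas across strata in proportion to stratum size."""
    n_areas = total_households // per_area  # 8000 / 25 = 320 enumeration areas
    total = sum(stratum_sizes.values())
    quotas = {s: n_areas * size / total for s, size in stratum_sizes.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    # Largest-remainder rounding (an assumption) so the counts sum to n_areas.
    leftover = n_areas - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc

def draw_households(household_ids, per_area=25, seed=0):
    """Simple random sample of households within one enumeration area."""
    rng = random.Random(seed)
    return rng.sample(household_ids, min(per_area, len(household_ids)))
```

Here `stratum_sizes` would map each (geo_1, urban/rural) combination to its population size, for example `{"province1_urban": 500, "province1_rural": 300}` with made-up numbers.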

    Mode of data collection

    other

    Research instrument

    The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

    Cleaning operations

    The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observations were assessed and rejected or replaced when needed). Some post-processing was also applied to the data to produce the distributed data files.

    Response rate

    This is a synthetic dataset; the "response rate" is 100%.

  14. P

    SMS Spam Collection Data Set Dataset

    • paperswithcode.com
    Updated Mar 13, 2022
    + more versions
    Cite
    (2022). SMS Spam Collection Data Set Dataset [Dataset]. https://paperswithcode.com/dataset/sms-spam-collection-data-set
    Explore at:
    Dataset updated
    Mar 13, 2022
    Description

    This corpus has been collected from free or free-for-research sources on the Internet:

    • A collection of 425 SMS spam messages manually extracted from the Grumbletext Web site. This is a UK forum in which cell phone users make public claims about SMS spam messages, most of them without reporting the actual spam message received. Identifying the text of spam messages in the claims is a hard and time-consuming task that involved carefully scanning hundreds of web pages.
    • A subset of 3,375 randomly chosen SMS ham messages from the NUS SMS Corpus (NSC), a dataset of about 10,000 legitimate messages collected for research at the Department of Computer Science at the National University of Singapore. The messages largely originate from Singaporeans, mostly students attending the University. They were collected from volunteers who were made aware that their contributions were going to be made publicly available.
    • A list of 450 SMS ham messages collected from Caroline Tag's PhD Thesis.
    • The SMS Spam Corpus v.0.1 Big, which has 1,002 SMS ham messages and 322 spam messages.

  15. s

    Latest Orthophoto Outcome Shape Data Collection - Datasets - This service...

    • store.smartdatahub.io
    Updated Aug 26, 2024
    + more versions
    Cite
    (2024). Latest Orthophoto Outcome Shape Data Collection - Datasets - This service has been deprecated - please visit https://www.smartdatahub.io/ to access data. See the About page for details. // [Dataset]. https://store.smartdatahub.io/dataset/se_lantmateriet_utfall_ortofoto_senaste_shape_zip
    Explore at:
    Dataset updated
    Aug 26, 2024
    Description

    The dataset collection in question is comprised of a series of related tables, which are organized in a systematic manner with rows and columns for the ease of data interpretation. These tables are part of a larger dataset collection that is primarily sourced from the website of Lantmäteriet (The Land Survey of Sweden), located in Sweden. Each table within this collection contains a variety of information and data points, providing a comprehensive overview of the subject matter at hand. The dataset collection as a whole serves as a valuable resource for comprehensive data analysis and interpretation.

  16. Data from: Hand Washing Video Dataset Annotated According to the World...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, zip
    Updated Jan 3, 2022
    Cite
    Atis Elsts; Maksims Ivanovs; Martins Lulla; Aleksejs Rutkovskis; Andreta Slavinska; Aija Vilde; Anastasija Gromova (2022). Hand Washing Video Dataset Annotated According to the World Health Organization's Handwashing Guidelines [Dataset]. http://doi.org/10.5281/zenodo.4537209
    Explore at:
    Available download formats: zip, csv, bin
    Dataset updated
    Jan 3, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Atis Elsts; Maksims Ivanovs; Martins Lulla; Aleksejs Rutkovskis; Andreta Slavinska; Aija Vilde; Anastasija Gromova
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Overview: This is a large-scale real-world dataset with videos recording medical staff washing their hands as part of their normal job duties in the Pauls Stradins Clinical University Hospital. There are 3185 hand washing episodes in total, each of which is annotated by up to seven different persons. The annotations classify the washing movements according to the World Health Organization's (WHO) guidelines by marking each frame in each video with a certain movement code.

    This dataset is part of a series of three datasets, all following the same format.

    Applications: The intention of this dataset is twofold: to serve as a basis for training machine learning classifiers for automated hand washing movement recognition and quality control, and to allow investigation of the real-world quality of washing performed by working medical staff.

    Statistics:

    • Frame rate: 30 FPS
    • Resolution: 320x240 and 640x480
    • Number of videos: 3185
    • Number of annotation files: 6690

    Movement codes (both in CSV and JSON files):

    • 1: Hand washing movement — Palm to palm
    • 2: Hand washing movement — Palm over dorsum, fingers interlaced
    • 3: Hand washing movement — Palm to palm, fingers interlaced
    • 4: Hand washing movement — Backs of fingers to opposing palm, fingers interlocked
    • 5: Hand washing movement — Rotational rubbing of the thumb
    • 6: Hand washing movement — Fingertips to palm
    • 7: Turning off the faucet with a paper towel
    • 0: Other hand washing movement
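Since every frame carries one movement code and the videos run at 30 FPS, the time spent on each movement follows directly from the per-frame codes. The sketch below assumes the codes for one video have already been read into a Python list. The names `MOVEMENT_CODES` and `seconds_per_movement` are hypothetical, and the layout of the CSV and JSON annotation files is not assumed here.

```python
from collections import Counter

# Labels follow the movement-code list above.
MOVEMENT_CODES = {
    0: "Other hand washing movement",
    1: "Palm to palm",
    2: "Palm over dorsum, fingers interlaced",
    3: "Palm to palm, fingers interlaced",
    4: "Backs of fingers to opposing palm, fingers interlocked",
    5: "Rotational rubbing of the thumb",
    6: "Fingertips to palm",
    7: "Turning off the faucet with a paper towel",
}

def seconds_per_movement(frame_codes, fps=30):
    """Return the duration in seconds of each movement found in a per-frame code list."""
    counts = Counter(frame_codes)
    # A code outside 0-7 raises KeyError, which flags unexpected annotations.
    return {MOVEMENT_CODES[code]: n / fps for code, n in sorted(counts.items())}
```

With several annotators per video, the same function could be run per annotation file and the results compared.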

    Additional annotations (in JSON files only):

    • Armband or watch present
    • Ring present
    • Long nails present

    Acknowledgments: The dataset collection was funded by the Ministry of Education and Science, Republic of Latvia, project “Integration of reliable technologies for protection against Covid-19 in healthcare and high-risk areas”, project No. VPP-COVID-2020/1-0004.

    References: For more detailed information, see this article: M. Lulla, A. Rutkovskis, A. Slavinska, A. Vilde, A. Gromova, M. Ivanovs, A. Skadins, R. Kadikis and A. Elsts. Hand Washing Video Dataset Annotated According to the World Health Organization’s Handwashing Guidelines. Submitted to MDPI Data, 2021.

    Contact information: atis.elsts@edi.lv

  17. w

    Dataset of books called Things to collect in a bag

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called Things to collect in a bag [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Things+to+collect+in+a+bag
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is Things to collect in a bag. It features 7 columns including author, publication date, language, and book publisher.

  18. Data from: Dataset of soil and cacao leaf samples collected in all regions...

    • dataverse.cirad.fr
    tsv, txt
    Updated Apr 9, 2025
    Cite
    Didier Snoeck; Louis Koko; Kouamé N'Guessan; Émmanuel Kassin (2025). Dataset of soil and cacao leaf samples collected in all regions of Côte d'Ivoire [Dataset]. http://doi.org/10.18167/DVN1/PWM9WW
    Explore at:
    Available download formats: txt(242), txt(594), tsv(21078), tsv(69718)
    Dataset updated
    Apr 9, 2025
    Authors
    Didier Snoeck; Louis Koko; Kouamé N'Guessan; Émmanuel Kassin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Côte d'Ivoire
    Description

    Data from chemical analyses of soil and cacao (Theobroma cacao L.) leaf samples collected in adult cacao plantations selected in all regions of Southern Côte d'Ivoire.

    The soil and cacao (Theobroma cacao L.) leaf samples were collected in adult cacao plantations selected in all pedological units of all regions of Southern Côte d'Ivoire. Soil samples were taken in the 0-30 cm horizon at 1 m from the cacao trees. The soil samples were analysed for their clay content and chemical parameters, i.e. pH, C, N, P, K, Ca, Mg, CEC, Al, Zn, B, Mo, Mn. Leaf samples were collected in March 2015. The harvests for the year 2015 were estimated from pod counting. Each sampled cacao plot was geolocated to be associated with the corresponding soil pedological unit. The information was further used to build a thematic map providing localised fertilizer recommendations for cacao cultivation (see N'Guessan et al., 2017).

  19. s

    Varmland Region Comprehensive Dataset Collection - Datasets - This service...

    • store.smartdatahub.io
    Updated Aug 27, 2024
    Cite
    (2024). Varmland Region Comprehensive Dataset Collection - Datasets - This service has been deprecated - please visit https://www.smartdatahub.io/ to access data. See the About page for details. // [Dataset]. https://store.smartdatahub.io/dataset/se_lantmateriet_varmland_zip
    Explore at:
    Dataset updated
    Aug 27, 2024
    Area covered
    Varmland County
    Description

    The dataset collection in focus comprises several tables of related data, each organized systematically into columns and rows for easy reference and understanding. These tables have been sourced from the website of Lantmäteriet (The Land Survey of Sweden). The variety and sheer volume of data in this collection make it a rich resource for in-depth research and analysis.

  20. P

    Data Collection Committee - Standard regional forms - 2016 logsheets

    • pacificdata.org
    • pacific-data.sprep.org
    doc, xls
    Updated Apr 12, 2023
    Cite
    SPC Fisheries, Aquaculture and Marine Ecosystems division (FAME) (2023). Data Collection Committee - Standard regional forms - 2016 logsheets [Dataset]. https://pacificdata.org/data/dataset/dcc-logsheet-2016
    Explore at:
    Available download formats: doc(42496), xls(44032), xls(21715), doc(32768)
    Dataset updated
    Apr 12, 2023
    Dataset provided by
    SPC Fisheries, Aquaculture and Marine Ecosystems division (FAME)
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Standard regional data collection forms used in the Pacific

National Library of Medicine (2025). Clinical Questions Collection [Dataset]. https://catalog.data.gov/dataset/clinical-questions-collection-665af

Clinical Questions Collection

Explore at:
26 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jul 11, 2025
Dataset provided by
National Library of Medicine
Description

The Clinical Questions Collection is a repository of questions collected between 1991 and 2003 from healthcare providers in clinical settings across the country. The questions were submitted by investigators who wish to share their data with other researchers. This dataset is no longer updated with new content. The collection is used in developing approaches to clinical and consumer-health question answering, as well as in researching the information needs of clinicians and the language they use to express them. All files are formatted in XML.
