49 datasets found
  1. big-csv-2

    • huggingface.co
    Updated Nov 4, 2025
    + more versions
    Cite
    Caleb Fahlgren (2025). big-csv-2 [Dataset]. https://huggingface.co/datasets/cfahlgren1/big-csv-2
    Explore at:
    Dataset updated
    Nov 4, 2025
    Authors
    Caleb Fahlgren
    Description

    cfahlgren1/big-csv-2 dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. League Of Legends Data

    • kaggle.com
    zip
    Updated Oct 5, 2022
    Cite
    Preston Robertson (2022). League Of Legends Data [Dataset]. https://www.kaggle.com/datasets/prestonrobertson7/league-of-legends-data-9292022
    Explore at:
    Available download formats: zip (15143027 bytes)
    Dataset updated
    Oct 5, 2022
    Authors
    Preston Robertson
    Description

    Basic Data Description

    X matches of the most recent League of Legends games as of the date specified in the CSV file. Each game has 10 separate players, and each recorded player has 68 recorded features.

    Here X is however many matches I am able to pull at a given time, up to a maximum of 10,000.

    League of Legends

    1. Data Set Description

    1.1. Introduction to the Data Set

    This data set is on an online competitive game called League of Legends. I chose this data set to challenge myself; its unique nature requires me to apply techniques I have learned in classes this semester while teaching myself and applying new ones. The objective is to find the features that most impact winning, so that it is easier to balance the game. “Balancing” refers to updating the game, such as weakening strategies that are too strong and strengthening strategies that feel too weak. This document serves as the preliminary to champion balance by correlating specific stats to a win, so that in the future someone may correlate these stats to champions in the game. This goal may seem convoluted; however, each game has 5 winners and 5 losers, meaning that a single champion’s impact on the game is roughly 10%. If a champion is unbalanced, being double the strength of another champion, that only raises the 10% to 20%. Due to the law of averages, each champion will still have around a 50% win rate despite being too weak or too strong. Therefore, to properly balance a champion, one must look at the correlation between the champion's stats and the win rate associated with each of those stats. This effectively removes the problem of bias, since every game has one winning team and one losing team. This dataset has 23,752 data points and 24 features (or columns). Features refer to measurable pieces of data, such as champion name, damage done, etc.; they do not refer to game features but rather to the names of the data being measured. It is a complicated dataset, with several variables requiring several stages of feature modification to run the code. The code is also large enough to have significance. This dataset was specifically chosen due to my prior familiarity with the data, allowing me to focus on the machine learning techniques.
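    As a rough sketch of the correlation analysis described above (a minimal illustration, assuming the data sits in a flat CSV with a binary 'win' column; the file and column names are placeholders, not the dataset's actual headers):

        import pandas as pd

        # Correlate every numeric stat with the 0/1 win flag; for a binary
        # target this Pearson correlation is the point-biserial correlation.
        # File and column names are illustrative assumptions.
        df = pd.read_csv("league_matches.csv")
        correlations = (
            df.select_dtypes("number")
              .corrwith(df["win"])
              .drop("win")
              .sort_values(key=abs, ascending=False)
        )
        print(correlations.head(10))  # stats most associated with winning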

    1.2. Brief Description of League of Legends

    The online competitive multiplayer game League of Legends is part of the MOBA genre and is considered one of the most widely popular games of all time. Its most recent tournaments have made more money than the Super Bowl; in 2020 League of Legends made 1.7 billion dollars. The premise of the game is two teams fighting to destroy the enemy Nexus. Below is the map of the game, to make it easier to reference the variables given. Figure 1: Map of League of Legends ([1])

    1.3. Description of the Map

    The map of League of Legends contains 3 paths, each a lane with a corresponding name: Top Lane, Mid Lane (short for Middle Lane), and Bot Lane (short for Bottom Lane). Each lane spawns "minions" to help push lanes. These minions are very easy monsters to defeat and provide gold to the player who lands the killing blow. Each lane has 3 towers and an inhibitor; all three of a lane's towers plus its inhibitor must be destroyed before a player can reach the Nexus. The towers protect players by hurting enemy champions in range. The inhibitor provides no use to the allied team; however, if a player destroys the enemy inhibitor, then "Super Minions" will spawn in that lane. These buffed minions help push to finish the game. There are also forests called the Jungle in the middle of these lanes. In the Jungle there are several monsters that are worth gold and that grow stronger as the game goes on. Some monsters, if killed, even provide special bonuses. All these monsters can be killed by one player. The blue section seen in Figure 1 that splits the map into two sides is known as the river; it is the equivalent of the half-way line in soccer. In this part of the map, Large Monsters spawn that require a group effort to take down but give huge bonuses.

    1.4. Description of the Gameplay

    The game is known for its complexity; for a comprehensive guide, see the links provided ([2] and [3] are excellent comprehensive guides). This paper explains only the minimal necessities to follow the data. There are 5 players on each team, and each player plays a champion: a character with unique gameplay, stats, and abilities. These 5 players each fill a specific role: the Top Laner goes to the Top Lane, the Mid Laner to the Mid Lane, the Jungler to the Jungle, and the Attack Damage Carry (ADC) and Support to the Bot Lane. In each of their respective locations, each role attempts to earn gold and level up. The gold is used to buy items (each with unique effects, ...

  3. City of Chicago Data Portal

    • data.cityofchicago.org
    csv, xlsx, xml
    Updated Dec 2, 2025
    Cite
    City of Chicago (2025). City of Chicago Data Portal [Dataset]. https://data.cityofchicago.org/widgets/qd2y-e669
    Explore at:
    Available download formats: xml, csv, xlsx
    Dataset updated
    Dec 2, 2025
    Authors
    City of Chicago
    Area covered
    Chicago
    Description

    This dataset contains all current and active business licenses issued by the Department of Business Affairs and Consumer Protection. This dataset contains a large number of records/rows of data and may not be viewable in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Notepad or Wordpad, to view and search.

    Data fields requiring description are detailed below.

    APPLICATION TYPE: 'ISSUE' is the record associated with the initial license application. 'RENEW' is a subsequent renewal record; all renewal records are created with a term start date and term expiration date. 'C_LOC' is a change of location record, meaning the business moved. 'C_CAPA' is a change of capacity record; only a few license types may file this type of application. 'C_EXPA' only applies to businesses that have liquor licenses and means the business location expanded.

    LICENSE STATUS: 'AAI' means the license was issued.
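    As a rough illustration of how these codes might be used, here is a hypothetical pandas sketch; the file name and exact column headers are assumptions, not the portal's documented schema:

        import pandas as pd

        # Hypothetical file name; export the dataset as CSV as advised above.
        licenses = pd.read_csv("Business_Licenses.csv")

        # Decode the documented APPLICATION TYPE values.
        application_types = {
            "ISSUE": "initial license application",
            "RENEW": "renewal record",
            "C_LOC": "change of location",
            "C_CAPA": "change of capacity",
            "C_EXPA": "expansion (liquor licenses only)",
        }
        licenses["APPLICATION TYPE MEANING"] = licenses["APPLICATION TYPE"].map(application_types)

        # Keep only issued licenses, per the LICENSE STATUS code above.
        issued = licenses[licenses["LICENSE STATUS"] == "AAI"]
        print(issued["APPLICATION TYPE MEANING"].value_counts())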

    Business license owners may be accessed at: http://data.cityofchicago.org/Community-Economic-Development/Business-Owners/ezma-pppn To identify the owner of a business, you will need the account number or legal name.

    Data Owner: Business Affairs and Consumer Protection

    Time Period: Current

    Frequency: Data is updated daily

  4. Automated Paper Screening for Clinical Reviews

    • kaggle.com
    zip
    Updated Oct 18, 2023
    + more versions
    Cite
    Jocelyn Dumlao (2023). Automated Paper Screening for Clinical Reviews [Dataset]. https://www.kaggle.com/datasets/jocelyndumlao/automated-paper-screening-for-clinical-reviews
    Explore at:
    Available download formats: zip (78877414 bytes)
    Dataset updated
    Oct 18, 2023
    Authors
    Jocelyn Dumlao
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Description

    This project is a tool that allows researchers to automate the screening of titles and abstracts for clinical review papers en bloc. Given a CSV file describing your dataset(s) and CSV files of the titles and abstracts of the papers you want to screen, the tool will automatically generate a CSV file of the papers that meet your criteria. The GPT API powers the tool.

    Steps to reproduce

    To use the tool, you will need to provide the following files (a sketch of these inputs follows the list):

    • Dataset information CSV with the following columns:
      • 'Dataset Name' (str): name of the dataset
      • 'Inclusion Criteria' (str): screening inclusion criteria
      • 'Exclusion Criteria' (str): screening exclusion criteria
    • Dataset(s)
      • The name of the csv must match the 'Dataset Name' in the dataset information csv
      • There must be a "title" and "abstract" column in each csv
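    As a hypothetical illustration of these inputs (the dataset name, criteria, and paper below are invented for the example), the two CSVs could be produced like this:

        import pandas as pd

        # Dataset information CSV with the three required columns.
        info = pd.DataFrame([{
            "Dataset Name": "my_review",
            "Inclusion Criteria": "randomized controlled trials in adult patients",
            "Exclusion Criteria": "animal studies; case reports; non-English papers",
        }])
        info.to_csv("dataset_info.csv", index=False)

        # Papers to screen; the file name must match 'Dataset Name'.
        papers = pd.DataFrame([{
            "title": "An example trial of a hypothetical intervention",
            "abstract": "Background: ... Methods: ... Results: ...",
        }])
        papers.to_csv("my_review.csv", index=False)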

    To run the tool, run the following command:

    python3 screening.py
    

    Then enter the path to your dataset information CSV. The tool will write the results to a CSV file in the results directory.

    For an example of how to use the tool, see analysis.ipynb.

    Categories

    Artificial Intelligence, Medical Informatics, Natural Language Processing, Screening, Systematic Review

    Acknowledgements & Source

    Eddie Guo

    Institutions: University of Calgary, University of Toronto

    Data Source

    View Details

    Image source: Large Language Models: Complete Guide in 2023

    Please don't forget to upvote if you find this useful.

  5. McBE

    • huggingface.co
    Updated Aug 9, 2025
    Cite
    Velikaya Scarlet (2025). McBE [Dataset]. https://huggingface.co/datasets/Velikaya/McBE
    Explore at:
    Dataset updated
    Aug 9, 2025
    Authors
    Velikaya Scarlet
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    ATTENTION: There are two types of data format files here: CSV and XLSX. The CSV files are uploaded for easy browsing of the data on Hugging Face. For actual testing, please use the files in the XLSX folder.

    Dataset Card for Dataset Name

    Dataset Details

    Dataset Description

    McBE is designed to address the scarcity of Chinese-centric bias evaluation resources for large language models (LLMs). It supports multi-faceted bias assessment across 5 evaluation tasks… See the full description on the dataset page: https://huggingface.co/datasets/Velikaya/McBE.

  6. Natural Questions Dataset

    • kaggle.com
    zip
    Updated Mar 15, 2024
    Cite
    fujoos (2024). Natural Questions Dataset [Dataset]. https://www.kaggle.com/datasets/frankossai/natural-questions-dataset
    Explore at:
    Available download formats: zip (116502047 bytes)
    Dataset updated
    Mar 15, 2024
    Authors
    fujoos
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Context

    The Natural Questions (NQ) dataset is a comprehensive collection of real user queries submitted to Google Search, with answers sourced from Wikipedia by expert annotators. Created by Google AI Research, this dataset aims to support the development and evaluation of advanced automated question-answering systems. The version provided here includes 89,312 meticulously annotated entries, tailored for ease of access and utility in natural language processing (NLP) and machine learning (ML) research.

    Data Collection

    The dataset is composed of authentic search queries from Google Search, reflecting the wide range of information sought by users globally. This approach ensures a realistic and diverse set of questions for NLP applications.

    Data Pre-processing

    The NQ dataset underwent significant pre-processing to prepare it for NLP tasks:

    • Removal of web-specific elements like URLs, hashtags, user mentions, and special characters using Python's "BeautifulSoup" and "regex" libraries.
    • Grammatical error identification and correction using the "LanguageTool" library, an open-source grammar, style, and spell checker.

    These steps were taken to clean and simplify the text while retaining the essence of the questions and their answers, divided into 'questions', 'long answers', and 'short answers'.
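    As a rough sketch of the first cleaning step (not the authors' actual script; the LanguageTool grammar-correction pass is omitted here), something like the following strips HTML and web-specific elements:

        import re
        from bs4 import BeautifulSoup

        def clean_text(raw: str) -> str:
            # Strip HTML tags, then URLs, mentions/hashtags, and leftover
            # special characters, collapsing whitespace at the end.
            text = BeautifulSoup(raw, "html.parser").get_text(" ", strip=True)
            text = re.sub(r"https?://\S+", " ", text)
            text = re.sub(r"[@#]\w+", " ", text)
            text = re.sub(r"[^\w\s.,?!'-]", " ", text)
            return re.sub(r"\s+", " ", text).strip()

        print(clean_text("<p>who wrote <b>#Hamlet</b>? see https://example.com</p>"))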

    Data Storage

    The unprocessed data, including answers with embedded HTML, empty or complex long and short answers, is stored in "Natural-Questions-Base.csv". This version retains the raw structure of the data, featuring HTML elements in answers, and varied answer formats such as tables and lists, providing a comprehensive view for those interested in the original dataset's complexity and richness. The processed data is compiled into a single CSV file named "Natural-Questions-Filtered.csv". The file is structured for easy access and analysis, with each record containing the processed question, a detailed answer, and concise answer snippets.

    Filtered Results

    The filtered version is available where specific criteria, such as question length or answer complexity, were applied to refine the data further. This version allows for more focused research and application development.

    Flask CSV Reader App

    The repository at 'https://github.com/fujoos/natural_questions' also includes a Flask-based CSV reader application designed to read and display contents from the "NaturalQuestions.csv" file. The app provides functionalities such as:

    • Viewing questions and answers directly in your browser.
    • Filtering results based on criteria like question keywords or answer length.

    See the live demo, using the CSV files converted to a SQLite DB, at 'https://fujoos.pythonanywhere.com/'.
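    A minimal sketch of such a reader (assuming a 'question' column in the filtered CSV; this is not the fujoos app itself):

        from flask import Flask, request
        import pandas as pd

        app = Flask(__name__)
        data = pd.read_csv("Natural-Questions-Filtered.csv")

        @app.route("/")
        def search():
            # e.g. /?q=capital shows rows whose question mentions "capital"
            keyword = request.args.get("q", "")
            hits = data[data["question"].str.contains(keyword, case=False, na=False)]
            return hits.head(50).to_html()

        if __name__ == "__main__":
            app.run(debug=True)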

  7. UNI-CEN Standardized Census Data Table - Census Tract (CT) - 1961 - Wide...

    • search.dataone.org
    • borealisdata.ca
    Updated Dec 28, 2023
    + more versions
    Cite
    UNI-CEN Project (2023). UNI-CEN Standardized Census Data Table - Census Tract (CT) - 1961 - Wide Format (CSV) (Version 2023-03) [Dataset]. http://doi.org/10.5683/SP3/87U4LD
    Explore at:
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    UNI-CEN Project
    Time period covered
    Jan 1, 1961
    Description

    UNI-CEN Standardized Census Data Tables contain Census data that have been reformatted into a common table format with standardized variable names and codes. The data are provided in two tabular formats for different use cases. "Long" tables are suitable for use in statistical environments, while "wide" tables are commonly used in GIS environments. The long tables are provided in Stata Binary (dta) format, which is readable by all statistics software. The wide tables are provided in comma-separated values (csv) and dBase 3 (dbf) formats with codebooks. The wide tables are easily joined to the UNI-CEN Digital Boundary Files. For the csv files, a .csvt file is provided to ensure that column data formats are correctly formatted when importing into QGIS. A schema.ini file does the same when importing into ArcGIS environments. As the DBF file format supports a maximum of 250 columns, tables with a larger number of variables are divided into multiple DBF files. For more information about file sources, the methods used to create them, and how to use them, consult the documentation at https://borealisdata.ca/dataverse/unicen_docs. For more information about the project, visit https://observatory.uwo.ca/unicen.

  8. BioTIME

    • zenodo.org
    bin, csv
    Updated Jun 24, 2021
    + more versions
    Cite
    BioTIME Consortium (2021). BioTIME [Dataset]. http://doi.org/10.5281/zenodo.1095628
    Explore at:
    Available download formats: csv, bin
    Dataset updated
    Jun 24, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    BioTIME Consortium
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The BioTIME database contains raw data on species identities and abundances in ecological assemblages through time. The database consists of 11 tables: one raw data table plus ten related metadata tables. For further information please see our associated data paper.

    This data consists of several elements:

    • BioTIMESQL_06_12_2017.sql - an SQL file for the full public version of BioTIME, which can be imported into any MySQL database.
    • BioTIMEQuery_06_12_2017.csv - data file; although too large to view in Excel, it can be read into several software applications such as R or various database packages.
    • BioTIMEMetadata_06_12_2017.csv - file containing the metadata for all studies.
    • BioTIMECitations_06_12_2017.csv - file containing the citation list for all studies.
    • BioTIMECitations_06_12_2017.xlsx - file containing the citation list for all studies (some special characters are not supported in the csv format).
    • BioTIMEInteractions_06_12_2017.Rmd - an R Markdown page providing a brief overview of how to interact with the database and associated .csv files (this will not work until file paths and database connections have been added/updated).
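    Since the query file is too large for Excel, a minimal pandas sketch (assuming only that it parses as ordinary CSV) can stream it in chunks:

        import pandas as pd

        n_rows = 0
        for chunk in pd.read_csv("BioTIMEQuery_06_12_2017.csv", chunksize=100_000):
            n_rows += len(chunk)  # replace with real per-chunk processing
        print(f"total records: {n_rows}")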

    Please note: any users of any of this material should cite the associated data paper in addition to the DOI listed here.

  9. UKBA: structure and salary, February 2011

    • gov.uk
    Updated Oct 15, 2010
    Cite
    Home Office (2010). UKBA: structure and salary, February 2011 [Dataset]. https://www.gov.uk/government/publications/ukba-structure-and-salary-february-2011
    Explore at:
    Dataset updated
    Oct 15, 2010
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Home Office
    Description

    These documents explain the UK Border Agency’s structure in detail, as well as the salaries of its senior civil servants.

    Date: Fri Oct 15 10:25:10 BST 2010

  10. DataSheet2_Social Presence Outside the Augmented Reality Field of View.CSV

    • frontiersin.figshare.com
    txt
    Updated Jun 5, 2023
    + more versions
    Cite
    Mark Roman Miller; Jeremy N. Bailenson (2023). DataSheet2_Social Presence Outside the Augmented Reality Field of View.CSV [Dataset]. http://doi.org/10.3389/frvir.2021.656473.s002
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Mark Roman Miller; Jeremy N. Bailenson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Augmented reality headsets in use today have a large area in which the real world can be seen but virtual content cannot be displayed. Users' perception of content in this area is not well understood. This work studies participants' perception of a virtual character in this area by grounding the question in relevant theories of perception and performing a study using both behavioral and self-report measures. We find that virtual characters within the augmented periphery receive lower social presence scores, but we do not find a difference in task performance. These findings inform application design and encourage future work on theories of AR perception and perception of virtual humans.

  11. Data and scripts associated with a manuscript on residence time distribution...

    • knb.ecoinformatics.org
    • search.dataone.org
    • +1more
    Updated Apr 17, 2024
    + more versions
    Cite
    Jie Bao; Xuehang Song; Yunxiang Chen; Yilin Fang; William Perkins; Beck Powers-McCormack; Zhuoran Duan; Huiying Ren (2024). Data and scripts associated with a manuscript on residence time distribution simulation in two 10-kilometer long river sections [Dataset]. http://doi.org/10.15485/2336865
    Explore at:
    Dataset updated
    Apr 17, 2024
    Dataset provided by
    ESS-DIVE
    Authors
    Jie Bao; Xuehang Song; Yunxiang Chen; Yilin Fang; William Perkins; Beck Powers-McCormack; Zhuoran Duan; Huiying Ren
    Time period covered
    Jan 1, 2011 - Dec 31, 2015
    Area covered
    Description

    This data package is associated with the publication “On the Transferability of Residence Time Distributions in Two 10-km Long River Sections with Similar Hydromorphic Units” submitted to the Journal of Hydrology (Bao et al. 2024). Quantifying hydrologic exchange fluxes (HEFs) at the stream-groundwater interface, along with their residence time distributions (RTDs) in the subsurface, is crucial for managing water quality and ecosystem health in dynamic river corridors. However, directly simulating high-spatial-resolution HEFs and RTDs can be a time-consuming process, particularly for watershed-scale modeling. Efficient surrogate models that link RTDs to hydromorphic units (HUs) may serve as alternatives for simulating RTDs in large-scale models. One common concern with these surrogate models, however, is the transferability of the relationship between the RTDs and HUs from one river corridor to another. To address this, we evaluated the HEFs and the resulting RTD-HU relationships for two 10-kilometer-long river corridors along the Columbia River, using a one-way coupled three-dimensional transient surface-subsurface water transport modeling framework that we previously developed. Applying this framework to the two river corridors with similar HUs allows for quantitative comparisons of HEFs and RTDs using both statistical tests and machine learning classification models. This data package includes the model input files and the simulation results data. This data package contains 10 folders. The modeling simulation results data are in the folders 100H_pt_data and 300area_pt_data, for the Hanford 100H and 300 Area study domains, respectively. The remaining eight folders contain the scripts and data to generate the manuscript figures. The file-level metadata file (Bao_2024_Residence_Time_Distribution_flmd.csv) includes a list of all files contained in this data package and descriptions for each. The data dictionary file (Bao_2024_Residence_Time_Distribution_dd.csv) includes column header definitions and units of all tabular files.

  12. BioTIME

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 24, 2021
    Cite
    BioTIME Consortium (2021). BioTIME [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1095627
    Explore at:
    Dataset updated
    Jun 24, 2021
    Dataset provided by
    Centre for Biological Diversity and Scottish Oceans Institute, School of Biology, University of St. Andrews, St. Andrews, UK
    Authors
    BioTIME Consortium
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The BioTIME database contains raw data on species identities and abundances in ecological assemblages through time. The database consists of 11 tables: one raw data table plus ten related metadata tables. For further information please see our associated data paper.

    This data consists of several elements:

    BioTIMESQL_02_04_2018.sql - an SQL file for the full public version of BioTIME, which can be imported into any MySQL database.

    BioTIMEQuery_02_04_2018.csv - data file; although too large to view in Excel, it can be read into several software applications such as R or various database packages.

    BioTIMEMetadata_02_04_2018.csv - file containing the metadata for all studies.

    BioTIMECitations_02_04_2018.csv - file containing the citation list for all studies.

    BioTIMECitations_02_04_2018.xlsx - file containing the citation list for all studies (some special characters are not supported in the csv format).

    BioTIMEInteractions_02_04_2018.Rmd - an R Markdown page providing a brief overview of how to interact with the database and associated .csv files (this will not work until file paths and database connections have been added/updated).

    Please note: any users of any of this material should cite the associated data paper in addition to the DOI listed here.

    To cite the data paper use the following:

    Dornelas M, Antão LH, Moyes F, Bates, AE, Magurran, AE, et al. BioTIME: A database of biodiversity time series for the Anthropocene. Global Ecol Biogeogr. 2018; 27:760 - 786. https://doi.org/10.1111/geb.12729

  13. NII Face Mask Dataset

    • data.niaid.nih.gov
    Updated Jan 26, 2022
    Cite
    Trung-Nghia Le; Khanh-Duy Nguyen; Huy H. Nguyen; Junichi Yamagishi; Isao Echizen (2022). NII Face Mask Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5761724
    Explore at:
    Dataset updated
    Jan 26, 2022
    Dataset provided by
    National Institute of Informatics, Japan
    University of Information Technology-VNUHCM, Vietnam
    Authors
    Trung-Nghia Le; Khanh-Duy Nguyen; Huy H. Nguyen; Junichi Yamagishi; Isao Echizen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    =====================================================================

    NII Face Mask Dataset v1.0

    =====================================================================

    Authors: Trung-Nghia Le (1), Khanh-Duy Nguyen (2), Huy H. Nguyen (1), Junichi Yamagishi (1), Isao Echizen (1)

    Affiliations: (1)National Institute of Informatics, Japan (2)University of Information Technology-VNUHCM, Vietnam

    National Institute of Informatics Copyright (c) 2021

    Emails: {ltnghia, nhhuy, jyamagis, iechizen}@nii.ac.jp, {khanhd}@uit.edu.vn

    arXiv: https://arxiv.org/abs/2111.12888
    NII Face Mask Dataset v1.0: https://zenodo.org/record/5761725

    =============================== INTRODUCTION ===============================

    The NII Face Mask Dataset is the first large-scale dataset targeting mask-wearing ratio estimation in street cameras. This dataset contains 581,108 face annotations extracted from 18,088 video frames (1920x1080 pixels) in 17 street-view videos obtained from Rambalac's YouTube channel.

    The videos were taken in multiple places, at various times, before and during the COVID-19 pandemic. The total length of the videos is approximately 56 hours.

    =============================== REFERENCES ===============================

    If you publish using any of the data in this dataset, please cite the following papers:

    Pre-print version

    @article{Nguyen202112888,
      title={Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio},
      author={Nguyen, Khanh-Duy and Nguyen, Huy H. and Le, Trung-Nghia and Yamagishi, Junichi and Echizen, Isao},
      archivePrefix={arXiv},
      arxivId={2111.12888},
      url={https://arxiv.org/abs/2111.12888},
      year={2021}
    }

    Final version

    @INPROCEEDINGS{Nguyen2021EstMaskWearing,
      author={Nguyen, Khanh-Duy and Nguyen, Huy H. and Le, Trung-Nghia and Yamagishi, Junichi and Echizen, Isao},
      booktitle={2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021)},
      title={Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio},
      year={2021},
      pages={1-8},
      url={https://ieeexplore.ieee.org/document/9667046},
      doi={10.1109/FG52635.2021.9667046}
    }

    ======================== DATA STRUCTURE ==================================

    1. Directory Structure

    ./NFM
    ├── dataset
    │   ├── train.csv: annotations for the train set.
    │   ├── test.csv: annotations for the test set.
    └── README_v1.0.md

    2. Description of each file in detail.

    We use the same structure for the two CSV files (train.csv and test.csv). Both CSV files have the same columns:

    1st column: video_id (the source video can be found at https://www.youtube.com/watch?v=<video_id>)
    2nd column: frame_id (the index of a frame extracted from the source video)
    3rd column: timestamp in milliseconds (the timestamp of a frame extracted from the source video)
    4th column: label (for each annotated face, one of three labels was attached to a bounding box: 'Mask'/'No-Mask'/'Unknown')
    5th column: left
    6th column: top
    7th column: right
    8th column: bottom

    The four coordinates (left, top, right, bottom) denote a face's bounding box.
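    A minimal loading sketch based on this layout (the column names below are assigned by us from the positional description; whether the files carry a header row is an assumption):

        import pandas as pd

        cols = ["video_id", "frame_id", "timestamp_ms", "label",
                "left", "top", "right", "bottom"]
        train = pd.read_csv("NFM/dataset/train.csv", names=cols, header=None)

        # Mask-wearing ratio per frame, ignoring faces labeled 'Unknown'.
        known = train[train["label"] != "Unknown"]
        ratio = known["label"].eq("Mask").groupby(known["frame_id"]).mean()
        print(ratio.head())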

    ============================== COPYING ================================

    This repository is made available under Creative Commons Attribution License (CC-BY).

    Regarding Creative Commons License: Attribution 4.0 International (CC BY 4.0), please see https://creativecommons.org/licenses/by/4.0/

    THIS DATABASE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DATABASE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE

    ====================== ACKNOWLEDGEMENTS ================================

    This research was partly supported by JSPS KAKENHI Grants (JP16H06302, JP18H04120, JP21H04907, JP20K23355, JP21K18023), and JST CREST Grants (JPMJCR20D3, JPMJCR18A6), Japan.

    This dataset is based on the Rambalac's YouTube channel: https://www.youtube.com/c/Rambalac

  14. Third Generation Simulation Data (TGSIM) I-294 L1 Trajectories

    • catalog.data.gov
    • data.transportation.gov
    • +2more
    Updated Sep 30, 2025
    + more versions
    Cite
    Federal Highway Administration (2025). Third Generation Simulation Data (TGSIM) I-294 L1 Trajectories [Dataset]. https://catalog.data.gov/dataset/third-generation-simulation-data-tgsim-i-294-l1-trajectories
    Explore at:
    Dataset updated
    Sep 30, 2025
    Dataset provided by
    Federal Highway Administration (https://highways.dot.gov/)
    Area covered
    Interstate 294
    Description

    The main dataset is a 70 MB file of trajectory data (I294_L1_final.csv) that contains position, speed, and acceleration data for small and large automated (L1) vehicles and non-automated vehicles on a highway in a suburban environment. Supporting files include aerial reference images for ten distinct data collection “Runs” (I294_L1_RunX_with_lanes.png, where X equals 8, 18, and 20 for southbound runs and 1, 3, 7, 9, 11, 19, and 21 for northbound runs). Associated centerline files are also provided for each “Run” (I-294-L1-Run_X-geometry-with-ramps.csv). In each centerline file, x and y coordinates (in meters) marking each lane centerline are provided; the origin point of the reference image is located at the top left corner. Additionally, in each centerline file, an indicator variable is used for each lane to define the following types of road sections: 0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments. The number attached to each column header is the numerical ID assigned to the specific lane (see “TGSIM – Centerline Data Dictionary – I294 L1.csv” for more details). The dataset defines eight lanes (four lanes in each direction) using these centerline files. Images that map the lanes of interest to the numerical lane IDs referenced in the trajectory dataset are stored in the folder titled “Annotation on Regions.zip”. The southbound lanes are shown visually in I294_L1_Lane-2.png through I294_L1_Lane-5.png and the northbound lanes in I294_L1_Lane2.png through I294_L1_Lane5.png.

    This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report found here: https://rosap.ntl.bts.gov/view/dot/74647. This dataset, one of the six collected as part of the TGSIM project, contains data collected using one high-resolution 8K camera mounted on a helicopter that followed three SAE Level 1 ADAS-equipped vehicles with adaptive cruise control (ACC) enabled. The three vehicles manually entered the highway, moved to the second from left-most lane, then enabled ACC with minimum following distance settings to initiate a string. The helicopter then followed the string of vehicles (which sometimes broke from the string due to large following distances) northbound through the 4.8 km section of highway at an altitude of 300 meters. The goal of the data collection effort was to collect data related to human drivers' responses to vehicle strings. The road segment has four lanes in each direction and covers a major on-ramp and an off-ramp in the southbound direction and one on-ramp in the northbound direction. The segment of highway is operated by Illinois Tollway and contains a high percentage of heavy vehicles. The camera captured footage during the evening rush hour (3:00 PM-5:00 PM CT) on a sunny day.

    As part of this dataset, the following files were provided:

    • I294_L1_final.csv contains the numerical data to be used for analysis: vehicle-level trajectory data at every 0.1 second. Vehicle size (small or large), width, length, and whether the vehicle was one of the test vehicles with ACC engaged ("yes" or "no") are provided with instantaneous location, speed, and acceleration data. All distance measurements (width, length, location) were converted from pixels to meters using the conversion factor 1 pixel = 0.3 meters.
    • I294_L1_RunX_with_lanes.png are the aerial reference images that define the geographic region and associated roadway segments of interest (see bounding boxes on northbound and southbound lanes) for each run X.
    • I-294-L1-Run_X-geometry-with-ramps.csv contain the coordinates that define the lane centerlines.
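    As a small, hypothetical sketch of loading these files with pandas (no column names inside the files are assumed; the section-type codes restate the documentation above):

        import pandas as pd

        traj = pd.read_csv("I294_L1_final.csv")
        centerline = pd.read_csv("I-294-L1-Run_1-geometry-with-ramps.csv")

        # Documented lane-section indicator codes in the centerline files.
        SECTION_TYPES = {0: "no ramp", 1: "on-ramp", 2: "off-ramp", 3: "weaving segment"}
        PIXELS_TO_METERS = 0.3  # stated conversion, already applied to the data

        print(traj.shape)
        print(centerline.columns.tolist())  # lane centerline x/y plus indicators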

  15. THÖR-MAGNI: A Large-scale Indoor Motion Capture Recording of Human Movement...

    • data-staging.niaid.nih.gov
    Updated Oct 28, 2024
    Cite
    Schreiter, Tim; Rodrigues de Almeida, Tiago; Zhu, Yufei; Gutierrez Maestro, Eduardo; Rudenko, Andrey (2024). THÖR-MAGNI: A Large-scale Indoor Motion Capture Recording of Human Movement and Interaction [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_10407222
    Explore at:
    Dataset updated
    Oct 28, 2024
    Dataset provided by
    Örebro University
    Robert Bosch (Germany)
    Authors
    Schreiter, Tim; Rodrigues de Almeida, Tiago; Zhu, Yufei; Gutierrez Maestro, Eduardo; Rudenko, Andrey
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The THÖR-MAGNI Dataset Tutorials

    THÖR-MAGNI is a novel dataset of accurate human and robot navigation and interaction in diverse indoor contexts, building on the protocol of the previous THÖR dataset. We provide position and head orientation motion capture data, 3D LiDAR scans, and gaze tracking. In total, THÖR-MAGNI captures 3.5 hours of motion from 40 participants over 5 recording days.

    This data collection is designed around systematic variation of factors in the environment to allow building cue-conditioned models of human motion and verifying hypotheses on factor impact. To that end, THÖR-MAGNI encompasses 5 scenarios, some of which have different conditions (i.e., we vary some factor):

    Scenario 1 (plus conditions A and B):

    Participants move in groups and individually;

    Robot as static obstacle;

    Environment with 3 obstacles and lane marking on the floor for condition B;

    Scenario 2:

    Participants move in groups, individually, and transport objects of variable difficulty (e.g., a bucket, boxes, and a poster stand);

    Robot as static obstacle;

    Environment with 3 obstacles;

    Scenario 3 (plus conditions A and B):

    Participants move in groups, individually, and transport objects of variable difficulty (e.g., a bucket, boxes, and a poster stand). We denote the roles as: Visitors-Alone, Visitors-Group 2, Visitors-Group 3, Carrier-Bucket, Carrier-Box, Carrier-Large Object;

    Teleoperated robot as moving agent: in condition A, the robot moves with differential drive; in condition B, the robot moves with omni-directional drive;

    Environment with 2 obstacles;

    Scenario 4 (plus conditions A and B):

    All participants, denoted as Visitors-Alone HRI, interacted with the teleoperated mobile robot;

    Robot interacted in two ways: in condition A (Verbal-Only), the Anthropomorphic Robot Mock Driver (ARMoD), a small humanoid NAO robot on top of the mobile platform, only used speech to communicate the next goal point to the participant; in condition B the ARMoD used speech, gestures and robotic gaze to convey the same message;

    Free space environment

    Scenario 5:

    Participants move alone (Visitors-Alone), and one of the participants, denoted as Visitors-Alone HRI, transports objects and interacts with the robot;

    The ARMoD is remotely controlled by an experimenter and proactively offers help;

    Free space environment;

    Preliminary steps

    Before proceeding, make sure to download the data from Zenodo.

    1. Directory Structure

    ├── CLiFF_Maps <- Directory for CLiFF Maps for all files

       ├── Files <- Directory for the csv files
    
       ├── Readme.md
    

    ├── CSVs_Scenarios <- Directory for aligned data for all scenarios

       ├── Scenario_1 <- Directory for the csv files for Scenario 1
    
       ├── Scenario_2 <- Directory for the csv files for Scenario 2
    
       ├── Scenario_3 <- Directory for the csv files for Scenario 3
    
       ├── Scenario_4 <- Directory for the csv files for Scenario 4
    
       ├── Scenario_5 <- Directory for the csv files for Scenario 5
    

    ├── docs

       ├── tutorials.md <- Tutorials document on how to use the data
    

    ├── Lidar_sample

      ├── Files <- Directory for sample files
    
          ├── 170522_SC3B_1 <- Directory for the pcd files
    
          ├── 170522_SC3B_1.csv <- Synchronization file with QTM
    
          ├── manual_view_point.json <- json file with manual view point for visualization
    
          ├── requirements.txt <- script pip requirements
    
          ├── visualize_pcd.py <- script visualize the lidar data
    
      ├── Readme.md
    

    ├── maps <- Directory for maps of the environment (PNG files) and offsets (json file)

      ├── offsets.json <- Offsets of the map with respect to the global coordinate frame origin
    
      ├── {date}_SC{sc_id}_map.png <- Maps for `date` in {1205, 1305, 1705, 1805} and `sc_id` in {1A, 1B, 2, 3}
    
      ├── 3009_map.png <- Map for the Scenarios 4A, 4B and 5
    

    ├── MP4_Videos

      ├── Files <- Directory for the mp4 files
    
      ├── pupil_scene_camera_instrinsics.json <- json file with the intrinsics of pupil camera
    

    ├── TSVs_RAWET <- Directory for the TSV files for the Raw Eyetracking data for all Scenarios

      ├── synch_info.csv <- Event markers necessary to align motion capture with eyetracking data
    
      ├── Files <- Directory with all the raw eyetracking TSV files
    

    ├── goals_positions.csv <- File with the goals locations

    2. Data Structure and Dataset Files

    Within each Scenario directory, each csv file contains:

    2.1. Headers

    The dataset metadata overview contains important information found in the CSV file headers. This reference is designed to help users understand and use the dataset effectively. The headers include details such as FILE_ID, which provides information on the date, scenario, condition, and run associated with each recording. The header of the document includes important quantities such as the number of frames recorded (N_FRAMES_QTM), the count of rigid bodies (N_BODIES), and the total number of markers (N_MARKERS).

    It also provides information about the order of the contiguous rotation matrix (CONTIGUOUS_ROTATION_MATRIX), modalities measured with units, and specified measurement units. The text presents details on the eyetracking devices used in each recording, including their infrared sensor and scene camera frequencies, as well as an indication of the presence of eyetracking data.

    The header provides specific information about rigid bodies, including their names (BODY_NAMES), role labels (BODY_ROLES), and the number of markers associated with each rigid body (BODY_NR_MARKERS). Finally, the table lists all marker names used in the file.

    This metadata provides researchers and practitioners with essential guidance on recording information, data quantities, and specifics about rigid bodies and markers. It is a valuable resource for understanding and effectively using the dataset in the CSV files.

    2.2. Trajectory Data

    The remaining portion of the CSV file integrates merged data from the motion capture system and eye tracking devices, organized based on participants' helmet rigid bodies. Columns within the dataset include XYZ coordinates of all markers, spatial centroid coordinates, 6DOF orientation of the object's local coordinate frame, and if available eye tracking data, encompassing 2D/3D gaze coordinates, scene recording frame numbers, eye movement types, and IMU data.

    Missing data is denoted by "N/A" or an empty cell. Temporal indexing is facilitated by the "Time" or "Frame" column, indicating timestamps or frame numbers. The motion capture system records at 100Hz, Tobii Glasses at 50Hz (Raw); 25 Hz (Camera), and Pupil Glasses at 100Hz (Raw); 30 Hz (Camera). The dataset is structured around motion capture recordings, and for each rigid body, such as "Helmet_1," details per frame include XYZ coordinates of markers, centroid coordinates, and a 9-element rotational matrix describing helmet orientation.

    Header: Explanation

    Helmet_1 - 1 X: X-coordinate of Marker 1
    Helmet_1 - 1 Y: Y-coordinate of Marker 1
    Helmet_1 - 1 Z: Z-coordinate of Marker 1
    Helmet_1 - [...]: same for Markers 2 and 3 of Helmet_1
    Helmet_1 Centroid_X: X-coordinate of the centroid
    Helmet_1 Centroid_Y: Y-coordinate of the centroid
    Helmet_1 Centroid_Z: Z-coordinate of the centroid
    Helmet_1 R0: 1st element of the CONTIGUOUS_ROTATION_MATRIX
    Helmet_1 R[..]: same for R1-R7
    Helmet_1 R8: 9th element of the CONTIGUOUS_ROTATION_MATRIX
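    A small sketch of extracting one helmet's centroid trajectory under this naming scheme (the file path is a placeholder, and the QTM header block described in Section 2.1 may need skipping):

        import pandas as pd

        df = pd.read_csv("CSVs_Scenarios/Scenario_1/example_run.csv")

        centroid_cols = ["Helmet_1 Centroid_X", "Helmet_1 Centroid_Y", "Helmet_1 Centroid_Z"]
        trajectory = df[centroid_cols].dropna()  # missing samples are N/A or empty
        print(trajectory.head())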

    2.3. Eyetracking Data

    The eye tracking data covers 16 participants, providing over 500 minutes of recordings across the different activities and scenarios with three different eyetracking devices. Devices are denoted with a special "Tracker_ID" in the dataset:

    Tracker ID: Eyetracking Device

    TB2: Tobii 2 Glasses
    TB3: Tobii 3 Glasses
    PPL: Pupil Invisible Glasses

    Gaze points are classified into fixations and saccades using the Tobii I-VT Attention filter, which is specifically optimized for dynamic scenarios, with a velocity threshold of 100°/s. Eyetracking device calibration was systematically repeated after each 4-minute recording to account for natural variation in participants' eye shapes and to improve the gaze estimation algorithms. In addition, gaze estimation adjustments for the Pupil Invisible glasses were made after each 4-minute recording to mitigate potential drift. It is worth noting that the scene cameras of the eye tracking glasses had different fields of view: the Pupil Invisible Glasses had a 1088x1080 image with horizontal (HFOV) and vertical (VFOV) opening angles of 80°, while the Tobii Glasses provided a 1920x1080 image with different opening angles for Tobii Glasses 3 (HFOV: 95°, VFOV: 63°) and Tobii Glasses 2 (HFOV: 82°, VFOV: 52°).

    NOTE AS OF 2024: Videos are NOW part of the dataset

    For one participant, wearing the Tobii Glasses 3 and Helmet_6, the data would be denoted as:

    Header: Explanation

    Helmet_6 - [...]: X, Y, Z coordinates for 5 markers
    Helmet_6 [...]: X, Y, Z coordinates for 1 centroid
    Helmet_6 R[...]: 9 elements of the CONTIGUOUS_ROTATION_MATRIX
    Helmet_6 TB3_Accelerometer_[...]: accelerometer data along the X, Y, Z axes
    Helmet_6 TB3_Gyroscope_[...]: gyroscope data along the X, Y, Z axes
    Helmet_6 TB3_Magnetometer_[...]: magnetometer data along the X, Y, Z axes
    Helmet_6 TB3_G2D_[...]: 2D eye tracking data (X, Y)
    Helmet_6 TB3_G3D_[...]: 3D cyclopic eye gaze vector (X, Y, Z)
    Helmet_6 TB3_Movement: eye movement type (N/A, Fixation, or Saccade)
    Helmet_6 TB3_SceneFNr: frame number of the scene camera recording

    How to use and tools

    magni-dash

    This is a dashboard to quickly visualize our data: trajectories, speeds,

  16. Data from: Large brains and groups associated with high rates of agonism in...

    • datadryad.org
    • data.niaid.nih.gov
    • +3more
    zip
    Updated Feb 27, 2017
    Cite
    Veronica B. Cowl; Susanne Shultz (2017). Large brains and groups associated with high rates of agonism in primates [Dataset]. http://doi.org/10.5061/dryad.2nd65
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 27, 2017
    Dataset provided by
    Dryad
    Authors
    Veronica B. Cowl; Susanne Shultz
    Time period covered
    Feb 26, 2017
    Description

    Animals living in social groups will almost inevitably experience competition for limited resources. One consequence of competition can be agonism, an activity that is costly to participate in not only at the individual level but potentially also at the group level, due to the detrimental effects that agonism can have on group stability and cohesion. Agonism rates across primate species have previously been associated with group size and terrestriality; therefore primates, particularly those in large groups, should develop strategies to mitigate or counteract agonism. Here, we use phylogenetically controlled analyses to evaluate whether the known relationship between brain size and group size may partially reflect an association between agonism and brain size in large groups. We find strong positive associations between group-level agonism and 2 measures of brain size (endocranial volume and neocortex ratio) in 45 separate populations across 23 different primate species. In contrast, dyadi...

  17. Data from: QUADICA - water quality, discharge and catchment attributes for...

    • hydroshare.org
    • beta.hydroshare.org
    • +1more
    zip
    Updated Jan 5, 2022
    + more versions
    Cite
    Pia Ebeling; Rohini Kumar; Michael Weber; Andreas Musolff (2022). QUADICA - water quality, discharge and catchment attributes for large-sample studies in Germany [Dataset]. http://doi.org/10.4211/hs.26e8238f0be14fa1a49641cd8a455e29
    Explore at:
    Available download formats: zip (36.7 MB)
    Dataset updated
    Jan 5, 2022
    Dataset provided by
    HydroShare
    Authors
    Pia Ebeling; Rohini Kumar; Michael Weber; Andreas Musolff
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1954 - Dec 31, 2015
    Area covered
    Germany
    Description

    This data set provides data on water quality, discharge, driving forces (meteorological and nitrogen surplus), and catchment attributes for a set of 1386 German catchments covering a wide range of natural and anthropogenic conditions. A corresponding paper, "Water quality, discharge and catchment attributes for large-sample studies in Germany - QUADICA" by Ebeling et al., describing the data set in detail, will be made available in the journal Earth System Science Data (https://www.earth-system-science-data.net/).

    This repository includes:

    1.) Water quality data as annual medians of observed concentration data of N, P and C species (c_annual.csv)
    2.) Water quantity data as annual medians of observed discharge (q_annual.csv)
    3.) Monthly medians over the whole time series of water quality variables and discharge (c_months.csv)
    4.) Monthly and annual median concentrations, flow-normalized concentrations, and mean fluxes estimated using Weighted Regressions on Time, Discharge, and Season (WRTDS) for stations with sufficient data availability (for details see the corresponding paper by Ebeling et al.; wrtds_monthly.csv, wrtds_annual.csv)
    5.) Meteorological data as monthly median average temperatures and sums of precipitation and potential evapotranspiration (tavg_monthly.csv, pre_monthly.csv, pet_monthly.csv)
    6.) N surplus time series on an annual basis (n_surplus.csv)
    7.) Summary statistics for the stations, including number of samples, covered time periods, degree of censoring (concentrations below the detection limit), availability of discharge data, and availability and performance of WRTDS models (metadata.csv)
    8.) Description of data tables (Metadata_QUADICA.pdf)

    Data on catchment attributes and geodata also part of the QUADICA data set are available at "CCDB - catchment characteristics data base Germany" (https://doi.org/10.4211/hs.82f8094dd61e449a826afdef820a2c19). The metadata of the water quality and quantity data base is available at "WQQDB - water quality and quantity data base Germany" (https://doi.org/10.4211/hs.a42addcbd59a466a9aa56472dfef8721).

    Conditions: The data set is freely and easily accessible. Please refer to the corresponding paper Ebeling et al. when using or referring to this data set.

  18. Clash Royale Battles- Upper Ladder, December 2021

    • kaggle.com
    zip
    Updated Jan 6, 2022
    Cite
    Eric Xue (2022). Clash Royale Battles- Upper Ladder, December 2021 [Dataset]. https://www.kaggle.com/datasets/nonrice/clash-royale-battles-upper-ladder-december-2021/data
    Explore at:
    Available download formats: zip (11617500 bytes)
    Dataset updated
    Jan 6, 2022
    Authors
    Eric Xue
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    Clash Royale is a wildly popular game that revolves around fast 1v1 battles. The game is unique in that, along with skill, a player's "deck" (the exactly 8 deployable troops the player picks) has also been widely thought to affect one's chances in the game. It is intriguing to see how the endless world of deck configurations matches up.

    Content

    The dataset has two files:

    • data_ord.csv: This is the juicy part: a collection of three quarters of a million 1v1 battles recorded from the game.
      • Each row is 1 battle.
      • The first 8 columns describe Player 1's deck, representing each card with a number.
      • The next 8 columns describe Player 2's deck, representing each card with a number.
      • The next 2 columns describe the respective trophy counts of Players 1 and 2.
      • The last column is the outcome of the battle: 1 if Player 1 wins, and 0 if Player 2 wins.
      • IMPORTANT: Player decks HAVE NO ORDER (in data_ord.csv they are sorted by ascending card ID, but they can be in any arrangement in real life), and card ID numbers are all categorical.
    • cardlist.csv: Rather than card names, data_ord.csv describes cards as integers. This file translates each card ID from data_ord.csv into its corresponding card name.
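    A minimal decoding sketch under the positional layout above (whether the CSVs carry header rows, and the exact two-column shape of cardlist.csv, are assumptions):

        import pandas as pd

        battles = pd.read_csv("data_ord.csv", header=None)
        cards = pd.read_csv("cardlist.csv", header=None, index_col=0).squeeze("columns")

        p1_deck = battles.iloc[:, 0:8]     # Player 1's 8 card IDs (unordered)
        p2_deck = battles.iloc[:, 8:16]    # Player 2's 8 card IDs (unordered)
        trophies = battles.iloc[:, 16:18]  # trophy counts for Players 1 and 2
        outcome = battles.iloc[:, 18]      # 1 = Player 1 wins, 0 = Player 2 wins

        print(p1_deck.iloc[0].map(cards).tolist())  # first battle's deck as names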

    Acknowledgements

    Data obtained from the Clash Royale API

    Inspiration

    I collected this data in hopes of seeing whether the aforementioned deck configurations and trophy counts could, for the most part, fully predict the outcome of a game.

  19. VIVID-10M

    • huggingface.co
    Updated Jul 10, 2025
    Cite
    Kling Team (2025). VIVID-10M [Dataset]. https://huggingface.co/datasets/KlingTeam/VIVID-10M
    Explore at:
    Dataset updated
    Jul 10, 2025
    Dataset authored and provided by
    Kling Team
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    VIVID-10M

    [project page] | [Paper] | [arXiv]

    VIVID-10M is the first large-scale hybrid image-video local editing dataset, aimed at reducing data construction and model training costs, comprising 9.7M samples that encompass a wide range of video editing tasks.

      Data Index
    

    The data index is located in four .csv files:

    • vivid-image-change.csv
    • vivid-image-remove.csv
    • vivid-video-change.csv
    • vivid-video-remove.csv

    The VIVID-Video splits contain the columns: local_caption, #… See the full description on the dataset page: https://huggingface.co/datasets/KlingTeam/VIVID-10M.

  20. EXCHANGE Campaign 1: A Community-Driven Baseline Characterization of Soils,...

    • data.nceas.ucsb.edu
    • dataone.org
    • +2more
    Updated Aug 14, 2023
    + more versions
    Cite
    Stephanie C. Pennington; Silver Alford; Michael P. Back; Vanessa Bailey; Andy Baldwin; Jade Bolinger; Madison Bowe; Maxim I. Boyanov; Jacob A. Cianci-Gaskill; Nathan A. Conroy; Matthew J. Cooper; Donnie Day; Alex Demeo; Kyle Derby; Derek Detweiler; Suzanne Devres-Zimmerman; Erin Eberhard; Keryn Gedan; LeeAnn Haaf; Khadijah K. Homolka; Erin Johnson; Kenneth M. Kemner; Aliya Khan; Matthew Kirwan; Payton Kittaka; Erika Koontz; Adam Langley; Riley Leff; Scott Lerberg; Allison M. Lewis; Sairah Malkin; Amy M. Marcarelli; Steven E. McMurray; Tyler Messerschmidt; Taylor C. Michael; Holly A. Michael; Elizabeth C. Minor; Brian Moye; Thomas J. Mozdzer; Scott Neubauer; Cooper G. Norris; Edward J. O'Loughlin; Opal Otenburg; Andrea Pain; Kaizad F. Patel; Michael Philben; Evan Phillips; Dannielle Pratt; Peter Regier; Jesse Alan Roebuck Jr.; Lauren Sage; Daniel Sandborn; Stacy Smith; Alex Smith; Samina Soin-Voshell; Bongkeun Song; Amanda Sprague-Getsy; Kari St. Laurent; Lorie Staver; Alice Stearns; Lucie Stetten; Rebecca Swerida; Ethan J. Theuerkauf; Katherine Tully; Rodrigo Vargas; Nicholas D. Ward; Elizabeth Watson; Coreen Weilminster; Allison N. Myers-Pigg (2023). EXCHANGE Campaign 1: A Community-Driven Baseline Characterization of Soils, Sediments, and Water Across Coastal Gradients [Dataset]. http://doi.org/10.15485/1960313
    Explore at:
    Dataset updated
    Aug 14, 2023
    Dataset provided by
    ESS-DIVE
    Authors
    Stephanie C. Pennington; Silver Alford; Michael P. Back; Vanessa Bailey; Andy Baldwin; Jade Bolinger; Madison Bowe; Maxim I. Boyanov; Jacob A. Cianci-Gaskill; Nathan A. Conroy; Matthew J. Cooper; Donnie Day; Alex Demeo; Kyle Derby; Derek Detweiler; Suzanne Devres-Zimmerman; Erin Eberhard; Keryn Gedan; LeeAnn Haaf; Khadijah K. Homolka; Erin Johnson; Kenneth M. Kemner; Aliya Khan; Matthew Kirwan; Payton Kittaka; Erika Koontz; Adam Langley; Riley Leff; Scott Lerberg; Allison M. Lewis; Sairah Malkin; Amy M. Marcarelli; Steven E. McMurray; Tyler Messerschmidt; Taylor C. Michael; Holly A. Michael; Elizabeth C. Minor; Brian Moye; Thomas J. Mozdzer; Scott Neubauer; Cooper G. Norris; Edward J. O'Loughlin; Opal Otenburg; Andrea Pain; Kaizad F. Patel; Michael Philben; Evan Phillips; Dannielle Pratt; Peter Regier; Jesse Alan Roebuck Jr.; Lauren Sage; Daniel Sandborn; Stacy Smith; Alex Smith; Samina Soin-Voshell; Bongkeun Song; Amanda Sprague-Getsy; Kari St. Laurent; Lorie Staver; Alice Stearns; Lucie Stetten; Rebecca Swerida; Ethan J. Theuerkauf; Katherine Tully; Rodrigo Vargas; Nicholas D. Ward; Elizabeth Watson; Coreen Weilminster; Allison N. Myers-Pigg
    Time period covered
    Oct 1, 2021 - Dec 1, 2021
    Area covered
    Description

    The EXploration of Coastal Hydrobiogeochemistry Across a Network of Gradients and Experiments (EXCHANGE) program is a consortium of scientists working together to improve our understanding of how the two-way exchange of water between estuaries or large-lake lacustuaries and the terrestrial landscape influences the state and function of ecosystems across the coastal interface. EXCHANGE Campaign 1 (EC1) focuses on the spatial variation in biogeochemical structure and function at the coastal terrestrial-aquatic interface (TAI). In the Fall of 2021, the EXCHANGE Consortium gathered samples from 52 TAIs. Samples collected for EC1 were analyzed for bulk geochemical parameters, bulk physicochemical parameters, organic matter characteristics, and redox-sensitive elements. Please download ec1_README.pdf for a complete list of available data in each .zip folder, the package version history, and detailed information about the project. This README will serve as the central place for EC1 Data Package updates.

    EC1 Data Package Structure:

    ec1_README.pdf
    ec1_methods.pdf
    ec1_metadata_v1.zip
      ec1_dd.csv
      ec1_flmd.csv
      ec1_sample_catalog.csv
      ec1_metadata_kitlevel.csv
      ec1_metadata_collectionlevel.csv
      ec1_data_collectionlevel.csv
      ec1_igsn_metadata.csv
    ec1_soil_v1.zip
    ec1_sediment_v1.zip
    ec1_water_v1.zip

    This data package was originally published May 2023 (v1). Subsequent updates will be published here with new version numbers. Please see the Change History section in ec1_README.pdf for detailed changes.

    ---

    Acknowledging EXCHANGE: General Support and Data Product Use

    We ask that users of EXCHANGE data add the following acknowledgement when publishing data in scholarly articles and data repositories: "This research is based on work supported by COMPASS-FME, a multi-institutional project supported by the U.S. Department of Energy, Office of Science, Biological and Environmental Research as part of the Environmental System Science Program."
