cfahlgren1/big-csv-2 dataset hosted on Hugging Face and contributed by the HF Datasets community
X matches of the most recent League of Legends games as of the date specified in the CSV file. Each game has 10 separate players, and each player has 68 recorded features.
Here X is the number of matches I was able to pull at a given time, up to a maximum of 10,000.
This data set is on an online competitive game called League of Legends. I chose this data set to challenge myself; its unique nature requires me to apply techniques I have learned in classes this semester while also teaching myself and applying new ones. The objective is to find the features that most impact a win, so that it is easier to balance the game. "Balancing" refers to updating the game, such as weakening strategies that are too strong and strengthening strategies that feel too weak. This document serves as a preliminary step toward champion balance by correlating specific stats with a win, so that in the future someone may correlate these stats to champions in the game. This goal may seem convoluted; however, each game has 5 winners and 5 losers, meaning that a single champion's impact on the game is roughly 10%. If a champion is unbalanced, being double the strength of another champion, that only raises the 10% to 20%. Due to the law of averages, each champion will still have around a 50% win rate despite being too weak or too strong. Therefore, to properly balance a champion, one must look at the correlation between the champion's stats and the win rate associated with each of those stats. This effectively removes the problem of bias, since every game has at least one winner and one loser. This dataset has 23,752 data points and 24 features (or columns). Features refer to measurable pieces of data, such as champion name, damage done, etc.; they do not refer to game features but rather to the names of the data being measured. It is a complicated dataset, with several variables requiring multiple stages of feature modification before the code can run. The code is also large enough to be significant. This dataset was specifically chosen due to my prior familiarity with the data, allowing me to focus on the machine learning techniques.
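As a rough illustration of the stated objective, the sketch below ranks features by their correlation with winning. It assumes pandas, a hypothetical file name league_matches.csv, a binary numeric "win" column, and numeric stat columns; the actual column names in the dataset may differ.

```python
import pandas as pd

# Minimal sketch (not the author's code): assumes a per-player CSV with a
# binary "win" column and numeric stat columns; column names are hypothetical.
df = pd.read_csv("league_matches.csv")

numeric = df.select_dtypes("number")
# Pearson correlation of every numeric feature with the win indicator,
# sorted so the stats most associated with winning appear first.
correlations = numeric.corr()["win"].drop("win").sort_values(ascending=False)
print(correlations.head(10))
```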
League of Legends, an online competitive multiplayer game, is part of the MOBA genre and is considered one of the most popular games of all time. Its most recent tournaments have made more money than the Super Bowl; in 2020 League of Legends made 1.7 billion dollars. The premise of the game is two teams fighting to destroy the enemy Nexus. Below is the map of the game, provided to make the variables easier to reference.
Figure 1: Map of League of Legends ([1])
The map of League of Legends contains 3 paths, each a lane with a corresponding name: Top Lane, Mid Lane (short for Middle), and Bot Lane (short for Bottom Lane). Each lane spawns "minions" to help push lanes. These minions are very easy monsters to defeat and provide gold to the player who lands the killing blow. Each lane has 3 towers and an inhibitor. All three of a lane's towers plus its inhibitor must be destroyed before a player can reach the Nexus. The towers protect players by damaging enemy champions when they are in range. The inhibitor provides no benefit to the allied team; however, if a player destroys the enemy inhibitor, "Super Minions" will spawn in that lane. These buffed minions help push to finish the game. Between these lanes are forests called the Jungle. In the jungle there are several monsters that are worth gold and grow stronger as the game goes on. Some monsters, if killed, even provide special bonuses. All these monsters can be killed by one player. The blue section seen in Figure 1 that splits the map into two sides is known as the river, the equivalent of the half-way line in soccer. In this part of the map, Large Monsters spawn that require a group effort to take down but give huge bonuses.
The game is known for its complexity; for a comprehensive guide, [2] and [3] do an excellent job. This paper will explain only the minimal necessities to follow the data. There are 5 players on each team and each player plays a champion, a character with unique gameplay, stats, and abilities. These 5 players each fill a specific role: the Top Laner goes to the Top Lane, the Mid Laner goes to the Mid Lane, the Jungler goes to the Jungle, and the Attack Damage Carry (ADC) and Support go to the Bot Lane. In each of their respective locations, each role attempts to earn gold and level up. The gold is used to buy items (each with unique effects, ...
This dataset contains all current and active business licenses issued by the Department of Business Affairs and Consumer Protection. This dataset contains a large number of records/rows of data and may not be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Notepad or Wordpad, to view and search.
Data fields requiring description are detailed below.
APPLICATION TYPE: 'ISSUE' is the record associated with the initial license application. 'RENEW' is a subsequent renewal record. All renewal records are created with a term start date and term expiration date. 'C_LOC' is a change of location record. It means the business moved. 'C_CAPA' is a change of capacity record. Only a few license types may file this type of application. 'C_EXPA' only applies to businesses that have liquor licenses. It means the business location expanded.
LICENSE STATUS: 'AAI' means the license was issued.
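A minimal pandas sketch of how the field codes above might be used to pull issued licenses from the CSV export; the file name and the "LEGAL NAME" column are assumptions, so verify them against the actual export header.

```python
import pandas as pd

# Minimal sketch, not an official example: the export is large, so read only
# the columns needed. Column names follow the field descriptions above.
cols = ["LEGAL NAME", "APPLICATION TYPE", "LICENSE STATUS"]
licenses = pd.read_csv("Business_Licenses.csv", usecols=cols)  # hypothetical file name

# Keep issued licenses ('AAI') that came from an initial application ('ISSUE').
issued_new = licenses[
    (licenses["LICENSE STATUS"] == "AAI")
    & (licenses["APPLICATION TYPE"] == "ISSUE")
]
print(len(issued_new))
```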
Business license owners may be accessed at: http://data.cityofchicago.org/Community-Economic-Development/Business-Owners/ezma-pppn To identify the owner of a business, you will need the account number or legal name.
Data Owner: Business Affairs and Consumer Protection
Time Period: Current
Frequency: Data is updated daily
https://creativecommons.org/publicdomain/zero/1.0/
This project is a tool that allows researchers to automate the screening of titles and abstracts for clinical review papers en bloc. Given a CSV file describing your dataset(s) and CSV files of the titles and abstracts of the papers you want to screen, the tool will automatically generate a CSV file of the papers that meet your criteria. The GPT API powers the tool.
Steps to reproduce
To use the tool, you will need to provide the following files:
To run the tool, run the following command:
python3 screening.py
Then, enter the path to your dataset information CSV. The tool will output the results in a CSV file in the results directory.
For an example of how to use the tool, please take a look at analysis.ipynb.
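For readers who just want the gist, here is an illustrative sketch (not the project's screening.py) of asking a GPT model to screen a single title/abstract pair via the OpenAI Python client; the model name and criteria text are placeholders.

```python
from openai import OpenAI

# Illustrative sketch only; the actual tool batches papers from CSV files.
client = OpenAI()  # reads OPENAI_API_KEY from the environment


def screen(title: str, abstract: str, criteria: str) -> str:
    """Ask the model whether one paper meets the inclusion criteria."""
    prompt = (
        f"Inclusion criteria:\n{criteria}\n\n"
        f"Title: {title}\nAbstract: {abstract}\n\n"
        "Answer INCLUDE or EXCLUDE."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()
```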
Artificial Intelligence, Medical Informatics, Natural Language Processing, Screening, Systematic Review
Eddie Guo
Institutions: University of Calgary, University of Toronto
Image Source: Large Language Models: Complete Guide in 2023
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
ATTENTION: There are two types of data format files here: CSV and XLSX. The CSV files are uploaded for easy browsing of the data on Hugging Face. For actual testing, please use the files in the XLSX folder.
Dataset Card for McBE
Dataset Details
Dataset Description
McBE is designed to address the scarcity of Chinese-centric bias evaluation resources for large language models (LLMs). It supports multi-faceted bias assessment across 5 evaluation tasks… See the full description on the dataset page: https://huggingface.co/datasets/Velikaya/McBE.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Natural Questions (NQ) dataset is a comprehensive collection of real user queries submitted to Google Search, with answers sourced from Wikipedia by expert annotators. Created by Google AI Research, this dataset aims to support the development and evaluation of advanced automated question-answering systems. The version provided here includes 89,312 meticulously annotated entries, tailored for ease of access and utility in natural language processing (NLP) and machine learning (ML) research.
The dataset is composed of authentic search queries from Google Search, reflecting the wide range of information sought by users globally. This approach ensures a realistic and diverse set of questions for NLP applications.
The NQ dataset underwent significant pre-processing to prepare it for NLP tasks:
- Removal of web-specific elements like URLs, hashtags, user mentions, and special characters using Python's "BeautifulSoup" and "regex" libraries.
- Grammatical error identification and correction using the "LanguageTool" library, an open-source grammar, style, and spell checker.
These steps were taken to clean and simplify the text while retaining the essence of the questions and their answers, divided into 'questions', 'long answers', and 'short answers'.
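A hedged sketch of the kind of cleaning described above. It assumes the language_tool_python wrapper for LanguageTool; it is not the original pre-processing script, and the exact regular expressions used there are unknown.

```python
import re

from bs4 import BeautifulSoup
import language_tool_python

# Rough sketch of the described pipeline, not the original script.
tool = language_tool_python.LanguageTool("en-US")


def clean(text: str) -> str:
    text = BeautifulSoup(text, "html.parser").get_text(" ")  # strip HTML markup
    text = re.sub(r"https?://\S+", " ", text)                # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)                     # remove mentions, hashtags
    text = re.sub(r"[^\w\s.,?'-]", " ", text)                # drop other special characters
    text = re.sub(r"\s+", " ", text).strip()
    return tool.correct(text)                                # grammar/spelling correction
```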
The unprocessed data, including answers with embedded HTML, empty or complex long and short answers, is stored in "Natural-Questions-Base.csv". This version retains the raw structure of the data, featuring HTML elements in answers, and varied answer formats such as tables and lists, providing a comprehensive view for those interested in the original dataset's complexity and richness. The processed data is compiled into a single CSV file named "Natural-Questions-Filtered.csv". The file is structured for easy access and analysis, with each record containing the processed question, a detailed answer, and concise answer snippets.
The filtered version is available where specific criteria, such as question length or answer complexity, were applied to refine the data further. This version allows for more focused research and application development.
The repository at https://github.com/fujoos/natural_questions also includes a Flask-based CSV reader application designed to read and display contents from the "NaturalQuestions.csv" file. The app provides functionalities such as:
- Viewing questions and answers directly in your browser.
- Filtering results based on criteria like question keywords or answer length.
See the live demo, which uses the CSV files converted to a SQLite database, at https://fujoos.pythonanywhere.com/.
UNI-CEN Standardized Census Data Tables contain Census data that have been reformatted into a common table format with standardized variable names and codes. The data are provided in two tabular formats for different use cases. "Long" tables are suitable for use in statistical environments, while "wide" tables are commonly used in GIS environments. The long tables are provided in Stata Binary (dta) format, which is readable by all statistics software. The wide tables are provided in comma-separated values (csv) and dBase 3 (dbf) formats with codebooks. The wide tables are easily joined to the UNI-CEN Digital Boundary Files. For the csv files, a .csvt file is provided to ensure that column data types are correctly interpreted when importing into QGIS; a schema.ini file does the same when importing into ArcGIS environments. As the DBF file format supports a maximum of 250 columns, tables with a larger number of variables are divided into multiple DBF files. For more information about file sources, the methods used to create them, and how to use them, consult the documentation at https://borealisdata.ca/dataverse/unicen_docs. For more information about the project, visit https://observatory.uwo.ca/unicen.
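As a hedged illustration of the .csvt convention (a one-line sidecar file of quoted column types read by OGR/QGIS), the snippet below writes a hypothetical .csvt; the column list is made up and does not reflect the actual UNI-CEN tables.

```python
# Hypothetical example only: a .csvt shares the base name of its CSV and lists
# one quoted type per column, e.g. a geography code, a count, and two rates.
with open("unicen_wide_example.csvt", "w") as f:
    f.write('"String","Integer","Real","Real"\n')
```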
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The BioTIME database contains raw data on species identities and abundances in ecological assemblages through time. The database consists of 11 tables; one raw data table plus ten related meta data tables. For further information please see our associated data paper.
This data consists of several elements:
Please note: any users of any of this material should cite the associated data paper in addition to the DOI listed here.
These documents explain the UK Border Agency's structure in detail, as well as the salaries of its senior civil servants.
Date: Fri Oct 15 10:25:10 BST 2010
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Augmented reality headsets in use today have a large area in which the real world can be seen but virtual content cannot be displayed. Users' perception of content in this area is not well understood. This work studies participants' perception of a virtual character in this area by grounding the question in relevant theories of perception and performing a study using both behavioral and self-report measures. We find that virtual characters within the augmented periphery receive lower social presence scores, but we do not find a difference in task performance. These findings inform application design and encourage future work in theories of AR perception and perception of virtual humans.
This data package is associated with the publication "On the Transferability of Residence Time Distributions in Two 10-km Long River Sections with Similar Hydromorphic Units" submitted to the Journal of Hydrology (Bao et al. 2024). Quantifying hydrologic exchange fluxes (HEFs) at the stream-groundwater interface, along with their residence time distributions (RTDs) in the subsurface, is crucial for managing water quality and ecosystem health in dynamic river corridors. However, directly simulating high-spatial-resolution HEFs and RTDs can be a time-consuming process, particularly for watershed-scale modeling. Efficient surrogate models that link RTDs to hydromorphic units (HUs) may serve as alternatives for simulating RTDs in large-scale models. One common concern with these surrogate models, however, is the transferability of the relationship between the RTDs and HUs from one river corridor to another. To address this, we evaluated the HEFs and the resulting RTD-HU relationships for two 10-kilometer-long river corridors along the Columbia River, using a one-way coupled three-dimensional transient surface-subsurface water transport modeling framework that we previously developed. Applying this framework to the two river corridors with similar HUs allows for quantitative comparisons of HEFs and RTDs using both statistical tests and machine learning classification models. This data package includes the model input files and the simulation results data. This data package contains 10 folders. The modeling simulation results data are in the folders 100H_pt_data and 300area_pt_data, for the Hanford 100H and 300 Area study domains, respectively. The remaining eight folders contain the scripts and data to generate the manuscript figures. The file-level metadata file (Bao_2024_Residence_Time_Distribution_flmd.csv) includes a list of all files contained in this data package and descriptions for each. The data dictionary file (Bao_2024_Residence_Time_Distribution_dd.csv) includes column header definitions and units of all tabular files.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The BioTIME database contains raw data on species identities and abundances in ecological assemblages through time. The database consists of 11 tables; one raw data table plus ten related meta data tables. For further information please see our associated data paper.
This data consists of several elements:
BioTIMESQL_02_04_2018.sql - an SQL file for the full public version of BioTIME, which can be imported into any MySQL database.
BioTIMEQuery_02_04_2018.csv - the raw data file; it is too large to view in Excel but can be read into several software applications such as R or various database packages (a chunked-read sketch follows this list).
BioTIMEMetadata_02_04_2018.csv - file containing the metadata for all studies.
BioTIMECitations_02_04_2018.csv - file containing the citation list for all studies.
BioTIMECitations_02_04_2018.xlsx - file containing the citation list for all studies (some special characters are not supported in the CSV format).
BioTIMEInteractions_02_04_2018.Rmd - an R Markdown file providing a brief overview of how to interact with the database and associated .csv files (this will not work until file paths and database connections have been added/updated).
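A minimal sketch, assuming pandas, of reading the large query file in chunks rather than all at once; only the row count is computed here because the column layout is not described above.

```python
import pandas as pd

# Read the raw-data CSV in manageable chunks (it is too large for Excel).
chunks = pd.read_csv("BioTIMEQuery_02_04_2018.csv", chunksize=500_000)
n_rows = sum(len(chunk) for chunk in chunks)  # count records without holding them all in memory
print(f"total records: {n_rows}")
```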
Please note: any users of any of this material should cite the associated data paper in addition to the DOI listed here.
To cite the data paper use the following:
Dornelas M, Antão LH, Moyes F, Bates, AE, Magurran, AE, et al. BioTIME: A database of biodiversity time series for the Anthropocene. Global Ecol Biogeogr. 2018; 27:760 - 786. https://doi.org/10.1111/geb.12729
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
=====================================================================
NII Face Mask Dataset v1.0
=====================================================================
Authors: Trung-Nghia Le (1), Khanh-Duy Nguyen (2), Huy H. Nguyen (1), Junichi Yamagishi (1), Isao Echizen (1)
Affiliations: (1)National Institute of Informatics, Japan (2)University of Information Technology-VNUHCM, Vietnam
National Institute of Informatics Copyright (c) 2021
Emails: {ltnghia, nhhuy, jyamagis, iechizen}@nii.ac.jp, {khanhd}@uit.edu.vn
arXiv: https://arxiv.org/abs/2111.12888
NII Face Mask Dataset v1.0: https://zenodo.org/record/5761725
=============================== INTRODUCTION ===============================
The NII Face Mask Dataset is the first large-scale dataset targeting mask-wearing ratio estimation in street cameras. This dataset contains 581,108 face annotations extracted from 18,088 video frames (1920x1080 pixels) in 17 street-view videos obtained from the Rambalac's YouTube channel.
The videos were taken in multiple places, at various times, before and during the COVID-19 pandemic. The total length of the videos is approximately 56 hours.
=============================== REFERENCES ===============================
If you publish using any of the data in this dataset, please cite the following papers:
@article{Nguyen202112888,
  title={Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio},
  author={Nguyen, Khanh-Duy and Nguyen, Huy H and Le, Trung-Nghia and Yamagishi, Junichi and Echizen, Isao},
  archivePrefix={arXiv},
  arxivId={2111.12888},
  url={https://arxiv.org/abs/2111.12888},
  year={2021}
}
@INPROCEEDINGS{Nguyen2021EstMaskWearing,
  author={Nguyen, Khanh-Duy and Nguyen, Huy H. and Le, Trung-Nghia and Yamagishi, Junichi and Echizen, Isao},
  booktitle={2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021)},
  title={Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio},
  year={2021},
  pages={1-8},
  url={https://ieeexplore.ieee.org/document/9667046},
  doi={10.1109/FG52635.2021.9667046}
}
======================== DATA STRUCTURE ==================================
./NFM
├── dataset
│   ├── train.csv: annotations for the train set.
│   ├── test.csv: annotations for the test set.
└── README_v1.0.md
We use the same structure for the two CSV files (train.csv and test.csv). Both files have the same columns:
<1st column>: video_id (the source video can be found at https://www.youtube.com/watch?v= followed by the video_id)
<2nd column>: frame_id (the index of the frame extracted from the source video)
<3rd column>: timestamp in milliseconds (the timestamp of the frame within the source video)
<4th column>: label (for each annotated face, one of three labels was attached to the bounding box: 'Mask'/'No-Mask'/'Unknown')
<5th column>: left
<6th column>: top
<7th column>: right
<8th column>: bottom
The four coordinates (left, top, right, bottom) denote a face's bounding box.
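A small sketch of reading the annotations under the assumption that the CSV files have no header row (column names are assigned from the order documented above), then computing a per-frame mask-wearing ratio.

```python
import pandas as pd

# Column names follow the documented order; whether the files ship with a
# header row is an assumption, so names are supplied explicitly here.
cols = ["video_id", "frame_id", "timestamp", "label", "left", "top", "right", "bottom"]
train = pd.read_csv("dataset/train.csv", names=cols, header=None)

# Mask-wearing ratio per frame, ignoring faces labelled 'Unknown'.
known = train[train["label"] != "Unknown"]
ratio = (
    known.groupby(["video_id", "frame_id"])["label"]
    .apply(lambda s: (s == "Mask").mean())
)
print(ratio.head())
```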
============================== COPYING ================================
This repository is made available under Creative Commons Attribution License (CC-BY).
Regarding Creative Commons License: Attribution 4.0 International (CC BY 4.0), please see https://creativecommons.org/licenses/by/4.0/
THIS DATABASE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DATABASE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE
====================== ACKNOWLEDGEMENTS ================================
This research was partly supported by JSPS KAKENHI Grants (JP16H06302, JP18H04120, JP21H04907, JP20K23355, JP21K18023), and JST CREST Grants (JPMJCR20D3, JPMJCR18A6), Japan.
This dataset is based on the Rambalac's YouTube channel: https://www.youtube.com/c/Rambalac
The main dataset is a 70 MB file of trajectory data (I294_L1_final.csv) that contains position, speed, and acceleration data for small and large automated (L1) vehicles and non-automated vehicles on a highway in a suburban environment. Supporting files include aerial reference images for ten distinct data collection “Runs” (I294_L1_RunX_with_lanes.png, where X equals 8, 18, and 20 for southbound runs and 1, 3, 7, 9, 11, 19, and 21 for northbound runs). Associated centerline files are also provided for each “Run” (I-294-L1-Run_X-geometry-with-ramps.csv). In each centerline file, x and y coordinates (in meters) marking each lane centerline are provided. The origin point of the reference image is located at the top left corner. Additionally, in each centerline file, an indicator variable is used for each lane to define the following types of road sections: 0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments. The number attached to each column header is the numerical ID assigned for the specific lane (see “TGSIM – Centerline Data Dictionary – I294 L1.csv” for more details). The dataset defines eight lanes (four lanes in each direction) using these centerline files. Images that map the lanes of interest to the numerical lane IDs referenced in the trajectory dataset are stored in the folder titled “Annotation on Regions.zip”. The southbound lanes are shown visually in I294_L1_Lane-2.png through I294_L1_Lane-5.png and the northbound lanes are shown visually in I294_L1_Lane2.png through I294_L1_Lane5.png.

This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report found here: https://rosap.ntl.bts.gov/view/dot/74647. This dataset, which is one of the six collected as part of the TGSIM project, contains data collected using one high-resolution 8K camera mounted on a helicopter that followed three SAE Level 1 ADAS-equipped vehicles with adaptive cruise control (ACC) enabled. The three vehicles manually entered the highway, moved to the second-from-leftmost lane, then enabled ACC with minimum following distance settings to initiate a string. The helicopter then followed the string of vehicles (which sometimes broke from the string due to large following distances) northbound through the 4.8 km section of highway at an altitude of 300 meters. The goal of the data collection effort was to collect data related to human drivers' responses to vehicle strings. The road segment has four lanes in each direction and covers a major on-ramp and an off-ramp in the southbound direction and one on-ramp in the northbound direction. The segment of highway is operated by Illinois Tollway and contains a high percentage of heavy vehicles. The camera captured footage during the evening rush hour (3:00 PM-5:00 PM CT) on a sunny day.

As part of this dataset, the following files were provided: I294_L1_final.csv contains the numerical data to be used for analysis, including vehicle-level trajectory data at every 0.1 second. Vehicle size (small or large), width, length, and whether the vehicle was one of the test vehicles with ACC engaged ("yes" or "no") are provided with instantaneous location, speed, and acceleration data.
All distance measurements (width, length, location) were converted from pixels to meters using the conversion factor 1 pixel = 0.3 meters. I294_L1_RunX_with_lanes.png are the aerial reference images that define the geographic region and associated roadway segments of interest (see bounding boxes on northbound and southbound lanes) for each run X. The I-294-L1-Run_X-geometry-with-ramps.csv files contain the coordinates that define the lane centerlines.
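An illustrative pandas sketch of filtering the trajectory file to the ACC-engaged test vehicles; the column names "acc_engaged" and "speed" are stand-ins, since the exact headers of I294_L1_final.csv are not listed above.

```python
import pandas as pd

# Sketch only: replace "acc_engaged" and "speed" with the actual column names
# documented in the TGSIM data dictionary.
traj = pd.read_csv("I294_L1_final.csv")

acc = traj[traj["acc_engaged"] == "yes"]  # the three ACC-equipped test vehicles
print(acc["speed"].describe())            # summary of their instantaneous speeds
```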
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The THÖR-MAGNI Dataset Tutorials
THÖR-MAGNI is a novel dataset of accurate human and robot navigation and interaction in diverse indoor contexts, building on the previous THÖR dataset protocol. We provide position and head orientation motion capture data, 3D LiDAR scans and gaze tracking. In total, THÖR-MAGNI captures 3.5 hours of motion from 40 participants over 5 recording days.
This data collection is designed around systematic variation of factors in the environment to allow building cue-conditioned models of human motion and verifying hypotheses on factor impact. To that end, THÖR-MAGNI encompasses 5 scenarios, some of which have different conditions (i.e., we vary some factor):
Scenario 1 (plus conditions A and B):
Participants move in groups and individually;
Robot as static obstacle;
Environment with 3 obstacles and lane marking on the floor for condition B;
Scenario 2:
Participants move in groups, individually and transport objects with variable difficulty (i.e. bucket, boxes and a poster stand);
Robot as static obstacle;
Environment with 3 obstacles;
Scenario 3 (plus conditions A and B):
Participants move in groups, individually, and transport objects of variable difficulty (i.e., a bucket, boxes and a poster stand). We denote the roles as: Visitors-Alone, Visitors-Group 2, Visitors-Group 3, Carrier-Bucket, Carrier-Box, Carrier-Large Object;
Teleoperated robot as moving agent: in condition A, the robot moves with differential drive; in condition B, the robot moves with omni-directional drive;
Environment with 2 obstacles;
Scenario 4 (plus conditions A and B):
All participants, denoted as Visitors-Alone HRI, interacted with the teleoperated mobile robot;
Robot interacted in two ways: in condition A (Verbal-Only), the Anthropomorphic Robot Mock Driver (ARMoD), a small humanoid NAO robot on top of the mobile platform, only used speech to communicate the next goal point to the participant; in condition B the ARMoD used speech, gestures and robotic gaze to convey the same message;
Free space environment
Scenario 5:
Participants move alone (Visitors-Alone) and one of the participants, denoted as Visitors-Alone HRI, transports objects and interacts with the robot;
The ARMoD is remotely controlled by an experimenter and proactively offers help;
Free space environment;
Preliminary steps
Before proceeding, make sure to download the data from ZENODO
├── CLiFF_Maps <- Directory for CLiFF Maps for all files
├── Files <- Directory for the csv files
├── Readme.md
├── CSVs_Scenarios <- Directory for aligned data for all scenarios
├── Scenario_1 <- Directory for the csv files for Scenario 1
├── Scenario_2 <- Directory for the csv files for Scenario 2
├── Scenario_3 <- Directory for the csv files for Scenario 3
├── Scenario_4 <- Directory for the csv files for Scenario 4
├── Scenario_5 <- Directory for the csv files for Scenario 5
├── docs
├── tutorials.md <- Tutorials document on how to use the data
├── Lidar_sample
├── Files <- Directory for sample files
├── 170522_SC3B_1 <- Directory for the pcd files
├── 170522_SC3B_1.csv <- Synchronization file with QTM
├── manual_view_point.json <- json file with manual view point for visualization
├── requirements.txt <- script pip requirements
├── visualize_pcd.py <- script visualize the lidar data
├── Readme.md
├── maps <- Directory for maps of the environment (PNG files) and offsets (json file)
├── offsets.json <- Offsets of the map with respect to the global coordinate frame origin
├── {date}_SC{sc_id}_map.png <- Maps for `date` in {1205, 1305, 1705, 1805} and `sc_id` in {1A, 1B, 2, 3}
├── 3009_map.png <- Map for the Scenarios 4A, 4B and 5
├── MP4_Videos
├── Files <- Directory for the mp4 files
├── pupil_scene_camera_instrinsics.json <- json file with the intrinsics of pupil camera
├── TSVs_RAWET <- Directory for the TSV files for the Raw Eyetracking data for all Scenarios
├── synch_info.csv <- Event markers necessary to align motion capture with eyetracking data
├── Files <- Directory with all the raw eyetracking TSV files
├── goals_positions.csv <- File with the goals locations
Within each Scenario directory, each csv file contains:
2.1. Headers
The dataset metadata overview contains important information found in the CSV file headers. This reference is designed to help users understand and use the dataset effectively. The headers include details such as FILE_ID, which provides information on the date, scenario, condition, and run associated with each recording. The header of the document includes important quantities such as the number of frames recorded (N_FRAMES_QTM), the count of rigid bodies (N_BODIES), and the total number of markers (N_MARKERS).
It also provides information about the order of the contiguous rotation matrix (CONTIGUOUS_ROTATION_MATRIX), modalities measured with units, and specified measurement units. The text presents details on the eyetracking devices used in each recording, including their infrared sensor and scene camera frequencies, as well as an indication of the presence of eyetracking data.
The header provides specific information about rigid bodies, including their names (BODY_NAMES), role labels (BODY_ROLES), and the number of markers associated with each rigid body (BODY_NR_MARKERS). Finally, the table lists all marker names used in the file.
This metadata provides researchers and practitioners with essential guidance on recording information, data quantities, and specifics about rigid bodies and markers. It is a valuable resource for understanding and effectively using the dataset in the CSV files.
2.2. Trajectory Data
The remaining portion of the CSV file integrates merged data from the motion capture system and eye tracking devices, organized based on participants' helmet rigid bodies. Columns within the dataset include XYZ coordinates of all markers, spatial centroid coordinates, 6DOF orientation of the object's local coordinate frame, and if available eye tracking data, encompassing 2D/3D gaze coordinates, scene recording frame numbers, eye movement types, and IMU data.
Missing data is denoted by "N/A" or an empty cell. Temporal indexing is facilitated by the "Time" or "Frame" column, indicating timestamps or frame numbers. The motion capture system records at 100Hz, Tobii Glasses at 50Hz (Raw); 25 Hz (Camera), and Pupil Glasses at 100Hz (Raw); 30 Hz (Camera). The dataset is structured around motion capture recordings, and for each rigid body, such as "Helmet_1," details per frame include XYZ coordinates of markers, centroid coordinates, and a 9-element rotational matrix describing helmet orientation.
| Header | Explanation |
| --- | --- |
| Helmet_1 - 1 X | X-Coordinate of Marker Number 1 |
| Helmet_1 - 1 Y | Y-Coordinate of Marker Number 1 |
| Helmet_1 - 1 Z | Z-Coordinate of Marker Number 1 |
| Helmet_1 - [...] | Same for Markers 2 and 3 of Helmet_1 |
| Helmet_1 Centroid_X | X-Coordinate of the Centroid |
| Helmet_1 Centroid_Y | Y-Coordinate of the Centroid |
| Helmet_1 Centroid_Z | Z-Coordinate of the Centroid |
| Helmet_1 R0 | 1st Element of the CONTIGUOUS_ROTATION_MATRIX |
| Helmet_1 R[..] | Same for R1-R7 |
| Helmet_1 R8 | 9th Element of the CONTIGUOUS_ROTATION_MATRIX |
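A sketch, under assumptions, of plotting one helmet's centroid trajectory from a scenario CSV: the file path, the number of metadata header rows to skip, and the presence of Helmet_1 in that particular run are all assumptions and need to be adapted to the file at hand.

```python
import pandas as pd
import matplotlib.pyplot as plt

# The metadata header described in 2.1 may need to be skipped; the offset
# varies per file, so skiprows=10 is only a placeholder.
df = pd.read_csv("CSVs_Scenarios/Scenario_1/example_run.csv", skiprows=10)

# Column names follow the "Helmet_1 Centroid_X/Y/Z" pattern documented above.
plt.plot(df["Helmet_1 Centroid_X"], df["Helmet_1 Centroid_Y"])
plt.axis("equal")
plt.title("Helmet_1 centroid trajectory")
plt.show()
```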
2.3. Eyetracking Data
The eye tracking data in the dataset includes 16 participants, providing a comprehensive dataset of over 500 minutes of recorded data across the different activities and scenarios with three different eyetracking devices. Devices are denoted with a special "Tracker_ID" in the dataset, i.e.:
| Tracker ID | Eyetracking Device |
| --- | --- |
| TB2 | Tobii 2 Glasses |
| TB3 | Tobii 3 Glasses |
| PPL | Pupil Invisible Glasses |
Gaze points are classified into fixations and saccades using the Tobii I-VT Attention filter, which is specifically optimized for dynamic scenarios, with a velocity threshold of 100°/s. Eyetracking device calibrations were systematically repeated after each 4-minute recording to account for natural variations in participants' eye shapes and to improve the gaze estimation. In addition, gaze estimation adjustments for the Pupil Invisible glasses were made after each 4-minute recording to mitigate potential drifts. It is worth noting that the scene cameras of the eye tracking glasses had different fields of view. The scene camera of the Pupil Invisible Glasses provided a 1088x1080 image with both horizontal (HFOV) and vertical (VFOV) opening angles of 80°, while the Tobii Glasses provided a 1920x1080 image with different opening angles for Tobii Glasses 3 (HFOV: 95°, VFOV: 63°) and Tobii Glasses 2 (HFOV: 82°, VFOV: 52°).
NOTE AS OF 2024: Videos are NOW part of the dataset
For one participant, wearing the Tobii Glasses 3 and Helmet_6, the data would be denoted as:
| Header | Explanation |
| --- | --- |
| Helmet_6 - [...] | X,Y,Z Coordinates for 5 markers |
| Helmet_6 [...] | X,Y,Z Coordinates for 1 Centroid* |
| Helmet_6 R[...] | 9 Elements of the CONTIGUOUS_ROTATION_MATRIX |
| Helmet_6 TB3_Accelerometer_[...] | Accelerometer data along the X,Y,Z Axis |
| Helmet_6 TB3_Gyroscope_[...] | Gyroscope data along the X,Y,Z Axis |
| Helmet_6 TB3_Magnetometer_[...] | Magnetometer data along the X,Y,Z Axis |
| Helmet_6 TB3_G2D_[...] | 2D Eye tracking data (X,Y) |
| Helmet_6 TB3_G3D_[...] | 3D Cyclopic Eye gaze Vector (X,Y,Z) |
| Helmet_6 TB3_Movement | Eye movement type (N/A, Fixation or Saccade) |
| Helmet_6 TB3_SceneFNr | Frame number of the scene camera recording |
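Similarly, a hedged sketch of summarizing the eye-movement labels for this participant; the file path and header offset are again assumptions, and the run must actually contain the Helmet_6 columns documented above.

```python
import pandas as pd

# Sketch only: load a run that contains the Helmet_6 / TB3 columns.
df = pd.read_csv("CSVs_Scenarios/Scenario_4/example_run.csv", skiprows=10)

movement = df["Helmet_6 TB3_Movement"].dropna()
print(movement.value_counts(normalize=True))  # share of Fixation vs. Saccade samples
```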
How to use and tools
magni-dash
This is a dashboard to quickly visualize our data: trajectories, speeds,
Animals living in social groups will almost inevitably experience competition for limited resources. One consequence of competition can be agonism, an activity that is not only costly to participate in at the individual level but potentially also at the group level due to the detrimental effects that agonism can have on group stability and cohesion. Agonism rates across primate species have previously been associated with group size and terrestriality; therefore primates, particularly those in large groups, should develop strategies to mitigate or counter-act agonism. Here, we use phylogenetically controlled analyses to evaluate whether the known relationship between brain size and group size may partially reflect an association between agonism and brain size in large groups. We find strong positive associations between group level agonism and 2 measures of brain size (endocranial volume and neocortex ratio) in 45 separate populations across 23 different primate species. In contrast, dyadi...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set provides data of water quality, discharge, driving forces (meteorological and nitrogen surplus), and catchment attributes for a set of 1386 German catchments covering a wide range of natural and anthropogenic conditions. A corresponding paper "Water quality, discharge and catchment attributes for large-sample studies in Germany - QUADICA" by Ebeling et al. describing the data set in detail will be made available in the Journal Earth System Science Data (https://www.earth-system-science-data.net/).
This repository includes:
1.) Water quality data as annual medians of observed concentration data of N, P and C species (c_annual.csv)
2.) Water quantity data as annual medians of observed discharge (q_annual.csv)
3.) Monthly medians over the whole time series of water quality variables and discharge (c_months.csv)
4.) Monthly and annual median concentrations, flow-normalized concentrations, and mean fluxes estimated using Weighted Regressions on Time, Discharge, and Season (WRTDS) for stations with enough data availability (for details see the corresponding paper Ebeling et al.; wrtds_monthly.csv, wrtds_annual.csv)
5.) Meteorological data as monthly median average temperatures and sums of precipitation and potential evapotranspiration (tavg_monthly.csv, pre_monthly.csv, pet_monthly.csv)
6.) N surplus time series on annual basis (n_surplus.csv)
7.) Summary statistics for the stations including number of samples, covered time periods, degree of censoring (concentrations below the detection limit), availability of discharge data, and availability and performance of WRTDS models (metadata.csv)
8.) Description of data tables (Metadata_QUADICA.pdf)
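As an illustration of how the annual tables listed above might be combined, the sketch below joins annual concentrations with annual discharge; the join keys "station_id" and "year" are assumptions, and the real headers are documented in Metadata_QUADICA.pdf.

```python
import pandas as pd

# Illustrative sketch: column names are assumptions, check Metadata_QUADICA.pdf.
conc = pd.read_csv("c_annual.csv")
q = pd.read_csv("q_annual.csv")

# Join annual median concentrations with annual median discharge per station/year.
annual = conc.merge(q, on=["station_id", "year"], how="inner")
print(annual.head())
```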
Data on catchment attributes and geodata also part of the QUADICA data set are available at "CCDB - catchment characteristics data base Germany" (https://doi.org/10.4211/hs.82f8094dd61e449a826afdef820a2c19). The metadata of the water quality and quantity data base is available at "WQQDB - water quality and quantity data base Germany" (https://doi.org/10.4211/hs.a42addcbd59a466a9aa56472dfef8721).
Conditions: The data set is freely and easily accessible. Please refer to the corresponding paper Ebeling et al. when using or referring to this data set.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Clash Royale is a wildly popular game that revolves around fast 1v1 battles. The game is unique in that, along with skill, a player's "deck" (exactly 8 deployable troops the player picks) has also been widely thought to affect one's chances in the game. It is intriguing to be able to see how the endless world of deck configurations match up.
The dataset has two files:
- data_ord.csv: This is the juicy part: a collection of three-quarters of a million 1v1 battles recorded from the game.
- Each row is 1 battle
- The first 8 columns describe Player 1's deck, representing each card with a number
- The next 8 columns describe Player 2's deck, representing each card with a number
- The next 2 columns describe the respective trophy counts of Players 1 and 2
- The last column is the outcome of the battle; 1 if Player 1 wins, and 0 if Player 2 wins
- IMPORTANT: Player decks HAVE NO ORDER (In data_ord.csv, they are sorted by ascending card ID, but they can be in any arrangement in real life), and card ID numbers are all categorical
- cardlist.csv: You will notice that rather than the names of a card, they are instead described as integers in data_ord.csv. This file translates each card ID from data_ord.csv into its corresponding card name.
Data obtained from the Clash Royale API
I collected this data in hopes of seeing whether the aforementioned "deck configuration" and "trophy counts" could, for the most part, fully predict the outcome of a game.
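A hedged loading sketch: it assumes data_ord.csv has no header row (names are assigned from the column order described above) and that cardlist.csv maps a card ID column to a card name column; check the actual files before relying on these names.

```python
import pandas as pd

# Column names assigned from the documented order: 8 Player 1 cards,
# 8 Player 2 cards, two trophy counts, then the outcome (1 = Player 1 wins).
cols = (
    [f"p1_card_{i}" for i in range(1, 9)]
    + [f"p2_card_{i}" for i in range(1, 9)]
    + ["p1_trophies", "p2_trophies", "p1_wins"]
)
battles = pd.read_csv("data_ord.csv", names=cols, header=None)
cards = pd.read_csv("cardlist.csv")  # assumed to map card IDs to card names

print(battles["p1_wins"].mean())                          # overall Player 1 win rate
print(battles[["p1_trophies", "p2_trophies"]].describe()) # trophy distributions
```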
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
VIVID-10M
[project page] | [Paper] | [arXiv]
VIVID-10M is the first large-scale hybrid image-video local editing dataset, aimed at reducing data construction and model training costs, comprising 9.7M samples that encompass a wide range of video editing tasks.
Data Index
The data index is split across four .csv files:
vivid-image-change.csv
vivid-image-remove.csv
vivid-video-change.csv
vivid-video-remove.csv
The VIVID-Video splits contain the columns: local_caption, #… See the full description on the dataset page: https://huggingface.co/datasets/KlingTeam/VIVID-10M.
The EXploration of Coastal Hydrobiogeochemistry Across a Network of Gradients and Experiments (EXCHANGE) program is a consortium of scientists working together to improve our understanding of how the two-way exchange of water between estuaries or large lake lacustuaries and the terrestrial landscape influences the state and function of ecosystems across the coastal interface. EXCHANGE Campaign 1 (EC1) focuses on the spatial variation in biogeochemical structure and function at the coastal terrestrial-aquatic interface (TAI). In the Fall of 2021, the EXCHANGE Consortium gathered samples from 52 TAIs. Samples collected from EC1 were analyzed for bulk geochemical parameters, bulk physicochemical parameters, organic matter characteristics, and redox-sensitive elements. Please download ec1_README.pdf for a complete list of available data in each .zip folder, package version history, and detailed information about the project. This README will serve as the central place for EC1 Data Package updates.

EC1 Data Package Structure:
ec1_README.pdf
ec1_methods.pdf
ec1_metadata_v1.zip
...ec1_dd.csv
...ec1_flmd.csv
...ec1_sample_catalog.csv
...ec1_metadata_kitlevel.csv
...ec1_metadata_collectionlevel.csv
...ec1_data_collectionlevel.csv
...ec1_igsn_metadata.csv
ec1_soil_v1.zip
ec1_sediment_v1.zip
ec1_water_v1.zip

This data package was originally published May 2023 (v1). Subsequent updates will be published here with new version numbers. Please see the Change History section in ec1_README.pdf for detailed changes.
---
Acknowledging EXCHANGE: General Support and Data Product Use
We ask that users of EXCHANGE data add the following acknowledgement when publishing data in scholarly articles and data repositories: "This research is based on work supported by COMPASS-FME, a multi-institutional project supported by the U.S. Department of Energy, Office of Science, Biological and Environmental Research as part of the Environmental System Science Program."