cfahlgren1/big-csv-2 dataset hosted on Hugging Face and contributed by the HF Datasets community
X matches of the most recent League of Legends games as of the date specified in the CSV file. Each game has 10 separate players, and each player has 68 recorded features.
Here X is the number of matches I was able to pull at a given time, up to a maximum of 10,000.
This data set is on an online competitive game called League of Legends. I chose this data set to challenge myself; its unique nature requires me to apply techniques I have learned in classes this semester while also teaching myself and applying new ones. The objective is to find the features that most impact a win, so that it is easier to balance the game. "Balancing" refers to updating the game, such as weakening strategies that are too strong and strengthening strategies that feel too weak. This document serves as a preliminary step toward champion balance by correlating specific stats with a win, so that in the future someone may correlate these stats to champions in the game. This goal may seem convoluted; however, each game has 5 winners and 5 losers, meaning that a single champion's impact on the game is roughly 10%. If a champion is unbalanced, being double the strength of another champion, that only raises the 10% to 20%. Due to the law of averages, each champion will still have around a 50% win rate despite being too weak or too strong. Therefore, to properly balance a champion, one must look at the correlation between the champion's stats and the win rate associated with each of those stats. This effectively removes the problem of bias, since every game has at least one winner and one loser. This dataset has 23,752 data points and 24 features (or columns). Features refer to measurable pieces of data, such as champion name, damage done, etc.; they do not refer to game features but rather to the names of the data being measured. It is a complicated dataset, with several variables requiring multiple stages of feature modification before the code can run. The code is also large enough to be significant. This dataset was specifically chosen due to my prior familiarity with the data, allowing me to focus on the machine learning techniques.
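As a rough illustration of the stated objective, the sketch below ranks features by their correlation with winning. It assumes pandas, a hypothetical file name league_matches.csv, a binary numeric "win" column, and numeric stat columns; the actual column names in the dataset may differ.

```python
import pandas as pd

# Minimal sketch (not the author's code): assumes a per-player CSV with a
# binary "win" column and numeric stat columns; column names are hypothetical.
df = pd.read_csv("league_matches.csv")

numeric = df.select_dtypes("number")
# Pearson correlation of every numeric feature with the win indicator,
# sorted so the stats most associated with winning appear first.
correlations = numeric.corr()["win"].drop("win").sort_values(ascending=False)
print(correlations.head(10))
```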
League of Legends, an online competitive multiplayer game, is part of the MOBA genre and is considered one of the most popular games of all time. Its most recent tournaments have made more money than the Super Bowl; in 2020 League of Legends made 1.7 billion dollars. The premise of the game is two teams fighting to destroy the enemy Nexus. Below is the map of the game, provided to make the variables easier to reference.
Figure 1: Map of League of Legends ([1])
The map of League of Legends contains 3 paths, each a lane with a corresponding name: Top Lane, Mid Lane (short for Middle), and Bot Lane (short for Bottom Lane). Each lane spawns "minions" to help push lanes. These minions are very easy monsters to defeat and provide gold to the player who lands the killing blow. Each lane has 3 towers and an inhibitor. All three of a lane's towers plus its inhibitor must be destroyed before a player can reach the Nexus. The towers protect players by damaging enemy champions when they are in range. The inhibitor provides no benefit to the allied team; however, if a player destroys the enemy inhibitor, "Super Minions" will spawn in that lane. These buffed minions help push to finish the game. Between these lanes are forests called the Jungle. In the jungle there are several monsters that are worth gold and grow stronger as the game goes on. Some monsters, if killed, even provide special bonuses. All these monsters can be killed by one player. The blue section seen in Figure 1 that splits the map into two sides is known as the river, the equivalent of the half-way line in soccer. In this part of the map, Large Monsters spawn that require a group effort to take down but give huge bonuses.
The game is known for its complexity; for a comprehensive guide, [2] and [3] do an excellent job. This paper will explain only the minimal necessities to follow the data. There are 5 players on each team and each player plays a champion, a character with unique gameplay, stats, and abilities. These 5 players each fill a specific role: the Top Laner goes to the Top Lane, the Mid Laner goes to the Mid Lane, the Jungler goes to the Jungle, and the Attack Damage Carry (ADC) and Support go to the Bot Lane. In each of their respective locations, each role attempts to earn gold and level up. The gold is used to buy items (each with unique effects, ...
This dataset contains all current and active business licenses issued by the Department of Business Affairs and Consumer Protection. This dataset contains a large number of records/rows of data and may not be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Notepad or Wordpad, to view and search.
Data fields requiring description are detailed below.
APPLICATION TYPE: 'ISSUE' is the record associated with the initial license application. 'RENEW' is a subsequent renewal record. All renewal records are created with a term start date and term expiration date. 'C_LOC' is a change of location record. It means the business moved. 'C_CAPA' is a change of capacity record. Only a few license types may file this type of application. 'C_EXPA' only applies to businesses that have liquor licenses. It means the business location expanded.
LICENSE STATUS: 'AAI' means the license was issued.
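A minimal pandas sketch of how the field codes above might be used to pull issued licenses from the CSV export; the file name and the "LEGAL NAME" column are assumptions, so verify them against the actual export header.

```python
import pandas as pd

# Minimal sketch, not an official example: the export is large, so read only
# the columns needed. Column names follow the field descriptions above.
cols = ["LEGAL NAME", "APPLICATION TYPE", "LICENSE STATUS"]
licenses = pd.read_csv("Business_Licenses.csv", usecols=cols)  # hypothetical file name

# Keep issued licenses ('AAI') that came from an initial application ('ISSUE').
issued_new = licenses[
    (licenses["LICENSE STATUS"] == "AAI")
    & (licenses["APPLICATION TYPE"] == "ISSUE")
]
print(len(issued_new))
```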
Business license owners may be accessed at: http://data.cityofchicago.org/Community-Economic-Development/Business-Owners/ezma-pppn To identify the owner of a business, you will need the account number or legal name.
Data Owner: Business Affairs and Consumer Protection
Time Period: Current
Frequency: Data is updated daily
https://creativecommons.org/publicdomain/zero/1.0/
This project is a tool that allows researchers to automate the screening of titles and abstracts for clinical review papers en bloc. Given a CSV file describing your dataset(s) and CSV files of the titles and abstracts of the papers you want to screen, the tool will automatically generate a CSV file of the papers that meet your criteria. The GPT API powers the tool.
Steps to reproduce
To use the tool, you will need to provide the following files:
To run the tool, run the following command:
python3 screening.py
Then, enter the path to your dataset information CSV. The tool will output the results in a CSV file in the results directory.
For an example of how to use the tool, please take a look at analysis.ipynb.
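For readers who just want the gist, here is an illustrative sketch (not the project's screening.py) of asking a GPT model to screen a single title/abstract pair via the OpenAI Python client; the model name and criteria text are placeholders.

```python
from openai import OpenAI

# Illustrative sketch only; the actual tool batches papers from CSV files.
client = OpenAI()  # reads OPENAI_API_KEY from the environment


def screen(title: str, abstract: str, criteria: str) -> str:
    """Ask the model whether one paper meets the inclusion criteria."""
    prompt = (
        f"Inclusion criteria:\n{criteria}\n\n"
        f"Title: {title}\nAbstract: {abstract}\n\n"
        "Answer INCLUDE or EXCLUDE."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()
```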
Artificial Intelligence, Medical Informatics, Natural Language Processing, Screening, Systematic Review
Eddie Guo
Institutions: University of Calgary, University of Toronto
Image Source: Large Language Models: Complete Guide in 2023
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
ATTENTION: There are two types of data format files here: CSV and XLSX. The CSV files are uploaded for easy browsing of the data on Hugging Face. For actual testing, please use the files in the XLSX folder.
Dataset Card for McBE
Dataset Details
Dataset Description
McBE is designed to address the scarcity of Chinese-centric bias evaluation resources for large language models (LLMs). It supports multi-faceted bias assessment across 5 evaluation tasks… See the full description on the dataset page: https://huggingface.co/datasets/Velikaya/McBE.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Natural Questions (NQ) dataset is a comprehensive collection of real user queries submitted to Google Search, with answers sourced from Wikipedia by expert annotators. Created by Google AI Research, this dataset aims to support the development and evaluation of advanced automated question-answering systems. The version provided here includes 89,312 meticulously annotated entries, tailored for ease of access and utility in natural language processing (NLP) and machine learning (ML) research.
The dataset is composed of authentic search queries from Google Search, reflecting the wide range of information sought by users globally. This approach ensures a realistic and diverse set of questions for NLP applications.
The NQ dataset underwent significant pre-processing to prepare it for NLP tasks:
- Removal of web-specific elements like URLs, hashtags, user mentions, and special characters using Python's "BeautifulSoup" and "regex" libraries.
- Grammatical error identification and correction using the "LanguageTool" library, an open-source grammar, style, and spell checker.
These steps were taken to clean and simplify the text while retaining the essence of the questions and their answers, divided into 'questions', 'long answers', and 'short answers'.
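A hedged sketch of the kind of cleaning described above. It assumes the language_tool_python wrapper for LanguageTool; it is not the original pre-processing script, and the exact regular expressions used there are unknown.

```python
import re

from bs4 import BeautifulSoup
import language_tool_python

# Rough sketch of the described pipeline, not the original script.
tool = language_tool_python.LanguageTool("en-US")


def clean(text: str) -> str:
    text = BeautifulSoup(text, "html.parser").get_text(" ")  # strip HTML markup
    text = re.sub(r"https?://\S+", " ", text)                # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)                     # remove mentions, hashtags
    text = re.sub(r"[^\w\s.,?'-]", " ", text)                # drop other special characters
    text = re.sub(r"\s+", " ", text).strip()
    return tool.correct(text)                                # grammar/spelling correction
```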
The unprocessed data, including answers with embedded HTML, empty or complex long and short answers, is stored in "Natural-Questions-Base.csv". This version retains the raw structure of the data, featuring HTML elements in answers, and varied answer formats such as tables and lists, providing a comprehensive view for those interested in the original dataset's complexity and richness. The processed data is compiled into a single CSV file named "Natural-Questions-Filtered.csv". The file is structured for easy access and analysis, with each record containing the processed question, a detailed answer, and concise answer snippets.
The filtered version is available where specific criteria, such as question length or answer complexity, were applied to refine the data further. This version allows for more focused research and application development.
The repository at https://github.com/fujoos/natural_questions also includes a Flask-based CSV reader application designed to read and display contents from the "NaturalQuestions.csv" file. The app provides functionalities such as:
- Viewing questions and answers directly in your browser.
- Filtering results based on criteria like question keywords or answer length.
See the live demo, which uses the CSV files converted to a SQLite database, at https://fujoos.pythonanywhere.com/.
UNI-CEN Standardized Census Data Tables contain Census data that have been reformatted into a common table format with standardized variable names and codes. The data are provided in two tabular formats for different use cases. "Long" tables are suitable for use in statistical environments, while "wide" tables are commonly used in GIS environments. The long tables are provided in Stata Binary (dta) format, which is readable by all statistics software. The wide tables are provided in comma-separated values (csv) and dBase 3 (dbf) formats with codebooks. The wide tables are easily joined to the UNI-CEN Digital Boundary Files. For the csv files, a .csvt file is provided to ensure that column data types are correctly interpreted when importing into QGIS; a schema.ini file does the same when importing into ArcGIS environments. As the DBF file format supports a maximum of 250 columns, tables with a larger number of variables are divided into multiple DBF files. For more information about file sources, the methods used to create them, and how to use them, consult the documentation at https://borealisdata.ca/dataverse/unicen_docs. For more information about the project, visit https://observatory.uwo.ca/unicen.
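As a hedged illustration of the .csvt convention (a one-line sidecar file of quoted column types read by OGR/QGIS), the snippet below writes a hypothetical .csvt; the column list is made up and does not reflect the actual UNI-CEN tables.

```python
# Hypothetical example only: a .csvt shares the base name of its CSV and lists
# one quoted type per column, e.g. a geography code, a count, and two rates.
with open("unicen_wide_example.csvt", "w") as f:
    f.write('"String","Integer","Real","Real"\n')
```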
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The BioTIME database contains raw data on species identities and abundances in ecological assemblages through time. The database consists of 11 tables; one raw data table plus ten related meta data tables. For further information please see our associated data paper.
This data consists of several elements:
Please note: any users of any of this material should cite the associated data paper in addition to the DOI listed here.
These documents explain the UK Border Agency's structure in detail, as well as the salaries of its senior civil servants.
Date: Fri Oct 15 10:25:10 BST 2010
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Augmented reality headsets in use today have a large area in which the real world can be seen but virtual content cannot be displayed. Users' perception of content in this area is not well understood. This work studies participants' perception of a virtual character in this area by grounding the question in relevant theories of perception and performing a study using both behavioral and self-report measures. We find that virtual characters within the augmented periphery receive lower social presence scores, but we do not find a difference in task performance. These findings inform application design and encourage future work in theories of AR perception and perception of virtual humans.
This data package is associated with the publication "On the Transferability of Residence Time Distributions in Two 10-km Long River Sections with Similar Hydromorphic Units" submitted to the Journal of Hydrology (Bao et al. 2024). Quantifying hydrologic exchange fluxes (HEFs) at the stream-groundwater interface, along with their residence time distributions (RTDs) in the subsurface, is crucial for managing water quality and ecosystem health in dynamic river corridors. However, directly simulating high-spatial-resolution HEFs and RTDs can be a time-consuming process, particularly for watershed-scale modeling. Efficient surrogate models that link RTDs to hydromorphic units (HUs) may serve as alternatives for simulating RTDs in large-scale models. One common concern with these surrogate models, however, is the transferability of the relationship between the RTDs and HUs from one river corridor to another. To address this, we evaluated the HEFs and the resulting RTD-HU relationships for two 10-kilometer-long river corridors along the Columbia River, using a one-way coupled three-dimensional transient surface-subsurface water transport modeling framework that we previously developed. Applying this framework to the two river corridors with similar HUs allows for quantitative comparisons of HEFs and RTDs using both statistical tests and machine learning classification models. This data package includes the model input files and the simulation results data. This data package contains 10 folders. The modeling simulation results data are in the folders 100H_pt_data and 300area_pt_data, for the Hanford 100H and 300 Area study domains, respectively. The remaining eight folders contain the scripts and data to generate the manuscript figures. The file-level metadata file (Bao_2024_Residence_Time_Distribution_flmd.csv) includes a list of all files contained in this data package and descriptions for each. The data dictionary file (Bao_2024_Residence_Time_Distribution_dd.csv) includes column header definitions and units of all tabular files.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The BioTIME database contains raw data on species identities and abundances in ecological assemblages through time. The database consists of 11 tables; one raw data table plus ten related meta data tables. For further information please see our associated data paper.
This data consists of several elements:
BioTIMESQL_02_04_2018.sql - an SQL file for the full public version of BioTIME, which can be imported into any MySQL database.
BioTIMEQuery_02_04_2018.csv - the raw data file; it is too large to view in Excel but can be read into several software applications such as R or various database packages (a chunked-read sketch follows this list).
BioTIMEMetadata_02_04_2018.csv - file containing the metadata for all studies.
BioTIMECitations_02_04_2018.csv - file containing the citation list for all studies.
BioTIMECitations_02_04_2018.xlsx - file containing the citation list for all studies (some special characters are not supported in the CSV format).
BioTIMEInteractions_02_04_2018.Rmd - an R Markdown file providing a brief overview of how to interact with the database and associated .csv files (this will not work until file paths and database connections have been added/updated).
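A minimal sketch, assuming pandas, of reading the large query file in chunks rather than all at once; only the row count is computed here because the column layout is not described above.

```python
import pandas as pd

# Read the raw-data CSV in manageable chunks (it is too large for Excel).
chunks = pd.read_csv("BioTIMEQuery_02_04_2018.csv", chunksize=500_000)
n_rows = sum(len(chunk) for chunk in chunks)  # count records without holding them all in memory
print(f"total records: {n_rows}")
```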
Please note: any users of any of this material should cite the associated data paper in addition to the DOI listed here.
To cite the data paper use the following:
Dornelas M, Antão LH, Moyes F, Bates, AE, Magurran, AE, et al. BioTIME: A database of biodiversity time series for the Anthropocene. Global Ecol Biogeogr. 2018; 27:760 - 786. https://doi.org/10.1111/geb.12729
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
=====================================================================
NII Face Mask Dataset v1.0
=====================================================================
Authors: Trung-Nghia Le (1), Khanh-Duy Nguyen (2), Huy H. Nguyen (1), Junichi Yamagishi (1), Isao Echizen (1)
Affiliations: (1)National Institute of Informatics, Japan (2)University of Information Technology-VNUHCM, Vietnam
National Institute of Informatics Copyright (c) 2021
Emails: {ltnghia, nhhuy, jyamagis, iechizen}@nii.ac.jp, {khanhd}@uit.edu.vn
arXiv: https://arxiv.org/abs/2111.12888
NII Face Mask Dataset v1.0: https://zenodo.org/record/5761725
=============================== INTRODUCTION ===============================
The NII Face Mask Dataset is the first large-scale dataset targeting mask-wearing ratio estimation in street cameras. This dataset contains 581,108 face annotations extracted from 18,088 video frames (1920x1080 pixels) in 17 street-view videos obtained from the Rambalac's YouTube channel.
The videos were taken in multiple places, at various times, before and during the COVID-19 pandemic. The total length of the videos is approximately 56 hours.
=============================== REFERENCES ===============================
If you publish using any of the data in this dataset, please cite the following papers:
@article{Nguyen202112888,
  title={Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio},
  author={Nguyen, Khanh-Duy and Nguyen, Huy H and Le, Trung-Nghia and Yamagishi, Junichi and Echizen, Isao},
  archivePrefix={arXiv},
  arxivId={2111.12888},
  url={https://arxiv.org/abs/2111.12888},
  year={2021}
}
@INPROCEEDINGS{Nguyen2021EstMaskWearing,
  author={Nguyen, Khanh-Duy and Nguyen, Huy H. and Le, Trung-Nghia and Yamagishi, Junichi and Echizen, Isao},
  booktitle={2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021)},
  title={Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio},
  year={2021},
  pages={1-8},
  url={https://ieeexplore.ieee.org/document/9667046},
  doi={10.1109/FG52635.2021.9667046}
}
======================== DATA STRUCTURE ==================================
./NFM
├── dataset
│   ├── train.csv: annotations for the train set.
│   ├── test.csv: annotations for the test set.
└── README_v1.0.md
We use the same structure for the two CSV files (train.csv and test.csv). Both files have the same columns:
<1st column>: video_id (the source video can be found at https://www.youtube.com/watch?v= followed by the video_id)
<2nd column>: frame_id (the index of the frame extracted from the source video)
<3rd column>: timestamp in milliseconds (the timestamp of the frame within the source video)
<4th column>: label (for each annotated face, one of three labels was attached to the bounding box: 'Mask'/'No-Mask'/'Unknown')
<5th column>: left
<6th column>: top
<7th column>: right
<8th column>: bottom
The four coordinates (left, top, right, bottom) denote a face's bounding box.
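A small sketch of reading the annotations under the assumption that the CSV files have no header row (column names are assigned from the order documented above), then computing a per-frame mask-wearing ratio.

```python
import pandas as pd

# Column names follow the documented order; whether the files ship with a
# header row is an assumption, so names are supplied explicitly here.
cols = ["video_id", "frame_id", "timestamp", "label", "left", "top", "right", "bottom"]
train = pd.read_csv("dataset/train.csv", names=cols, header=None)

# Mask-wearing ratio per frame, ignoring faces labelled 'Unknown'.
known = train[train["label"] != "Unknown"]
ratio = (
    known.groupby(["video_id", "frame_id"])["label"]
    .apply(lambda s: (s == "Mask").mean())
)
print(ratio.head())
```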
============================== COPYING ================================
This repository is made available under Creative Commons Attribution License (CC-BY).
Regarding Creative Commons License: Attribution 4.0 International (CC BY 4.0), please see https://creativecommons.org/licenses/by/4.0/
THIS DATABASE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DATABASE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE
====================== ACKNOWLEDGEMENTS ================================
This research was partly supported by JSPS KAKENHI Grants (JP16H06302, JP18H04120, JP21H04907, JP20K23355, JP21K18023), and JST CREST Grants (JPMJCR20D3, JPMJCR18A6), Japan.
This dataset is based on the Rambalac's YouTube channel: https://www.youtube.com/c/Rambalac
The main dataset is a 70 MB file of trajectory data (I294_L1_final.csv) that contains position, speed, and acceleration data for small and large automated (L1) vehicles and non-automated vehicles on a highway in a suburban environment. Supporting files include aerial reference images for ten distinct data collection “Runs” (I294_L1_RunX_with_lanes.png, where X equals 8, 18, and 20 for southbound runs and 1, 3, 7, 9, 11, 19, and 21 for northbound runs). Associated centerline files are also provided for each “Run” (I-294-L1-Run_X-geometry-with-ramps.csv). In each centerline file, x and y coordinates (in meters) marking each lane centerline are provided. The origin point of the reference image is located at the top left corner. Additionally, in each centerline file, an indicator variable is used for each lane to define the following types of road sections: 0=no ramp, 1=on-ramps, 2=off-ramps, and 3=weaving segments. The number attached to each column header is the numerical ID assigned for the specific lane (see “TGSIM – Centerline Data Dictionary – I294 L1.csv” for more details). The dataset defines eight lanes (four lanes in each direction) using these centerline files. Images that map the lanes of interest to the numerical lane IDs referenced in the trajectory dataset are stored in the folder titled “Annotation on Regions.zip”. The southbound lanes are shown visually in I294_L1_Lane-2.png through I294_L1_Lane-5.png and the northbound lanes are shown visually in I294_L1_Lane2.png through I294_L1_Lane5.png.

This dataset was collected as part of the Third Generation Simulation Data (TGSIM): A Closer Look at the Impacts of Automated Driving Systems on Human Behavior project. During the project, six trajectory datasets capable of characterizing human-automated vehicle interactions under a diverse set of scenarios in highway and city environments were collected and processed. For more information, see the project report found here: https://rosap.ntl.bts.gov/view/dot/74647. This dataset, which is one of the six collected as part of the TGSIM project, contains data collected using one high-resolution 8K camera mounted on a helicopter that followed three SAE Level 1 ADAS-equipped vehicles with adaptive cruise control (ACC) enabled. The three vehicles manually entered the highway, moved to the second-from-leftmost lane, then enabled ACC with minimum following distance settings to initiate a string. The helicopter then followed the string of vehicles (which sometimes broke from the string due to large following distances) northbound through the 4.8 km section of highway at an altitude of 300 meters. The goal of the data collection effort was to collect data related to human drivers' responses to vehicle strings. The road segment has four lanes in each direction and covers a major on-ramp and an off-ramp in the southbound direction and one on-ramp in the northbound direction. The segment of highway is operated by Illinois Tollway and contains a high percentage of heavy vehicles. The camera captured footage during the evening rush hour (3:00 PM-5:00 PM CT) on a sunny day.

As part of this dataset, the following files were provided: I294_L1_final.csv contains the numerical data to be used for analysis, including vehicle-level trajectory data at every 0.1 second. Vehicle size (small or large), width, length, and whether the vehicle was one of the test vehicles with ACC engaged ("yes" or "no") are provided with instantaneous location, speed, and acceleration data.
All distance measurements (width, length, location) were converted from pixels to meters using the conversion factor 1 pixel = 0.3 meters. I294_L1_RunX_with_lanes.png are the aerial reference images that define the geographic region and associated roadway segments of interest (see bounding boxes on northbound and southbound lanes) for each run X. The I-294-L1-Run_X-geometry-with-ramps.csv files contain the coordinates that define the lane centerlines.
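An illustrative pandas sketch of filtering the trajectory file to the ACC-engaged test vehicles; the column names "acc_engaged" and "speed" are stand-ins, since the exact headers of I294_L1_final.csv are not listed above.

```python
import pandas as pd

# Sketch only: replace "acc_engaged" and "speed" with the actual column names
# documented in the TGSIM data dictionary.
traj = pd.read_csv("I294_L1_final.csv")

acc = traj[traj["acc_engaged"] == "yes"]  # the three ACC-equipped test vehicles
print(acc["speed"].describe())            # summary of their instantaneous speeds
```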
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The THÖR-MAGNI Dataset Tutorials
THÖR-MAGNI is a novel dataset of accurate human and robot navigation and interaction in diverse indoor contexts, building on the previous THÖR dataset protocol. We provide position and head orientation motion capture data, 3D LiDAR scans and gaze tracking. In total, THÖR-MAGNI captures 3.5 hours of motion from 40 participants over 5 recording days.
This data collection is designed around systematic variation of factors in the environment to allow building cue-conditioned models of human motion and verifying hypotheses on factor impact. To that end, THÖR-MAGNI encompasses 5 scenarios, some of which have different conditions (i.e., we vary some factor):
Scenario 1 (plus conditions A and B):
Participants move in groups and individually;
Robot as static obstacle;
Environment with 3 obstacles and lane marking on the floor for condition B;
Scenario 2:
Participants move in groups, individually and transport objects with variable difficulty (i.e. bucket, boxes and a poster stand);
Robot as static obstacle;
Environment with 3 obstacles;
Scenario 3 (plus conditions A and B):
Participants move in groups, individually, and transport objects of variable difficulty (i.e., a bucket, boxes and a poster stand). We denote the roles as: Visitors-Alone, Visitors-Group 2, Visitors-Group 3, Carrier-Bucket, Carrier-Box, Carrier-Large Object;
Teleoperated robot as moving agent: in condition A, the robot moves with differential drive; in condition B, the robot moves with omni-directional drive;
Environment with 2 obstacles;
Scenario 4 (plus conditions A and B):
All participants, denoted as Visitors-Alone HRI, interacted with the teleoperated mobile robot;
Robot interacted in two ways: in condition A (Verbal-Only), the Anthropomorphic Robot Mock Driver (ARMoD), a small humanoid NAO robot on top of the mobile platform, only used speech to communicate the next goal point to the participant; in condition B the ARMoD used speech, gestures and robotic gaze to convey the same message;
Free space environment
Scenario 5:
Participants move alone (Visitors-Alone) and one of the participants, denoted as Visitors-Alone HRI, transports objects and interacts with the robot;
The ARMoD is remotely controlled by an experimenter and proactively offers help;
Free space environment;
Preliminary steps
Before proceeding, make sure to download the data from ZENODO
├── CLiFF_Maps <- Directory for CLiFF Maps for all files
├── Files <- Directory for the csv files
├── Readme.md
├── CSVs_Scenarios <- Directory for aligned data for all scenarios
├── Scenario_1 <- Directory for the csv files for Scenario 1
├── Scenario_2 <- Directory for the csv files for Scenario 2
├── Scenario_3 <- Directory for the csv files for Scenario 3
├── Scenario_4 <- Directory for the csv files for Scenario 4
├── Scenario_5 <- Directory for the csv files for Scenario 5
├── docs
├── tutorials.md <- Tutorials document on how to use the data
├── Lidar_sample
├── Files <- Directory for sample files
├── 170522_SC3B_1 <- Directory for the pcd files
├── 170522_SC3B_1.csv <- Synchronization file with QTM
├── manual_view_point.json <- json file with manual view point for visualization
├── requirements.txt <- script pip requirements
├── visualize_pcd.py <- script visualize the lidar data
├── Readme.md
├── maps <- Directory for maps of the environment (PNG files) and offsets (json file)
├── offsets.json <- Offsets of the map with respect to the global coordinate frame origin
├── {date}_SC{sc_id}_map.png <- Maps for `date` in {1205, 1305, 1705, 1805} and `sc_id` in {1A, 1B, 2, 3}
├── 3009_map.png <- Map for the Scenarios 4A, 4B and 5
├── MP4_Videos
├── Files <- Directory for the mp4 files
├── pupil_scene_camera_instrinsics.json <- json file with the intrinsics of pupil camera
├── TSVs_RAWET <- Directory for the TSV files for the Raw Eyetracking data for all Scenarios
├── synch_info.csv <- Event markers necessary to align motion capture with eyetracking data
├── Files <- Directory with all the raw eyetracking TSV files
├── goals_positions.csv <- File with the goals locations
Within each Scenario directory, each csv file contains:
2.1. Headers
The dataset metadata overview contains important information found in the CSV file headers. This reference is designed to help users understand and use the dataset effectively. The headers include details such as FILE_ID, which provides information on the date, scenario, condition, and run associated with each recording. The header of the document includes important quantities such as the number of frames recorded (N_FRAMES_QTM), the count of rigid bodies (N_BODIES), and the total number of markers (N_MARKERS).
It also provides information about the order of the contiguous rotation matrix (CONTIGUOUS_ROTATION_MATRIX), modalities measured with units, and specified measurement units. The text presents details on the eyetracking devices used in each recording, including their infrared sensor and scene camera frequencies, as well as an indication of the presence of eyetracking data.
The header provides specific information about rigid bodies, including their names (BODY_NAMES), role labels (BODY_ROLES), and the number of markers associated with each rigid body (BODY_NR_MARKERS). Finally, the table lists all marker names used in the file.
This metadata provides researchers and practitioners with essential guidance on recording information, data quantities, and specifics about rigid bodies and markers. It is a valuable resource for understanding and effectively using the dataset in the CSV files.
2.2. Trajectory Data
The remaining portion of the CSV file integrates merged data from the motion capture system and eye tracking devices, organized based on participants' helmet rigid bodies. Columns within the dataset include XYZ coordinates of all markers, spatial centroid coordinates, 6DOF orientation of the object's local coordinate frame, and if available eye tracking data, encompassing 2D/3D gaze coordinates, scene recording frame numbers, eye movement types, and IMU data.
Missing data is denoted by "N/A" or an empty cell. Temporal indexing is facilitated by the "Time" or "Frame" column, indicating timestamps or frame numbers. The motion capture system records at 100Hz, Tobii Glasses at 50Hz (Raw); 25 Hz (Camera), and Pupil Glasses at 100Hz (Raw); 30 Hz (Camera). The dataset is structured around motion capture recordings, and for each rigid body, such as "Helmet_1," details per frame include XYZ coordinates of markers, centroid coordinates, and a 9-element rotational matrix describing helmet orientation.
| Header | Explanation |
| --- | --- |
| Helmet_1 - 1 X | X-Coordinate of Marker Number 1 |
| Helmet_1 - 1 Y | Y-Coordinate of Marker Number 1 |
| Helmet_1 - 1 Z | Z-Coordinate of Marker Number 1 |
| Helmet_1 - [...] | Same for Markers 2 and 3 of Helmet_1 |
| Helmet_1 Centroid_X | X-Coordinate of the Centroid |
| Helmet_1 Centroid_Y | Y-Coordinate of the Centroid |
| Helmet_1 Centroid_Z | Z-Coordinate of the Centroid |
| Helmet_1 R0 | 1st Element of the CONTIGUOUS_ROTATION_MATRIX |
| Helmet_1 R[..] | Same for R1-R7 |
| Helmet_1 R8 | 9th Element of the CONTIGUOUS_ROTATION_MATRIX |
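A sketch, under assumptions, of plotting one helmet's centroid trajectory from a scenario CSV: the file path, the number of metadata header rows to skip, and the presence of Helmet_1 in that particular run are all assumptions and need to be adapted to the file at hand.

```python
import pandas as pd
import matplotlib.pyplot as plt

# The metadata header described in 2.1 may need to be skipped; the offset
# varies per file, so skiprows=10 is only a placeholder.
df = pd.read_csv("CSVs_Scenarios/Scenario_1/example_run.csv", skiprows=10)

# Column names follow the "Helmet_1 Centroid_X/Y/Z" pattern documented above.
plt.plot(df["Helmet_1 Centroid_X"], df["Helmet_1 Centroid_Y"])
plt.axis("equal")
plt.title("Helmet_1 centroid trajectory")
plt.show()
```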
2.3. Eyetracking Data
The eye tracking data in the dataset includes 16 participants, providing a comprehensive dataset of over 500 minutes of recorded data across the different activities and scenarios with three different eyetracking devices. Devices are denoted with a special "Tracker_ID" in the dataset, i.e.:
| Tracker ID | Eyetracking Device |
| --- | --- |
| TB2 | Tobii 2 Glasses |
| TB3 | Tobii 3 Glasses |
| PPL | Pupil Invisible Glasses |
Gaze points are classified into fixations and saccades using the Tobii I-VT Attention filter, which is specifically optimized for dynamic scenarios, with a velocity threshold of 100°/s. Eyetracking device calibrations were systematically repeated after each 4-minute recording to account for natural variations in participants' eye shapes and to improve the gaze estimation. In addition, gaze estimation adjustments for the Pupil Invisible glasses were made after each 4-minute recording to mitigate potential drifts. It is worth noting that the scene cameras of the eye tracking glasses had different fields of view. The scene camera of the Pupil Invisible Glasses provided a 1088x1080 image with both horizontal (HFOV) and vertical (VFOV) opening angles of 80°, while the Tobii Glasses provided a 1920x1080 image with different opening angles for Tobii Glasses 3 (HFOV: 95°, VFOV: 63°) and Tobii Glasses 2 (HFOV: 82°, VFOV: 52°).
NOTE AS OF 2024: Videos are NOW part of the dataset
For one participant, wearing the Tobii Glasses 3 and Helmet_6, the data would be denoted as:
| Header | Explanation |
| --- | --- |
| Helmet_6 - [...] | X,Y,Z Coordinates for 5 markers |
| Helmet_6 [...] | X,Y,Z Coordinates for 1 Centroid* |
| Helmet_6 R[...] | 9 Elements of the CONTIGUOUS_ROTATION_MATRIX |
| Helmet_6 TB3_Accelerometer_[...] | Accelerometer data along the X,Y,Z Axis |
| Helmet_6 TB3_Gyroscope_[...] | Gyroscope data along the X,Y,Z Axis |
| Helmet_6 TB3_Magnetometer_[...] | Magnetometer data along the X,Y,Z Axis |
| Helmet_6 TB3_G2D_[...] | 2D Eye tracking data (X,Y) |
| Helmet_6 TB3_G3D_[...] | 3D Cyclopic Eye gaze Vector (X,Y,Z) |
| Helmet_6 TB3_Movement | Eye movement type (N/A, Fixation or Saccade) |
| Helmet_6 TB3_SceneFNr | Frame number of the scene camera recording |
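Similarly, a hedged sketch of summarizing the eye-movement labels for this participant; the file path and header offset are again assumptions, and the run must actually contain the Helmet_6 columns documented above.

```python
import pandas as pd

# Sketch only: load a run that contains the Helmet_6 / TB3 columns.
df = pd.read_csv("CSVs_Scenarios/Scenario_4/example_run.csv", skiprows=10)

movement = df["Helmet_6 TB3_Movement"].dropna()
print(movement.value_counts(normalize=True))  # share of Fixation vs. Saccade samples
```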
How to use and tools
magni-dash
This is a dashboard to quickly visualize our data: trajectories, speeds,
Animals living in social groups will almost inevitably experience competition for limited resources. One consequence of competition can be agonism, an activity that is not only costly to participate in at the individual level but potentially also at the group level due to the detrimental effects that agonism can have on group stability and cohesion. Agonism rates across primate species have previously been associated with group size and terrestriality; therefore primates, particularly those in large groups, should develop strategies to mitigate or counter-act agonism. Here, we use phylogenetically controlled analyses to evaluate whether the known relationship between brain size and group size may partially reflect an association between agonism and brain size in large groups. We find strong positive associations between group level agonism and 2 measures of brain size (endocranial volume and neocortex ratio) in 45 separate populations across 23 different primate species. In contrast, dyadi...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set provides data of water quality, discharge, driving forces (meteorological and nitrogen surplus), and catchment attributes for a set of 1386 German catchments covering a wide range of natural and anthropogenic conditions. A corresponding paper "Water quality, discharge and catchment attributes for large-sample studies in Germany - QUADICA" by Ebeling et al. describing the data set in detail will be made available in the Journal Earth System Science Data (https://www.earth-system-science-data.net/).
This repository includes:
1.) Water quality data as annual medians of observed concentration data of N, P and C species (c_annual.csv)
2.) Water quantity data as annual medians of observed discharge (q_annual.csv)
3.) Monthly medians over the whole time series of water quality variables and discharge (c_months.csv)
4.) Monthly and annual median concentrations, flow-normalized concentrations, and mean fluxes estimated using Weighted Regressions on Time, Discharge, and Season (WRTDS) for stations with enough data availability (for details see the corresponding paper Ebeling et al.; wrtds_monthly.csv, wrtds_annual.csv)
5.) Meteorological data as monthly median average temperatures and sums of precipitation and potential evapotranspiration (tavg_monthly.csv, pre_monthly.csv, pet_monthly.csv)
6.) N surplus time series on annual basis (n_surplus.csv)
7.) Summary statistics for the stations including number of samples, covered time periods, degree of censoring (concentrations below the detection limit), availability of discharge data, and availability and performance of WRTDS models (metadata.csv)
8.) Description of data tables (Metadata_QUADICA.pdf)
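As an illustration of how the annual tables listed above might be combined, the sketch below joins annual concentrations with annual discharge; the join keys "station_id" and "year" are assumptions, and the real headers are documented in Metadata_QUADICA.pdf.

```python
import pandas as pd

# Illustrative sketch: column names are assumptions, check Metadata_QUADICA.pdf.
conc = pd.read_csv("c_annual.csv")
q = pd.read_csv("q_annual.csv")

# Join annual median concentrations with annual median discharge per station/year.
annual = conc.merge(q, on=["station_id", "year"], how="inner")
print(annual.head())
```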
Data on catchment attributes and geodata also part of the QUADICA data set are available at "CCDB - catchment characteristics data base Germany" (https://doi.org/10.4211/hs.82f8094dd61e449a826afdef820a2c19). The metadata of the water quality and quantity data base is available at "WQQDB - water quality and quantity data base Germany" (https://doi.org/10.4211/hs.a42addcbd59a466a9aa56472dfef8721).
Conditions: The data set is freely and easily accessible. Please refer to the corresponding paper Ebeling et al. when using or referring to this data set.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Clash Royale is a wildly popular game that revolves around fast 1v1 battles. The game is unique in that, along with skill, a player's "deck" (exactly 8 deployable troops the player picks) has also been widely thought to affect one's chances in the game. It is intriguing to be able to see how the endless world of deck configurations match up.
The dataset has two files:
- data_ord.csv: This is the juicy part: a collection of three-quarters of a million 1v1 battles recorded from the game.
- Each row is 1 battle
- The first 8 columns describe Player 1's deck, representing each card with a number
- The next 8 columns describe Player 2's deck, representing each card with a number
- The next 2 columns describe the respective trophy counts of Players 1 and 2
- The last column is the outcome of the battle; 1 if Player 1 wins, and 0 if Player 2 wins
- IMPORTANT: Player decks HAVE NO ORDER (In data_ord.csv, they are sorted by ascending card ID, but they can be in any arrangement in real life), and card ID numbers are all categorical
- cardlist.csv: You will notice that rather than the names of a card, they are instead described as integers in data_ord.csv. This file translates each card ID from data_ord.csv into its corresponding card name.
Data obtained from the Clash Royale API
I collected this data in hopes of seeing whether the aforementioned "deck configuration" and "trophy counts" could, for the most part, fully predict the outcome of a game.
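A hedged loading sketch: it assumes data_ord.csv has no header row (names are assigned from the column order described above) and that cardlist.csv maps a card ID column to a card name column; check the actual files before relying on these names.

```python
import pandas as pd

# Column names assigned from the documented order: 8 Player 1 cards,
# 8 Player 2 cards, two trophy counts, then the outcome (1 = Player 1 wins).
cols = (
    [f"p1_card_{i}" for i in range(1, 9)]
    + [f"p2_card_{i}" for i in range(1, 9)]
    + ["p1_trophies", "p2_trophies", "p1_wins"]
)
battles = pd.read_csv("data_ord.csv", names=cols, header=None)
cards = pd.read_csv("cardlist.csv")  # assumed to map card IDs to card names

print(battles["p1_wins"].mean())                          # overall Player 1 win rate
print(battles[["p1_trophies", "p2_trophies"]].describe()) # trophy distributions
```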
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
VIVID-10M
[project page] | [Paper] | [arXiv]
VIVID-10M is the first large-scale hybrid image-video local editing dataset, aimed at reducing data construction and model training costs, comprising 9.7M samples that encompass a wide range of video editing tasks.
Data Index
The data index is split across four .csv files:
vivid-image-change.csv
vivid-image-remove.csv
vivid-video-change.csv
vivid-video-remove.csv
The VIVID-Video splits contain the columns: local_caption, #… See the full description on the dataset page: https://huggingface.co/datasets/KlingTeam/VIVID-10M.
The EXploration of Coastal Hydrobiogeochemistry Across a Network of Gradients and Experiments (EXCHANGE) program is a consortium of scientists working together to improve our understanding of how the two-way exchange of water between estuaries or large lake lacustuaries and the terrestrial landscape influences the state and function of ecosystems across the coastal interface. EXCHANGE Campaign 1 (EC1) focuses on the spatial variation in biogeochemical structure and function at the coastal terrestrial-aquatic interface (TAI). In the Fall of 2021, the EXCHANGE Consortium gathered samples from 52 TAIs. Samples collected from EC1 were analyzed for bulk geochemical parameters, bulk physicochemical parameters, organic matter characteristics, and redox-sensitive elements. Please download ec1_README.pdf for a complete list of available data in each .zip folder, package version history, and detailed information about the project. This README will serve as the central place for EC1 Data Package updates.

EC1 Data Package Structure:
ec1_README.pdf
ec1_methods.pdf
ec1_metadata_v1.zip
...ec1_dd.csv
...ec1_flmd.csv
...ec1_sample_catalog.csv
...ec1_metadata_kitlevel.csv
...ec1_metadata_collectionlevel.csv
...ec1_data_collectionlevel.csv
...ec1_igsn_metadata.csv
ec1_soil_v1.zip
ec1_sediment_v1.zip
ec1_water_v1.zip

This data package was originally published May 2023 (v1). Subsequent updates will be published here with new version numbers. Please see the Change History section in ec1_README.pdf for detailed changes.
---
Acknowledging EXCHANGE: General Support and Data Product Use
We ask that users of EXCHANGE data add the following acknowledgement when publishing data in scholarly articles and data repositories: "This research is based on work supported by COMPASS-FME, a multi-institutional project supported by the U.S. Department of Energy, Office of Science, Biological and Environmental Research as part of the Environmental System Science Program."