License: http://opendatacommons.org/licenses/dbcl/1.0/
This dataset contains more than 50,000 records of sales and order data related to an online store.
License: http://opendatacommons.org/licenses/dbcl/1.0/
This dataset contains inventory data for a pharmacy e-commerce website in JSON format, designed for easy integration into MongoDB databases, making it ideal for MERN stack projects. It includes 10 fields:
This dataset is useful for developing pharmacy-related web applications, inventory management systems, or online medical stores using the MERN stack.
Do not use for production-level purposes; use for project development only. Feel free to contribute if you find any mistakes or have suggestions.
License: MIT (https://opensource.org/licenses/MIT)
License information was derived automatically
This artifact accompanies the SEET@ICSE article "Assessing the impact of hints in learning formal specification", which reports on a user study investigating the impact of different types of automated hints while learning a formal specification language, both in terms of immediate performance and learning retention, and in terms of the students' emotional response. This research artifact provides all the material required to replicate this study (except for the proprietary questionnaires used to assess the emotional response and user experience), as well as the collected data and the data analysis scripts used for the discussion in the paper.
Dataset
The artifact contains the resources described below.
Experiment resources
The resources needed for replicating the experiment, namely in directory experiment:
alloy_sheet_pt.pdf: the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment. The sheet was passed in Portuguese due to the population of the experiment.
alloy_sheet_en.pdf: a version of the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment, translated into English.
docker-compose.yml: a Docker Compose configuration file to launch Alloy4Fun populated with the tasks in directory data/experiment for the 2 sessions of the experiment.
api and meteor: directories with source files for building and launching the Alloy4Fun platform for the study.
Experiment data
The task database used in our application of the experiment, namely in directory data/experiment:
Model.json, Instance.json, and Link.json: JSON files used to populate Alloy4Fun with the tasks for the 2 sessions of the experiment.
identifiers.txt: the list of all 104 available identifiers for participants in the experiment.
Collected data
Data collected in the application of the experiment as a simple one-factor randomised experiment in 2 sessions involving 85 undergraduate students majoring in CSE. The experiment was validated by the Ethics Committee for Research in Social and Human Sciences of the Ethics Council of the University of Minho, where the experiment took place. Data is shared in the form of JSON and CSV files with a header row, namely in directory data/results:
data_sessions.json: data collected from task-solving in the 2 sessions of the experiment, used to calculate variables productivity (PROD1 and PROD2, between 0 and 12 solved tasks) and efficiency (EFF1 and EFF2, between 0 and 1).
data_socio.csv: data collected from socio-demographic questionnaire in the 1st session of the experiment, namely:
participant identification: participant's unique identifier (ID);
socio-demographic information: participant's age (AGE), sex (SEX, 1 through 4 for female, male, prefer not to disclose, and other, respectively), and average academic grade (GRADE, from 0 to 20, where NA denotes a preference not to disclose).
data_emo.csv: detailed data collected from the emotional questionnaire in the 2 sessions of the experiment, namely:
participant identification: participant's unique identifier (ID) and the assigned treatment (column HINT, either N, L, E or D);
detailed emotional response data: the differential in the 5-point Likert scale for each of the 14 measured emotions in the 2 sessions, ranging from -5 to -1 if decreased, 0 if maintained, from 1 to 5 if increased, or NA denoting failure to submit the questionnaire. Half of the emotions are positive (Admiration1 and Admiration2, Desire1 and Desire2, Hope1 and Hope2, Fascination1 and Fascination2, Joy1 and Joy2, Satisfaction1 and Satisfaction2, and Pride1 and Pride2), and half are negative (Anger1 and Anger2, Boredom1 and Boredom2, Contempt1 and Contempt2, Disgust1 and Disgust2, Fear1 and Fear2, Sadness1 and Sadness2, and Shame1 and Shame2). This detailed data was used to compute the aggregate data in data_emo_aggregate.csv and in the detailed discussion in Section 6 of the paper.
data_umux.csv: data collected from the user experience questionnaires in the 2 sessions of the experiment, namely:
participant identification: participant's unique identifier (ID);
user experience data: summarised user experience data from the UMUX surveys (UMUX1 and UMUX2, as a usability metric ranging from 0 to 100).
participants.txt: the list of participant identifiers that have registered for the experiment.
Analysis scripts
The analysis scripts required to replicate the analysis of the results of the experiment as reported in the paper, namely in directory analysis:
analysis.r: An R script to analyse the data in the provided CSV files; each performed analysis is documented within the file itself.
requirements.r: An R script to install the required libraries for the analysis script.
normalize_task.r: A Python script to normalize the task JSON data from file data_sessions.json into the CSV format required by the analysis script.
normalize_emo.r: A Python script to compute the aggregate emotional response in the CSV format required by the analysis script from the detailed emotional response data in the CSV format of data_emo.csv.
Dockerfile: a Docker script to automate running the analysis script on the collected data.
Setup
To replicate the experiment and the analysis of the results, only Docker is required.
If you wish to manually replicate the experiment and collect your own data, you'll need to install:
A modified version of the Alloy4Fun platform, which is built in the Meteor web framework. This version of Alloy4Fun is publicly available in branch study of its repository at https://github.com/haslab/Alloy4Fun/tree/study.
If you wish to manually replicate the analysis of the data collected in our experiment, you'll need to install:
Python to manipulate the JSON data collected in the experiment. Python is freely available for download at https://www.python.org/downloads/, with distributions for most platforms.
R software for the analysis scripts. R is freely available for download at https://cran.r-project.org/mirrors.html, with binary distributions available for Windows, Linux and Mac.
Usage
Experiment replication
This section describes how to replicate our user study experiment, and collect data about how different hints impact the performance of participants.
To launch the Alloy4Fun platform populated with tasks for each session, just run the following commands from the root directory of the artifact. The Meteor server may take a few minutes to launch; wait for the "Started your app" message to show.
cd experiment
docker-compose up
This will launch Alloy4Fun at http://localhost:3000. The tasks are accessed through permalinks assigned to each participant. The experiment allows for up to 104 participants, and the list of available identifiers is given in file identifiers.txt. The group of each participant is determined by the last character of the identifier, either N, L, E or D. The task database can be consulted in directory data/experiment, in Alloy4Fun JSON files.
In the 1st session, each participant was given one permalink that gives access to 12 sequential tasks. The permalink is simply the participant's identifier, so participant 0CAN would just access http://localhost:3000/0CAN. The next task becomes available after a correct submission to the current task or when a time-out occurs (5 mins). Each participant was assigned to one of the treatment groups, so depending on the permalink different kinds of hints are provided. Below are 4 permalinks, one for each hint group:
Group N (no hints): http://localhost:3000/0CAN
Group L (error locations): http://localhost:3000/CA0L
Group E (counter-example): http://localhost:3000/350E
Group D (error description): http://localhost:3000/27AD
In the 2nd session, as in the 1st session, each permalink gave access to 12 sequential tasks, and the next task becomes available after a correct submission or a time-out (5 mins). The permalink is constructed by prepending P- to the participant's identifier, so participant 0CAN would access http://localhost:3000/P-0CAN. In the 2nd session all participants were expected to solve the tasks without any hints provided, so the permalinks from different groups are undifferentiated.
Before the 1st session the participants should answer the socio-demographic questionnaire, which should ask for the following information: unique identifier, age, sex, familiarity with the Alloy language, and average academic grade.
Before and after both sessions the participants should answer the standard PrEmo 2 questionnaire. PrEmo 2 is published under an Attribution-NonCommercial-NoDerivatives 4.0 International Creative Commons licence (CC BY-NC-ND 4.0). This means that you are free to use the tool for non-commercial purposes as long as you give appropriate credit, provide a link to the license, and do not modify the original material. The original material, namely the depictions of the different emotions, can be downloaded from https://diopd.org/premo/. The questionnaire should ask for the unique user identifier, and for the degree of attachment to each of the 14 depicted emotions, expressed on a 5-point Likert scale.
After both sessions the participants should also answer the standard UMUX questionnaire. This questionnaire can be used freely, and should ask for the user's unique identifier and answers to the standard 4 questions on a 7-point Likert scale. For information about the questions, how to implement the questionnaire, and how to compute the usability metric (a score ranging from 0 to 100) from the answers, please see the original paper:
Kraig Finstad. 2010. The usability metric for user experience. Interacting with computers 22, 5 (2010), 323–327.
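For orientation only, below is a minimal Python sketch of the commonly cited UMUX scoring scheme (odd items positively worded, even items negatively worded); this is an assumption to be checked against Finstad (2010), not a prescription from this artifact.

def umux_score(q1, q2, q3, q4):
    # q1..q4 are the 7-point Likert answers (1..7); odd items are assumed to be
    # positively worded and even items negatively worded.
    contributions = [q1 - 1, 7 - q2, q3 - 1, 7 - q4]  # each item maps to 0..6
    return sum(contributions) / 24 * 100              # rescale the 0..24 total to 0..100

print(umux_score(4, 4, 4, 4))  # neutral answers yield a score of 50.0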
Analysis of other applications of the experiment
This section describes how to replicate the analysis of the data collected in an application of the experiment described in Experiment replication.
The analysis script expects data in 4 CSV files,
License: https://creativecommons.org/publicdomain/zero/1.0/
This JSON file contains a collection of conversational AI intents designed to motivate and interact with users. The intents cover various topics, including greetings, weather inquiries, hobbies, music, movies, farewells, informal and formal questions, math operations and formulas, prime numbers, geometry concepts, math puzzles, and even a Shakespearean poem.
The additional intents related to consolidating people and motivating them have been included to provide users with uplifting and encouraging responses. These intents aim to offer support during challenging times, foster teamwork, and provide words of motivation and inspiration to users seeking guidance and encouragement.
The JSON structure is organized into individual intent objects, each containing a tag to identify the intent, a set of patterns representing user inputs, and corresponding responses provided by the AI model. This dataset can be used to train a conversational AI system to engage in positive interactions with users and offer motivational messages.
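For illustration, here is a minimal Python sketch for loading and inspecting such a file; the file name intents.json and the top-level "intents" key are assumptions, since only the per-intent fields (tag, patterns, responses) are described above.

import json

with open('intents.json', encoding='utf-8') as f:  # hypothetical file name
    data = json.load(f)

# Assumed layout: either a bare list of intent objects or a dict with an 'intents' list.
intents = data['intents'] if isinstance(data, dict) else data

for intent in intents:
    # Each intent object carries a tag, user-input patterns, and AI responses.
    print(intent['tag'], len(intent['patterns']), 'patterns,', len(intent['responses']), 'responses')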
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A JSON file used as an example to illustrate queries and to benchmark some tool.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Interoperability in systems-of-systems is a difficult problem due to the abundance of data standards and formats. Current approaches to interoperability rely on hand-made adapters or methods using ontological metadata. This dataset was created to facilitate research on data-driven interoperability solutions. The data comes from a simulation of a building heating system, and the messages sent within control systems-of-systems. For more information see attached data documentation.
The data comes in two semicolon-separated (;) csv files, training.csv and test.csv. The train/test split is not random; training data comes from the first 80% of simulated timesteps, and the test data is the last 20%. There is no specific validation dataset; the validation data should instead be randomly selected from the training data. The simulation runs for as many time steps as there are outside temperature values available. The original SMHI data only samples once every hour, which we linearly interpolate to get one temperature sample every ten seconds. The data saved at each time step consists of 34 JSON messages (four per room and two temperature readings from the outside), 9 temperature values (one per room and outside), 8 setpoint values, and 8 actuator outputs. The data associated with each of those 34 JSON messages is stored as a single row in the tables. This means that much data is duplicated, a choice made to make it easier to use the data.
The simulation data is not meant to be opened and analyzed in spreadsheet software; it is meant for training machine learning models. It is recommended to open the data with the pandas library for Python, available at https://pypi.org/project/pandas/.
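As a starting point, here is a minimal pandas sketch for loading the two semicolon-separated files and carving out a random validation split from the training data, as suggested above (column handling is left open since the header names are not listed here; the 10% fraction is an arbitrary choice):

import pandas as pd

train = pd.read_csv('training.csv', sep=';')  # semicolon-separated, as described above
test = pd.read_csv('test.csv', sep=';')

# The validation data is drawn randomly from the training data, as recommended.
validation = train.sample(frac=0.1, random_state=0)
train = train.drop(validation.index)

print(train.shape, validation.shape, test.shape)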
The data file with temperatures (smhi-july-23-29-2018.csv) acts as input for the thermodynamic building simulation found on Github, where it is used to get the outside temperature and corresponding timestamps. Temperature data for Luleå Summer 2018 were downloaded from SMHI.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Example Microscopy Metadata JSON files produced using the Micro-Meta App documenting an example raw-image file acquired using the custom-built TIRF Epifluorescence Structured Illumination Microscope.
For this use case, which is presented in Figure 5 of Rigano et al., 2021, Micro-Meta App was utilized to document:
1) The Hardware Specifications of the custom-built TIRF Epifluorescence Structured light Microscope (TESM; Navaroli et al., 2010) developed and built on the basis of an Olympus IX71 microscope stand, and owned by the Biomedical Imaging Group (http://big.umassmed.edu/) at the Program in Molecular Medicine of the University of Massachusetts Medical School. Because TESM was custom-built, the most appropriate documentation level is Tier 3 (Manufacturing/Technical Development/Full Documentation) as specified by the 4DN-BINA-OME Microscopy Metadata model (Hammer et al., 2021).
The TESM Hardware Specifications are stored in: Rigano et al._Figure 5_UseCase_Biomedical Imaging Group_TESM.JSON
2) The Image Acquisition Settings that were applied to the TESM microscope for the acquisition of an example image (FSWT-6hVirus-10minFIX-stk_4-EPI.tif.ome.tif) obtained by Nicholas Vecchietti and Caterina Strambio-De-Castillia. For this image, TZM-bl human cells were infected with HIV-1 retroviral three-part vector (FSWT+PAX2+pMD2.G). Six hours post-infection cells were fixed for 10 min with 1% formaldehyde in PBS, and permeabilized. Cells were stained with mouse anti-p24 primary antibody followed by DyLight488-anti-Mouse secondary antibody, to detect HIV-1 viral Capsid. In addition, cells were counterstained using rabbit anti-Lamin B1 primary antibody followed by DyLight649-anti-Rabbit secondary antibody, to visualize the nuclear envelope and with DAPI to visualize the nuclear chromosomal DNA.
The Image Acquisition Settings used to acquire the FSWT-6hVirus-10minFIX-stk_4-EPI.tif.ome.tif image are stored in: Rigano et al._Figure 5_UseCase_AS_fswt-6hvirus-10minfix-stk_4-epi.tif.JSON
Instructional video tutorials on how to use these example data files:
Use these videos to get started with using Micro-Meta App after downloading the example data files available here.
Data file description
Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. See the Splitgraph documentation for more information and examples.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is related to the manuscript "An empirical meta-analysis of the life sciences linked open data on the web" published at Nature Scientific Data. If you use the dataset, please cite the manuscript as follows: Kamdar, M.R., Musen, M.A. An empirical meta-analysis of the life sciences linked open data on the web. Sci Data 8, 24 (2021). https://doi.org/10.1038/s41597-021-00797-y

We have extracted schemas from more than 80 publicly available biomedical linked data graphs in the Life Sciences Linked Open Data (LSLOD) cloud into an LSLOD schema graph and conducted an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. The dataset published here contains the following files:
- The set of Linked Data Graphs from the LSLOD cloud from which schemas are extracted.
- Refined sets of extracted classes, object properties, data properties, and datatypes, shared across the Linked Data Graphs on the LSLOD cloud. Where a schema element is reused from a Linked Open Vocabulary or an ontology, it is explicitly indicated.
- The LSLOD Schema Graph, which contains all the above extracted schema elements interlinked with each other based on the underlying content. Sample instances and sample assertions are also provided along with broad-level characteristics of the modeled content.

The LSLOD Schema Graph is saved as a JSON Pickle file. To read the JSON object in this Pickle file, use the following Python commands:

import pickle  # standard-library import, added here for completeness

with open('LSLOD-Schema-Graph.json.pickle', 'rb') as infile:
    x = pickle.load(infile, encoding='iso-8859-1')

Check the Referenced Link for more details on this research, raw data files, and code references.
This dataset contains a collection of JSON files used to configure map catalogs in TerriaJS, an interactive geospatial data visualization platform. The files include detailed configurations for services such as WMS, WFS, and other geospatial resources, enabling the integration and visualization of diverse datasets in a user-friendly web interface. This resource is ideal for developers, researchers, and professionals who wish to customize or implement interactive map catalogs in their own applications using TerriaJS.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This entry contains the SLO-VLM-IT-Dataset, a comprehensive dataset designed for instruction-tuning vision-language models in the Slovenian language. It is composed of five main .json files, which together provide a rich and diverse set of examples for training and fine-tuning models to understand and process both visual and textual information in Slovenian.
llava_v1_5_mix665k_translated_gemini_1_5_pro_all.json This file contains a machine-translated version of the popular Llava_v1_5_mix665k dataset. The translation from English to Slovenian was performed using the proprietary Gemini 1.5 Pro model.
wiki_14_march_2024_latest.json This file consists of conversational examples generated from Slovenian Wikipedia articles. The proprietary Gemini 1.5 Pro model was utilized for the data curation process, transforming the articles into an instruction-tuning format.
rtv.json This file consists of conversational examples generated on the basis of images from the news portal https://www.rtvslo.si. The proprietary Gemini 1.5 Pro model was utilized for the data generation.
siol.json This file consists of conversational examples generated on the basis of images from the news portal https://siol.net. The proprietary Gemini 1.5 Pro model was utilized for the data generation.
24ur.json This file consists of conversational examples generated on the basis of images from the news portal https://www.24ur.com. The proprietary Gemini 1.5 Pro model was utilized for the data generation.
The combined dataset includes a total of 1,128,228 examples, categorized as follows:
21,838 textvqa examples: Instructions for vision question answering based on specific Optical Character Recognition (OCR) tokens.
349,369 coco examples: A mix of instructions corresponding to 118,000 images from the COCO 2017 Object Detection Dataset. These include tasks such as generating long image descriptions, providing single-word answers, and answering multiple-choice questions.
81,309 vg examples: Instructions to either provide bounding box coordinates for a specified region in an image or describe a region defined by given coordinates.
66,227 gqa examples: Instructions requiring a one-word or one-phrase response to a question about the corresponding image.
78,976 ocr_vqa examples: Instructions focused on performing OCR to extract text from an image.
139,433 wiki examples: Instruction-tuning examples generated from Slovenian Wikipedia articles. The original Wikipedia articles were obtained from a Wikipedia database dump from March 14th 2025.
100,000 rtv examples: Instruction-tuning examples generated on the basis of images from the news portal https://www.rtvslo.si. Image scraping was completed on February 7th 2025.
100,000 siol examples: Instruction-tuning examples generated on the basis of images from the news portal https://siol.net. Image scraping was completed on March 22nd 2025.
100,000 24ur examples: Instruction-tuning examples generated on the basis of images from the news portal https://www.24ur.com. Image scraping was completed on February 7th 2025.
Accessing the Corresponding Images
News portal Images The images corresponding to the 'rtv', 'siol' and '24ur' examples need to be downloaded from the appropriate news portal. Each example in the json file contains an 'image' key with a URL of the corresponding image.
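A minimal Python sketch of such a download follows, assuming rtv.json holds a list of example objects (the exact wrapper layout is an assumption) and using the third-party requests library:

import json
import os
import requests  # third-party: pip install requests

with open('rtv.json', encoding='utf-8') as f:
    examples = json.load(f)  # assumed: a list of examples, each with an 'image' URL

os.makedirs('images', exist_ok=True)
for i, example in enumerate(examples):
    url = example['image']                  # 'image' key with the image URL, as described above
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    # The .jpg extension is a simplification; derive it from the URL if needed.
    with open(os.path.join('images', f'{i}.jpg'), 'wb') as out:
        out.write(response.content)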
Wiki Images The images corresponding to the 'wiki' examples are available for download at the following link: https://kt-cloud.ijs.si/index.php/s/nbLmWkaJEXHMMwe
Llava_v1_5_mix665k Images To facilitate the download of images for the translated Llava_v1_5_mix665k dataset, we provide the necessary Python script get_llava_images.py and its dependency overwatch.py.
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by samsatp
Released under CC0: Public Domain
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
The 802.11 standard includes several management features and corresponding frame types. One of them is the Probe Request (PR), which is sent by mobile devices in an unassociated state to scan the nearby area for existing wireless networks. The frame body of a PR consists of variable-length fields, called Information Elements (IEs), which represent the capabilities of a mobile device, such as supported data rates.
This dataset contains PRs collected over a seven-day period by four gateway devices in an uncontrolled urban environment in the city of Catania.
It can be used for various use cases, e.g., analyzing MAC randomization, determining the number of people in a given location at a given time or in different time periods, analyzing trends in population movement (streets, shopping malls, etc.) in different time periods, etc.
Related dataset
The same authors also produced the Labeled dataset of IEEE 802.11 probe requests, with the same data layout and recording equipment.
Measurement setup
The system for collecting PRs consists of a Raspberry Pi 4 (RPi) with an additional WiFi dongle to capture WiFi signal traffic in monitoring mode (gateway device). Passive PR monitoring is performed by listening to 802.11 traffic and filtering out PR packets on a single WiFi channel.
The following information about each received PR is collected:
- MAC address
- Supported data rates
- extended supported rates
- HT capabilities
- extended capabilities
- data under extended tag and vendor specific tag
- interworking
- VHT capabilities
- RSSI
- SSID
- timestamp when PR was received
The collected data was forwarded to a remote database via a secure VPN connection. A Python script was written using the Pyshark package to collect, preprocess, and transmit the data.
Data preprocessing
The gateway collects PRs for each successive predefined scan interval (10 seconds). During this interval, the data is preprocessed before being transmitted to the database. For each detected PR in the scan interval, the IEs fields are saved in the following JSON structure:
PR_IE_data = {
    'DATA_RTS': {'SUPP': DATA_supp, 'EXT': DATA_ext},
    'HT_CAP': DATA_htcap,
    'EXT_CAP': {'length': DATA_len, 'data': DATA_extcap},
    'VHT_CAP': DATA_vhtcap,
    'INTERWORKING': DATA_inter,
    'EXT_TAG': {'ID_1': DATA_1_ext, 'ID_2': DATA_2_ext, ...},
    'VENDOR_SPEC': {
        VENDOR_1: {'ID_1': DATA_1_vendor1, 'ID_2': DATA_2_vendor1, ...},
        VENDOR_2: {'ID_1': DATA_1_vendor2, 'ID_2': DATA_2_vendor2, ...},
        ...
    }
}
Supported data rates and extended supported rates are represented as arrays of values that encode information about the rates supported by a mobile device. The rest of the IEs data is represented in hexadecimal format. Vendor Specific Tag is structured differently than the other IEs. This field can contain multiple vendor IDs with multiple data IDs with corresponding data. Similarly, the extended tag can contain multiple data IDs with corresponding data.
Missing IE fields in the captured PR are not included in PR_IE_data.
When a new MAC address is detected in the current scan time interval, the data from PR is stored in the following structure:
{'MAC': MAC_address, 'SSIDs': [ SSID ], 'PROBE_REQs': [PR_data] },
where PR_data is structured as follows:
{ 'TIME': [ DATA_time ], 'RSSI': [ DATA_rssi ], 'DATA': PR_IE_data }.
This data structure allows storing only the 'TIME' (time of arrival) and 'RSSI' values for all PRs originating from the same MAC address and containing the same 'PR_IE_data'. All SSIDs from the same MAC address are also stored. The data of a newly detected PR is compared with the already stored data for the same MAC in the current scan time interval. If identical PR IE data from the same MAC address is already stored, only the data for the keys 'TIME' and 'RSSI' are appended. If identical PR IE data from the same MAC address has not yet been received, then the PR_data structure of the new PR for that MAC address is appended to the 'PROBE_REQs' key. The preprocessing procedure is shown in Figure ./Figures/Preprocessing_procedure.png
At the end of each scan time interval, all processed data is sent to the database along with additional metadata about the collected data, such as the serial number of the wireless gateway and the timestamps for the start and end of the scan. For an example of a single PR capture, see the Single_PR_capture_example.json file.
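For orientation, here is a minimal Python sketch for reading such a capture, assuming the file exposes the per-MAC structures described above either directly or as a list (the surrounding metadata fields are not modelled here):

import json

with open('Single_PR_capture_example.json', encoding='utf-8') as f:
    capture = json.load(f)

# Assumption: the capture is a list of per-MAC entries, or a single such entry.
entries = capture if isinstance(capture, list) else [capture]

for entry in entries:
    mac = entry['MAC']
    ssids = entry['SSIDs']
    for pr in entry['PROBE_REQs']:
        # 'TIME' and 'RSSI' are parallel lists covering all PRs that share this IE data.
        print(mac, ssids, len(pr['TIME']), 'capture(s), IE keys:', list(pr['DATA'].keys()))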
Folder structure
For ease of processing, the dataset is divided into 7 folders, each covering a 24-hour period. Each folder contains four files, one per gateway device, each containing the samples from that device.
The folders are named after the start and end time (in UTC). For example, the folder 2022-09-22T22-00-00_2022-09-23T22-00-00 contains samples collected from the 23rd of September 2022 at 00:00 local time until the 24th of September 2022 at 00:00 local time.
Files represent their location via the following mapping:
- 1.json -> location 1
- 2.json -> location 2
- 3.json -> location 3
- 4.json -> location 4
Environments description
The measurements were carried out in the city of Catania, in Piazza Università and Piazza del Duomo. The gateway devices (RPis with WiFi dongle) were set up and gathering data before the start time of this dataset. As of September 23, 2022, the devices were placed in their final configuration and personally checked for correctness of installation and data status of the entire data collection system. Devices were connected either to a nearby Ethernet outlet or via WiFi to the access point provided.
Four Raspberry Pis were used:
- location 1 -> Piazza del Duomo - Chierici building (balcony near Fontana dell’Amenano)
- location 2 -> southernmost window in the building of Via Etnea near Piazza del Duomo
- location 3 -> northernmost window in the building of Via Etnea near Piazza Università
- location 4 -> first window to the right of the entrance of the University of Catania
Locations were suggested by the authors and adjusted during deployment based on physical constraints (locations of electrical outlets or internet access). Under ideal circumstances, the locations of the devices and their coverage areas would cover both squares and the part of Via Etnea between them, with a partial overlap of signal detection. The locations of the gateways are shown in Figure ./Figures/catania.png.
Known dataset shortcomings
Due to technical and physical limitations, the dataset contains some identified deficiencies.
PRs are collected and transmitted in 10-second chunks. Due to the limited capabilities of the recording devices, some time (in the range of seconds) may not be accounted for between chunks if the transmission of the previous packet took too long or an unexpected error occurred.
Every 20 minutes the service is restarted on the recording device. This is a workaround for undefined behavior of the USB WiFi dongle, which can no longer respond. For this reason, up to 20 seconds of data will not be recorded in each 20-minute period.
The devices had a scheduled reboot at 4:00 each day, which appears as missing data of up to a few minutes.
Location 1 - Piazza del Duomo - Chierici
The gateway device (RPi) is located on the second floor balcony and is hardwired to the Ethernet port. This device appears to have functioned stably throughout the data collection period. Its location was constant and undisturbed, and the dataset seems to have complete coverage.
Location 2 - Via Etnea - Piazza del Duomo
The device is located inside the building. During working hours (approximately 9:00-17:00), the device was placed on the windowsill. However, the exact movements of the device cannot be confirmed. As the device was moved back and forth, power outages and internet connection issues occurred. The last three days in the record contain no PRs from this location.
Location 3 - Via Etnea - Piazza Università
Similar to Location 2, the device is placed on the windowsill and moved around by people working in the building. Similar behavior is also observed, e.g., it is placed on the windowsill and moved inside a thick wall when no people are present. This device appears to have been collecting data throughout the whole dataset period.
Location 4 - Piazza Università
This location is wirelessly connected to the access point. The device was placed statically on a windowsill overlooking the square. Due to physical limitations, the device had lost power several times during the deployment. The internet connection was also interrupted sporadically.
Recognitions
The data was collected within the scope of the Resiloc project with the help of the City of Catania and project partners.
The table DataCite Public Data File 2023 is part of the dataset DataCite Public Data, available at https://redivis.com/datasets/7wec-6vgw8qaaq. It contains 52863283 rows across 36 variables.
License: no license specified (https://academictorrents.com/nolicensespecified)
[Sample Dataset] April 2024 Public Data File from Crossref. This dataset includes 100 random JSON records from the Crossref metadata corpus.
ThermoML is an XML-based IUPAC standard for the storage and exchange of experimental thermophysical and thermochemical property data. The ThermoML archive is a subset of Thermodynamics Research Center (TRC) data holdings corresponding to cooperation between NIST TRC and five journals: Journal of Chemical Engineering and Data (ISSN: 1520-5134), The Journal of Chemical Thermodynamics (ISSN: 1096-3626), Fluid Phase Equilibria (ISSN: 0378-3812), Thermochimica Acta (ISSN: 0040-6031), and International Journal of Thermophysics (ISSN: 1572-9567). Data from initial cooperation (around 2003) through the 2019 calendar year are included.

The original scope of the archive has been expanded to include JSON files. The JSON files are structured according to the ThermoML.xsd (available below) and rendered from the same experimental thermophysical and thermochemical property data reported in the corresponding articles as the ThermoML files. In fact, the ThermoML files are generated from the JSON files to keep the information in sync. The JSON files may contain additional information not supported by the ThermoML schema. For example, each JSON file contains the md5 checksum of the ThermoML file (THERMOML_MD5_CHECKSUM), which may be used to validate the ThermoML download.

This data.nist.gov resource provides a .tgz file download containing the JSON and ThermoML files for each version of the archive. Data from initial cooperation (around 2003) through the 2019 calendar year are provided below (ThermoML.v2020-09.30.tgz). The dates of the extraction from TRC databases, as specified in the dateCit field of the xml files, are 2020-09-29 and 2020-09-30. The .tgz file contains a directory tree that maps to the DOI prefix/suffix of the entries; e.g. unzipping the .tgz file creates a directory for each of the prefixes (10.1007, 10.1016, and 10.1021) that contains all the .json and .xml files.

The data and other information throughout this digital resource (including the website, API, JSON, and ThermoML files) have been carefully extracted from the original articles by NIST/TRC personnel. Neither the Journal publisher, nor its editors, nor NIST/TRC warrant or represent, expressly or implied, the correctness or accuracy of the content of information contained throughout this digital resource, nor its fitness for any use or for any purpose, nor can they, or will they, accept any liability or responsibility whatever for the consequences of its use or misuse by anyone. In any individual case of application, the respective user must check the correctness by consulting other relevant sources of information.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Electronic Structure, Topology and Spin-Polarization in Gulf-edged Zigzag Graphene Nanoribbons
This repository collects the necessary calculation files to reproduce the results shown in our manuscript (arXiv). It includes the following parts:
* A) Structure files
* B) TB calculations
* C) DFT calculations
* D) GW calculations
* E) Parametrization of TB with Hubbard-U (TB+U)
* F) TB+U calculations
* G) ZGNR systems
* H) Calculations for different $U$ values
(A) Structure files
ZGNR-G structures, created with a C-C (C-H) bond length of 1.4 Ang (1.1 Ang) and bond angles of 120°. Structures are given with and without saturation of dangling bonds by hydrogen atoms. The unit cells are rectangular. The GNR is periodic in the x-direction, and a vacuum gap of 20 Ang between the carbon atoms is added in the y- and z-direction. We did not perform geometry optimization. Files are given in XYZ, XSF, and CIF format. The structural parameters are varied in the following range:
* $N$=4...11
* $a$=3...10
* $M$=2...9 (depending on $a$)
* $b$=0...a/2 (depending on $a$ and whether $N$ is odd or even)
* S and L inversion center
The files are named in the following way:
* Carbon only (used in TB calculations): $N$-ZGNR-G$M$_$a$_$b$_.
* Saturated systems (used in DFT calculations): $N$-ZGNR-G$M$_$a$_$b$_saturated.
(B) TB calculations
Minimal calculation files for the complete set of structures:
* Structure file in CIF format
* PythTB input file (onsite energy $\alpha$=0, 1st-NN hopping element $t_1$=-1)
* Calculation results in JSON format
The data files contain the calculated band gaps and Z2 topological invariants, sorted into tables by structural parameters. A table is given for each combination of $N$, $M$, and inversion center. Each table varies the parameter $a$ in the rows (value of $a$ given first in each line) and the parameter $b$ in the columns (values not explicitly given, varied from 0 (0.5) to $a$/2 for even (odd) $N$). The band gaps are given in units of the 1st-NN hopping element $t_1$. The Z2 topological invariant is calculated using the Zak phase. A value is given for metallic systems even though the equations are not applicable to these systems. Additionally, the results are given as a simple list.
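The PythTB input files themselves are included in the archive; purely as orientation for readers unfamiliar with PythTB, here is a generic sketch of a two-site 1D chain with onsite energy α=0 and 1st-NN hopping t1=-1 (a stand-in geometry, not the ZGNR-G structures used in this work):

from pythtb import tb_model

lat = [[1.0]]                    # 1D lattice vector
orb = [[0.0], [0.5]]             # two orbitals per unit cell
model = tb_model(1, 1, lat, orb)
model.set_onsite([0.0, 0.0])     # onsite energy alpha = 0
model.set_hop(-1.0, 0, 1, [0])   # 1st-NN hopping inside the cell (t1 = -1)
model.set_hop(-1.0, 1, 0, [1])   # 1st-NN hopping to the neighbouring cell

(k_vec, k_dist, k_node) = model.k_path('full', 101)
evals = model.solve_all(k_vec)          # band energies, shape (n_bands, n_k)
gap = evals[1].min() - evals[0].max()   # band gap in units of |t1| (zero for this uniform chain)
print(gap)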
(C) DFT calculations
Calculation files for the subset of systems studied on the DFT/HSE06 level with a tight tier 1 basis and a k-grid of 18x3x3 using FHI-aims. ZGNR-G systems for this subset are selected to have a maximum of 100 carbon atoms in the primitive unit cell. Calculations are performed both without and with spin polarization. Spin-polarized systems are run with both an antiferromagnetic (AFM) and ferromagnetic (FM) initial guess (by placing an initial spin moment on the zigzag edge atoms), resulting in an AFM or FM magnetic state, respectively.
For each calculation, the following files are stored:
* geometry.in: input geometry
* control.in: input file for FHI-aims
* aims.out: output file for FHI-aims
* band1001.out: band structure file for the first spin channel
* band2001.out: band structure file for the second spin channel (only for spin-polarized calculations)
* cube_001_spin_density.cube: converged spin density in CUBE format (compressed in ZIP format to save storage space after extracting calculation files)
* spin-polarization.png: plot of spin moments for carbon atoms (only for spin-polarized calculations)
* gap.dat: band gap, extracted from the band structure file
* spin_max.dat: maximum absolute spin moment, extracted from the Mulliken projection results

Additionally, for each structure, a plot comparing the band structure without spin polarization, the band structure of the AFM state, and the band structure of the FM state is stored. The DAT files are not stored for the FM state as those are never the magnetic ground state and thus were not further analyzed.
In addition to the calculation files, the main results are collected in DAT files: the total energy (without spin polarization, AFM state, FM state), the band gap (without spin polarization and AFM state), and the maximum spin moment (only for the AFM state).
(D) GW calculations
Calculation files for the GW calculations, performed for 4-ZGNR, 5-ZGNR, and 6-ZGNR. They are calculated at the GW@PBE level and compared against calculations on the DFT/PBE and DFT/HSE06 level. Calculations are performed using FHI-aims with a tier 1 or tier 2 basis set and varying k-grids as visible from the file names. For each calculation, the input and output files are stored. They are sorted into subdirectories by their properties in the following order:
* Studied system,
* Method (DFT or GW),
* Functional, and
* Basis set and k-grid.
(E) Parametrization of TB with Hubbard-U (TB+U)
The parametrization of TB+U was done in two steps: (1) parametrization of the 1st NN hopping element $t_1$ and (2) the subsequent parametrization of the Hubbard-U, using the previously parametrized $t_1$.
(1) Parametrization of $t_1$
The parametrization of $t_1$ was done using the ZGNR-G systems available for DFT calculations. The TB and DFT calculations without spin polarization were used from steps B and C. Systems were excluded from the data set if the position of the DFT band gap was not reproduced in TB, leaving 372 ZGNR-G systems in the data set. The parametrization itself was done by linear regression of the DFT band gap in eV as a function of the TB band gap in units of $t_1$, resulting in y=3.328x-0.072 (R^2=0.951), giving $t_1$=3.328 eV. The TB and DFT band gaps are stored in the file "step1_parametrize_t1.dat"; the calculation files are taken directly from steps B and C.
(2) Parameterization of $U$
The parameterization of the Hubbard-U was done by first running TB+U calculations with different $U$ values. For this purpose, we varied $U$ from 0 to 5 in intervals of 0.2, using $t_1$=-1 to keep this step independent of the parametrization of $t_1$. The resulting band gaps are stored in DAT files in the subdirectory "calc_step2_variation_U" with a single file per ZGNR-G system. To save storage space, we did not upload further calculation files - the input files are equivalent to those uploaded in step F, just with different values for $t_1$ and $U$.
Afterward, we used these results to obtain the optimal $U$ value for each system, focusing on systems that show a band gap opening in the AFM state on the DFT/HSE06 level. We performed the parametrization by identifying which value of $U$ in each system gives the best agreement of the TB+U band gap with the DFT band gap in the AFM state, using the calculations from step C and interpolating linearly between the $U$ values of the scan described above. We then ran a TB+U calculation with the obtained $U$ value to check the agreement with the DFT calculations. We generally obtained good agreement with a few exceptions that were filtered out: systems that resulted in a $U$ of zero and those without a band gap opening in the AFM state of the TB+U calculation. The results of the remaining 414 ZGNR-G systems are summarized in "step2_parametrize_U.dat". The final value of $U$ was obtained by averaging over those systems, yielding an average of 1.720 $t_1$, equivalent to 5.723 eV. The calculation files used for parametrization, including those filtered out, are stored in "calc_step2_TB+U_calculations". Plots comparing the band structures on DFT/HSE06 level, TB, and TB+U are in "plots_step2_fit_agreement".
(F) TB+U calculations
Calculation files for the complete set of structures:
* Structure file in XYZ format
* PythTB input file (onsite energy $\alpha$=0, 1st-NN hopping element $t_1$=-1, $U$=1.72 $t_1$)
* Calculation results in JSON format
* Plot of the band structure from TB vs. TB+U (AFM state)
* Plot of spin moments as an overlay over the atomic structure
The data files contain the band gaps on TB and TB+U level ("results_band_gaps.dat"), the position of VBM and CBM on TB and TB+U level ("results_band_edge_positions.dat"), and spin momentum quantities ("results_spin_moments.dat"). Please note that, compared to the JSON files, a factor of 2 is applied to obtain the spin moment; this corrects the PythTB calculations, which multiply the final spin-polarization by a factor of 1/2 to account for electrons being particles with spin 1/2.
(G) ZGNR systems
ZGNR systems without gulf edges are included in the data set as a reference system. Structures are included in the subdirectory "structures" with widths $N$ from 2 to 50, analogously to part A. For all ZGNRs, DFT and TB+U calculations were performed. The provided files are equivalent to parts C and F. Additionally, for the DFT calculations with spin polarization, files for the maximum, minimum, and average (over the carbon atoms) spin moments are provided, distinguished by results from Mulliken and Hirshfeld analysis. The plots of the spin moments are also given for both the Mulliken and Hirshfeld analysis results.
The data files contain the band gaps of the AFM state on DFT and TB+U level ("results_band_gaps_AFM_state.dat"), the total energy of the DFT calculations without and with spin-polarization in the AFM and FM state ("results_total_energies_DFT.dat"), as well as the maximum, minimum, and average (averaged over the C atoms) spin moment of the TB+U and DFT calculations, distinguished by Mulliken and Hirshfeld analysis ("results_spin-moments_maximum.dat," "results_spin-moments_minimum.dat," "results_spin-moments_average.dat"). Please note that, compared to the JSON files, a factor of 2 is applied to obtain the spin moment for the TB+U calculations.
(H) Calculations for different $U$ values
TB+U calculations similar to part F were performed for ZGNR and ZGNR-G systems. The main difference is that different values of $U$ were used: 1.20, 1.50, 1.72, and 2.00 in units of $t_1$. Please note that, compared to the JSON files, a factor of 2 is applied to obtain the spin moment for the TB+U calculations.
The train data for the DFDC competition is big, almost 500 GB, so I hope it can be useful to have all the JSON files and the metadata in one dataframe.
The dataset includes, for each video file
Simple analysis of the dataset can be found at: https://www.kaggle.com/zaharch/looking-at-the-full-train-set-metadata
License: MIT (https://opensource.org/licenses/MIT)
License information was derived automatically
These are 7 electrocardiograms (EKGs or ECGs) from 7 patients that are roughly 14-22 hours each. These were recorded as part of a joint effort between MIT and Beth Israel Hospital in Boston, MA, and are one of dozens of datasets with electrocardiogram data.
These EKGs are CSVs of voltage data from real hearts in real people with varying states of health.
EKGs, or electrocardiograms, measure the heart's function by looking at its electrical activity. The electrical activity in each part of the heart is supposed to happen in a particular order and intensity, creating that classic "heartbeat" line (or "QRS complex") you see on monitors in medical TV shows. Every part of this line is supposed to be a specific height, width, and distance from the others (https://www.youtube.com/watch?v=CNN30YHsJw0) in a theoretically "healthy" heartbeat.
There are a few types of EKGs (4-lead, 5-lead, 12-lead, etc.), which give us varying detail about the heart. A 12-lead is one of the most detailed types of EKGs, as it allows us to get 12 different outputs or graphs, all looking at different, specific parts of the heart muscles. If you were to take two leads of the EKG (two physical wires) and draw an imaginary line in between them going through the patient's chest, whichever part of the heart muscle that this line goes through is the part of the heart that the lead is "reading" voltage from.
This dataset only publishes two leads from each patient's 12-lead EKG, since that is all that the original MIT-BIH database provided.
Each patient has 6 files:
12345_ekg.csv - The 14- to 22-hour electrocardiogram as two channels of voltage measurements (millivolts) for one patient, with the locations of annotations as an additional column
12345_ekg.json - The 14- to 22-hour electrocardiogram, plus metadata, like sample rate, patient age, patient gender, etc.
12345_annotations.csv - The locations of miscellaneous annotations made by doctors or EKG technicians. See annotation_symbols.csv for the annotations' meanings.
12345_annotations.json - The same data as 12345_annotations.csv in addition to metadata

To get started, you will probably want the *_ekg.csv files. Generally, the .csv files have just the voltage data and the locations of annotations made by doctors/technicians. The .json files have all of that data in addition to metadata (such as sample rate, ADC gain, patient age, and more).
The data was collected at 128 Hz (or 128 samples per second). This means that if you get the first 128 elements from the EKG array, you have 1 second of heartbeat data.
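For example, here is a minimal pandas sketch for extracting the first second of data from one of the *_ekg.csv files; the patient ID 12345 follows the naming pattern above, and the column layout is only assumed (two voltage channels plus an annotation-location column):

import pandas as pd

SAMPLE_RATE = 128  # Hz, as stated above

ekg = pd.read_csv('12345_ekg.csv')     # hypothetical patient file following the naming pattern above
first_second = ekg.iloc[:SAMPLE_RATE]  # 128 samples = 1 second of heartbeat data
print(first_second.shape)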
A "QRS complex" is the big spike in the classic heartbeat blip that you may see on your smartwatch or in a hospital show on TV.
In this dataset, doctors and EKG technicians have labeled the locations of the complexes, and by extension the location of each heartbeat. This can help you not only identify Q, R, and S waves right away, but also feed these heartbeats into hand-written or machine learning algorithms to start identifying and classifying heartbeats--though this is only one of many datasets you might want to train an algorithm on, since there are hundreds of types of arrhythmias (https://litfl.com/ecg-library/diagnosis/) (or "bad" heart rhythms).
Check out Ninja Nerd's EKG Basics tutorial on YouTube to understand what each part of the QRS complex (or heartbeat) means from an electrical standpoint.
Typically, electrocardiogram datasets will specify which channels from the 12-lead EKG that the data came from. For example, the EKG for patient 100 from our other MIT-BIH Arrhythmia Database dataset came with two channels: Lead II and V5. Other EKGs in the many MIT-BIH EKG datasets may have channels Lead I and V4, or Lead II and V2, and so on.
For some reason, the channels in this dataset were not labeled with the actual 12-lead EKG ch...
Sample data in GeoJSON format available for download for testing purposes.