Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.
By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.
Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.
The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!
While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.
The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
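As an illustration (not an official utility; the exact zero-padding and file extensions should be checked against the files themselves), the id-to-folder mapping described above can be computed as:

def kernel_version_dir(kernel_version_id: int) -> str:
    # Top-level folder groups ids by millions, the sub folder by thousands,
    # e.g. id 123456789 -> "123/456"
    top = kernel_version_id // 1_000_000
    sub = (kernel_version_id // 1_000) % 1_000
    return f"{top}/{sub}"

print(kernel_version_dir(123456789))  # -> 123/456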
The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays
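For the GCS bucket, a minimal sketch using the google-cloud-storage Python client; the billing project id and the object name below are placeholders:

from google.cloud import storage

# Requester pays: the project given here is billed for the download
client = storage.Client()
bucket = client.bucket("kaggle-meta-kaggle-code-downloads", user_project="your-gcp-project-id")
blob = bucket.blob("some/kernel/version.ipynb")  # placeholder object path
blob.download_to_filename("version.ipynb")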
We love feedback! Let us know in the Discussion tab.
Happy Kaggling!
License: https://choosealicense.com/licenses/other/
KBP37 is a revision of the MIML-RE annotation dataset provided by Gabor Angeli et al. (2014). They use both the 2010 and 2013 KBP official document collections, as well as a July 2013 dump of Wikipedia, as the text corpus for annotation. In total, 33,811 sentences were annotated. Zhang and Wang made several refinements:
1. They add direction to the relation names, e.g. 'per:employee_of' is split into 'per:employee_of(e1,e2)' and 'per:employee_of(e2,e1)'. They also replace 'org:parents' with 'org:subsidiaries' and 'org:member_of' with 'org:member' (by their reverse directions).
2. They discard low-frequency relations, keeping only relations for which both directions occur more than 100 times in the dataset.
KBP37 contains 18 directional relations and an additional 'no_relation' relation, resulting in 37 relation classes.
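As an illustration of the resulting label space (the three base relation names below are just the examples mentioned above, not the full list of 18):

# 18 base relations, each with two directions, plus 'no_relation' -> 37 classes
base_relations = ["per:employee_of", "org:subsidiaries", "org:member"]  # ...plus 15 more
labels = [f"{r}(e1,e2)" for r in base_relations]
labels += [f"{r}(e2,e1)" for r in base_relations]
labels.append("no_relation")
# With all 18 base relations this yields len(labels) == 37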
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This is the readme for the supplemental data for our ICDAR 2019 paper.
You can read our paper via IEEE here: https://ieeexplore.ieee.org/document/8978202
If you found this dataset useful, please consider citing our paper:
@inproceedings{DBLP:conf/icdar/MorrisTE19,
author = {David Morris and
Peichen Tang and
Ralph Ewerth},
title = {A Neural Approach for Text Extraction from Scholarly Figures},
booktitle = {2019 International Conference on Document Analysis and Recognition,
{ICDAR} 2019, Sydney, Australia, September 20-25, 2019},
pages = {1438--1443},
publisher = {{IEEE}},
year = {2019},
url = {https://doi.org/10.1109/ICDAR.2019.00231},
doi = {10.1109/ICDAR.2019.00231},
timestamp = {Tue, 04 Feb 2020 13:28:39 +0100},
biburl = {https://dblp.org/rec/conf/icdar/MorrisTE19.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
This work was financially supported by the German Federal Ministry of Education and Research (BMBF) and European Social Fund (ESF) (InclusiveOCW project, no. 01PE17004).
We used different sources of data for testing, validation, and training. Our testing set was assembled from the work by Böschen et al. that we cited. We excluded the DeGruyter dataset from that set and used it as our validation dataset.
These datasets contain a readme with license information. Further information about the associated project can be found in the authors' published work we cited: https://doi.org/10.1007/978-3-319-51811-4_2
The DeGruyter dataset does not include the labeled images due to license restrictions. As of writing, the images can still be downloaded from DeGruyter via the links in the readme. Note that depending on what program you use to strip the images out of the PDF they are provided in, you may have to re-number the images.
We used label_generator's generated dataset, which the author made available on a requester-pays amazon s3 bucket. We also used the Multi-Type Web Images dataset, which is mirrored here.
We have made our code available in code.zip. We will upload code, announce further news, and field questions via the github repo.
Our text detection network is adapted from Argman's EAST implementation. The EAST/checkpoints/ours subdirectory contains the trained weights we used in the paper.
We used a Tesseract script to run text extraction on detected text rows. This is included in our code archive (code.tar) as text_recognition_multipro.py.
We used a Java program provided by Falk Böschen and adapted to our file structure. We included this as evaluator.jar.
Parameter sweeps are automated by param_sweep.rb. This file also shows how to invoke all of these components.
Predicting the difficulty of playing a musical score plays a pivotal role in structuring and exploring score collections, with significant implications for music education. The automatic difficulty classification of piano scores, however, remains an unsolved challenge. This is largely due to the scarcity of annotated data and the inherent subjectiveness in the annotation process. The "Can I Play It?" (CIPI) dataset represents a substantial step forward in this domain, providing a machine-readable collection of piano scores paired with difficulty annotations from the esteemed Henle Verlag.
The CIPI dataset is meticulously assembled by aligning public domain scores with their corresponding difficulty labels sourced from Henle Verlag. This initial pairing was subsequently reviewed and refined by an expert pianist to ensure accuracy and reliability. The dataset is structured to facilitate easy access and interpretation, making it a valuable resource for researchers and educators alike.
Our work makes two primary contributions to the field of score difficulty classification. Firstly, we address the critical issue of data scarcity, introducing the CIPI dataset to the academic community. Secondly, we delve into various input representations derived from score information, utilizing pre-trained machine learning models tailored for piano fingering and expressiveness. These models draw inspiration from musicological definitions of performance, offering nuanced insights into score difficulty.
Through extensive experimentation, we demonstrate that an ensemble approach—combining outputs from multiple classifiers—yields superior results compared to individual classifiers. This highlights the diverse facets of difficulty captured by different representations. Our comprehensive experiments lay a robust foundation for future endeavors in score difficulty classification, and our best-performing model reports a balanced accuracy of 39.5% and a median square error of 1.1 across the nine difficulty levels introduced in this study.
The CIPI dataset, along with the associated code and models, is made publicly available to ensure reproducibility and to encourage further research in this domain. Users are encouraged to reference this resource in their work and to contribute to its ongoing development.
Ramoneda, P., Jeong, D., Eremenko, V., Tamer, N. C., Miron, M., & Serra, X. (2024). Combining Piano Performance Dimensions for Score Difficulty Classification. Expert Systems with Applications, 238, 121776. DOI: 10.1016/j.eswa.2023.121776
@article{Ramoneda2024,
  author  = {Pedro Ramoneda and Dasaem Jeong and Vsevolod Eremenko and Nazif Can Tamer and Marius Miron and Xavier Serra},
  title   = {Combining Piano Performance Dimensions for Score Difficulty Classification},
  journal = {Expert Systems with Applications},
  volume  = {238},
  pages   = {121776},
  year    = {2024},
  doi     = {10.1016/j.eswa.2023.121776},
  url     = {https://doi.org/10.1016/j.eswa.2023.121776}
}
pedro.ramoneda@upf.edu
xavier.serra@upf.edu
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains PDF-to-text conversions of scientific research articles, prepared for the task of data citation mining. The goal is to identify references to research datasets within full-text scientific papers and classify them as Primary (data generated in the study) or Secondary (data reused from external sources).
The PDF articles were processed using MinerU, which converts scientific PDFs into structured machine-readable formats (JSON, Markdown, images). This ensures participants can access both the raw text and layout information needed for fine-grained information extraction.
Each paper directory contains the following files:
- *_origin.pdf: the original PDF file of the scientific article.
- *_content_list.json: structured extraction of the PDF content, where each object represents a text or figure element with metadata. Example entry:
  {
    "type": "text",
    "text": "10.1002/2017JC013030",
    "text_level": 1,
    "page_idx": 0
  }
- full.md: the complete article content in Markdown format (linearized for easier reading).
- images/: folder containing figures and extracted images from the article.
- layout.json: page layout metadata, including positions of text blocks and images.
The aim is to detect dataset references in the article text and classify them:
- DOIs (Digital Object Identifiers): https://doi.org/[prefix]/[suffix], e.g. https://doi.org/10.5061/dryad.r6nq870
- Accession IDs: used by data repositories; the format varies by repository. Examples: GSE12345 (NCBI GEO), PDB 1Y2T (Protein Data Bank), E-MEXP-568 (ArrayExpress).

Each dataset mention must be labeled as Primary or Secondary, matching the ground truth in train_labels.csv.

train_labels.csv: ground truth with:
- article_id: research paper DOI
- dataset_id: extracted dataset identifier
- type: citation type (Primary / Secondary)

sample_submission.csv: example submission format.
Example:
Paper: https://doi.org/10.1098/rspb.2016.1151
Data: https://doi.org/10.5061/dryad.6m3n9
In-text span: "The data we used in this publication can be accessed from Dryad at doi:10.5061/dryad.6m3n9."
Citation type: Primary
This dataset enables participants to develop and test NLP systems for detecting dataset references in full-text scientific articles and classifying each citation as Primary or Secondary.
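A minimal, illustrative sketch (not an official baseline) of spotting such identifiers in the extracted text, assuming a *_content_list.json is a JSON array of objects like the example above; the regular expressions cover only the example formats shown:

import json
import re

DOI_RE = re.compile(r"10\.\d{4,9}/[^\s\"'<>]+")
ACCESSION_RE = re.compile(r"\b(?:GSE\d+|PDB\s?[0-9][A-Za-z0-9]{3}|E-MEXP-\d+)\b")

with open("example_content_list.json") as f:   # placeholder path
    elements = json.load(f)

for el in elements:
    if el.get("type") != "text":
        continue
    text = el.get("text", "")
    for match in DOI_RE.findall(text) + ACCESSION_RE.findall(text):
        print(el.get("page_idx"), match)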
This is a subset of the Zenodo-ML Dinosaur Dataset [Github] that has been converted to small png files and organized in folders by the language so you can jump right in to using machine learning methods that assume image input.
Included are .tar.gz files, each named after a file extension; when extracted, each produces a folder of the same name.
tree -L 1
.
├── c
├── cc
├── cpp
├── cs
├── css
├── csv
├── cxx
├── data
├── f90
├── go
├── html
├── java
├── js
├── json
├── m
├── map
├── md
├── txt
└── xml
And we can peep inside one of the (somewhat smaller) folders of the set to see that the subfolders are zenodo identifiers. A zenodo identifier corresponds to a single Github repository, which means the png files produced are chunks of code of that extension type from a particular repository.
$ tree map -L 1
map
├── 1001104
├── 1001659
├── 1001793
├── 1008839
├── 1009700
├── 1033697
├── 1034342
...
├── 836482
├── 838329
├── 838961
├── 840877
├── 840881
├── 844050
├── 845960
├── 848163
├── 888395
├── 891478
└── 893858
154 directories, 0 files
Within each folder (zenodo id) the files are prefixed by the zenodo id, followed by the index into the original image set array that is provided with the full dinosaur dataset archive.
$ tree m/891531/ -L 1
m/891531/
├── 891531_0.png
├── 891531_10.png
├── 891531_11.png
├── 891531_12.png
├── 891531_13.png
├── 891531_14.png
├── 891531_15.png
├── 891531_16.png
├── 891531_17.png
├── 891531_18.png
├── 891531_19.png
├── 891531_1.png
├── 891531_20.png
├── 891531_21.png
├── 891531_22.png
├── 891531_23.png
├── 891531_24.png
├── 891531_25.png
├── 891531_26.png
├── 891531_27.png
├── 891531_28.png
├── 891531_29.png
├── 891531_2.png
├── 891531_30.png
├── 891531_3.png
├── 891531_4.png
├── 891531_5.png
├── 891531_6.png
├── 891531_7.png
├── 891531_8.png
└── 891531_9.png
0 directories, 31 files
So what's the difference?
The difference is that these files are organized by extension type, and provided as actual png images. The original data is provided as numpy data frames, and is organized by zenodo ID. Both are useful for different things - this particular version is cool because we can actually see what a code image looks like.
How many images total?
We can count the number of total images:
find . -type f -name "*.png" | wc -l
3,026,993
The script to create the dataset is provided here. Essentially, we start with the top extensions as identified by this work (excluding actual image files) and then write each 80x80 image to an actual png image, organizing by extension then zenodo id (as shown above).
I tested a few methods to write the single channel 80x80 data frames as png images, and wound up liking cv2's imwrite function because it would save and then load the exact same content.
import cv2
cv2.imwrite(image_path, image)
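For instance, a quick round-trip check of that property (illustrative only) could be:

import cv2
import numpy as np

image = np.random.randint(0, 256, size=(80, 80), dtype=np.uint8)  # stand-in for a code image
cv2.imwrite("/tmp/check.png", image)
loaded = cv2.imread("/tmp/check.png", cv2.IMREAD_GRAYSCALE)
assert np.array_equal(image, loaded)   # what was saved is exactly what is loaded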
Given the above, it's pretty easy to load an image! Here is an example using scipy, and then for newer Python (if you get a deprecation message) using imageio.
image_path = '/tmp/data1/data/csv/1009185/1009185_0.png'
from imageio import imread
image = imread(image_path)
array([[116, 105, 109, ..., 32, 32, 32],
[ 48, 44, 48, ..., 32, 32, 32],
[ 48, 46, 49, ..., 32, 32, 32],
...,
[ 32, 32, 32, ..., 32, 32, 32],
[ 32, 32, 32, ..., 32, 32, 32],
[ 32, 32, 32, ..., 32, 32, 32]], dtype=uint8)
image.shape
(80,80)
# Deprecated
from scipy import misc
misc.imread(image_path)
Image([[116, 105, 109, ..., 32, 32, 32],
[ 48, 44, 48, ..., 32, 32, 32],
[ 48, 46, 49, ..., 32, 32, 32],
...,
[ 32, 32, 32, ..., 32, 32, 32],
[ 32, 32, 32, ..., 32, 32, 32],
[ 32, 32, 32, ..., 32, 32, 32]], dtype=uint8)
Remember that the values in the data are characters that have been converted to ordinal. Can you guess what 32 is?
ord(' ')
32
# And thus if you wanted to convert it back...
chr(32)
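And, continuing the example above, a row of ordinals can be turned back into an 80-character line of code:

# Decode one 80-value row of the image loaded above back into text
line = "".join(chr(v) for v in image[0])
print(repr(line))

# Or decode the whole 80x80 chunk at once
print("\n".join("".join(chr(v) for v in row) for row in image))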
So how t...
Restricted access: https://www.scilifelab.se/data/restricted-access/
Dataset with annotated 12-lead ECG records. The exams were taken in 811 counties in the state of Minas Gerais/Brazil by the Telehealth Network of Minas Gerais (TNMG) between 2010 and 2016, and organized by the CODE (Clinical Outcomes in Digital Electrocardiography) group.

Requesting access: Researchers affiliated with educational or research institutions may request access to this dataset. Requests will be analyzed on an individual basis and should contain: name of PI and host organisation; contact details (including your name and email); and the scientific purpose of the data access request. If approved, a data user agreement will be forwarded to the researcher who made the request (through the email that was provided). After the agreement has been signed (by the researcher or by the research institution), access to the dataset will be granted.

Openly available subset: A subset of this dataset (with 15% of the patients) is openly available. See: "CODE-15%: a large scale annotated dataset of 12-lead ECGs", https://doi.org/10.5281/zenodo.4916206.

Content: The folder contains a column-separated file containing basic patient attributes, and the ECG waveforms in the wfdb format.

Additional references: The dataset is described in the paper "Automatic diagnosis of the 12-lead ECG using a deep neural network", https://www.nature.com/articles/s41467-020-15432-4. Related publications also using this dataset are:
- [1] G. Paixao et al., "Validation of a Deep Neural Network Electrocardiographic-Age as a Mortality Predictor: The CODE Study," Circulation, vol. 142, no. Suppl_3, pp. A16883–A16883, Nov. 2020, doi: 10.1161/circ.142.suppl_3.16883.
- [2] A. L. P. Ribeiro et al., "Tele-electrocardiography and big data: The CODE (Clinical Outcomes in Digital Electrocardiography) study," Journal of Electrocardiology, Sep. 2019, doi: 10/gf7pwg.
- [3] D. M. Oliveira, A. H. Ribeiro, J. A. O. Pedrosa, G. M. M. Paixao, A. L. P. Ribeiro, and W. Meira Jr, "Explaining end-to-end ECG automated diagnosis using contextual features," in Machine Learning and Knowledge Discovery in Databases, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), Ghent, Belgium, Sep. 2020, vol. 12461, pp. 204–219, doi: 10.1007/978-3-030-67670-4_13.
- [4] D. M. Oliveira, A. H. Ribeiro, J. A. O. Pedrosa, G. M. M. Paixao, A. L. Ribeiro, and W. Meira Jr, "Explaining black-box automated electrocardiogram classification to cardiologists," in 2020 Computing in Cardiology (CinC), 2020, vol. 47, doi: 10.22489/CinC.2020.452.
- [5] G. M. M. Paixão et al., "Evaluation of mortality in bundle branch block patients from an electronic cohort: Clinical Outcomes in Digital Electrocardiography (CODE) study," Journal of Electrocardiology, Sep. 2019, doi: 10/dcgk.
- [6] G. M. M. Paixão et al., "Evaluation of Mortality in Atrial Fibrillation: Clinical Outcomes in Digital Electrocardiography (CODE) Study," Global Heart, vol. 15, no. 1, p. 48, Jul. 2020, doi: 10.5334/gh.772.
- [7] G. M. M. Paixão et al., "Electrocardiographic Predictors of Mortality: Data from a Primary Care Tele-Electrocardiography Cohort of Brazilian Patients," Hearts, vol. 2, no. 4, Dec. 2021, doi: 10.3390/hearts2040035.
- [8] G. M. Paixão et al., "ECG-Age from Artificial Intelligence: A New Predictor for Mortality? The CODE (Clinical Outcomes in Digital Electrocardiography) Study," Journal of the American College of Cardiology, vol. 75, no. 11, Supplement 1, p. 3672, 2020, doi: 10.1016/S0735-1097(20)34299-6.
- [9] E. M. Lima et al., "Deep neural network estimated electrocardiographic-age as a mortality predictor," Nature Communications, vol. 12, 2021, doi: 10.1038/s41467-021-25351-7.
- [10] W. Meira Jr, A. L. P. Ribeiro, D. M. Oliveira, and A. H. Ribeiro, "Contextualized Interpretable Machine Learning for Medical Diagnosis," Communications of the ACM, 2020, doi: 10.1145/3416965.
- [11] A. H. Ribeiro et al., "Automatic diagnosis of the 12-lead ECG using a deep neural network," Nature Communications, vol. 11, no. 1, p. 1760, 2020, doi: 10/drkd.
- [12] A. H. Ribeiro et al., "Automatic Diagnosis of Short-Duration 12-Lead ECG using a Deep Convolutional Network," Machine Learning for Health (ML4H) Workshop at NeurIPS, 2018.
- [13] A. H. Ribeiro et al., "Automatic 12-lead ECG classification using a convolutional network ensemble," 2020, doi: 10.22489/CinC.2020.130.
- [14] V. Sangha et al., "Automated Multilabel Diagnosis on Electrocardiographic Images and Signals," medRxiv, Sep. 2021, doi: 10.1101/2021.09.22.21263926.
- [15] S. Biton et al., "Atrial fibrillation risk prediction from the 12-lead ECG using digital biomarkers and deep representation learning," European Heart Journal - Digital Health, 2021, doi: 10.1093/ehjdh/ztab071.

Code: The following github repositories perform analysis that uses this dataset:
- https://github.com/antonior92/automatic-ecg-diagnosis
- https://github.com/antonior92/ecg-age-prediction

Related datasets:
- CODE-test: An annotated 12-lead ECG dataset (https://doi.org/10.5281/zenodo.3765780)
- CODE-15%: a large scale annotated dataset of 12-lead ECGs (https://doi.org/10.5281/zenodo.4916206)
- Sami-Trop: 12-lead ECG traces with age and mortality annotations (https://doi.org/10.5281/zenodo.4905618)

Ethics declarations: The CODE Study was approved by the Research Ethics Committee of the Universidade Federal de Minas Gerais, protocol 49368496317.7.0000.5149.
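As a minimal sketch for reading one record (assuming the wfdb Python package is installed; the record path is a placeholder):

import wfdb

# Pass the record path without a file extension
record = wfdb.rdrecord("path/to/some_exam")   # placeholder record name
print(record.fs, record.sig_name)             # sampling frequency and lead names
signals = record.p_signal                     # numpy array of shape (n_samples, n_leads)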
CODEBRIM: COncrete DEfect BRidge IMage Dataset for multi-target multi-class concrete defect classification in computer vision and machine learning.
Dataset as presented and detailed in our CVPR 2019 publication: http://openaccess.thecvf.com/content_CVPR_2019/html/Mundt_Meta-Learning_Convolutional_Neural_Architectures_for_Multi-Target_Concrete_Defect_Classification_With_CVPR_2019_paper.html or https://arxiv.org/abs/1904.08486. If you make use of the dataset please cite it as follows:
"Martin Mundt, Sagnik Majumder, Sreenivas Murali, Panagiotis Panetsos, Visvanathan Ramesh. Meta-learning Convolutional Neural Architectures for Multi-target Concrete Defect Classification with the COncrete DEfect BRidge IMage Dataset. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019"
We offer a supplementary GitHub repository with code to reproduce the paper and data loaders: https://github.com/ccc-frankfurt/meta-learning-CODEBRIM
For ease of use we provide the dataset in multiple different versions.
Files contained:
* CODEBRIM_original_images: contains the original full-resolution images and bounding box annotations
* CODEBRIM_cropped_dataset: contains the extracted crops/patches with corresponding class labels from the bounding boxes
* CODEBRIM_classification_dataset: contains the cropped patches with corresponding class labels split into training, validation and test sets for machine learning
* CODEBRIM_classification_balanced_dataset: similar to "CODEBRIM_classification_dataset" but with the exact replication of training images to balance the dataset in order to reproduce results obtained in the paper.
Data licence Germany – Attribution – Version 2.0: https://www.govdata.de/dl-de/by-2-0
License information was derived automatically
This dataset provides a comprehensive assessment of public transport connectivity across Germany by analyzing both walking distances to the nearest public transport stops and the quality of public transport connections for daily usage scenarios, with housing-level granularity on a country-wide scale. The data was generated through a novel approach that integrates multiple open data sources, simulation models, and visual analytics techniques, enabling researchers, policymakers, and urban planners to identify gaps and opportunities for transit network improvements.
Efficient and accessible public transportation is a critical component of sustainable urban development. However, many transit networks struggle to adequately serve diverse populations due to infrastructural, financial, and urban planning limitations. Traditional transit planning often relies on aggregated statistics, expert opinions, or limited surveys, making it difficult to assess transport accessibility at an individual household level. This dataset provides a data-driven and reproducible methodology for unbiased country-wide comparisons.
Find more information at https://mobility.dbvis.de.
| Title | OPTIMAP: A Dataset for Open Public Transport Infrastructure and Mobility Accessibility Profiles |
| Acronym | OPTIMAP |
| Download | https://mobility.dbvis.de/data-results/OPTIMAP_v2025-02-01.parquet (478MB, parquet) |
| License | Datenlizenz Deutschland - Namensnennung - Version 2.0 (dl-de-by/2.0) |
Please cite the dataset as: Maximilian T. Fischer, Daniel Fürst, Yannick Metz, Manuel Schmidt, Julius Rauscher, and Daniel A. Keim. OPTIMAP: A Dataset for Open Public Transport Infrastructure and Mobility Accessibility Profiles. Zenodo, 2025. doi: 10.5281/zenodo.14772646.
or, when using Bibtex
@dataset{MobilityProfiles.DatasetGermany.2025,
  author    = {Fischer, Maximilian T. and Fürst, Daniel and Metz, Yannick and Schmidt, Manuel and Rauscher, Julius and Keim, Daniel A.},
  title     = {OPTIMAP: A Dataset for Open Public Transport Infrastructure and Mobility Accessibility Profiles},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.14772646}
}
The dataset in the PARQUET format includes detailed accessibility measures for public transport at a fine-grained, housing-level resolution. It consists of four columns:
- lat, lng (float32): GPS coordinates (EPSG:4326) of each house in Germany, expensively compiled from the house coordinates (HK-DE) data provided by the 16 federal states under the EU INSPIRE regulations.
- MinDistanceWalking (int32): an approximate walking distance (in meters) to the nearest public transport stop from each registered building in Germany.
- scores_OVERALL (float32): a simulated, demographic- and scenario-weighted measure of public transport quality for daily usage, considering travel times, frequency, and coverage across various daily scenarios (e.g., commuting, shopping, medical visits). The results are represented in an artificial time unit to allow comparative analysis across locations.

The dataset was generated using a combination of open geospatial data and advanced transport simulation techniques.
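A minimal sketch for loading the file with pandas (the file name comes from the download table above; the 1 km threshold is just an example):

import pandas as pd

df = pd.read_parquet("OPTIMAP_v2025-02-01.parquet",
                     columns=["lat", "lng", "MinDistanceWalking", "scores_OVERALL"])

# Example: houses more than 1 km from the nearest public transport stop
far_from_stop = df[df["MinDistanceWalking"] > 1000]
print(len(far_from_stop), "houses are farther than 1 km from the nearest stop")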
The dataset enables multiple use cases across research, policy, and urban planning.
By offering high-resolution public transport accessibility data at housing-level granularity, this dataset contributes to a more transparent and objective understanding of urban mobility challenges. The integration of simulation models, demographic considerations, and scalable analytics provides a novel approach to evaluating and improving public transit systems. Researchers, city officials, and policymakers are encouraged to leverage this dataset to enhance transport infrastructure planning and accessibility.
This dataset contains both the approximate walking distances in meters and a weighted overall quality score in an artificial time unit for each individual house in Germany. More advanced versions are currently not publicly available. This base dataset is publicly available and adheres to open data licensing principles, enabling its reuse for scientific and policy-oriented studies.
While not part of this dataset, the scientific simulation used to create the results leverages public transit information via the National Access Point (NAP) DELFI as NeTEx, provided via GTFS feeds of Germany (CC BY 4.0).
Also, routing information used during the processing was based on Open Street Map contributors (CC BY 4.0).
Primarily, this dataset contains original and slightly processed housing locations (lat, lng) that were made available as part of the EU INSPIRE regulations, based on Directive (EU) 2019/1024 (of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information (recast)).
In Germany, the respective data is provided individually by the 16 federal states, with the following required attributions and license indications:
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
This dataset was created during the Programming Language Ecosystem project at TU Wien using the code in the repository https://github.com/ValentinFutterer/UsageOfProgramminglanguages2011-2023?tab=readme-ov-file.
The centerpiece of this repository is the usage_of_programming_languages_2011-2023.csv. This csv file shows the popularity of programming languages over the last 12 years in yearly increments. The repository also contains graphs created with the dataset. To get an accurate estimate of the popularity of programming languages, this dataset was created using three vastly different sources.
The dataset was created using the github repository above. As input data, three public datasets were used.
Taken from https://www.kaggle.com/datasets/pelmers/github-repository-metadata-with-5-stars/ by Peter Elmers. It is licensed under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/. It shows metadata information (no code) of all github repositories with more than 5 stars.
Taken from https://github.com/pypl/pypl.github.io/tree/master, put online by the user pcarbonn. It is licensed under CC BY 3.0 https://creativecommons.org/licenses/by/3.0/. It shows from 2004 to 2023 for each month the share of programming related google searches per language.
Taken from https://insights.stackoverflow.com/survey. It is licensed under Open Data Commons Open Database License (ODbL) v1.0 https://opendatacommons.org/licenses/odbl/1-0/. It shows from 2011 to 2023 the results of the yearly stackoverflow developer survey.
All these datasets were downloaded on 12.12.2023. They are all included in the github repository above.
The dataset contains a column for the year and then many columns for the different languages, denoting their usage in percent. Additionally, vertical barcharts and piecharts for each year plus a line graph for each language over the whole timespan as png's are provided.
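A small illustrative sketch of working with the CSV (the year column is assumed to be the first column, and the exact language column names may differ):

import pandas as pd

df = pd.read_csv("usage_of_programming_languages_2011-2023.csv")

year_col = df.columns[0]        # assumed to be the year column
df = df.set_index(year_col)
print(df["Python"])             # yearly share (in percent) for one language, if that column exists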
The languages that are going to be considered for the project can be seen here:
- Python
- C
- C++
- Java
- C#
- JavaScript
- PHP
- SQL
- Assembly
- Scratch
- Fortran
- Go
- Kotlin
- Delphi
- Swift
- Rust
- Ruby
- R
- COBOL
- F#
- Perl
- TypeScript
- Haskell
- Scala
This project is licensed under the Open Data Commons Open Database License (ODbL) v1.0: https://opendatacommons.org/licenses/odbl/1-0/.
TLDR: You are free to share, adapt, and create derivative works from this dataset as long as you attribute me, keep the database open (if you redistribute it), and continue to share-alike any adapted database under the ODbL.
Thanks go out to
- stackoverflow https://insights.stackoverflow.com/survey for providing the data from the yearly stackoverflow developer survey.
- the PYPL survey, https://github.com/pypl/pypl.github.io/tree/master for providing google search data.
- Peter Elmers, for crawling metadata on github repositories and providing the data https://www.kaggle.com/datasets/pelmers/github-repository-metadata-with-5-stars/.
This version of the CivilComments Dataset provides access to the primary seven labels that were annotated by crowd workers; the toxicity and other tags are values between 0 and 1 indicating the fraction of annotators that assigned these attributes to the comment text.
The other tags are only available for a fraction of the input examples. They are currently ignored for the main dataset; the CivilCommentsIdentities set includes those labels, but only consists of the subset of the data with them. The other attributes that were part of the original CivilComments release are included only in the raw data. See the Kaggle documentation for more details about the available features.
The comments in this dataset come from an archive of the Civil Comments platform, a commenting plugin for independent news sites. These public comments were created from 2015 - 2017 and appeared on approximately 50 English-language news sites across the world. When Civil Comments shut down in 2017, they chose to make the public comments available in a lasting open archive to enable future research. The original data, published on figshare, includes the public comment text, some associated metadata such as article IDs, publication IDs, timestamps and commenter-generated "civility" labels, but does not include user ids. Jigsaw extended this dataset by adding additional labels for toxicity, identity mentions, as well as covert offensiveness. This data set is an exact replica of the data released for the Jigsaw Unintended Bias in Toxicity Classification Kaggle challenge. This dataset is released under CC0, as is the underlying comment text.
For comments that have a parent_id also in the civil comments data, the text of the previous comment is provided as the "parent_text" feature. Note that the splits were made without regard to this information, so using previous comments may leak some information. The annotators did not have access to the parent text when making the labels.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('civil_comments', split='train')
for ex in ds.take(4):
  print(ex)
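Building on the snippet above, and since the labels are annotator fractions between 0 and 1, one common illustrative follow-up is to threshold them into a binary target (assuming the 'text' and 'toxicity' feature names):

import tensorflow as tf

# Treat a comment as toxic when at least half of the annotators tagged it as such
binary_ds = ds.map(lambda ex: {'text': ex['text'],
                               'label': tf.cast(ex['toxicity'] >= 0.5, tf.int32)})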
See the guide for more information on tensorflow_datasets.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The dataset was collected as part of Prac1 of the subject Typology and Data Life Cycle of the Master's Degree in Data Science at the Universitat Oberta de Catalunya (UOC).
The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the largest list on the site).
The original code used to retrieve the dataset can be found in the GitHub repository: github.com/scostap/goodreads_bbe_dataset
The data was retrieved in two sets, the first 30000 books and then the remaining 22478. Dates were not parsed and reformatted for the second chunk, so publishDate and firstPublishDate are represented in a mm/dd/yyyy format for the first 30000 records and as Month Day Year for the rest.
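An illustrative sketch of normalizing the two date styles with pandas (the file name is a placeholder and the format strings are an interpretation of the description above; values matching neither pattern become NaT):

import pandas as pd

df = pd.read_csv("goodreads_books.csv")   # placeholder file name

def parse_mixed_date(value):
    # First chunk: mm/dd/yyyy; second chunk: "Month Day Year"
    for fmt in ("%m/%d/%Y", "%B %d %Y"):
        try:
            return pd.to_datetime(value, format=fmt)
        except (ValueError, TypeError):
            continue
    return pd.NaT

for col in ("publishDate", "firstPublishDate"):
    df[col] = df[col].apply(parse_mixed_date)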
Book cover images can be optionally downloaded from the url in the 'coverImg' field. Python code for doing so and an example can be found on the github repo.
The 25 fields of the dataset are:
| Attributes | Definition | Completeness (%) |
| ------------- | ------------- | ------------- |
| bookId | Book Identifier as in goodreads.com | 100 |
| title | Book title | 100 |
| series | Series Name | 45 |
| author | Book's Author | 100 |
| rating | Global goodreads rating | 100 |
| description | Book's description | 97 |
| language | Book's language | 93 |
| isbn | Book's ISBN | 92 |
| genres | Book's genres | 91 |
| characters | Main characters | 26 |
| bookFormat | Type of binding | 97 |
| edition | Type of edition (ex. Anniversary Edition) | 9 |
| pages | Number of pages | 96 |
| publisher | Editorial | 93 |
| publishDate | publication date | 98 |
| firstPublishDate | Publication date of first edition | 59 |
| awards | List of awards | 20 |
| numRatings | Number of total ratings | 100 |
| ratingsByStars | Number of ratings by stars | 97 |
| likedPercent | Derived field, percent of ratings over 2 stars (as in GoodReads) | 99 |
| setting | Story setting | 22 |
| coverImg | URL to cover image | 99 |
| bbeScore | Score in Best Books Ever list | 100 |
| bbeVotes | Number of votes in Best Books Ever list | 100 |
| price | Book's price (extracted from Iberlibro) | 73 |
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains around 30 000 basic blocks whose energy consumption and execution time have been measured in isolation on the MSP430FR5969 microcontroller, at 1MHz. Basic blocks were executed in a worst case scenario regarding the MSP430 FRAM cache and CPU pipeline. The dataset creation process is described thoroughly in [1].
This dataset is composed of the following files:
- basic_blocks.tar.xz contains all basic blocks (BB) used in the dataset, in a custom JSON format
- data.csv / data.xlsx contain the measured energy consumption and execution time for each basic block

We first detail how the basic_blocks.tar.gz archive is organized, and then present the CSV/XLSX spreadsheet format.
We extracted the basic blocks from a subset of programs of the AnghaBench benchmark suite [2]. The basic_blocks.tar.gz archive consists of the extracted basic blocks organized as json files. Each json file corresponds to a C source file from AnghaBench and is given a unique identifier. An example json (137.json) is available here:
{
  "extr_pfctl_altq.c_pfctl_altq_init": [
    # Basic block 1
    [
      # Instruction 1 of BB1
      ["MOV.W", "#queue_map", "R13"],
      # Instruction 2 of BB1
      ["MOV.B", "#0", "R14"],
      # Instruction 3 of BB1
      ["CALL", "#hcreate_r", null]
    ],
    # Basic block 2
    [
      ....
    ]
  ]
}
The json contains a dict with only one key pointing to an array of basic blocks. This key is the name of the original C source file in AnghaBench from which the basic blocks were extracted (here extr_pfctl_altq.c_pfctl_altq_init.c). The array contains severals basic blocks, which are represented as an array of instructions, which are themselves represented as an array [OPCODE, OPERAND1, OPERAND2].
Then, each basic block can be identified uniquely using two ids: its file id and its offset in the file. In our example, basic block 1 can be identified by the json file id (137) and its offset in the file (0). Its ID is 137_0. This ID is used to make the mapping between a basic block and its energy consumption/execution time, with the data.csv/data.xlsx spreadsheet.
Energy consumption and execution time data are stored in the data.csv file. Here is the extract of the csv file corresponding to the basic block 137_0. The spreadsheet format is described below.
bb_id;nb_inst;max_energy;max_time;avg_time;avg_energy;energy_per_inst;nb_samples;unroll_factor
137_0;3;8.77;7.08;7.04;8.21;2.92;40;50
Spreadsheet format:
- bb_id: the unique identifier of a basic block (cf. Basic Blocks)
- nb_inst: the number of instructions in the basic block
- max_energy: the maximum energy consumption (in nJ) measured during the experiment
- max_time: the maximum execution time (in us) measured during the experiment
- avg_time: the average execution time (in us) measured during the experiment
- avg_energy: the average energy consumption (in nJ) measured during the experiment
- energy_per_inst: the average energy consumption per instruction (corresponds to avg_energy/nb_inst)
- nb_samples: how many times the basic block's energy consumption/execution time was measured
- unroll_factor: how many times the basic block was unrolled (cf. Basic Block Unrolling)

To measure the energy consumption and execution time on the MSP430, we need to be able to handle the scale difference between the measurement tool and the basic block execution time. This is achieved by duplicating the basic block multiple times while making sure to keep the worst-case memory layout, as explained in the paper. The number of times the basic block has been duplicated is called the unroll_factor.
Values of energy and time are always given per basic block, so they have already been divided by the unroll factor.
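As a minimal illustrative sketch (directory and file names are placeholders for wherever the archives were extracted), the two files can be joined for basic block 137_0:

import csv
import json

# Index the measurements by basic-block id (the CSV is semicolon-separated)
with open("data.csv", newline="") as f:
    rows = {row["bb_id"]: row for row in csv.DictReader(f, delimiter=";")}

bb_id = "137_0"
file_id, offset = bb_id.split("_")

# Each json has a single key mapping to the list of basic blocks of that source file
with open(f"basic_blocks/{file_id}.json") as f:   # placeholder directory name
    basic_blocks = next(iter(json.load(f).values()))
block = basic_blocks[int(offset)]

row = rows[bb_id]
print(len(block), "instructions,", row["avg_energy"], "nJ,", row["avg_time"], "us on average")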
The selected features after PCA analysis for both energy and time model are listed here: MOV.W_Rn_Rn, MOV.W_X(Rn)_X(Rn), CALL, MOV.B_#N_Rn, ADD.W_Rn_Rn, MOV.W_@Rn_Rn, MOV.W_X(Rn)_Rn, ADD.W_#N_Rn, PUSHM.W_#N_Rn, MOV.W_X(Rn)_ADDR, CMP.W_#N_Rn, MOV.W_&ADDR_X(Rn), MOV.W_Rn_X(Rn), BIS.W_Rn_Rn, RLAM.W_#N_Rn, SUB.W_#N_Rn, MOV.W_&ADDR_Rn, MOV.W_#N_X(Rn), CMP.W_Rn_Rn, BIT.W_ADDR_Rn, MOV.W_@Rn_X(Rn), ADD.W_#N_X(Rn), MOV.W_#N_Rn, AND.W_Rn_Rn, MOV.W_Rn_ADDR, SUB.W_Rn_Rn, MOV.W_ADDR_Rn, MOV.W_X(Rn)_&ADDR, MOV.W_ADDR_ADDR, JMP, ADD_#N_Rn, BIS.W_Rn_X(Rn), SUB_Rn_Rn, MOV.W_ADDR_X(Rn), ADDC_#N_X(Rn), MOV.B_Rn_Rn, CMP.W_X(Rn)_X(Rn), ADD_Rn_Rn, nb_inst, INV.W_Rn_, NOP_, ADD.W_X(Rn)_X(Rn), ADD.W_Rn_X(Rn), MOV.B_@Rn_Rn, BIS.W_X(Rn)_X(Rn), MOV.B_#N_X(Rn), MOV.W_#N_ADDR, AND.W_#N_ADDR, SUBC_X(Rn)_X(Rn), BIS.W_#N_X(Rn), SUB.W_X(Rn)_X(Rn), AND.B_#N_Rn, ADD_X(Rn)_X(Rn), MOV.W_@Rn_ADDR, MOV.W_&ADDR_ADDR, ADDC_Rn_Rn, AND.W_#N_X(Rn), SUB_#N_Rn, RRUM.W_#N_Rn, AND_ADDR_Rn, CMP.W_X(Rn)_ADDR, MOV.B_#N_ADDR, ADD.W_#N_ADDR, CMP.B_#N_Rn, SXT_Rn_, XOR.W_Rn_Rn, CMP.W_@Rn_Rn, ADD.W_@Rn_Rn, ADD.W_X(Rn)_Rn, AND.W_Rn_X(Rn), CMP.B_Rn_Rn, AND.W_X(Rn)_X(Rn), BIC.W_#N_Rn, BIS.W_#N_Rn, AND.B_#N_X(Rn), MOV.B_X(Rn)_X(Rn), AND.W_@Rn_Rn, MOV.W_#N_&ADDR, BIS.W_Rn_ADDR, SUB.W_X(Rn)_Rn, SUB.W_Rn_X(Rn), SUB_X(Rn)_X(Rn), MOV.B_@Rn_X(Rn), CMP.W_@Rn_X(Rn), ADD.W_X(Rn)_ADDR, CMP.W_Rn_X(Rn), BIS.W_@Rn_X(Rn), CMP.B_X(Rn)_X(Rn), RRC.W_Rn_, MOV.W_@Rn_&ADDR, CMP.W_#N_X(Rn), ADDC_X(Rn)_Rn, CMP.W_X(Rn)_Rn, BIS.W_X(Rn)_Rn, SUB_X(Rn)_Rn, MOV.B_X(Rn)_Rn, MOV.W_ADDR_&ADDR, AND.W_#N_Rn, RLA.W_Rn_, INV.W_X(Rn)_, XOR.W_#N_Rn, SUB.W_Rn_ADDR, BIC.W_#N_X(Rn), MOV.B_X(Rn)_ADDR, ADD_#N_X(Rn), SUB_Rn_X(Rn), MOV.B_&ADDR_Rn, MOV.W_Rn_&ADDR, ADD_X(Rn)_Rn, AND.W_X(Rn)_Rn, PUSHM.A_#N_Rn, RRAM.W_#N_Rn, AND.W_@Rn_X(Rn), BIS.B_Rn_X(Rn), SUB.W_@Rn_Rn, CLRC_, CMP.W_#N_ADDR, XOR.W_Rn_X(Rn), MOV.B_Rn_ADDR, CMP.B_X(Rn)_Rn, BIS.B_Rn_Rn, BIS.W_X(Rn)_ADDR, CMP.B_#N_X(Rn), CMP.W_Rn_ADDR, XOR.W_X(Rn)_Rn, MOV.B_Rn_X(Rn), ADD.B_#N_Rn
The trained machine learning model, tests, and local explanation code can be generated and found here: WORTEX Machine learning code
This work has received a French government support granted to the Labex CominLabs excellence laboratory and managed by the National Research Agency in the “Investing for the Future” program under reference ANR-10-LABX-07-01
Copyright 2024 Hector Chabot Copyright 2024 Abderaouf Nassim Amalou Copyright 2024 Hugo Reymond Copyright 2024 Isabelle Puaut
Licensed under the Creative Commons Attribution 4.0 International License
[1] Reymond, H., Amalou, A. N., Puaut, I. “WORTEX: Worst-Case Execution Time and Energy Estimation in Low-Power Microprocessors using Explainable ML” in 22nd International Workshop on Worst-Case Execution Time Analysis (WCET 2024)
[2] Da Silva, Anderson Faustino, et al. “Anghabench: A suite with one million compilable C benchmarks for code-size reduction.” 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 2021.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Racconnall is a dataset for object detection tasks - it contains F annotations for 1,726 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the open data repository to support and reproduce results in the paper "Land-free Bioenergy From Circular Agroecology -- A Diverse Option Space and Trade-offs." There are three types of files here:
1. Ready-to-use final results files of all strategies and scenarios referred to in the paper. They can be downloaded and used directly without running any code. They all have the same naming format for strategies/scenarios: Org = organic share, ConcRed = concentrate feeding reduction share, WasteRed = waste reduction share, and numbers refer to the share. E.g., Org0_ConcRed50_WasteRed75 is a strategy with 0% organic share, 50% concentrate feeding reduction, and 75% waste reduction.
   - NationalAncillaryBioenergyPotential_EJ.csv: the national potential of ancillary bioenergy in 2050 from all scenarios (units: EJ). Same in both pathways.
   - GlobalPotentialEnvironmentalImpacts_NutrientFirst.csv: environmental impacts of all scenarios from the pathway NutrientFirst. The first three rows refer to the combination of agroecological practices in place, which allows you to explore environmental impacts grouped by, e.g., different organic shares.
   - GlobalPotentialEnvironmentalImpacts_NegFirst.csv: same structure as the file above, but from the other pathway, NegativeFirst.
2. SOLmOutputs contains all original output files from our model SOLmV6.
3. DataCleaningKit has the Python code and an additional dataset of heat values to process the SOLmOutputs (2) and produce the final results files (1). (Tip: adjust input_path and output_path before running DataCleaning.py.)

Fei Wu (fei.wu@usys.ethz.ch), Delft, August 2023
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is a preliminary release of a dataset supported by the National Institute on Aging and the National Institutes of Health. The full dataset is described in a submission to Data in Brief.
Social relationships change across the lifespan as social networks narrow and motivational priorities shift. These changes may affect, or reflect, differences in how older adults make decisions related to processing social and non-social rewards. While we have shown initial evidence that older adults have a blunted response to some features of social reward, further work in larger samples is needed to probe the extent to which age-related differences translate to real world consequences, such as financial exploitation. To address this gap, we are conducting a 5-year study funded by the National Institute on Aging (NIH R01-AG067011). Over the course of the funding period (2021-2026), this study seeks to: 1) characterize neural responses to social rewards across adulthood; 2) relate those responses to risk for financial exploitation and sociodemographic factors tied to risk; and 3) examine changes in risk for financial exploitation over time in healthy and vulnerable groups of older adults. This paper describes the preliminary release of data for the larger study. Adults (N=114; 40 male / 70 female / 4 other or non-binary; 21-80 years of age, M = 42.78, SD = 17.13) were recruited from the community to undergo multi-echo fMRI while completing tasks that measure brain function during social reward and decision-making. Tasks probe neural response to social reward (e.g., peer vs. monetary feedback) and social context and closeness (e.g., sharing a monetary reward with a friend compared to a stranger). Neural response to social decision-making is probed via economic trust and ultimatum games. Functional data are complemented by a T1-weighted anatomical scan and diffusion-weighted imaging (DWI) to enable tractography. This dataset has extensive potential for re-use, including leveraging multimodal neuroimaging data and within-subject measures of fMRI data from different tasks – data features that are rarely seen in an adult lifespan dataset.
We note that participants 10584, 10951, and 11005 are missing dwi. This is due to chiller malfunctions during the sequence that halted data collection. We also note that not all participants have two runs of each task. This was due to time constraints during the scan visits.
Code related to this dataset can be found on GitHub (https://github.com/DVS-Lab/SRPAL-DataInBrief/code/).
Original sourcedata for behavioral data is included in the sourcedata folder. Due to privacy restrictions, we cannot release original sourcedata for the imaging data (i.e., DICOM files).
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Attention: This dataset is a summary and reformat pulled from github code.
You should make your own assumptions based on this. In fact, there is another dataset I formed through parsing that addresses several points:
- out of 500k python related items, most of them are python-ish, not pythonic
- the majority of the items here contain excessive licensing inclusion of original code
- the items here are sometimes not even python but have references
- there's a whole lot of gpl summaries…

See the full description on the dataset page: https://huggingface.co/datasets/jtatman/python-code-dataset-500k.
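If you still want to inspect it before deciding, a minimal sketch with the Hugging Face datasets library (the split name is assumed to be 'train'; column names are not verified here):

from datasets import load_dataset

ds = load_dataset("jtatman/python-code-dataset-500k", split="train")
print(ds)      # column names and number of rows
print(ds[0])   # inspect a first item before trusting the content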
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset and code needed to run the analyses for Study 3 highlighted in the article: Ziker, John P., Jerry Alan Fails, Kendall House, Jessi Boyer, Michael Wendell, Hollie Abele, Letizia Maukar, and Kayla Ramirez. 2025. “Parent–Child Adaptive Responses for Digital Resilience.” Social Sciences 14 (4): 1–24. https://doi.org/10.3390/socsci14040197. The dataset and code were originally made available here: https://github.com/johnziker/digitalResilienceofYouth
The following is the README of the original repository.
OpenGlue
=======================================
This is an implementation of the training, inference and evaluation scripts for OpenGlue under an open source license; our paper: OpenGlue: Open Source Graph Neural Net Based Pipeline for Image Matching.
SuperGlue is a method for learning feature matching using a graph neural network, proposed by a team (Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich) from Magic Leap. Official full paper: SuperGlue: Learning Feature Matching with Graph Neural Networks.
We present OpenGlue: a free open-source framework for image matching, that uses a Graph Neural Network-based matcher inspired by SuperGlue. We show that including additional geometrical information, such as local feature scale, orientation, and affine geometry, when available (e.g. for SIFT features), significantly improves the performance of the OpenGlue matcher. We study the influence of the various attention mechanisms on accuracy and speed. We also present a simple architectural improvement by combining local descriptors with context-aware descriptors.
This repo is based on PyTorch Lightning framework and enables user to train, predict and evaluate the model.
For local feature extraction, our interface supports Kornia detectors and descriptors along with our version of SuperPoint.
We provide an instruction on how to launch training on MegaDepth dataset and test the trained models on Image Matching Challenge.
This code is licensed under the MIT License. Modifications, distribution, commercial and academic uses are permitted. More information in LICENSE file.
1) Create folder MegaDepth, where your dataset will be stored.
mkdir MegaDepth && cd MegaDepth
2) Download and unzip MegaDepth_v1.tar.gz from official link.
You should now be able to see MegaDepth/phoenix directory.
3) We provide the lists of pairs for training and validation, link to download. Each line corresponds to one pair and has the following structure:
path_image_A path_image_B exif_rotationA exif_rotationB [KA_0 ... KA_8] [KB_0 ... KB_8] [T_AB_0 ... T_AB_15] overlap_AB
overlap_AB is the value of overlap between two images of the same scene; it shows how close (in position transformation) the two images are.
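For reference, a small illustrative parser for one such line (not part of the official repository), following the field layout above:

import numpy as np

def parse_pair_line(line: str):
    # path_A path_B rotA rotB KA(9) KB(9) T_AB(16) overlap -> 39 whitespace-separated tokens
    tokens = line.split()
    path_a, path_b = tokens[0], tokens[1]
    rot_a, rot_b = int(tokens[2]), int(tokens[3])
    K_a = np.array(tokens[4:13], dtype=np.float64).reshape(3, 3)
    K_b = np.array(tokens[13:22], dtype=np.float64).reshape(3, 3)
    T_ab = np.array(tokens[22:38], dtype=np.float64).reshape(4, 4)
    overlap = float(tokens[38])
    return path_a, path_b, rot_a, rot_b, K_a, K_b, T_ab, overlap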
The resulting directory structure should be as follows:
MegaDepth/
- pairs/
| - 0000/
| | - sparse-txt/
| | | pairs.txt
...
- phoenix/S6/zl548/MegaDepth_v1/
| -0000/
| | - dense0/
| | | - depths/
| | | | id.h5
...
| | | - images/
| | | | id.jpg
...
| | - dense1/
...
...
We also release the open-source weights for a pretrained OpenGlue on this dataset.
This repository is divided into several modules:
* config - configuration files with training hyperparameters
* data - preprocessing and dataset for MegaDepth
* examples - code and notebooks with examples of applications
* models - module with OpenGlue architecture and detector/descriptors methods
* utils - losses, metrics and additional training utils
For all necessary modules refer to requirements.txt
pip3 install -r requirements.txt
This code is compatible with:
* Python >= 3.6.9
* PyTorch >= 1.10.0
* PyTorch Lightning >= 1.4.9
* Kornia >= 0.6.1
* OpenCV >= 4.5.4
There are two options for feature extraction:
1) Extract features during training. No additional steps are required before launching training.
2) Extract and save features before training. We suggest using this approach, since training time is decreased immensely with pre-extracted features...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset corresponds to our extended analysis done for “The Secret Life of Hackathon Code Where does it come from and where does it go?” (https://doi.org/10.1109/MSR52588.2021.00020, pre-print at: https://arxiv.org/abs/2103.01145) and “Tracking Hackathon Code Creation and Reuse” (https://doi.org/10.1109/MSR52588.2021.00085, pre-print at: https://arxiv.org/pdf/2103.10167). The replication package, including the scripts used for generating this dataset from the “World of Code” (https://worldofcode.org/) dataset, is available on GitHub: https://github.com/woc-hack/track_hack.
The dataset contains the blob hashes used in the scope of the analysis and the analysis outcome.
The columns are as follows:
DevpostID: Devpost identification for the hackathon project and it can be used to get the URL for the devpost.com website. Example DevpostID -q9nd5 can be translated to https://devpost.com/software/-q9nd5
ProjectID: The Github project name
HackathonEndDate: Hackathon event end date
BlobHash: The blob hash used in the analysis
BeforeHackathon-DuringHackathon-AfterHackathon: This column represents whether the blob was first introduced before/during/after the hackathon (1: before, 2: during, 3: after)
SameAuthor-Contributor-OtherAuthor: This column represents whether the blob was first created by someone on the hackathon team, by someone who had previously contributed to a project that a hackathon team member also contributed to (contributor), or by someone else outside of the hackathon team (1: author is a hackathon team member, 2: author contributed before with a hackathon team member, 3: author is not related to the hackathon team).
UsedBySmallProject-UsedByMediumProject-UsedByLargeProject: This column represents whether the hackathon blob was reused after the hackathon event and the size of the project that reused the code (1: not reused, 3: reused in small project, 4: reused in medium project, 5: reused in large project)
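For convenience, an illustrative decoding of these integer codes into readable labels (the CSV file name is a placeholder):

import pandas as pd

df = pd.read_csv("hackathon_blobs.csv")   # placeholder file name

when_map = {1: "before hackathon", 2: "during hackathon", 3: "after hackathon"}
author_map = {1: "team member", 2: "prior contributor of a team member", 3: "outside the team"}
reuse_map = {1: "not reused", 3: "small project", 4: "medium project", 5: "large project"}

df["introduced"] = df["BeforeHackathon-DuringHackathon-AfterHackathon"].map(when_map)
df["author_relation"] = df["SameAuthor-Contributor-OtherAuthor"].map(author_map)
df["reuse"] = df["UsedBySmallProject-UsedByMediumProject-UsedByLargeProject"].map(reuse_map)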