100+ datasets found

Z
Data from: A Large-scale Dataset of (Open Source) License Text Variants
data.niaid.nih.gov
Updated Mar 31, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stefano Zacchiroli (2022). A Large-scale Dataset of (Open Source) License Text Variants [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6379163
Explore at:
Dataset updated
Mar 31, 2022
Dataset authored and provided by
Stefano Zacchiroli
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it we have collected from the Software Heritage archive—the largest publicly available archive of FOSS source code with accompanying development history—all versions of files whose names are commonly used to convey licensing terms to software users and developers. The dataset consists of 6.5 million unique license files that can be used to conduct empirical studies on open source licensing, training of automated license classifiers, natural language processing (NLP) analyses of legal texts, as well as historical and phylogenetic studies on FOSS licensing. Additional metadata about shipped license files are also provided, making the dataset ready to use in various contexts; they include: file length measures, detected MIME type, detected SPDX license (using ScanCode), example origin (e.g., GitHub repository), oldest public commit in which the license appeared. The dataset is released as open data as an archive file containing all deduplicated license blobs, plus several portable CSV files for metadata, referencing blobs via cryptographic checksums.

For more details see the included README file and companion paper:

Stefano Zacchiroli. A Large-scale Dataset of (Open Source) License Text Variants. In proceedings of the 2022 Mining Software Repositories Conference (MSR 2022). 23-24 May 2022 Pittsburgh, Pennsylvania, United States. ACM 2022.

If you use this dataset for research purposes, please acknowledge its use by citing the above paper.
Open Source And General Resource Software
catalog.data.gov
data.staging.idas-ds1.appdat.jsc.nasa.gov
+1more
Updated May 31, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.nasa.gov (2025). Open Source And General Resource Software [Dataset]. https://catalog.data.gov/dataset/open-source-and-general-resource-software
Explore at:
Dataset updated
May 31, 2025
Dataset provided by
NASAhttp://nasa.gov/
Description
This dataset lists out all software in use by NASA
Data from: Standards Incorporated by Reference (SIBR) Database
catalog.data.gov
data.nist.gov
+1more
Updated Sep 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
National Institute of Standards and Technology (2023). Standards Incorporated by Reference (SIBR) Database [Dataset]. https://catalog.data.gov/dataset/standards-incorporated-by-reference-sibr-database
Explore at:
Dataset updated
Sep 30, 2023
Dataset provided by
National Institute of Standards and Technologyhttp://www.nist.gov/
Description
This is a searchable historical collection of standards referenced in regulations - Voluntary consensus standards, government-unique standards, industry standards, and international standards referenced in the Code of Federal Regulations (CFR).
I
Cline Center Coup d’État Project Dataset
databank.illinois.edu
Updated May 11, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Buddy Peyton; Joseph Bajjalieh; Dan Shalmon; Michael Martin; Emilio Soto (2025). Cline Center Coup d’État Project Dataset [Dataset]. http://doi.org/10.13012/B2IDB-9651987_V7
Explore at:
Unique identifier
https://doi.org/10.13012/B2IDB-9651987_V7
Dataset updated
May 11, 2025
Authors
Buddy Peyton; Joseph Bajjalieh; Dan Shalmon; Michael Martin; Emilio Soto
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Coups d'Ètat are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy) the type of actor(s) who initiated the coup (i.e., military, rebels, etc.), as well as the fate of the deposed leader. Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 to a conspiracy. Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022. Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup. Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event. Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include: • Reconciling missing event data • Removing events with irreconcilable event dates • Removing events with insufficient sourcing (each event needs at least two sources) • Removing events that were inaccurately coded as coup events • Removing variables that fell below the threshold of inter-coder reliability required by the project • Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries • Extending the period covered from 1945-2005 to 1945-2019 • Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011)
Items in this Dataset 1. Cline Center Coup d'État Codebook v.2.1.3 Codebook.pdf - This 15-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised February 2024 2. Coup Data v2.1.3.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1000 observations. Revised February 2024 3. Source Document v2.1.3.pdf - This 325-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. Revised February 2024 4. README.md - This file contains useful information for the user about the dataset. It is a text file written in markdown language. Revised February 2024
Citation Guidelines 1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2024. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7 2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2024. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
Longitudinal Microbial Source Tracking Dataset
catalog.data.gov
Updated Apr 25, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2025). Longitudinal Microbial Source Tracking Dataset [Dataset]. https://catalog.data.gov/dataset/longitudinal-microbial-source-tracking-dataset
Explore at:
Dataset updated
Apr 25, 2025
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
Dataset describes measurements of host-associated qPCR genetic markers along with other water quality parameters and precipitation from samples collected at marine, estuary, and freshwater recreational sites. Additional details provided in attached Dataset Description document. “This research dataset has been reviewed in accordance with U.S. Environmental Protection Agency (U.S. EPA), Office of Research and Development, and approved for release. Mention of brand names or vendors does not constitute an endorsement of products or services by the U.S. EPA.”
w
Dataset of news about Uzbekistan
workwithdata.com
Updated May 16, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Work With Data (2025). Dataset of news about Uzbekistan [Dataset]. https://www.workwithdata.com/datasets/news?f=1&fcol0=page_name&fop0=%3D&fval0=Uzbekistan
Explore at:
Dataset updated
May 16, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Uzbekistan
Description
This dataset is about news. It has 228 rows and is filtered where the keywords includes Uzbekistan. It features 10 columns including source, publication date, section, and news link.
Data from: NICHE: A Curated Dataset of Engineered Machine Learning Projects...
figshare.com
txt
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO (2023). NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python [Dataset]. http://doi.org/10.6084/m9.figshare.21967265.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.21967265.v1
Dataset updated
May 30, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts in filtering those projects to curate ML projects of high quality. The limited availability of such high-quality dataset poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidences of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide "NICHE.csv" file that contains the list of the project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.

GitHub page: https://github.com/soarsmu/NICHE
P
Webly-Reference SR Dataset Dataset
paperswithcode.com
Updated Jun 2, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yuming Jiang; Kelvin C. K. Chan; Xintao Wang; Chen Change Loy; Ziwei Liu (2021). Webly-Reference SR Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/webly-reference-sr-dataset
Explore at:
Dataset updated
Jun 2, 2021
Authors
Yuming Jiang; Kelvin C. K. Chan; Xintao Wang; Chen Change Loy; Ziwei Liu
Description
Webly-Reference SR dataset is a test dataset for evaluating Ref-SR methods. It has the following advantages:

Collected in a more realistic way: For every input image, its reference image is searched using Google Image. More diverse than previous datasets.
i
Photoacoustic Source Detection and Reflection Artifact Deep Learning Dataset...
ieee-dataport.org
Updated Jun 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Derek Allman (2025). Photoacoustic Source Detection and Reflection Artifact Deep Learning Dataset [Dataset]. https://ieee-dataport.org/open-access/photoacoustic-source-detection-and-reflection-artifact-deep-learning-dataset
Explore at:
Dataset updated
Jun 17, 2025
Authors
Derek Allman
Description
circular
w
Dataset of news where entities equals cryptos and section equals business
workwithdata.com
Updated May 16, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Work With Data (2025). Dataset of news where entities equals cryptos and section equals business [Dataset]. https://www.workwithdata.com/datasets/news?f=2&fcol0=entities&fcol1=section&fop0=%3D&fop1=%3D&fval0=cryptos&fval1=business
Explore at:
Dataset updated
May 16, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about news. It has 2,973 rows and is filtered where the entities includes cryptos and the section is business. It features 10 columns including source, publication date, section, and news link.
a
Data from: LMR: A Large-Scale Multi-Reference Dataset for Reference-based...
academictorrents.com
bittorrent
Updated May 27, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
None (2023). LMR: A Large-Scale Multi-Reference Dataset for Reference-based Super-Resolution [Dataset]. https://academictorrents.com/details/39424bb06d9172ac1c50fe4426eca51697bb4bfc
Explore at:
bittorrent(56048390273)Available download formats
Dataset updated
May 27, 2023
Authors
None
License
https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified
Description
It is widely agreed that reference-based super-resolution (RefSR) achieves superior results by referring to similar high quality images, compared to single image super-resolution (SISR). Intuitively, the more references, the better performance. However, previous RefSR methods have all focused on single-reference image training, while multiple reference images are often available in testing or practical applications. The root cause of such training-testing mismatch is the absence of publicly available multi-reference SR training datasets, which greatly hinders research efforts on multi-reference super-resolution. To this end, we construct a large-scale, multi-reference super-resolution dataset, named LMR. It contains 112,142 groups of 300x300 training images, which is 10x of the existing largest RefSR dataset. The image size is also much larger. More importantly, each group is equipped with 5 reference images with different similarity levels. Furthermore, we propose a new baseline metho
A
‘All Datasets’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 14, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘All Datasets’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-europa-eu-all-datasets-0d4e/a1b5ff87/?iid=003-250&v=presentation
Explore at:
Dataset updated
Jan 14, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘All Datasets’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from http://data.europa.eu/88u/dataset/5a546923a3a7295c2417f21f on 14 January 2022.

--- Dataset description provided by original source is as follows ---

This dataset represents the global coverage of navitia.io.

It contains all the datasets we have so far in our database.

--- Original source retains full ownership of the source dataset ---
Z
Global remote industrial heat sources dataset
data.niaid.nih.gov
zenodo.org
Updated Jan 26, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ma Caihong (2024). Global remote industrial heat sources dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8308132
Explore at:
Dataset updated
Jan 26, 2024
Dataset authored and provided by
Ma Caihong
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data content: Based on the VIIRS (Visible Infrared Imaging Radiometer Suite) sensor medium resolution 375mNPP-VIIRS active thermal anomaly data, field research, and other big data of the earth, we constructed the global continental region of high-energy-consuming industrial heat source product data set, totaling 25,544 data. After validation 23232 items are industrial heat source objects, and the recognition accuracy is 90.95%. The output format is shapefile.

Time range of data:2012-2021 Spatial scope: Global continental area Projection method: WGS84 Volume of data: The total volume of data is about 3346kb. Type of data: Vector
w
Dataset of news about countries yearly
workwithdata.com
Updated May 16, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Work With Data (2025). Dataset of news about countries yearly [Dataset]. https://www.workwithdata.com/datasets/news?f=1&fcol0=entities&fop0=%3D&fval0=countries_yearly
Explore at:
Dataset updated
May 16, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about news. It has 4,538 rows and is filtered where the entities includes countries_yearly. It features 10 columns including source, publication date, section, and news link.
Sample Leads Dataset
kaggle.com
Updated Jun 24, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ThatSean (2022). Sample Leads Dataset [Dataset]. https://www.kaggle.com/datasets/thatsean/sample-leads-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 24, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
ThatSean
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset is based on the Sample Leads Dataset and is intended to allow some simple filtering by lead source. I had modified this dataset to support an upcoming Towards Data Science article walking through the process. Link to be shared once published.
d
GeoNatShapes: a natural feature reference dataset for mapping and AI...
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2024). GeoNatShapes: a natural feature reference dataset for mapping and AI training [Dataset]. https://catalog.data.gov/dataset/geonatshapes-a-natural-feature-reference-dataset-for-mapping-and-ai-training
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Description
These data were compiled for the use of training natural feature machine learning (GeoAI) detection and delineation. The natural feature classes include the Geographic Names Information System (GNIS) feature types Basins, Bays, Bends, Craters, Gaps, Guts, Islands, Lakes, Ridges and Valleys, and are an areal representation of those GNIS point features. Features were produced using heads-up digitizing from 2018 to 2019 by Dr. Sam Arundel's team at the U.S. Geological Survey, Center of Excellence for Geospatial Information Science, Rolla, Missouri, USA, and Dr. Wenwen Li's team in the School of Geographical Sciences at Arizona State University, Tempe, Arizona, USA.
MultiCaRe
kaggle.com
Updated Apr 19, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mauro Nievas Offidani (2025). MultiCaRe [Dataset]. http://doi.org/10.34740/kaggle/ds/7190455
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/ds/7190455
Dataset updated
Apr 19, 2025
Dataset provided by
Kaggle
Authors
Mauro Nievas Offidani
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The MultiCaRe dataset contains multi-modal data from over 70,000 open access and de-identified case reports from PubMed Central. The full dataset includes metadata, clinical cases, image captions and more than 130,000 images, but this Kaggle dataset contains only the textual clinical cases and their embeddings.

The license of the dataset as a whole is CC BY-NC-SA. However, its individual contents may have less restrictive license types (CC BY, CC BY-NC, CC0). The license information and the citation data of each article can be found in the metadata.parquet file from the Zenodo repository.
CVEfixes Dataset: Automatically Collected Vulnerabilities and Their Fixes...
zenodo.org
zip
Updated Jul 28, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Leon Moonen; Leon Moonen; Linas Vidziunas; Linas Vidziunas (2024). CVEfixes Dataset: Automatically Collected Vulnerabilities and Their Fixes from Open-Source Software [Dataset]. http://doi.org/10.5281/zenodo.13118970
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.13118970
Dataset updated
Jul 28, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Leon Moonen; Leon Moonen; Linas Vidziunas; Linas Vidziunas
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CVEfixes is a comprehensive vulnerability dataset that is automatically collected and curated from Common Vulnerabilities and Exposures (CVE) records in the public U.S. National Vulnerability Database (NVD). The goal is to support data-driven security research based on source code and source code metrics related to fixes for CVEs in the NVD by providing detailed information at different interlinked levels of abstraction, such as the commit-, file-, and method level, as well as the repository- and CVE level.

This release, v1.0.8, covers all published CVEs up to 23 July 2024. All open-source projects that were reported in CVE records in the NVD in this time frame _and_ had publicly available git repositories were fetched and considered for the construction of this vulnerability dataset. The dataset is organized as a relational database and covers 12107 vulnerability fixing commits in 4249 open source projects for a total of 11873 CVEs in 272 different Common Weakness Enumeration (CWE) types. The dataset includes the source code before and after changing 51342 files and 138974 functions. The collection took 48 hours with 4 workers (AMD EPYC Genoa-X 9684X).

This repository includes the SQL dump of the dataset, as well as the JSON for the CVEs and XML of the CWEs at the time of collection. The complete process has been documented in the paper "CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software", which is published in the Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE '21). You will find a copy of the paper in the Doc folder.

Citation and Zenodo links

Please cite this work by referring to the published paper:

Guru Bhandari, Amara Naseer, and Leon Moonen. 2021. CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software. In Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE '21). ACM, 10 pages. https://doi.org/10.1145/3475960.3475985

@inproceedings{bhandari2021:cvefixes, title = {{CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software}}, booktitle = {{Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE '21)}}, author = {Bhandari, Guru and Naseer, Amara and Moonen, Leon}, year = {2021}, pages = {10}, publisher = {{ACM}}, doi = {10.1145/3475960.3475985}, copyright = {Open Access}, isbn = {978-1-4503-8680-7}, language = {en} }

The dataset has been released on Zenodo with DOI:10.5281/zenodo.4476563. The GitHub repository containing the code to automatically collect the dataset can be found at https://github.com/secureIT-project/CVEfixes, released with DOI:10.5281/zenodo.5111494.
Data from: Construction Motion Data Library: An Integrated Motion Dataset...
figshare.com
zip
Updated Oct 31, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yuanyuan TIAN; Heng Li; Hongzhi Cui; jiayu Chen (2022). Construction Motion Data Library: An Integrated Motion Dataset for On-Site Activity Recognition [Dataset]. http://doi.org/10.6084/m9.figshare.20480787.v3
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.20480787.v3
Dataset updated
Oct 31, 2022
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Yuanyuan TIAN; Heng Li; Hongzhi Cui; jiayu Chen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Through collecting 16 relatively small-scale motion datasets and conducting a series of in-lab expreiment, we established a 3D skeleton dataset for recognizing construction worker actions. All skeleton data were processed in four major steps, including uniform data extraction, skeleton structure alignment, resampling, and coordination transformation. Then all the aligned skeleton data will be manually annotated into four activity categories and assigned with labels. Experiment version: It contains over 61,275 samples (10 million frames) from 73 classes performed by about 300 different subjects.The dataset includes four fundamental categories of activities, including Production Activities(12), Unsafe Activities(38), Awkward Activities(10), and Common Activities(13).
However, We have carefully reviewed the licenses of all the current datasets. We found more than half of the datasets did not specify their licenses and usage policy. Therefore, in this version, we only shared the tagged and processed dataset that clearly allows redistribution and modification. For the rest of the datasets, we highlighted their URL and doi (all of them are publicly accessible and free for use). Instead of providing the processed data, we public the full preprocess codes on GitHub, which could be used to retag and process (such as converting to predefined .bvh files). All readers and users could process the source dataset by themselves. Public version： Construction Motion Data Library(CML) contains 6131 samples(ALL_DATA); among them, and 4333 samples are highly related to construction activities ( Construction_Related_Data). GitHub: https://github.com/YUANYUAN2222/Integrated-public-3D-skeleton-form-CML-library.
f
10 Years Bug-Fix Dataset (PROMISE'19)
figshare.com
search.datacite.org
zip
Updated Sep 27, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Renan Vieira (2021). 10 Years Bug-Fix Dataset (PROMISE'19) [Dataset]. http://doi.org/10.6084/m9.figshare.8852084.v5
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.8852084.v5
Dataset updated
Sep 27, 2021
Dataset provided by
figshare
Authors
Renan Vieira
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Replication Package of the paper "From Reports to Bug-Fix Commits: A 10 Years Dataset of Bug-Fixing Activity from 55 Apache's Open Source Projects"ABSTRACT:Bugs appear in almost any software development. Solving all or at least a large part of them requires a great deal of time, effort, and budget. Software projects typically use issue tracking systems as a way to report and monitor bug-fixing tasks. In recent years, several researchers have been conducting bug tracking analysis to better understand the problem and thus provide means to reduce costs and improve the efficiency of the bug-fixing task. In this paper, we introduce a new dataset composed of more than 70,000 bug-fix reports from 10 years of bug-fixing activity of 55 projects from the Apache Software Foundation, distributed in 9 categories. We have mined this information from Jira issue track system concerning two different perspectives of reports with closed/resolved status: static (the latest version of reports) and dynamic (the changes that have occurred in reports over time). We also extract information from the commits (if they exist) that fix such bugs from their respective version-control system (Git).We also provide a change analysis that occurs in the reports as a way of illustrating and characterizing the proposed dataset. Once the data extraction process is an error-prone nontrivial task, we believe such initiatives like this could be useful to support researchers in further more detailed investigations.You can find the full paper at: https://doi.org/10.1145/3345629.3345639If you use this dataset for your research, please reference the following paper:@inproceedings{Vieira:2019:RBC:3345629.3345639, author = {Vieira, Renan and da Silva, Ant^{o}nio and Rocha, Lincoln and Gomes, Jo~{a}o Paulo}, title = {From Reports to Bug-Fix Commits: A 10 Years Dataset of Bug-Fixing Activity from 55 Apache's Open Source Projects}, booktitle = {Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering}, series = {PROMISE'19}, year = {2019}, isbn = {978-1-4503-7233-6}, location = {Recife, Brazil}, pages = {80--89}, numpages = {10}, url = {http://doi.acm.org/10.1145/3345629.3345639}, doi = {10.1145/3345629.3345639}, acmid = {3345639}, publisher = {ACM}, address = {New York, NY, USA}, keywords = {Bug-Fix Dataset, Mining Software Repositories, Software Traceability}, } P.S: We added a new dataset version (v1.0.1). In this version, we fix the git commit features that track the src and test files. More info can be found in the fix-script.py file.

Facebook

Twitter

Click to copy link

Link copied

Cite

Stefano Zacchiroli (2022). A Large-scale Dataset of (Open Source) License Text Variants [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6379163

Data from: A Large-scale Dataset of (Open Source) License Text Variants

Explore at:

Dataset updated

Mar 31, 2022

Dataset authored and provided by

Stefano Zacchiroli

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it we have collected from the Software Heritage archive—the largest publicly available archive of FOSS source code with accompanying development history—all versions of files whose names are commonly used to convey licensing terms to software users and developers. The dataset consists of 6.5 million unique license files that can be used to conduct empirical studies on open source licensing, training of automated license classifiers, natural language processing (NLP) analyses of legal texts, as well as historical and phylogenetic studies on FOSS licensing. Additional metadata about shipped license files are also provided, making the dataset ready to use in various contexts; they include: file length measures, detected MIME type, detected SPDX license (using ScanCode), example origin (e.g., GitHub repository), oldest public commit in which the license appeared. The dataset is released as open data as an archive file containing all deduplicated license blobs, plus several portable CSV files for metadata, referencing blobs via cryptographic checksums.

For more details see the included README file and companion paper:

Stefano Zacchiroli. A Large-scale Dataset of (Open Source) License Text Variants. In proceedings of the 2022 Mining Software Repositories Conference (MSR 2022). 23-24 May 2022 Pittsburgh, Pennsylvania, United States. ACM 2022.

If you use this dataset for research purposes, please acknowledge its use by citing the above paper.

Clear search

Close search

Google apps

Main menu

Data from: A Large-scale Dataset of (Open Source) License Text Variants

Open Source And General Resource Software

Data from: Standards Incorporated by Reference (SIBR) Database

Cline Center Coup d’État Project Dataset

Longitudinal Microbial Source Tracking Dataset

Dataset of news about Uzbekistan

Data from: NICHE: A Curated Dataset of Engineered Machine Learning Projects...

Webly-Reference SR Dataset Dataset

Photoacoustic Source Detection and Reflection Artifact Deep Learning Dataset...

Dataset of news where entities equals cryptos and section equals business

Data from: LMR: A Large-Scale Multi-Reference Dataset for Reference-based...

‘All Datasets’ analyzed by Analyst-2

Global remote industrial heat sources dataset

Dataset of news about countries yearly

Sample Leads Dataset

GeoNatShapes: a natural feature reference dataset for mapping and AI...

MultiCaRe

CVEfixes Dataset: Automatically Collected Vulnerabilities and Their Fixes...

Data from: Construction Motion Data Library: An Integrated Motion Dataset...

10 Years Bug-Fix Dataset (PROMISE'19)

Data from: A Large-scale Dataset of (Open Source) License Text VariantsSee More Versions

Data from: A Large-scale Dataset of (Open Source) License Text Variants