This dataset was created by v1nor1
This study seeks to obtain data that will help address gaps in machine-learning-based malware research. Its specific objective is to build a benchmark dataset of Windows operating system API calls for various malware. This is the first study to use metamorphic malware to build sequential API calls. It is hoped that this research will contribute to a deeper understanding of how metamorphic malware change their behavior (i.e., API calls) by adding meaningless opcodes with their own disassembler/assembler parts.
The FEC API is a RESTful web service supporting full-text and field-specific searches on federal campaign finance data.
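A minimal query sketch against the OpenFEC service (the /candidates/search/ endpoint and the DEMO_KEY placeholder reflect the public API documentation as we understand it; they are assumptions, not part of this record):

```python
# Minimal sketch: full-text candidate search against the OpenFEC API.
# Endpoint and parameters are assumptions based on the public docs.
import requests

BASE = "https://api.open.fec.gov/v1"

def search_candidates(query: str, api_key: str = "DEMO_KEY") -> list:
    """Full-text search for candidates matching `query`."""
    resp = requests.get(
        f"{BASE}/candidates/search/",
        params={"q": query, "api_key": api_key, "per_page": 20},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])

if __name__ == "__main__":
    for candidate in search_candidates("smith"):
        print(candidate.get("candidate_id"), candidate.get("name"))
```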
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Api is a dataset for object detection tasks - it contains Api annotations for 496 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
The dataset lists the fields for each of the published datasets available via the OpenFEMA APIs.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction

This archive contains the ApacheJIT dataset presented in the paper "ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction" as well as the replication package. The paper has been submitted to the MSR 2022 Data Showcase Track.

The datasets are available under the directory dataset. There are 4 datasets in this directory:

1. apachejit_total.csv: The entire dataset. Commits are specified by their identifier, and a set of commit metrics explained in the paper are provided as features. The column buggy specifies whether or not the commit introduced any bug into the system.
2. apachejit_train.csv: A subset of the entire dataset. It provides a balanced set that we recommend for models that are sensitive to class imbalance. This set is obtained from the first 14 years of data (2003 to 2016).
3. apachejit_test_large.csv: A subset of the entire dataset containing the commits from the last 3 years of data. This set is not balanced, in order to represent a real-life scenario of JIT model evaluation where the model is trained on historical data and applied to future data without any modification.
4. apachejit_test_small.csv: A subset of the test file explained above. Since the test file has more than 30,000 commits, we also provide a smaller test set that is still unbalanced and drawn from the last 3 years of data.

In addition to the dataset, we provide the scripts with which we built the dataset. These scripts are written in Python 3.8; therefore, Python 3.8 or above is required. To set up the environment, we have provided a list of required packages in the file requirements.txt. Additionally, one filtering step requires GumTree [1]. For Java, GumTree requires Java 11; for other languages, external tools are needed. An installation guide and more details can be found here.

The scripts comprise Python scripts under the directory src and Python notebooks under the directory notebooks. The Python scripts are mainly responsible for conducting GitHub searches via the GitHub search API and collecting commits through the PyDriller package [2]. The notebooks link the fixed issue reports with their corresponding fixing commits and apply some filtering steps. The bug-inducing candidates are then filtered again using the gumtree.py script, which utilizes the GumTree package. Finally, the remaining bug-inducing candidates are combined with the clean commits in the dataset_construction notebook to form the entire dataset. More specifically, git_token.py handles the GitHub API token that is necessary for requests to the GitHub API. The script collector.py performs the GitHub search. Tracing changed lines and git annotate is done in gitminer.py using PyDriller. Finally, gumtree.py applies 4 filtering steps (number of lines, number of files, language, and change significance).

References:

1. GumTree: https://github.com/GumTreeDiff/gumtree. Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. 2014. Fine-grained and accurate source code differencing. In ACM/IEEE International Conference on Automated Software Engineering, ASE '14, Vasteras, Sweden, September 15-19, 2014. 313-324.
2. PyDriller: https://pydriller.readthedocs.io/en/latest/. Davide Spadini, Maurício Aniche, and Alberto Bacchelli. 2018. PyDriller: Python Framework for Mining Software Repositories. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Lake Buena Vista, FL, USA) (ESEC/FSE 2018). Association for Computing Machinery, New York, NY, USA, 908-911.
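To give a flavor of the PyDriller-based mining step, here is a minimal sketch of commit-level metric extraction; the repository path and metric selection are illustrative and do not reproduce the actual gitminer.py logic:

```python
# Minimal sketch of commit metric collection with PyDriller.
# Illustrative only; not the replication package's gitminer.py.
from pydriller import Repository

def collect_commit_metrics(repo_path: str) -> list:
    """Walk a repository and record simple JIT-style commit metrics."""
    rows = []
    for commit in Repository(repo_path).traverse_commits():
        rows.append({
            "commit_id": commit.hash,
            "author": commit.author.name,
            "files_changed": commit.files,    # number of modified files
            "lines_added": commit.insertions,
            "lines_deleted": commit.deletions,
        })
    return rows

if __name__ == "__main__":
    for row in collect_commit_metrics("path/to/local/clone")[:5]:
        print(row)
```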
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Web API and Mashup dataset from ProgrammableWeb
This dataset was created by mengkoding 47
Malware calls are labeled '1' and benign software calls are labeled '0'. The calls are presented in sequential order. CSDM_API_Train.csv contains 388 logs. CSDM_API_TestData.csv contains 378 unclassified logs. CSDM_API_TestLable.csv contains the classifications for CSDM_API_TestData.csv. This data was collected by API monitors during a data mining competition at the International Conference on Neural Information Processing (ICONIP) in Sydney, Australia, in 2010.
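A minimal loading sketch (the file names come from the description above; the column layout is not specified here, so treat it as an assumption):

```python
# Minimal sketch: load the train/test splits and the test labels.
# File names come from the dataset description; columns are assumed.
import pandas as pd

train = pd.read_csv("CSDM_API_Train.csv")            # 388 labeled call logs
test = pd.read_csv("CSDM_API_TestData.csv")          # 378 unlabeled call logs
test_labels = pd.read_csv("CSDM_API_TestLable.csv")  # labels for the test logs

print(train.shape, test.shape, test_labels.shape)
```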
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
AOL Dataset for Browsing History and Topics of Interest
This record provides the datasets of the paper "The Privacy-Utility Trade-off in the Topics API".
The dataset-generating code and the experimental results can be found at 10.5281/zenodo.11032231 (github.com/nunesgh/topics-api-analysis).
License
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
input_prompts.csv provides the inputs for the ChatGPT API (countries and their respective prompts).

topic_consolidations.csv contains the 4,018 unique topics listed across all ChatGPT responses to prompts in our study and their corresponding cluster labels after applying K-means++ clustering (n = 50) via natural language processing with Bidirectional Encoder Representations from Transformers (BERT). ChatGPT response topics come from both versions (3.5 and 4) over 10 iterations each (per country).

ChatGPT_prompt_automation.ipynb is the Jupyter notebook of Python code used to run the API to prompt ChatGPT and gather responses.

topic_consolidation_BERT.ipynb is the Jupyter notebook of Python code used to process the 4,018 unique topics gathered through BERT NLP. This code was adapted from Vimal Pillai on Kaggle (https://www.kaggle.com/code/vimalpillai/text-clustering-with-sentence-bert).
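A minimal sketch of the embedding-plus-clustering step described above, using sentence-transformers and scikit-learn; the model name and the toy topic list are assumptions, not the notebook's exact configuration (the real run clusters 4,018 topics with n = 50):

```python
# Minimal sketch: embed topic strings with a sentence-BERT model and
# cluster them with K-means++. Model name and topics are assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

topics = ["cooking", "baking", "football", "soccer"]  # stand-in for 4,018 topics

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(topics)

# init="k-means++" is scikit-learn's default, shown explicitly; the study
# uses n_clusters=50, reduced here for the toy example.
kmeans = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)

for topic, label in zip(topics, labels):
    print(label, topic)
```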
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Evaluation dataset for the paper "A Holistic Approach to Undesired Content Detection"
The evaluation dataset data/samples-1680.jsonl.gz is the test set used in this paper. Each line contains information about one sample in a JSON object, and each sample is labeled according to our taxonomy. Each category label is a binary flag; if a category is not included in the JSON object, its label is unknown.
| Category | Label | Definition |
| --- | --- | --- |
| sexual | S | Content meant to arouse sexual… |

See the full description on the dataset page: https://huggingface.co/datasets/mmathys/openai-moderation-api-evaluation.
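A minimal reading sketch (assuming one JSON object per line in the gzipped file, as described above; field names other than the category flags are not documented here):

```python
# Minimal sketch: stream samples from the gzipped JSONL evaluation set.
# Assumes one JSON object per line, as the description states.
import gzip
import json

with gzip.open("data/samples-1680.jsonl.gz", "rt", encoding="utf-8") as fh:
    for line in fh:
        sample = json.loads(line)
        # A category flag may be absent, meaning the label is unknown.
        print(sample.get("S", "unknown"))  # 'S' = sexual, per the taxonomy
        break  # just the first sample, for illustration
```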
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
797 malware API call sequences and 1
The data.gov catalog is powered by CKAN, a powerful open source data platform that includes a robust API. Please be aware that data.gov and the data.gov CKAN API only contain metadata about datasets. This metadata includes URLs and descriptions of datasets, but it does not include the actual data within each dataset.
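For instance, a metadata search might look like the following minimal sketch (package_search is CKAN's standard search action; the query term is illustrative):

```python
# Minimal sketch: search data.gov dataset *metadata* via the CKAN API.
# Only metadata (titles, descriptions, resource URLs) is returned,
# never the underlying data itself.
import requests

resp = requests.get(
    "https://catalog.data.gov/api/3/action/package_search",
    params={"q": "air quality", "rows": 5},
    timeout=30,
)
resp.raise_for_status()
for dataset in resp.json()["result"]["results"]:
    print(dataset["title"])
```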
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Moby is a licensed dockless bike-share scheme within the Dublin region. This page includes an API developed according to the General Bikeshare Feed Specification (GBFS), providing, for example, information about vehicles, stations, pricing, etc. The current location of the vehicles is updated every five minutes. In addition, this page includes historical files of bike location data. Disclaimer: please note that some of the historical files are empty due to historical data issues.
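A minimal consumption sketch following the GBFS layout (the discovery URL below is a placeholder, not Moby's actual endpoint; feed and field names follow the GBFS specification):

```python
# Minimal sketch of GBFS feed discovery and dockless vehicle lookup.
# BASE_URL is a placeholder, not Moby's actual endpoint.
import requests

BASE_URL = "https://example.com/gbfs/gbfs.json"  # placeholder discovery URL

discovery = requests.get(BASE_URL, timeout=30).json()
# 'en' feed list, per the GBFS v1/v2 discovery layout.
feeds = {f["name"]: f["url"] for f in discovery["data"]["en"]["feeds"]}

# free_bike_status lists dockless vehicles and their current positions,
# refreshed (per the description above) every five minutes.
bikes = requests.get(feeds["free_bike_status"], timeout=30).json()
for bike in bikes["data"]["bikes"][:5]:
    print(bike["bike_id"], bike["lat"], bike["lon"])
```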
An API using a comprehensive dataset from The Meteoritical Society that contains information on all known meteorite landings.
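A minimal query sketch (the Socrata endpoint below is NASA's commonly cited Meteorite Landings resource; treat it as an assumption rather than the API this record documents):

```python
# Minimal sketch: query a Socrata-style JSON endpoint for meteorite
# landings. The endpoint is an assumption, not part of this record.
import requests

resp = requests.get(
    "https://data.nasa.gov/resource/gh4g-9sfh.json",
    params={"$limit": 5},
    timeout=30,
)
resp.raise_for_status()
for rock in resp.json():
    print(rock.get("name"), rock.get("mass"), rock.get("year"))
```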
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a collection of developer comments from GitHub issues, commits, and pull requests. We collected 88,640,237 developer comments from 17,378 repositories. In total, this dataset includes:
54,252,380 issue comments (from 13,458,208 issues)
979,642 commit comments (from 49,710,108 commits)
33,408,215 pull request comments (from 12,680,373 pull requests)
Warning: The uploaded dataset is compressed from 185GB down to 25.1GB.
Purpose
The purpose of this dataset (corpus) is to provide a large dataset of software developer comments (natural language) for research. We intend to use this data in our own research, but we hope it will be helpful for other researchers.
Collection Process
Full implementation details can be found in the following publication:
Benjamin S. Meyers. Human Error Assessment in Software Engineering. Rochester Institute of Technology. 2023.
Data was downloaded using GitHub's GraphQL API via requests made with Python's requests library. We targeted 17,491 repositories with the following criteria:
At least 850 stars.
Primary language in the Top 50 from the TIOBE Index and/or listed as "popular" in GitHub's advanced search. Note that we collected the list of languages on August 31, 2021.
Due to design decisions made by GitHub, we could only get a list of at most 1,000 repositories for each target language. Comments from 113 repositories could not be downloaded for various reasons (failing API queries, JSONDecodeErrors, etc.). Eight target languages had no repositories matching the above criteria.
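As a rough sketch of the collection mechanism, the following posts a GraphQL query with Python's requests library; the query fields and token are placeholders and do not reproduce the study's actual collector:

```python
# Minimal sketch: post a GraphQL query to GitHub's API with requests.
# Query and token are placeholders; this is not the study's collector.
import requests

GITHUB_GRAPHQL = "https://api.github.com/graphql"
TOKEN = "ghp_..."  # placeholder personal access token

query = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    issues(first: 10) {
      nodes { comments(first: 10) { nodes { body } } }
    }
  }
}
"""

resp = requests.post(
    GITHUB_GRAPHQL,
    json={"query": query, "variables": {"owner": "python", "name": "cpython"}},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```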
After collection using the GraphQL API, data was written to CSV using Python's csv.writer class. We highly recommend using Python's csv.reader to parse these CSV files as no newlines have been removed from developer comments.
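A minimal parsing sketch following that recommendation (the example file name matches the naming scheme described below; the column layout is documented in the provided README.md):

```python
# Minimal sketch: parse one of the comment CSVs with csv.reader, as
# recommended above. Comments keep their embedded newlines, so open the
# file with newline="" and let the csv module handle quoting.
import csv

csv.field_size_limit(10_000_000)  # raise default limit; comments can be long

with open("Python_is.csv", newline="", encoding="utf-8") as fh:  # example name
    reader = csv.reader(fh)
    header = next(reader)  # column names are documented in README.md
    for row in reader:
        pass  # each row is one issue comment record
```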
88_million_developer_comments.zip
This zip file contains 135 CSV files; 3 per language. CSV names are formatted <language>_<type>.csv, with <language> being the name of the primary language and <type> being one of co (commits), is (issues), or pr (pull requests).
Languages included are: ABAP, Assembly, C, C# (C-Sharp), C++ (C-PlusPlus), Clojure, COBOL, CoffeeScript, CSS, Dart, D, DM, Elixir, Fortran, F# (F-Sharp), Go, Groovy, HTML, Java, JavaScript, Julia, Kotlin, Lisp, Lua, MATLAB, Nim, Objective-C, Pascal, Perl, PHP, PowerShell, Prolog, Python, R, Ruby, Rust, Scala, Scheme, Scratch, Shell, Swift, TSQL, TypeScript, VBScript, and VHDL.
Details on the columns in each CSV file are described in the provided README.md.
Detailed_Breakdown.ods
This spreadsheet contains specific details on how many repositories, commits, issues, pull requests, and comments are included in 88_million_developer_comments.zip.
Note On Completeness
We make no guarantee that every commit, issue, and/or pull request for each repository is included in this dataset. Due to the nature of the GraphQL API and data decoding difficulties, sometimes a query failed and that data is not included here.
Versioning
v1.1: The original corpus had duplicate header rows in the CSV files. This has been fixed.
v1.0: Original corpus.
Contact
Please contact Benjamin S. Meyers (email) with questions about this data and its collection.
Acknowledgments
Collection of this data has been sponsored in part by the National Science Foundation grant 1922169, and by a Department of Defense DARPA SBIR program (grant 140D63-19-C-0018).
This data was collected using the compute resources from the Research Computing department at the Rochester Institute of Technology. doi:10.34788/0S3G-QD15
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Api Final is a dataset for object detection tasks - it contains Final annotations for 210 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Disclaimer: This API services and data offering is scheduled for upgrade starting Q1 2024. Every effort will be made to maintain data access during the upgrade period, and services/data will be provided on a best-effort basis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Asap Dan Api is a dataset for classification tasks - it contains Smoke, Fire, Smoke And Fire, and None annotations for 586 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).