License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
The dataset is imported from CodeXGLUE and pre-processed using their script.
Where to find in Semeru:
The dataset can be found at /nfs/semeru/semeru_datasets/code_xglue/code-to-text/java in Semeru
CodeXGLUE -- Code-To-Text
Task Definition
The task is to generate natural language comments for a piece of code; it is evaluated by smoothed BLEU-4 score.
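As a rough illustration of the metric, here is a minimal smoothed BLEU-4 sketch in plain Python (add-one smoothing on the n-gram precisions plus the standard brevity penalty; the actual CodeXGLUE evaluation script differs in its details):

```python
from collections import Counter
import math

def smoothed_bleu4(reference: str, candidate: str) -> float:
    """Minimal smoothed BLEU-4: add-one smoothing on n-gram precisions
    combined with the standard brevity penalty."""
    ref, cand = reference.split(), candidate.split()
    log_prec_sum = 0.0
    for n in range(1, 5):
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = sum(cand_ngrams.values())
        # add-one smoothing keeps each precision non-zero for short outputs
        log_prec_sum += math.log((overlap + 1) / (total + 1))
    # brevity penalty discourages overly short candidate comments
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec_sum / 4)
```

An identical reference and candidate score 1.0; a partial overlap lands strictly between 0 and 1.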
Dataset
The dataset we use comes from CodeSearchNet, filtered as follows:
Remove… See the full description on the dataset page: https://huggingface.co/datasets/semeru/code-text-java.
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset was created by Anower Zihad
Released under CC0: Public Domain
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Dataset Card for "instructional_code-search-net-java"
Dataset Summary
This is an instructional dataset for Java. The dataset contains two different kinds of tasks:
Given a piece of code, generate a description of what it does. Given a description, generate a piece of code that fulfils it.
Languages
The dataset is in English.
Data Splits
There are no splits.
Dataset Creation
May 2023
Curation Rationale
This dataset… See the full description on the dataset page: https://huggingface.co/datasets/Nan-Do/instructional_code-search-net-java.
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
A dataset for code classification composed of 976 total Java source code files: code from 11 authors' GitHub pages plus versions rewritten by ChatGPT 3.5 and BingGPT.
With the release of OpenAI's ChatGPT, code written by GPT is becoming increasingly common in everyday usage. However, students often use generated code to cheat on exams and homework. Being able to detect code written by GPT could be useful for organizations and schools, framed as either a classification or an anomaly detection task. I wasn't able to find a publicly available online dataset of Java source code written by GPT for research purposes, so I created my own.
Here's the general idea:
* 666 Java source code files from 11 different authors' GitHub pages were acquired via another public dataset.
* 5 of the 11 authors' files were passed through either ChatGPT-3.5 or Bing GPT-4 in a rewriting task.
* The prompt: "The messages I send you will be in Java code. I want you to rewrite all of it while maintaining functionality."
* The entirety of each file was passed through ChatGPT (no cutoff) and BingGPT (4,000-character limit) without additional prompting. The resulting code was then pasted into a new file.
* The resulting files were either saved without additional formatting or were formatted by VSCode's format-on-save setting.
Of course, there are limitations to this dataset, as classifying LLM-generated code is a novel task. However, it could be a reasonable starting point for those who want to detect GPT-written code. Feel free to use this dataset for research or training.
Here's a breakdown of the files in this dataset:
* 976 total files
* 666 files from the original authors
* 108 rewritten files using Bing GPT-4 (61 formatted, 47 non-formatted)
* 202 rewritten files using ChatGPT-3.5 (59 formatted, 143 non-formatted)
If you use this dataset, please cite:
@misc{P24_Java,
author = {Paek, Timothy},
title = {GPT Java Dataset: A Dataset for LLM-Generated Code Detection},
year = {2024},
howpublished = {GitHub Repository},
url = {https://github.com/tipaek/GPT-Java-Dataset}
}
Timothy Paek - LinkedIn - tipaek@syr.edu
What I used in making this dataset:
License: OpenRAIL (https://choosealicense.com/licenses/openrail/)
Dataset 1: TheStack - Java - Cleaned
Description: This dataset is drawn from TheStack Corpus, an open-source code dataset with over 3TB of GitHub data covering 48 programming languages. We selected a small portion of this dataset to optimize smaller language models for Java, a popular statically typed language.
Target Language: Java
Dataset Size:
- Training: 900,000 files
- Validation: 50,000 files
- Test: 50,000 files
Preprocessing:
Selected Java as the target language due to its… See the full description on the dataset page: https://huggingface.co/datasets/ammarnasr/the-stack-java-clean.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The ManySStuBs4J corpus is a collection of simple fixes to Java bugs, designed for evaluating program repair techniques.
We collect all bug-fixing changes using the SZZ heuristic, and then filter these to obtain a data set of small bug fix changes.
These are single statement fixes, classified where possible into one of 16 syntactic templates which we call SStuBs.
The dataset contains simple statement bugs mined from open-source Java projects hosted in GitHub.
There are two variants of the dataset. One mined from the 100 Java Maven Projects and one mined from the top 1000 Java Projects.
A project's popularity is determined by computing the sum of z-scores of its forks and watchers.
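The popularity score described above can be sketched as follows (the project names and counts below are made up for illustration):

```python
import statistics

def z_scores(values):
    """Standard z-score for each value: (x - mean) / stdev."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [(v - mean) / stdev for v in values]

def rank_by_popularity(projects):
    """projects: list of (name, forks, watchers) tuples.
    Popularity = z-score of forks + z-score of watchers."""
    fork_z = z_scores([p[1] for p in projects])
    watch_z = z_scores([p[2] for p in projects])
    scored = [(name, fz + wz) for (name, _, _), fz, wz in zip(projects, fork_z, watch_z)]
    return sorted(scored, key=lambda t: t[1], reverse=True)

# Hypothetical sample repositories
projects = [("proj-a", 1200, 5000), ("proj-b", 300, 900), ("proj-c", 50, 120)]
ranking = rank_by_popularity(projects)
```

Summing z-scores rather than raw counts keeps forks and watchers on comparable scales, so neither metric dominates the ranking.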
We kept only bug commits that contain only single statement changes and ignore stylistic differences such as whitespace or empty lines, as well as differences in comments.
Some single-statement changes can be caused by refactorings, such as renaming a variable, rather than bug fixes.
We attempted to detect and exclude refactorings such as variable, function, and class renamings, function argument renamings or changing the number of arguments in a function.
The commits are classified as bug fixes or not by checking if the commit message contains any of a set of predetermined keywords such as bug, fix, fault etc.
We evaluated the accuracy of this method on a random sample of 100 commits that contained SStuBs from the smaller version of the dataset and found it to achieve a satisfactory 94% accuracy.
This method has also been used before to extract bug datasets (Ray et al., 2015; Tufano et al., 2018) where it achieved an accuracy of 96% and 97.6% respectively.
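A minimal sketch of this keyword-based classification follows (the exact keyword list used by the authors is not given here, so the set below is an assumption):

```python
# Hypothetical keyword list; the corpus's exact set may differ.
BUG_KEYWORDS = {"bug", "fix", "fixes", "fixed", "fault", "error", "issue", "patch"}

def is_bug_fix(commit_message: str) -> bool:
    """Classify a commit as a bug fix if its message contains any keyword."""
    tokens = {t.strip(".,:;!?()[]").lower() for t in commit_message.split()}
    return not tokens.isdisjoint(BUG_KEYWORDS)
```

Such keyword heuristics are cheap and, per the accuracy figures above, surprisingly effective, though they inevitably miss fixes described without any of the chosen words.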
The bugs are stored in a JSON file (each version of the dataset has its own instance of this file).
Any bugs that fit one of the 16 patterns are also annotated by which pattern(s) they fit in a separate JSON file (again, one instance per dataset version).
We refer to bugs that fit any of the 16 patterns as simple stupid bugs (SStuBs).
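Reading the annotated JSON might look like the following sketch (the field names and records here are illustrative stand-ins, not taken from the actual files):

```python
import json
from collections import Counter

# A tiny inline sample mimicking the annotated JSON; the field names
# (projectName, bugType) and values are hypothetical.
sample = '''[
  {"projectName": "acme/widgets", "bugType": "CHANGE_IDENTIFIER"},
  {"projectName": "acme/widgets", "bugType": "WRONG_FUNCTION_NAME"},
  {"projectName": "acme/gizmos", "bugType": "CHANGE_IDENTIFIER"}
]'''

bugs = json.loads(sample)
# Count how often each SStuB pattern occurs in the sample
pattern_counts = Counter(b["bugType"] for b in bugs)
```

With the real files, `json.loads(sample)` would be replaced by loading the per-version JSON file shipped with the dataset.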
For more information on extracting the dataset and a detailed documentation of the software visit our GitHub repo: https://github.com/mast-group/SStuBs-mining
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Dataset Card for "code-search-net-java"
Dataset Summary
This dataset is the Java portion of CodeSearchNet, annotated with a summary column. The code-search-net dataset includes open-source functions with comments found on GitHub. The summary is a short description of what the function does.
Languages
The dataset's comments are in English and the functions are written in Java.
Data Splits
Train, test, validation labels are included in the… See the full description on the dataset page: https://huggingface.co/datasets/Nan-Do/code-search-net-java.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This is a collection of Java class classification datasets (i.e., classify a class into one of a set of categories), collected for the research work 'Embedding Java Classes with code2vec: Improvements from Variable Obfuscation'. These are shared for further research in static code analysis tasks (malware classification, author attribution, etc.).
Obfuscation & Pipeline Code: Download
code2vec Models: Download
License: CC BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
{{language}}_programms_{{split}}.tfrecord: programs for unsupervised pretraining for the Java and Python languages, divided into train, valid, and test splits.
Keys: code (the source code) and language (the language name).
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Context
The dataset tabulates the population of Java by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Java across both sexes and to determine which sex constitutes the majority.
Key observations
There is a considerable female majority, with 65.66% of the total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Scope of gender :
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Java Population by Race & Ethnicity. You can refer to it here.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Context
The dataset tabulates the population of Java town by gender across 18 age groups. It lists the male and female population in each age group, along with the gender ratio, for Java town. The dataset can be utilized to understand the population distribution of Java town by gender and age. For example, using this dataset, we can identify the largest age group for both men and women in Java town. Additionally, it can be used to see how the gender ratio changes from birth to the most senior age group, and the male-to-female ratio across each age group, for Java town.
Key observations
Largest age group (population): Male # 40-44 years (139) | Female # 65-69 years (126). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Scope of gender :
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Java town Population by Gender. You can refer to it here.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is Machine learning in Java : helpful techniques to design, build, and deploy powerful machine learning applications in Java. It features 7 columns including author, publication date, language, and book publisher.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset is about book series. It has 1 row and is filtered where the book series is Java language reference. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is Java 2 in plain English. It features 7 columns including author, publication date, language, and book publisher.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
## Overview
Java Apple Leaf Dataset is a dataset for classification tasks; it contains annotations for 1,102 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
License: GPL 3.0 (https://www.gnu.org/licenses/gpl-3.0-standalone.html)
This dataset contains 1,070 open-source Java projects including test doubles. The projects were mined from GitHub. The starting point for building the dataset included all projects whose main language is Java and that had at least five stars as of October 29, 2023. This set of projects is listed in java_repositories_with_five_stars.txt. The 1,070 projects comprising this dataset use Maven as their build system, contain JUnit tests, and use Mockito to create test doubles. The projects are available in the project.zip archive file. The dataset also contains metadata about the projects, which is available in the projects.json file. The metadata describes the characteristics of each project together with the test double definitions, stubbings, and verifications inside the project. Finally, we also make available the source code used to build DataTD for future research on using and extending the dataset.
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains a treasure trove of information on over 55 million open-source Java files, providing technical-debt-related insights that can be used to inform a range of research and analytical activities. Every file captured in the dataset is assigned an MD5 hash to ensure unique identification, along with key metrics including its technical debt probability, fan-in/fan-out levels, total methods and variables, lines of code and comment lines, and the number of occurrences recorded.
These data points can each provide important guidance into the magnitude and scope of technical debt in open source Java software development projects. Researchers can analyse correlations between their technical debt probability and levels of fan-in/fan-out as well as variables such as methods created & number of lines written. Meanwhile analysts are enabled to identify files with high impacts on code quality through comparing their joint location in both technical debt probability rankings and highest occurrence rankings.
Utilizing this comprehensive dataset opens up opportunities for a wide range of investigations that seek to unlock greater understanding of the complex relationships between software development practices and code quality. It presents an invaluable resource for anyone looking to gain key insights into this subject matter, turning questions into answers via exploration.
How to use this dataset:
The dataset contains several columns with different pieces of information, including file_md5 (a unique identifier for each file), td_probability (the probability that the file contains technical debt), fanin (the number of incoming dependencies for the file), fanout (the number of outgoing dependencies for the file), total methods and variables, and total lines of code and comment lines. Researchers or analysts may perform statistical analysis on these parameters to get an overall idea of the impact these values have on code quality. They may also find correlations between certain values, such as the fan-in/fan-out ratio, and sums or averages of the methods and variables used in a particular set of files. Finally, the occurrences column records how many times a particular MD5 hash has been used in open-source repositories; this can help identify particularly well-received files that are widely used across multiple platforms.
By examining these columns together, you can gain insight into trends related to technical debt in open-source Java programs and identify areas of potential risk in your own projects. With enough data manipulation, you may even make predictions about future implementations based on past experience.
- Correlating technical debt probability and lines of code or variables to determine how additional code complexity impacts the magnitude of technical debt.
- Identifying files with a high probability of technical debt which have been used in multiple projects, so that those files may be improved to help future projects.
- Analyzing the average fan-in and fan-out for different programming paradigms, such as MVC, to determine if any design patterns produce higher degrees of technical debt than other paradigms or architectures.
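The first analysis idea above, correlating td_probability with code-size metrics, can be sketched with the standard library alone (the CSV sample below is made up; only the column names follow the description):

```python
import csv
import io
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A tiny in-memory stand-in for TD_of_55M_files.csv; the column names follow
# the description above, but the values are invented for illustration.
sample_csv = """file_md5,td_probability,fanin,fanout
a1,0.9,12,30
b2,0.2,3,4
c3,0.7,9,20
d4,0.1,1,2
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
td = [float(r["td_probability"]) for r in rows]
fanout = [float(r["fanout"]) for r in rows]
corr = pearson(td, fanout)
```

On the real 55-million-row file you would stream the CSV from disk rather than holding it in memory, but the correlation step is the same.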
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. You can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission.
File: TD_of_55M_files.csv

| Column name | Description |
|:------------|:------------|
| file_md5 | A unique identifier for each file that can also be used to track them across repositories or other sources. (String) |
| td_probability | The p... |
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
In software development, it’s common to reuse existing source code by copying and pasting, resulting in the proliferation of numerous code clones—similar or identical code fragments—that detrimentally affect software quality and maintainability. Although several techniques for code clone detection exist, many encounter challenges in effectively identifying semantic clones due to their inability to extract syntax and semantics information. Fewer techniques leverage low-level source code representations like bytecode or assembly for clone detection. This work introduces a novel code representation for identifying syntactic and semantic clones in Java source code. It integrates high-level features extracted from the Abstract Syntax Tree with low-level features derived from intermediate representations generated by static analysis tools, like the Soot framework. Leveraging this combined representation, fifteen machine-learning models are trained to effectively detect code clones. Evaluation on a large dataset demonstrates the models’ efficacy in accurately identifying semantic clones. Among these classifiers, ensemble classifiers, such as the LightGBM classifier, exhibit exceptional accuracy. Linearly combining features enhances the effectiveness of the models compared to multiplication and distance combination techniques. The experimental findings indicate that the proposed method can outperform the current clone detection techniques in detecting semantic clones.
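The three feature-combination strategies compared above can be illustrated with a simple element-wise sketch (a hypothetical illustration of the operators, not the paper's actual implementation; the feature values are invented):

```python
def combine_linear(ast_vec, ir_vec, alpha=0.5):
    """Weighted element-wise sum of AST and IR feature vectors."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(ast_vec, ir_vec)]

def combine_product(ast_vec, ir_vec):
    """Element-wise multiplication of the two vectors."""
    return [a * b for a, b in zip(ast_vec, ir_vec)]

def combine_distance(ast_vec, ir_vec):
    """Element-wise absolute difference of the two vectors."""
    return [abs(a - b) for a, b in zip(ast_vec, ir_vec)]

# Hypothetical per-fragment feature vectors
ast_features = [0.2, 0.8, 0.5]
ir_features = [0.4, 0.6, 0.1]
fused = combine_linear(ast_features, ir_features)
```

The fused vector would then be fed to a classifier such as LightGBM; per the findings above, the linear combination preserves information from both representations better than multiplication or distance.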
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset is about book subjects. It has 3 rows and is filtered where the book is The Java tutorial : object-oriented programming for the Internet. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The GitHub Java Corpus is a snapshot of all open-source Java code on GitHub in October 2012 that is contained in open-source projects that at the time had at least one fork. It contains code from 14,785 projects amounting to about 352 million lines of code. The dataset has been used to study coding practice in Java at a large scale.