https://choosealicense.com/licenses/c-uda/
Dataset Card for "code_x_glue_cc_code_refinement"
Dataset Summary
CodeXGLUE code-refinement dataset, available at https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-refinement. We use the dataset released by this paper (https://arxiv.org/pdf/1812.08693.pdf). The source side is a buggy Java function and the target side is the refined version. All function and variable names are normalized. Their dataset contains two subsets (i.e., small and medium)… See the full description on the dataset page: https://huggingface.co/datasets/google/code_x_glue_cc_code_refinement.
https://github.com/microsoft/CodeXGLUE#license
CodeXGLUE Clone-detection-BigCloneBench dataset, available at https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Clone-detection-BigCloneBench. Given two code snippets as input, the task is binary classification (0/1), where 1 stands for semantic equivalence and 0 for anything else. Models are evaluated by F1 score. The dataset is BigCloneBench, filtered following the paper "Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree".
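As a rough illustration of the F1 metric mentioned in this entry, the sketch below computes F1 for the positive class (label 1, semantic equivalence) from gold and predicted labels. The function name and the sample labels are hypothetical, and the listing does not say whether the benchmark's official evaluator averages over both classes, so treat this as one plausible reading rather than the official scoring script.

```python
def f1_positive_class(y_true, y_pred):
    """F1 for label 1, computed from precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # clones correctly found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-clones flagged as clones
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # clones that were missed
    if tp == 0:
        return 0.0  # avoids division by zero when nothing was correctly found
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for six code pairs (1 = semantically equivalent).
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
score = f1_positive_class(gold, pred)
print(f"F1 = {score:.3f}")
```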
https://choosealicense.com/licenses/c-uda/
Dataset Card for "code_x_glue_cc_cloze_testing_all"
Dataset Summary
CodeXGLUE ClozeTesting-all dataset, available at https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/ClozeTesting-all. Cloze tests are widely adopted in Natural Language Processing to evaluate the performance of trained language models. The task is to predict the answer for a blank given its surrounding context, which can be formulated as a multi-choice classification problem.… See the full description on the dataset page: https://huggingface.co/datasets/google/code_x_glue_cc_cloze_testing_all.
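The multi-choice formulation can be sketched as follows: fill the blank with each candidate, score the result, and pick the highest-scoring candidate. Everything here is an assumption for illustration: the `<mask>` placeholder, the candidate list, and `toy_score`, which is a lookup table standing in for a real language model's scoring function.

```python
MASK = "<mask>"  # hypothetical blank marker; the dataset's actual token may differ


def pick_answer(context, candidates, score_fn):
    """Return the candidate whose filled-in context gets the highest score."""
    return max(candidates, key=lambda c: score_fn(context.replace(MASK, c)))


# Hypothetical scores; a real setup would query a trained language model.
TOY_SCORES = {
    "return max(a, b)": 0.9,
    "return min(a, b)": 0.1,
}


def toy_score(filled_code):
    """Look up a made-up plausibility score, defaulting to 0.0."""
    return TOY_SCORES.get(filled_code, 0.0)


answer = pick_answer("return <mask>(a, b)", ["min", "max"], toy_score)
print(answer)
```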
A large dataset of over 100,000 examples consisting of Java classes from online code repositories, accompanied by a new encoder-decoder architecture that models the interaction between the method documentation and the class environment.
https://choosealicense.com/licenses/c-uda/
Dataset Card for "code_x_glue_tt_text_to_text"
Dataset Summary
CodeXGLUE text-to-text dataset, available at https://github.com/microsoft/CodeXGLUE/tree/main/Text-Text/text-to-text. The dataset is crawled and filtered from Microsoft Documentation, whose documents are located at https://github.com/MicrosoftDocs/.
Supported Tasks and Leaderboards
machine-translation: The dataset can be used to train a model for translating technical documentation… See the full description on the dataset page: https://huggingface.co/datasets/google/code_x_glue_tt_text_to_text.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The dataset is imported from CodeXGLUE and pre-processed using their script.
Where to find in Semeru:
The dataset can be found in Semeru at /nfs/semeru/semeru_datasets/code_xglue/code-to-code/CodeCompletion-token/dataset/py150
CodeXGLUE -- Code Completion (token level)
Update 2021.07.30: We updated the code completion dataset with literals normalized to avoid exposing sensitive information. Here is the introduction and pipeline for token-level code completion… See the full description on the dataset page: https://huggingface.co/datasets/semeru/code-code-CodeCompletion-TokenLevel-Python.