6 datasets found
  1. code_x_glue_cc_code_refinement

    • huggingface.co
    • opendatalab.com
    Updated Nov 23, 2024
    Cite
    Google (2024). code_x_glue_cc_code_refinement [Dataset]. https://huggingface.co/datasets/google/code_x_glue_cc_code_refinement
    Dataset updated
    Nov 23, 2024
    Dataset authored and provided by
    Google (http://google.com/)
    License

    https://choosealicense.com/licenses/c-uda/

    Description

    Dataset Card for "code_x_glue_cc_code_refinement"

      Dataset Summary
    

    CodeXGLUE code-refinement dataset, available at https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-refinement. We use the dataset released with this paper (https://arxiv.org/pdf/1812.08693.pdf). The source side is a Java function with bugs and the target side is the refined one. All function and variable names are normalized. Their dataset contains two subsets (i.e., small and medium)… See the full description on the dataset page: https://huggingface.co/datasets/google/code_x_glue_cc_code_refinement.

  2. code-x-glue-cc-clone-detection-big-clone-bench

    • opendatalab.com
    • huggingface.co
    Updated Dec 30, 2023
    Cite
    University of Saskatchewan (2023). code-x-glue-cc-clone-detection-big-clone-bench [Dataset]. https://opendatalab.com/OpenDataLab/code-x-glue-cc-clone-detection-big-clone-bench
    Available download formats: zip
    Dataset updated
    Dec 30, 2023
    Dataset provided by
    Queen’s University
    University of Saskatchewan
    License

    https://github.com/microsoft/CodeXGLUE#license

    Description

    CodeXGLUE Clone-detection-BigCloneBench dataset, available at https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Clone-detection-BigCloneBench. Given two code snippets as input, the task is binary classification (0/1), where 1 stands for semantic equivalence and 0 for everything else. Models are evaluated by F1 score. The dataset used is BigCloneBench, filtered following the paper "Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree".
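
The F1 evaluation mentioned in this description can be illustrated with a minimal sketch. It assumes F1 is computed for the positive class (label 1, semantically equivalent pairs); the official CodeXGLUE evaluator may use a different averaging scheme. The function name `f1_score` and the sample labels are illustrative and do not come from the dataset.

```python
def f1_score(labels, predictions):
    """F1 for the positive class (1 = semantically equivalent pair).

    Assumption: binary F1 on label 1; the official evaluator may average differently.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # clones found
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false alarms
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # clones missed
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly predicted
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up gold labels and model predictions for six code pairs.
labels = [1, 0, 1, 1, 0, 0]
predictions = [1, 0, 0, 1, 1, 0]
print(round(f1_score(labels, predictions), 3))  # precision = recall = 2/3
```

With two true positives, one false positive, and one false negative, precision and recall are both 2/3, so the F1 score is also 2/3.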

  3. code_x_glue_cc_cloze_testing_all

    • hf-proxy-cf.effarig.site
    • huggingface.co
    Updated Feb 12, 2022
    + more versions
    Cite
    Google (2022). code_x_glue_cc_cloze_testing_all [Dataset]. https://hf-proxy-cf.effarig.site/datasets/google/code_x_glue_cc_cloze_testing_all
    Dataset updated
    Feb 12, 2022
    Dataset authored and provided by
    Google (http://google.com/)
    License

    https://choosealicense.com/licenses/c-uda/

    Description

    Dataset Card for "code_x_glue_cc_cloze_testing_all"

      Dataset Summary
    

    CodeXGLUE ClozeTesting-all dataset, available at https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/ClozeTesting-all. Cloze tests are widely adopted in Natural Language Processing to evaluate the performance of trained language models. The task is to predict the answer for the blank given its context, which can be formulated as a multi-choice classification problem.… See the full description on the dataset page: https://huggingface.co/datasets/google/code_x_glue_cc_cloze_testing_all.
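
The multi-choice framing described here can be sketched as follows: each candidate answer is scored against the context around the blank, and the highest-scoring candidate is chosen. Everything in this sketch is an assumption for illustration: the `<mask>` placeholder, the function names, and especially `toy_score`, which is a trivial stand-in for a trained language model and is not part of CodeXGLUE.

```python
MASK = "<mask>"  # assumed placeholder for the blank; the real dataset format may differ


def pick_answer(context_tokens, candidates, score):
    """Return the candidate that the scorer rates highest for the blank."""
    if MASK not in context_tokens:
        raise ValueError("context has no blank to fill")
    return max(candidates, key=lambda cand: score(context_tokens, cand))


def toy_score(context_tokens, candidate):
    # Hypothetical scorer: counts how often the candidate already occurs
    # in the context. A real system would use a language model's
    # probability for the candidate at the blank position.
    return context_tokens.count(candidate)


context = (
    "String name = user . getName ( ) ; "
    "if ( name == null ) return ; print ( <mask> ) ;"
).split()
print(pick_answer(context, ["user", "name", "null"], toy_score))
```

Swapping `toy_score` for a model-based scorer leaves `pick_answer` unchanged, which is the point of treating the cloze test as classification over a fixed candidate set.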

  4. CodeXGLUE-CONCODE

    • opendatalab.com
    • huggingface.co
    Updated Mar 7, 2024
    + more versions
    Cite
    Heriot-Watt University (2024). CodeXGLUE-CONCODE [Dataset]. https://opendatalab.com/OpenDataLab/CodeXGLUE-CONCODE
    Available download formats: zip
    Dataset updated
    Mar 7, 2024
    Dataset provided by
    Heriot-Watt University
    Description

    A large dataset of over 100,000 examples consisting of Java classes from online code repositories. The accompanying work develops a new encoder-decoder architecture that models the interaction between the method documentation and the class environment.

  5. code_x_glue_tt_text_to_text

    • huggingface.co
    • opendatalab.com
    Updated Feb 1, 2022
    + more versions
    Cite
    Google (2022). code_x_glue_tt_text_to_text [Dataset]. https://huggingface.co/datasets/google/code_x_glue_tt_text_to_text
    Dataset updated
    Feb 1, 2022
    Dataset authored and provided by
    Google (http://google.com/)
    License

    https://choosealicense.com/licenses/c-uda/

    Description

    Dataset Card for "code_x_glue_tt_text_to_text"

      Dataset Summary
    

    CodeXGLUE text-to-text dataset, available at https://github.com/microsoft/CodeXGLUE/tree/main/Text-Text/text-to-text. The dataset is crawled and filtered from Microsoft Documentation, whose documents are located at https://github.com/MicrosoftDocs/.

      Supported Tasks and Leaderboards
    

    machine-translation: The dataset can be used to train a model for translating technical documentation… See the full description on the dataset page: https://huggingface.co/datasets/google/code_x_glue_tt_text_to_text.

  6. code-code-CodeCompletion-TokenLevel-Python

    • huggingface.co
    Updated Jan 27, 2025
    + more versions
    Cite
    Semeru Lab (2025). code-code-CodeCompletion-TokenLevel-Python [Dataset]. https://huggingface.co/datasets/semeru/code-code-CodeCompletion-TokenLevel-Python
    Dataset updated
    Jan 27, 2025
    Dataset authored and provided by
    Semeru Lab
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The dataset is imported from CodeXGLUE and pre-processed using their script.

      Where to find in Semeru:
    

    The dataset can be found at /nfs/semeru/semeru_datasets/code_xglue/code-to-code/CodeCompletion-token/dataset/py150 in Semeru

      CodeXGLUE -- Code Completion (token level)
    

    Update 2021.07.30: We update the code completion dataset with literals normalized to avoid sensitive information. Here is the introduction and pipeline for token level code completion… See the full description on the dataset page: https://huggingface.co/datasets/semeru/code-code-CodeCompletion-TokenLevel-Python.

