24 datasets found
  1. codenet

    • huggingface.co
    Updated Feb 14, 2024
    System K Dev. (2024). codenet [Dataset]. https://huggingface.co/datasets/systemk/codenet
    Dataset updated
    Feb 14, 2024
    Dataset authored and provided by
    System K Dev.
    License

    https://choosealicense.com/licenses/cdla-permissive-2.0/

    Description

    Dataset Card for Dataset Name

    This dataset card aims to be a base template for new datasets. It has been generated using this raw template.

    Dataset Details

    Dataset Description

    Curated by: [More Information Needed]
    Funded by [optional]: [More Information Needed]
    Shared by [optional]: [More Information Needed]
    Language(s) (NLP): [More Information Needed]
    License: [More Information Needed]

    Dataset Sources [optional]

    Repository: [More… See the full description on the dataset page: https://huggingface.co/datasets/systemk/codenet.

  2. Project CodeNet Dataset

    • paperswithcode.com
    Updated Jun 10, 2022
    Ruchir Puri; David S. Kung; Geert Janssen; Wei Zhang; Giacomo Domeniconi; Vladimir Zolotov; Julian Dolby; Jie Chen; Mihir Choudhury; Lindsey Decker; Veronika Thost; Luca Buratti; Saurabh Pujar; Shyam Ramji; Ulrich Finkler; Susan Malaika; Frederick Reiss (2022). Project CodeNet Dataset [Dataset]. https://paperswithcode.com/dataset/project-codenet
    Dataset updated
    Jun 10, 2022
    Authors
    Ruchir Puri; David S. Kung; Geert Janssen; Wei Zhang; Giacomo Domeniconi; Vladimir Zolotov; Julian Dolby; Jie Chen; Mihir Choudhury; Lindsey Decker; Veronika Thost; Luca Buratti; Saurabh Pujar; Shyam Ramji; Ulrich Finkler; Susan Malaika; Frederick Reiss
    Description

    Project CodeNet is a large-scale dataset with approximately 14 million code samples, each of which is an intended solution to one of 4,000 coding problems. The code samples are written in over 50 programming languages (although the dominant languages are C++, C, Python, and Java), and they are annotated with a rich set of information, such as code size, memory footprint, CPU run time, and status, which indicates acceptance or the error type. The dataset is accompanied by a repository in which we provide a set of tools to aggregate code samples based on user criteria and to transform code samples into token sequences, simplified parse trees, and other code graphs. A detailed discussion of Project CodeNet is available in this paper.
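    The CodeNet repository's own tokenizer tools are not reproduced here. As a rough illustration of what "transforming a code sample into a token sequence" means for a Python submission, the sketch below uses Python's standard-library tokenize module instead; the helper name python_tokens and the choice to drop comments and layout-only tokens are assumptions for this example, not CodeNet's behavior.

```python
import io
import tokenize

# Layout-only token types (and comments) that this sketch chooses to drop.
# ASSUMPTION: CodeNet's own tools may keep or encode these differently.
SKIPPED = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
    tokenize.COMMENT,
}


def python_tokens(source: str) -> list[str]:
    """Return the token strings of a Python snippet, without layout tokens or comments."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in SKIPPED:
            tokens.append(tok.string)
    return tokens


# Usage on a typical competitive-programming style submission.
sample = "a, b = map(int, input().split())\nprint(a + b)\n"
print(python_tokens(sample))
```

    Working from raw token strings keeps the sketch language-agnostic in spirit; a real pipeline would likely also keep token types so identifiers and literals can be normalized.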

    The rich annotation of Project CodeNet enables research in code search, code completion, code-code translation, and a myriad of other use cases. We also extracted several benchmarks in Python, Java and C++ to drive innovation in deep learning and machine learning models in code classification and code similarity.

    Citation @inproceedings{puri2021codenet, author = {Ruchir Puri and David Kung and Geert Janssen and Wei Zhang and Giacomo Domeniconi and Vladimir Zolotov and Julian Dolby and Jie Chen and Mihir Choudhury and Lindsey Decker and Veronika Thost and Luca Buratti and Saurabh Pujar and Ulrich Finkler}, title = {Project CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks}, year = {2021}, }

  3. codenet

    • huggingface.co
    Updated Mar 24, 2024
    IBM Research - University of Illinois Urbana Champaign Discovery Accelerator Institute (2024). codenet [Dataset]. https://huggingface.co/datasets/iidai/codenet
    Dataset updated
    Mar 24, 2024
    Dataset authored and provided by
    IBM Research - University of Illinois Urbana Champaign Discovery Accelerator Institute
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    iidai/codenet dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. CodeNet

    • huggingface.co
    Updated Jun 26, 2025
    Indraneil Paul (2025). CodeNet [Dataset]. https://huggingface.co/datasets/iNeil77/CodeNet
    Dataset updated
    Jun 26, 2025
    Authors
    Indraneil Paul
    Description

    iNeil77/CodeNet dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. CodeNet-16K

    • huggingface.co
    Updated Nov 21, 2024
    Sumuk Shashidhar (2024). CodeNet-16K [Dataset]. https://huggingface.co/datasets/sumuks/CodeNet-16K
    Dataset updated
    Nov 21, 2024
    Authors
    Sumuk Shashidhar
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    📊 Dataset Card for 🏆 CodeNet-16K

    Dataset Summary

    The 🏆 CodeNet-16K dataset consists of 16,500 Python attempts from the CodeNet dataset, which have been carefully filtered and deduplicated to create a high-quality dataset for code generation tasks. The dataset includes problem descriptions, input/output descriptions, and sample test cases for each problem.
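    The card does not say how the filtering and deduplication were done. One common, simple approach to exact deduplication is sketched below, assuming two submissions count as duplicates when they are identical after stripping trailing whitespace and blank lines. The helpers normalize and deduplicate are hypothetical, and this is not necessarily the procedure used for CodeNet-16K.

```python
import hashlib


def normalize(code: str) -> str:
    """Strip trailing whitespace and drop blank lines so trivially different copies compare equal."""
    lines = (line.rstrip() for line in code.splitlines())
    return "\n".join(line for line in lines if line)


def deduplicate(samples: list[str]) -> list[str]:
    """Keep the first occurrence of each sample, comparing SHA-256 digests of the normalized text."""
    seen: set[str] = set()
    unique = []
    for code in samples:
        digest = hashlib.sha256(normalize(code).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(code)
    return unique


# Usage: the second sample differs only in whitespace, so it is dropped.
samples = ["print(1)\n", "print(1)   \n\n", "print(2)\n"]
print(deduplicate(samples))
```

    Hashing the normalized text keeps memory proportional to the number of unique samples; near-duplicate detection (renamed variables, reordered statements) would need a fuzzier method.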

    Dataset Details

    Dataset Sources

    Repository:… See the full description on the dataset page: https://huggingface.co/datasets/sumuks/CodeNet-16K.

  6. Data from: Lost in Translation: A Study of Bugs Introduced by Large Language...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ali Reza Ibrahimzada (2024). Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code [Dataset]. http://doi.org/10.5281/zenodo.10447705
    Available download formats: zip
    Dataset updated
    Jan 25, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Ali Reza Ibrahimzada
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 15, 2023
    Description

    Artifact repository for the paper Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code, accepted at ICSE 2024, Lisbon, Portugal. The authors are Rangeet Pan*, Ali Reza Ibrahimzada*, Rahul Krishna, Divya Sankar, Lambert Pougeum Wassi, Michele Merler, Boris Sobolev, Raju Pavuluri, Saurabh Sinha, and Reyhaneh Jabbarvand.

    Install

    This repository contains the source code for reproducing the results in our paper. Please start by cloning this repository:

    We recommend using a virtual environment for running the scripts. Please download conda 23.11.0 from this link. You can create a virtual environment using the following command:

    conda create -n plempirical python=3.10.13
    

    After creating the virtual environment, you can activate it using the following command:

    conda activate plempirical
    

    You can run the following command to make sure that you are using the correct version of Python:

    python3 --version && pip3 --version
    

    Dependencies

    To install all software dependencies, please execute the following command:

    pip3 install -r requirements.txt
    

    As for hardware dependencies, we used 16 NVIDIA A100 GPUs with 80 GB of memory each for model inference. The models can be run on any combination of GPUs, as long as the model weights are properly distributed across them. We did not need to distribute the weights ourselves, since each GPU had enough memory (80 GB).

    Moreover, to compile and test the generated translations, we used Python 3.10, g++ 11, GCC Clang 14.0, Java 11, Go 1.20, Rust 1.73, and .NET 7.0.14 for Python, C++, C, Java, Go, Rust, and C#, respectively. Overall, we recommend a Linux machine with at least 32 GB of RAM for running the scripts.
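    Before running the test scripts, it can help to confirm that every toolchain listed above is reachable on PATH. The sketch below does that with shutil.which; the executable names in TOOLCHAINS (for example, clang for C and dotnet for C#) are assumptions, since the README lists versions rather than commands, and the check only confirms presence, not the version.

```python
import shutil

# ASSUMPTION: executable names per target language; adjust to your installation.
TOOLCHAINS = {
    "Python": "python3",
    "C++": "g++",
    "C": "clang",
    "Java": "java",
    "Go": "go",
    "Rust": "rustc",
    "C#": "dotnet",
}


def check_toolchains(toolchains=TOOLCHAINS):
    """Return {language: path to the executable, or None if it is not on PATH}."""
    return {lang: shutil.which(exe) for lang, exe in toolchains.items()}


if __name__ == "__main__":
    # Print one line per language, flagging anything that is missing.
    for lang, path in check_toolchains().items():
        print(f"{lang:7} {path or 'MISSING'}")
```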

    For running scripts of alternative approaches, you need to make sure you have installed C2Rust, CxGO, and Java2C# on your machine. Please refer to their repositories for installation instructions. For Java2C#, you need to create a .csproj file like below:

    
    

    Dataset

    We uploaded the dataset we used in our empirical study to Zenodo. The dataset is organized as follows:

    1. CodeNet
    2. AVATAR
    3. Evalplus
    4. Apache Commons-CLI
    5. Click

    Please download and unzip the dataset.zip file from Zenodo. After unzipping, you should see the following directory structure:

    PLTranslationEmpirical
    ├── dataset
    │   ├── codenet
    │   ├── avatar
    │   ├── evalplus
    │   └── real-life-cli
    └── ...
    

    The structure of each dataset is as follows:

    1. CodeNet & Avatar: Each directory in these datasets corresponds to a source language and contains two subdirectories, Code and TestCases, holding the code snippets and test cases, respectively. Each code snippet has an id in its filename, and that id is used as a prefix for the corresponding test I/O files.

    2. Evalplus: The source-language code snippets follow a similar structure to CodeNet and Avatar. However, as a one-time effort, we manually created the test cases in the target language, Java, inside a Maven project, evalplus_java. To evaluate the translations from an LLM, we recommend moving the generated Java code snippets to the src/main/java directory of the Maven project and then running the command mvn clean test surefire-report:report -Dmaven.test.failure.ignore=true to compile, test, and generate reports for the translations.

    3. Real-life Projects: The real-life-cli directory represents two real-life CLI projects from Java and Python. These datasets only contain code snippets as files and no test cases. As mentioned in the paper, the authors manually evaluated the translations for these datasets.
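    For the CodeNet and Avatar layout described above, the sketch below maps each code snippet to the test I/O files that share its id prefix. It assumes that the id is the snippet's filename without its extension and that test files are named "<id>_<suffix>" (for example, s1_0.in) directly inside TestCases; both naming rules are hypothetical, as is the helper pair_tests.

```python
from pathlib import Path


def pair_tests(lang_dir: Path) -> dict[str, list[Path]]:
    """Map each snippet in lang_dir/Code to the files in lang_dir/TestCases that share its id prefix."""
    code_dir = lang_dir / "Code"
    test_dir = lang_dir / "TestCases"
    test_files = sorted(p for p in test_dir.iterdir() if p.is_file())
    pairs: dict[str, list[Path]] = {}
    for snippet in sorted(code_dir.iterdir()):
        if not snippet.is_file():
            continue
        # ASSUMPTION: the id is the filename without its extension.
        snippet_id = snippet.stem
        # ASSUMPTION: test files are named "<id>_<suffix>"; the separator keeps
        # id "s1" from also matching test files that belong to id "s10".
        pairs[snippet.name] = [t for t in test_files if t.name.startswith(snippet_id + "_")]
    return pairs


# Usage (hypothetical path): pair_tests(Path("dataset/codenet/Python"))
```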

    Scripts

    We provide bash scripts for reproducing the results of this work. First, we discuss the translation script. To run a translation with a given model and dataset, first create a .env file in the repository and add the following:

    OPENAI_API_KEY=

    1. Translation with GPT-4: You can run the following command to translate all Python -> Java code snippets in the CodeNet dataset with GPT-4, using top-k sampling with k=50, top-p sampling with p=0.95, and temperature 0.7:

    bash scripts/translate.sh GPT-4 codenet Python Java 50 0.95 0.7 0
    

    2. Translation with CodeGeeX: Before running the script, you need to clone the CodeGeeX repository from here and follow the instructions in their artifacts to download the model weights. After cloning it inside PLTranslationEmpirical and downloading the model weights, your directory structure should look like the following:

    PLTranslationEmpirical
    ├── dataset
    │   ├── codenet
    │   ├── avatar
    │   ├── evalplus
    │   └── real-life-cli
    ├── CodeGeeX
    │   ├── codegeex
    │   ├── codegeex_13b.pt # this file is the model weight
    │   └── ...
    └── ...
    

    You can run the following command to translate all Python -> Java code snippets in the CodeNet dataset with CodeGeeX, using top-k sampling with k=50, top-p sampling with p=0.95, and temperature 0.2 on GPU 0 (gpu_id=0):

    bash scripts/translate.sh CodeGeeX codenet Python Java 50 0.95 0.2 0
    

    3. For all other models (StarCoder, CodeGen, LLaMa, TB-Airoboros, TB-Vicuna), you can run the following command to translate all Python -> Java code snippets in the CodeNet dataset with StarCoder, CodeGen, LLaMa, TB-Airoboros, or TB-Vicuna, using top-k sampling with k=50, top-p sampling with p=0.95, and temperature 0.2 on GPU 0 (gpu_id=0). The example uses StarCoder; substitute the model name as needed:

    bash scripts/translate.sh StarCoder codenet Python Java 50 0.95 0.2 0
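    The positional arguments of scripts/translate.sh are only shown by example. Reading the commands above, they appear to be: model, dataset, source language, target language, top-k, top-p, temperature, and GPU id. The sketch below builds that argument list from named parameters so that a sweep over several models is harder to get wrong. The argument order is inferred from the examples rather than from the script itself, translate_command and run_sweep are hypothetical helpers, and they assume you run them from the PLTranslationEmpirical root.

```python
import shlex
import subprocess


def translate_command(model, dataset, source, target,
                      top_k=50, top_p=0.95, temperature=0.2, gpu_id=0):
    """Build the translate.sh argument list in the positional order used by the README examples."""
    return ["bash", "scripts/translate.sh", model, dataset, source, target,
            str(top_k), str(top_p), str(temperature), str(gpu_id)]


def run_sweep(models, dataset="codenet", source="Python", target="Java", dry_run=True):
    """Print (and, unless dry_run, execute) one translation command per model."""
    for model in models:
        # The README's GPT-4 example uses temperature 0.7; the open models use 0.2.
        temperature = 0.7 if model == "GPT-4" else 0.2
        cmd = translate_command(model, dataset, source, target, temperature=temperature)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Usage: preview the commands without running them.
run_sweep(["GPT-4", "StarCoder", "CodeGen"])
```

    Keeping dry_run on by default means the sweep can be reviewed before any GPU time or API credit is spent.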
    

    4. For translating and testing pairs with traditional techniques (i.e., C2Rust, CxGO, Java2C#), you can run the following commands:

    bash scripts/translate_transpiler.sh codenet C Rust c2rust fix_reports
    bash scripts/translate_transpiler.sh codenet C Go cxgo fix_reports
    bash scripts/translate_transpiler.sh codenet Java C# java2c# fix_reports
    bash scripts/translate_transpiler.sh avatar Java C# java2c# fix_reports
    

    5. To compile and test the CodeNet, AVATAR, and Evalplus (Python to Java) translations from GPT-4 and generate fix reports, you can run the following commands:

    bash scripts/test_avatar.sh Python Java GPT-4 fix_reports 1
    bash scripts/test_codenet.sh Python Java GPT-4 fix_reports 1
    bash scripts/test_evalplus.sh Python Java GPT-4 fix_reports 1
    

    6. To repair unsuccessful Python -> Java translations in the CodeNet dataset with GPT-4, you can run the following commands:

    bash scripts/repair.sh GPT-4 codenet Python Java 50 0.95 0.7 0 1 compile
    bash scripts/repair.sh GPT-4 codenet Python Java 50 0.95 0.7 0 1 runtime
    bash scripts/repair.sh GPT-4 codenet Python Java 50 0.95 0.7 0 1 incorrect
    

    7. To clean the translations of open-source LLMs (e.g., StarCoder) in CodeNet, you can run the following command:

    bash scripts/clean_generations.sh StarCoder codenet
    

    For all of the above commands, you can change the dataset and model names to run the same steps on other datasets and models. The vanilla and repair prompts used in our study are in /prompts.

    Artifacts

    Please download the artifacts.zip file from our Zenodo repository. We have organized the artifacts as follows:

    1. RQ1 - Translations: This directory contains the translations

  7. CodeNet-24K

    • huggingface.co
    Updated Mar 29, 2024
    Sumuk Shashidhar (2024). CodeNet-24K [Dataset]. https://huggingface.co/datasets/sumuks/CodeNet-24K
    Dataset updated
    Mar 29, 2024
    Authors
    Sumuk Shashidhar
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    sumuks/CodeNet-24K dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. CodeNet-B

    • huggingface.co
    Updated May 31, 2025
    CoQuIR (2025). CodeNet-B [Dataset]. https://huggingface.co/datasets/CoQuIR/CodeNet-B
    Dataset updated
    May 31, 2025
    Authors
    CoQuIR
    Description

    CoQuIR/CodeNet-B dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. hs-code.net - Historical whois Lookup

    • whoisdatacenter.com
    csv
    AllHeart Web Inc, hs-code.net - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/hs-code.net/
    Available download formats: csv
    Dataset provided by
    AllHeart Web
    Authors
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Jul 3, 2025
    Description

    Explore the historical Whois records related to hs-code.net (Domain). Get insights into ownership history and changes over time.

  10. Financial data for CODENET D.O.O.

    • companywall.rs
    Updated Jan 13, 2025
    Agencija za privredne registre - APR (2025). Finansijski podaci za CODENET D.O.O. [Dataset]. https://www.companywall.rs/firma/codenet-doo/MMxCtDOPC
    Dataset updated
    Jan 13, 2025
    Dataset authored and provided by
    Agencija za privredne registre - APR
    License

    http://www.companywall.rs/Home/Licence

    Description

    This dataset includes financial statements, accounts and account blockages, and real estate. The data include revenue, expenses, profit, assets, liabilities, and information on real estate owned by the company. Financial data, financial summary, company summary, entrepreneur, craftsman, association, business entities.

  11. Dataset of ISO 3 country code, net migration and self-employed workers of...

    • workwithdata.com
    Updated May 8, 2025
    Work With Data (2025). Dataset of ISO 3 country code, net migration and self-employed workers of countries in Caribbean [Dataset]. https://www.workwithdata.com/datasets/countries?col=country%2Ccountry_code_3%2Cnet_migration%2Cself_employed_pct&f=1&fcol0=region&fop0=%3D&fval0=Caribbean
    Dataset updated
    May 8, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Caribbean
    Description

    This dataset is about countries in the Caribbean. It has 13 rows and 4 columns: country, ISO 3 country code, net migration, and self-employed workers.

  12. chess-code.net - Historical whois Lookup

    • whoisdatacenter.com
    csv
    Updated Nov 15, 2023
    AllHeart Web Inc (2023). chess-code.net - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/chess-code.net/
    Available download formats: csv
    Dataset updated
    Nov 15, 2023
    Dataset provided by
    AllHeart Web
    Authors
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Jun 29, 2025
    Description

    Explore the historical Whois records related to chess-code.net (Domain). Get insights into ownership history and changes over time.

  13. code_contests

    • huggingface.co
    Updated Sep 17, 2022
    Deepmind (2022). code_contests [Dataset]. https://huggingface.co/datasets/deepmind/code_contests
    Dataset updated
    Sep 17, 2022
    Dataset provided by
    DeepMind: http://deepmind.com/
    Authors
    Deepmind
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for CodeContests

    Dataset Summary

    CodeContests is a competitive programming dataset for machine learning. This dataset was used when training AlphaCode. It consists of programming problems from a variety of sources:

    Site          URL                          Source
    Aizu          https://judge.u-aizu.ac.jp   CodeNet
    AtCoder       https://atcoder.jp           CodeNet
    CodeChef      https://www.codechef.com     description2code
    Codeforces    https://codeforces.com       description2code and Codeforces
    HackerEarth… See the full description on the dataset page: https://huggingface.co/datasets/deepmind/code_contests.

  14. c-code.net - Historical whois Lookup

    • whoisdatacenter.com
    csv
    AllHeart Web Inc, c-code.net - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/c-code.net/
    Available download formats: csv
    Dataset provided by
    AllHeart Web
    Authors
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Jul 11, 2025
    Description

    Explore the historical Whois records related to c-code.net (Domain). Get insights into ownership history and changes over time.

  15. short-code.net - Historical whois Lookup

    • whoisdatacenter.com
    csv
    Updated Jan 13, 2017
    AllHeart Web Inc (2017). short-code.net - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/short-code.net/
    Available download formats: csv
    Dataset updated
    Jan 13, 2017
    Dataset provided by
    AllHeart Web
    Authors
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Jul 14, 2025
    Description

    Explore the historical Whois records related to short-code.net (Domain). Get insights into ownership history and changes over time.

  16. free-code.net - Historical whois Lookup

    • whoisdatacenter.com
    csv
    AllHeart Web Inc, free-code.net - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/free-code.net/
    Available download formats: csv
    Dataset provided by
    AllHeart Web
    Authors
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Jul 12, 2025
    Description

    Explore the historical Whois records related to free-code.net (Domain). Get insights into ownership history and changes over time.

  17. codenet_python

    • huggingface.co
    codenet_python [Dataset]. https://huggingface.co/datasets/windchimeran/codenet_python
    Authors
    Haoran Zhang
    Description

    This dataset is extracted from CodeNet and contains Python only. I merged the data into a single table that includes metadata, problem descriptions, and test inputs/outputs.

    small: accepted status only
    big: all statuses, including accepted
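    The card does not show how the two variants were built. Assuming each row carries a status field whose accepted value is the string "Accepted" (both the field name and the label are assumptions here), the small/big distinction amounts to a filter, sketched below with hypothetical helper names and made-up toy records.

```python
from collections import Counter


def split_by_status(records, accepted_label="Accepted"):
    """Return (small, big): small keeps only accepted submissions, big keeps all of them."""
    # ASSUMPTION: each record is a dict with a "status" key.
    small = [r for r in records if r.get("status") == accepted_label]
    big = list(records)
    return small, big


def status_counts(records):
    """Count submissions per status value."""
    return Counter(r.get("status") for r in records)


# Toy records for illustration only; they are not taken from the dataset.
records = [
    {"submission_id": "s1", "status": "Accepted"},
    {"submission_id": "s2", "status": "Wrong Answer"},
    {"submission_id": "s3", "status": "Accepted"},
]
small, big = split_by_status(records)
print(len(small), len(big), status_counts(records))
```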

  18. plasma-simulation-code.net - Historical whois Lookup

    • whoisdatacenter.com
    csv
    AllHeart Web Inc, plasma-simulation-code.net - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/plasma-simulation-code.net/
    Available download formats: csv
    Dataset authored and provided by
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Jun 22, 2025
    Description

    Explore the historical Whois records related to plasma-simulation-code.net (Domain). Get insights into ownership history and changes over time.

  19. pop-code.net - Historical whois Lookup

    • whoisdatacenter.com
    csv
    AllHeart Web Inc, pop-code.net - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/pop-code.net/
    Available download formats: csv
    Dataset provided by
    AllHeart Web
    Authors
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Jul 14, 2025
    Description

    Explore the historical Whois records related to pop-code.net (Domain). Get insights into ownership history and changes over time.

  20. AIGCodeSet

    • huggingface.co
    Updated May 12, 2025
    Basak Demirok (2025). AIGCodeSet [Dataset]. https://huggingface.co/datasets/basakdemirok/AIGCodeSet
    Dataset updated
    May 12, 2025
    Authors
    Basak Demirok
    License

    https://choosealicense.com/licenses/cdla-permissive-2.0/

    Description

    LLM vs Human Code Dataset

    A Benchmark Dataset for AI-generated and Human-written Code Classification

    Description

    This dataset contains code samples generated by various Large Language Models (LLMs), including CodeStral (Mistral AI), Gemini (Google DeepMind), and CodeLLaMA (Meta), along with human-written code from CodeNet. The dataset is designed to support research on distinguishing LLM-generated code from human-written code.

    Dataset Structure

    1.… See the full description on the dataset page: https://huggingface.co/datasets/basakdemirok/AIGCodeSet.
    