3 datasets found
  1. the-vault-class

    • huggingface.co
    Updated Dec 14, 2023
    Cite
    FPT Software AI Center (2023). the-vault-class [Dataset]. https://huggingface.co/datasets/Fsoft-AIC/the-vault-class
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset authored and provided by
    FPT Software AI Center
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The Vault is a multilingual code-text dataset with over 40 million pairs covering 10 popular programming languages. Its authors describe it as the largest corpus of parallel code-text data. Built on The Stack, a massive collection of raw code samples, The Vault offers a comprehensive and clean resource for advancing research in code understanding and generation. It provides high-quality code-text pairs at multiple levels: the class and inline levels in addition to the function level. The Vault can serve many purposes at multiple levels.

  2. the-vault-inline

    • huggingface.co
    Updated Dec 14, 2023
    Cite
    FPT Software AI Center (2023). the-vault-inline [Dataset]. https://huggingface.co/datasets/Fsoft-AIC/the-vault-inline
    Dataset authored and provided by
    FPT Software AI Center
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The Vault is a multilingual code-text dataset with over 34 million pairs covering 10 popular programming languages. Its authors describe it as the largest corpus of parallel code-text data. Built on The Stack, a massive collection of raw code samples, The Vault offers a comprehensive and clean resource for advancing research in code understanding and generation. It provides high-quality code-text pairs at multiple levels: the class and inline levels in addition to the function level. The Vault can serve many purposes at multiple levels.

  3. the-vault-function

    • huggingface.co
    Updated Dec 14, 2023
    Cite
    FPT Software AI Center (2023). the-vault-function [Dataset]. https://huggingface.co/datasets/Fsoft-AIC/the-vault-function
    Dataset authored and provided by
    FPT Software AI Center
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The Vault is a multilingual code-text dataset with over 40 million pairs covering 10 popular programming languages. Its authors describe it as the largest corpus of parallel code-text data. Built on The Stack, a massive collection of raw code samples, The Vault offers a comprehensive and clean resource for advancing research in code understanding and generation. It provides high-quality code-text pairs at multiple levels: the class and inline levels in addition to the function level. The Vault can serve many purposes at multiple levels.



the-vault-class

The Vault Function

Fsoft-AIC/the-vault-class

10 scholarly articles cite this dataset (View in Google Scholar)
