1 dataset found
  1. h

    the-stack

    • huggingface.co
    • opendatalab.com
    Updated Oct 27, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    BigCode (2022). the-stack [Dataset]. https://huggingface.co/datasets/bigcode/the-stack
    Explore at:
    Dataset updated
    Oct 27, 2022
    Dataset authored and provided by
    BigCode
    License

    https://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/

    Description

    Dataset Card for The Stack

      Changelog
    

    Release Description

    v1.0 Initial release of the Stack. Included 30 programming languages and 18 permissive licenses. Note: Three included licenses (MPL/EPL/LGPL) are considered weak copyleft licenses. The resulting near-deduplicated dataset is 3TB in size.

    v1.1 The three copyleft licenses ((MPL/EPL/LGPL) were excluded and the list of permissive licenses extended to 193 licenses in total. The list of programming languages… See the full description on the dataset page: https://huggingface.co/datasets/bigcode/the-stack.

  2. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
BigCode (2022). the-stack [Dataset]. https://huggingface.co/datasets/bigcode/the-stack

the-stack

The-Stack

bigcode/the-stack

Explore at:
52 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Oct 27, 2022
Dataset authored and provided by
BigCode
License

https://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/

Description

Dataset Card for The Stack

  Changelog

Release Description

v1.0 Initial release of the Stack. Included 30 programming languages and 18 permissive licenses. Note: Three included licenses (MPL/EPL/LGPL) are considered weak copyleft licenses. The resulting near-deduplicated dataset is 3TB in size.

v1.1 The three copyleft licenses ((MPL/EPL/LGPL) were excluded and the list of permissive licenses extended to 193 licenses in total. The list of programming languages… See the full description on the dataset page: https://huggingface.co/datasets/bigcode/the-stack.

Search
Clear search
Close search
Google apps
Main menu