99 datasets found
  1. the-stack-v2

    • huggingface.co
    Updated Mar 1, 2024
    Cite
    BigCode (2024). the-stack-v2 [Dataset]. https://huggingface.co/datasets/bigcode/the-stack-v2
    Dataset updated
    Mar 1, 2024
    Dataset authored and provided by
    BigCode
    License

    https://choosealicense.com/licenses/other/

    Description

    The Stack v2

    The dataset consists of 4 versions:

    bigcode/the-stack-v2: the full "The Stack v2" dataset <-- you are here
    bigcode/the-stack-v2-dedup: based on the bigcode/the-stack-v2 but further near-deduplicated
    bigcode/the-stack-v2-train-full-ids: based on the bigcode/the-stack-v2-dedup dataset but further filtered with heuristics and spanning 600+ programming languages. The data is grouped into repositories.
    bigcode/the-stack-v2-train-smol-ids: based on the…

    See the full description on the dataset page: https://huggingface.co/datasets/bigcode/the-stack-v2.

  2. the-stack-dedup

    • huggingface.co
    • opendatalab.com
    Updated Oct 27, 2022
    Cite
    BigCode (2022). the-stack-dedup [Dataset]. https://huggingface.co/datasets/bigcode/the-stack-dedup
    Dataset updated
    Oct 27, 2022
    Dataset authored and provided by
    BigCode
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Card for The Stack

    Changelog

    Release Description

    v1.0 Initial release of the Stack. Included 30 programming languages and 18 permissive licenses. Note: Three included licenses (MPL/EPL/LGPL) are considered weak copyleft licenses. The resulting near-deduplicated dataset is 1.5TB in size.

    v1.1 The three copyleft licenses (MPL/EPL/LGPL) were excluded and the list of permissive licenses extended to 193 licenses in total. The list of programming… See the full description on the dataset page: https://huggingface.co/datasets/bigcode/the-stack-dedup.

  3. The Stack Dataset

    • paperswithcode.com
    Updated Oct 28, 2022
    Cite
    Denis Kocetkov; Raymond Li; Loubna Ben allal; Jia Li; Chenghao Mou; Carlos Muñoz Ferrandis; Yacine Jernite; Margaret Mitchell; Sean Hughes; Thomas Wolf; Dzmitry Bahdanau; Leandro von Werra; Harm de Vries (2022). The Stack Dataset [Dataset]. https://paperswithcode.com/dataset/the-stack
    Dataset updated
    Oct 28, 2022
    Authors
    Denis Kocetkov; Raymond Li; Loubna Ben allal; Jia Li; Chenghao Mou; Carlos Muñoz Ferrandis; Yacine Jernite; Margaret Mitchell; Sean Hughes; Thomas Wolf; Dzmitry Bahdanau; Leandro von Werra; Harm de Vries
    Description

    The Stack contains over 3TB of permissively-licensed source code files covering 30 programming languages crawled from GitHub. The dataset was created as part of the BigCode Project, an open scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs).

  4. the-stack-v2-java

    • huggingface.co
    Cite
    ShangyiGeng, the-stack-v2-java [Dataset]. https://huggingface.co/datasets/Reset23/the-stack-v2-java
    Authors
    ShangyiGeng
    Description

    Reset23/the-stack-v2-java dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. Dataset of stack of companies in Dearborn

    • workwithdata.com
    Updated May 6, 2025
    Cite
    Work With Data (2025). Dataset of stack of companies in Dearborn [Dataset]. https://www.workwithdata.com/datasets/companies?col=company%2Cstack&f=1&fcol0=city&fop0=%3D&fval0=Dearborn
    Dataset updated
    May 6, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Dearborn
    Description

    This dataset is about companies in Dearborn. It has 129 rows. It features 2 columns including stack.

  6. Dataset of stack of companies

    • workwithdata.com
    Updated May 6, 2025
    Cite
    Work With Data (2025). Dataset of stack of companies [Dataset]. https://www.workwithdata.com/datasets/companies?col=company%2Cstack
    Dataset updated
    May 6, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about companies. It has 3,456,808 rows. It features 2 columns including stack. It is 81% filled with non-null values.

  7. Dataset of stack of companies in Brasília

    • workwithdata.com
    Updated May 6, 2025
    Cite
    Work With Data (2025). Dataset of stack of companies in Brasília [Dataset]. https://www.workwithdata.com/datasets/companies?col=company%2Cstack&f=1&fcol0=city&fop0=%3D&fval0=Bras%C3%ADlia
    Dataset updated
    May 6, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brasília
    Description

    This dataset is about companies in Brasília. It has 215 rows. It features 2 columns including stack.

  8. Dataset of stack of companies in Mosta

    • workwithdata.com
    Updated May 6, 2025
    Cite
    Work With Data (2025). Dataset of stack of companies in Mosta [Dataset]. https://www.workwithdata.com/datasets/companies?col=company%2Cstack&f=1&fcol0=city&fop0=%3D&fval0=Mosta
    Dataset updated
    May 6, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Mosta
    Description

    This dataset is about companies in Mosta. It has 1 row. It features 2 columns including stack.

  9. the-stack-v2-filtered2-cpp

    • huggingface.co
    Cite
    ShangyiGeng, the-stack-v2-filtered2-cpp [Dataset]. https://huggingface.co/datasets/Reset23/the-stack-v2-filtered2-cpp
    Authors
    ShangyiGeng
    Description

    Reset23/the-stack-v2-filtered2-cpp dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. Dataset of stack of companies in Bubikon

    • workwithdata.com
    Updated May 6, 2025
    Cite
    Work With Data (2025). Dataset of stack of companies in Bubikon [Dataset]. https://www.workwithdata.com/datasets/companies?col=company%2Cstack&f=1&fcol0=city&fop0=%3D&fval0=Bubikon
    Dataset updated
    May 6, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bubikon
    Description

    This dataset is about companies in Bubikon. It has 10 rows. It features 2 columns including stack.

  11. Dataset of stack of companies in Derby

    • workwithdata.com
    Updated May 6, 2025
    Cite
    Work With Data (2025). Dataset of stack of companies in Derby [Dataset]. https://www.workwithdata.com/datasets/companies?col=company%2Cstack&f=1&fcol0=city&fop0=%3D&fval0=Derby
    Dataset updated
    May 6, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about companies in Derby. It has 391 rows. It features 2 columns including stack.

  12. NDPI-Sample-2-Stack-9 (WSI)

    • figshare.com
    application/dicom
    Updated Feb 7, 2022
    Cite
    Yubraj Gupta (2022). NDPI-Sample-2-Stack-9 (WSI) [Dataset]. http://doi.org/10.6084/m9.figshare.19134380.v1
    Available download formats: application/dicom
    Dataset updated
    Feb 7, 2022
    Dataset provided by
    figshare
    Authors
    Yubraj Gupta
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dicomized distinct proprietary files of microscope imaging modalities

  13. NDPI-Sample-2-Stack-3 (WSI)

    • figshare.com
    application/dicom
    Updated Feb 7, 2022
    Cite
    Yubraj Gupta (2022). NDPI-Sample-2-Stack-3 (WSI) [Dataset]. http://doi.org/10.6084/m9.figshare.19134341.v1
    Available download formats: application/dicom
    Dataset updated
    Feb 7, 2022
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Yubraj Gupta
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dicomized distinct proprietary files of microscope imaging modalities

  14. NDPI-Sample-2-Stack-5 (WSI)

    • figshare.com
    application/dicom
    Updated Feb 7, 2022
    Cite
    Yubraj Gupta (2022). NDPI-Sample-2-Stack-5 (WSI) [Dataset]. http://doi.org/10.6084/m9.figshare.19134347.v1
    Available download formats: application/dicom
    Dataset updated
    Feb 7, 2022
    Dataset provided by
    figshare
    Authors
    Yubraj Gupta
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dicomized distinct proprietary files of microscope imaging modalities

  15. JHU CoSTAR Block Stacking Dataset Dataset

    • paperswithcode.com
    Updated Mar 14, 2019
    Cite
    Andrew Hundt; Varun Jain; Chia-Hung Lin; Chris Paxton; Gregory D. Hager (2018). JHU CoSTAR Block Stacking Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/jhu-costar-block-stacking-dataset
    Dataset updated
    Mar 14, 2019
    Authors
    Andrew Hundt; Varun Jain; Chia-Hung Lin; Chris Paxton; Gregory D. Hager
    Description

    Involves data where a robot interacts with 5.1 cm colored blocks to complete an order-fulfillment style block stacking task. It contains dynamic scenes and real time-series data in a less constrained environment than comparable datasets. There are nearly 12,000 stacking attempts and over 2 million frames of real data.

  16. Stack Overflow Developer Survey, 2017 A look into the lives of over 64,000...

    • dataandsons.com
    csv, zip
    Updated Jun 28, 2018
    Cite
    Verka Bicic (2018). Stack Overflow Developer Survey, 2017 A look into the lives of over 64,000 Stack Overflow developers [Dataset]. https://www.dataandsons.com/categories/surveys/stack-overflow-developer-survey-2017-a-look-into-the-lives-of-over-64-000-stack-overflow-developers
    Available download formats: zip, csv
    Dataset updated
    Jun 28, 2018
    Dataset provided by
    Authors
    Verka Bicic
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2017 - Nov 5, 2017
    Description

    About this Dataset

    Every year, Stack Overflow conducts a massive survey of people on the site, covering all sorts of information like programming languages, salary, code style and various other information. This year, they amassed more than 64,000 responses fielded from 213 countries.

    Data

    The data is made up of two files:
    1. survey_results_public.csv - CSV file with main survey results, one respondent per row and one column per answer
    2. survey_results_schema.csv - CSV file with survey schema, i.e., the questions that correspond to each column name

    Acknowledgements

    Data is directly taken from StackOverflow and licensed under the ODbL license.

    Category

    Surveys

    Keywords

    internet,Information Technology,coding

    Row Count

    51248

    Price

    Free

  17. Dataset of stack of companies in China

    • workwithdata.com
    Updated May 6, 2025
    Cite
    Work With Data (2025). Dataset of stack of companies in China [Dataset]. https://www.workwithdata.com/datasets/companies?col=company%2Cstack&f=1&fcol0=country&fop0=%3D&fval0=China
    Dataset updated
    May 6, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    This dataset is about companies in China. It has 32,433 rows. It features 2 columns including stack.

  18. How solution snippets are presented in answers posted on Stack Overflow and...

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Nov 5, 2024
    Cite
    Anonymous; Anonymous (2024). How solution snippets are presented in answers posted on Stack Overflow and how they could be potentially reused. [Dataset]. http://doi.org/10.5281/zenodo.5819318
    Available download formats: bin
    Dataset updated
    Nov 5, 2024
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Software engineering Q&A websites (e.g., Stack Overflow), harness the collective expertise of users to address technical queries. Over time, these platforms evolve into valuable repositories of software engineering knowledge. Such repositories serve as essential resources for developers looking for solutions to common programming problems. In Stack Overflow, developers may approach answering questions in various ways. Gaining insight into how developers formulate their answers on Stack Overflow can enhance knowledge sharing and streamline the process of finding solutions. Furthermore, such insights could also inform improvements in Generative Artificial Intelligence (GenAI) tools to better align generated source code for comprehension and understandability, as AI-generated answers are known to include irrelevant information and hallucinations. In this study, we seek to deepen the understanding of how solutions are presented on Stack Overflow. We conducted an empirical study that investigates programming questions that are answered with a Solution Snippet to understand how a Solution Snippet is presented, and the ways how it should be adapted when it is reused. Our study resulted in two categorizations: 1) eight categories of how Solution Snippets are presented on Stack Overflow answers and 2) five categories of how Solution Snippets could be adapted for reuse. Then, we analyzed these categorizations and discussed the implications. We anticipate that Stack Overflow will remain a valuable resource for the foreseeable future, and the insights revealed in our paper lay the groundwork for improving program comprehension of Solution Snippets on Stack Overflow and GenAI tools.

  19. (Table 2) SPECMAP stack of stable oxygen isotopes covering the last 300 000...

    • doi.pangaea.de
    • search.dataone.org
    html, tsv
    Updated 1987
    Cite
    Nicklas G Pisias; John D Imbrie; Douglas G Martinson; James D Hays; Theodore C Moore; Nicholas J Shackleton (1987). (Table 2) SPECMAP stack of stable oxygen isotopes covering the last 300 000 years [Dataset]. http://doi.org/10.1594/PANGAEA.56039
    Available download formats: html, tsv
    Dataset updated
    1987
    Dataset provided by
    PANGAEA
    Authors
    Nicklas G Pisias; John D Imbrie; Douglas G Martinson; James D Hays; Theodore C Moore; Nicholas J Shackleton
    License

    Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Variables measured
    AGE, δ18O, Age, error, Age, comment, DEPTH, sediment/rock
    Description

    This dataset is about: (Table 2) SPECMAP stack of stable oxygen isotopes covering the last 300 000 years. Please consult parent dataset @ https://doi.org/10.1594/PANGAEA.726602 for more information.

  20. NDPI-Sample-2-Stack-6 (WSI)

    • figshare.com
    application/dicom
    Updated Feb 7, 2022
    Cite
    Yubraj Gupta (2022). NDPI-Sample-2-Stack-6 (WSI) [Dataset]. http://doi.org/10.6084/m9.figshare.19134353.v1
    Available download formats: application/dicom
    Dataset updated
    Feb 7, 2022
    Dataset provided by
    figshare
    Authors
    Yubraj Gupta
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dicomized distinct proprietary files of microscope imaging modalities

bigcode/the-stack-v2

498 scholarly articles cite this dataset
