100+ datasets found
  1. Data from: Stack Overflow

    • console.cloud.google.com
    Updated Mar 4, 2020
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Stack%20Exchange&inv=1&invt=Ab1KXg (2020). Stack Overflow [Dataset]. https://console.cloud.google.com/marketplace/product/stack-exchange/stack-overflow
    Dataset updated
    Mar 4, 2020
    Dataset provided by
    Google (http://google.com/)
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers. Updated on a quarterly basis, this BigQuery dataset includes an archive of Stack Overflow content, including posts, votes, tags, and badges. This dataset is updated to mirror the Stack Overflow content on the Internet Archive, and is also available through the Stack Exchange Data Explorer. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
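
    The free-tier figure in the description lends itself to a quick budget check. The sketch below is only an illustration and is not part of the dataset documentation: it treats 1 TB as 10^12 bytes (an assumption, since the description only says "1TB/mo"), and the function names and the per-query byte counts are hypothetical.

```python
# Assumption: 1 TB is taken as 10**12 bytes; the description only says "1TB/mo".
FREE_TIER_BYTES_PER_MONTH = 10**12


def free_tier_fraction(bytes_processed: int) -> float:
    """Share of the monthly free allowance consumed by the given bytes."""
    if bytes_processed < 0:
        raise ValueError("bytes_processed must be non-negative")
    return bytes_processed / FREE_TIER_BYTES_PER_MONTH


def queries_per_month(bytes_per_query: int) -> int:
    """Whole number of identical queries that fit in the free allowance."""
    if bytes_per_query <= 0:
        raise ValueError("bytes_per_query must be positive")
    return FREE_TIER_BYTES_PER_MONTH // bytes_per_query


if __name__ == "__main__":
    # Hypothetical query that scans 25 GB of the dataset per run.
    print(queries_per_month(25 * 10**9))
    # Hypothetical month in which 250 GB have been processed so far.
    print(free_tier_fraction(250 * 10**9))
```

    Keeping the allowance in a single constant makes it easy to adjust if the billing unit turns out to be defined differently.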

  2. stackoverflow-questions

    • huggingface.co
    Updated Sep 5, 2012
    Cite
    Paco Valdez (2012). stackoverflow-questions [Dataset]. https://huggingface.co/datasets/pacovaldez/stackoverflow-questions
    Dataset updated
    Sep 5, 2012
    Authors
    Paco Valdez
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for [Stackoverflow Post Questions]

      Dataset Description
    

    Companies that sell open-source software tools usually hire an army of customer representatives to try to answer every question asked about their tool. The first step in this process is prioritizing the question. The classification scale usually consists of four values, P0, P1, P2, and P3, whose meanings differ across the industry. On the other hand, every software developer… See the full description on the dataset page: https://huggingface.co/datasets/pacovaldez/stackoverflow-questions.

  3. stackoverflow-chat-dutch

    • huggingface.co
    Updated Jan 24, 2024
    Cite
    Bram Vanroy (2024). stackoverflow-chat-dutch [Dataset]. http://doi.org/10.57967/hf/0529
    Dataset updated
    Jan 24, 2024
    Authors
    Bram Vanroy
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for Stack Overflow Chat Dutch

      Dataset Summary
    

    This dataset contains 56,964 conversations between an AI assistant and a (fake) "Human" (generated) in Dutch, specifically in the domain of programming (Stack Overflow). They are translations of Baize's machine-generated answers to the Stack Overflow dataset. ☕ Want to help me out? Translating the data with the OpenAI API, and prompt testing, cost me 💸$133.60💸. If you like this dataset, please consider buying… See the full description on the dataset page: https://huggingface.co/datasets/BramVanroy/stackoverflow-chat-dutch.

  4. Federated Stack Overflow Dataset

    • paperswithcode.com
    Updated Nov 9, 2021
    Cite
    (2022). Federated Stack Overflow Dataset [Dataset]. https://paperswithcode.com/dataset/federated-stack-overflow
    Dataset updated
    Nov 9, 2021
    Description

    This dataset is derived from the Stack Overflow Data hosted by kaggle.com and available to query through Kernels using the BigQuery API: https://www.kaggle.com/stackoverflow/stackoverflow

  5. Data from: stackoverflow

    • huggingface.co
    Updated Dec 8, 2024
    Cite
    ML Foundations Development (2024). stackoverflow [Dataset]. https://huggingface.co/datasets/mlfoundations-dev/stackoverflow
    Dataset updated
    Dec 8, 2024
    Dataset authored and provided by
    ML Foundations Development
    Description

    mlfoundations-dev/stackoverflow dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. 60k Stack Overflow Questions Dataset

    • paperswithcode.com
    Updated Apr 8, 2022
    Cite
    Issa Annamoradnejad; Jafar Habibi; Mohammadamin Fazli (2022). 60k Stack Overflow Questions Dataset [Dataset]. https://paperswithcode.com/dataset/60k-stack-overflow-questions
    Dataset updated
    Apr 8, 2022
    Authors
    Issa Annamoradnejad; Jafar Habibi; Mohammadamin Fazli
    Description

    The dataset contains 60,000 Stack Overflow questions from 2016-2020, classified into three categories:

    • HQ: High-quality posts without a single edit.
    • LQ_EDIT: Low-quality posts with a negative score and multiple community edits that nevertheless remain open after those changes.
    • LQ_CLOSE: Low-quality posts that were closed by the community without a single edit.

    Notes

    Questions are sorted according to Question Id. Question body is in HTML format. All dates are in UTC format. The dataset is also accessible at https://www.kaggle.com/imoore/60k-stack-overflow-questions-with-quality-rate

    How to cite

    This is an original dataset, published under the MIT License. If you use the dataset, please cite it as follows:

    @article{annamoradnejad2022multiview, title={Multi-View Approach to Suggest Moderation Actions in Community Question Answering Sites}, author={Annamoradnejad, Issa and Habibi, Jafar and Fazli, Mohammadamin}, journal = {Information Sciences}, volume = {600}, pages = {144-154}, year = {2022}, issn = {0020-0255}, doi = {https://doi.org/10.1016/j.ins.2022.03.085}, url = {https://www.sciencedirect.com/science/article/pii/S0020025522003127} }

  7. ru_stackoverflow

    • huggingface.co
    Updated Jun 15, 2024
    Cite
    Ilya Gusev (2024). ru_stackoverflow [Dataset]. https://huggingface.co/datasets/IlyaGusev/ru_stackoverflow
    Dataset updated
    Jun 15, 2024
    Authors
    Ilya Gusev
    License

    https://choosealicense.com/licenses/other/

    Description

    Russian StackOverflow dataset

      Description
    

    • Summary: Dataset of questions, answers, and comments from ru.stackoverflow.com.
    • Script: create_stackoverflow.py
    • Point of Contact: Ilya Gusev
    • Languages: The dataset is in Russian with some programming code.

      Usage
    

    Prerequisites: pip install datasets zstandard jsonlines pysimdjson

    Loading: from datasets import load_dataset dataset = load_dataset('IlyaGusev/ru_stackoverflow', split="train") for example in dataset:… See the full description on the dataset page: https://huggingface.co/datasets/IlyaGusev/ru_stackoverflow.

  8. Reddit and StackOverflow dataset (Programming languages)

    • zenodo.org
    zip
    Updated Mar 7, 2023
    Cite
    Daniele De Vinco; Alessia Antelmi (2023). Reddit and StackOverflow dataset (Programming languages) [Dataset]. http://doi.org/10.5281/zenodo.7685062
    Available download formats: zip
    Dataset updated
    Mar 7, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Daniele De Vinco; Alessia Antelmi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set contains anonymized data collected from Reddit (via the Pushshift API) and StackOverflow (from Kaggle's dataset).

    Each folder includes the data split by trimester. The schema of StackOverflow and Reddit-related files follows:

    • Fields from StackOverflow
      • question_id
      • answer_id
      • creation_date - answer creation_date
      • score - score of the question/answer
      • tags - all tags flagged for a question
      • answer_count - number of answers for a question
      • start_question - question's time of creation
      • last_activity_date - last update on the question
      • new_id - hashed id of the answerer
      • q_new_id - hashed id of the questioner
    • Fields from Reddit
      • comment_id
      • submission_id
      • score - score of the question/submission
      • subreddit
      • created_utc - time of creation (unrelated to last modified comments)
      • new_id - hashed id

    The .txt files represent the structure of the corresponding hypergraphs.
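
    The field lists above can be mirrored as typed records. The following is a minimal sketch under stated assumptions: the field names come from the description, but the class names, the `parse_row` helper, and the Python types (integers for `score` and `answer_count`, strings for everything else) are hypothetical, since the description does not state file formats or types.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StackOverflowRow:
    # Field names follow the description; all types are assumptions.
    question_id: str
    answer_id: str
    creation_date: str       # answer creation date
    score: int               # score of the question/answer
    tags: str                # all tags flagged for a question (delimiter unknown)
    answer_count: int        # number of answers for a question
    start_question: str      # question's time of creation
    last_activity_date: str  # last update on the question
    new_id: str              # hashed id of the answerer
    q_new_id: str            # hashed id of the questioner


@dataclass(frozen=True)
class RedditRow:
    comment_id: str
    submission_id: str
    score: int
    subreddit: str
    created_utc: str         # time of creation
    new_id: str              # hashed id


# Fields converted to int on parsing (an assumption about the data).
_INT_FIELDS = {"score", "answer_count"}


def parse_row(row_type, raw: dict):
    """Build a row object from a dict of strings, converting integer fields."""
    kwargs = {}
    for field in fields(row_type):
        value = raw[field.name]
        kwargs[field.name] = int(value) if field.name in _INT_FIELDS else value
    return row_type(**kwargs)
```

    Dates and tags are kept as raw strings here because the description does not say how they are encoded.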

  9. Data from: Stack Overflow Dataset

    • gts.ai
    json
    Updated Dec 19, 2024
    Cite
    GTS (2024). Stack Overflow Dataset [Dataset]. https://gts.ai/dataset-download/stack-overflow-dataset/
    Available download formats: json
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    Description

    The Stack Overflow dataset is a detailed archive of posts, votes, tags, and badges from the world’s largest programmer community.

  10. Stack Overflow Statistics And Facts (2025)

    • electroiq.com
    Updated Jul 2, 2025
    Cite
    Electro IQ (2025). Stack Overflow Statistics And Facts (2025) [Dataset]. https://electroiq.com/stats/stack-overflow-statistics/
    Dataset updated
    Jul 2, 2025
    Dataset authored and provided by
    Electro IQ
    License

    https://electroiq.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Introduction

    Stack Overflow Statistics: The 2024 Stack Overflow Developer Survey offers a comprehensive snapshot of the global developer community, compiling insights from 65,437 respondents across 185 countries. Conducted between May 19 and June 20, 2024, the survey had a median completion time of approximately 21 minutes.

    A significant 76% of developers reported using or planning to use AI tools in their development processes, marking an increase from 70% in 2023. However, trust in AI tool accuracy remains divided, with only 43% expressing confidence in their outputs. Despite this, 81% of developers identified increased productivity as the primary benefit of integrating AI tools into their workflows.

    Educational backgrounds among respondents show that 66% hold a Bachelor's or Master's degree, even though only 49% learned to code through formal education.

    Geographically, the United States accounted for 18.9% of respondents, followed by Germany at 8.4% and India at 7.2%, highlighting the survey's extensive international reach.

    This year's survey underscores the evolving landscape of software development, emphasizing the growing integration of AI tools, the shift towards self-directed learning, and the diverse global composition of the developer community.

    This article will highlight the Stack Overflow statistics and its performance.

  11. Replication package for the paper "What do Developers Discuss about Code...

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Jun 30, 2021
    Cite
    Anonymous (2021). Replication package for the paper "What do Developers Discuss about Code Comments" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4470125
    Dataset updated
    Jun 30, 2021
    Dataset authored and provided by
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    RP-commenting-practices-multiple-sources

    Replication package for the paper "What do Developers Discuss about Code Comments?"

    Structure

    Appendix.pdf
    Tags-topics.md
    Stack-exchange-query.md
    
    RQ1/
      LDA_input/
        combined-so-quora-mallet-metadata.csv
        topic-input.mallet
    
      LDA_output/
        Mallet/
          output_csv/
            docs-in-topics.csv
            topic-words.csv
            topics-in-docs.csv
            topics-metadata.csv
          output_html/
            all_topics.html
            Docs/
            Topics/
    
    RQ2/
      datasource_rawdata/
        quora.csv
        stackoverflow.csv
      manual_analysis_output/
        stackoverflow_quora_taxonomy.xlsx
    

    Contents of the Replication Package

    • Appendix.pdf - appendix of the paper containing supplementary tables

    • Tags-topics.md - tags selected from Stack Overflow and topics selected from Quora for the study (RQ1 & RQ2)

    • Stack-exchange-query.md - the query interface used to extract the posts from the Stack Exchange data explorer

    • RQ1/ - contains the data used to answer RQ1
      • LDA_input/ - input data used for LDA analysis
        • combined-so-quora-mallet-metadata.csv - Stack Overflow and Quora questions used to perform LDA analysis
        • topic-input.mallet - input file to the MALLET tool
      • LDA_output/
        • Mallet/ - contains the LDA output generated by the MALLET tool
          • output_csv/
            • docs-in-topics.csv - documents per topic
            • topic-words.csv - most relevant topic words
            • topics-in-docs.csv - topic probability per document
            • topics-metadata.csv - metadata per document and topic probability
          • output_html/ - browsable results of the MALLET output
            • all_topics.html
            • Docs/
            • Topics/
    • RQ2/ - contains the data used to answer RQ2
      • datasource_rawdata/ - contains the raw data for each source
        • quora.csv - contains the processed Quora dataset (for example, with HTML tags removed). For the preprocessing steps, refer to the reproducibility section in the paper. The data is preprocessed using the Makar tool.
        • stackoverflow.csv - contains the processed Stack Overflow dataset. For the preprocessing steps, refer to the reproducibility section in the paper. The data is preprocessed using the Makar tool.
      • manual_analysis_output/
        • stackoverflow_quora_taxonomy.xlsx - contains the classified Stack Overflow and Quora dataset and a description of the taxonomy.
          • Taxonomy - contains the description of the first dimension and second dimension categories. Second dimension categories are further divided into levels, separated by the | symbol.
          • stackoverflow-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.
          • quota-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.

  12. Data from: Stackoverflow

    • data.pldn.nl
    application/n-quads +5
    Updated Jan 7, 2023
    Cite
    Jesse Bakker (2023). Stackoverflow [Dataset]. https://data.pldn.nl/JesseBakker/Stackoverflow
    Available download formats: application/trig, application/n-triples, application/n-quads, jsonld, ttl, application/sparql-results+json
    Dataset updated
    Jan 7, 2023
    Authors
    Jesse Bakker
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Stackoverflow

  13. StackOverflow-QA-C-Language-40k

    • huggingface.co
    Updated Oct 7, 2023
    Cite
    Max Zhang (2023). StackOverflow-QA-C-Language-40k [Dataset]. https://huggingface.co/datasets/Mxode/StackOverflow-QA-C-Language-40k
    Dataset updated
    Oct 7, 2023
    Authors
    Max Zhang
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This is a collection of ~40k QA's in C Language from StackOverflow. The data has been initially cleaned, and each response is with Accepted Answer. All data is <1000 in length. The questions and answers were organized into a one-line format. A sample format is shown below: { "question": "``` FILE* file = fopen(some file)

    pcap_t* pd = pcap_fopen_offline(file)

    pcap_close(pd)

    fclose(file) ```

    This code occurs double free error.

    Could you explain about this happening?

    My… See the full description on the dataset page: https://huggingface.co/datasets/Mxode/StackOverflow-QA-C-Language-40k.

  14. ‘Stack Overflow Tags Data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Stack Overflow Tags Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-stack-overflow-tags-data-8194/ace9c36b/
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Stack Overflow Tags Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/isaacwen/stack-overflow-tags-data on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    A common question for those new to, and those familiar with, computer science and software engineering is which programming language is the best and/or most popular. It is very difficult to give a definitive answer, as there is a seemingly endless number of metrics that can define the 'best' or 'most popular' programming language.

    One such metric that can be used to define a 'popular' programming language is the number of posts relating to that language on public forums. With Stack Overflow being perhaps the most commonly used forum for questions related to programming languages, analyzing the number of posts and other metrics for specific programming languages on Stack Overflow can be a good indicator for the popularity of a language.

    Content

    This dataset contains statistics about posts, views, answers, comments, and favorites relating to the 1000 most popular tags on Stack Overflow, including those designated for questions relating to specific programming languages such as 'python' and 'javascript'. The data is from 2008 to 2021, and is sorted into rows for each tag, for each year.

    Source

    This data was queried and aggregated from BigQuery's public stackoverflow dataset.

    --- Original source retains full ownership of the source dataset ---

  15. R and Python Stack Overflow Answers + Sentiment

    • kaggle.com
    Updated May 28, 2019
    Cite
    OJ Watson (2019). R and Python Stack Overflow Answers + Sentiment [Dataset]. https://www.kaggle.com/datasets/ojwatson/stack-overflow-output
    Dataset updated
    May 28, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    OJ Watson
    Description

    Context

    This is the output of the Stack Rudeness kernel (https://www.kaggle.com/ojwatson/stack-rudeness), as saved in Cell 17.

    Content

    Stack Overflow answers by the Top 10 r and python users extracted using BigQuery. Also includes data on whether the answer was accepted and some additional data based on sentiment analysis of the answer text.

    Acknowledgements

    BigQuery and StackOverflow

  16. StackSample: 10% of Stack Overflow Q&A

    • kaggle.com
    Updated Oct 8, 2019
    Cite
    Stack Overflow (2019). StackSample: 10% of Stack Overflow Q&A [Dataset]. https://www.kaggle.com/stackoverflow/stacksample/data
    Dataset updated
    Oct 8, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Stack Overflow
    Description

    Dataset with the text of 10% of questions and answers from the Stack Overflow programming Q&A website.

    This is organized as three tables:

    • Questions contains the title, body, creation date, closed date (if applicable), score, and owner ID for all non-deleted Stack Overflow questions whose Id is a multiple of 10.
    • Answers contains the body, creation date, score, and owner ID for each of the answers to these questions. The ParentId column links back to the Questions table.
    • Tags contains the tags on each of these questions
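
    The relationship between the tables can be sketched in a few lines. This is an illustrative example only: the `Id` and `ParentId` column names come from the description, while the other keys, the function names, and the sample rows are hypothetical and not taken from the dataset.

```python
from collections import defaultdict


def is_sampled(question_id: int) -> bool:
    """The description says sampled questions have an Id that is a multiple of 10."""
    return question_id % 10 == 0


def answers_by_question(questions, answers):
    """Group answer rows under their question via the ParentId column."""
    sampled_ids = {q["Id"] for q in questions if is_sampled(q["Id"])}
    grouped = defaultdict(list)
    for answer in answers:
        if answer["ParentId"] in sampled_ids:
            grouped[answer["ParentId"]].append(answer)
    return dict(grouped)


if __name__ == "__main__":
    # Made-up rows for illustration; real rows would be read from the CSV files.
    questions = [{"Id": 80, "Title": "first"}, {"Id": 90, "Title": "second"}]
    answers = [
        {"Id": 92, "ParentId": 80, "Score": 1},
        {"Id": 93, "ParentId": 80, "Score": 5},
        {"Id": 95, "ParentId": 7, "Score": 0},  # parent not in the sample
    ]
    for question_id, rows in answers_by_question(questions, answers).items():
        print(question_id, [row["Id"] for row in rows])
```

    Grouping by `ParentId` first keeps the join linear in the number of answers, which matters for a sample of this size.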

    Datasets of all R questions and all Python questions are also available on Kaggle, but this dataset is especially useful for analyses that span many languages.

    Example projects include:

    • Identifying tags from question text
    • Predicting whether questions will be upvoted, downvoted, or closed based on their text
    • Predicting how long questions will take to answer

    License

    All Stack Overflow user contributions are licensed under CC-BY-SA 3.0 with attribution required.

  17. StackOverflow - Dataset - LDM

    • service.tib.eu
    Updated Dec 2, 2024
    Cite
    (2024). StackOverflow - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/stackover-ow
    Dataset updated
    Dec 2, 2024
    Description

    The paper discusses the use of multi-objective Bayesian optimization for hyperparameter transfer in topic models.

  18. Stack Overflow Dataset for User Engagement

    • ieee-dataport.org
    Updated Mar 17, 2025
    Cite
    Linda Okpanachi (2025). Stack Overflow Dataset for User Engagement [Dataset]. https://ieee-dataport.org/documents/stack-overflow-dataset-user-engagement-technology-and-emotion-analysis
    Dataset updated
    Mar 17, 2025
    Authors
    Linda Okpanachi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    post tags

  19. Time Series of Social Media Activity. YouTube, Usenet, Stack-Overflow, PLOS...

    • figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    José María Miotto; Eduardo Altmann (2023). Time Series of Social Media Activity. YouTube, Usenet, Stack-Overflow, PLOS ONE. [Dataset]. http://doi.org/10.6084/m9.figshare.1160515.v4
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    José María Miotto; Eduardo Altmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    Datasets of time series of social media activity. They include 16.2 million YouTube videos, 0.8 million Usenet threads, 4.6 million Stack-Overflow questions, and 70 thousand PLOS ONE papers. This data is used in JM Miotto, EG Altmann, 'Predictability of extreme events in social media', arXiv:1403.3616.

  20. Stack Overflow data dump 2022-06

    • academictorrents.com
    bittorrent
    Updated Nov 12, 2023
    Cite
    None (2023). Stack Overflow data dump 2022-06 [Dataset]. https://academictorrents.com/details/7210f09cc2d2e63a15663981f384fe21702b1456
    Available download formats: bittorrent (59345626171)
    Dataset updated
    Nov 12, 2023
    Authors
    None
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Stack Overflow 2022-06 data dump in a SQL Server database

    Stack Overflow SQL Server Database - 2022-06 Version

    For more information and the latest release:
    Imported from the Stack Exchange Data Dump as of June 2022:
    Imported using the Stack Overflow Data Dump Importer:

    This database is in Microsoft SQL Server 2016 format, which means you can attach it to any SQL Server 2016 or newer instance.

    To keep the size small but let you get started fast:

    • All tables have a clustered index with page compression on
    • No nonclustered or full text indexes are included
    • The log file is small, and you should grow it out if you plan to modify data
    • It's distributed as an mdf/ldf so you don't need space to restore it
    • It only includes StackOverflow.com data, not data for other Stack sites

    As with the original data dump, this is provided under cc-by-sa 4.0 license:

Data from: Stack Overflow

6 scholarly articles cite this dataset (View in Google Scholar)