100+ datasets found

Data from: Stack Overflow
console.cloud.google.com
Updated Mar 4, 2020
https://console.cloud.google.com/marketplace/browse?filter=partner:Stack%20Exchange&inv=1&invt=Ab1KXg (2020). Stack Overflow [Dataset]. https://console.cloud.google.com/marketplace/product/stack-exchange/stack-overflow
Dataset updated
Mar 4, 2020
Dataset provided by
Google (http://google.com/)
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers. Updated on a quarterly basis, this BigQuery dataset includes an archive of Stack Overflow content, including posts, votes, tags, and badges. This dataset is updated to mirror the Stack Overflow content on the Internet Archive, and is also available through the Stack Exchange Data Explorer. This public dataset is hosted in Google BigQuery and is included in BigQuery's free tier of 1 TB of processing per month. This means that each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
60k Stack Overflow Questions Dataset
paperswithcode.com
Updated Apr 8, 2022
Issa Annamoradnejad; Jafar Habibi; Mohammadamin Fazli (2022). 60k Stack Overflow Questions Dataset [Dataset]. https://paperswithcode.com/dataset/60k-stack-overflow-questions
Dataset updated
Apr 8, 2022
Authors
Issa Annamoradnejad; Jafar Habibi; Mohammadamin Fazli
Description
The dataset contains 60,000 Stack Overflow questions from 2016-2020, classified into three categories:

- HQ: High-quality posts without a single edit.
- LQ_EDIT: Low-quality posts with a negative score and multiple community edits. However, they still remain open after those changes.
- LQ_CLOSE: Low-quality posts that were closed by the community without a single edit.

Notes

Questions are sorted according to Question Id. Question body is in HTML format. All dates are in UTC format. The dataset is also accessible at https://www.kaggle.com/imoore/60k-stack-overflow-questions-with-quality-rate
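
The notes above (rows sorted by question Id, HTML bodies, three quality labels) can be illustrated with a minimal sketch. The rows below are hypothetical, and the field names `Id`, `Body`, and `Y` are assumptions for illustration rather than a confirmed schema; only the three label values come from the description.

```python
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(body: str) -> str:
    """Return the visible text of an HTML question body, whitespace-normalized."""
    parser = _TextExtractor()
    parser.feed(body)
    parser.close()
    return " ".join("".join(parser.parts).split())


# Hypothetical rows; the field names "Id", "Body", and "Y" are assumptions.
rows = [
    {"Id": 3, "Body": "<p>How do I <b>sort</b> a list?</p>", "Y": "HQ"},
    {"Id": 1, "Body": "<p>plz fix my code</p>", "Y": "LQ_CLOSE"},
    {"Id": 2, "Body": "<p>Why does <code>x = 1</code> fail?</p>", "Y": "LQ_EDIT"},
]

# Sort by question Id, count the quality labels, and strip the HTML markup.
rows_sorted = sorted(rows, key=lambda r: r["Id"])
label_counts = Counter(r["Y"] for r in rows_sorted)
clean_bodies = [strip_html(r["Body"]) for r in rows_sorted]
```

The standard-library parser is used here only to keep the sketch dependency-free; a real pipeline might prefer a dedicated HTML library.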

How to cite

This is an original dataset, published under the MIT License. Please cite the dataset as follows:

@article{annamoradnejad2022multiview, title={Multi-View Approach to Suggest Moderation Actions in Community Question Answering Sites}, author={Annamoradnejad, Issa and Habibi, Jafar and Fazli, Mohammadamin}, journal = {Information Sciences}, volume = {600}, pages = {144-154}, year = {2022}, issn = {0020-0255}, doi = {https://doi.org/10.1016/j.ins.2022.03.085}, url = {https://www.sciencedirect.com/science/article/pii/S0020025522003127} }
Stack Overflow Statistics And Facts (2025)
electroiq.com
Updated Jul 2, 2025
Electro IQ (2025). Stack Overflow Statistics And Facts (2025) [Dataset]. https://electroiq.com/stats/stack-overflow-statistics/
Dataset updated
Jul 2, 2025
Dataset authored and provided by
Electro IQ
License
https://electroiq.com/privacy-policy
Time period covered
2022 - 2032
Area covered
Global
Description
Introduction

Stack Overflow Statistics: The 2024 Stack Overflow Developer Survey offers a comprehensive snapshot of the global developer community, compiling insights from 65,437 respondents across 185 countries. Conducted between May 19 and June 20, 2024, the survey had a median completion time of approximately 21 minutes.

A significant 76% of developers reported using or planning to use AI tools in their development processes, marking an increase from 70% in 2023. However, trust in AI tool accuracy remains divided, with only 43% expressing confidence in their outputs. Despite this, 81% of developers identified increased productivity as the primary benefit of integrating AI tools into their workflows.

Educational backgrounds among respondents show that 66% hold a Bachelor's or Master's degree, even though only 49% learned to code through formal education.

Geographically, the United States accounted for 18.9% of respondents, followed by Germany at 8.4% and India at 7.2%, highlighting the survey's extensive international reach.

This year's survey underscores the evolving landscape of software development, emphasizing the growing integration of AI tools, the shift towards self-directed learning, and the diverse global composition of the developer community.

This article will highlight the Stack Overflow statistics and its performance.
stackoverflow-questions
huggingface.co
Updated Sep 5, 2012
Paco Valdez (2012). stackoverflow-questions [Dataset]. https://huggingface.co/datasets/pacovaldez/stackoverflow-questions
Dataset updated
Sep 5, 2012
Authors
Paco Valdez
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Card for [Stackoverflow Post Questions]

Dataset Description

Companies that sell Open-source software tools usually hire an army of Customer representatives to try to answer every question asked about their tool. The first step in this process is the prioritization of the question. The classification scale usually consists of 4 values, P0, P1, P2, and P3, with different meanings across every participant in the industry. On the other hand, every software developer… See the full description on the dataset page: https://huggingface.co/datasets/pacovaldez/stackoverflow-questions.
Data from: Stack Overflow Dataset
gts.ai
json
Updated Dec 19, 2024
GTS (2024). Stack Overflow Dataset [Dataset]. https://gts.ai/dataset-download/stack-overflow-dataset/
Available download formats: json
Dataset updated
Dec 19, 2024
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
Description
The Stack Overflow dataset is a detailed archive of posts, votes, tags, and badges from the world’s largest programmer community.
stack-overflow-description
huggingface.co
Updated Oct 21, 2024
TPP-LLM (2024). stack-overflow-description [Dataset]. https://huggingface.co/datasets/tppllm/stack-overflow-description
Dataset updated
Oct 21, 2024
Dataset authored and provided by
TPP-LLM
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Stack Overflow Description Dataset

This dataset contains badge awards earned by users on Stack Overflow between January 1, 2022, and December 31, 2023. It includes 3,336 sequences with 187,836 events and 25 badge types, derived from the Stack Exchange Data Dump under the CC BY-SA 4.0 license. The detailed data preprocessing steps used to create this dataset can be found in the TPP-LLM paper and TPP-LLM-Embedding paper. If you find this dataset useful, we kindly invite you to cite… See the full description on the dataset page: https://huggingface.co/datasets/tppllm/stack-overflow-description.
R and Python Stack Overflow Answers + Sentiment
kaggle.com
Updated May 28, 2019
OJ Watson (2019). R and Python Stack Overflow Answers + Sentiment [Dataset]. https://www.kaggle.com/datasets/ojwatson/stack-overflow-output
Dataset updated
May 28, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
OJ Watson
Description
Context

This is the output of the Stack Rudeness kernel (https://www.kaggle.com/ojwatson/stack-rudeness), as saved in Cell 17.

Content

Stack Overflow answers by the top 10 R and Python users, extracted using BigQuery. Also includes data on whether each answer was accepted and some additional data based on sentiment analysis of the answer text.

Acknowledgements

BigQuery and StackOverflow
stackoverflow-chat-dutch
huggingface.co
Updated Jan 24, 2024
Bram Vanroy (2024). stackoverflow-chat-dutch [Dataset]. http://doi.org/10.57967/hf/0529
Unique identifier
https://doi.org/10.57967/hf/0529
Dataset updated
Jan 24, 2024
Authors
Bram Vanroy
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Card for Stack Overflow Chat Dutch

Dataset Summary

This dataset contains 56,964 conversations between an AI assistant and a (fake) "Human" (generated) in Dutch, specifically in the domain of programming (Stack Overflow). They are translations of Baize's machine-generated answers to the Stack Overflow dataset. ☕ Want to help me out? Translating the data with the OpenAI API, and prompt testing, cost me 💸$133.60💸. If you like this dataset, please consider buying… See the full description on the dataset page: https://huggingface.co/datasets/BramVanroy/stackoverflow-chat-dutch.
Stack Overflow Dataset for User Engagement
ieee-dataport.org
Updated Mar 17, 2025
Linda Okpanachi (2025). Stack Overflow Dataset for User Engagement [Dataset]. https://ieee-dataport.org/documents/stack-overflow-dataset-user-engagement-technology-and-emotion-analysis
Dataset updated
Mar 17, 2025
Authors
Linda Okpanachi
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
post tags
Replication package for the paper "What do Developers Discuss about Code...
data.niaid.nih.gov
explore.openaire.eu
Updated Jun 30, 2021
Anonymous (2021). Replication package for the paper "What do Developers Discuss about Code Comments" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4470125
Dataset updated
Jun 30, 2021
Dataset authored and provided by
Anonymous
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
RP-commenting-practices-multiple-sources

Replication package for the paper "What do Developers Discuss about Code Comments?"

Structure

Appendix.pdf
Tags-topics.md
Stack-exchange-query.md
RQ1/
    LDA_input/
        combined-so-quora-mallet-metadata.csv
        topic-input.mallet
    LDA_output/
        Mallet/
            output_csv/
                docs-in-topics.csv
                topic-words.csv
                topics-in-docs.csv
                topics-metadata.csv
            output_html/
                all_topics.html
                Docs/
                Topics/
RQ2/
    datasource_rawdata/
        quora.csv
        stackoverflow.csv
    manual_analysis_output/
        stackoverflow_quora_taxonomy.xlsx

Contents of the Replication Package

Appendix.pdf- Appendix of the paper containing supplement tables

Tags-topics.md - tags selected from Stack Overflow and topics selected from Quora for the study (RQ1 & RQ2)

Stack-exchange-query.md - the query interface used to extract the posts from the Stack Exchange data explorer.

RQ1/ - contains the data used to answer RQ1

LDA_input/ - input data used for LDA analysis

combined-so-quora-mallet-metadata.csv - Stack overflow and Quora questions used to perform LDA analysis

topic-input.mallet - input file to the mallet tool

LDA_output/

Mallet/ - contains the LDA output generated by MALLET tool

output_csv/

docs-in-topics.csv - documents per topic

topic-words.csv - most relevant topic words

topics-in-docs.csv - topic probability per document

topics-metadata.csv - metadata per document and topic probability

output_html/ - Browsable results of mallet output

all_topics.html

Docs/

Topics/

RQ2/ - contains the data used to answer RQ2

datasource_rawdata/ - contains the raw data for each source

quora.csv - contains the processed dataset (like removing html tags). To know more about the preprocessing steps, please refer to the reproducibility section in the paper. The data is preprocessed using Makar tool.

stackoverflow.csv - contains the processed stackoverflow dataset. To know more about the preprocessing steps, please refer to the reproducibility section in the paper. The data is preprocessed using Makar tool.

manual_analysis_output/

stackoverflow_quora_taxonomy.xlsx - contains the classified dataset of stackoverflow and quora and description of taxonomy.

Taxonomy - contains the description of the first dimension and second dimension categories. Second dimension categories are further divided into levels, separated by | symbol.

stackoverflow-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.
- quota-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.
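
The taxonomy description says that second-dimension categories are divided into levels separated by the `|` symbol. A minimal sketch of splitting such a label follows; the example label is hypothetical and is not taken from the spreadsheet.

```python
from typing import List


def split_levels(category: str) -> List[str]:
    """Split a second-dimension category into its levels on the '|' symbol."""
    return [level.strip() for level in category.split("|") if level.strip()]


# Hypothetical label, invented for illustration only.
example = "Documentation | Comment syntax | Block comments"
levels = split_levels(example)
top_level = levels[0]
```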
‘Stack Overflow Tags Data’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Stack Overflow Tags Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-stack-overflow-tags-data-8194/ace9c36b/
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Stack Overflow Tags Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/isaacwen/stack-overflow-tags-data on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

A common question among both newcomers to and veterans of computer science and software engineering is which programming language is the best and/or most popular. It is very difficult to give a definitive answer, as there is a seemingly unlimited number of metrics that can define the 'best' or 'most popular' programming language.

One such metric that can be used to define a 'popular' programming language is the number of posts relating to that language on public forums. With Stack Overflow being perhaps the most commonly used forum for questions related to programming languages, analyzing the number of posts and other metrics for specific programming languages on Stack Overflow can be a good indicator for the popularity of a language.

Content

This dataset contains statistics about posts, views, answers, comments, and favorites relating to the 1000 most popular tags on Stack Overflow, including those designated for questions relating to specific programming languages such as 'python' and 'javascript'. The data is from 2008 to 2021, and is sorted into rows for each tag, for each year.

Source

This data was queried and aggregated from BigQuery's public stackoverflow dataset.

--- Original source retains full ownership of the source dataset ---
StackSample: 10% of Stack Overflow Q&A
kaggle.com
Updated Oct 8, 2019
Stack Overflow (2019). StackSample: 10% of Stack Overflow Q&A [Dataset]. https://www.kaggle.com/stackoverflow/stacksample/data
Dataset updated
Oct 8, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Stack Overflow
Description
Dataset with the text of 10% of questions and answers from the Stack Overflow programming Q&A website.

This is organized as three tables:

Questions contains the title, body, creation date, closed date (if applicable), score, and owner ID for all non-deleted Stack Overflow questions whose Id is a multiple of 10.

Answers contains the body, creation date, score, and owner ID for each of the answers to these questions. The ParentId column links back to the Questions table.

Tags contains the tags on each of these questions
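
As a rough illustration of how the three tables relate, the sketch below joins answers and tags onto their questions. The miniature rows are hypothetical; `ParentId` is named in the description, while the `Id`, `Title`, `Score`, and `Tag` field names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical miniature tables; only "ParentId" is confirmed by the description.
questions = [
    {"Id": 80, "Title": "How to reverse a string?", "Score": 12},
    {"Id": 90, "Title": "What is a segfault?", "Score": 5},
]
answers = [
    {"Id": 85, "ParentId": 80, "Score": 7},
    {"Id": 92, "ParentId": 90, "Score": 3},
    {"Id": 97, "ParentId": 80, "Score": 1},
]
tags = [
    {"Id": 80, "Tag": "python"},
    {"Id": 80, "Tag": "string"},
    {"Id": 90, "Tag": "c"},
]


def join_tables(questions, answers, tags):
    """Attach each question's tags and answers (highest score first)."""
    answers_by_question = defaultdict(list)
    for answer in answers:
        answers_by_question[answer["ParentId"]].append(answer)

    tags_by_question = defaultdict(list)
    for row in tags:
        tags_by_question[row["Id"]].append(row["Tag"])

    return {
        q["Id"]: {
            "title": q["Title"],
            "tags": tags_by_question[q["Id"]],
            "answers": sorted(
                answers_by_question[q["Id"]],
                key=lambda a: a["Score"],
                reverse=True,
            ),
        }
        for q in questions
    }


joined = join_tables(questions, answers, tags)
```

Grouping by key first keeps the join linear in the number of rows, which matters more on the full tables than on this toy input.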

Datasets of all R questions and all Python questions are also available on Kaggle, but this dataset is especially useful for analyses that span many languages.

Example projects include:

Identifying tags from question text

Predicting whether questions will be upvoted, downvoted, or closed based on their text

Predicting how long questions will take to answer

License

All Stack Overflow user contributions are licensed under CC-BY-SA 3.0 with attribution required.
tags-stack-overflow
zenodo.org
json
Updated Nov 19, 2023
Maxime Lucas (2023). tags-stack-overflow [Dataset]. http://doi.org/10.5281/zenodo.10155885
Available download formats: json
Unique identifier
https://doi.org/10.5281/zenodo.10155885
Dataset updated
Nov 19, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Maxime Lucas
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview
This dataset is derived from tags on Stack Overflow posts. Each hyperedge corresponds to all of the tags used in a post, and each node in a hyperedge corresponds to a tag. The timestamps of the posts are in millisecond resolution, are adjusted so that the time of the earliest tag starts at 0, and are in ISO8601 format.
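
A minimal sketch of the construction described above: each post's tag set becomes one hyperedge, each tag is a node, and timestamps are shifted so the earliest post sits at 0. The posts are hypothetical, and representing the shifted times as integer millisecond offsets is an assumption of this sketch, not necessarily how the published files encode them.

```python
from datetime import datetime, timedelta

# Hypothetical posts; real data would come from the Stack Exchange source.
posts = [
    {"created": "2008-08-01T13:57:07.033+00:00", "tags": ["c#", "winforms"]},
    {"created": "2008-08-01T13:57:07.533+00:00", "tags": ["python"]},
    {"created": "2008-08-01T13:57:09.033+00:00", "tags": ["winforms", "c#"]},
]


def build_hyperedges(posts):
    """Return (offset_ms, hyperedge) pairs sorted by time, earliest at 0."""
    stamped = sorted(
        ((datetime.fromisoformat(p["created"]), frozenset(p["tags"])) for p in posts),
        key=lambda pair: pair[0],
    )
    t0 = stamped[0][0]
    one_ms = timedelta(milliseconds=1)
    # Floor division of two timedeltas yields an integer count of milliseconds.
    return [((t - t0) // one_ms, edge) for t, edge in stamped]


timestamped = build_hyperedges(posts)
nodes = set().union(*(edge for _, edge in timestamped))
unique_hyperedges = {edge for _, edge in timestamped}
```

Using `frozenset` makes tag order irrelevant, so the first and third posts count as the same unique hyperedge while remaining two distinct timestamped hyperedges.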
Statistics
Some basic statistics of this dataset are:
number of nodes: 49,998
number of timestamped hyperedges: 14,458,875
number of unique hyperedges: 5,675,497
Component sizes:
Component size, number
49931, 1
2, 7
1, 53
Source of original data
tags-stack-overflow dataset
StackExchange
References
If you use this data, please cite the following paper:
Simplicial closure and higher-order link prediction. Austin R. Benson, Rediet Abebe, Michael T. Schaub, Ali Jadbabaie, and Jon Kleinberg. Proceedings of the National Academy of Sciences (PNAS), 2018.
Stack Overflow Developer Survey, 2017 A look into the lives of over 64,000...
dataandsons.com
csv, zip
Updated Jun 28, 2018
Verka Bicic (2018). Stack Overflow Developer Survey, 2017 A look into the lives of over 64,000 Stack Overflow developers [Dataset]. https://www.dataandsons.com/categories/surveys/stack-overflow-developer-survey-2017-a-look-into-the-lives-of-over-64-000-stack-overflow-developers
Available download formats: zip, csv
Dataset updated
Jun 28, 2018
Authors
Verka Bicic
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Time period covered
Jan 1, 2017 - Nov 5, 2017
Description
About this Dataset

Every year, Stack Overflow conducts a massive survey of people on the site, covering all sorts of information like programming languages, salary, code style and various other information. This year, they amassed more than 64,000 responses fielded from 213 countries.

Data

The data is made up of two files:

1. survey_results_public.csv - CSV file with main survey results, one respondent per row and one column per answer
2. survey_results_schema.csv - CSV file with survey schema, i.e., the questions that correspond to each column name

Acknowledgements

Data is directly taken from StackOverflow and licensed under the ODbL license.

Category

Surveys

Keywords

internet,Information Technology,coding

Row Count

51248

Price

Free
Data from: stackoverflow
huggingface.co
Updated Dec 8, 2024
ML Foundations Development (2024). stackoverflow [Dataset]. https://huggingface.co/datasets/mlfoundations-dev/stackoverflow
Dataset updated
Dec 8, 2024
Dataset authored and provided by
ML Foundations Development
Description
mlfoundations-dev/stackoverflow dataset hosted on Hugging Face and contributed by the HF Datasets community
StackOverflow-QA-C-Language-40k
huggingface.co
Updated Oct 7, 2023
Max Zhang (2023). StackOverflow-QA-C-Language-40k [Dataset]. https://huggingface.co/datasets/Mxode/StackOverflow-QA-C-Language-40k
Dataset updated
Oct 7, 2023
Authors
Max Zhang
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This is a collection of ~40k question-answer pairs about the C language from StackOverflow. The data has been initially cleaned, and each question comes with its accepted answer. All data is <1000 in length. The questions and answers were organized into a one-line format. A sample format is shown below: { "question": "``` FILE* file = fopen(some file)

pcap_t* pd = pcap_fopen_offline(file)

pcap_close(pd)

fclose(file) ```

This code occurs double free error.

Could you explain about this happening?

My… See the full description on the dataset page: https://huggingface.co/datasets/Mxode/StackOverflow-QA-C-Language-40k.
GPT vs Stack Overflow: data collection (A2I2 T2 2023)
data.niaid.nih.gov
zenodo.org
Updated Oct 6, 2023
Heath, Mark (2023). GPT vs Stack Overflow: data collection (A2I2 T2 2023) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8403467
Dataset updated
Oct 6, 2023
Dataset authored and provided by
Heath, Mark
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
About

The dataset components produced by this repo. Please see the documentation there for more information.

Each CSV has been individually zipped so that you only have to download the specific file(s) that you want.

Overview of Files

From using the Stack Exchange Data Dump as the data source (these zip files have a DD_ prefix):

Raw dataset before processing: saved_dataset.csv (DD_saved_dataset.zip)

Completed tag count: tag_count.csv (DD_tag_count.zip)

Processed dataset with completed evaluations: dataset_results.csv (DD_dataset_results.zip)

From using Google BigQuery as the data source (these zip files have a BQ_ prefix):

Raw dataset before processing: saved_dataset.csv (BQ_saved_dataset.zip)

Completed tag count: tag_count.csv (BQ_tag_count.zip)

No large-scale evaluation was completed when using BigQuery as a data source.

As noted in the linked repo, the use of Google BigQuery as a data source is not recommended for this work, but the working code and dataset have nonetheless been provided for completeness.

License

This dataset is licensed under the CC BY-SA 4.0 license, the same license used by the Stack Exchange Data Dump.
StaQC Dataset
paperswithcode.com
Ziyu Yao; Daniel S. Weld; Wei-Peng Chen; Huan Sun, StaQC Dataset [Dataset]. https://paperswithcode.com/dataset/staqc
Authors
Ziyu Yao; Daniel S. Weld; Wei-Peng Chen; Huan Sun
Description
StaQC (Stack Overflow Question-Code pairs) is a large dataset of around 148K Python and 120K SQL domain question-code pairs, which are automatically mined from StackOverflow.
Reddit and StackOverflow dataset (Programming languages)
zenodo.org
zip
Updated Mar 7, 2023
Daniele De Vinco; Alessia Antelmi (2023). Reddit and StackOverflow dataset (Programming languages) [Dataset]. http://doi.org/10.5281/zenodo.7685062
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7685062
Dataset updated
Mar 7, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Daniele De Vinco; Alessia Antelmi
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set contains anonymized data collected from Reddit (via the Pushshift API) and StackOverflow (from Kaggle's dataset).

Each folder includes the data split by trimester. The schema of StackOverflow and Reddit-related files follows:

Fields from StackOverflow

question_id

answer_id

creation_date - answer creation_date

score - score of the question/answer

tags - all tags flagged for a question

answer_count - number of answers for a question

start_question - question's time of creation

last_activity_date - last update on the question

new_id - hashed id of the answerer

q_new_id - hashed id of the questioner

Fields from Reddit

comment_id

submission_id

score - score of the question/submission

subreddit

created_utc - time of creation (unrelated to last modified comments)

new_id - hashed id

The .txt files represent the structure of the corresponding hypergraphs.
Stack Overflow data dump 2022-06
academictorrents.com
bittorrent
Updated Nov 12, 2023
None (2023). Stack Overflow data dump 2022-06 [Dataset]. https://academictorrents.com/details/7210f09cc2d2e63a15663981f384fe21702b1456
Available download formats: bittorrent (59345626171)
Dataset updated
Nov 12, 2023
Authors
None
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Stack Overflow 2022-06 data dump in a SQL Server database

Stack Overflow SQL Server Database - 2022-06 Version

For more information and the latest release:

Imported from the Stack Exchange Data Dump as of June 2022:

Imported using the Stack Overflow Data Dump Importer:

This database is in Microsoft SQL Server 2016 format, which means you can attach it to any SQL Server 2016 or newer instance.

To keep the size small but let you get started fast:

* All tables have a clustered index with page compression on
* No nonclustered or full text indexes are included
* The log file is small, and you should grow it out if you plan to modify data
* It's distributed as an mdf/ldf so you don't need space to restore it
* It only includes StackOverflow.com data, not data for other Stack sites

As with the original data dump, this is provided under cc-by-sa 4.0 license:

Data from: Stack Overflow

6 scholarly articles cite this dataset (View in Google Scholar)
