100+ datasets found

Stack Overflow Data
kaggle.com
zip
Updated Mar 20, 2019
Cite
Stack Overflow (2019). Stack Overflow Data [Dataset]. https://www.kaggle.com/datasets/stackoverflow/stackoverflow
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Mar 20, 2019
Dataset authored and provided by
Stack Overflow (http://stackoverflow.com/)
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Context

Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers.

Content

Updated on a quarterly basis, this BigQuery dataset includes an archive of Stack Overflow content, including posts, votes, tags, and badges. This dataset is updated to mirror the Stack Overflow content on the Internet Archive, and is also available through the Stack Exchange Data Explorer.

Fork this kernel to get started with this dataset.

Acknowledgements

Dataset Source: https://archive.org/download/stackexchange

https://bigquery.cloud.google.com/dataset/bigquery-public-data:stackoverflow

https://cloud.google.com/bigquery/public-data/stackoverflow

Banner photo by Caspar Rubin from Unsplash.

Inspiration

What is the percentage of questions that have been answered over the years?

What is the reputation and badge count of users across different tenures on StackOverflow?

What are 10 of the “easier” gold badges to earn?

Which day of the week has most questions answered within an hour?
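
As a starting point for the first question above, here is a minimal sketch of computing the share of answered questions per year. It assumes each question record exposes a creation date and an answer count; the pair layout and the sample rows are hypothetical and are not taken from the dataset itself.

```python
from collections import defaultdict
from datetime import date


def answered_share_by_year(questions):
    """Return {year: percent of questions with at least one answer}.

    `questions` is an iterable of (creation_date, answer_count) pairs.
    This layout is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    answered = defaultdict(int)
    for created, answer_count in questions:
        totals[created.year] += 1
        if answer_count > 0:
            answered[created.year] += 1
    return {year: 100.0 * answered[year] / totals[year] for year in sorted(totals)}


# Hypothetical sample rows, not real Stack Overflow data.
sample = [
    (date(2017, 1, 5), 2),
    (date(2017, 6, 9), 0),
    (date(2018, 3, 1), 1),
    (date(2018, 4, 2), 3),
]
print(answered_share_by_year(sample))
```

The same grouping could be pushed into a SQL query against the BigQuery tables; the in-memory version only shows the logic.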
stackoverflow-posts
huggingface.co
Updated Jun 14, 2023
Cite
Mike (2023). stackoverflow-posts [Dataset]. https://huggingface.co/datasets/mikex86/stackoverflow-posts
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 14, 2023
Authors
Mike
License
https://choosealicense.com/licenses/other/
Description
StackOverflow Posts Markdown

Dataset Summary

This dataset contains all posts submitted to StackOverflow before the 14th of June 2023 formatted as Markdown text. The dataset contains ~60 Million posts, totaling ~35GB in size and ~65 billion characters of text. The data is sourced from Internet Archive StackExchange Data Dump.

Dataset Structure

Each record corresponds to one post of a particular type. Original ordering from the data dump is not exactly preserved… See the full description on the dataset page: https://huggingface.co/datasets/mikex86/stackoverflow-posts.
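
The summary figures give a rough sense of post length. The sketch below only divides the approximate totals quoted in the summary (about 60 million posts and about 65 billion characters), so the result is an order-of-magnitude estimate and not a measured statistic.

```python
# Approximate totals quoted in the dataset summary (both are "~" values).
POSTS = 60_000_000
CHARACTERS = 65_000_000_000

# Average Markdown length per post, in characters.
avg_chars_per_post = CHARACTERS / POSTS
print(f"about {round(avg_chars_per_post)} characters per post")
```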
stackoverflow-dataset
huggingface.co
Updated Oct 24, 2024
Cite
Sunny Bhaveen Chandra (2024). stackoverflow-dataset [Dataset]. https://huggingface.co/datasets/c17hawke/stackoverflow-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 24, 2024
Authors
Sunny Bhaveen Chandra
Description
c17hawke/stackoverflow-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Data from: StackOverflow Dataset
kaggle.com
zip
Updated Dec 31, 2023
Cite
Ming Myung (2023). StackOverflow Dataset [Dataset]. https://www.kaggle.com/datasets/vanhaminhquan/stackoverflow-dataset
Explore at:
Available download formats: zip (20901098 bytes)
Dataset updated
Dec 31, 2023
Authors
Ming Myung
License
https://cdla.io/sharing-1-0/
Description
The Public 2023 Stack Overflow Developer Survey Results

Description:

The enclosed data set is the complete, cleaned results of the 2023 Stack Overflow Developer Survey. Free response submissions have been removed. There are three files besides this README:

survey_results_public.csv - CSV file with main survey results, one respondent per row and one column per answer

survey_results_schema.csv - CSV file with survey schema, i.e., the questions that correspond to each column name

so_survey_2022.pdf - PDF file of the survey instrument
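
The two CSV files are meant to be read together: the schema file explains what each column in the results file means. Below is a minimal sketch of that lookup using only the standard library. The schema column names (`qname`, `question`), the question wording, and the sample rows are hypothetical stand-ins; check the actual headers in survey_results_schema.csv before relying on them.

```python
import csv
import io

# Hypothetical miniature versions of the two files, for illustration only.
SCHEMA_CSV = (
    "qname,question\n"
    "Country,Where do you live?\n"
    "YearsCode,How many years have you coded?\n"
)
RESULTS_CSV = "ResponseId,Country,YearsCode\n1,Germany,10\n2,India,4\n"


def load_schema(handle, name_col="qname", text_col="question"):
    """Map each column name to its question text (column names are assumed)."""
    return {row[name_col]: row[text_col] for row in csv.DictReader(handle)}


def describe_columns(results_handle, schema):
    """Pair every column in the results header with its question text."""
    header = next(csv.reader(results_handle))
    return [(col, schema.get(col, "(no schema entry)")) for col in header]


schema = load_schema(io.StringIO(SCHEMA_CSV))
columns = describe_columns(io.StringIO(RESULTS_CSV), schema)
for name, question in columns:
    print(f"{name}: {question}")
```

With the real files, the `io.StringIO` objects would be replaced by `open(path, newline="", encoding="utf-8")`.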

The survey was fielded from May 8, 2023 to May 19, 2023. The median time spent on the survey for qualified responses was 17 minutes.

Respondents were recruited primarily through channels owned by Stack Overflow. The top sources of respondents were onsite messaging, blog posts, email lists, meta.stackoverflow posts, banner ads, and social media posts. Since respondents were recruited in this way, highly engaged users on Stack Overflow were more likely to notice the links for the survey and click to begin it.

You can find the official published results here:

https://survey.stackoverflow.co/2023/

Find previous survey results here:

https://insights.stackoverflow.com/survey

Legal:

This database - The Public 2023 Stack Overflow Developer Survey Results - is made available under the Open Database License (ODbL): http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/

TLDR: You are free to share, adapt, and create derivative works from The Public 2023 Stack Overflow Developer Survey Results as long as you attribute Stack Overflow, keep the database open (if you redistribute it), and continue to share-alike any adapted database under the ODbL.

Acknowledgment:

Massive, heartfelt thanks to all Stack Overflow contributors and lurking developers of the world who took part in the survey this year. We value your generous participation more than you know. <3
Data from: stackoverflow
huggingface.co
Updated Dec 8, 2024
Cite
ML Foundations Development (2024). stackoverflow [Dataset]. https://huggingface.co/datasets/mlfoundations-dev/stackoverflow
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 8, 2024
Dataset authored and provided by
ML Foundations Development
Description
mlfoundations-dev/stackoverflow dataset hosted on Hugging Face and contributed by the HF Datasets community
StackOverflow Survey Dataset(2024)
kaggle.com
zip
Updated Feb 18, 2025
Cite
Soham Khopkar (2025). StackOverflow Survey Dataset(2024) [Dataset]. https://www.kaggle.com/datasets/sohamkhopkar/stackoverflow-survey-dataset2025
Explore at:
Available download formats: zip (3230606 bytes)
Dataset updated
Feb 18, 2025
Authors
Soham Khopkar
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The Stack Overflow Developer Survey dataset includes comprehensive information about developers' education, learning methods, professional experiences, and basic demographic and employment-related details. It covers various aspects such as formal education, how developers learn to code, the online resources they use, technical documentation sources, years of coding experience, current job roles, organisational size, influence over technology purchases, and annual compensation. Additionally, it provides information on developers' age, employment status, remote work situation, and coding activities outside of work.
Stack Overflow Annual Developer Survey 2024
kaggle.com
zip
Updated Aug 10, 2024
Cite
Berkay Alan (2024). Stack Overflow Annual Developer Survey 2024 [Dataset]. https://www.kaggle.com/datasets/berkayalan/stack-overflow-annual-developer-survey-2024
Explore at:
Available download formats: zip (18374043 bytes)
Dataset updated
Aug 10, 2024
Authors
Berkay Alan
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
In May 2024 over 65,000 developers responded to Stack Overflow's annual survey about coding, working, AI and how they feel about all of those topics and more.

There were seven sections in this survey. The 2nd, 3rd, 4th, and 5th sections will appear in a random order.

Basic Information

Education, Work, and Career

Technology and Tech Culture

Stack Overflow Usage + Community

Artificial Intelligence

Professional Developer Series (Optional)

Final Questions
Stack Overflow Developer Survey, 2017
kaggle.com
zip
Updated Jun 15, 2017
Cite
Stack Overflow (2017). Stack Overflow Developer Survey, 2017 [Dataset]. https://www.kaggle.com/datasets/stackoverflow/so-survey-2017
Explore at:
Available download formats: zip (10156126 bytes)
Dataset updated
Jun 15, 2017
Dataset authored and provided by
Stack Overflow (http://stackoverflow.com/)
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Every year, Stack Overflow conducts a massive survey of people on the site, covering all sorts of information like programming languages, salary, code style and various other information. This year, they amassed more than 64,000 responses fielded from 213 countries.

Data

The data is made up of two files:
1. survey_results_public.csv - CSV file with main survey results, one respondent per row and one column per answer
2. survey_results_schema.csv - CSV file with survey schema, i.e., the questions that correspond to each column name

Acknowledgements

Data is directly taken from StackOverflow and licensed under the ODbL license.
Reddit and StackOverflow dataset (Programming languages)
zenodo.org
zip
Updated Mar 7, 2023
Cite
Daniele De Vinco; Alessia Antelmi (2023). Reddit and StackOverflow dataset (Programming languages) [Dataset]. http://doi.org/10.5281/zenodo.7685062
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7685062
Dataset updated
Mar 7, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Daniele De Vinco; Alessia Antelmi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set contains anonymized data collected from Reddit (via the Pushshift API) and StackOverflow (from Kaggle's dataset).

Each folder includes the data split by trimester. The schema of StackOverflow and Reddit-related files follows:

Fields from StackOverflow

question_id

answer_id

creation_date - answer creation_date

score - score of the question/answer

tags - all tags flagged for a question

answer_count - number of answers for a question

start_question - question's time of creation

last_activity_date - last update on the question

new_id - hashed id of the answerer

q_new_id - hashed id of the questioner

Fields from Reddit

comment_id

submission_id

score - score of the question/submission

subreddit

created_utc - time of creation (unrelated to last modified comments)

new_id - hashed id

The .txt files represent the structure of the corresponding hypergraphs.
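
One way to think about a hypergraph over the StackOverflow fields is to treat each question as a hyperedge that connects the questioner with everyone who answered. The sketch below builds that structure from rows using the field names listed above. This construction is an assumption made for illustration: the description does not say how the hyperedges in the .txt files are defined or formatted, and the sample rows are hypothetical.

```python
from collections import defaultdict


def question_hyperedges(rows):
    """Group hashed user ids by question: {question_id: {user ids}}.

    Each row is one answer; `q_new_id` is the hashed questioner id and
    `new_id` is the hashed answerer id, following the field list above.
    """
    edges = defaultdict(set)
    for row in rows:
        edge = edges[row["question_id"]]
        edge.add(row["q_new_id"])
        edge.add(row["new_id"])
    return dict(edges)


# Hypothetical rows, not taken from the dataset.
rows = [
    {"question_id": 1, "answer_id": 10, "new_id": "u2", "q_new_id": "u1"},
    {"question_id": 1, "answer_id": 11, "new_id": "u3", "q_new_id": "u1"},
    {"question_id": 2, "answer_id": 12, "new_id": "u1", "q_new_id": "u4"},
]
edges = question_hyperedges(rows)
print(edges)
```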
stackoverflow.com Traffic Analytics Data
analytics.explodingtopics.com
Updated Oct 1, 2025
Cite
(2025). stackoverflow.com Traffic Analytics Data [Dataset]. https://analytics.explodingtopics.com/website/stackoverflow.com
Explore at:
Dataset updated
Oct 1, 2025
Variables measured
Global Rank, Monthly Visits, Authority Score, US Country Rank, Computer Software & Development Category Rank
Description
Traffic analytics, rankings, and competitive metrics for stackoverflow.com as of October 2025
Replication package for the paper "What do Developers Discuss about Code...
data.niaid.nih.gov
Updated Jun 30, 2021
Cite
Anonymous (2021). Replication package for the paper "What do Developers Discuss about Code Comments" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4470125
Explore at:
Dataset updated
Jun 30, 2021
Dataset authored and provided by
Anonymous
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
RP-commenting-practices-multiple-sources

Replication package for the paper "What do Developers Discuss about Code Comments?"

Structure

Appendix.pdf
Tags-topics.md
Stack-exchange-query.md
RQ1/
    LDA_input/
        combined-so-quora-mallet-metadata.csv
        topic-input.mallet
    LDA_output/
        Mallet/
            output_csv/
                docs-in-topics.csv
                topic-words.csv
                topics-in-docs.csv
                topics-metadata.csv
            output_html/
                all_topics.html
                Docs/
                Topics/
RQ2/
    datasource_rawdata/
        quora.csv
        stackoverflow.csv
    manual_analysis_output/
        stackoverflow_quora_taxonomy.xlsx

Contents of the Replication Package

Appendix.pdf - appendix of the paper containing supplementary tables

Tags-topics.md - tags selected from Stack Overflow and topics selected from Quora for the study (RQ1 & RQ2)

Stack-exchange-query.md - the query used to extract the posts from the Stack Exchange Data Explorer

RQ1/ - contains the data used to answer RQ1

LDA_input/ - input data used for LDA analysis

combined-so-quora-mallet-metadata.csv - Stack overflow and Quora questions used to perform LDA analysis

topic-input.mallet - input file to the mallet tool

LDA_output/

Mallet/ - contains the LDA output generated by MALLET tool

output_csv/

docs-in-topics.csv - documents per topic

topic-words.csv - most relevant topic words

topics-in-docs.csv - topic probability per document

topics-metadata.csv - metadata per document and topic probability

output_html/ - Browsable results of mallet output

all_topics.html

Docs/

Topics/

RQ2/ - contains the data used to answer RQ2

datasource_rawdata/ - contains the raw data for each source

quora.csv - contains the processed Quora dataset (for example, with HTML tags removed). For the preprocessing steps, refer to the reproducibility section of the paper. The data is preprocessed using the Makar tool.

stackoverflow.csv - contains the processed Stack Overflow dataset. For the preprocessing steps, refer to the reproducibility section of the paper. The data is preprocessed using the Makar tool.

manual_analysis_output/

stackoverflow_quora_taxonomy.xlsx - contains the classified dataset of stackoverflow and quora and description of taxonomy.

Taxonomy - contains the description of the first dimension and second dimension categories. Second dimension categories are further divided into levels, separated by | symbol.

stackoverflow-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.

quora-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.
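
Because second dimension categories are split into levels by the | symbol, a category cell can be unpacked with a simple split. The sketch below shows the idea; the example label is hypothetical and does not come from the taxonomy spreadsheet.

```python
def split_levels(category):
    """Split a second dimension category into its levels.

    Levels are separated by "|"; surrounding whitespace and empty
    segments are dropped.
    """
    return [level.strip() for level in category.split("|") if level.strip()]


# Hypothetical label, used only to show the shape of the result.
levels = split_levels("Implementation strategies | Syntax and format")
print(levels)
```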
stackoverflow-chat-dutch
huggingface.co
Updated Jan 24, 2024
Cite
Bram Vanroy (2024). stackoverflow-chat-dutch [Dataset]. http://doi.org/10.57967/hf/0529
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.57967/hf/0529
Dataset updated
Jan 24, 2024
Authors
Bram Vanroy
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Card for Stack Overflow Chat Dutch

Dataset Summary

This dataset contains 56,964 conversations between an AI assistant and a (fake) "Human" (generated) in Dutch, specifically in the domain of programming (Stack Overflow). They are translations of Baize's machine-generated answers to the Stack Overflow dataset. ☕ Want to help me out? Translating the data with the OpenAI API, and prompt testing, cost me 💸$133.60💸. If you like this dataset, please consider buying… See the full description on the dataset page: https://huggingface.co/datasets/BramVanroy/stackoverflow-chat-dutch.
stackoverflow-qa
huggingface.co
Updated Aug 5, 2024
Cite
Massive Text Embedding Benchmark (2024). stackoverflow-qa [Dataset]. https://huggingface.co/datasets/mteb/stackoverflow-qa
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 5, 2024
Dataset authored and provided by
Massive Text Embedding Benchmark
Description
mteb/stackoverflow-qa dataset hosted on Hugging Face and contributed by the HF Datasets community
StackOverflow - Dataset - LDM
service.tib.eu
Updated Dec 2, 2024
Cite
(2024). StackOverflow - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/stackover-ow
Explore at:
Dataset updated
Dec 2, 2024
Description
The paper discusses the use of multi-objective Bayesian optimization for hyperparameter transfer in topic models.
Websites using Stackoverflow
webtechsurvey.com
csv
Updated Oct 13, 2025
Cite
WebTechSurvey (2025). Websites using Stackoverflow [Dataset]. https://webtechsurvey.com/technology/stackoverflow
Explore at:
Available download formats: csv
Dataset updated
Oct 13, 2025
Dataset authored and provided by
WebTechSurvey
License
https://webtechsurvey.com/terms
Time period covered
2025
Area covered
Global
Description
A complete list of live websites using the Stackoverflow technology, compiled through global website indexing conducted by WebTechSurvey.
Stackoverflow Developer Suvery 2022
kaggle.com
zip
Updated Dec 19, 2022
Cite
Soheil Tehranipour (2022). Stackoverflow Developer Suvery 2022 [Dataset]. https://www.kaggle.com/datasets/soheiltehranipour/stackoverflow-developer-suvery-2022
Explore at:
Available download formats: zip (12299650 bytes)
Dataset updated
Dec 19, 2022
Authors
Soheil Tehranipour
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Over 70,000 developers told us how they learn and level up, which tools they’re using, and what they want.

Stack Overflow: The questions we ask in our annual survey help us improve the Stack Overflow community and the platform that serves them. The challenge and opportunity for us is to continue expanding and improving our ability to help all developers and to make them feel welcome in our community. Read on for more great insights about the attitudes, tools, and environments that are shaping the art and practice of software today.
stackoverflow users dataset
kaggle.com
zip
Updated Jan 1, 2023
Cite
Chandan B Reddy (2023). stackoverflow users dataset [Dataset]. https://www.kaggle.com/datasets/chandanreddy10/stackoverflow-users-dataset
Explore at:
Available download formats: zip (64721 bytes)
Dataset updated
Jan 1, 2023
Authors
Chandan B Reddy
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The dataset describes users on Stack Overflow. It contains two columns: the first, location, holds the user's location, and the second, tags, holds the user's top 5 tags.
Stack Overflow BigQuery Dataset
live.european-language-grid.eu
Updated Dec 30, 2018
Cite
Stack Overflow (2018). Stack Overflow BigQuery Dataset [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/5094
Explore at:
Dataset updated
Dec 30, 2018
Dataset authored and provided by
Stack Overflow (http://stackoverflow.com/)
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
BigQuery dataset includes an archive of Stack Overflow content, including posts, votes, tags, and badges.
Time Series of Social Media Activity. YouTube, Usenet, Stack-Overflow, PLOS...
figshare.com
zip
Updated Jun 1, 2023
Cite
José María Miotto; Eduardo Altmann (2023). Time Series of Social Media Activity. YouTube, Usenet, Stack-Overflow, PLOS ONE. [Dataset]. http://doi.org/10.6084/m9.figshare.1160515.v4
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.1160515.v4
Dataset updated
Jun 1, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
José María Miotto; Eduardo Altmann
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
YouTube
Description
Datasets of time series of social media activity. They include 16.2 million YouTube videos, 0.8 million Usenet threads, 4.6 million Stack Overflow questions, and 70 thousand PLOS ONE papers. This data is used in JM Miotto, EG Altmann, 'Predictability of extreme events in social media', arXiv:1403.3616.
Stack Overflow Developer Survey Dataset
kaggle.com
zip
Updated Jan 8, 2024
Cite
Palvinder (2024). Stack Overflow Developer Survey Dataset [Dataset]. https://www.kaggle.com/datasets/palvinder2006/stackoverflow
Explore at:
Available download formats: zip (9459089 bytes)
Dataset updated
Jan 8, 2024
Authors
Palvinder
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Overview

The Stack Overflow Developer Survey Dataset represents one of the most trusted and comprehensive sources of information about the global developer community. Collected by Stack Overflow through its annual survey, the dataset provides insights into the demographics, preferences, habits, and career paths of developers.

This dataset is frequently used for:

- Analyzing trends in programming languages, tools, and technologies.
- Understanding developer job satisfaction, compensation, and work environments.
- Studying global and regional differences in developer demographics and experience.

The data consists of two CSV files: "survey_results_public", which contains the responses, and "survey_results_schema", which describes each column in detail.

Data Dictionary: All the details are in "survey_results_schema.csv"

Features of the Stack Overflow Developer Survey Dataset

Demographic & Background Information

- Respondent: A unique identifier for each survey participant.
- MainBranch: Describes whether the respondent is a professional developer, student, hobbyist, etc.
- Country: The country where the respondent lives.
- Age: The respondent's age.
- Gender: The gender identity of the respondent.
- Ethnicity: Ethnic background (when available).
- EdLevel: The highest level of formal education completed.
- UndergradMajor: The respondent's undergraduate major.
- Hobbyist: Indicates whether the person codes as a hobby (Yes/No).

Employment & Professional Experience

- Employment: Employment status (full-time, part-time, unemployed, student, etc.).
- DevType: Types of developer roles the respondent identifies with (e.g., Web Developer, Data Scientist).
- YearsCode: Number of years the respondent has been coding.
- YearsCodePro: Number of years coding professionally.
- JobSat: Job satisfaction level.
- CareerSat: Career satisfaction level.
- WorkWeekHrs: Approximate hours worked per week.
- RemoteWork: Whether the respondent works remotely and how frequently.

Compensation

- CompTotal: Total compensation in USD (including salary, bonuses, etc.).
- CompFreq: Frequency of compensation (e.g., yearly, monthly).

Learning & Education

- LearnCode: How the respondent first learned to code (e.g., online courses, university).
- LearnCodeOnline: Online resources used (e.g., YouTube, freeCodeCamp).
- LearnCodeCoursesCert: Whether the respondent has taken online courses or earned certifications.

Technology & Tools

- LanguageHaveWorkedWith: Programming languages the respondent has used.
- LanguageWantToWorkWith: Languages the respondent is interested in learning or using more.
- DatabaseHaveWorkedWith: Databases the respondent has experience with.
- PlatformHaveWorkedWith: Platforms used (e.g., Linux, AWS, Android).
- OpSys: The operating system used most often.
- NEWCollabToolsHaveWorkedWith: Collaboration tools used (e.g., Slack, Teams, Zoom).
- NEWStuck: How often the respondent feels stuck when coding.
- ToolsTechHaveWorkedWith: Frameworks and technologies respondents have worked with.

Online Presence & Community

- SOAccount: Whether the respondent has a Stack Overflow account.
- SOPartFreq: How often the respondent participates on Stack Overflow.
- SOVisitFreq: Frequency of visiting Stack Overflow.
- SOComm: Whether the respondent feels welcome in the Stack Overflow community.
- OpenSourcer: Level of involvement in open-source contributions.

Opinions & Preferences

- WorkChallenge: Challenges faced at work (e.g., unclear requirements, unrealistic expectations).
- JobFactors: Important job factors (e.g., salary, work-life balance, technologies used).
- MentalHealth: Questions on how mental health affects or is affected by their job.
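
Columns such as LanguageHaveWorkedWith can hold several values per respondent, so they need to be split before counting. Here is a minimal sketch of that step. It assumes the values are separated by semicolons and that missing answers appear as "NA"; neither detail is stated in this description, so verify both against the actual CSV. The sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def count_multi_valued(rows, column, sep=";"):
    """Count individual values in a multi-valued column.

    Assumes `sep`-separated values and "NA" for missing answers;
    both are assumptions about the file, not documented facts.
    """
    counts = Counter()
    for row in rows:
        value = (row.get(column) or "").strip()
        if not value or value == "NA":
            continue
        counts.update(part.strip() for part in value.split(sep) if part.strip())
    return counts


# Hypothetical sample rows, not real survey responses.
SAMPLE = "Respondent,LanguageHaveWorkedWith\n1,Python;SQL\n2,Python;Rust\n3,NA\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
counts = count_multi_valued(rows, "LanguageHaveWorkedWith")
print(counts.most_common())
```

The same helper would apply to the other multi-select columns, such as DatabaseHaveWorkedWith or PlatformHaveWorkedWith, if they follow the same convention.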
