100+ datasets found
  1. Data from: Stack Overflow

    • console.cloud.google.com
    Updated Mar 4, 2020
    + more versions
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Stack%20Exchange&inv=1&invt=AbyocA (2020). Stack Overflow [Dataset]. https://console.cloud.google.com/marketplace/product/stack-exchange/stack-overflow
    Dataset updated
    Mar 4, 2020
    Dataset provided by
    Google (http://google.com/)
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers. Updated on a quarterly basis, this BigQuery dataset includes an archive of Stack Overflow content, including posts, votes, tags, and badges. This dataset is updated to mirror the Stack Overflow content on the Internet Archive, and is also available through the Stack Exchange Data Explorer. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1 TB/month of free tier processing. This means that each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
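
As a rough illustration of what the monthly free tier allows, the sketch below estimates how many runs of a query fit into it. It assumes that "1TB" means 10**12 bytes and that the quota is consumed by bytes processed per query; the function name and the 25 GB figure are hypothetical and not taken from the dataset page.

```python
# Assumption: "1TB" is taken as 10**12 bytes; the unit BigQuery actually
# applies may differ, so treat the result as a rough estimate only.
FREE_TIER_BYTES_PER_MONTH = 10**12


def queries_within_free_tier(bytes_per_query: int,
                             free_bytes: int = FREE_TIER_BYTES_PER_MONTH) -> int:
    """Return how many queries of a given scan size fit in the monthly quota."""
    if bytes_per_query <= 0:
        raise ValueError("bytes_per_query must be positive")
    return free_bytes // bytes_per_query


if __name__ == "__main__":
    # Hypothetical query that processes 25 GB of data on every run.
    print(queries_within_free_tier(25 * 10**9))  # prints 40
```

Selecting only the columns a query needs is the usual way to reduce the bytes processed per run, which stretches the same quota further.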

  2. Stack Overflow Statistics And Facts (2025)

    • electroiq.com
    Updated May 2, 2025
    Cite
    Electro IQ (2025). Stack Overflow Statistics And Facts (2025) [Dataset]. https://electroiq.com/stats/stack-overflow-statistics/
    Dataset updated
    May 2, 2025
    Dataset authored and provided by
    Electro IQ
    License

    https://electroiq.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Introduction

    Stack Overflow Statistics: The 2024 Stack Overflow Developer Survey offers a comprehensive snapshot of the global developer community, compiling insights from 65,437 respondents across 185 countries. Conducted between May 19 and June 20, 2024, the survey had a median completion time of approximately 21 minutes.

    A significant 76% of developers reported using or planning to use AI tools in their development processes, marking an increase from 70% in 2023. However, trust in AI tool accuracy remains divided, with only 43% expressing confidence in their outputs. Despite this, 81% of developers identified increased productivity as the primary benefit of integrating AI tools into their workflows.

    Educational backgrounds among respondents show that 66% hold a Bachelor's or Master's degree, even though only 49% learned to code through formal education.

    Geographically, the United States accounted for 18.9% of respondents, followed by Germany at 8.4% and India at 7.2%, highlighting the survey's extensive international reach.

    This year's survey underscores the evolving landscape of software development, emphasizing the growing integration of AI tools, the shift towards self-directed learning, and the diverse global composition of the developer community.

    This article highlights Stack Overflow statistics and the platform's performance.

  3. stackoverflow-questions

    • huggingface.co
    Updated Sep 5, 2012
    Cite
    Paco Valdez (2012). stackoverflow-questions [Dataset]. https://huggingface.co/datasets/pacovaldez/stackoverflow-questions
    Dataset updated
    Sep 5, 2012
    Authors
    Paco Valdez
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for [Stackoverflow Post Questions]

      Dataset Description
    

    Companies that sell open-source software tools usually hire an army of customer representatives to try to answer every question asked about their tool. The first step in this process is the prioritization of the question. The classification scale usually consists of 4 values, P0, P1, P2, and P3, whose meanings vary across participants in the industry. On the other hand, every software developer… See the full description on the dataset page: https://huggingface.co/datasets/pacovaldez/stackoverflow-questions.

  4. Data from: Stack Overflow Dataset

    • gts.ai
    json
    Updated Dec 19, 2024
    Cite
    GTS (2024). Stack Overflow Dataset [Dataset]. https://gts.ai/dataset-download/stack-overflow-dataset/
    Available download formats: json
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    Description

    The Stack Overflow dataset is a detailed archive of posts, votes, tags, and badges from the world’s largest programmer community.

  5. stack-overflow-description

    • huggingface.co
    Updated Oct 21, 2024
    + more versions
    Cite
    TPP-LLM (2024). stack-overflow-description [Dataset]. https://huggingface.co/datasets/tppllm/stack-overflow-description
    Dataset updated
    Oct 21, 2024
    Dataset authored and provided by
    TPP-LLM
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Stack Overflow Description Dataset

    This dataset contains badge awards earned by users on Stack Overflow between January 1, 2022, and December 31, 2023. It includes 3,336 sequences with 187,836 events and 25 badge types, derived from the Stack Exchange Data Dump under the CC BY-SA 4.0 license. The detailed data preprocessing steps used to create this dataset can be found in the TPP-LLM paper and TPP-LLM-Embedding paper. If you find this dataset useful, we kindly invite you to cite… See the full description on the dataset page: https://huggingface.co/datasets/tppllm/stack-overflow-description.

  6. Stack Overflow Dataset for User Engagement

    • ieee-dataport.org
    Updated Mar 17, 2025
    Cite
    Linda Okpanachi (2025). Stack Overflow Dataset for User Engagement [Dataset]. https://ieee-dataport.org/documents/stack-overflow-dataset-user-engagement-technology-and-emotion-analysis
    Dataset updated
    Mar 17, 2025
    Authors
    Linda Okpanachi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    post tags

  7. Subset Stack Overflow Survey -- 2017-2022

    • kaggle.com
    Updated Nov 8, 2023
    Cite
    Tony Fraser (2023). Subset Stack Overflow Survey -- 2017-2022 [Dataset]. https://www.kaggle.com/datasets/tonyfraser/formatted-stack-overflow-survey-2017-2022
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Tony Fraser
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    We, team KriJuDaTo, four students from the City University of New York, used the Stack Overflow survey data from https://survey.stackoverflow.co for a class project. We removed a number of the columns and exploded some. Our code for this processing is in functions.r at https://github.com/tonythor/krijudato/

    Thank you Stack Overflow! And everybody reading this, next time the survey comes out, please sit down and fill it out in detail!

  8. Replication package for the paper "What do Developers Discuss about Code...

    • explore.openaire.eu
    • data.niaid.nih.gov
    Updated Jun 30, 2021
    Cite
    (2021). Replication package for the paper "What do Developers Discuss about Code Comments" [Dataset]. http://doi.org/10.5281/zenodo.5044270
    Dataset updated
    Jun 30, 2021
    Description

    RP-commenting-practices-multiple-sources

    Replication package for the paper "What do Developers Discuss about Code Comments?"

    ## Structure

    Appendix.pdf
    Tags-topics.md
    Stack-exchange-query.md
    RQ1/
      LDA_input/
        combined-so-quora-mallet-metadata.csv
        topic-input.mallet
      LDA_output/
        Mallet/
          output_csv/
            docs-in-topics.csv
            topic-words.csv
            topics-in-docs.csv
            topics-metadata.csv
          output_html/
            all_topics.html
            Docs/
            Topics/
    RQ2/
      datasource_rawdata/
        quora.csv
        stackoverflow.csv
      manual_analysis_output/
        stackoverflow_quora_taxonomy.xlsx

    ## Contents of the Replication Package

    - Appendix.pdf - appendix of the paper containing supplementary tables
    - Tags-topics.md - tags selected from Stack Overflow and topics selected from Quora for the study (RQ1 & RQ2)
    - Stack-exchange-query.md - the query interface used to extract the posts from the Stack Exchange explorer
    - RQ1/ - contains the data used to answer RQ1
      - LDA_input/ - input data used for LDA analysis
        - combined-so-quora-mallet-metadata.csv - Stack Overflow and Quora questions used to perform LDA analysis
        - topic-input.mallet - input file to the MALLET tool
      - LDA_output/
        - Mallet/ - contains the LDA output generated by the MALLET tool
          - output_csv/
            - docs-in-topics.csv - documents per topic
            - topic-words.csv - most relevant topic words
            - topics-in-docs.csv - topic probability per document
            - topics-metadata.csv - metadata per document and topic probability
          - output_html/ - browsable results of the MALLET output
            - all_topics.html
            - Docs/
            - Topics/
    - RQ2/ - contains the data used to answer RQ2
      - datasource_rawdata/ - contains the raw data for each source
        - quora.csv - contains the processed dataset (for example, with HTML tags removed). For more on the preprocessing steps, refer to the reproducibility section in the paper. The data is preprocessed using the Makar tool.
        - stackoverflow.csv - contains the processed Stack Overflow dataset. For more on the preprocessing steps, refer to the reproducibility section in the paper. The data is preprocessed using the Makar tool.
      - manual_analysis_output/
        - stackoverflow_quora_taxonomy.xlsx - contains the classified dataset of Stack Overflow and Quora and a description of the taxonomy.
          - Taxonomy - contains the description of the first dimension and second dimension categories. Second dimension categories are further divided into levels, separated by the | symbol.
          - stackoverflow-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.
          - quota-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.

  9. stackoverflow-chat-dutch

    • huggingface.co
    • data.niaid.nih.gov
    • +1 more
    Updated Jan 24, 2024
    Cite
    Bram Vanroy (2024). stackoverflow-chat-dutch [Dataset]. http://doi.org/10.57967/hf/0529
    Dataset updated
    Jan 24, 2024
    Authors
    Bram Vanroy
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for Stack Overflow Chat Dutch

      Dataset Summary
    

    This dataset contains 56,964 conversations between an AI assistant and a (fake) "Human" (generated) in Dutch, specifically in the domain of programming (Stack Overflow). They are translations of Baize's machine-generated answers to the Stack Overflow dataset. ☕ Want to help me out? Translating the data with the OpenAI API, and prompt testing, cost me 💸$133.60💸. If you like this dataset, please consider buying… See the full description on the dataset page: https://huggingface.co/datasets/BramVanroy/stackoverflow-chat-dutch.

  10. ‘Stack Overflow Tags Data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Stack Overflow Tags Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-stack-overflow-tags-data-8194/ace9c36b/
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Stack Overflow Tags Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/isaacwen/stack-overflow-tags-data on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    A common question for those new to, and those familiar with, computer science and software engineering is what the best and/or most popular programming language is. It is very difficult to give a definitive answer, as there is a seemingly unlimited number of metrics that can define the 'best' or 'most popular' programming language.

    One such metric that can be used to define a 'popular' programming language is the number of posts relating to that language on public forums. With Stack Overflow being perhaps the most commonly used forum for questions related to programming languages, analyzing the number of posts and other metrics for specific programming languages on Stack Overflow can be a good indicator for the popularity of a language.

    Content

    This dataset contains statistics about posts, views, answers, comments, and favorites relating to the 1000 most popular tags on Stack Overflow, including those designated for questions relating to specific programming languages such as 'python' and 'javascript'. The data is from 2008 to 2021, and is sorted into rows for each tag, for each year.

    Source

    This data was queried and aggregated from BigQuery's public stackoverflow dataset.

    --- Original source retains full ownership of the source dataset ---

  11. StackLite: Stack Overflow questions and tags

    • kaggle.com
    Updated Feb 6, 2017
    Cite
    Stack Overflow (2017). StackLite: Stack Overflow questions and tags [Dataset]. https://www.kaggle.com/stackoverflow/stacklite/metadata
    Dataset updated
    Feb 6, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Stack Overflow
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    A dataset of Stack Overflow programming questions. For each question, it includes:

    • Question ID
    • Creation date
    • Closed date, if applicable
    • Score
    • Owner user ID
    • Number of answers
    • Tags

    This dataset is ideal for answering questions such as:

    • The increase or decrease in questions in each tag over time
    • Correlations among tags on questions
    • Which tags tend to get higher or lower scores
    • Which tags tend to be asked on weekends vs weekdays
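
As one illustration of the weekend versus weekday question, the sketch below computes, for each tag, the share of its questions created on a Saturday or Sunday. The sample rows, the tuple layout, and the function name are hypothetical; the real files may use different column names and would need to be loaded and joined first.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: (question ID, creation date in UTC, tags).
# These are made-up values, not rows from the actual dataset.
QUESTIONS = [
    (1, "2016-10-08T12:00:00Z", ["python", "pandas"]),      # Saturday
    (2, "2016-10-10T09:30:00Z", ["python"]),                # Monday
    (3, "2016-10-09T18:45:00Z", ["javascript"]),            # Sunday
    (4, "2016-10-12T08:15:00Z", ["javascript", "python"]),  # Wednesday
]


def weekend_share_by_tag(questions):
    """Map each tag to the fraction of its questions created on Sat/Sun."""
    totals = defaultdict(int)
    weekend = defaultdict(int)
    for _question_id, created, tags in questions:
        day = datetime.strptime(created, "%Y-%m-%dT%H:%M:%SZ").weekday()
        for tag in tags:
            totals[tag] += 1
            if day >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
                weekend[tag] += 1
    return {tag: weekend[tag] / totals[tag] for tag in totals}


if __name__ == "__main__":
    for tag, share in sorted(weekend_share_by_tag(QUESTIONS).items()):
        print(f"{tag}: {share:.2f}")
```

With uniform posting, a tag's weekend share would sit near 2/7, so values well above or below that hint at tags used more for hobby or for work projects.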

    This dataset was extracted from the Stack Overflow database at 2016-10-13 18:09:48 UTC and contains questions up to 2016-10-12. This includes 12583347 non-deleted questions, and 3654954 deleted ones.

    This is all public data within the Stack Exchange Data Dump, which is much more comprehensive (including question and answer text), but also requires much more computational overhead to download and process. This dataset is designed to be easy to read in and start analyzing. Similarly, this data can be examined within the Stack Exchange Data Explorer, but this offers analysts the chance to work with it locally using their tool of choice.

    Note that for space reasons only non-deleted questions are included in the SQLite dataset, but the csv.gz files include deleted questions as well (with an additional DeletionDate file).

    See the GitHub repo for more.

  12. Data from: stackoverflow

    • huggingface.co
    Updated Dec 8, 2024
    + more versions
    Cite
    ML Foundations Development (2024). stackoverflow [Dataset]. https://huggingface.co/datasets/mlfoundations-dev/stackoverflow
    Dataset updated
    Dec 8, 2024
    Dataset authored and provided by
    ML Foundations Development
    Description

    mlfoundations-dev/stackoverflow dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. GPT vs Stack Overflow: data collection (A2I2 T2 2023)

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 6, 2023
    Cite
    Mark Heath; Mark Heath (2023). GPT vs Stack Overflow: data collection (A2I2 T2 2023) [Dataset]. http://doi.org/10.5281/zenodo.8403468
    Available download formats: zip
    Dataset updated
    Oct 6, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mark Heath; Mark Heath
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    About

    The dataset components produced by this repo. Please see the documentation there for more information.

    Each CSV has been individually zipped so that you only have to download the specific file(s) that you want.

    Overview of Files

    From using the Stack Exchange Data Dump as the data source (these zip files have a DD_ prefix):

    • Raw dataset before processing: saved_dataset.csv (DD_saved_dataset.zip)
    • Completed tag count: tag_count.csv (DD_tag_count.zip)
    • Processed dataset with completed evaluations: dataset_results.csv (DD_dataset_results.zip)

    From using Google BigQuery as the data source (these zip files have a BQ_ prefix):

    • Raw dataset before processing: saved_dataset.csv (BQ_saved_dataset.zip)
    • Completed tag count: tag_count.csv (BQ_tag_count.zip)
    • No large-scale evaluation was completed when using BigQuery as a data source.

    As noted in the linked repo, the use of Google BigQuery as a data source is not recommended for this work, but the working code and dataset have nonetheless been provided for completeness.

    License

    This dataset is licensed under the CC BY-SA 4.0 license, the same license used by the Stack Exchange Data Dump.

  14. StackOverflow-QA-C-Language-40k

    • huggingface.co
    Updated Oct 7, 2023
    Cite
    Max Zhang (2023). StackOverflow-QA-C-Language-40k [Dataset]. https://huggingface.co/datasets/Mxode/StackOverflow-QA-C-Language-40k
    Dataset updated
    Oct 7, 2023
    Authors
    Max Zhang
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This is a collection of ~40k Q&A pairs about the C language from Stack Overflow. The data has been initially cleaned, and each response is an accepted answer. All data is <1000 in length. The questions and answers are organized into a one-line format. A sample is shown below: { "question": "``` FILE* file = fopen(some file)

    pcap_t* pd = pcap_fopen_offline(file)

    pcap_close(pd)

    fclose(file) ```

    This code occurs double free error.

    Could you explain about this happening?

    My… See the full description on the dataset page: https://huggingface.co/datasets/Mxode/StackOverflow-QA-C-Language-40k.

  15. Data from: Stack Overflow Dataset

    • kaggle.com
    Updated May 25, 2021
    + more versions
    Cite
    Sathishkumar (2021). Stack Overflow Dataset [Dataset]. https://www.kaggle.com/datasets/klmsathishkumar/stack-overflow-dataset
    Dataset updated
    May 25, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sathishkumar
    Description

    Dataset

    This dataset was created by Sathishkumar

    Contents

  16. stackoverflow_linux

    • huggingface.co
    Updated Oct 6, 2023
    Cite
    Konrad Szafer (2023). stackoverflow_linux [Dataset]. https://huggingface.co/datasets/KonradSzafer/stackoverflow_linux
    Dataset updated
    Oct 6, 2023
    Authors
    Konrad Szafer
    Description

    Dataset Card for "stackoverflow_linux"

    Dataset information:

    Source: Stack Overflow
    Category: Linux
    Number of samples: 300
    Train/Test split: 270/30
    Quality: Data come from the top 1k most upvoted questions

      Additional Information

      License

    All Stack Overflow user contributions are licensed under CC-BY-SA 3.0 with attribution required. More Information needed

  17. ru_stackoverflow

    • huggingface.co
    Updated Jun 15, 2024
    Cite
    Ilya Gusev (2024). ru_stackoverflow [Dataset]. https://huggingface.co/datasets/IlyaGusev/ru_stackoverflow
    Dataset updated
    Jun 15, 2024
    Authors
    Ilya Gusev
    License

    https://choosealicense.com/licenses/other/

    Description

    Russian StackOverflow dataset

      Description
    

    Summary: Dataset of questions, answers, and comments from ru.stackoverflow.com.
    Script: create_stackoverflow.py
    Point of Contact: Ilya Gusev
    Languages: The dataset is in Russian with some programming code.

      Usage
    

    Prerequisites: pip install datasets zstandard jsonlines pysimdjson

    Loading:

        from datasets import load_dataset
        dataset = load_dataset('IlyaGusev/ru_stackoverflow', split="train")
        for example in dataset:…

    See the full description on the dataset page: https://huggingface.co/datasets/IlyaGusev/ru_stackoverflow.

  18. Data from: Are comments on Stack Overflow well organized for easy retrieval...

    • explore.openaire.eu
    • zenodo.org
    Updated Aug 26, 2020
    Cite
    Haoxiang Zhang; Shaowei Wang; Tse-Hsun (Peter) Chen; Ahmed E. Hassan (2020). Are comments on Stack Overflow well organized for easy retrieval by developers? [Dataset]. http://doi.org/10.5281/zenodo.4001369
    Dataset updated
    Aug 26, 2020
    Authors
    Haoxiang Zhang; Shaowei Wang; Tse-Hsun (Peter) Chen; Ahmed E. Hassan
    Description

    Many Stack Overflow answers have associated informative comments that can strengthen them and assist developers. A prior study found that comments can provide additional information to point out issues in their associated answer, such as the obsolescence of an answer. By showing more informative comments (e.g., the ones with higher scores) and hiding less informative ones, developers can more effectively retrieve information from the comments that are associated with an answer. Currently, Stack Overflow prioritizes the display of comments and as a result, 4.4 million comments (possibly including informative comments) are hidden by default from developers. In this study, we investigate whether this mechanism effectively organizes informative comments. We find that: 1) The current comment organization mechanism does not work well due to the large amount of tie-scored comments (e.g., 87% of the comments have 0-score). 2) In 97.3% of answers with hidden comments, at least one comment that is possibly informative is hidden while another comment with the same score is shown (i.e., unfairly hidden comments). The longest unfairly hidden comment is more likely to be informative than the shortest one. Our findings highlight that Stack Overflow should consider adjusting the comment organization mechanism to help developers effectively retrieve informative comments. Furthermore, we build a classifier that can effectively distinguish informative comments from uninformative comments. We also evaluate two alternative comment organization mechanisms (i.e., the Length mechanism and the Random mechanism) based on text similarity and the prediction of our classifier.
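
The notion of an unfairly hidden comment described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's implementation: the display limit of 5, the tie-break by creation order, and all names (`Comment`, `split_shown_hidden`, `unfairly_hidden`) are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Comment(NamedTuple):
    comment_id: int
    score: int
    position: int  # creation order under the answer, 0 = oldest


def split_shown_hidden(comments: List[Comment],
                       limit: int = 5) -> Tuple[List[Comment], List[Comment]]:
    """Rank by score (ties broken by creation order) and cut at the limit.

    Assumption: the limit of 5 and the tie-break rule are illustrative only.
    """
    ranked = sorted(comments, key=lambda c: (-c.score, c.position))
    return ranked[:limit], ranked[limit:]


def unfairly_hidden(comments: List[Comment], limit: int = 5) -> List[Comment]:
    """Return hidden comments that share a score with some shown comment."""
    shown, hidden = split_shown_hidden(comments, limit)
    shown_scores = {c.score for c in shown}
    return [c for c in hidden if c.score in shown_scores]


if __name__ == "__main__":
    # One upvoted comment followed by six zero-score comments.
    scores = [3, 0, 0, 0, 0, 0, 0]
    comments = [Comment(i, s, i) for i, s in enumerate(scores)]
    print([c.comment_id for c in unfairly_hidden(comments)])
```

When most comments have a score of 0, as the description reports, almost any cut-off falls inside a block of tied comments, which is why score alone cannot decide which ones deserve to be shown.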

  19. Data from: Does Location Influence Code Quality? Mining Stack Overflow...

    • ourarchive.otago.ac.nz
    • data.niaid.nih.gov
    Updated May 3, 2025
    Cite
    Elijah Zolduoarrati; Sherlock Licorish; Nigel Stanger (2025). Does Location Influence Code Quality? Mining Stack Overflow Snippets Across the United States – Replication Package [Dataset]. https://ourarchive.otago.ac.nz/esploro/outputs/dataset/Does-Location-Influence-Code-Quality-Mining/9926743827501891
    Dataset updated
    May 3, 2025
    Dataset provided by
    Zenodo
    Authors
    Elijah Zolduoarrati; Sherlock Licorish; Nigel Stanger
    Time period covered
    May 3, 2025
    Area covered
    United States
    Description

    Developers routinely integrate Stack Overflow code snippets into their codebases. However, the quality of snippets embedded in users’ answers remains elusive, and existing evaluations of code quality tend to be language- or context-specific. Moreover, the literature has found that contribution patterns vary depending on geographical locales, creating an unexplained rift between code quality, user location, and latent contextual regional factors. The proposed study evaluates the quality of SQL, JavaScript, Python, Ruby, and Java snippets across reliability, readability, performance, and security dimensions, benchmarking findings across states in the USA and investigating how different diversity indicators correlate with code quality violations. The study culminates in a series of inductive content analyses that qualitatively supplement prior quality dimensions. This replication package is provided for those interested in further examining our research methodology.

  20. Data from: Stack Overflow's Hidden Nuances: How Does Zip Code Define User...

    • ourarchive.otago.ac.nz
    Updated Oct 25, 2024
    + more versions
    Cite
    Elijah Zolduoarrati; Sherlock A. Licorish; Nigel Stanger (2024). Stack Overflow's Hidden Nuances: How Does Zip Code Define User Contribution? – Replication Package [Dataset]. https://ourarchive.otago.ac.nz/esploro/outputs/dataset/Stack-Overflows-Hidden-Nuances-How-Does/9926623828201891
    Dataset updated
    Oct 25, 2024
    Dataset provided by
    Zenodo
    Authors
    Elijah Zolduoarrati; Sherlock A. Licorish; Nigel Stanger
    Time period covered
    Oct 25, 2024
    Description

    Collective intelligence constitutes a foundational element within online community question-and-answering (CQA) platforms, such as Stack Overflow, being the source of most programming-related issues. Despite this relevance, concerns remain regarding issues surrounding user participation. Precedent research tends to focus on simple numerical measurements to analyse participation, which may sideline the inherent, subtler aspects. The proposed study aims to bridge this gap by operationalising 11 distinct metrics to represent user participation, behaviour, and community value across different regions of the USA. The study also conducts inductive content analysis to understand the impact of regional contextual factors on users' knowledge sharing patterns. This replication package is provided for those interested in further examining our research methodology.

Data from: Stack Overflow

5 scholarly articles cite this dataset (View in Google Scholar)