5 datasets found
  1. Replication package for the paper "What do Developers Discuss about Code...

    • data.niaid.nih.gov
    Updated Jun 30, 2021
    Cite
    Anonymous (2021). Replication package for the paper "What do Developers Discuss about Code Comments" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4470125
    Dataset updated
    Jun 30, 2021
    Dataset authored and provided by
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    RP-commenting-practices-multiple-sources

    Replication package for the paper "What do Developers Discuss about Code Comments?"

    Structure

    Appendix.pdf
    Tags-topics.md
    Stack-exchange-query.md
    
    RQ1/
      LDA_input/
        combined-so-quora-mallet-metadata.csv
        topic-input.mallet
    
      LDA_output/
        Mallet/
          output_csv/
            docs-in-topics.csv
            topic-words.csv
            topics-in-docs.csv
            topics-metadata.csv
          output_html/
            all_topics.html
            Docs/
            Topics/
    
    RQ2/
      datasource_rawdata/
        quora.csv
        stackoverflow.csv
      manual_analysis_output/
        stackoverflow_quora_taxonomy.xlsx
    

    Contents of the Replication Package

    • Appendix.pdf - appendix of the paper containing supplementary tables

    • Tags-topics.md - tags selected from Stack Overflow and topics selected from Quora for the study (RQ1 & RQ2)

    • Stack-exchange-query.md - the query interface used to extract the posts from the Stack Exchange Data Explorer

    • RQ1/ - contains the data used to answer RQ1

      • LDA_input/ - input data used for the LDA analysis
        • combined-so-quora-mallet-metadata.csv - Stack Overflow and Quora questions used to perform the LDA analysis
        • topic-input.mallet - input file for the MALLET tool
      • LDA_output/
        • Mallet/ - contains the LDA output generated by the MALLET tool
          • output_csv/
            • docs-in-topics.csv - documents per topic
            • topic-words.csv - most relevant topic words
            • topics-in-docs.csv - topic probability per document
            • topics-metadata.csv - metadata per document and topic probability
          • output_html/ - browsable results of the MALLET output
            • all_topics.html
            • Docs/
            • Topics/
    • RQ2/ - contains the data used to answer RQ2

      • datasource_rawdata/ - contains the raw data for each source
        • quora.csv - contains the processed Quora dataset (for example, with HTML tags removed). For details of the preprocessing steps, see the reproducibility section of the paper. The data was preprocessed using the Makar tool.
        • stackoverflow.csv - contains the processed Stack Overflow dataset. For details of the preprocessing steps, see the reproducibility section of the paper. The data was preprocessed using the Makar tool.
      • manual_analysis_output/
        • stackoverflow_quora_taxonomy.xlsx - contains the classified Stack Overflow and Quora dataset and the description of the taxonomy.
          • Taxonomy - contains the description of the first dimension and second dimension categories. Second dimension categories are further divided into levels, separated by the | symbol.
          • stackoverflow-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.
          • quora-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.
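
The Taxonomy sheet is described as dividing second dimension categories into levels separated by the | symbol. A minimal sketch of splitting such a label into its levels could look like the following; the function name `split_levels` and the sample label are hypothetical and are not taken from the spreadsheet.

```python
def split_levels(category: str) -> list[str]:
    """Split a second-dimension category label into its levels.

    Assumes levels are separated by the | symbol, as described for the
    Taxonomy sheet. Surrounding whitespace is stripped and empty parts
    are dropped.
    """
    return [part.strip() for part in category.split("|") if part.strip()]


# Hypothetical label for illustration only (not taken from the dataset).
example = "Level one | Level two | Level three"
levels = split_levels(example)
print(levels)
```

Dropping empty parts is a design choice that tolerates a trailing separator; if empty levels are meaningful in the actual sheet, the filter would need to be removed.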

  2. Economic Development Department Definition Guide

    • catalog.data.gov
    • datahub.austintexas.gov
    • +1 more
    Updated Oct 25, 2025
    Cite
    data.austintexas.gov (2025). Economic Development Department Definition Guide [Dataset]. https://catalog.data.gov/dataset/economic-development-department-definition-guide
    Dataset updated
    Oct 25, 2025
    Dataset provided by
    data.austintexas.gov
    Description

    Frequently used terms and phrases in various Program Guidelines and Applications. For additional information, visit the department Funding page: https://www.austintexas.gov/department/economic-development/funding

  3. Data from: Need for Tweet: How Open Source Developers Talk About Their...

    • zenodo.org
    csv
    Updated Mar 16, 2020
    Cite
    Hongbo Fang; Daniel Klug; Hemank Lamba; James Herbsleb; Bogdan Vasilescu (2020). Need for Tweet: How Open Source Developers Talk About Their GitHub Work on Twitter [Dataset]. http://doi.org/10.5281/zenodo.3711500
    Available download formats: csv
    Dataset updated
    Mar 16, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Hongbo Fang; Daniel Klug; Hemank Lamba; James Herbsleb; Bogdan Vasilescu
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains 70,427 cross-linked Twitter-[GHTorrent](http://ghtorrent.org) user pairs identified as likely belonging to the same users. The dataset accompanies our research paper:

    @inproceedings{fang2020tweet,
      author = {Fang, Hongbo and Klug, Daniel and Lamba, Hemank and Herbsleb, James and Vasilescu, Bogdan},
      title = {Need for Tweet: How Open Source Developers Talk About Their GitHub Work on Twitter},
      booktitle = {International Conference on Mining Software Repositories (MSR)},
      year = {2020},
      pages = {to appear},
      publisher = {ACM},
    }

    The data cannot be used for any purpose other than conducting research.

    Due to privacy concerns, we only release the user IDs in Twitter and GHTorrent, respectively. We expect that users of this dataset will be able to collect other data using the Twitter API and GHTorrent, as needed. Please see below for an example.

    To query the Twitter API for a given user_id, you can:

    • Apply for a Twitter developer account.

    • Create an app with your Twitter developer account, and create an “API key” and an “API secret key”.

    • Obtain an access token. Given the previous

    curl -u "

    The response looks like this: {"token_type":"bearer","access_token":"<...>"}

    Copy the "access_token".

    • Given the previous access token, run:

    curl --request GET --url "https://api.twitter.com/1.1/users/show.json?user_id=
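
The final lookup step can also be expressed in Python. This is a minimal sketch that only builds the request for the `users/show.json` endpoint named above and does not send it. The helper name `build_user_lookup_request` and the placeholder values are hypothetical, and passing the access token as a bearer token in the `Authorization` header is an assumption based on the `"token_type":"bearer"` response shown earlier.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the curl command in the dataset description.
API_URL = "https://api.twitter.com/1.1/users/show.json"


def build_user_lookup_request(user_id: str, access_token: str) -> Request:
    """Build (but do not send) a GET request for one Twitter user ID.

    Assumption: the access token is supplied as a bearer token in the
    Authorization header.
    """
    query = urlencode({"user_id": user_id})
    return Request(
        f"{API_URL}?{query}",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


# Placeholder values for illustration only.
request = build_user_lookup_request("12345", "ACCESS_TOKEN_PLACEHOLDER")
print(request.full_url)
```

Sending the request, for example with `urllib.request.urlopen(request)`, requires valid credentials and network access.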

    The GHTorrent user ids map to the users table in the MySQL version of GHTorrent. To use GHTorrent, please follow instructions on the GHTorrent website.

  4. Dataset from "What do developers talk about open source software licensing?...

    • data.niaid.nih.gov
    Updated Jun 1, 2020
    Cite
    Georgia M. Kapitsaki; Maria Papoutsoglou; Daniel German; Lefteris Angelis (2020). Dataset from "What do developers talk about open source software licensing?" - SEAA2020 [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_3871564
    Dataset updated
    Jun 1, 2020
    Dataset provided by
    Aristotle University of Thessaloniki
    University of Victoria
    University of Cyprus
    Authors
    Georgia M. Kapitsaki; Maria Papoutsoglou; Daniel German; Lefteris Angelis
    Description

    This is the dataset used in the respective research work. The abstract is available below.

    If you want to cite this work, please use:

    Georgia M. Kapitsaki, Maria Papoutsoglou, Daniel German and Lefteris Angelis, What do developers talk about open source software licensing?, to appear in the Proceedings of the Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2020.

    Free and open source software has gained a lot of momentum in the industry and the research community. Open source licenses determine the rules under which the open source software can be further used and distributed. Previous works have examined the usage of open source licenses in the framework of specific projects or online social coding platforms, examining developers' specific licensing views for specific software. However, the questions practitioners ask about licenses and licensing, as captured in Question and Answer websites, also constitute an important aspect toward understanding practitioners' general license and licensing concerns. In this paper, we investigate open source license discussions using data from the Software Engineering, Open Source and Law Stack Exchange sites that contain relevant data. We describe the process used for the data collection and analysis, and discuss the main results. Our results indicate that clarifications about specific licenses and specific license terms are required. The results can be useful for developers, educators and license authors.

  5. Economic Development Department Definition Guide

    • catalog.data.gov
    Updated Oct 25, 2025
    Cite
    data.austintexas.gov (2025). Economic Development Department Definition Guide [Dataset]. https://catalog.data.gov/dataset/economic-development-department-definition-guide-8d17b
    Dataset updated
    Oct 25, 2025
    Dataset provided by
    data.austintexas.gov
    Description

    Economic Development Department Definition Guide
