5 datasets found

Replication package for the paper "What do Developers Discuss about Code...
data.niaid.nih.gov
Updated Jun 30, 2021
Cite
Anonymous (2021). Replication package for the paper "What do Developers Discuss about Code Comments" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4470125
Dataset updated
Jun 30, 2021
Dataset authored and provided by
Anonymous
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
RP-commenting-practices-multiple-sources

Replication package for the paper "What do Developers Discuss about Code Comments?"

Structure

Appendix.pdf
Tags-topics.md
Stack-exchange-query.md

RQ1/
  LDA_input/
    combined-so-quora-mallet-metadata.csv
    topic-input.mallet

  LDA_output/
    Mallet/
      output_csv/
        docs-in-topics.csv
        topic-words.csv
        topics-in-docs.csv
        topics-metadata.csv
      output_html/
        all_topics.html
        Docs/
        Topics/

RQ2/
  datasource_rawdata/
    quora.csv
    stackoverflow.csv
  manual_analysis_output/
    stackoverflow_quora_taxonomy.xlsx

Contents of the Replication Package

Appendix.pdf - appendix of the paper, containing supplementary tables

Tags-topics.md - tags selected from Stack Overflow and topics selected from Quora for the study (RQ1 & RQ2)

Stack-exchange-query.md - the query interface used to extract the posts from the Stack Exchange Data Explorer

RQ1/ - contains the data used to answer RQ1
- LDA_input/ - input data used for the LDA analysis
  - combined-so-quora-mallet-metadata.csv - Stack Overflow and Quora questions used to perform the LDA analysis
  - topic-input.mallet - input file for the MALLET tool
- LDA_output/
  - Mallet/ - contains the LDA output generated by the MALLET tool
    - output_csv/
      - docs-in-topics.csv - documents per topic
      - topic-words.csv - most relevant topic words
      - topics-in-docs.csv - topic probability per document
      - topics-metadata.csv - metadata per document and topic probability
    - output_html/ - browsable results of the MALLET output
      - all_topics.html
      - Docs/
      - Topics/

RQ2/ - contains the data used to answer RQ2
- datasource_rawdata/ - contains the raw data for each source
  - quora.csv - contains the processed Quora dataset (for example, with HTML tags removed). For details of the preprocessing steps, refer to the reproducibility section of the paper. The data was preprocessed using the Makar tool.
  - stackoverflow.csv - contains the processed Stack Overflow dataset. For details of the preprocessing steps, refer to the reproducibility section of the paper. The data was preprocessed using the Makar tool.
- manual_analysis_output/
  - stackoverflow_quora_taxonomy.xlsx - contains the classified Stack Overflow and Quora dataset and a description of the taxonomy
    - Taxonomy - contains the description of the first dimension and second dimension categories. Second dimension categories are further divided into levels, separated by the | symbol.
    - stackoverflow-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.
    - quota-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.
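
The Taxonomy sheet describes second dimension categories as levels separated by the | symbol. A minimal sketch of splitting such a label into its levels is shown here. The function name and the example label are hypothetical and are not taken from the dataset; the real labels and column names should be checked in the spreadsheet itself.

```python
def split_levels(label: str) -> list[str]:
    """Split a second dimension category label into its levels.

    Assumes levels are separated by "|" as described for the Taxonomy sheet.
    Surrounding whitespace is stripped and empty parts are ignored.
    """
    return [part.strip() for part in label.split("|") if part.strip()]


# Hypothetical label, invented for illustration only.
example_label = "Level one | Level two | Level three"
levels = split_levels(example_label)
print(levels)
```

Keeping the levels as a list makes it straightforward to group questions by the top level only, or by any prefix of the hierarchy.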
Economic Development Department Definition Guide
catalog.data.gov
datahub.austintexas.gov
+1 more
Updated Oct 25, 2025
Cite
data.austintexas.gov (2025). Economic Development Department Definition Guide [Dataset]. https://catalog.data.gov/dataset/economic-development-department-definition-guide
Dataset updated
Oct 25, 2025
Dataset provided by
data.austintexas.gov
Description
Frequently used terms and phrases in various Program Guidelines and Applications. For additional information, visit the department Funding page: https://www.austintexas.gov/department/economic-development/funding
Data from: Need for Tweet: How Open Source Developers Talk About Their...
zenodo.org
csv
Updated Mar 16, 2020
Cite
Hongbo Fang; Daniel Klug; Hemank Lamba; James Herbsleb; Bogdan Vasilescu (2020). Need for Tweet: How Open Source Developers Talk About Their GitHub Work on Twitter [Dataset]. http://doi.org/10.5281/zenodo.3711500
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.3711500
Dataset updated
Mar 16, 2020
Dataset provided by
Zenodo http://zenodo.org/
Authors
Hongbo Fang; Daniel Klug; Hemank Lamba; James Herbsleb; Bogdan Vasilescu
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains 70,427 cross-linked Twitter-[GHTorrent](http://ghtorrent.org) user pairs identified as likely belonging to the same users. The dataset accompanies our research paper:

@inproceedings{fang2020tweet,
  author = {Fang, Hongbo and Klug, Daniel and Lamba, Hemank and Herbsleb, James and Vasilescu, Bogdan},
  title = {Need for Tweet: How Open Source Developers Talk About Their GitHub Work on Twitter},
  booktitle = {International Conference on Mining Software Repositories (MSR)},
  year = {2020},
  pages = {to appear},
  publisher = {ACM},
}

The data cannot be used for any purpose other than conducting research.

Due to privacy concerns, we only release the user IDs in Twitter and GHTorrent, respectively. We expect that users of this dataset will be able to collect other data using the Twitter API and GHTorrent, as needed. Please see below for an example.

To query the Twitter API for a given user_id, you can:

Apply for a Twitter developer account here.

Create an app with your Twitter developer account, and create an “API key” and an “API secret key”.

Obtain an access token. Given the previous

curl -u "

The response looks like this: {"token_type":"bearer","access_token":"<...>"}

Copy the "access_token".

Given the previous access token, run:

curl --request GET --url "https://api.twitter.com/1.1/users/show.json?user_id=

The GHTorrent user ids map to the users table in the MySQL version of GHTorrent. To use GHTorrent, please follow instructions on the GHTorrent website.
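
The two requests described in the steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' script: the token endpoint `https://api.twitter.com/oauth2/token` and the `grant_type=client_credentials` body are assumed from Twitter's application-only authentication scheme and do not appear in the text above, the function names are hypothetical, and the v1.1 endpoints may no longer be available under current API access tiers. The sketch only builds the requests; it sends nothing.

```python
import base64
import urllib.parse
import urllib.request

# Assumption: application-only auth token endpoint (not given in the text above).
TOKEN_URL = "https://api.twitter.com/oauth2/token"
# Endpoint taken from the curl command in the steps above.
USER_URL = "https://api.twitter.com/1.1/users/show.json"


def build_token_request(api_key: str, api_secret: str) -> urllib.request.Request:
    """Build the POST request that exchanges the API key pair for a bearer token."""
    credentials = f"{api_key}:{api_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=b"grant_type=client_credentials",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
        },
        method="POST",
    )


def build_user_request(access_token: str, user_id: int) -> urllib.request.Request:
    """Build the GET request that looks up one user by numeric user_id."""
    query = urllib.parse.urlencode({"user_id": user_id})
    return urllib.request.Request(
        f"{USER_URL}?{query}",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


# Placeholder values for illustration; replace with real credentials and IDs.
token_request = build_token_request("YOUR_API_KEY", "YOUR_API_SECRET")
user_request = build_user_request("YOUR_ACCESS_TOKEN", 12345)
print(user_request.full_url)
```

To actually send a request, pass it to `urllib.request.urlopen` and parse the JSON response; the "access_token" field of the first response is the value to pass to `build_user_request`.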
Dataset from "What do developers talk about open source software licensing?...
data.niaid.nih.gov
Updated Jun 1, 2020
Cite
Georgia M. Kapitsaki; Maria Papoutsoglou; Daniel German; Lefteris Angelis (2020). Dataset from "What do developers talk about open source software licensing? " - SEAA2020 [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_3871564
Dataset updated
Jun 1, 2020
Dataset provided by
Aristotle University of Thessaloniki
University of Victoria
University of Cyprus
Authors
Georgia M. Kapitsaki; Maria Papoutsoglou; Daniel German; Lefteris Angelis
Description
This is the dataset used in the respective research work. The abstract is available below.

If you want to cite this work, please use:

Georgia M. Kapitsaki, Maria Papoutsoglou, Daniel German and Lefteris Angelis, What do developers talk about open source software licensing?, to appear in the Proceedings of the Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2020.

Free and open source software has gained a lot of momentum in the industry and the research community. Open source licenses determine the rules under which open source software can be further used and distributed. Previous works have examined the usage of open source licenses in the framework of specific projects or online social coding platforms, examining developers' specific licensing views for specific software. However, the questions practitioners ask about licenses and licensing, as captured in Question and Answer websites, also constitute an important aspect toward understanding practitioners' general license and licensing concerns. In this paper, we investigate open source license discussions using data from the Software Engineering, Open Source and Law Stack Exchange sites that contain relevant data. We describe the process used for the data collection and analysis, and discuss the main results. Our results indicate that clarifications about specific licenses and specific license terms are required. The results can be useful for developers, educators and license authors.
Economic Development Department Definition Guide
catalog.data.gov
Updated Oct 25, 2025
Cite
data.austintexas.gov (2025). Economic Development Department Definition Guide [Dataset]. https://catalog.data.gov/dataset/economic-development-department-definition-guide-8d17b
Dataset updated
Oct 25, 2025
Dataset provided by
data.austintexas.gov
Description
Economic Development Department Definition Guide