Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication package for the paper "What do Developers Discuss about Code Comments?"
Appendix.pdf
Tags-topics.md
Stack-exchange-query.md
RQ1/
LDA_input/
combined-so-quora-mallet-metadata.csv
topic-input.mallet
LDA_output/
Mallet/
output_csv/
docs-in-topics.csv
topic-words.csv
topics-in-docs.csv
topics-metadata.csv
output_html/
all_topics.html
Docs/
Topics/
RQ2/
datasource_rawdata/
quora.csv
stackoverflow.csv
manual_analysis_output/
stackoverflow_quora_taxonomy.xlsx
Appendix.pdf - appendix of the paper containing supplementary tables.
Tags-topics.md - tags selected from Stack Overflow and topics selected from Quora for the study (RQ1 & RQ2).
Stack-exchange-query.md - the query used in the Stack Exchange Data Explorer to extract the posts.
RQ1/ - contains the data used to answer RQ1.
combined-so-quora-mallet-metadata.csv - Stack Overflow and Quora questions used to perform the LDA analysis.
topic-input.mallet - input file to the MALLET tool.
docs-in-topics.csv - documents per topic.
topic-words.csv - most relevant words per topic.
topics-in-docs.csv - topic probability per document.
topics-metadata.csv - metadata per document and topic probability.
all_topics.html
Docs/
Topics/
RQ2/ - contains the data used to answer RQ2.
quora.csv - contains the processed Quora dataset (e.g., with HTML tags removed). The data was preprocessed using the Makar tool; for details of the preprocessing steps, please refer to the reproducibility section of the paper.
stackoverflow.csv - contains the processed Stack Overflow dataset. The data was preprocessed using the Makar tool; for details of the preprocessing steps, please refer to the reproducibility section of the paper.
stackoverflow_quora_taxonomy.xlsx - contains the classified Stack Overflow and Quora dataset and a description of the taxonomy.
Taxonomy - contains the descriptions of the first- and second-dimension categories. Second-dimension categories are further divided into levels, separated by the | symbol.
stackoverflow-posts - the questions are labelled relevant or irrelevant and categorized into the first- and second-dimension categories.
quora-posts - the questions are labelled relevant or irrelevant and categorized into the first- and second-dimension categories.
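Because second-dimension categories are stored as a single label with levels separated by the | symbol, they need to be split before they can be grouped or counted. The following is a minimal Python sketch, not part of the replication package: it assumes each label is one string such as `"LevelA|LevelA1"` (the level names are hypothetical placeholders, not categories from the taxonomy) and that whitespace around each level is not significant. `split_levels` and `count_at_depth` are illustrative helper names.

```python
from collections import Counter


def split_levels(label: str) -> list[str]:
    """Split a second-dimension category label into its '|'-separated levels.

    Assumes whitespace around each level is not meaningful; empty levels are dropped.
    """
    return [level.strip() for level in label.split("|") if level.strip()]


def count_at_depth(labels: list[str], depth: int) -> Counter:
    """Count labels per category at a given level (0 = top level).

    Labels that have fewer than depth + 1 levels are skipped.
    """
    counts: Counter = Counter()
    for label in labels:
        levels = split_levels(label)
        if len(levels) > depth:
            counts[levels[depth]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical labels; real values come from the classified posts sheets.
    labels = ["LevelA|LevelA1", "LevelA|LevelA2|LevelA2x", "LevelB"]
    print(split_levels(labels[1]))    # ['LevelA', 'LevelA2', 'LevelA2x']
    print(count_at_depth(labels, 0))  # Counter({'LevelA': 2, 'LevelB': 1})
```

Counting at a chosen depth lets the same labels be summarized either at the top level or at a finer level without re-coding the spreadsheet.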
Frequently used terms and phrases in various Program Guidelines and Applications. For additional information, visit the department's Funding page: https://www.austintexas.gov/department/economic-development/funding
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 70,427 cross-linked Twitter-[GHTorrent](http://ghtorrent.org) user pairs, each identified as likely belonging to the same person. The dataset accompanies our research paper:
@inproceedings{fang2020tweet,
author = {Fang, Hongbo and Klug, Daniel and Lamba, Hemank and Herbsleb, James and Vasilescu, Bogdan},
title = {Need for Tweet: How Open Source Developers Talk About Their GitHub Work on Twitter},
booktitle = {International Conference on Mining Software Repositories (MSR)},
year = {2020},
pages = {to appear},
publisher = {ACM},
}
The data cannot be used for any purpose other than conducting research.
Due to privacy concerns, we only release the user IDs in Twitter and GHTorrent, respectively. We expect that users of this dataset will be able to collect other data using the Twitter API and GHTorrent, as needed. Please see below for an example.
To query the Twitter API for a given user_id, you can:
Apply for a Twitter developer account.
Create an app with your Twitter developer account, and generate an “API key” and “API secret key”.
Obtain an access token. Given the previous
curl -u "
The response looks like this: {"token_type":"bearer","access_token":"<...>"}
Copy the "access_token".
Given the previous access token, run:
curl --request GET --url "https://api.twitter.com/1.1/users/show.json?user_id=
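The token and user-lookup steps above can be sketched in Python with only the standard library. This is a hedged sketch, not the authors' script: it assumes the Twitter API v1.1 app-only flow (a `POST` to `https://api.twitter.com/oauth2/token` with `grant_type=client_credentials`, then a `Bearer` token on `users/show.json`) is still available to your developer account, and the function names are illustrative. The functions only build the requests; nothing is sent.

```python
import base64
import json
import urllib.parse
import urllib.request

# Twitter API v1.1 endpoints (assumed to still be reachable with your access level).
TOKEN_URL = "https://api.twitter.com/oauth2/token"
USERS_SHOW_URL = "https://api.twitter.com/1.1/users/show.json"


def build_token_request(api_key: str, api_secret_key: str) -> urllib.request.Request:
    """Build (not send) the request that exchanges key/secret for a bearer token."""
    # HTTP Basic auth over "key:secret"; both parts are URL-encoded first,
    # which leaves ordinary alphanumeric keys unchanged.
    pair = f"{urllib.parse.quote(api_key, safe='')}:{urllib.parse.quote(api_secret_key, safe='')}"
    basic = base64.b64encode(pair.encode("utf-8")).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
        },
        method="POST",
    )


def parse_access_token(response_body: bytes) -> str:
    """Extract the token from a body like {"token_type":"bearer","access_token":"..."}."""
    payload = json.loads(response_body)
    if payload.get("token_type", "").lower() != "bearer":
        raise ValueError(f"unexpected token_type: {payload.get('token_type')!r}")
    return payload["access_token"]


def build_user_request(access_token: str, user_id: int) -> urllib.request.Request:
    """Build (not send) a users/show lookup for one Twitter user_id from the dataset."""
    query = urllib.parse.urlencode({"user_id": user_id})
    return urllib.request.Request(
        f"{USERS_SHOW_URL}?{query}",
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )
```

To actually call the API, each request can be passed to `urllib.request.urlopen`, e.g. `token = parse_access_token(urlopen(build_token_request(key, secret)).read())`, and then `urlopen(build_user_request(token, user_id))` for each Twitter user ID in the dataset.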
The GHTorrent user IDs map to the users table in the MySQL version of GHTorrent. To use GHTorrent, please follow the instructions on the GHTorrent website.
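Once a GHTorrent dump is loaded, the GHTorrent user IDs from this dataset can be looked up in the `users` table, for example to recover each account's `login`. The sketch below uses Python's built-in `sqlite3` with a tiny in-memory stand-in for that table so that it runs anywhere; it assumes the dataset's pairs can be read as `(twitter_user_id, ghtorrent_user_id)` tuples, and `attach_logins` is a hypothetical helper. Against the real MySQL database the query is the same apart from the driver's placeholder style (often `%s` instead of `?`).

```python
import sqlite3
from typing import Iterable


def attach_logins(
    conn: sqlite3.Connection, pairs: Iterable[tuple[int, int]]
) -> list[tuple[int, int, str]]:
    """Return (twitter_user_id, ghtorrent_user_id, login) for each pair found in `users`.

    Pairs whose GHTorrent id is missing from the table are skipped.
    """
    results = []
    for twitter_id, gh_id in pairs:
        row = conn.execute("SELECT login FROM users WHERE id = ?", (gh_id,)).fetchone()
        if row is not None:
            results.append((twitter_id, gh_id, row[0]))
    return results


if __name__ == "__main__":
    # Toy stand-in for GHTorrent's `users` table (only the two columns used here).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    # Toy pairs; real ones come from this dataset.
    print(attach_logins(conn, [(100, 1), (200, 3)]))  # [(100, 1, 'alice')]
```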
This is the dataset used in the research work cited below. The abstract is available below.
If you want to cite this work, please use:
Georgia M. Kapitsaki, Maria Papoutsoglou, Daniel German and Lefteris Angelis, What do developers talk about open source software licensing?, to appear in the Proceedings of the Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2020.
Free and open source software has gained a lot of momentum in the industry and the research community. Open source licenses determine the rules under which the open source software can be further used and distributed. Previous works have examined the usage of open source licenses in the framework of specific projects or online social coding platforms, examining developers' specific licensing views for specific software. However, the questions practitioners ask about licenses and licensing as captured in Question and Answer websites also constitute an important aspect toward understanding practitioners' general license and licensing concerns. In this paper, we investigate open source license discussions using data from the Software Engineering, Open Source and Law Stack Exchange sites that contain relevant data. We describe the process used for the data collection and analysis, and discuss the main results. Our results indicate that clarifications about specific licenses and specific license terms are required. The results can be useful for developers, educators and license authors.
Economic Development Department Definition Guide