34 datasets found

Quora Dataset
brightdata.com
.json, .csv, .xlsx
Updated Jun 22, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bright Data (2016). Quora Dataset [Dataset]. https://brightdata.com/products/datasets/quora
Explore at:
.json, .csv, .xlsxAvailable download formats
Dataset updated
Jun 22, 2016
Dataset authored and provided by
Bright Datahttps://brightdata.com/
License
https://brightdata.com/licensehttps://brightdata.com/license
Area covered
Worldwide
Description
We'll tailor a Quora dataset to meet your unique needs, encompassing question topics, user engagement metrics, answer quality, demographic data of users, upvote counts, comment statistics, and other pertinent metrics. Leverage our Quora datasets for diverse applications to bolster strategic planning and market analysis. Scrutinizing these datasets enables organizations to grasp user preferences and community trends, facilitating nuanced content development and engagement initiatives. Customize your access to the entire dataset or specific subsets as per your business requisites. Popular use cases involve optimizing content strategy based on question popularity and user engagement, enhancing marketing strategies through targeted audience segmentation, and identifying and forecasting trends in community discussions to stay ahead in the knowledge-sharing landscape.
m
Quora Statistics and Facts
market.biz
Updated Aug 4, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Market.biz (2025). Quora Statistics and Facts [Dataset]. https://market.biz/quora-statistics/
Explore at:
Dataset updated
Aug 4, 2025
Dataset provided by
Market.biz
License
https://market.biz/privacy-policyhttps://market.biz/privacy-policy
Time period covered
2022 - 2032
Area covered
Europe, Australia, North America, South America, Africa, ASIA
Description
Introduction

Quora Statistics: Quora stands as the biggest question and answer platform in the world, has more than 400 MAUs worldwide. Additionally, Quora has made substantial investments in its AI chat platform, POE. So, these developments have led to Quoraâ€™s valuation reaching $500 million. It is ranked as the 17th most widespread social media platform based on user numbers.

The platform involves more than 300,000 topics, spanning technology, science, the arts, and beyond, allowing users to dig into a broad spectrum of subjects. Daily, individuals pose between 5,000 and 7,000 questions on Quora, developing significant engagement. On average, users dedicate approximately 8 minutes per visit. Approximately 60% of Quora's user base consists of men, illustrating the platformâ€™s varied demographic.
h
Data from: quora-question-answer-dataset
huggingface.co
Updated Sep 2, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gregory Bizup (2023). quora-question-answer-dataset [Dataset]. https://huggingface.co/datasets/toughdata/quora-question-answer-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 2, 2023
Authors
Gregory Bizup
License
https://choosealicense.com/licenses/gpl-3.0/https://choosealicense.com/licenses/gpl-3.0/
Description
Quora Question Answer Dataset (Quora-QuAD) contains 56,402 question-answer pairs scraped from Quora.

Usage:

For instructions on fine-tuning a model (Flan-T5) with this dataset, please check out the article: https://www.toughdata.net/blog/post/finetune-flan-t5-question-answer-quora-dataset
e
quora.com Traffic Analytics Data
analytics.explodingtopics.com
Updated Aug 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). quora.com Traffic Analytics Data [Dataset]. https://analytics.explodingtopics.com/website/quora.com
Explore at:
Dataset updated
Aug 1, 2025
Variables measured
Global Rank, Monthly Visits, Authority Score, US Country Rank, Online Services Category Rank
Description
Traffic analytics, rankings, and competitive metrics for quora.com as of August 2025
h
quora-duplicates
huggingface.co
Updated Apr 27, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sentence Transformers (2024). quora-duplicates [Dataset]. https://huggingface.co/datasets/sentence-transformers/quora-duplicates
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 27, 2024
Dataset authored and provided by
Sentence Transformers
Description
Dataset Card for Quora Duplicate Questions

This dataset contains the Quora Question Pairs dataset in four formats that are easily used with Sentence Transformers to train embedding models. The data was originally created by Quora for this Kaggle Competition.

Dataset Subsets pair-class subset

Columns: "sentence1", "sentence2", "label" Column types: str, str, class with {"0": "different", "1": "duplicate"} Examples:{ 'sentence1': 'What is the step by step… See the full description on the dataset page: https://huggingface.co/datasets/sentence-transformers/quora-duplicates.
Change in Quora Ads budgets among PPC marketers worldwide 2024
statista.com
Updated Jul 18, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Change in Quora Ads budgets among PPC marketers worldwide 2024 [Dataset]. https://www.statista.com/statistics/1549661/change-quora-ads-budgets-worldwide/
Explore at:
Dataset updated
Jul 18, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Nov 2, 2023 - Dec 24, 2023
Area covered
Worldwide
Description
During a December 2023 survey, ** percent of responding pay-per-click marketers worldwide said they planned to maintain their Quora Ads budgets. Slight increases were reported by *** percent, while *** percent planned significant increases. Additionally, ** percent and ** percent stated they intended to significantly or slightly decrease their budgets, respectively.
p
Quora Secondary School
publicschoolreview.com
json, xml
Updated Sep 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Public School Review (2025). Quora Secondary School [Dataset]. https://www.publicschoolreview.com/quora-secondary-school-profile
Explore at:
json, xmlAvailable download formats
Dataset updated
Sep 5, 2025
Dataset authored and provided by
Public School Review
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 2009 - Dec 31, 2025
Description
Historical Dataset of Quora Secondary School is provided by PublicSchoolReview and contain statistics on metrics:Total Students Trends Over Years (2009-2023),Total Classroom Teachers Trends Over Years (2009-2023),Distribution of Students By Grade Trends,Student-Teacher Ratio Comparison Over Years (2009-2023),American Indian Student Percentage Comparison Over Years (2013-2023),Asian Student Percentage Comparison Over Years (2013-2023),Hispanic Student Percentage Comparison Over Years (2009-2023),Black Student Percentage Comparison Over Years (2009-2023),White Student Percentage Comparison Over Years (2009-2023),Two or More Races Student Percentage Comparison Over Years (2012-2023),Diversity Score Comparison Over Years (2009-2023),Free Lunch Eligibility Comparison Over Years (2009-2023),Reduced-Price Lunch Eligibility Comparison Over Years (2009-2023),Reading and Language Arts Proficiency Comparison Over Years (2011-2022),Math Proficiency Comparison Over Years (2012-2023),Science Proficiency Comparison Over Years (2021-2022),Overall School Rank Trends Over Years (2012-2023),Graduation Rate Comparison Over Years (2013-2023)
Train and Test data of Quora question pairs
kaggle.com
zip
Updated May 27, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shilpa (2018). Train and Test data of Quora question pairs [Dataset]. https://www.kaggle.com/shilpil/quoratrainandtestset
Explore at:
zip(142191560 bytes)Available download formats
Dataset updated
May 27, 2018
Authors
Shilpa
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Shilpa

Released under CC0: Public Domain

Contents

It contains the following files:
w
Quora Ads Field Reference Fields
windsor.ai
json
Updated Nov 3, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Windsor.ai (2021). Quora Ads Field Reference Fields [Dataset]. https://windsor.ai/data-field/quora/
Explore at:
jsonAvailable download formats
Dataset updated
Nov 3, 2021
Dataset provided by
Windsor.ai
Variables measured
Date, Week, Year, Month, Spend, Today, Clicks, Source, Campaign, Week ISO, and 11 more
Description
Auto-generated structured data of Quora Ads Field Reference from table Fields
CS 405 NLP
kaggle.com
Updated Jun 15, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
TonyChan (2017). CS 405 NLP [Dataset]. https://www.kaggle.com/tonychanyt/cs-405-nlp/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 15, 2017
Dataset provided by
Kagglehttp://kaggle.com/
Authors
TonyChan
Description
Context: https://www.kaggle.com/c/quora-question-pairs/data Content: predict which of the provided pairs of questions contain two questions with the same meaning. Acknowledgements: https://www.kaggle.com/c/quora-question-pairs/discussion/34534 Inspiration: Quora Question Pairs predictions
Z
Quora Chat Dutch
data.niaid.nih.gov
Updated Jun 20, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vanroy, Bram (2023). Quora Chat Dutch [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8052391
Explore at:
Dataset updated
Jun 20, 2023
Dataset authored and provided by
Vanroy, Bram
License
https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.htmlhttps://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html
Description
This dataset contains 54,444 conversations between een AI assistant and a (fake) "Human" (generated) in Dutch. They are translations of Baize's machine-generated answers to the Quora dataset.

Data Instances

{ 'id': 10, 'input': 'Het gesprek tussen de mens en de AI-assistent. ' '[|Human|] Ik vroeg me af wanneer de Moto G3 de Android 7.0-update ' 'zal krijgen? ' '[|AI|] De Moto G3 (Moto G 3rd gen 2015) komt in aanmerking voor de ' 'Android 7.0 (Nougat) update. ' '[|Human|] Geweldig! Weet je wanneer de update beschikbaar zal ' 'zijn? ' '[|AI|] De update werd uitgebracht in december 2016 voor de Moto G4 ' 'en G4 Plus. Er is echter nog geen woord over wanneer de Moto G3 de ' 'update zal ontvangen. ' '[|Human|] Is er nieuws of het binnenkort de update zal ontvangen? ' '[|AI|] Het spijt me, ik heb momenteel geen informatie daarover. ' 'Het is het beste om uit te kijken naar updates van Motorola. ' '[|Human|] ', 'topic': 'Wanneer krijgt Moto G3 (Moto G 3e gen 2015) de Android 7.0 ' '(Nougat) update?' }

Data Fields

id: the ID of the item. The following IDs are not included because they could not be translated: [1352, 14599, 25985, 27167, 28254, 33731, 41550, 44364, 47293, 49060, 51383, 53740]

input: the machine-generated conversation between AI and "Human". Always starts with Het gesprek tussen de mens en de AI-assistent. and has at least one occurrence of both [|AI|] and [|Human|].

topic: the topic description

Dataset Creation

Both the translations and the topics were translated with OpenAI's API for gpt-3.5-turbo. max_tokens=1024, temperature=0 as parameters.

The prompt template to translate the input is:

CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a conversation between an AI assistant and a human from {src_lang} into {tgt_lang}.

Here are the requirements that you should adhere to: 1. maintain the format: the conversation consists of the AI (marked as [|AI|]) and the human ([|Human|]) talking in turns and responding to each other; 2. do not translate the speaker identifiers [|AI|] and [|Human|] but always copy them into the translation in appropriate places; 3. ensure accurate translation and keep the correctness of the conversation; 4. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias; 5. translate the human's text using informal, but standard, language; 6. make sure to avoid biases (such as gender bias, grammatical bias, social bias); 7. if the human asks to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in {tgt_lang}, and then also generate a corrected output version for the AI in {tgt_lang}; 8. if the human asks to translate text from one to another language, then you only translate the human's question to {tgt_lang} but you keep the translation that the AI provides in the language that the human requested; 9. do not translate code fragments but copy them as they are. If there are English examples, variable names or definitions in code fragments, keep them in English.

Now translate the following conversation with the requirements set out above. Do not provide an explanation and do not add anything else.

"""

The prompt to translate the topic is:

TOPIC_TRANSLATION_PROMPT = "Translate the following title of a conversation from {src_lang} to {tgt_lang} in a succinct,"
" summarizing manner. Translate accurately and formally. Do not provide any explanation"
" about the translation and do not include the original title.

"

The system message was:

You are a helpful assistant that translates English to Dutch to the requirements that are given to you.

Note that 12 items (0.02%) were not successfully translated. The translation was missing the AI identifier [|AI|] and/or the human one [|Human|]. The IDs for the missing items are [1352, 14599, 25985, 27167, 28254, 33731, 41550, 44364, 47293, 49060, 51383, 53740].

The translation quality has not been verified. Use at your own risk!

Licensing Information

Licensing info for quora is listed as unknown. Baize is listed as GPLv3. So this dataset is also listed as such. A specific restriction is in place: as per OpenAI's terms of use, this dataset cannot be used to build a commercial system that competes with OpenAI's services.

This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.

If you use this dataset, you must also follow the Sharing and Usage policies.

As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.

This dataset is also available on the Hugging Face hub which is the canonical repository for this dataset. See the link for more info.

Replication package for the paper "What do Developers Discuss about Code...

zenodo.org

zip

Updated Jun 30, 2021

Facebook

Twitter

Click to copy link

Link copied

Cite

Anonymous; Anonymous (2021). Replication package for the paper "What do Developers Discuss about Code Comment Conventions" [Dataset]. http://doi.org/10.5281/zenodo.4479703

Explore at:

zipAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.4479703

Dataset updated

Jun 30, 2021

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Anonymous; Anonymous

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

# RP-commenting-conventions-multiple-sources
Replication Package for the paper "What do Developers Discuss about Code Comment Conventions?"

## Structure
```
Appendix.pdf

RQ1/
  LDA_input/
    stackoverfow_raw_dataset.csv

  LDA_output/
    Mallet/
      output_csv/
        docs-in-topics.csv
        topic-words.csv
        topics-in-docs.csv
        topics-metadata.csv
      output_html/
        all_topics.html
        Docs/
        Topics/

RQ2/
  datasource_rawdata/
    mailing_lists_selection_criteria.csv
    quora.csv
    stackoverflow.csv
  manual_analysis_output/
    stackoverflow_quora_taxonomy.xlsx
```

## Contents of the Replication Package
---
- **Appendix.pdf**- Appendix of the paper containing supplement tables

- **RQ1/** - contains the data used to answer RQ1
 - **LDA_input/** - input data used for LDA analysis
  - `stackoverfow_raw_dataset.csv` - stackoverflow questions used to perform LDA analysis
 - **LDA_output/**
  - **Mallet/** - contains the LDA output generated by MALLET tool
     - **output_csv/**
      - `docs-in-topics.csv` - documents per topic
      - `topic-words.csv` - most relevant topic words
      - `topics-in-docs.csv` - topic probability per document
      - `topics-metadata.csv` - metadata per document and topic probability
    - **output_html/** - Browsable results of mallet output
      - `all_topics.html`
      - `Docs/`
      - `Topics/`

- **RQ2/** - contains the data used to answer RQ2
 - **datasource_rawdata/** - contains the raw data for each source
  - `mailing_lists_selection_criteria.csv` - criteria used to select mailing_lists.
  - `quora.csv` - contains the processed dataset (like removing HTML tags). To know more about the preprocessing steps, please refer to the reproducibility section in the paper. The data is preprocessed using [Makar](https://github.com/maethub/makar) tool.
  - `stackoverflow.csv` - contains the processed Stack Overflow dataset. To know more about the preprocessing steps, please refer to the reproducibility section in the paper. The data is preprocessed using [Makar](https://github.com/maethub/makar) tool.
 - **manual_analysis_output/**
  - `stackoverflow_quora_taxonomy.xlsx` - contains the classified dataset of Stack Overflow and quora and a description of taxonomy.
    - `Taxonomy` - contains the description of the first dimension and second dimension categories. Second dimension categories are further divided into levels, separated by `|` symbol. 
    - `stackoverflow-posts` - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.
     - `quota-posts` - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.     
---

Question Pairs Dataset
kaggle.com
Updated Feb 2, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Quora (2017). Question Pairs Dataset [Dataset]. https://www.kaggle.com/datasets/quora/question-pairs-dataset/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 2, 2017
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Quora
Description
Context

Quora's first public dataset is related to the problem of identifying duplicate questions. At Quora, an important product principle is that there should be a single question page for each logically distinct question. For example, the queries “What is the most populous state in the USA?” and “Which state in the United States has the most people?” should not exist separately on Quora because the intent behind both is identical. Having a canonical page for each logically distinct query makes knowledge-sharing more efficient in many ways: for example, knowledge seekers can access all the answers to a question in a single location, and writers can reach a larger readership than if that audience was divided amongst several pages.

The dataset is based on actual data from Quora and will give anyone the opportunity to train and test models of semantic equivalence.

Content

There are over 400,000 lines of potential question duplicate pairs. Each line contains IDs for each question in the pair, the full text for each question, and a binary value that indicates whether the line truly contains a duplicate pair.

Acknowledgements

For more information on this dataset, check out Quora's first dataset release page.

License

This data is subject to Quora's Terms of Service, allowing for non-commercial use.
Z
Replication package for the paper "What do Developers Discuss about Code...
data.niaid.nih.gov
Updated Jun 30, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anonymous (2021). Replication package for the paper "What do Developers Discuss about Code Comments" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4470125
Explore at:
Dataset updated
Jun 30, 2021
Dataset authored and provided by
Anonymous
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
RP-commenting-practices-multiple-sources

Replication package for the paper "What do Developers Discuss about Code Comments?"

Structure

Appendix.pdf Tags-topics.md Stack-exchange-query.md RQ1/ LDA_input/ combined-so-quora-mallet-metadata.csv topic-input.mallet LDA_output/ Mallet/ output_csv/ docs-in-topics.csv topic-words.csv topics-in-docs.csv topics-metadata.csv output_html/ all_topics.html Docs/ Topics/ RQ2/ datasource_rawdata/ quora.csv stackoverflow.csv manual_analysis_output/ stackoverflow_quora_taxonomy.xlsx

Contents of the Replication Package

Appendix.pdf- Appendix of the paper containing supplement tables

Tags-topics.md tags selected from Stack overflow and topics selected from Quora for the study (RQ1 & RQ2)

Stack-exchange-query.md the query interface used to extract the posts from stack exchnage explorer.

RQ1/ - contains the data used to answer RQ1

LDA_input/ - input data used for LDA analysis

combined-so-quora-mallet-metadata.csv - Stack overflow and Quora questions used to perform LDA analysis

topic-input.mallet - input file to the mallet tool

LDA_output/

Mallet/ - contains the LDA output generated by MALLET tool

output_csv/

docs-in-topics.csv - documents per topic

topic-words.csv - most relevant topic words

topics-in-docs.csv - topic probability per document

topics-metadata.csv - metadata per document and topic probability

output_html/ - Browsable results of mallet output

all_topics.html

Docs/

Topics/

RQ2/ - contains the data used to answer RQ2

datasource_rawdata/ - contains the raw data for each source

quora.csv - contains the processed dataset (like removing html tags). To know more about the preprocessing steps, please refer to the reproducibility section in the paper. The data is preprocessed using Makar tool.

stackoverflow.csv - contains the processed stackoverflow dataset. To know more about the preprocessing steps, please refer to the reproducibility section in the paper. The data is preprocessed using Makar tool.

manual_analysis_output/

stackoverflow_quora_taxonomy.xlsx - contains the classified dataset of stackoverflow and quora and description of taxonomy.

Taxonomy - contains the description of the first dimension and second dimension categories. Second dimension categories are further divided into levels, separated by | symbol.

stackoverflow-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.
- quota-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.
MAUs of Zhihu Inc. in China Q1 2021- Q4 2024
statista.com
Updated Jun 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). MAUs of Zhihu Inc. in China Q1 2021- Q4 2024 [Dataset]. https://www.statista.com/statistics/1298392/zhihu-monthly-active-users/
Explore at:
Dataset updated
Jun 10, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
China
Description
In the forth quarter of 2024, Zhihu Inc. reported its average monthly active users (MAUs) having reached about **** million. Meanwhile, the Chinese Quora-like Q&A platform attracted an average of **** million monthly active paying users in that quarter.
d
Replication Data for: Climate change communication in an online Q&A...
search.dataone.org
dataverse.harvard.edu
Updated Nov 22, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jiang, Hanchen (2023). Replication Data for: Climate change communication in an online Q&A community: a case study of Quora [Dataset]. http://doi.org/10.7910/DVN/UQZUPO
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/UQZUPO
Dataset updated
Nov 22, 2023
Dataset provided by
Harvard Dataverse
Authors
Jiang, Hanchen
Description
Replication Data and R code for: Climate change communication in an online Q&A community: a case study of Quora
Revenue of Zhihu.Inc in China Q2 2024, by segment
statista.com
Updated Jul 18, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Revenue of Zhihu.Inc in China Q2 2024, by segment [Dataset]. https://www.statista.com/statistics/1298401/zhihu-revenue-breakdown/
Explore at:
Dataset updated
Jul 18, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
China
Description
With about almost ** million U.S. dollars of income in the second quarter of 2024, the paid membership became the largest revenue source for Zhihu, a Chinese Quora-like platform. In the challenging macroeconomic environment, Zhihu Inc.'s advertising revenues plunged.
poojaruhal/RP-commenting-practices-social-media:...
zenodo.org
zip
Updated Aug 17, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pooja Rani; Mathias Birrer; Pooja Rani; Mathias Birrer (2020). poojaruhal/RP-commenting-practices-social-media: RP-commenting-practices-social-media: RP_TOSEM_2020 v.1.0.1 Second release of of the replication Package for the paper "What do Developers Discuss about Code Comment Conventions on Social Media" [Dataset]. http://doi.org/10.5281/zenodo.3987433
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.3987433
Dataset updated
Aug 17, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Pooja Rani; Mathias Birrer; Pooja Rani; Mathias Birrer
Description
RP-commenting-practices-social-media

Replication Package for the paper "What do Developers Discuss about Code Comment Conventions on Social Media?"

Structure

Paper-presenation.pdf Makar_tool/ Data/ stackoverfow_questions_with_answers_by_tags.csv stackoverfow_tags_metrics.csv apache_mailing_list.csv mailing_lists_ASF_@dev_@users_1.csv mailing_lists_ASF_@dev_@users_2.csv quora.csv sample_stackoverfow_questions_with_answers_by_tags.csv Schemas/ apache_mailing_lists.json quora.json stackoverfow_questions_answers_by_tag.json stackoverfow_tag_count.json stackoverfow_tag_metrics.json RQ1/ LDA_input/ stackoverfow_raw_dataset.csv LDA_output/ Mallet/ output_csv/ docs-in-topics.csv topic-words.csv topics-in-docs.csv topics-metadata.csv output_html/ all_topics.html Docs/ Topics/ RQ2/ datasource_rawdata/ mailing_lists_selection_criteria.csv quora.csv stackoverflow.csv manual_analysis_output/ stackoverflow_quora_taxonomy.xlsx

Contents of the Replication Package

Paper-presenation.pdf presents the highlights of the work in a presenation.

Makar_tool/ contains the data processed using the tool for the study

Data/

stackoverfow_questions_with_answers_by_tags.csv - all stackoverflow questions used in the study as stored in Makar

stackoverfow_tags_metrics.csv - all data containing the calculations done for stackoverflow tag selection

apache_mailing_list.csv - statistically significant sample of mailing_lists_ASF_@dev_@users_1.csv and mailing_lists_ASF_@dev_@users_2.csv used in the study

mailing_lists_ASF_@dev_@users_1.csv - mailing list data used in the study as stored in Makar (part 1)

mailing_lists_ASF_@dev_@users_2.csv - mailing list data used in the study as stored in Makar (part 2)

quora.csv - all quora questions used in the study as stored in Makar

sample_stackoverfow_questions_with_answers_by_tags - statistically significant sample of stackoverfow_questions_with_answers_by_tags.csv used in the study

Schemas/

apache_mailing_lists.json - data schema used in Makar to store mailing list data

quora.json - data schema used in Makar to store quora data

stackoverfow_questions_answers_by_tag.json - data schema used in Makar to store stackoverflow questions data

stackoverfow_tag_count.json - data schema used in Makar to lookup number of questions per tag available in stackoverflow

stackoverfow_tag_metrics.json - data schema used in Makar to stackoverflow tag metrics data

RQ1/ - contains the data used to answer RQ1

LDA_input/ - input data used for LDA analysis

stackoverfow_raw_dataset.csv - stackoverflow questions used to perform LDA analysis

LDA_output/

Mallet/ - contains the LDA output generated by MALLET tool

output_csv/

docs-in-topics.csv - documents per topic

topic-words.csv - most relevant topic words

topics-in-docs.csv - topic probability per document

topics-metadata.csv - metadata per document and topic probability

output_html/ - Browsable results of mallet output

all_topics.html

Docs/

Topics/

RQ2/ - contains the data used to answer RQ2

datasource_rawdata/ - contains the raw data for each source

mailing_lists_selection_criteria.csv - criteria used to select mailing_lists.

quora.csv - contains the processed dataset (like removing HTML tags). To know more about the preprocessing steps, please refer to the reproducibility section in the paper. The data is preprocessed using Makar tool.

stackoverflow.csv - contains the processed stackoverflow dataset. To know more about the preprocessing steps, please refer to the reproducibility section in the paper. The data is preprocessed using Makar tool.

manual_analysis_output/

stackoverflow_quora_taxonomy.xlsx - contains the classified dataset of stackoverflow and quora and description of taxonomy.

Taxonomy - contains the description of the first dimension and second dimension categories. Second dimension categories are further divided into levels, separated by | symbol.

stackoverflow-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.

quota-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.
Filming location share of cable scripted TV shows in North America 2015-2017...
statista.com
Updated Jul 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Filming location share of cable scripted TV shows in North America 2015-2017 [Dataset]. https://www.statista.com/statistics/696837/cable-scripted-shows-by-location/
Explore at:
Dataset updated
Jul 7, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
United States
Description
This statistic presents the share of cable scripted shows in the United States in the ******* and ******* seasons, sorted by location. According to the source, in the ******* season **** percent of cable scripted shows were shot in Los Angeles, down from **** percent in the previous year.
h
Data from: qqp
huggingface.co
Updated Oct 21, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
bias-amplified-splits (2023). qqp [Dataset]. https://huggingface.co/datasets/bias-amplified-splits/qqp
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 21, 2023
Dataset authored and provided by
bias-amplified-splits
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
GLUE, the General Language Understanding Evaluation benchmark (https://gluebenchmark.com/) is a collection of resources for training, evaluating, and analyzing natural language understanding systems.

Facebook

Twitter

Click to copy link

Link copied

Cite

Bright Data (2016). Quora Dataset [Dataset]. https://brightdata.com/products/datasets/quora

Quora Dataset

Explore at:

.json, .csv, .xlsxAvailable download formats

Dataset updated

Jun 22, 2016

Dataset authored and provided by

Bright Datahttps://brightdata.com/

License

https://brightdata.com/licensehttps://brightdata.com/license

Area covered

Worldwide

Description

We'll tailor a Quora dataset to meet your unique needs, encompassing question topics, user engagement metrics, answer quality, demographic data of users, upvote counts, comment statistics, and other pertinent metrics. Leverage our Quora datasets for diverse applications to bolster strategic planning and market analysis. Scrutinizing these datasets enables organizations to grasp user preferences and community trends, facilitating nuanced content development and engagement initiatives. Customize your access to the entire dataset or specific subsets as per your business requisites. Popular use cases involve optimizing content strategy based on question popularity and user engagement, enhancing marketing strategies through targeted audience segmentation, and identifying and forecasting trends in community discussions to stay ahead in the knowledge-sharing landscape.

Clear search

Close search

Google apps

Main menu

Quora Dataset

Quora Statistics and Facts

Data from: quora-question-answer-dataset

quora.com Traffic Analytics Data

quora-duplicates

Change in Quora Ads budgets among PPC marketers worldwide 2024

Quora Secondary School

Train and Test data of Quora question pairs

Dataset

Contents

Quora Ads Field Reference Fields

CS 405 NLP

Quora Chat Dutch

Replication package for the paper "What do Developers Discuss about Code...

Question Pairs Dataset

Context

Content

Acknowledgements

License

Replication package for the paper "What do Developers Discuss about Code...

RP-commenting-practices-multiple-sources

Structure

Contents of the Replication Package

- quota-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.

MAUs of Zhihu Inc. in China Q1 2021- Q4 2024

Replication Data for: Climate change communication in an online Q&A...

Revenue of Zhihu.Inc in China Q2 2024, by segment

poojaruhal/RP-commenting-practices-social-media:...

Filming location share of cable scripted TV shows in North America 2015-2017...

Data from: qqp

Quora Dataset

- `quota-posts` - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.