40 datasets found
  1. Quora Statistics And Facts (2025)

    • electroiq.com
    Updated Apr 10, 2025
    Cite
    Electro IQ (2025). Quora Statistics And Facts (2025) [Dataset]. https://electroiq.com/stats/quora-statistics/
    Dataset updated
    Apr 10, 2025
    Dataset authored and provided by
    Electro IQ
    License

    https://electroiq.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Introduction

    Quora Statistics: Quora is on the verge of establishing itself as a prominent question-and-answer website where users can seek knowledge from a worldwide community. Over the years, Quora has expanded dramatically in both users and revenue. This article surveys Quora statistics and trends to describe the platform's current standing.

  2. Quora Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Cite
    Bright Data, Quora Dataset [Dataset]. https://brightdata.com/products/datasets/quora
    Available download formats: .json, .csv, .xlsx
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    We'll tailor a Quora dataset to meet your unique needs, encompassing question topics, user engagement metrics, answer quality, demographic data of users, upvote counts, comment statistics, and other pertinent metrics. Leverage our Quora datasets for diverse applications to bolster strategic planning and market analysis. Scrutinizing these datasets enables organizations to grasp user preferences and community trends, facilitating nuanced content development and engagement initiatives. Customize your access to the entire dataset or specific subsets as per your business requisites. Popular use cases involve optimizing content strategy based on question popularity and user engagement, enhancing marketing strategies through targeted audience segmentation, and identifying and forecasting trends in community discussions to stay ahead in the knowledge-sharing landscape.

  3. Data from: quora-question-answer-dataset

    • huggingface.co
    Updated Sep 2, 2023
    + more versions
    Cite
    Gregory Bizup (2023). quora-question-answer-dataset [Dataset]. https://huggingface.co/datasets/toughdata/quora-question-answer-dataset
    Dataset updated
    Sep 2, 2023
    Authors
    Gregory Bizup
    License

    https://choosealicense.com/licenses/gpl-3.0/

    Description

    Quora Question Answer Dataset (Quora-QuAD) contains 56,402 question-answer pairs scraped from Quora.

      Usage:
    

    For instructions on fine-tuning a model (Flan-T5) with this dataset, please check out the article: https://www.toughdata.net/blog/post/finetune-flan-t5-question-answer-quora-dataset

  4. Quora Duplicate Questions Detection

    • kaggle.com
    Updated Dec 20, 2023
    Cite
    The Devastator (2023). Quora Duplicate Questions Detection [Dataset]. https://www.kaggle.com/datasets/thedevastator/quora-duplicate-questions-detection
    Dataset updated
    Dec 20, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    Description

    Quora Duplicate Questions Detection

    Binary Classification of Potential Duplicate Questions on Quora

    By Social Media Data [source]

    About this dataset

    Quora Question Pairs for Duplicate Detection: A Comprehensive Dataset for Semantic Equivalence Modelling

    Embark on an enriched journey of semantic analysis with this comprehensive dataset sourced from Quora, a knowledge-sharing platform renowned globally. This dataset, named 'Question Pairs', throws light on semantic equivalence by providing potential duplicate questions present in the Quora community.

    With an impressive collection of over 400,000 pairs of potentially duplicate questions, this dataset serves as an ideal platform for training and validating various natural language processing models. It provides a real-time opportunity to learn and experiment with machine learning algorithms designed to detect semantic similarity among pieces of text - a problem that holds significant importance in the digital era.

    This extensive dataset is composed of several lines featuring potential question duplicates pairs. Each line within the file contains unique IDs assigned to each question within a pair, the whole text for each question, and finally, whether or not they are genuine duplicates through binary values (1 signifies duplicates; 0 otherwise).

    The intriguing aspect about this voluminous data set is its imbalanced structure when it comes to labeling true examples of duplicate pairs versus non-duplicates. To overcome this imbalance and provide better versatility for learners and researchers alike, the original sampling method was fine-tuned to include 'negative' examples or non-duplicates.

    These negative examples were derived from pairs that referred consistently to similar topics but were not semantically identical. However, one must keep in mind that these negative additions don't disrupt the proportional distribution, since such related questions might indeed look like potential duplicates due to thematic similarities.

    It's also important to remember that, while comprehensive and substantial, the provided dataset isn't fully representative of all types of queries posed on Quora. This disparity arises partly from the selective sanitization process, which removes entries such as extremely long question descriptions that cause irregularities.

    Furthermore, the ground-truth labels, while accurate in most cases, may still contain some noise and are not perfect. The dataset was prepared by Shankar Iyer, Nikhil Dandekar, and Kornél Csernai.

    Use of this resource is subject to Quora's Terms of Service. Please find the original dataset on its dedicated page at [Quora's data section.](https://data.quora.com/First-

    How to use the dataset

    1. Understanding the Data:

    Each row in this dataset contains a pair of questions that are potentially duplicates. There are three main columns you should focus on:

    • question1: This represents the text contents of the first question.
    • question2: This represents the text contents of the second question.
    • is_duplicate: This binary indicator communicates whether or not these two questions are indeed duplicates.
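
As a rough illustration of the three columns described above, the following sketch reads a few rows and reports the share of pairs labelled as duplicates. The sample rows and the helper name `duplicate_share` are made up for illustration; only the column names `question1`, `question2`, and `is_duplicate` come from the description.

```python
import csv
import io

# Hypothetical sample rows; only the column names come from the dataset description.
SAMPLE_CSV = """question1,question2,is_duplicate
How do I learn Python?,What is the best way to learn Python?,1
What is the capital of France?,How tall is the Eiffel Tower?,0
How can I lose weight?,What is a good laptop for students?,0
"""


def duplicate_share(csv_text):
    """Return (number of rows, fraction of rows labelled as duplicates)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    duplicates = sum(1 for row in rows if row["is_duplicate"] == "1")
    return len(rows), duplicates / len(rows)


total, share = duplicate_share(SAMPLE_CSV)
print(f"{total} pairs, {share:.0%} labelled duplicate")
```

With the real file, the same function could be applied to the file contents; the actual file name and any extra columns are not stated here and would need to be checked.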

    2. Application:

    Applications include natural language processing tasks such as semantic similarity detection, instance matching, and data deduplication. These can help solve real-world problems such as building better search engines and recommendation systems and improving chatbot performance.

    3. Preprocessing:

    As with any other text analysis task, it's important to clean and preprocess the data before using it for modeling.

    A few steps could be:

    • Removing HTML tags (if any)
    • Converting all characters to lowercase
    • Removing punctuation
    • Tokenizing
    • Removing stopwords

    You might consider using libraries like NLTK or spaCy for some of these preprocessing operations.
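
The preprocessing steps listed above can be sketched with the Python standard library alone. This is a minimal illustration, not the dataset author's pipeline: the stopword list is a tiny hand-picked assumption, whitespace splitting stands in for a real tokenizer, and the function name `preprocess` is hypothetical.

```python
import re
import string

# Tiny illustrative stopword list (an assumption); NLTK and spaCy ship fuller ones.
STOPWORDS = {"a", "an", "the", "is", "are", "in", "of", "to", "what", "which", "how", "do", "i"}

TAG_RE = re.compile(r"<[^>]+>")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Apply the listed steps in order and return a list of tokens."""
    text = TAG_RE.sub(" ", text)        # remove HTML tags
    text = text.lower()                 # convert to lowercase
    text = text.translate(PUNCT_TABLE)  # remove ASCII punctuation
    tokens = text.split()               # naive whitespace tokenization
    return [tok for tok in tokens if tok not in STOPWORDS]  # remove stopwords


print(preprocess("What is the <b>most populous</b> state in the USA?"))
```

Stripping punctuation this way only covers ASCII characters, so curly quotes and similar symbols would survive; a library tokenizer is likely the better choice for real use.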

    4. Modeling & Evaluation:

    After preprocessing your data based on your specific needs, you could choose among several types of models depending upon what task you want to handle - both rule-based (like Levenshtein distance) and machine learning approaches (such as linear regression or decision trees), along with more advanced deep learning methods – LSTM (Long Short Term Memory), Siamese Networks etc where embeddings are used eit...
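
As one concrete instance of the rule-based option mentioned above, here is a sketch of Levenshtein (edit) distance with a normalized similarity score. The `similarity` helper and any threshold used to call two questions duplicates are assumptions for illustration, not part of the dataset description.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale edit distance into [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))
print(round(similarity("how do i learn python", "how can i learn python"), 2))
```

Character-level distance only captures surface overlap, so paraphrases with different wording will score low; it is best treated as a baseline against which learned models are compared.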

  5. quora.com Traffic Analytics Data

    • analytics.explodingtopics.com
    Updated May 1, 2025
    Cite
    (2025). quora.com Traffic Analytics Data [Dataset]. https://analytics.explodingtopics.com/website/quora.com
    Dataset updated
    May 1, 2025
    Variables measured
    Global Rank, Monthly Visits, Authority Score, US Country Rank, Mass Media Category Rank
    Description

    Traffic analytics, rankings, and competitive metrics for quora.com as of May 2025

  6. quora-duplicates

    • huggingface.co
    Updated Apr 27, 2024
    Cite
    Sentence Transformers (2024). quora-duplicates [Dataset]. https://huggingface.co/datasets/sentence-transformers/quora-duplicates
    Dataset updated
    Apr 27, 2024
    Dataset authored and provided by
    Sentence Transformers
    Description

    Dataset Card for Quora Duplicate Questions

    This dataset contains the Quora Question Pairs dataset in four formats that are easily used with Sentence Transformers to train embedding models. The data was originally created by Quora for this Kaggle Competition.

    Dataset Subsets

    pair-class subset

    Columns: "sentence1", "sentence2", "label"
    Column types: str, str, class with {"0": "different", "1": "duplicate"}
    Examples: { 'sentence1': 'What is the step by step… See the full description on the dataset page: https://huggingface.co/datasets/sentence-transformers/quora-duplicates.

  7. quora.com Website Traffic, Ranking, Analytics [June 2025]

    • semrush.com
    Updated Jul 12, 2025
    Cite
    Semrush (2025). quora.com Website Traffic, Ranking, Analytics [June 2025] [Dataset]. https://www.semrush.com/website/quora.com/overview/
    Dataset updated
    Jul 12, 2025
    Dataset authored and provided by
    Semrush (https://fr.semrush.com/)
    License

    https://www.semrush.com/company/legal/terms-of-service/

    Time period covered
    Jul 12, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    quora.com is ranked #39 in US with 459.35M Traffic. Categories: Online Services. Learn more about website traffic, market share, and more!

  8. Quora Secondary School

    • publicschoolreview.com
    json, xml
    Updated Feb 9, 2025
    Cite
    Public School Review (2025). Quora Secondary School [Dataset]. https://www.publicschoolreview.com/quora-secondary-school-profile
    Available download formats: json, xml
    Dataset updated
    Feb 9, 2025
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2009 - Dec 31, 2025
    Description

    The historical dataset of Quora Secondary School is provided by PublicSchoolReview and contains statistics on the following metrics: Total Students Trends Over Years (2009-2023), Total Classroom Teachers Trends Over Years (2009-2023), Distribution of Students By Grade Trends, Student-Teacher Ratio Comparison Over Years (2009-2023), American Indian Student Percentage Comparison Over Years (2013-2023), Asian Student Percentage Comparison Over Years (2013-2023), Hispanic Student Percentage Comparison Over Years (2009-2023), Black Student Percentage Comparison Over Years (2009-2023), White Student Percentage Comparison Over Years (2009-2023), Two or More Races Student Percentage Comparison Over Years (2012-2023), Diversity Score Comparison Over Years (2009-2023), Free Lunch Eligibility Comparison Over Years (2009-2023), Reduced-Price Lunch Eligibility Comparison Over Years (2009-2023), Reading and Language Arts Proficiency Comparison Over Years (2011-2022), Math Proficiency Comparison Over Years (2012-2023), Science Proficiency Comparison Over Years (2021-2022), Overall School Rank Trends Over Years (2012-2023), Graduation Rate Comparison Over Years (2013-2023).

  9. Change in Quora Ads budgets among PPC marketers worldwide 2024

    • statista.com
    Updated Jul 18, 2025
    Cite
    Statista (2025). Change in Quora Ads budgets among PPC marketers worldwide 2024 [Dataset]. https://www.statista.com/statistics/1549661/change-quora-ads-budgets-worldwide/
    Dataset updated
    Jul 18, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 2, 2023 - Dec 24, 2023
    Area covered
    Worldwide
    Description

    During a December 2023 survey, ** percent of responding pay-per-click marketers worldwide said they planned to maintain their Quora Ads budgets. Slight increases were reported by *** percent, while *** percent planned significant increases. Additionally, ** percent and ** percent stated they intended to significantly or slightly decrease their budgets, respectively.

  10. Quora-Italian-Fauno-Baize

    • huggingface.co
    Updated Jan 15, 2015
    Cite
    Andrea Bacciu (2015). Quora-Italian-Fauno-Baize [Dataset]. https://huggingface.co/datasets/andreabac3/Quora-Italian-Fauno-Baize
    Dataset updated
    Jan 15, 2015
    Authors
    Andrea Bacciu
    License

    https://choosealicense.com/licenses/gpl-3.0/

    Description

    Quora-Italian-Fauno-Baize

    This dataset is an Italian translation of the Quora dataset presented by Baize's authors.

    Languages

    Italian

    Dataset Structure

    Data Instances

    Sentences: 54,456; average number of turns: 3.9; response length of each turn: 35.9

    Data Fields

    topic, input

    Data Splits

    Train

    Dataset Creation

    Source Data

    Initial Data Collection and Normalization… See the full description on the dataset page: https://huggingface.co/datasets/andreabac3/Quora-Italian-Fauno-Baize.
    
  11. Train and Test data of Quora question pairs

    • kaggle.com
    zip
    Updated May 27, 2018
    + more versions
    Cite
    Shilpa (2018). Train and Test data of Quora question pairs [Dataset]. https://www.kaggle.com/shilpil/quoratrainandtestset
    Available download formats: zip (142191560 bytes)
    Dataset updated
    May 27, 2018
    Authors
    Shilpa
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Shilpa

    Released under CC0: Public Domain

    Contents

    It contains the following files:

  12. Quora Ads Field Reference Fields

    • windsor.ai
    json
    Updated Nov 3, 2021
    Cite
    Windsor.ai (2021). Quora Ads Field Reference Fields [Dataset]. https://windsor.ai/data-field/quora/
    Available download formats: json
    Dataset updated
    Nov 3, 2021
    Dataset provided by
    Windsor.ai
    Variables measured
    Date, Week, Year, Month, Spend, Today, Clicks, Source, Campaign, Week ISO, and 11 more
    Description

    Auto-generated structured data of Quora Ads Field Reference from table Fields

  13. Quora Chat Dutch

    • data.niaid.nih.gov
    • huggingface.co
    • +1more
    Updated Jun 20, 2023
    Cite
    Vanroy, Bram (2023). Quora Chat Dutch [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8052391
    Dataset updated
    Jun 20, 2023
    Dataset authored and provided by
    Vanroy, Bram
    License

    https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

    Description

    This dataset contains 54,444 conversations in Dutch between an AI assistant and a (fake, generated) "Human". They are translations of Baize's machine-generated answers to the Quora dataset.

    Data Instances

    { 'id': 10, 'input': 'Het gesprek tussen de mens en de AI-assistent. ' '[|Human|] Ik vroeg me af wanneer de Moto G3 de Android 7.0-update ' 'zal krijgen? ' '[|AI|] De Moto G3 (Moto G 3rd gen 2015) komt in aanmerking voor de ' 'Android 7.0 (Nougat) update. ' '[|Human|] Geweldig! Weet je wanneer de update beschikbaar zal ' 'zijn? ' '[|AI|] De update werd uitgebracht in december 2016 voor de Moto G4 ' 'en G4 Plus. Er is echter nog geen woord over wanneer de Moto G3 de ' 'update zal ontvangen. ' '[|Human|] Is er nieuws of het binnenkort de update zal ontvangen? ' '[|AI|] Het spijt me, ik heb momenteel geen informatie daarover. ' 'Het is het beste om uit te kijken naar updates van Motorola. ' '[|Human|] ', 'topic': 'Wanneer krijgt Moto G3 (Moto G 3e gen 2015) de Android 7.0 ' '(Nougat) update?' }

    Data Fields

    id: the ID of the item. The following IDs are not included because they could not be translated: [1352, 14599, 25985, 27167, 28254, 33731, 41550, 44364, 47293, 49060, 51383, 53740]

    input: the machine-generated conversation between AI and "Human". Always starts with Het gesprek tussen de mens en de AI-assistent. and has at least one occurrence of both [|AI|] and [|Human|].

    topic: the topic description
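
To make the `input` format described above concrete, the following sketch splits a conversation string into (speaker, text) turns using the `[|Human|]` and `[|AI|]` markers. The function name `parse_turns` and the shortened sample string are hypothetical; only the fixed opening sentence and the two speaker identifiers come from the dataset description.

```python
import re

# Fixed opening sentence and speaker markers, as documented in the field description.
PREFIX = "Het gesprek tussen de mens en de AI-assistent."
SPEAKER_RE = re.compile(r"\[\|(Human|AI)\|\]")


def parse_turns(conversation):
    """Split an `input` string into a list of (speaker, text) tuples."""
    body = conversation
    if body.startswith(PREFIX):
        body = body[len(PREFIX):]
    # With one capture group, re.split returns:
    # [text_before_first_marker, speaker, text, speaker, text, ...]
    parts = SPEAKER_RE.split(body)
    turns = []
    for speaker, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:  # skip the empty trailing [|Human|] slot
            turns.append((speaker, text))
    return turns


# Hypothetical, shortened sample in the documented format.
sample = PREFIX + " [|Human|] Hallo? [|AI|] Hoi. [|Human|] "
print(parse_turns(sample))
```

The trailing empty `[|Human|]` marker, visible at the end of the data instance shown earlier, is dropped here; a caller that needs it as a generation prompt would keep it instead.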

    Dataset Creation

    Both the conversations and the topics were translated with OpenAI's API for gpt-3.5-turbo, using max_tokens=1024 and temperature=0 as parameters.

    The prompt template to translate the input is:

    CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a conversation between an AI assistant and a human from {src_lang} into {tgt_lang}.

    Here are the requirements that you should adhere to: 1. maintain the format: the conversation consists of the AI (marked as [|AI|]) and the human ([|Human|]) talking in turns and responding to each other; 2. do not translate the speaker identifiers [|AI|] and [|Human|] but always copy them into the translation in appropriate places; 3. ensure accurate translation and keep the correctness of the conversation; 4. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias; 5. translate the human's text using informal, but standard, language; 6. make sure to avoid biases (such as gender bias, grammatical bias, social bias); 7. if the human asks to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in {tgt_lang}, and then also generate a corrected output version for the AI in {tgt_lang}; 8. if the human asks to translate text from one to another language, then you only translate the human's question to {tgt_lang} but you keep the translation that the AI provides in the language that the human requested; 9. do not translate code fragments but copy them as they are. If there are English examples, variable names or definitions in code fragments, keep them in English.

    Now translate the following conversation with the requirements set out above. Do not provide an explanation and do not add anything else.

    """

    The prompt to translate the topic is:

    TOPIC_TRANSLATION_PROMPT = "Translate the following title of a conversation from {src_lang} to {tgt_lang} in a succinct,"
    " summarizing manner. Translate accurately and formally. Do not provide any explanation"
    " about the translation and do not include the original title.

    "

    The system message was:

    You are a helpful assistant that translates English to Dutch to the requirements that are given to you.

    Note that 12 items (0.02%) were not successfully translated. The translation was missing the AI identifier [|AI|] and/or the human one [|Human|]. The IDs for the missing items are [1352, 14599, 25985, 27167, 28254, 33731, 41550, 44364, 47293, 49060, 51383, 53740].
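
The failure check and the 0.02% figure above can be reproduced with a short sketch. It assumes that the source collection held the 54,444 translated items plus the 12 excluded ones, which the description implies but does not state outright; the helper name `has_both_speakers` is hypothetical.

```python
# IDs copied from the dataset description.
MISSING_IDS = [1352, 14599, 25985, 27167, 28254, 33731,
               41550, 44364, 47293, 49060, 51383, 53740]
TRANSLATED_ITEMS = 54_444  # conversations reported in the description


def has_both_speakers(translation):
    """A translation is kept only if both speaker identifiers survived."""
    return "[|AI|]" in translation and "[|Human|]" in translation


# Assumption: original size = translated items + excluded items.
total_items = TRANSLATED_ITEMS + len(MISSING_IDS)
failure_rate = len(MISSING_IDS) / total_items
print(f"{len(MISSING_IDS)} of {total_items} items failed ({failure_rate:.2%})")
```

Under that assumption the total comes to 54,456 items, and 12 failures round to the 0.02% quoted above.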

    The translation quality has not been verified. Use at your own risk!

    Licensing Information

    Licensing info for quora is listed as unknown. Baize is listed as GPLv3. So this dataset is also listed as such. A specific restriction is in place: as per OpenAI's terms of use, this dataset cannot be used to build a commercial system that competes with OpenAI's services.

    This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.

    If you use this dataset, you must also follow the Sharing and Usage policies.

    As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.

    This dataset is also available on the Hugging Face hub which is the canonical repository for this dataset. See the link for more info.

  14. quora_data

    • kaggle.com
    zip
    Updated Jun 4, 2020
    Cite
    Dànae Canillas (2020). quora_data [Dataset]. https://www.kaggle.com/danaecanillas/quora-data
    Available download formats: zip (380828647 bytes)
    Dataset updated
    Jun 4, 2020
    Authors
    Dànae Canillas
    Description

    Dataset

    This dataset was created by Dànae Canillas

    Contents

    It contains the following files:

  15. quora_data

    • kaggle.com
    Updated Apr 12, 2023
    Cite
    Hyndavi Ambati (2023). quora_data [Dataset]. https://www.kaggle.com/datasets/hyndaviambati/quora-data
    Dataset updated
    Apr 12, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hyndavi Ambati
    Description

    Dataset

    This dataset was created by Hyndavi Ambati

    Contents

  16. Replication package for the paper "What do Developers Discuss about Code...

    • zenodo.org
    zip
    Updated Jun 30, 2021
    Cite
    Anonymous; Anonymous (2021). Replication package for the paper "What do Developers Discuss about Code Comment Conventions" [Dataset]. http://doi.org/10.5281/zenodo.4470126
    Available download formats: zip
    Dataset updated
    Jun 30, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    # RP-commenting-conventions-multiple-sources
    Replication Package for the paper "What do Developers Discuss about Code Comment Conventions?"
    
    ## Structure
    ```
    Appendix.pdf
    
    RQ1/
      LDA_input/
        stackoverfow_raw_dataset.csv
    
      LDA_output/
        Mallet/
          output_csv/
            docs-in-topics.csv
            topic-words.csv
            topics-in-docs.csv
            topics-metadata.csv
          output_html/
            all_topics.html
            Docs/
            Topics/
    
    RQ2/
      datasource_rawdata/
        mailing_lists_selection_criteria.csv
        quora.csv
        stackoverflow.csv
      manual_analysis_output/
        stackoverflow_quora_taxonomy.xlsx
    ```
    
    ## Contents of the Replication Package
    ---
    - **Appendix.pdf**- Appendix of the paper containing supplement tables
    
    - **RQ1/** - contains the data used to answer RQ1
     - **LDA_input/** - input data used for LDA analysis
      - `stackoverfow_raw_dataset.csv` - stackoverflow questions used to perform LDA analysis
     - **LDA_output/**
      - **Mallet/** - contains the LDA output generated by MALLET tool
         - **output_csv/**
          - `docs-in-topics.csv` - documents per topic
          - `topic-words.csv` - most relevant topic words
          - `topics-in-docs.csv` - topic probability per document
          - `topics-metadata.csv` - metadata per document and topic probability
        - **output_html/** - Browsable results of mallet output
          - `all_topics.html`
          - `Docs/`
          - `Topics/`
    
    - **RQ2/** - contains the data used to answer RQ2
     - **datasource_rawdata/** - contains the raw data for each source
      - `mailing_lists_selection_criteria.csv` - criteria used to select mailing_lists.
      - `quora.csv` - contains the processed dataset (like removing HTML tags). To know more about the preprocessing steps, please refer to the reproducibility section in the paper. The data is preprocessed using [Makar](https://github.com/maethub/makar) tool.
      - `stackoverflow.csv` - contains the processed Stack Overflow dataset. To know more about the preprocessing steps, please refer to the reproducibility section in the paper. The data is preprocessed using [Makar](https://github.com/maethub/makar) tool.
     - **manual_analysis_output/**
      - `stackoverflow_quora_taxonomy.xlsx` - contains the classified dataset of Stack Overflow and quora and a description of taxonomy.
        - `Taxonomy` - contains the description of the first dimension and second dimension categories. Second dimension categories are further divided into levels, separated by `|` symbol. 
        - `stackoverflow-posts` - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.
         - `quota-posts` - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.     
    ---
    

  17. Question Pairs Dataset

    • kaggle.com
    Updated Feb 2, 2017
    Cite
    Quora (2017). Question Pairs Dataset [Dataset]. https://www.kaggle.com/datasets/quora/question-pairs-dataset/code
    Dataset updated
    Feb 2, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Quora
    Description

    Context

    Quora's first public dataset is related to the problem of identifying duplicate questions. At Quora, an important product principle is that there should be a single question page for each logically distinct question. For example, the queries “What is the most populous state in the USA?” and “Which state in the United States has the most people?” should not exist separately on Quora because the intent behind both is identical. Having a canonical page for each logically distinct query makes knowledge-sharing more efficient in many ways: for example, knowledge seekers can access all the answers to a question in a single location, and writers can reach a larger readership than if that audience was divided amongst several pages.

    The dataset is based on actual data from Quora and will give anyone the opportunity to train and test models of semantic equivalence.

    Content

    There are over 400,000 lines of potential question duplicate pairs. Each line contains IDs for each question in the pair, the full text for each question, and a binary value that indicates whether the line truly contains a duplicate pair.

    Acknowledgements

    For more information on this dataset, check out Quora's first dataset release page.

    License

    This data is subject to Quora's Terms of Service, allowing for non-commercial use.

  18. Replication package for the paper "What do Developers Discuss about Code...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 30, 2021
    Cite
    Anonymous (2021). Replication package for the paper "What do Developers Discuss about Code Comments" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4470125
    Dataset updated
    Jun 30, 2021
    Dataset authored and provided by
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    RP-commenting-practices-multiple-sources

    Replication package for the paper "What do Developers Discuss about Code Comments?"

    Structure

    Appendix.pdf
    Tags-topics.md
    Stack-exchange-query.md
    
    RQ1/
      LDA_input/
        combined-so-quora-mallet-metadata.csv
        topic-input.mallet
    
      LDA_output/
        Mallet/
          output_csv/
            docs-in-topics.csv
            topic-words.csv
            topics-in-docs.csv
            topics-metadata.csv
          output_html/
            all_topics.html
            Docs/
            Topics/
    
    RQ2/
      datasource_rawdata/
        quora.csv
        stackoverflow.csv
      manual_analysis_output/
        stackoverflow_quora_taxonomy.xlsx
    

    Contents of the Replication Package

    • Appendix.pdf - Appendix of the paper containing supplementary tables

    • Tags-topics.md - tags selected from Stack Overflow and topics selected from Quora for the study (RQ1 & RQ2)

    • Stack-exchange-query.md - the query interface used to extract the posts from the Stack Exchange explorer.

    • RQ1/ - contains the data used to answer RQ1

      • LDA_input/ - input data used for LDA analysis
        • combined-so-quora-mallet-metadata.csv - Stack Overflow and Quora questions used to perform LDA analysis
        • topic-input.mallet - input file to the MALLET tool
      • LDA_output/
        • Mallet/ - contains the LDA output generated by the MALLET tool
          • output_csv/
            • docs-in-topics.csv - documents per topic
            • topic-words.csv - most relevant topic words
            • topics-in-docs.csv - topic probability per document
            • topics-metadata.csv - metadata per document and topic probability
          • output_html/ - browsable results of the MALLET output
            • all_topics.html
            • Docs/
            • Topics/
    • RQ2/ - contains the data used to answer RQ2

      • datasource_rawdata/ - contains the raw data for each source
        • quora.csv - contains the processed Quora dataset (for example, with HTML tags removed). For details of the preprocessing steps, refer to the reproducibility section in the paper. The data is preprocessed using the Makar tool.
        • stackoverflow.csv - contains the processed Stack Overflow dataset. For details of the preprocessing steps, refer to the reproducibility section in the paper. The data is preprocessed using the Makar tool.
      • manual_analysis_output/
        • stackoverflow_quora_taxonomy.xlsx - contains the classified dataset of Stack Overflow and Quora posts and a description of the taxonomy.
          • Taxonomy - contains the description of the first dimension and second dimension categories. Second dimension categories are further divided into levels, separated by the | symbol.
          • stackoverflow-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.
          • quora-posts - the questions are labelled relevant or irrelevant and categorized into the first dimension and second dimension categories.
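
The Taxonomy description above says that second dimension categories are divided into levels separated by the | symbol. A minimal sketch of splitting such a label into its levels might look like the following. The function name `split_levels` and the example label are hypothetical and are not taken from the spreadsheet.

```python
def split_levels(category: str) -> list[str]:
    """Split a second-dimension category label into its levels.

    Assumes levels are separated by the | symbol, as stated in the
    package description; surrounding whitespace is dropped.
    """
    return [level.strip() for level in category.split("|") if level.strip()]


# Hypothetical label for illustration only; real labels live in
# stackoverflow_quora_taxonomy.xlsx and may look different.
example_label = "level one | level two | level three"
print(split_levels(example_label))
```

Stripping whitespace around each level is a design choice made here so that labels written with or without spaces around the separator compare equal.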

  19. MAUs of Zhihu Inc. in China Q1 2021- Q4 2024

    • statista.com
    Updated Jun 10, 2025
    Cite
    Statista (2025). MAUs of Zhihu Inc. in China Q1 2021- Q4 2024 [Dataset]. https://www.statista.com/statistics/1298392/zhihu-monthly-active-users/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    China
    Description

    In the fourth quarter of 2024, Zhihu Inc. reported that its average monthly active users (MAUs) had reached about **** million. Meanwhile, the Chinese Quora-like Q&A platform attracted an average of **** million monthly active paying users in that quarter.

  20. Data from: qqp

    • huggingface.co
    Updated Oct 21, 2023
    Cite
    bias-amplified-splits (2023). qqp [Dataset]. https://huggingface.co/datasets/bias-amplified-splits/qqp
    Dataset authored and provided by
    bias-amplified-splits
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    GLUE, the General Language Understanding Evaluation benchmark (https://gluebenchmark.com/), is a collection of resources for training, evaluating, and analyzing natural language understanding systems.
