11 datasets found
  1. Game Feedback Discord Dataset

    • opendatabay.com
    Updated Jul 5, 2025
    Cite
    Datasimple (2025). Game Feedback Discord Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/c8ccbb59-2931-4b87-adef-d604fb0774b0
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Entertainment & Media Consumption
    Description

    This dataset captures suggestions for improving a game, sourced from a Discord server where members can submit their ideas. Users have the ability to upvote these suggestions, and any suggestion accumulating 35 or more upvotes is forwarded to the game's developers. The primary purpose of compiling this dataset was for natural language processing (NLP) practice, but it also offers opportunities for applying statistical analysis to understand factors that contribute to a suggestion being sent to the developers. The dataset provides valuable insights into community feedback and engagement.

    Columns

    • SuggestionDate: The date on which the suggestion was originally made.
    • SuggestionText: The full text content of the suggestion submitted by a Discord server member.
    • CharCount: A numerical count of the characters within the 'SuggestionText' field.
    • SuggestionCategory: A categorisation for the suggestion; further details on these categories would typically be found in a more expansive dataset description.
    • Upvotes: The total number of upvotes received by the suggestion from other members of the Discord server.
    • ReportedToDevs: A boolean indicator specifying whether the suggestion was reported to the game development team.

    Distribution

    The dataset is typically provided in a tabular format, such as a CSV file. It contains a total of 158 individual records or rows, each representing a unique game improvement suggestion. The data includes suggestion dates ranging from 1st April 2022 to 26th April 2022. The character count for suggestions varies widely, from 1 to 1831 characters. Categories include 'Feature' (53%), 'Item' (32%), and 'Other' (16%). A small percentage of suggestions (16%) were reported to the developers.

    Usage

    This dataset is ideally suited for various analytical tasks. It can be used for natural language processing (NLP) exercises, such as sentiment analysis of suggestions, topic modelling, or text summarisation. Additionally, it is suitable for statistical analysis to identify correlations between suggestion characteristics (e.g., length, category, keywords) and their likelihood of receiving upvotes or being reported to the developers. Game developers, community managers, and data analysts can utilise this data to gain actionable insights into player feedback and prioritise development efforts.
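
    As a rough illustration of the statistical analysis described above, the sketch below computes the share of suggestions reported to the developers per category and checks each row against the 35-upvote rule. Only the column names come from the description; the sample rows, the date format, and the "True"/"False" encoding of ReportedToDevs are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; only the column names come from the dataset description.
SAMPLE_CSV = """SuggestionDate,SuggestionText,CharCount,SuggestionCategory,Upvotes,ReportedToDevs
2022-04-01,Add a zipline,13,Feature,40,True
2022-04-02,New armor set,13,Item,12,False
2022-04-03,Fix the menu,12,Other,35,True
2022-04-04,Add pets,8,Feature,7,False
"""

UPVOTE_THRESHOLD = 35  # from the description: 35 or more upvotes are forwarded


def load_rows(text):
    """Parse CSV text into dicts with typed Upvotes and ReportedToDevs."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        raw["Upvotes"] = int(raw["Upvotes"])
        # Assumption: booleans are stored as the strings "True" / "False".
        raw["ReportedToDevs"] = raw["ReportedToDevs"].strip().lower() == "true"
        rows.append(raw)
    return rows


def report_rate_by_category(rows):
    """Return {category: fraction of suggestions reported to the developers}."""
    totals = defaultdict(int)
    reported = defaultdict(int)
    for row in rows:
        totals[row["SuggestionCategory"]] += 1
        reported[row["SuggestionCategory"]] += row["ReportedToDevs"]
    return {cat: reported[cat] / totals[cat] for cat in totals}


def threshold_mismatches(rows, threshold=UPVOTE_THRESHOLD):
    """Rows whose ReportedToDevs flag disagrees with the upvote threshold."""
    return [r for r in rows if (r["Upvotes"] >= threshold) != r["ReportedToDevs"]]


rows = load_rows(SAMPLE_CSV)
rates = report_rate_by_category(rows)
mismatches = threshold_mismatches(rows)
print(rates)
print(len(mismatches))
```

    With the real file, the same functions could be fed the CSV contents directly; any rows returned by threshold_mismatches would be worth inspecting by hand before modelling.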

    Coverage

    The dataset's geographic coverage is global, as the Discord server from which suggestions were drawn is accessible worldwide. The time range for the suggestions captured spans from 1st April 2022 to 26th April 2022. The demographic scope includes any member of the specific Discord server who submitted a suggestion. There are no specific notes on data availability limitations for particular groups or years within the provided information.

    License

    CC0

    Who Can Use It

    • Data Scientists & Machine Learning Engineers: For training NLP models on user-generated content or developing predictive models for suggestion virality.
    • Game Developers: To understand community sentiment, identify popular feature requests, and prioritise game improvements based on player feedback.
    • Community Managers: To analyse engagement patterns on their Discord servers and improve feedback collection processes.
    • Data Analysts: For performing statistical tests to determine which types of suggestions have a higher probability of being adopted or forwarded.
    • Students & Researchers: For academic projects involving text analysis, social media data, or community engagement studies.

    Dataset Name Suggestions

    • Grounded Suggestions via Discord Server
    • Game Feedback Discord Dataset
    • Player Suggestion Analysis
    • Discord Game Improvement Data

    Attributes

    Original Data Source: Grounded Suggestions via Discord Server

  2. Discord Survey

    • kaggle.com
    Updated Feb 13, 2022
    Cite
    Yonko (Czeslaw Meyer) (2022). Discord Survey [Dataset]. https://www.kaggle.com/datasets/yonkotoshiro/discord-survey/code
    Dataset updated
    Feb 13, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Yonko (Czeslaw Meyer)
    Description

    Results of a survey of 403 Discord users. Both the users and the servers were selected at random; many people declined to take part, but some agreed. Only Russian-speaking people were surveyed. Before the survey, I notified users that after completion I was going to analyze the data and post the results in the public domain. No personal user data was collected.

    In general, you can see that I like Discord, and that some of the questions have a psychological focus. I have no experience in doing something like this, but I tried to do everything as correctly as possible.

    This version is translated into English. The data was also cleaned, and anything that was not needed was removed or changed.

  3. Anime Subtitles

    • kaggle.com
    Updated Aug 19, 2021
    Cite
    Jess Fan (2021). Anime Subtitles [Dataset]. https://www.kaggle.com/datasets/jef1056/anime-subtitles/discussion
    Dataset updated
    Aug 19, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jess Fan
    Description

    Content

    The original extracted versions (in .srt and .ass format) are also included in this release (which, idk why, but kaggle decompressed >:U)

    This dataset contains 1,497,770 messages across 3,836 episodes of anime. The raw dataset contains 1,563,442 messages, some of which were removed during cleaning.

    This version (V4) adapts the original (frankly, terrible) format into the newer format I developed, which is used in https://github.com/JEF1056/clean-discord. The Dataset folder contains compressed text files, which are compatible with TensorFlow datasets. These can be streamed as a TextLineDataset in TSV format.

    V4 also fixes many (but not all) issues that the original cleaning script was too simple to realistically take care of. It also uses the clean-discord cleaner algorithms to make sentences read more like natural language than formatting. The script has also been optimized to run on multi-core systems, allowing it to clean this entire dataset in under 30 seconds on a 4-core machine. See the new and improved script here: https://github.com/JEF1056/clean-discord/blob/v1.2/misc/anime.py (no longer bundled in the dataset files)

    Format

    The files are now all compressed to save space, and are compatible with TensorFlow datasets. You can initialize a dataset function as follows:

        import functools
        import os

        import tensorflow as tf

        # nq_tsv_path must be defined elsewhere: a mapping from each split name
        # to the directory that holds that split's gzipped TSV files.
        def dataset_fn_local(split, shuffle_files=False):
            global nq_tsv_path
            del shuffle_files
            # Load lines from the text file as examples.
            files_to_read = [
                os.path.join(nq_tsv_path[split], filename)
                for filename in os.listdir(nq_tsv_path[split])
                if filename.startswith(split)
            ]
            print(f"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Split {split} contains {len(files_to_read)} files. First 10: {files_to_read[0:10]}")
            # Read the gzipped files line by line and skip empty lines.
            ds = tf.data.TextLineDataset(files_to_read, compression_type="GZIP").filter(
                lambda line: tf.not_equal(tf.strings.length(line), 0))
            ds = ds.shuffle(buffer_size=600000)
            # Split each tab-separated line into two string fields.
            ds = ds.map(
                functools.partial(tf.io.decode_csv, record_defaults=["", ""],
                                  field_delim="\t", use_quote_delim=False),
                num_parallel_calls=tf.data.experimental.AUTOTUNE)
            ds = ds.map(lambda *ex: dict(zip(["question", "answer"], ex)))
            return ds

    Acknowledgements

    A sincere thanks to all of my friends for helping me come up with anime titles, a shoutout to the talented and dedicated people translating Japanese anime, and an even bigger thanks to Leen Chan for compiling the actual subtitles.

    This dataset is far from complete! I hope there are people out there who are willing to find, add, and clean data and help grow this dataset.

  4. Dataset on the online cryptocurrency discussion on Twitter, Telegram, and...

    • data.niaid.nih.gov
    Updated Nov 22, 2022
    + more versions
    Cite
    Ferrara, Emilio (2022). Dataset on the online cryptocurrency discussion on Twitter, Telegram, and Discord [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3895020
    Dataset updated
    Nov 22, 2022
    Dataset provided by
    Tesconi, Maurizio
    Ferrara, Emilio
    Tardelli, Serena
    Cresci, Stefano
    Avvenuti, Marco
    Nizzoli, Leonardo
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This Dataset is described in Charting the Landscape of Online Cryptocurrency Manipulation. IEEE Access (2020), a study that aims to map and assess the extent of cryptocurrency manipulations within and across the online ecosystems of Twitter, Telegram, and Discord. Starting from tweets mentioning cryptocurrencies, we leveraged and followed invite URLs from platform to platform, building the invite-link network, in order to study the invite link diffusion process.

    Please refer to the paper below for more details.

    Nizzoli, L., Tardelli, S., Avvenuti, M., Cresci, S., Tesconi, M. & Ferrara, E. (2020). Charting the Landscape of Online Cryptocurrency Manipulation. IEEE Access.

    This dataset is composed of:

    ~16M tweet ids shared between March and May 2019, mentioning at least one of the 3,822 cryptocurrencies (cashtags) provided by the CryptoCompare public API;

    ~13k nodes of the invite-link network, i.e., the information about the Telegram/Discord channels and Twitter users involved in the cryptocurrency discussion (e.g., id, name, audience, invite URL);

    ~62k edges of the invite-link network, i.e., the information about the flow of invites (e.g., source id, target id, weight).

    With such information, one can easily retrieve the content of channels and messages through Twitter, Telegram, and Discord public APIs.

    Please refer to the README file for more details about the fields.
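
    To make the invite-link network concrete, here is a minimal sketch that sums the weights of incoming invite links per node, one simple way to see which channels attract the most invites. The edge tuples and node ids are hypothetical; the actual field names and file layout are those documented in the dataset's README.

```python
from collections import defaultdict

# Hypothetical edges as (source_id, target_id, weight); the real field names
# and file layout are documented in the dataset's README.
edges = [
    ("twitter_user_1", "telegram_chan_a", 3),
    ("twitter_user_2", "telegram_chan_a", 1),
    ("telegram_chan_a", "discord_srv_x", 2),
    ("twitter_user_1", "discord_srv_x", 5),
]


def weighted_in_degree(edge_list):
    """Sum the weights of incoming invite links for each target node."""
    totals = defaultdict(int)
    for _source, target, weight in edge_list:
        totals[target] += weight
    return dict(totals)


in_degree = weighted_in_degree(edges)
most_invited = max(in_degree, key=in_degree.get)
print(in_degree, most_invited)
```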

  5. MultiSocial

    • data.niaid.nih.gov
    Updated Oct 4, 2024
    Cite
    Moro, Robert (2024). MultiSocial [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13846151
    Dataset updated
    Oct 4, 2024
    Dataset provided by
    Moro, Robert
    Kopal, Jakub
    Macko, Dominik
    Srba, Ivan
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    MultiSocial is a dataset (described in a paper) for benchmarking multilingual (22 languages) machine-generated text detection in the social-media domain (5 platforms). It contains 472,097 texts, of which about 58k are human-written and approximately the same number is generated by each of 7 multilingual large language models using 3 iterations of paraphrasing. The dataset has been anonymized to minimize the amount of sensitive data by hiding email addresses, usernames, and phone numbers.

    If you use this dataset in any publication, project, tool, or in any other form, please cite the paper.

    Disclaimer

    Due to the data sources (described below), the dataset may contain harmful, disinformation, or offensive content. Based on a multilingual toxicity detector, about 8% of the text samples are probably toxic (from 5% in WhatsApp to 10% in Twitter). Although we have used older data sources (which are less likely to include machine-generated texts), the labeling of human-written text might not be 100% accurate. The anonymization procedure might not have hidden all the sensitive or personal content; thus, use the data cautiously. The intended use is for non-commercial research purposes only.

    Data Source

    The human-written part consists of a pseudo-randomly selected subset of social media posts from 6 publicly available datasets:

    Telegram data originated in Pushshift Telegram, containing 317M messages (Baumgartner et al., 2020). It contains messages from 27k+ channels. The collection started with a set of right-wing extremist and cryptocurrency channels (about 300 in total) and was expanded based on occurrence of forwarded messages from other channels. In the end, it thus contains a wide variety of topics and societal movements reflecting the data collection time.

    Twitter data originated in CLEF2022-CheckThat! Task 1, containing 34k tweets on COVID-19 and politics (Nakov et al., 2022), combined with Sentiment140, containing 1.6M tweets on various topics (Go et al., 2009).

    Gab data originated in the dataset containing 22M posts from Gab social network. The authors of the dataset (Zannettou et al., 2018) found out that “Gab is predominantly used for the dissemination and discussion of news and world events, and that it attracts alt-right users, conspiracy theorists, and other trolls.” They also found out that hate speech is much more prevalent there compared to Twitter, but lower than 4chan's Politically Incorrect board.

    Discord data originated in Discord-Data, containing 51M messages. This is a long-context, anonymized, clean, multi-turn and single-turn conversational dataset based on Discord data scraped from a large variety of servers, big and small. According to the dataset authors, it contains around 0.1% of potentially toxic comments (based on the applied heuristic/classifier).

    WhatsApp data originated in whatsapp-public-groups, containing 300k messages (Garimella & Tyson, 2018). The public dataset contains the anonymised data, collected for around 5 months from around 178 groups. Original messages were made available to us on request to dataset authors for research purposes.

    From these datasets, we have pseudo-randomly sampled up to 1300 texts (up to 300 for test split and the remaining up to 1000 for train split if available) for each of the selected 22 languages (using a combination of automated approaches to detect the language) and platform. This process resulted in 61,592 human-written texts, which were further filtered out based on occurrence of some characters or their length, resulting in about 58k human-written texts.
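
    The per-language, per-platform sampling described above can be sketched roughly as follows. This is not the authors' code: the record layout, the fixed seed, and the choice to fill the test split before the train split are assumptions made for illustration; only the 300 and 1000 caps come from the description.

```python
import random
from collections import defaultdict

MAX_TEST = 300    # cap from the description
MAX_TRAIN = 1000  # cap from the description


def sample_splits(records, seed=0):
    """records: iterable of (language, platform, text) tuples.

    Returns {(language, platform): {"test": [...], "train": [...]}}.
    """
    groups = defaultdict(list)
    for language, platform, text in records:
        groups[(language, platform)].append(text)

    rng = random.Random(seed)  # fixed seed makes the sampling reproducible
    splits = {}
    for key, texts in groups.items():
        rng.shuffle(texts)
        # Assumption: the test split is filled first, then the train split.
        splits[key] = {
            "test": texts[:MAX_TEST],
            "train": texts[MAX_TEST:MAX_TEST + MAX_TRAIN],
        }
    return splits


# Hypothetical input: 1500 English Telegram texts and 350 Slovak Discord texts.
records = [("en", "telegram", f"en text {i}") for i in range(1500)]
records += [("sk", "discord", f"sk text {i}") for i in range(350)]
splits = sample_splits(records)
print({key: (len(s["test"]), len(s["train"])) for key, s in splits.items()})
```

    A group with fewer than 1300 texts simply yields a smaller train split, which matches the "if available" wording in the description.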

    The machine-generated part contains texts generated by 7 LLMs (Aya-101, Gemini-1.0-pro, GPT-3.5-Turbo-0125, Mistral-7B-Instruct-v0.2, opt-iml-max-30b, v5-Eagle-7B-HF, vicuna-13b). All these models were self-hosted except for GPT and Gemini, where we used the publicly available APIs. We generated the texts using 3 paraphrases of the original human-written data and then preprocessed the generated texts (filtered out cases when the generation obviously failed).

    The dataset has the following fields:

    'text' - a text sample,

    'label' - 0 for human-written text, 1 for machine-generated text,

    'multi_label' - a string representing a large language model that generated the text or the string "human" representing a human-written text,

    'split' - a string identifying train or test split of the dataset for the purpose of training and evaluation respectively,

    'language' - the ISO 639-1 language code identifying the detected language of the given text,

    'length' - word count of the given text,

    'source' - a string identifying the source dataset / platform of the given text,

    'potential_noise' - 0 for text without identified noise, 1 for text with potential noise.
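
    As an illustration of how these fields might be used together, the sketch below selects one split, drops texts flagged as potentially noisy, and counts human versus machine labels. The records are made up, and the split values "train" and "test" and the source values are assumptions; only the field names and the 0/1 encodings come from the list above.

```python
from collections import Counter

# Hypothetical records using the field names listed above; values are made up.
records = [
    {"text": "hello", "label": 0, "multi_label": "human", "split": "train",
     "language": "en", "length": 1, "source": "telegram", "potential_noise": 0},
    {"text": "hi there", "label": 1, "multi_label": "vicuna-13b", "split": "train",
     "language": "en", "length": 2, "source": "telegram", "potential_noise": 0},
    {"text": "ahoj", "label": 0, "multi_label": "human", "split": "test",
     "language": "sk", "length": 1, "source": "discord", "potential_noise": 0},
    {"text": "@@@ ###", "label": 1, "multi_label": "vicuna-13b", "split": "train",
     "language": "en", "length": 2, "source": "twitter", "potential_noise": 1},
]


def select(rows, split, drop_noise=True):
    """Keep rows from one split, optionally dropping potentially noisy texts."""
    return [
        r for r in rows
        if r["split"] == split and not (drop_noise and r["potential_noise"] == 1)
    ]


train = select(records, "train")
label_counts = Counter(r["label"] for r in train)
print(len(train), dict(label_counts))
```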

    ToDo Statistics (under construction)

  6. midjourney-texttoimage

    • huggingface.co
    Updated Jan 9, 2023
    + more versions
    Cite
    Nate Raw (2023). midjourney-texttoimage [Dataset]. https://huggingface.co/datasets/nateraw/midjourney-texttoimage
    Dataset updated
    Jan 9, 2023
    Authors
    Nate Raw
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Dataset Card for Midjourney User Prompts & Generated Images (250k)

      Dataset Summary

      General Context

    Midjourney is an independent research lab whose broad mission is to "explore new mediums of thought". In 2022, they launched a text-to-image service that, given a natural language prompt, produces visual depictions that are faithful to the description. Their service is accessible via a public Discord server, where users interact with a Midjourney bot. When issued… See the full description on the dataset page: https://huggingface.co/datasets/nateraw/midjourney-texttoimage.

  7. BAH Dataset

    • paperswithcode.com
    Updated May 24, 2025
    Cite
    Manuela González-González; Soufiane Belharbi; Muhammad Osama Zeeshan; Masoumeh Sharafi; Muhammad Haseeb Aslam; Marco Pedersoli; Alessandro Lameiras Koerich; Simon L Bacon; Eric Granger (2025). BAH Dataset [Dataset]. https://paperswithcode.com/dataset/bah
    Dataset updated
    May 24, 2025
    Authors
    Manuela González-González; Soufiane Belharbi; Muhammad Osama Zeeshan; Masoumeh Sharafi; Muhammad Haseeb Aslam; Marco Pedersoli; Alessandro Lameiras Koerich; Simon L Bacon; Eric Granger
    Description

    Recognizing complex emotions linked to ambivalence and hesitancy (A/H) can play a critical role in the personalization and effectiveness of digital behaviour change interventions. These subtle and conflicting emotions are manifested by a discord between multiple modalities, such as facial and vocal expressions, and body language. Although experts can be trained to identify A/H, integrating them into digital interventions is costly and less effective. Automatic learning systems provide a cost-effective alternative that can adapt to individual users and operate seamlessly in real-time and resource-limited environments. However, there are currently no datasets available for the design of ML models to recognize A/H.

    This paper introduces a first Behavioural Ambivalence/Hesitancy (BAH) dataset collected for subject-based multimodal recognition of A/H in videos. It contains videos from 224 participants of different ages and ethnicities, captured across 9 provinces in Canada. Through our web platform, we recruited participants to answer 7 questions, some of which were designed to elicit A/H, while recording themselves via webcam with microphone. BAH amounts to 1,118 videos for a total duration of 8.26 hours, with 1.5 hours of A/H. Our behavioural team annotated timestamp segments to indicate where A/H occurs, and provided frame- and video-level annotations with the A/H cues. Video transcripts and their timestamps are also included, along with cropped and aligned faces in each frame, and a variety of participant metadata.

    Additionally, this paper provides preliminary benchmarking results for baseline models on BAH at frame- and video-level recognition with mono- and multi-modal setups. It also includes results on models for zero-shot prediction, and for personalization using unsupervised domain adaptation. The limited performance of baseline models highlights the challenges of recognizing A/H in real-world videos. The data, code, and pretrained weights are available.

  8. midjourney-prompts

    • huggingface.co
    Updated Aug 7, 2022
    Cite
    Succinctly AI (2022). midjourney-prompts [Dataset]. https://huggingface.co/datasets/succinctly/midjourney-prompts
    Dataset updated
    Aug 7, 2022
    Dataset authored and provided by
    Succinctly AI
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Midjourney is an independent research lab whose broad mission is to "explore new mediums of thought". In 2022, they launched a text-to-image service that, given a natural language prompt, produces visual depictions that are faithful to the description. Their service is accessible via a public Discord server: users issue a query in natural language, and the Midjourney bot returns AI-generated images that follow the given description. The raw dataset (with Discord messages) can be found on… See the full description on the dataset page: https://huggingface.co/datasets/succinctly/midjourney-prompts.

  9. SlimPajama-627B

    • huggingface.co
    • opendatalab.com
    Updated Oct 2, 2012
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Cerebras (2012). SlimPajama-627B [Dataset]. https://huggingface.co/datasets/cerebras/SlimPajama-627B
    Dataset updated
    Oct 2, 2012
    Dataset authored and provided by
    Cerebras
    Description

    The dataset consists of 59166 jsonl files and is ~895GB compressed. It is a cleaned and deduplicated version of Together's RedPajama. Check out our blog post explaining our methods, our code on GitHub, and join the discussion on the Cerebras Discord.

      Getting Started
    

    You can download the dataset using Hugging Face datasets:

        from datasets import load_dataset

        ds = load_dataset("cerebras/SlimPajama-627B")

      Background
    

    Today we are releasing SlimPajama – the largest… See the full description on the dataset page: https://huggingface.co/datasets/cerebras/SlimPajama-627B.

  10. pii-masking-200k

    • huggingface.co
    Updated Apr 22, 2024
    Cite
    Ai4Privacy (2024). pii-masking-200k [Dataset]. http://doi.org/10.57967/hf/1532
    Dataset updated
    Apr 22, 2024
    Dataset authored and provided by
    Ai4Privacy
    Description

    Ai4Privacy Community

    Join our community at https://discord.gg/FmzWshaaQT to help build open datasets for privacy masking.

      Purpose and Features
    

    Previously the world's largest open dataset for privacy; that is now pii-masking-300k. The purpose of the dataset is to train models to remove personally identifiable information (PII) from text, especially in the context of AI assistants and LLMs. The example texts have 54 PII classes (types of sensitive data), targeting 229 discussion… See the full description on the dataset page: https://huggingface.co/datasets/ai4privacy/pii-masking-200k.

  11. unsupervised_peoples_speech

    • huggingface.co
    Updated Feb 20, 2025
    Cite
    MLCommons (2025). unsupervised_peoples_speech [Dataset]. https://huggingface.co/datasets/MLCommons/unsupervised_peoples_speech
    Dataset updated
    Feb 20, 2025
    Dataset authored and provided by
    MLCommons
    Description

    Dataset Card for Unsupervised Peoples Speech

      Dataset Description

      Dataset Summary

    The Unsupervised Peoples Speech Dataset is a compilation of audio files extracted from Archive.org that are licensed for academic and commercial usage under CC-BY and CC-BY-SA licenses. It includes more than one million hours of audio with a diverse set of speakers.

    Point of Contact: MLCommons Datasets Discord

      Dataset Structure
    

    This dataset is a collection of audio… See the full description on the dataset page: https://huggingface.co/datasets/MLCommons/unsupervised_peoples_speech.
