CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset captures suggestions for improving a game, sourced from a Discord server where members can submit their ideas. Users have the ability to upvote these suggestions, and any suggestion accumulating 35 or more upvotes is forwarded to the game's developers. The primary purpose of compiling this dataset was for natural language processing (NLP) practice, but it also offers opportunities for applying statistical analysis to understand factors that contribute to a suggestion being sent to the developers. The dataset provides valuable insights into community feedback and engagement.
The dataset is typically provided in a tabular format, such as a CSV file. It contains a total of 158 individual records or rows, each representing a unique game improvement suggestion. The data includes suggestion dates ranging from 1st April 2022 to 26th April 2022. The character count for suggestions varies widely, from 1 to 1831 characters. Categories include 'Feature' (53%), 'Item' (32%), and 'Other' (16%). A small percentage of suggestions (16%) were reported to the developers.
This dataset is ideally suited for various analytical tasks. It can be used for natural language processing (NLP) exercises, such as sentiment analysis of suggestions, topic modelling, or text summarisation. Additionally, it is suitable for statistical analysis to identify correlations between suggestion characteristics (e.g., length, category, keywords) and their likelihood of receiving upvotes or being reported to the developers. Game developers, community managers, and data analysts can utilise this data to gain actionable insights into player feedback and prioritise development efforts.
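A minimal sketch of such a statistical analysis using pandas. The column names below are hypothetical, since the actual CSV schema is not documented here; with the real file you would use pd.read_csv and the dataset's own columns:

```python
import pandas as pd

# Hypothetical records standing in for the real CSV (columns are assumptions).
df = pd.DataFrame({
    "suggestion": ["add a map", "new sword item pls", "fix lag"],
    "category": ["Feature", "Item", "Other"],
    "reported": [1, 0, 0],  # 1 = forwarded to the developers (35+ upvotes)
})

# Character count of each suggestion
df["length"] = df["suggestion"].str.len()

# Share of suggestions reported to the developers, per category
report_rate = df.groupby("category")["reported"].mean()
print(report_rate)

# Simple check: do longer suggestions get reported more often?
print(df["length"].corr(df["reported"]))
```

The same groupby/correlation pattern applies directly once the real 158-row CSV is loaded in place of the toy frame.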
The dataset's geographic coverage is global, as the Discord server from which suggestions were drawn is accessible worldwide. The time range for the suggestions captured spans from 1st April 2022 to 26th April 2022. The demographic scope includes any member of the specific Discord server who submitted a suggestion. There are no specific notes on data availability limitations for particular groups or years within the provided information.
CC0
Original Data Source: Grounded Suggestions via Discord Server
Results of a survey of 403 Discord users. Servers and respondents were selected at random; many people declined to take part, but some agreed. Only Russian-speaking users were surveyed. When creating the survey, I notified users that after completion I would analyze the data and publish the results publicly. No personal user data was collected.
In general, you can see that I like Discord, as well as a certain psychological focus in the questions. I have no experience doing something like this, but I tried to do everything as correctly as possible.
This version is translated into English. The data has also been cleaned, with unneeded content removed or changed.
The original extracted versions (in .srt and .ass format) are also included in this release (which Kaggle, for some reason, decompressed >:U).
This dataset contains 1,497,770 messages across 3,836 episodes of anime. The raw dataset contains 1,563,442 messages, some of which were removed during cleaning.
This version (V4) adapts the original (frankly, terrible) format into the newer format I developed, which is used in https://github.com/JEF1056/clean-discord. The Dataset folder contains compressed text files, which are compatible with TensorFlow datasets and can be streamed as a TextLineDataset in the TSV format.
V4 also fixes many (but not all) issues that the original cleaning script was too simple to realistically take care of. It also uses the clean-discord cleaning algorithms to make sentences read more like natural language than like formatting. The script has also been optimized to run on multi-core systems, allowing it to clean this entire dataset in under 30 seconds on a 4-core machine. See the new and improved script here: https://github.com/JEF1056/clean-discord/blob/v1.2/misc/anime.py (no longer bundled in the dataset files)
The files are all compressed to save space and are compatible with TensorFlow datasets. You can initialize a dataset function as follows:
import functools
import os

import tensorflow as tf

# nq_tsv_path is assumed to be a dict mapping split names to directories.
def dataset_fn_local(split, shuffle_files=False):
    global nq_tsv_path
    del shuffle_files
    # Load lines from the gzip-compressed TSV files as examples.
    files_to_read = [os.path.join(nq_tsv_path[split], filename)
                     for filename in os.listdir(nq_tsv_path[split])
                     if filename.startswith(split)]
    print(f"Split {split} contains {len(files_to_read)} files. "
          f"First 10: {files_to_read[:10]}")
    # Drop empty lines, shuffle, and parse each line as a tab-separated pair.
    ds = tf.data.TextLineDataset(files_to_read, compression_type="GZIP")
    ds = ds.filter(lambda line: tf.not_equal(tf.strings.length(line), 0))
    ds = ds.shuffle(buffer_size=600000)
    ds = ds.map(functools.partial(tf.io.decode_csv, record_defaults=["", ""],
                                  field_delim="\t", use_quote_delim=False),
                num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds = ds.map(lambda *ex: dict(zip(["question", "answer"], ex)))
    return ds
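For reference, the on-disk format that pipeline expects is one question/answer pair per line, tab-separated and gzip-compressed. A minimal, standard-library-only sketch of writing and reading one such file (file name and contents are illustrative):

```python
import gzip
import os
import tempfile

# Write and read back one "question<TAB>answer" line, gzip-compressed,
# mirroring the TSV layout the TensorFlow pipeline streams.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "train.tsv.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write("What anime is this from?\tIt's from a subtitle corpus.\n")

    with gzip.open(path, "rt", encoding="utf-8") as f:
        question, answer = f.readline().rstrip("\n").split("\t")

print(question)
print(answer)
```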
A sincere thanks to all of my friends for helping me come up with anime titles, a shoutout to the talented and dedicated people translating Japanese anime, and an even bigger thanks to Leen Chan for compiling the actual subtitles.
This dataset is far from complete! I hope there are people out there willing to find, add, and clean data, and to help grow this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Dataset is described in Charting the Landscape of Online Cryptocurrency Manipulation. IEEE Access (2020), a study that aims to map and assess the extent of cryptocurrency manipulations within and across the online ecosystems of Twitter, Telegram, and Discord. Starting from tweets mentioning cryptocurrencies, we leveraged and followed invite URLs from platform to platform, building the invite-link network, in order to study the invite link diffusion process.
Please refer to the paper below for more details.
Nizzoli, L., Tardelli, S., Avvenuti, M., Cresci, S., Tesconi, M. & Ferrara, E. (2020). Charting the Landscape of Online Cryptocurrency Manipulation. IEEE Access (2020).
This dataset is composed of:
~16M tweet ids shared between March and May 2019, mentioning at least one of the 3,822 cryptocurrencies (cashtags) provided by the CryptoCompare public API;
~13k nodes of the invite-link network, i.e., the information about the Telegram/Discord channels and Twitter users involved in the cryptocurrency discussion (e.g., id, name, audience, invite URL);
~62k edges of the invite-link network, i.e., the information about the flow of invites (e.g., source id, target id, weight).
With such information, one can easily retrieve the content of channels and messages through Twitter, Telegram, and Discord public APIs.
Please refer to the README file for more details about the fields.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MultiSocial is a dataset (described in a paper) for benchmarking multilingual (22 languages) machine-generated text detection in the social-media domain (5 platforms). It contains 472,097 texts, of which about 58k are human-written; approximately the same amount is generated by each of 7 multilingual large language models using 3 iterations of paraphrasing. The dataset has been anonymized to minimize the amount of sensitive data by hiding email addresses, usernames, and phone numbers.
If you use this dataset in any publication, project, tool, or in any other form, please cite the paper.
Disclaimer
Due to the data sources (described below), the dataset may contain harmful, disinformative, or offensive content. Based on a multilingual toxicity detector, about 8% of the text samples are probably toxic (from 5% in WhatsApp to 10% in Twitter). Although we used data sources of an older date (with a lower probability of including machine-generated texts), the labeling (of human-written text) might not be 100% accurate. The anonymization procedure might not have successfully hidden all sensitive/personal content; thus, use the data cautiously. The intended use is for non-commercial research purposes only.
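As an illustration of the kind of anonymization described above (the dataset's actual procedure is not published here, so the patterns below are a rough sketch, not the authors' code): replace e-mail addresses, @-style usernames, and phone-like numbers with placeholder tokens.

```python
import re

# Rough, illustrative patterns -- not the dataset's actual anonymizer.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
MENTION = re.compile(r"@\w+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def anonymize(text: str) -> str:
    # Order matters: strip e-mails first so their "@" is not
    # misread as a username mention.
    text = EMAIL.sub("[EMAIL]", text)
    text = MENTION.sub("[USER]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(anonymize("Contact jane.doe@example.com or @jane at +1 555 123 4567"))
```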
Data Source
The human-written part consists of a pseudo-randomly selected subset of social media posts from 6 publicly available datasets:
Telegram data originated in Pushshift Telegram, containing 317M messages (Baumgartner et al., 2020). It contains messages from 27k+ channels. The collection started with a set of right-wing extremist and cryptocurrency channels (about 300 in total) and was expanded based on occurrence of forwarded messages from other channels. In the end, it thus contains a wide variety of topics and societal movements reflecting the data collection time.
Twitter data originated in CLEF2022-CheckThat! Task 1, containing 34k tweets on COVID-19 and politics (Nakov et al., 2022), combined with Sentiment140, containing 1.6M tweets on various topics (Go et al., 2009).
Gab data originated in the dataset containing 22M posts from Gab social network. The authors of the dataset (Zannettou et al., 2018) found out that “Gab is predominantly used for the dissemination and discussion of news and world events, and that it attracts alt-right users, conspiracy theorists, and other trolls.” They also found out that hate speech is much more prevalent there compared to Twitter, but lower than 4chan's Politically Incorrect board.
Discord data originated in Discord-Data, containing 51M messages. This is a long-context, anonymized, clean, multi-turn and single-turn conversational dataset based on Discord data scraped from a large variety of servers, big and small. According to the dataset authors, it contains around 0.1% of potentially toxic comments (based on the applied heuristic/classifier).
WhatsApp data originated in whatsapp-public-groups, containing 300k messages (Garimella & Tyson, 2018). The public dataset contains the anonymised data, collected for around 5 months from around 178 groups. Original messages were made available to us on request to dataset authors for research purposes.
From these datasets, we pseudo-randomly sampled up to 1,300 texts per platform for each of the selected 22 languages (using a combination of automated approaches to detect the language): up to 300 for the test split and the remaining up to 1,000 for the train split, where available. This process yielded 61,592 human-written texts, which were further filtered based on the occurrence of certain characters and on their length, resulting in about 58k human-written texts.
The machine-generated part contains texts generated by 7 LLMs (Aya-101, Gemini-1.0-pro, GPT-3.5-Turbo-0125, Mistral-7B-Instruct-v0.2, opt-iml-max-30b, v5-Eagle-7B-HF, vicuna-13b). All these models were self-hosted except for GPT and Gemini, where we used the publicly available APIs. We generated the texts using 3 paraphrases of the original human-written data and then preprocessed the generated texts (filtered out cases when the generation obviously failed).
The dataset has the following fields:
'text' - a text sample,
'label' - 0 for human-written text, 1 for machine-generated text,
'multi_label' - a string representing a large language model that generated the text or the string "human" representing a human-written text,
'split' - a string identifying train or test split of the dataset for the purpose of training and evaluation respectively,
'language' - the ISO 639-1 language code identifying the detected language of the given text,
'length' - word count of the given text,
'source' - a string identifying the source dataset / platform of the given text,
'potential_noise' - 0 for text without identified noise, 1 for text with potential noise.
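As a sketch of how these fields might be used, here is a selection of English, human-written, train-split texts without flagged noise. The records are made up for illustration (the hosting location of the dataset is not given here), but they follow the field schema above:

```python
# Hypothetical records following the documented field schema.
records = [
    {"text": "hello there", "label": 0, "multi_label": "human",
     "split": "train", "language": "en", "length": 2,
     "source": "telegram", "potential_noise": 0},
    {"text": "generated reply", "label": 1, "multi_label": "vicuna-13b",
     "split": "train", "language": "en", "length": 2,
     "source": "telegram", "potential_noise": 0},
    {"text": "ahoj", "label": 0, "multi_label": "human",
     "split": "test", "language": "cs", "length": 1,
     "source": "whatsapp", "potential_noise": 0},
]

# English human-written train-split texts without potential noise
subset = [r["text"] for r in records
          if r["label"] == 0 and r["language"] == "en"
          and r["split"] == "train" and r["potential_noise"] == 0]
print(subset)
```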
ToDo Statistics (under construction)
https://choosealicense.com/licenses/cc0-1.0/
Dataset Card for Midjourney User Prompts & Generated Images (250k)
Dataset Summary
General Context
Midjourney is an independent research lab whose broad mission is to "explore new mediums of thought". In 2022, they launched a text-to-image service that, given a natural language prompt, produces visual depictions that are faithful to the description. Their service is accessible via a public Discord server, where users interact with a Midjourney bot. When issued… See the full description on the dataset page: https://huggingface.co/datasets/nateraw/midjourney-texttoimage.
Recognizing complex emotions linked to ambivalence and hesitancy (A/H) can play a critical role in the personalization and effectiveness of digital behaviour change interventions. These subtle and conflicting emotions are manifested by a discordance between multiple modalities, such as facial and vocal expressions and body language. Although experts can be trained to identify A/H, integrating them into digital interventions is costly and less effective. Automatic learning systems provide a cost-effective alternative that can adapt to individual users and operate seamlessly within real-time, resource-limited environments. However, there are currently no datasets available for the design of ML models to recognize A/H.
This paper introduces the first Behavioural Ambivalence/Hesitancy (BAH) dataset, collected for subject-based multimodal recognition of A/H in videos. It contains videos from 224 participants captured across 9 provinces in Canada, spanning a range of ages and ethnicities. Through our web platform, we recruited participants to answer 7 questions, some designed to elicit A/H, while recording themselves via webcam and microphone. BAH amounts to 1,118 videos for a total duration of 8.26 hours, 1.5 hours of which contain A/H. Our behavioural team annotated timestamped segments to indicate where A/H occurs, and provided frame- and video-level annotations with the A/H cues. Video transcripts and their timestamps are also included, along with cropped and aligned faces for each frame and a variety of participant metadata.
Additionally, this paper provides preliminary benchmarking results for baseline models on BAH at frame- and video-level recognition in mono- and multi-modal setups. It also includes results for zero-shot prediction and for personalization using unsupervised domain adaptation. The limited performance of the baseline models highlights the challenges of recognizing A/H in real-world videos. The data, code, and pretrained weights are available.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Midjourney is an independent research lab whose broad mission is to "explore new mediums of thought". In 2022, they launched a text-to-image service that, given a natural language prompt, produces visual depictions that are faithful to the description. Their service is accessible via a public Discord server: users issue a query in natural language, and the Midjourney bot returns AI-generated images that follow the given description. The raw dataset (with Discord messages) can be found on… See the full description on the dataset page: https://huggingface.co/datasets/succinctly/midjourney-prompts.
The dataset consists of 59166 jsonl files and is ~895GB compressed. It is a cleaned and deduplicated version of Together's RedPajama. Check out our blog post explaining our methods, our code on GitHub, and join the discussion on the Cerebras Discord.
Getting Started
You can download the dataset using Hugging Face datasets:
from datasets import load_dataset
ds = load_dataset("cerebras/SlimPajama-627B")
Background
Today we are releasing SlimPajama – the largest… See the full description on the dataset page: https://huggingface.co/datasets/cerebras/SlimPajama-627B.
Ai4Privacy Community
Join our community at https://discord.gg/FmzWshaaQT to help build open datasets for privacy masking.
Purpose and Features
Previously the world's largest open dataset for privacy masking; it has since been succeeded by pii-masking-300k. The purpose of the dataset is to train models to remove personally identifiable information (PII) from text, especially in the context of AI assistants and LLMs. The example texts have 54 PII classes (types of sensitive data), targeting 229 discussion… See the full description on the dataset page: https://huggingface.co/datasets/ai4privacy/pii-masking-200k.
Dataset Card for Unsupervised Peoples Speech
Dataset Description
Dataset Summary
The Unsupervised Peoples Speech Dataset is a compilation of audio files extracted from Archive.org, licensed for academic and commercial usage under CC-BY and CC-BY-SA. It includes more than one million hours of audio with a diverse set of speakers.
Point of Contact: MLCommons Datasets Discord
Dataset Structure
This dataset is a collection of audio… See the full description on the dataset page: https://huggingface.co/datasets/MLCommons/unsupervised_peoples_speech.