Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A data set of livestream chat from Twitch.tv for studying the way in which viewers interact with livestreams. The paper also provides a case study highlighting the unique features of the data set using four word vector models.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Twitch Chat Dataset
This dataset is a large-scale collection of Twitch chat logs aggregated from multiple streamers across various categories. It is designed to support the research and development of models for real-time, informal, and community-driven conversation, such as:
Chatbots tailored for livestream platforms
Simulating the behavior of Twitch chat
Modeling how chat reacts during hype moments, events, or memes
The code for it can be found here
📂 Dataset… See the full description on the dataset page: https://huggingface.co/datasets/lparkourer10/twitch_chat.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A collection of chat logs from 2,162 Twitch streaming videos by 52 streamers. The target videos were streamed between 2018-04-24 and 2018-06-24. The columns are described below:
body: Actual text of the user's chat message
channel_id: Channel identifier (integer)
commenter_id: User identifier (integer)
commenter_type: User type (character)
created_at: Time when the chat message was entered (ISO 8601 date and time)
fragments: Chat text including parsing information for Twitch emotes (JSON list)
offset: Time offset between the start of the video stream and the time the chat message was entered (float)
updated_at: Time when the chat message was edited (ISO 8601 date and time)
video_id: Video identifier (integer)
Each file name indicates the name of the Twitch stream channel. The dataset is saved as a Python 3 pandas.DataFrame in Python pickle format and can be loaded as follows:
import pandas as pd
pd.read_pickle('ninja.pkl')
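The created_at and offset columns described above can be combined to estimate when a video stream started. The following is a minimal sketch using only the Python standard library. It assumes that created_at is stored as an ISO 8601 string and that offset is measured in seconds; neither detail is confirmed by the description, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_iso8601(stamp):
    """Parse an ISO 8601 timestamp string into a datetime.

    A trailing "Z" is rewritten to "+00:00" because older Python
    versions of datetime.fromisoformat do not accept it.
    """
    if stamp.endswith("Z"):
        stamp = stamp[:-1] + "+00:00"
    return datetime.fromisoformat(stamp)


def estimate_stream_start(created_at, offset):
    """Estimate the stream start as created_at minus offset.

    Assumption: offset is in seconds (the description only says "float").
    """
    return parse_iso8601(created_at) - timedelta(seconds=offset)


# Hypothetical row values, not taken from the dataset.
start = estimate_stream_start("2018-04-24T12:00:30Z", 30.0)
print(start.isoformat())
```

If the pickled DataFrame already stores created_at as a timestamp object rather than a string, the parsing step would not be needed.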
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Fal7acy/twitch-chat dataset hosted on Hugging Face and contributed by the HF Datasets community
https://brightdata.com/license
We'll tailor a Twitch dataset to meet your unique needs, encompassing streamer profiles, viewer engagement metrics, streaming times, demographic data of viewers, follower counts, chat statistics, and other pertinent metrics.
Leverage our Twitch datasets for diverse applications to bolster strategic planning and market analysis. Scrutinizing these datasets enables organizations to grasp viewer preferences and streaming trends, facilitating nuanced content development and engagement initiatives. Customize your access to the entire dataset or specific subsets as per your business requisites.
Popular use cases involve optimizing content strategy based on streamer performance and viewer engagement, enhancing marketing strategies through targeted audience segmentation, and identifying and forecasting trends in the streaming community to stay ahead in the digital entertainment landscape.
In the first half of 2024, more than 73 million chat messages on Twitch were proactively removed from chat using Blocked Terms and AutoMod features. The majority of Twitch channels employ auto-moderation features.
As of April 2025, **************************************** was the most followed live streamer on Twitch with **** million followers. ***** saw a meteoric rise to fame when he was one of the first top-ranked players to stream the then newly released Fortnite Battle Royale at the end of 2017 and beginning of 2018. Spanish streamer ************************************* was ranked second with ***** million followers on Twitch.
The top games on Twitch
Despite being released way back in 2013, GTA V is still among the most popular games on Twitch. The action-adventure classic accounted for *** percent of total hours watched on the platform in February 2025. In contrast, the most popular game on Twitch in terms of average number of channels was Fortnite: the battle royale title had ***** channels streaming the game in April 2025. Other top-ranked games include Valorant, Apex Legends, and Call of Duty: Modern Warfare II.
Twitch - how it all began
Founded in 2011, Twitch was initially launched as a spin-off of the general-interest streaming platform Justin.tv. The site primarily focuses on video gaming, including playthroughs of video games, broadcasts of e-sports competitions, and more recently, creative content. Twitch Creative, the outlet for creative content, was launched on October 28, 2015. The launch included a marathon of Bob Ross' "The Joy of Painting". Some of the most popular channels on Twitch are owned by YouTube gaming personalities, who are taking advantage of the website's live-streaming capabilities to interact more with their audiences. Users are able to participate in a live chat during the video broadcast, with broadcasters frequently reacting and replying to comments and questions from the chat in real time. In August 2014, Twitch was acquired by Amazon for *** million U.S. dollars. It was the company's fifth most expensive acquisition of all time.
American online streamer and vtuber Hasan Piker, better known by his Twitch channel name HasanAbi, was reported to have engaged Twitch users for approximately seven million hours in July 2024. The political science graduate created his Twitch channel in 2018, starting to live stream in the category "Just Chatting" and quickly becoming one of the most popular streamers creating political content on the platform.
https://choosealicense.com/licenses/bigscience-openrail-m/
Introduction
These are chat logs and speech-to-text conversations obtained from Twitch channels of VTubers. All usernames have been masked with "[CHAT]" and username mentions with "@username". VOD transcriptions are only added after prior consent of their respective streamer.
In March 2025, video streaming platform Twitch had approximately *** million active streamers, down from a peak of **** million in January 2021. The platform experienced a boom during the COVID-19 pandemic, when many new users used the platform to connect with friends or try their hand at livestreaming. However, this trend normalized again towards the end of the year, and the streaming space has also grown more competitive as platforms apart from Twitch have evolved to attract streamers and viewers.
Popular content categories on Twitch
In 2024, most of the leading content categories on Twitch were gaming-related, except for the top spot: Just Chatting. The general conversation category accumulated *** billion hours of viewing time in the measured period. In March 2025, global Twitch audiences spent around *** million hours watching Just Chatting content on Twitch, with the average viewer count of such content reaching *** thousand. HasanAbi was the most popular Just Chatting streamer on Twitch in the most recently measured month.
Game streamers
Twitch is very popular with gamers and gaming audiences, and the ranking of the most popular Twitch streamers reflects this. Ninja (real name: Richard Tyler Blevins), the top-ranked streamer on Twitch, had **** million followers in April 2025. Ninja saw a meteoric rise to fame when he was one of the first top-ranked players to stream the then-newly released Fortnite Battle Royale at the end of 2017. ibai (real name: Ibai Llanos Garatea) was ranked second with ***** million followers on Twitch. With more than **** million followers, Imane Anys, better known as Pokimane, was the only woman among the most-followed Twitch streamers worldwide. Overall, women only accounted for **** percent of the top-ranked Twitch channels.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are the data and resources used for a Twitch Emote recommendation system using a Word2Vec model. The nature and exploration of the data is described in Emotes-2-Vec: A Large Scale Embedding of Twitch Chat Data. To protect the privacy of the users whose messages were scraped to build this corpus, names and timestamps have been removed and only the message bodies are included. However, a tutorial for this project is included on the project GitHub: https://github.com/KoroshM/Emote-Recommender.
embeddings.tsv and labeled_metadata.tsv may be used in TensorFlow's embedding projector to visualize the embedding space.
Note: Model files are the following:
embeddings.tsv
labeled_metadata.tsv
model
model.model**
model.wv.vectors.npy
** Located here: https://drive.google.com/drive/folders/1RZC4JA4CpAcwoo6dOwq_jobTd6dNi_n2?usp=sharing
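Besides visualizing embeddings.tsv and labeled_metadata.tsv in the embedding projector, the same files could be queried directly for similar emotes. The sketch below is an illustration, not the project's recommender. It assumes that each line of the vectors file holds tab-separated floats and that the metadata file holds exactly one label per line with no header row; the actual layout of labeled_metadata.tsv may differ. All function names and sample values are hypothetical.

```python
import csv
import io
import math


def load_projector_tsv(vectors_text, labels_text):
    """Pair the label on line i with the vector on line i."""
    vectors = [
        [float(x) for x in row]
        for row in csv.reader(io.StringIO(vectors_text), delimiter="\t")
        if row
    ]
    labels = [line for line in labels_text.splitlines() if line]
    if len(vectors) != len(labels):
        raise ValueError("vector and label counts differ")
    return dict(zip(labels, vectors))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(embeddings, query, k=3):
    """Return up to k labels most similar to the query label."""
    target = embeddings[query]
    scored = [
        (cosine(target, vec), label)
        for label, vec in embeddings.items()
        if label != query
    ]
    scored.sort(reverse=True)
    return [label for _, label in scored[:k]]


# Tiny made-up vectors standing in for the real files.
demo = load_projector_tsv("1\t0\n0.9\t0.1\n0\t1\n", "Kappa\nKeepo\nLUL\n")
print(nearest(demo, "Kappa", k=1))
```

With the real files, the two text arguments would come from reading embeddings.tsv and labeled_metadata.tsv from disk.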
Fortnite is an online battle royale video game developed by Epic Games and released in 2017. Fortnite events on the video streaming service Twitch were watched by an average of 42,600 viewers in September 2025. Engagement with the game spiked after the re-release of the original map in November 2023 as part of the OG season, as well as the launch of Lego Fortnite in December 2023.
These datasets, used for node classification and transfer learning, are Twitch user-user networks of gamers who stream in a certain language. Nodes are the users themselves and the links are mutual friendships between them. Vertex features are extracted based on the games played and liked, location, and streaming habits. All datasets share the same set of node features, which makes transfer learning across networks possible. These social networks were collected in May 2018. The supervised task related to these networks is binary node classification: one has to predict whether a streamer uses explicit language.
| | DE | EN | ES | FR | PT | RU |
|---|---|---|---|---|---|---|
| Nodes | 9,498 | 7,126 | 4,648 | 6,549 | 1,912 | 4,385 |
| Edges | 153,138 | 35,324 | 59,382 | 112,666 | 31,299 | 37,304 |
| Density | 0.003 | 0.002 | 0.006 | 0.005 | 0.017 | 0.004 |
| Transitivity | 0.047 | 0.042 | 0.084 | 0.054 | 0.131 | 0.049 |
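As a rough check on the table, the density row can be recomputed from the node and edge counts. This sketch assumes the standard density of an undirected simple graph, 2E / (N(N - 1)); the source does not state which formula was used. Under that assumption the results agree with most of the table to three decimals, and small differences, for example in the EN column, may reflect rounding or a different edge-counting convention.

```python
def density(nodes, edges):
    """Density of an undirected simple graph: 2E / (N * (N - 1))."""
    if nodes < 2:
        return 0.0
    return 2 * edges / (nodes * (nodes - 1))


# Node and edge counts copied from the table above.
TWITCH_NETWORKS = {
    "DE": (9498, 153138),
    "EN": (7126, 35324),
    "ES": (4648, 59382),
    "FR": (6549, 112666),
    "PT": (1912, 31299),
    "RU": (4385, 37304),
}

for language, (nodes, edges) in TWITCH_NETWORKS.items():
    print(language, round(density(nodes, edges), 4))
```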
Paper: Multi-scale Attributed Node Embedding. Benedek Rozemberczki, Carl Allen, and Rik Sarkar. arXiv, 2019. https://arxiv.org/abs/1909.13021
Attribution 2.0 (CC BY 2.0) https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
A subset of the Twitch chat log dump used in this work is provided to help replicate its central findings (https://doi.org/10.5281/zenodo.1182793). Data are aggregated and include the number of messages posted in each channel and the number of users posting them, sampled at intervals of 5 minutes. To protect the identity of the users in this data collection, message contents and user names are not included in this dataset. Stream names have been replaced with numeric IDs.
No additional filtering or data cleaning operation has been applied to this data. Replication code is available on GitHub (https://github.com/glciampaglia/twitch-overload-replication).
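The kind of aggregation described above (message counts and distinct posting users per channel in 5-minute intervals) can be sketched as follows. This is only an illustration of the idea and not the replication code from the repository. The input tuple layout, the use of Unix seconds, and the function name are assumptions.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # 5-minute sampling interval


def aggregate(events, window=WINDOW_SECONDS):
    """Count messages and distinct users per (channel, window start).

    events: iterable of (channel_id, user_id, unix_seconds) tuples.
    Returns {(channel_id, window_start): (n_messages, n_users)}.
    """
    counts = defaultdict(int)
    users = defaultdict(set)
    for channel, user, timestamp in events:
        key = (channel, int(timestamp // window) * window)
        counts[key] += 1
        users[key].add(user)
    return {key: (counts[key], len(users[key])) for key in counts}


# Made-up events: three messages from two users in the first window of channel 1.
sample = [(1, "a", 0), (1, "a", 10), (1, "b", 299), (1, "a", 300), (2, "c", 5)]
print(aggregate(sample))
```

Only the aggregated counts leave the function, which mirrors how the published subset omits message contents and user names.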
In 2023, Twitch users worldwide watched over 70 million hours of live-streaming content on the video game Diablo IV. Launched on June 5, 2023, by Blizzard Entertainment, Diablo IV was the game users on Twitch engaged with the most during its debut week. Live-streaming sessions broadcasting content based on Hogwarts Legacy, which was published on February 10, 2023, reached up to 50 million hours watched among Twitch users. Baldur's Gate 3 debuted on August 3, 2023, and generated over 22 million hours of live-streamed entertainment content based on Twitch users' gaming sessions in its first week.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset, titled the Twitch Plays Pokemon Dataset, contains 37.8 million IRC chat messages. It contains IRC chat log data for messages made between February 2, 2014 and April 23, 2014 (68 days). Each line denotes a single IRC chat message.
Sample of the dataset:
2014-02-14 08:17:32 medicblue a
2014-02-14 08:17:32 murderousburger rare candy, RARE CANDY
2014-02-14 08:17:32 milk2978 B
2014-02-14 08:17:32 mrtiktalik b
2014-02-14 08:17:32 dualhammers b
2014-02-14 08:17:32 shares5 YES
2014-02-14 08:17:32 orangerust start
2014-02-14 08:17:32 snowiee a
2014-02-14 08:17:33 duroate down
2014-02-14 08:17:33 crypticcraig up
2014-02-14 08:17:33 doug2725 LOL HELIX FOSSIL WENT BACK THAT FAR
Abstract
With the increasing importance of online communities, discussion forums, and customer reviews, Internet “trolls” have proliferated thereby making it difficult for information seekers to find relevant and correct information. In this paper, we consider the problem of detecting and identifying Internet trolls, almost all of which are human agents. Identifying a human agent among a human population presents significant challenges compared to detecting automated spam or computerized robots. To learn a troll’s behavior, we use contextual anomaly detection to profile each chat user. Using clustering and distance-based methods, we use contextual data such as the group’s current goal, the current time, and the username to classify each point as an anomaly. A user whose features significantly differ from the norm will be classified as a troll. We collected 38 million data points from the viral Internet fad, Twitch Plays Pokemon. Using clustering and distance-based methods, we develop heuristics for identifying trolls. Using MapReduce techniques for preprocessing and user profiling, we are able to classify trolls based on 10 features extracted from a user’s lifetime history.
You can view the full technical paper here: https://arxiv.org/abs/1902.06208
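To illustrate the distance-based idea mentioned in the abstract (a user whose features differ significantly from the norm is flagged), here is a deliberately simplified sketch. It is not the authors' method: it standardizes each feature with a z-score and flags users whose Euclidean distance from the mean profile exceeds a threshold. The features, the threshold value, and all names are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(profiles):
    """Convert each feature to a z-score across all users.

    profiles: dict mapping user -> list of numeric features (equal length).
    """
    columns = list(zip(*profiles.values()))
    means = [mean(column) for column in columns]
    # A constant feature has zero spread; use 1.0 to avoid dividing by zero.
    stds = [pstdev(column) or 1.0 for column in columns]
    return {
        user: [(x - m) / s for x, m, s in zip(features, means, stds)]
        for user, features in profiles.items()
    }


def flag_outliers(profiles, threshold=2.0):
    """Return users whose standardized profile lies far from the mean."""
    scores = standardize(profiles)
    return sorted(
        user
        for user, vector in scores.items()
        if math.sqrt(sum(v * v for v in vector)) > threshold
    )


# Made-up profiles: [messages per hour, share of off-goal commands].
profiles = {
    "a": [10, 0.1], "b": [12, 0.1], "c": [11, 0.2],
    "d": [9, 0.1], "e": [10, 0.2], "suspect": [500, 0.9],
}
print(flag_outliers(profiles))
```

A real profile would use contextual features such as the group's current goal and the time of each message, as the abstract describes.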
Source Code
Code related to this dataset can be found at: https://github.com/ahaque/twitch-troll-detection
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The proportions for political chat posts and for all chat posts are calculated by dividing the number of topics by the number of political topics (45) and by the total number of topics (150), respectively.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Cooking Master Boy Chat Records
Chat record dataset from the Twitch channel "muse_tw" during the "Cooking Master Boy" (中華一番) marathon event.
Introduction
This is a chat dataset collected from the Twitch channel "muse_tw" while the channel was hosting a marathon anime event featuring "Cooking Master Boy" (中華一番). "Cooking Master Boy" is a Japanese manga series written and illustrated by Etsushi Ogawa. It has had a big impact on meme culture and has a cult following… See the full description on the dataset page: https://huggingface.co/datasets/h-alice/chat-cooking-master-boy-XL.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of keywords used to compute the relative frequency of discourse markers.
Fortnite is an online battle royale video game developed by Epic Games and released in 2017. Fortnite gaming streams and events on the video streaming service Twitch were watched for a combined 30.7 million hours in June 2025. In December 2023, the game was viewed for a total of 107 million hours. This increase was due to the re-release of the original map as part of the new OG season, which started in November 2023, as well as the launch of the Lego Fortnite environment.