CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Collection of chat logs from 2,162 Twitch streaming videos by 52 streamers. The target videos were streamed between 2018-04-24 and 2018-06-24. The columns are:

- body: Actual chat text entered by the user
- channel_id: Channel identifier (integer)
- commenter_id: User identifier (integer)
- commenter_type: User type (character)
- created_at: Time when the chat message was entered (ISO 8601 date and time)
- fragments: Chat text including parsing information for Twitch emotes (JSON list)
- offset: Time offset between the start of the video stream and the time the chat message was entered (float)
- updated_at: Time when the chat message was edited (ISO 8601 date and time)
- video_id: Video identifier (integer)

Each file name indicates the name of the Twitch stream channel. The dataset is saved as a Python 3 pandas.DataFrame in Python pickle format and can be loaded as follows:

```python
import pandas as pd

# Load the chat log for one channel; the file name is the channel name.
df = pd.read_pickle('ninja.pkl')
```
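A quick way to see how chat activity evolves over a stream is to bin messages by `offset`. The following is a minimal sketch, assuming `offset` is measured in seconds (the unit is not stated above); the toy rows are hypothetical and only reuse the documented column names. On real data, `df` would come from `pd.read_pickle(...)` instead.

```python
import pandas as pd

# Hypothetical toy rows using the documented column names; with real data,
# use e.g. df = pd.read_pickle('ninja.pkl') instead.
df = pd.DataFrame({
    "video_id": [1, 1, 1, 2],
    "commenter_id": [10, 11, 10, 12],
    "body": ["hi", "PogChamp", "lol", "gg"],
    "offset": [5.0, 42.5, 75.0, 10.0],  # assumed to be seconds
})

def messages_per_minute(frame: pd.DataFrame) -> pd.DataFrame:
    """Count chat messages per video in 60-second bins of `offset`."""
    minute = (frame["offset"] // 60).astype(int)
    return (
        frame.assign(minute=minute)
        .groupby(["video_id", "minute"])
        .size()
        .reset_index(name="messages")
    )

activity = messages_per_minute(df)
print(activity)
```

Peaks in the `messages` column are a simple starting point for locating moments where chat reacts strongly.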
In the first half of 2024, more than 73 million chat messages on Twitch were proactively removed from chat using Blocked Terms and AutoMod features. The majority of Twitch channels employ auto-moderation features.
As of February 2025, users between the ages of ************** accounted for the biggest share of Twitch.tv users worldwide. During the measured period, users in this age group amounted to over ** percent of Twitch online audiences. Users aged between *************** were the second largest demographic accessing Twitch.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Twitch Chat Dataset
This dataset is a large-scale collection of Twitch chat logs aggregated from multiple streamers across various categories. It is designed to support the research and development of models for real-time, informal, and community-driven conversation, such as:
- Chatbots tailored for livestream platforms
- Simulating the behavior of Twitch chat
- Modeling how chat reacts during hype moments, events, or memes
The code for it can be found here
📂 Dataset… See the full description on the dataset page: https://huggingface.co/datasets/lparkourer10/twitch_chat.
Attribution 2.0 (CC BY 2.0): https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
A subset of the chat logs dump from Twitch used in this work is provided, to help replicate the central findings of this work (https://doi.org/10.5281/zenodo.1182793). Data are aggregated and include the number of messages posted in each channel and the number of users posting them, sampled at intervals of 5 minutes. To protect the identity of the users in this data collection, message contents and user names are not included in this dataset. Stream names have been replaced with numeric IDs. No additional filtering or data cleaning operation has been applied to this data. Replication code is available on Github (https://github.com/glciampaglia/twitch-overload-replication).
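The aggregated counts can be turned into a per-user posting rate for each window. This is a minimal sketch, assuming each record carries a stream ID, a window start, a message count, and a user count; the `Bin` type and its field names are hypothetical and not the dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass
class Bin:
    channel: int   # numeric stream ID, as in the dataset
    minute: int    # start of the 5-minute window (hypothetical field)
    messages: int  # messages posted in the window
    users: int     # distinct users who posted them

def messages_per_user_minute(b: Bin, width_minutes: int = 5) -> float:
    """Average number of messages each active user posted per minute."""
    if b.users == 0:
        return 0.0
    return b.messages / b.users / width_minutes

print(messages_per_user_minute(Bin(channel=1, minute=0, messages=300, users=60)))
```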
These datasets, used for node classification and transfer learning, are Twitch user-user networks of gamers who stream in a certain language. Nodes are the users themselves, and links are mutual friendships between them. Vertex features are extracted based on the games played and liked, location, and streaming habits. The datasets share the same set of node features, which makes transfer learning across networks possible. These social networks were collected in May 2018. The supervised task related to these networks is binary node classification: one has to predict whether a streamer uses explicit language.
Statistic | DE | EN | ES | FR | PT | RU |
---|---|---|---|---|---|---|
Nodes | 9,498 | 7,126 | 4,648 | 6,549 | 1,912 | 4,385 |
Edges | 153,138 | 35,324 | 59,382 | 112,666 | 31,299 | 37,304 |
Density | 0.003 | 0.002 | 0.006 | 0.005 | 0.017 | 0.004 |
Transitivity | 0.047 | 0.042 | 0.084 | 0.054 | 0.131 | 0.049 |
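The Density row can be cross-checked against the node and edge counts, assuming it uses the standard definition for a simple undirected graph, 2|E| / (|V|(|V| - 1)). A minimal sketch using the DE and PT columns of the table above:

```python
def undirected_density(num_nodes: int, num_edges: int) -> float:
    """Density of a simple undirected graph: 2|E| / (|V| * (|V| - 1))."""
    if num_nodes < 2:
        return 0.0
    return 2 * num_edges / (num_nodes * (num_nodes - 1))

# Node and edge counts copied from the table above.
de = undirected_density(9_498, 153_138)
pt = undirected_density(1_912, 31_299)
print(round(de, 3), round(pt, 3))  # 0.003 0.017
```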
Paper: Multi-scale Attributed Node Embedding. Benedek Rozemberczki, Carl Allen, and Rik Sarkar. arXiv, 2019. https://arxiv.org/abs/1909.13021
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset, titled the Twitch Plays Pokemon Dataset, contains 37.8 million IRC chat messages posted between February 2, 2014 and April 23, 2014 (68 days). Each line denotes a single IRC chat message.
Sample of the dataset:
Date | Time | User | Message |
---|---|---|---|
2014-02-14 | 08:17:32 | medicblue | a |
2014-02-14 | 08:17:32 | murderousburger | rare candy, RARE CANDY |
2014-02-14 | 08:17:32 | milk2978 | B |
2014-02-14 | 08:17:32 | mrtiktalik | b |
2014-02-14 | 08:17:32 | dualhammers | b |
2014-02-14 | 08:17:32 | shares5 | YES |
2014-02-14 | 08:17:32 | orangerust | start |
2014-02-14 | 08:17:32 | snowiee | a |
2014-02-14 | 08:17:33 | duroate | down |
2014-02-14 | 08:17:33 | crypticcraig | up |
2014-02-14 | 08:17:33 | doug2725 | LOL HELIX FOSSIL WENT BACK THAT FAR |
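Per-user statistics, such as how often a user sends a valid game command, start from parsing each log line. This is a minimal sketch, assuming each line holds a date, a time, a username, and the message separated by whitespace (the raw file's exact delimiter is an assumption), and assuming the classic button inputs listed in the hypothetical `COMMANDS` set. The example lines are records like those in the sample above, with field boundaries assumed.

```python
from collections import Counter
from datetime import datetime

# Hypothetical set of valid button inputs (assumption, not from the dataset docs).
COMMANDS = {"a", "b", "up", "down", "left", "right", "start", "select"}

def parse_line(line: str):
    """Split 'YYYY-MM-DD HH:MM:SS user message' into (timestamp, user, message)."""
    date, time, user, message = line.split(maxsplit=3)
    ts = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return ts, user, message

def command_share(lines):
    """Per user: fraction of messages that are exactly one valid command."""
    totals, commands = Counter(), Counter()
    for line in lines:
        _, user, message = parse_line(line)
        totals[user] += 1
        if message.strip().lower() in COMMANDS:
            commands[user] += 1
    return {user: commands[user] / totals[user] for user in totals}

sample = [
    "2014-02-14 08:17:32 medicblue a",
    "2014-02-14 08:17:32 murderousburger rare candy, RARE CANDY",
    "2014-02-14 08:17:33 crypticcraig up",
    "2014-02-14 08:17:33 doug2725 LOL HELIX FOSSIL WENT BACK THAT FAR",
]
shares = command_share(sample)
print(shares)
```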
Abstract
With the increasing importance of online communities, discussion forums, and customer reviews, Internet “trolls” have proliferated, making it difficult for information seekers to find relevant and correct information. In this paper, we consider the problem of detecting and identifying Internet trolls, almost all of which are human agents. Identifying a human agent among a human population presents significant challenges compared to detecting automated spam or computerized robots. To learn a troll’s behavior, we use contextual anomaly detection to profile each chat user. We use contextual data such as the group’s current goal, the current time, and the username to decide whether each data point is an anomaly. A user whose features differ significantly from the norm is classified as a troll. We collected 38 million data points from the viral Internet fad Twitch Plays Pokemon and, using clustering and distance-based methods, developed heuristics for identifying trolls. Using MapReduce techniques for preprocessing and user profiling, we are able to classify trolls based on 10 features extracted from each user’s lifetime history.
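The abstract does not spell out the 10 features or the exact distance measure, so the following is only a generic distance-based sketch, not the paper's method: each user's feature vector is z-scored per feature, and users whose Euclidean distance from the population mean exceeds a threshold are flagged. The features, user names, and threshold are all hypothetical.

```python
import math
from statistics import mean, pstdev

def anomaly_scores(features: dict[str, list[float]]) -> dict[str, float]:
    """Distance of each user's z-scored feature vector from the population mean."""
    users = list(features)
    dims = len(features[users[0]])
    columns = [[features[u][d] for u in users] for d in range(dims)]
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return {
        u: math.sqrt(sum(((features[u][d] - means[d]) / stds[d]) ** 2 for d in range(dims)))
        for u in users
    }

def flag_outliers(features: dict[str, list[float]], threshold: float) -> list[str]:
    """Users whose anomaly score exceeds the (hypothetical) threshold."""
    scores = anomaly_scores(features)
    return sorted(u for u, s in scores.items() if s > threshold)

# Hypothetical profiles: [share of messages that are valid commands, messages per minute]
profiles = {
    "user_a": [0.90, 2.0],
    "user_b": [0.85, 2.2],
    "user_c": [0.95, 1.8],
    "user_d": [0.90, 2.1],
    "user_e": [0.10, 9.0],
}
print(flag_outliers(profiles, threshold=2.0))  # ['user_e']
```

Z-scoring keeps a feature with a large numeric range (such as message rate) from dominating the distance.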
You can view the full technical paper here: https://arxiv.org/abs/1902.06208
Source Code
Code related to this dataset can be found at: https://github.com/ahaque/twitch-troll-detection
Fortnite is an online battle royale video game developed by Epic Games and released in 2017. Fortnite events on the video streaming service Twitch were watched by an average of 68,000 viewers in April 2025. Engagement with the game spiked after the re-release of the original map in November 2023 as part of the OG season, as well as the launch of Lego Fortnite in December 2023.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Cooking Master Boy Chat Records
Chat record dataset from Twitch channel "muse_tw" during the "Cooking Master Boy" (中華一番) marathon event.
Introduction
This is a chat dataset collected from Twitch channel "muse_tw" while the channel was hosting a marathon anime event featuring "Cooking Master Boy" (中華一番). The featured anime "Cooking Master Boy" is based on a Japanese manga series written and illustrated by Etsushi Ogawa. It has had a big impact on meme culture and has a cult following… See the full description on the dataset page: https://huggingface.co/datasets/h-alice/chat-cooking-master-boy-XL.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of keywords used to compute the relative frequency of discourse markers.
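The entry does not define "relative frequency". One plausible reading is occurrences of a marker divided by the total number of tokens in the chat. A minimal sketch under that assumption; the keyword list and messages are hypothetical, and only single-word markers are handled (multi-word markers such as "you know" would need n-gram matching).

```python
import re
from collections import Counter

def marker_relative_frequency(messages: list[str], markers: set[str]) -> dict[str, float]:
    """Occurrences of each single-word marker divided by the total token count."""
    tokens = [tok for msg in messages for tok in re.findall(r"[\w']+", msg.lower())]
    counts = Counter(tok for tok in tokens if tok in markers)
    total = len(tokens)
    return {m: counts[m] / total if total else 0.0 for m in sorted(markers)}

# Hypothetical keyword list and chat messages (10 tokens in total).
freqs = marker_relative_frequency(
    ["well that was close", "so basically he won", "lol well"],
    {"well", "so", "basically"},
)
print(freqs)  # {'basically': 0.1, 'so': 0.1, 'well': 0.2}
```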
Fortnite is an online battle royale video game developed by Epic Games and released in 2017. Fortnite gaming streams and events on the video streaming service Twitch were watched for a combined 49.2 million hours in April 2025. In December 2023, the game was viewed for a total of 107 million hours; this spike was due to the re-release of the original map as part of the new OG season, which started in November 2023, as well as the launch of the Lego Fortnite environment.
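Average concurrent viewers and total hours watched are two views of the same quantity if the average is taken over every hour of the period. A minimal sketch under that assumption (the platforms' exact averaging method is not stated), checked against the April 2025 Fortnite figures given above (68,000 average viewers, 49.2 million hours):

```python
HOURS_PER_DAY = 24

def hours_watched(avg_viewers: float, days: int) -> float:
    """Total hours watched, assuming avg_viewers is averaged over every hour."""
    return avg_viewers * days * HOURS_PER_DAY

def implied_avg_viewers(total_hours: float, days: int) -> float:
    """Average concurrent viewers implied by a total watch time."""
    return total_hours / (days * HOURS_PER_DAY)

# April has 30 days.
print(round(implied_avg_viewers(49.2e6, 30)))  # 68333
print(hours_watched(68_000, 30) / 1e6)         # 48.96
```

The two reported figures agree to within rounding. The same conversion applied to the January 2025 GTA V figures (138 thousand average viewers over 31 days) gives about 102.7 million hours, close to the roughly 101 million hours reported.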
In the first half of 2024, 81.95 percent of live content viewed on video streaming platform Twitch occurred on channels that had Twitch's AutoMod feature actively monitoring chat for harmful messages. During the same period of time, nearly 78.2 percent of content was watched in channels with at least one active moderator.
https://www.futuremarketinsights.com/privacy-policy
[425 Pages Report] The global stream hosting market is expected to reach a valuation of nearly US$ 4 Bn in 2022. Sales are expected to increase at an impressive 14.8% CAGR, with the total stream hosting market size reaching US$ 16 Bn by 2032.
Attribute | Details |
---|---|
Stream Hosting Market Estimated Market Value (2022) | US$ 4 Billion |
Projected Market Size (2032) | US$ 16 Billion |
Value-based CAGR% (2022-2032) | 14.8% |
Top 5 Vendor Market Share | Around 32% |
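The growth figures above follow the usual compound annual growth rate relation, end = start × (1 + CAGR)^years. A minimal sketch checking the report's rounded numbers (US$ 4 Bn in 2022, US$ 16 Bn in 2032, 14.8% CAGR):

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1

def project(start: float, rate: float, years: int) -> float:
    """Value after compounding `start` at `rate` for `years`."""
    return start * (1 + rate) ** years

cagr = implied_cagr(4.0, 16.0, 10)
print(f"{cagr:.1%}")                       # 14.9%
print(round(project(4.0, 0.148, 10), 1))   # 15.9
```

The small gap between 14.9% and the stated 14.8% is consistent with the start and end values being rounded ("nearly US$ 4 Bn").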
Scope Of Report
Attribute | Details |
---|---|
Forecast Period | 2022 to 2032 |
Historical Data Available for | 2017 to 2021 |
Market Analysis | US$ Billion for Value |
Key Regions Covered | North America, Latin America, Europe, East Asia, South Asia & Pacific, and the Middle East & Africa |
Key Countries Covered | The USA, Canada, Brazil, Mexico, Germany, the U.K., France, Italy, Spain, Russia, China, Japan, South Korea, India, Malaysia, Indonesia, Singapore, Australia & New Zealand, GCC Countries, Turkey, and South Africa |
Key Segments Covered | Content Type, Streaming Type, Platform, Industry and Region |
Key Companies Profiled | |
Report Coverage | Market Forecast, Company Share Analysis, Competition Intelligence, DROT Analysis, Market Dynamics and Challenges, and Strategic Growth Initiatives |
The statistic presents the leading gaming content on Twitch from January 1, 2019 to June 30, 2019, ranked by number of hours watched. League of Legends topped the charts with a combined 512.3 million hours watched on Twitch in the first half of 2019.
As of the beginning of 2025, Kick and Rumble were the platforms offering creators the highest revenue share. Kick shared 95 percent of streaming revenues with its creators, while Rumble passed on all of the revenue generated via subscription badges to creators. OnlyFans, which allows users to stream live content from creators as well as pre-recorded videos, shared 80 percent of fans' payments with creators. Twitch's new Partner Plus Program, launched in June 2023, allows streamers with at least 350 recurring paid subscribers for three consecutive months to earn 70 percent of their net subscription revenue. Twitch's new partner program came right before influential live streamers xQc and Amouranth left the platform for the competing live-streaming service Kick.
Streamers’ revenues
According to Kick’s estimates, streamers with 5,000 subscribers could expect to earn around 23,750 U.S. dollars, thanks to the platform’s low commissions. Launched between the end of 2022 and March 2023 under the advisory of live streamer Trainwreck, Kick allows content creators to earn 95 percent of their revenues on the platform, with the service taking only five percent of streamers' subscriptions. This is one of the lowest commission fees among online video platforms of this type: Twitch streamers who are not eligible for the new partnership share 50 percent of their revenues with the Amazon-owned host, and YouTube content creators share 30 percent of the Super Chat and Super Stickers revenue received from viewers during a live streaming session with the Google-owned service. In 2022, only 22 percent of U.S. content creators who made money from creating internet videos reported making more than 1,000 U.S. dollars. For live-streamed content, only seven percent of U.S. live streamers made 1,000 U.S. dollars or more.
Kick’s rise
Kick is a live streaming platform launched between the end of 2022 and the beginning of 2023, positioning itself as a competitor to the giants YouTube and Amazon-backed Twitch. Between January 2023 and April 2023, the platform grew from nine thousand channels to 67 thousand, with popular live streamers Félix Lengyel (better known as xQc) and Amouranth joining its ranks in June of the same year. Adin Ross, who was known for streaming Grand Theft Auto V on Twitch, was the most popular streamer on Kick during the first quarter of 2023, gathering approximately 125 thousand peak viewers during his live streams. PaulinhoLOKObr ranked second, with around 62 thousand peak concurrent viewers for his streams.
In the first quarter of 2021, the highest earning Twitch streamer based on income from subscriptions worldwide was Kai Cenat. The American Twitch streamer was estimated to generate 927 thousand U.S. dollars in earnings from subscriptions during the measured period.
As of April 2023, online live-streaming platform Kick reported that creators with 100 subscribers would earn 475 U.S. dollars. Kick was launched in December 2022 as a direct competitor to Twitch. Kick reportedly charges five percent of streamers' revenues, compared with Twitch, which is reported to retain between 30 and 50 percent of creators' earnings. Kick, which signed up streamers Amouranth and xQc in June 2023, reported that creators with 5,000 subscribers would be able to earn almost 24,000 U.S. dollars on the platform.
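Kick's two figures are consistent with a flat per-subscription price and a 95 percent creator share. A minimal sketch: the 5 U.S. dollar price is a hypothetical value inferred from 475 / (100 × 0.95), and the sketch ignores taxes, payment fees, and the "net revenue" basis of Twitch's Partner Plus split.

```python
def creator_earnings(subscribers: int, price: float, creator_share: float) -> float:
    """Subscription revenue paid out to the creator."""
    return subscribers * price * creator_share

# Hypothetical price per subscription, inferred from Kick's own figures:
# 475 USD / (100 subscribers * 0.95) = 5 USD.
SUB_PRICE_USD = 5.00

kick_100 = creator_earnings(100, SUB_PRICE_USD, 0.95)
kick_5000 = creator_earnings(5_000, SUB_PRICE_USD, 0.95)
twitch_5000 = creator_earnings(5_000, SUB_PRICE_USD, 0.50)  # standard 50/50 split
print(kick_100, kick_5000, twitch_5000)  # 475.0 23750.0 12500.0
```

Under this assumption, 5,000 subscribers yield 23,750 U.S. dollars on Kick, which matches both the "around 23,750" and "almost 24,000" figures quoted above.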
As of May 2025, Grand Theft Auto V has sold over 215 million units worldwide. Published and developed by Rockstar Games and first released in 2013, GTA V is one of the best-selling video games of all time. Of the millions of units sold around the world and across all platforms, some 20 million copies were sold in 2020 alone, making it the game's best-selling year since its initial release.
The enduring popularity of GTA 5
Despite being released over a decade ago, GTA 5 is still one of the most popular video games worldwide, as evidenced by the thousands of viewers the title regularly draws on the video streaming platform Twitch. In January 2025, Grand Theft Auto V streams on Twitch were watched by an average of 138 thousand concurrent viewers, and monthly watch time for GTA V streaming content on Twitch amounted to approximately 101 million hours.
GTA VI is looming on the horizon
After over a decade on the market, millions of U.S. dollars in revenues, and regularly topping download rankings and sales charts, it is hard to argue that Grand Theft Auto 5 is not Rockstar’s golden goose. However, even the best-performing cash cow eventually gets put out to pasture. In November 2023, Rockstar's president Sam Houser said the first trailer for GTA 5’s sequel, GTA 6, would come out at the beginning of December to celebrate the company's 25th anniversary. Following years of speculation, social media leaks, and gaming journalism hints, Grand Theft Auto 6 was finally and formally revealed in December 2023. Rockstar’s GTA VI Twitter posts and YouTube trailers immediately broke records in terms of likes and user engagement; on social media platform X, Houser’s pre-announcement of the official announcement alone gained over 1 million likes. Grand Theft Auto Six is scheduled to be released in 2026 for the PlayStation 5 and Xbox Series X/S.
In May 2025, Grand Theft Auto V was the most popular video game on the game streaming platform Twitch. Hours watched for GTA V totaled around 84.7 million.