18 datasets found

Active streamers on Twitch worldwide 2025
statista.com
tokrwards.com
Updated Sep 4, 2025
Cite
Statista (2025). Active streamers on Twitch worldwide 2025 [Dataset]. https://www.statista.com/statistics/746173/monthly-active-streamers-on-twitch/
Explore at:
Dataset updated
Sep 4, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jan 2018 - Mar 2025
Area covered
Worldwide
Description
In March 2025, video streaming platform Twitch had approximately *** million active streamers, down from a peak of **** million in January 2021. The platform experienced a boom during the COVID-19 pandemic, when many new users used the platform to connect with friends or try their hand at livestreaming. However, this trend normalized again towards the end of the year, and the streaming space has also grown more competitive as platforms apart from Twitch have evolved to attract streamers and viewers.

Popular content categories on Twitch

In 2024, most of the leading content categories on Twitch were gaming-related, except for the top spot: Just Chatting. The general conversation category accumulated *** billion hours of viewing time in the measured period. In March 2025, global Twitch audiences spent around *** million hours watching Just Chatting content, with the average viewer count of such content reaching *** thousand. HasanAbi was the most popular Just Chatting streamer on Twitch in the most recently measured month.

Game streamers

Twitch is very popular with gamers and gaming audiences, and the ranking of the most popular Twitch streamers reflects this. Ninja (real name: Richard Tyler Blevins), the top-ranked streamer on Twitch, had **** million followers in April 2025. Ninja saw a meteoric rise to fame when he was one of the first top-ranked players to stream the then-newly released Fortnite Battle Royale at the end of 2017. ibai (real name: Ibai Llanos Garatea) ranked second with ***** million followers on Twitch. With more than **** million followers, Imane Anys, better known as Pokimane, was the only woman among the most-followed Twitch streamers worldwide. Overall, women accounted for only **** percent of the top-ranked Twitch channels.
Twitch Social Networks
marketplace.sshopencloud.eu
Updated Apr 24, 2020
Cite
(2020). Twitch Social Networks [Dataset]. https://marketplace.sshopencloud.eu/dataset/3mIMx7
Explore at:
Dataset updated
Apr 24, 2020
Description
These datasets, used for node classification and transfer learning, are Twitch user-user networks of gamers who stream in a certain language. Nodes are the users themselves, and the links are mutual friendships between them. Vertex features are extracted based on the games played and liked, location, and streaming habits. The datasets share the same set of node features, which makes transfer learning across networks possible. These social networks were collected in May 2018. The supervised task related to these networks is binary node classification: one has to predict whether a streamer uses explicit language.
Twitch Social Networks
kaggle.com
Updated Nov 12, 2019
Cite
Andrea Garritano (2019). Twitch Social Networks [Dataset]. https://www.kaggle.com/andreagarritano/twitch-social-networks/notebooks
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 12, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Andrea Garritano
Description
Twitch Social Networks

Description

These datasets, used for node classification and transfer learning, are Twitch user-user networks of gamers who stream in a certain language. Nodes are the users themselves, and the links are mutual friendships between them. Vertex features are extracted based on the games played and liked, location, and streaming habits. The datasets share the same set of node features, which makes transfer learning across networks possible. These social networks were collected in May 2018. The supervised task related to these networks is binary node classification: one has to predict whether a streamer uses explicit language.

Links

Germany

England

Spain

France

Portugal

Russia

Properties

Property      DE       EN      ES      FR       PT      RU
Nodes         9,498    7,126   4,648   6,549    1,912   4,385
Edges         153,138  35,324  59,382  112,666  31,299  37,304
Density       0.003    0.002   0.006   0.005    0.017   0.004
Transitivity  0.047    0.042   0.084   0.054    0.131   0.049
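
The density row can be roughly cross-checked from the node and edge counts. The sketch below assumes the usual definition for an undirected simple graph, 2E / (N(N-1)); the source does not state which definition it uses, so treat this as an illustration rather than the authors' method.

```python
# Node and edge counts copied from the Properties table above.
NETWORKS = {
    "DE": (9498, 153138),
    "EN": (7126, 35324),
    "ES": (4648, 59382),
    "FR": (6549, 112666),
    "PT": (1912, 31299),
    "RU": (4385, 37304),
}

# Density values as reported in the same table.
REPORTED_DENSITY = {
    "DE": 0.003, "EN": 0.002, "ES": 0.006,
    "FR": 0.005, "PT": 0.017, "RU": 0.004,
}


def density(nodes: int, edges: int) -> float:
    """Density of an undirected simple graph (assumed definition): 2E / (N(N-1))."""
    return 2 * edges / (nodes * (nodes - 1))


for lang, (n, e) in NETWORKS.items():
    print(f"{lang}: computed {density(n, e):.4f}, reported {REPORTED_DENSITY[lang]:.3f}")
```

Under this assumed definition the computed values stay within 0.001 of the table, although not every value rounds to the same third decimal, so the source may use a slightly different convention.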

Possible tasks

Binary node classification

Link prediction

Community detection

Network visualization

Paper: Multi-scale Attributed Node Embedding. Benedek Rozemberczki, Carl Allen, and Rik Sarkar. arXiv, 2019. https://arxiv.org/abs/1909.13021
Twitch Small Panel Results
kaggle.com
Updated Feb 3, 2019
Cite
Keiichi C (2019). Twitch Small Panel Results [Dataset]. https://www.kaggle.com/keiichicomplex/twitch-small-panel-results/code
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 3, 2019
Dataset provided by
Kaggle
Authors
Keiichi C
Description
Twitch.tv boasts over 2 million unique user views per day and more than 100 thousand channels that entertain those users. How should new streamers stand out from more established names and gather a larger audience?

Using viewership data from Twitch.tv, I develop a model to help streamers make informed choices of time, game, and target language audience. I specifically consider the interaction between these choices, answering questions such as "When is the best time to stream League of Legends for a given language?" or "I am a Russian-language streamer; which game attracts the largest audience?"

Additionally, I examine whether streamers should avoid time slots with more existing channels. This involves studying whether streamers have synergy with each other: although they act as competitors by choosing to stream similar content, together they might attract more viewers than when they stream different types of content.

The final project target is an application trained on historical Twitch data and powered by live data from the Twitch API. The application offers the best selection of streaming choices under the current Twitch environment, answering the question "I want to gather the most viewership. What game, in what language, and when should I stream?"

Original data source: https://clivecast.github.io

Content:

twitch_panel_fixedeffect.py: Panel regression model. The data source (250 MB) exceeds the 25 MB limit and is not included. Creates the regression results file 'twitch_small_panel_results.txt'.

twitch_plot.py: Plots graphs using 'twitch_small_panel_results.txt'.

twitch_small_panel_results.txt: Contains the regression results generated by twitch_panel_fixedeffect.py.
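
The file name twitch_panel_fixedeffect.py suggests a fixed-effects panel regression, but the listing does not show the model specification. As a rough illustration only, the sketch below implements the standard within (entity-demeaning) estimator for a single regressor in plain Python. The entity labels, the meaning of x and y, and all numbers are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def within_estimator(rows):
    """Slope of y on x after removing each entity's mean (within transformation).

    rows: iterable of (entity, x, y) tuples. Entity-specific intercepts
    (the fixed effects) drop out when the entity means are subtracted.
    """
    groups = defaultdict(list)
    for entity, x, y in rows:
        groups[entity].append((x, y))

    sxy = 0.0  # sum of demeaned x times demeaned y
    sxx = 0.0  # sum of squared demeaned x
    for obs in groups.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    if sxx == 0:
        raise ValueError("x does not vary within any entity")
    return sxy / sxx


# Hypothetical toy panel: two entities with different intercepts, same slope.
toy_panel = [
    ("channel_a", 1, 12), ("channel_a", 2, 14), ("channel_a", 3, 16),
    ("channel_b", 1, -3), ("channel_b", 2, -1), ("channel_b", 3, 1),
]
print(within_estimator(toy_panel))  # 2.0
```

Demeaning by entity is what lets the slope be estimated without being confounded by level differences between entities.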
Twitch Plays Pokemon Dataset
data.niaid.nih.gov
Updated Jul 8, 2020
Cite
Haque, Albert (2020). Twitch Plays Pokemon Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_3932956
Explore at:
Dataset updated
Jul 8, 2020
Dataset authored and provided by
Haque, Albert
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset, titled the Twitch Plays Pokemon Dataset, contains 37.8 million IRC chat messages. It contains IRC chat log data for messages made between February 2, 2014 and April 23, 2014 (68 days). Each line denotes a single IRC chat message.

Sample of the dataset:

2014-02-14 08:17:32 medicbluea
2014-02-14 08:17:32 murderousburgerrare candy, RARE CANDY
2014-02-14 08:17:32 milk2978B
2014-02-14 08:17:32 mrtiktalikb
2014-02-14 08:17:32 dualhammersb
2014-02-14 08:17:32 shares5YES
2014-02-14 08:17:32 orangeruststart
2014-02-14 08:17:32 snowieea
2014-02-14 08:17:33 duroatedown
2014-02-14 08:17:33 crypticcraigup
2014-02-14 08:17:33 doug2725LOL HELIX FOSSIL WENT BACK THAT FAR
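
In the sample as rendered here, each record begins with a date and a time, and the username and message follow with no separator between them. The sketch below, which works on this rendering and not necessarily on the original file layout, recovers record boundaries by splitting on the timestamp pattern. It deliberately leaves username and message fused, since nothing in the sample marks where one ends and the other begins, and it would mis-split a message that itself contains a timestamp-like string.

```python
import re
from datetime import datetime

DATE = r"\d{4}-\d{2}-\d{2}"
TIME = r"\d{2}:\d{2}:\d{2}"

# One record = date, time, then everything up to the next timestamp or the end.
RECORD = re.compile(
    "(" + DATE + r")\s*(" + TIME + r")(.*?)(?=\s*" + DATE + r"\s*" + TIME + r"|\Z)",
    re.S,
)


def split_records(blob: str):
    """Return a list of (timestamp, remainder) pairs from a run-together log."""
    records = []
    for day, clock, rest in RECORD.findall(blob):
        stamp = datetime.strptime(day + " " + clock, "%Y-%m-%d %H:%M:%S")
        records.append((stamp, rest.strip()))
    return records


# Three records copied from the sample above.
sample = (
    "2014-02-1408:17:32medicbluea "
    "2014-02-1408:17:32murderousburgerrare candy, RARE CANDY "
    "2014-02-1408:17:33duroatedown"
)
for stamp, rest in split_records(sample):
    print(stamp.isoformat(), rest)
```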

Abstract

With the increasing importance of online communities, discussion forums, and customer reviews, Internet “trolls” have proliferated thereby making it difficult for information seekers to find relevant and correct information. In this paper, we consider the problem of detecting and identifying Internet trolls, almost all of which are human agents. Identifying a human agent among a human population presents significant challenges compared to detecting automated spam or computerized robots. To learn a troll’s behavior, we use contextual anomaly detection to profile each chat user. Using clustering and distance-based methods, we use contextual data such as the group’s current goal, the current time, and the username to classify each point as an anomaly. A user whose features significantly differ from the norm will be classified as a troll. We collected 38 million data points from the viral Internet fad, Twitch Plays Pokemon. Using clustering and distance-based methods, we develop heuristics for identifying trolls. Using MapReduce techniques for preprocessing and user profiling, we are able to classify trolls based on 10 features extracted from a user’s lifetime history.

You can view the full technical paper here: https://arxiv.org/abs/1902.06208

Source Code

Code related to this dataset can be found at: https://github.com/ahaque/twitch-troll-detection
Emotes-2-Vec: A Large Scale Embedding of Twitch Chat Data
zenodo.org
data.niaid.nih.gov
bin, tsv, txt
Updated Jun 15, 2023
Cite
Korosh Moosavi; Muhammad Aurangzeb Ahmad; Afra Mashhadi (2023). Emotes-2-Vec: A Large Scale Embedding of Twitch Chat Data [Dataset]. http://doi.org/10.5281/zenodo.8012284
Explore at:
Available download formats: bin, txt, tsv
Unique identifier
https://doi.org/10.5281/zenodo.8012284
Dataset updated
Jun 15, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Korosh Moosavi; Muhammad Aurangzeb Ahmad; Afra Mashhadi
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These are the data and resources used for a Twitch Emote recommendation system using a Word2Vec model. The nature and exploration of the data is described in Emotes-2-Vec: A Large Scale Embedding of Twitch Chat Data. To protect the privacy of the users whose messages were scraped to build this corpus, names and timestamps have been removed and only the message bodies are included. However, a tutorial for this project is included on the project GitHub: https://github.com/KoroshM/Emote-Recommender.

embeddings.tsv and labeled_metadata.tsv may be used in TensorFlow's embedding projector to visualize the embedding space.
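
Outside the projector, the two TSV files could also be inspected directly. The sketch below assumes the layout the Embedding Projector normally expects: one tab-separated vector per line in embeddings.tsv, and one label per line in labeled_metadata.tsv, taken from the first column, with a header row only when the metadata has an extra line. Whether the published files follow exactly this layout is an assumption; the function names are illustrative.

```python
import csv
import math


def load_projector_files(vectors_path, metadata_path):
    """Pair each vector row with its label (assumed projector-style TSV layout)."""
    with open(vectors_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        vectors = [[float(v) for v in row] for row in reader if row]
    with open(metadata_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = [row for row in reader if row]
    if len(rows) == len(vectors) + 1:
        rows = rows[1:]  # assumption: the extra first row is a header
    labels = [row[0] for row in rows]
    if len(labels) != len(vectors):
        raise ValueError("metadata and vector files have different lengths")
    return labels, vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(token, labels, vectors, k=3):
    """Labels of the k vectors most similar to the vector of `token`."""
    query = vectors[labels.index(token)]
    scored = [(cosine(query, v), lab) for lab, v in zip(labels, vectors) if lab != token]
    return [lab for _, lab in sorted(scored, reverse=True)[:k]]
```

A call such as `nearest(some_token, *load_projector_files("embeddings.tsv", "labeled_metadata.tsv"))` would then list the closest neighbours of a token in the embedding space, assuming that token appears in the metadata file.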

Note: Model files are the following:
embeddings.tsv
labeled_metadata.tsv
model
model.model**
model.wv.vectors.npy

**Located here: https://drive.google.com/drive/folders/1RZC4JA4CpAcwoo6dOwq_jobTd6dNi_n2?usp=sharing
Goodreads Book Reviews
cseweb.ucsd.edu
json
Cite
UCSD CSE Research Project, Goodreads Book Reviews [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
These datasets contain reviews from the Goodreads book review website, and a variety of attributes describing the items. Critically, these datasets have multiple levels of user interaction, ranging from adding to a shelf to rating and reading.

Metadata includes

reviews

add-to-shelf, read, review actions

book attributes: title, isbn

graph of similar books

Basic Statistics:

Items: 1,561,465

Users: 808,749

Interactions: 225,394,930
Steam Video Game and Bundle Data
cseweb.ucsd.edu
json
+ more versions
Cite
UCSD CSE Research Project, Steam Video Game and Bundle Data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
These datasets contain reviews from the Steam video game platform, and information about which games were bundled together.

Metadata includes

reviews

purchases, plays, recommends (likes)

product bundles

pricing information

Basic Statistics:

Reviews: 7,793,069

Users: 2,567,538

Items: 15,474

Bundles: 615
Amazon Question and Answer Data
cseweb.ucsd.edu
json
Cite
UCSD CSE Research Project, Amazon Question and Answer Data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
These datasets contain 1.48 million question and answer pairs about products from Amazon.

Metadata includes

question and answer text

is the question binary (yes/no), and if so does it have a yes/no answer?

timestamps

product ID (to reference the review dataset)

Basic Statistics:

Questions: 1.48 million

Answers: 4,019,744

Labeled yes/no questions: 309,419

Number of unique products with questions: 191,185
Google Restaurants dataset
cseweb.ucsd.edu
csv
+ more versions
Cite
UCSD CSE Research Project, Google Restaurants dataset [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
Available download formats: csv
Dataset authored and provided by
UCSD CSE Research Project
Description
This is a multi-modal dataset for restaurants from Google Local (Google Maps). Data includes images and reviews posted by users, as well as metadata for each restaurant.
Behance Community Art Data
cseweb.ucsd.edu
json
+ more versions
Cite
UCSD CSE Research Project, Behance Community Art Data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
Likes and image data from the community art website Behance. This is a small, anonymized version of a larger proprietary dataset.

Metadata includes

appreciates (likes)

timestamps

extracted image features

Basic Statistics:

Users: 63,497

Items: 178,788

Appreciates (likes): 1,000,000
PDMX
cseweb.ucsd.edu
json
+ more versions
Cite
UCSD CSE Research Project, PDMX [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
We introduce PDMX: a Public Domain MusicXML dataset for symbolic music processing, including over 250k musical scores in MusicXML format. PDMX is the largest publicly available, copyright-free MusicXML dataset in existence. PDMX includes genre, tag, description, and popularity metadata for every file.
Marketing Bias data
cseweb.ucsd.edu
json
+ more versions
Cite
UCSD CSE Research Project, Marketing Bias data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
These datasets contain attributes about products sold on ModCloth and Amazon which may be sources of bias in recommendations (in particular, attributes about how the products are marketed). Data also includes user/item interactions for recommendation.

Metadata includes

ratings

product images

user identities

item sizes, user genders
Social Recommendation Data
cseweb.ucsd.edu
berd-platform.de
json
+ more versions
Cite
UCSD CSE Research Project, Social Recommendation Data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
These datasets include ratings as well as social (or trust) relationships between users. Data are from LibraryThing (a book review website) and epinions (general consumer reviews).

Metadata includes

reviews

price paid (epinions)

helpfulness votes (librarything)

flags (librarything)
Product Exchange/Bartering Data
cseweb.ucsd.edu
json
Cite
UCSD CSE Research Project, Product Exchange/Bartering Data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
These datasets contain peer-to-peer trades from various recommendation platforms.

Metadata includes

peer-to-peer trades

have and want lists

image data (tradesy)
Dota 2 - Pro Players Matches Results 2019 ~ 2021
kaggle.com
Updated Jun 21, 2021
Cite
Teo Calvo (2021). Dota 2 - Pro Players Matches Results 2019 ~ 2021 [Dataset]. https://www.kaggle.com/teocalvo/dota2-pro-players-matches-2019-202106/tasks
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 21, 2021
Dataset provided by
Kaggle
Authors
Teo Calvo
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Context

As a player, a Statistics student, and a curious person, I look for ways to apply my knowledge in practical projects whenever I can. More than that, I am passionate about sharing my findings and learnings with the community through my live streams on Twitch and videos on YouTube.

To revisit a project I developed during my undergraduate studies, we decided to collect data on professional Dota 2 matches live on stream, using the Open Dota API. The data were stored in a NoSQL database (MongoDB) and processed in several data layers, following the Data Lake concept, with the Apache Spark processing engine.

You can check out the project in its repository on GitHub.

Content

This dataset is far from raw data, since it went through several stages of transformations, joins, and aggregations. It contains each team's statistics as of one day before the match in question starts. These statistics are calculated from each player's match information in the 6 months preceding that match.

Thus, each row of this dataset records which team won the match, along with summarized, 'non-normalized' statistics for each team.

Acknowledgements

Many thanks to everyone who followed the development of this project in our live streams and supported us with subscriptions on Twitch. Your support enables us to take Data Science forward, for example by sharing this dataset with more people who are interested in developing in the area.

Inspiration

Our desire as a community is to bring teaching closer to people every day, and I understand that this starts in Brazil. That is why the original description is in pt-br, giving greater focus to our national audience.

If you are interested in learning more about our work, follow us on Twitch: Téo Me Why.
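
The Content section describes features computed from the six months before each match, with a cut-off one day before the match starts. A minimal sketch of such a look-back window follows. It approximates a month as 30 days and uses a win rate as the example statistic; both choices, along with the function names and the toy history, are assumptions for illustration and are not the author's pipeline.

```python
from datetime import date, timedelta


def lookback_window(match_day: date, months: int = 6):
    """Return (start, end) of the window ending one day before the match.

    Assumption: a month is approximated as 30 days.
    """
    end = match_day - timedelta(days=1)
    start = end - timedelta(days=30 * months)
    return start, end


def team_win_rate(history, team, match_day):
    """Win rate of `team` inside the look-back window, or None if no games.

    history: iterable of (day, team, won) tuples.
    """
    start, end = lookback_window(match_day)
    results = [won for day, t, won in history if t == team and start <= day <= end]
    return sum(results) / len(results) if results else None


# Hypothetical history: only the two middle games fall inside the window.
toy_history = [
    (date(2020, 1, 1), "team_a", True),   # too old
    (date(2021, 1, 10), "team_a", True),
    (date(2021, 3, 1), "team_a", False),
    (date(2021, 6, 1), "team_a", True),   # same day as the match, excluded
]
print(team_win_rate(toy_history, "team_a", date(2021, 6, 1)))
```

Ending the window the day before the match keeps the match's own outcome out of its features.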
Pinterest Fashion Compatibility
cseweb.ucsd.edu
json
+ more versions
Cite
UCSD CSE Research Project, Pinterest Fashion Compatibility [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
This dataset contains images (scenes) containing fashion products, which are labeled with bounding boxes and links to the corresponding products.

Metadata includes

product IDs

bounding boxes

Basic Statistics:

Scenes: 47,739

Products: 38,111

Scene-Product Pairs: 93,274
Recipe Pairs
cseweb.ucsd.edu
csv
Cite
UCSD CSE Research Project, Recipe Pairs [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
Available download formats: csv
Dataset authored and provided by
UCSD CSE Research Project
Description
This is a collection of recipes paired with variants, e.g. a recipe matched with a vegan version of the same recipe.
Active streamers on Twitch worldwide 2025 (Statista): 20 scholarly articles cite this dataset.