Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
Information of more than 110,000 games published on Steam. Maintained by Fronkon Games. This dataset has been created with this code (MIT) and use the API provided by Steam, the largest gaming platform on PC. Data is also collected from Steam Spy. Only published games, no DLCs, episodes, music, videos, etc. Here is a simple example of how to parse json information:
import os
import json
dataset = {} if… See the full description on the dataset page: https://huggingface.co/datasets/FronkonGames/steam-games-dataset.
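The parsing example above is truncated in this listing. As a stand-in, here is a minimal sketch of reading such a file with Python's standard library. It assumes, without confirmation from the text here, that the JSON is a single object keyed by app ID whose values are per-game records with a `name` field. The sample records and the helper names `load_games` and `game_names` are made up for illustration.

```python
import json

# Hypothetical two-record sample; the real file layout may differ.
SAMPLE_JSON = """
{
  "10": {"name": "Example Game A", "price": 9.99},
  "20": {"name": "Example Game B", "price": 0.0}
}
"""


def load_games(text):
    """Parse JSON text into a dict; return an empty dict for empty input."""
    text = text.strip()
    return json.loads(text) if text else {}


def game_names(dataset):
    """Return (app_id, name) pairs, skipping records that lack a name."""
    return [
        (app_id, game["name"])
        for app_id, game in dataset.items()
        if "name" in game
    ]


dataset = load_games(SAMPLE_JSON)
for app_id, name in game_names(dataset):
    print(app_id, name)
```

For a file on disk, the same helpers could be fed the result of reading the file as UTF-8 text; the file name would depend on how the dataset was downloaded.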
In April 2025, total video game sales in the United States amounted to **** billion U.S. dollars, representing a one percent year-over-year increase. Generally speaking, the video game industry has its most important months in November and December, as video game software and hardware make very popular Christmas gifts. In December 2024, total U.S. video game sales surpassed **** billion U.S. dollars.
Birth of the video game industry
Although the largest regional market in terms of sales, as well as number of gamers, is Asia Pacific, the United States is also an important player within the global video games industry. In fact, many consider the United States the birthplace of gaming as we know it today, fueled by the arcade game fever in the ’60s and the introduction of the first personal computers and home gaming consoles in the ’70s. Furthermore, the children of those eras are the game developers and game players of today, the ones who have driven the movement for better software solutions, better graphics, better sound and more advanced interaction, not only for video games but also for today's computers and communication technologies.
An ever-changing market
However, the video game industry in the United States is not only growing, it is also changing in many ways. Due to increased internet accessibility and the development of new technologies, more and more players are switching from single-player console or PC video games towards multiplayer games, social networking games and, last but not least, mobile games, which are gaining tremendous popularity around the world. This is evidenced by the fact that mobile games accounted for ** percent of the revenue of the games market worldwide, ahead of both console games and downloaded or boxed PC games.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset of video games and their minimum and recommended requirements is a collection of information about various video games and the hardware specifications required to run them. It was scraped from https://www.game-debate.com/. The dataset contains 90 columns. CPU and GPU requirements are listed separately for the minimum and the recommended specification, so each component has both a minimum and a recommended column. The full list of columns is:
Complete dataset used in the research study on Examining the Gender Representation in Console and PC Games by Dr. Cynthia Bailey
The dataset contains 6 columns and 199 rows of PC games with the highest metascores available on Metacritic. The title and publisher columns contain the title of the game and the name of the company that published it. The rating column contains the rating of the game, where "-" means there is no rating. The user_score column contains the average score given to the game by users. The metascore column contains the aggregated score given to the game by Metacritic's critics.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Contains linguistically annotated data from the online forum PC Games (https://forum.pcgames.de), a forum about gaming. All posts (approx. 2.4 million) were scraped in April 2019 (for details see Kissling 2019), resulting in 120 million tokens from almost 70,000 authors. The data is stored in an SQL database and can be accessed using e.g. pg_restore. The database itself and its tables contain detailed self-descriptions.
In this database you will find tokenized, part-of-speech-tagged and partly lemmatized information for every token in the forum, together with its metadata (usernames and their location in the forum structure, e.g. which post(s), thread and subforum the token belongs to). The order of the words in a post cannot be reconstructed from this corpus. Usernames were replaced with author_ids to protect the personal rights of the post authors.
Additional information:
As this corpus was analyzed in terms of productivity and language contact between German and English (Kissling 2020), there is additional information about German base forms found in present-day English, mainly focussing on the formula "German_verb_stem + -en = English verb infinitive". The API of the Oxford Dictionary of English was used for this purpose. You will find the results of the API requests made to the Oxford Dictionary of English in the table infinitives. The corpus can also be used without this information.
Calculations were performed at sciCORE (http://scicore.unibas.ch/) scientific computing core facility at University of Basel on 2019-09-10. This database contains all of the primary corpus of Kissling (2020).
Sources:
Kissling, J. (2019). Computerunterstütztes Verfahren zur Erhebung eigener Textkorpus-Daten. Methodenentwicklung und Anwendung auf 2.4 Mio. Posts des Forums PC Games.de [certification thesis]. Universität Basel.
Kissling, J. (2020). Produktivität englischer Verben im Deutschen [master thesis]. Universität Basel.
The used scraper is available on github: https://github.com/vizzerdrix55/web-scraping-vBulletin-forum
The online gaming platform Steam was first released by the Valve Corporation in 2003. What started off as a small platform for Valve to provide updates to its games has turned into the largest computer gaming platform in the world. The platform initially released just 65 games in 2004, but this number has risen progressively in the ensuing years, reaching 15,422 in 2024, up from 9,204 in 2020.
Steam’s PC dominance
When you think of PC gaming, you automatically think of Steam. With such a wide range of games on offer, from traditional online multiplayer shooters to farming simulators, there is something for every gaming taste on the platform. As a result, gamers flock to Steam in their millions, with the platform registering over 132 million monthly active users in 2021. The global nature of the platform can be seen in the wide range of languages spoken by its users. Whilst English was the most spoken language for most of the platform's history, this changed as over 33 percent of users in October 2024 claimed Chinese as their platform language.
Steam’s biggest games
Counter-Strike 2 was the most popular game on Steam during 2024. The first-person shooter averaged almost 685,000 players per hour, a significant lead over the next most popular title. The game was also third in the 2024 list for peak number of concurrent players: CS2 reached over 1.74 million players in a single hour at its peak, with Black Myth: Wukong claiming first place with over 2.4 million peak concurrent players.
Complete dataset used in the research study on How Gamification Techniques Enhance Engagement in Educational PC Games by Dr. Kevin Stewart
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data collected from an accessibility evaluation of serious games, including educational interactive simulations, against WCAG 2.1 using a combined method. The data were recorded in a spreadsheet in two steps: 1) automatic tools were used to check color contrast and brightness that can cause problems for people with epilepsy; 2) a manual method was then applied with WCAG 2.1.
Complete dataset used in the research study on Analyzing the Business Model of Free-to-Play Games on PC and Consoles by Dr. Amanda Evans
https://creativecommons.org/publicdomain/zero/1.0/
*Also find Metacritic Movies and Metacritic TV Shows datasets.*
This dataset contains a collection of video games and their corresponding reviews from Metacritic, a popular aggregate review site. The data provides insights into various video games across different platforms, including PC, PlayStation, Xbox, and others. Each game entry includes critical reviews, user reviews, ratings, and other relevant information that can be used for analysis, natural language processing, machine learning, and predictive modeling.
Important Note: *The games in this collection are selected from Metacritic's Best Games of All Time list, which only includes titles that have received at least 7 reviews, ensuring a minimum level of critical and user input.*
Up-to-dateness: *This dataset is accurate as of March 14, 2025, and includes the most current rankings and game details available at that time.*
The dataset contains general information and scores of 13K+ games and their corresponding 1.6M+ user/critic reviews collected by sending automated requests to Metacritic's public backend API using Python's requests and pandas libraries.
This dataset is perfect for researchers, game enthusiasts, and data scientists who are interested in exploring the gaming industry through data analysis.
League of Legends dataset associated with the paper "Individual Performance in Team-based Online Games" by Sapienza, A., Zeng, Y., Bessi, A., Lerman, K., Ferrara, E. (Royal Society Open Science 5 180329, 2018). The dataset adopted for this study was collected using the League of Legends Riot Games API (https://developer.riotgames.com/). It consists of 435,000 matches played by a sample of 1,120 of the most active players, i.e., those who played more than 100 games. The data contains information about matches, including match time and duration, and the number of deaths, kills, earned gold, gold spent, etc. for each player in each match.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Video Game Sales’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/gregorut/videogamesales on 12 November 2021.
--- Dataset description provided by original source is as follows ---
This dataset contains a list of video games with sales greater than 100,000 copies. It was generated by a scrape of vgchartz.com.
Fields include
Rank - Ranking of overall sales
Name - The game's name
Platform - Platform of the game's release (e.g. PC, PS4, etc.)
Year - Year of the game's release
Genre - Genre of the game
Publisher - Publisher of the game
NA_Sales - Sales in North America (in millions)
EU_Sales - Sales in Europe (in millions)
JP_Sales - Sales in Japan (in millions)
Other_Sales - Sales in the rest of the world (in millions)
Global_Sales - Total worldwide sales.
The script to scrape the data is available at https://github.com/GregorUT/vgchartzScrape. It is based on BeautifulSoup using Python. There are 16,598 records. 2 records were dropped due to incomplete information.
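The field list above maps directly onto a CSV header, so a short sketch can show how the columns might be read and cross-checked. The rows below are made up purely to mirror the column names. The check that `Global_Sales` is close to the sum of the four regional columns is an assumption about the data, and rounding in the source may cause small differences. The helper names are hypothetical.

```python
import csv
import io

# Made-up rows that only mirror the column names listed above.
SAMPLE_CSV = """Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
1,Example Game A,PC,2010,Strategy,Example Publisher,1.50,2.00,0.25,0.25,4.00
2,Example Game B,PS4,2015,Action,Example Publisher,0.50,0.25,0.00,0.25,1.00
"""

REGION_FIELDS = ("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")


def regional_total(row):
    """Sum the four regional sales columns (in millions) for one row."""
    return round(sum(float(row[field]) for field in REGION_FIELDS), 2)


def sales_by_platform(rows):
    """Aggregate Global_Sales (in millions) per platform."""
    totals = {}
    for row in rows:
        platform = row["Platform"]
        totals[platform] = totals.get(platform, 0.0) + float(row["Global_Sales"])
    return totals


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    # Assumption: the global figure should roughly match the regional sum.
    gap = abs(regional_total(row) - float(row["Global_Sales"]))
    print(row["Name"], "regional/global gap:", round(gap, 2))
print(sales_by_platform(rows))
```

Using `csv.DictReader` keeps every value as a string, so the numeric columns are converted explicitly; a real file may also contain missing years or publishers that need handling before conversion.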
--- Original source retains full ownership of the source dataset ---
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
This dataset contains FPS measurements of video games executed on computers. Each row describes the outcome of one FPS measurement (the outcome is the attribute FPS) for a video game executed on a computer. A computer is characterized by its CPU and GPU. For both, the name is resolved to technical specifications (features starting with Cpu and Gpu). These specifications describe the factory state of the respective component. The game is characterized by its name, the displayed resolution, and the quality setting used during the measurement (features starting with Game). What follows is a short description of the data sources and a description of each feature in the dataset.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The following are the results of an online survey conducted by BoilingSteam.com among the Linux gamers' community at the end of Q1 2016, to better understand their hardware, usage habits and reactions to several of Valve's Steam initiatives (n=560; only answers from respondents who explicitly agreed to make them public are shared here, so the total sample was larger). Most of the answers come from members of the r/Linux_Gaming and r/Linux subreddits, so you need to take into account that this may not be representative of your typical Linux user.
There are many variables in this data set, with numerical, free-text and categorical answers. Every line corresponds to an individual response. Note that answers are anonymous. The first row is the coding you can use for your analysis (that should save a bit of time), the second row is the actual question asked (you can erase it), and the data starts from the third row.
Questions cover some of the following attributes (there are many more in the actual datasheet):
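The three-part layout described above (codes in the first row, question wording in the second, answers from the third) can be read without erasing the question row. The sketch below assumes the file is a plain comma-separated export; the column codes, questions and answers are invented for illustration, and `read_survey` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical miniature of the layout: codes, question text, then answers.
SAMPLE = """distro,hours_per_week
Which distribution do you use?,How many hours do you play per week?
Arch,10
Ubuntu,4
"""


def read_survey(handle):
    """Split a survey export into a code-to-question codebook and responses."""
    reader = csv.reader(handle)
    codes = next(reader)      # row 1: short variable codes
    questions = next(reader)  # row 2: full question wording
    codebook = dict(zip(codes, questions))
    # Remaining rows: one respondent per line, keyed by the short codes.
    responses = [dict(zip(codes, row)) for row in reader]
    return codebook, responses


codebook, responses = read_survey(io.StringIO(SAMPLE))
print(codebook["distro"])
print(len(responses), "responses")
```

Keeping the second row as a codebook rather than deleting it preserves the exact question wording next to each variable code during analysis.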
The questionnaire was designed by Ekianjo at BoilingSteam.com. If you have suggestions for improvements of future surveys of the same kind, please reach us on Kaggle or on our contact page: http://boilingsteam.com/about-boiling-steam/
You can see some analysis done on a previous iteration of this survey (the previous data cannot be made public, however). This may serve as a good benchmark to measure changes: http://boilingsteam.com/the-three-kinds-of-linux-gamers/
Feel free to play with the data, and share what insights you may find. We are big proponents of making data free in general for transparency purposes, so if your analysis can help generate a better understanding of who are Linux Gamers, this would be a great outcome.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contains the data and analysis of the evaluation of accessibility in video games when applying WCAG 2.1. To reproduce the experiment, which comprises 9 sequential steps, it is suggested to review the article from the IHSI 2020 conference (http://www.ihsint.org/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset and model used for Tiny Towns Scorer, a computer vision project completed as part of CS 4664: Data-Centric Computing Capstone at Virginia Tech. The goal of the project was to calculate player scores in the board game Tiny Towns.
The dataset consists of 226 images and associated annotations, intended for object detection. The images are photographs of players' game boards over the course of a game of Tiny Towns, as well as photos of individual game pieces taken after the game. Photos were taken using hand-held smartphones. Images are in JPG and PNG formats. The annotations are provided in TFRecord 1.0 and CVAT for Images 1.1 formats.
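For readers who want to inspect the CVAT for Images 1.1 annotations mentioned above, the following sketch parses bounding boxes from an XML export with Python's standard library. The `<image>` and `<box>` elements with `xtl`, `ytl`, `xbr` and `ybr` attributes follow the general CVAT for Images layout; the file names, label names and coordinates in the sample are hypothetical and are not taken from this dataset.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical export fragment; labels and file names are illustrative only.
SAMPLE_XML = """<annotations>
  <version>1.1</version>
  <image id="0" name="board_001.jpg" width="4032" height="3024">
    <box label="cottage" occluded="0" xtl="100.0" ytl="200.0" xbr="300.0" ybr="450.0"/>
    <box label="well" occluded="0" xtl="500.0" ytl="220.0" xbr="640.0" ybr="400.0"/>
  </image>
  <image id="1" name="board_002.png" width="4032" height="3024">
    <box label="cottage" occluded="0" xtl="50.0" ytl="60.0" xbr="150.0" ybr="260.0"/>
  </image>
</annotations>
"""


def read_boxes(xml_text):
    """Return one dict per bounding box: image name, label, (xtl, ytl, xbr, ybr)."""
    root = ET.fromstring(xml_text)
    boxes = []
    for image in root.iter("image"):
        for box in image.iter("box"):
            corners = tuple(float(box.get(key)) for key in ("xtl", "ytl", "xbr", "ybr"))
            boxes.append({"image": image.get("name"), "label": box.get("label"), "bbox": corners})
    return boxes


boxes = read_boxes(SAMPLE_XML)
label_counts = Counter(box["label"] for box in boxes)
print(label_counts)
```

Counting boxes per label is a quick sanity check on class balance before training an object detector.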
The weights for the trained RetinaNet portion of the model are also provided.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This table contains 2124 series, with data for years 1990 - 1998 (not all combinations necessarily have data for all years), and was last released on 2007-01-29. This table contains data described by the following dimensions (Not all combinations are available): Geography (30 items: Austria; Belgium (Flemish speaking); Belgium; Belgium (French speaking) ...), Sex (2 items: Males; Females ...), Age group (3 items: 11 years;13 years;15 years ...), Activity (2 items: Watch VCR movies; Play computer games ...), Time spent (6 items: Not at all;1 to 3 hours; Less than 1 hour;4 to 6 hours ...).
The data comprise in-depth interviews with two groups. The first is 20 parents and carers of children and young people who spend money in digital games and have purchased loot boxes (or similar). These interviews explored how parents view their child’s gaming and in-game purchases, how they understand paid reward systems in digital games, and what would help them navigate these systems with their children. The second group is 10 game designers who have experience of designing and developing digital games that contain paid reward systems. The focus here was to investigate how designers make decisions and how they understand the effects paid reward systems have on players. The aim of this data collection was to provide in-depth qualitative evidence of how children and young people engage with, understand, and experience paid reward systems in digital games (across console, mobile, and PC). Commonly called loot boxes, card packs, or spins, these digital items give randomised rewards of uncertain value in exchange for in-game currency purchased with real world money. Their success is largely predicated upon the use of techniques borrowed from regulated gambling to engage players and encourage repeated use of these mechanisms. The motivation for the study was therefore to collect data to investigate the link between paid reward systems in digital games and their relationship to techniques drawn from regulated gambling. These interviews supplemented video ethnography with 42 families in the North East of England, conducted in the family home to understand children and young people's practices and activities involving paid reward systems.
These files are not uploaded to ReShare due to ethical considerations of recorded footage of children and young people in homes, as per our institutional ethical approval.
Gambling style systems in digital games, such as loot boxes, cards, micro-transactions and forms of currency used to purchase game specific content, have become widely adopted in a range of digital games. These models of revenue generation can take many forms, from free to play smart phone games that encourage the purchase of additional digital content, to full price videogame console releases that utilise chance based cards or 'loot' paid for with real currency. These systems are highly profitable, with publishers such as Activision earning over $4 billion from this aspect of their games in 2017 alone (Makuch 2018). But their success is predicated upon the use of techniques and mechanics borrowed from machine gambling to encourage repeated use of these systems. While gambling is a highly regulated activity in the UK that is restricted to adults over the age of 18, many of these games are actively marketed and sold to children and young people under 18. This is problematic and the Gambling Commission (2017) has recently pointed out that 25,000 children between 11 and 16 are problem gamblers, 'with many introduced to betting via computer games and social media'. These systems thus raise important questions about their design and regulation, especially if they act as a gateway to other forms of gambling such as online casinos or fixed odds betting terminals. Despite the widespread nature of gambling style systems in digital games, no academic work has explicitly: 1. Investigated how children and young people use these systems in their everyday lives and whether they create any problems or issues for these groups. 2. Investigated how parents and guardians understand and regulate their children's use of these systems.
To investigate these issues and fill this gap in knowledge the project researches three groups. 1. Digital reward system designers. Through interviews with 10 digital interface designers the project will identify the key mechanics and systems utilised in the games they have worked on and the aims of this design. 2. Children and young people who use gambling style systems in digital games. Through 100 hours of video ethnography across 40 families (equalling approximately 2.5 hours of footage per family), the project will investigate how children and young people use gambling style systems in digital games. In addition, 20 semi-structured interviews with children and young people will be conducted to understand how they use gambling style systems outside of the home, for example on mobile devices. 3. Parents of children and young people who use these systems. 20 interviews with parents will investigate how they understand these systems, whether they regulate their children's use of these systems, and what form this regulation might take. Through research with these groups, the project develops a theoretical model of gambling style systems in digital games that investigates whether the success of their underlying mechanics is fundamentally linked to the space-times where they are used. It then examines how children and young people use these systems in practice and how they make sense of them. Utilising this body of evidence, the study will then offer recommendations as to whether these systems should be regulated and what form this regulation could take.
The data comprise qualitative semi-structured interviews with two groups. The first is parents and guardians of children and young people in the North East of England who have used loot boxes and bought in-game content in digital games and apps. Discussions focus on how and when children and young people spend money, and how parents and guardians understand and manage spending.
The second group is games designers who create loot boxes and in-game spending systems in a range of games and apps. Here, discussion focuses on the techniques of design in relation to encouraging children and young people to spend money and how effective these techniques are. Sampling procedures involved snowball sampling.