9 datasets found
  1. steam-games-dataset

    • huggingface.co
    Updated May 17, 2025
    Cite
    Martin Bustos (2025). steam-games-dataset [Dataset]. http://doi.org/10.57967/hf/0511
    Dataset updated
    May 17, 2025
    Authors
    Martin Bustos
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    Information on more than 110,000 games published on Steam. Maintained by Fronkon Games. This dataset was created with this code (MIT) and uses the API provided by Steam, the largest gaming platform on PC. Data is also collected from Steam Spy. It covers only published games: no DLCs, episodes, music, videos, etc. Here is a simple example of how to parse the JSON information:

    Simple parse of the 'games.json' file.

    import os
    import json

    dataset = {}
    if…

    See the full description on the dataset page: https://huggingface.co/datasets/FronkonGames/steam-games-dataset.
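Because the snippet above is cut off, here is a minimal sketch of one way to read a games.json-style file. It assumes, hypothetically, that the file is a single JSON object mapping app IDs to per-game records that carry a name field. The helper names (load_games, game_names), the field name, and the sample records are illustrative assumptions, not taken from the dataset page.

```python
import json
import os
import tempfile


def load_games(path):
    """Return the parsed JSON object, or an empty dict if the file is missing."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as fin:
        return json.load(fin)


def game_names(dataset):
    """Collect the (assumed) 'name' field of every record, sorted alphabetically."""
    return sorted(rec.get("name", "") for rec in dataset.values())


# Tiny stand-in file so the sketch runs without downloading the real dataset.
sample = {"10": {"name": "Example Game A"}, "20": {"name": "Example Game B"}}
sample_path = os.path.join(tempfile.mkdtemp(), "games.json")
with open(sample_path, "w", encoding="utf-8") as fout:
    json.dump(sample, fout)

dataset = load_games(sample_path)
names = game_names(dataset)
print(len(dataset), names)
```

For the real file, the same load_games call would be pointed at the downloaded path; the actual record layout should be checked against the dataset page before relying on any field name.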

  2. Top 1500 games on steam by revenue 09-09-2024

    • kaggle.com
    Updated Sep 11, 2024
    Cite
    Ali Cem Topcu (2024). Top 1500 games on steam by revenue 09-09-2024 [Dataset]. https://www.kaggle.com/datasets/alicemtopcu/top-1500-games-on-steam-by-revenue-09-09-2024
    Dataset updated
    Sep 11, 2024
    Dataset provided by
    Kaggle
    Authors
    Ali Cem Topcu
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This is the first dataset I have uploaded to Kaggle, and I hope that people who collaborate on or use this data enjoy it. Until now, I haven't seen many detailed datasets about games and the games industry. I felt I needed to start somewhere, so that gamers, game companies, and gaming enthusiasts who are interested in game data analytics can benefit. First, I must give a big thanks to gamalytic.com, from which I downloaded this data for free.

    About this dataset: This dataset contains comprehensive information on the top 1500 games released on Steam between January 1, 2024, and September 9, 2024. It was aggregated from 30 separate files and combined into a single dataset. Minor adjustments were made, such as aligning game release dates for consistency.

    Key Features:

    • Game Details: Includes titles, release dates, and developer/publisher information.

    • Sales and Revenue: Tracks the number of copies sold, revenue generated, and pricing details.

    • Player Engagement: Provides average playtime, peak player counts, and other user engagement metrics.

    • Reviews and Scores: Features review scores and ratings.

    • Dynamic Market Data: Offers insights into game performance trends over time, such as sales rank and price fluctuations.

    This dataset can be useful for:

    • Game Developers: Understanding market trends, competitor analysis, and consumer behavior.

    • Data Scientists: Exploring various data analysis techniques, including regression analysis, clustering, and time-series forecasting.

    • Researchers: Analyzing game industry patterns and the impact of game characteristics on sales and user engagement.

  3. Steam Game Review Dataset

    • opendatabay.com
    Updated Jun 23, 2025
    Cite
    Datasimple (2025). Steam Game Review Dataset [Dataset]. https://www.opendatabay.com/data/dataset/ca15fd2a-228a-4409-8c16-4aef376d7e2a
    Dataset updated
    Jun 23, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Reviews & Ratings
    Description

    Context

    Video games have greatly contributed, and continue to contribute, to the expansion of the entertainment industry. When the first video game, Pong, was launched in an arcade machine in 1972, it ignited a video game craze that quickly swept over the youth. With this, businesses such as Atari Games and Nintendo saw the golden opportunity of investing in a developing entertainment sector and began churning out gaming software and hardware. This caused the rise of the video game industry, which has generated over $109 billion in revenue and 2.2 billion gamers since its conception 50 years ago.

    In this industry with over 47 million daily active users, Steam has been operating for almost 16 years. Its constant improvement to better accommodate users has made its development notable in the video game industry.

    Steam is a digital distribution platform tailored to gamers and game developers. While it initially catered to PC games, the platform soon expanded its availability to home video game consoles such as the Xbox and Sony PlayStation. On Steam, gamers can log in to the website to conveniently purchase and play games online, a better alternative to buying physical copies of the games and manually downloading them onto the computer.

    Content

    Many gamers write reviews on the game page and can choose whether or not they would recommend the game to others. However, determining this sentiment automatically from text can help Steam automatically tag such reviews extracted from other forums across the internet and better judge the popularity of games.

    Game overview information for both train and test is available in a single file, game_overview.csv, inside train.zip.

    Acknowledgements

    Steam digital distribution.

    Inspiration

    Predict whether the reviewer recommended the game titles available in the test set on the basis of review text and other information.

    Original Data Source: Steam Game Review Dataset

  4. Steam Review Dataset (2017)

    • zenodo.org
    bz2
    Updated Jan 24, 2020
    Cite
    Antoni Sobkowicz; Antoni Sobkowicz (2020). Steam Review Dataset (2017) [Dataset]. http://doi.org/10.5281/zenodo.1000885
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Antoni Sobkowicz; Antoni Sobkowicz
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The dataset contains over 6.4 million publicly available reviews in English from the Steam Reviews portion of the Steam store run by Valve. Each review is described by the review text, the ID of the game it belongs to, the review sentiment (positive or negative), and the number of users who thought the review was helpful. This is essentially an extension of the previously released Steam Review Dataset.

    The resource is provided as a bzip2 compressed CSV file.
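A bzip2-compressed CSV can be streamed row by row without first decompressing it to disk. The sketch below uses Python's standard bz2 and csv modules. The column names (text, game_id, sentiment, helpful) are hypothetical stand-ins for the four fields described above, and the small sample file is generated on the spot rather than taken from the real archive.

```python
import bz2
import csv
import os
import tempfile


def iter_reviews(path):
    """Yield one dict per CSV row, decompressing the bzip2 stream lazily."""
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as fin:
        yield from csv.DictReader(fin)


def count_by_sentiment(path, column="sentiment"):
    """Count rows per value of the (assumed) sentiment column."""
    counts = {}
    for row in iter_reviews(path):
        counts[row[column]] = counts.get(row[column], 0) + 1
    return counts


# Build a tiny compressed sample so the sketch is runnable as-is.
sample_path = os.path.join(tempfile.mkdtemp(), "reviews.csv.bz2")
with bz2.open(sample_path, mode="wt", encoding="utf-8", newline="") as fout:
    writer = csv.writer(fout)
    writer.writerow(["text", "game_id", "sentiment", "helpful"])
    writer.writerow(["Great game", "10", "1", "3"])
    writer.writerow(["Crashes a lot", "10", "-1", "7"])
    writer.writerow(["Loved it", "20", "1", "0"])

counts = count_by_sentiment(sample_path)
print(counts)
```

Streaming matters at this scale: with millions of reviews, iterating over the reader keeps memory use roughly constant, whereas loading every row into a list would not.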

    Steam Reviews and Steam are owned by Valve. The authors are not affiliated with and are not endorsed by Valve / Steam.

  5. Sentiment Analysis for Steam Reviews

    • kaggle.com
    Updated Sep 27, 2020
    Cite
    Piyush Agnihotri (2020). Sentiment Analysis for Steam Reviews [Dataset]. https://www.kaggle.com/datasets/piyushagni5/sentiment-analysis-for-steam-reviews
    Dataset updated
    Sep 27, 2020
    Dataset provided by
    Kaggle
    Authors
    Piyush Agnihotri
    Description

    Sentiment Analysis for Steam Reviews

    Steam is a video game digital distribution service with a vast community of gamers globally. A lot of gamers write reviews on the game page and have the option of choosing whether they would recommend this game to others or not. However, determining this sentiment automatically from the text can help Steam to automatically tag such reviews extracted from other forums across the internet and can help them better judge the popularity of games.

    Given the review text with user recommendation and other information related to each game for 64 game titles, the task is to create a test set by making a split from the training set and try to predict whether the reviewer recommended the game titles available in the test set on the basis of review text and other information.

    Game overview information for the training set is available in a single file, game_overview.csv.

    About Data Source: Steam Platform

    • train.csv

    review_id --> Unique ID for each review

    title --> Title of the game

    year --> Year in which the review was posted

    user_review --> Full Text of the review posted by a user

    user_suggestion --> (Target) Whether the user marked the game as Recommended (1) or Not Recommended (0)

    • game_overview.csv

    title --> Title of the game

    developer --> Name of the developer of the game

    publisher --> Name of the publisher of the game

    tags --> Popular user-defined tags for the game

    overview --> Overview of the game provided by the publisher.
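The two files described above share the title column, so each review can be enriched with its game's overview fields before a test set is held out from the training data. The following is a minimal sketch using only the standard library. The sample rows, the helper names (read_rows, join_on_title, split_rows), and the 25% hold-out fraction are illustrative assumptions, not part of the dataset.

```python
import csv
import io
import random

# Hypothetical miniature versions of train.csv and game_overview.csv.
TRAIN_CSV = """review_id,title,year,user_review,user_suggestion
1,Game A,2018,Fun and polished,1
2,Game A,2019,Too many bugs,0
3,Game B,2017,Would play again,1
4,Game B,2018,Not for me,0
"""
OVERVIEW_CSV = """title,developer,publisher,tags,overview
Game A,Dev A,Pub A,"['Action']",An action game
Game B,Dev B,Pub B,"['Puzzle']",A puzzle game
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_title(reviews, overviews):
    """Attach each game's overview columns to its reviews, matching on title."""
    by_title = {row["title"]: row for row in overviews}
    return [{**by_title.get(r["title"], {}), **r} for r in reviews]


def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


joined = join_on_title(read_rows(TRAIN_CSV), read_rows(OVERVIEW_CSV))
train_rows, test_rows = split_rows(joined)
print(len(train_rows), len(test_rows), sorted(joined[0]))
```

A random row-level split is the simplest choice. Splitting by title instead would test generalization to unseen games, which may be closer to the stated task.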

    Acknowledgements

    The data was collected from the Analytics Vidhya JanataHack: NLP Hackathon.

  6. Pinterest Fashion Compatibility

    • cseweb.ucsd.edu
    • beta.data.urbandatacentre.ca
    json
    Cite
    UCSD CSE Research Project, Pinterest Fashion Compatibility [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    This dataset contains images (scenes) containing fashion products, which are labeled with bounding boxes and links to the corresponding products.

    Metadata includes

    • product IDs

    • bounding boxes

    Basic Statistics:

    • Scenes: 47,739

    • Products: 38,111

    • Scene-Product Pairs: 93,274

  7. PDMX

    • cseweb.ucsd.edu
    json
    Cite
    UCSD CSE Research Project, PDMX [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    We introduce PDMX: a Public Domain MusicXML dataset for symbolic music processing, including over 250k musical scores in MusicXML format. PDMX is the largest publicly available, copyright-free MusicXML dataset in existence. PDMX includes genre, tag, description, and popularity metadata for every file.

  8. Spiking Seizure Classification Dataset

    • data.niaid.nih.gov
    Updated Jan 13, 2025
    Cite
    Gallou, Olympia (2025). Spiking Seizure Classification Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10800793
    Dataset updated
    Jan 13, 2025
    Dataset provided by
    Gallou, Olympia
    Matthew, Cook
    Ito, Hiroyuki
    Bartels, Jim
    GHOSH, SAPTARSHI
    Sarnthein, Johannes
    Indiveri, Giacomo
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset for event encoded analog EEG signals for detection of Epileptic seizures

    This dataset contains events that are encoded from the analog signals recorded during pre-surgical evaluations of patients at the Sleep-Wake-Epilepsy-Center (SWEC) of the University Department of Neurology at the Inselspital Bern. The analog signals are sourced from the SWEC-ETHZ iEEG Database.

    This database contains event streams for 10 seizures recorded from 5 patients and generated by the DYnamic Neuromorphic Asynchronous Processor (DYNAP-SE2) to demonstrate a proof of concept of encoding seizures with network synchronization. The pipeline consists of two parts: (I) an Analog Front End (AFE) and (II) an SNN termed the "Non-Local Non-Global" (NLNG) network.

    In the first part of the pipeline, the digitally recorded signals from SWEC-ETHZ iEEG Database are converted to analog signals via an 18-bit Digital-to-Analog converter (DAC) and then amplified and encoded into events by an Asynchronous Delta Modulator (ADM). Then in the second part, the encoded event streams are fed into the SNN that extracts the features of the epileptic seizure by extracting the partial synchronous patterns intrinsic to the seizure dynamics.
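To make the encoding step more concrete, here is a simplified software sketch of delta modulation: an UP or DOWN event is emitted each time the signal moves a fixed threshold away from a running reference level. This is a generic illustration only. The threshold value, the function name delta_modulate, and the sine-wave input are assumptions, and the sketch does not model the DAC, the amplifier, refractory periods, or the actual DYNAP-SE2 circuit.

```python
import math


def delta_modulate(samples, threshold):
    """Encode samples as (index, +1/-1) events whenever the signal moves
    at least `threshold` away from the last reference level."""
    events = []
    reference = samples[0]
    for i, value in enumerate(samples[1:], start=1):
        # Emit as many UP events as needed to catch up with a large rise.
        while value - reference >= threshold:
            reference += threshold
            events.append((i, +1))
        # Emit as many DOWN events as needed to catch up with a large fall.
        while reference - value >= threshold:
            reference -= threshold
            events.append((i, -1))
    return events


# One period of a sine wave as a stand-in for a recorded trace.
signal = [math.sin(2 * math.pi * k / 100) for k in range(101)]
events = delta_modulate(signal, threshold=0.1)
ups = sum(1 for _, polarity in events if polarity > 0)
downs = len(events) - ups
print(ups, downs)
```

The resulting event list is sparse where the signal is flat and dense where it changes quickly, which is the property that makes event-based encoding attractive as input to a spiking network.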

    Details about the neuromorphic processing pipeline and the encoding process are included in a manuscript under review. The preprint is available on bioRxiv.

    Installation

    The installation requires Python>=3.x and conda (or py-venv). Users can then install the requirements inside a conda environment using

    conda env create -f requirements.txt -n sez

    Once created, the conda environment can be activated with conda activate sez

    The main files in the database are described in the hierarchy below.

    EventSezDataset/
    ├─ data/
    │  ├─ PxSx/
    │  │  ├─ Patx_Sz_x_CHx.csv
    ├─ LSVM_Params/
    │  ├─ opt_svm_params/
    │  ├─ pat_x_features_SYNCH/
    ├─ fig_gen.py
    ├─ sync_mat_gen.py
    ├─ SeizDetection_FR.py
    ├─ SeizDetection_SYNCH.py
    ├─ support.py
    ├─ run.sh
    ├─ requirements.txt

    where x represents the Patient ID and the Seizure ID respectively.

    requirements.txt: This file lists the requirements for the execution of the Python code.

    fig_gen.py: This file plots the analog signals and the associated AFE and NLNG event streams. The code is run with python fig_gen.py 1 1 13, where patient 2, seizure 1, and channel 13 of the recording are plotted.

    sync_mat_gen.py: This file describes the function for plotting the synchronization matrices emerging from the ADM and the NLNG spikes with either a linear or a log colorbar. The code is run with python sync_mat_gen.py 1 1 or python sync_mat_gen.py 1 1 log. This generates four figures, for the pre-seizure, first half of seizure, second half of seizure, and post-seizure time periods, here for patient 1 and seizure 1. The third argument can either be left blank or given as lin or log, for the respective colorbar scale. The time is the signal time as described in the table below.

    run.sh: A simple Linux script to run the above code for all patients and seizures.

    SeizDetection_FR.py: This file runs the LSVM on the ADM and NLNG spikes, using the firing rate (FR) as a feature. The code is currently set up to plot using pre-computed features (in the LSVM_Params/opt_svm_params/ folder). Users can also use the code to train the LSVM with different parameters.

    SeizDetection_SYNCH.py: This file runs the LSVM on the kernelized ADM and NLNG spikes, using the flattened SYNC matrices as a feature. The code is currently set up to plot using pre-computed features (in the LSVM_Params/pat_x_features_SYNCH/ folder). Users can also use the code to train the LSVM with different parameters.

    LSVM_Params: Folder containing LSVM features with different parameter combinations.

    support.py: This file contains the necessary functions.

    data/P1S1/: This folder, for example, contains the event streams for all channels for seizure 1 of patient 1.

    Pat1_Sz_1_CH1.csv: This file contains the spikes of the AFE and the NLNG layers in the following tabular format (which can be extracted by fig_gen.py).

    Comments

    # SStart: 180 // Start of the seizure in signal time
    # SEnd: 276.0 // End of the seizure in signal time
    # Pid: 2 // The patient ID as per the SWEC-ETHZ iEEG Database
    # Sid: 1 // The seizure ID as per the SWEC-ETHZ iEEG Database
    # Channel_No: 1 // The channel number

    SYS_time --> The time from the interface FPGA

    signal_time --> The time of the signal as per the SWEC-ETHZ iEEG Database

    dac_value --> The value of the analog signals as recorded in the SWEC-ETHZ iEEG Database

    ADMspikes --> The event stream output by the AFE, in boolean format. True represents a spike

    NLNGspikes --> The spike stream output by the SNN, in boolean format. True represents a spike

  9. Data from: COMPUTER MODELING OF A THREE-DIMENSIONAL STEAM INJECTION...

    • cloud.csiss.gmu.edu
    pdf
    Updated Aug 8, 2019
    Cite
    Energy Data Exchange (2019). COMPUTER MODELING OF A THREE-DIMENSIONAL STEAM INJECTION EXPERIMENT [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/computer-modeling-of-a-three-dimensional-steam-injection-experiment
    Dataset updated
    Aug 8, 2019
    Dataset provided by
    Energy Data Exchange
    Description

    The experimental results and CT scans obtained during a steam-flooding experiment with the SUPRI 3-D steam injection laboratory model are compared with the results obtained from a numerical simulator for the same experiment. Simulation studies were carried out using the STARS (Steam and Additives Reservoir Simulator) compositional simulator. The saturation and temperature distributions obtained and heat loss rates measured in the experimental model at different stages of steam-flooding were compared with those calculated from the numerical simulator. There is a fairly good agreement between the experimental results and the simulator output. However, the experimental scans show a greater degree of gravity override than that obtained with the simulator for the same heat-loss rates. Symmetric sides of the experimental 5-spot show asymmetric heat-loss rates contrary to theory and simulator results. Some utility programs have been written for extracting, processing and outputting the required grid data from the STARS simulator. These are general in nature and can be useful for other STARS users.


