Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This fictional sales dataset was created using R code for the purpose of visualizing trends in customer demographics, product performance, and sales over time. A link to my GitHub repository, containing all the code used to generate the data frame and all the preceding processing steps, can be found here
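The linked repository is not reproduced here, so the following is only a guess at how such a fictional dataset might be generated; all field names and value ranges are assumptions (the original used R, this sketch uses Python):

```python
import random

# Hedged sketch: generate a small fictional sales table.
# Field names ("customer_gender", "sale_amount", etc.) are assumptions,
# not taken from the actual repository.
random.seed(42)  # fixed seed so the fictional data is reproducible
genders = ["F", "M"]
products = ["Widget", "Gadget"]
rows = [
    {
        "customer_gender": random.choice(genders),
        "customer_age": random.randint(18, 70),
        "product": random.choice(products),
        "sale_amount": round(random.uniform(5.0, 500.0), 2),
    }
    for _ in range(100)
]
```

With a fixed seed, the same fictional records are produced on every run, which makes downstream visualizations reproducible.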
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Video Games Sales Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sidtwr/videogames-sales-dataset on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Motivated by Gregory Smith's web scrape of VGChartz Video Games Sales, this data set simply extends the number of variables with another web scrape from Metacritic. Unfortunately, there are missing observations, as Metacritic only covers a subset of the platforms. Also, a game may not have all the observations of the additional variables discussed below. Complete cases number ~6,900.
Alongside the fields Name, Platform, Year_of_Release, Genre, Publisher, NA_Sales, EU_Sales, JP_Sales, Other_Sales, and Global_Sales, we have:
Critic_score - Aggregate score compiled by Metacritic staff
Critic_count - The number of critics used in coming up with the Critic_score
User_score - Score by Metacritic's subscribers
User_count - Number of users who gave the User_score
Developer - Party responsible for creating the game
Rating - The ESRB rating
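Given the missing Metacritic observations noted above, a quick way to count complete cases is to drop rows with any missing values. A minimal sketch with pandas, using a hypothetical two-row sample (field names follow the description; the records are invented):

```python
import io
import pandas as pd

# Invented two-row sample: Game A has full Metacritic coverage, Game B does not.
csv = io.StringIO(
    """Name,Platform,Critic_score,Critic_count,User_score,User_count,Developer,Rating
Game A,PS4,85,40,8.1,200,Dev X,E
Game B,PC,,,,,Dev Y,T
"""
)
df = pd.read_csv(csv)
complete = df.dropna()  # rows with no missing values; ~6,900 in the full dataset
```

On the real file, `len(complete)` would approximate the ~6,900 complete cases mentioned above.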
This repository, https://github.com/wtamu-cisresearch/scraper, worked extremely well after a few adjustments!
It would be interesting to see any machine learning techniques or continued data visualisations applied to this data set.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description of the data and file structure
We are providing 5 datasets in TSF (time series format). All files include the series name, start and end timestamps, product type, and the time series values.
NOTE: These datasets have values updated until May 2025. To obtain updated datasets, consider using the StreamFuels package, available at https://pypi.org/project/streamfuels/
Files and variables
File: yearly_fuel_sales_by_state.tsf
Description: comprises 216 time series capturing the yearly historical sales of eight fuel products – ethanol, regular gasoline (gasoline-r), aviation gasoline (gasoline-a), liquefied petroleum gas (LPG), aviation kerosene (kerosene-a), illuminating kerosene (kerosene-i), fuel oil, and diesel – across 27 Brazilian states. Among these time series, 135 have records dating back to 1947, while 27 series begin in 1953, another 27 in 1959, and the remaining 27 in 1980.
File: monthly_oilgas_operations_by_state.tsf
Description: comprises 76 time series capturing the monthly records of five types of industrial operations – production, reinjection, flaring, self-consumption, and availability – for three products: natural gas, petroleum, and natural gas liquid (NGL), across the 27 Brazilian states. Only natural gas includes all five operations, while petroleum and NGL are limited to the production operation. Of the total, 31 series started in 1997, while the remaining 45 date back to 2000.
File: yearly_fuel_sales_by_city.tsf
Description: comprises 29,282 time series capturing the yearly sales of eight fuels and asphalt (a petroleum derivative) across 5,325 Brazilian cities. Most of the series begin in 1990 or 1992, though some recent series began in 2018.
File: fuel_type_classification.tsf
Description: comprises 14,032 time series, each with a fixed length of 12 observations (i.e., one year of sales) and eight possible class labels. Among the five datasets, it is the only one that is labeled, with each series associated with a specific fuel type.
File: monthly_fuel_sales_by_state.tsf
Description: comprises 216 time series capturing the monthly historical sales of eight fuel products across Brazil's 27 states. All the time series date back to 1990.
Code/software
All the code to collect and preprocess the data is available at https://github.com/lucas-castrow/streamfuels/
Access information
Other publicly accessible locations of the data:
https://github.com/lucas-castrow/datasets_streamfuels
Data was derived from the following sources:
https://www.gov.br/anp/en/
https://www.gov.br/anp/pt-br/centrais-de-conteudo/dados-abertos/vendas-de-derivados-de-petroleo-e-biocombustiveis
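The TSF files above are intended to be read with the StreamFuels package; as a fallback, a single data line can also be parsed by hand. This sketch assumes a Monash-style TSF convention (colon-separated series attributes followed by comma-separated values); the exact attribute order here is an assumption, not taken from the files themselves:

```python
# Minimal sketch of parsing one TSF data line, assuming the Monash-style
# convention: colon-separated attributes, then comma-separated values.
def parse_tsf_line(line, n_attrs=4):
    parts = line.strip().split(":")
    attrs = parts[:n_attrs]  # e.g. series name, start, end, product type (assumed order)
    values = [float(v) for v in parts[n_attrs].split(",")]
    return attrs, values

# Invented example line, not a real record from these datasets.
attrs, values = parse_tsf_line("diesel_SP:1990-01:2025-05:diesel:1.5,2.0,2.5")
```

For real use, prefer the StreamFuels package or a dedicated TSF loader, since header lines and attribute schemas vary between files.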
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Video Game Sales’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/gregorut/videogamesales on 12 November 2021.
--- Dataset description provided by original source is as follows ---
This dataset contains a list of video games with sales greater than 100,000 copies. It was generated by a scrape of vgchartz.com.
Fields include
Rank - Ranking of overall sales
Name - The game's name
Platform - Platform of the game's release (e.g., PC, PS4)
Year - Year of the game's release
Genre - Genre of the game
Publisher - Publisher of the game
NA_Sales - Sales in North America (in millions)
EU_Sales - Sales in Europe (in millions)
JP_Sales - Sales in Japan (in millions)
Other_Sales - Sales in the rest of the world (in millions)
Global_Sales - Total worldwide sales.
The script used to scrape the data is available at https://github.com/GregorUT/vgchartzScrape. It is written in Python and based on BeautifulSoup. There are 16,598 records; two records were dropped due to incomplete information.
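As a quick sanity check on the fields listed above, the dataset can be loaded and aggregated with pandas. The three rows below are invented for illustration, not real records:

```python
import io
import pandas as pd

# Invented three-row sample using the documented fields (sales in millions).
csv = io.StringIO(
    """Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
1,Game A,PS4,2015,Action,Pub X,1.0,0.5,0.2,0.1,1.8
2,Game B,PC,2016,Action,Pub Y,0.5,0.4,0.0,0.1,1.0
3,Game C,PS4,2014,Sports,Pub Z,0.3,0.2,0.1,0.0,0.6
"""
)
df = pd.read_csv(csv)
# Total worldwide sales per genre, in millions of copies.
by_genre = df.groupby("Genre")["Global_Sales"].sum()
```

The same pattern works on the full 16,598-record file once downloaded from Kaggle.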
--- Original source retains full ownership of the source dataset ---
This dataset contains raw GIS data sourced from the BAG (Basisregistratie Adressen en Gebouwen; Registry of Addresses and Buildings). It provides comprehensive information on buildings, including advanced height data and administrative details. It also contains geographic divisions within The Hague. Additionally, the dataset incorporates energy label data, offering insights into the energy efficiency and performance of these buildings. This combined dataset serves as the backbone of a Master's thesis in Industrial Ecology, analysing residential and office cooling and its environmental impacts in The Hague, Netherlands. The codebase of this analysis can be found in this Github repository: https://github.com/simonvanlierde/msc-thesis-ie
The dataset includes a background research spreadsheet containing supporting calculations. It also presents geopackages with results from the cooling demand model (CDM) for various scenarios: Status quo (SQ), 2030, and 2050 scenarios (Low, Medium, and High).
The background_research_data.xlsx spreadsheet contains comprehensive background research calculations supporting the shaping of input parameters used in the model. It contains several sheets:
Geographic divisions
BAG data
3D BAG
Energy labels
UHI effect data
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Hourly Electricity Demand by State
This archive contains the output of the Public Utility Data Liberation (PUDL) Project state electricity demand allocation analysis, as of the v0.4.0 release of the PUDL Python package. Here is the script that produced this output. It was run using the Docker container and processed data that are included in PUDL Data Release v2.0.0.
The analysis uses hourly electricity demand reported at the balancing authority and utility level in the FERC 714 (data archive), and service territories for utilities and balancing authorities inferred from the counties served by each utility, and the utilities that make up each balancing authority in the EIA 861 (data archive), to estimate the total hourly electricity demand for each US state.
We used the total electricity sales by state reported in the EIA 861 as a scaling factor to ensure that the magnitude of electricity sales is roughly correct, and obtained the shape of the demand curve from the hourly planning area demand reported in the FERC 714. The scaling is necessary partly due to imperfections in the historical utility and balancing authority service territory maps that we have been able to reconstruct from the data reported in the EIA 861 Service Territories and Balancing Authority tables.
The compilation of historical service territories based on the EIA 861 data is somewhat manual and could be improved, but overall the results seem reasonable. Additional predictive spatial variables will be required to obtain more granular electricity demand estimates (e.g. at the county level).
FERC 714 Respondents
The file ferc714_respondents.csv links FERC Form 714 respondents to what we believe to be their corresponding EIA utilities or balancing authorities.
eia_code: An integer ID reported in the FERC Form 714 corresponding to the respondent's EIA ID. In some cases this is a Utility ID, and in others it is a Balancing Authority ID, but which one is not specified, so we have had to infer the type of entity responding. Note that in many cases the same company acts as both a utility and a balancing authority, and the integer ID associated with the company is often the same in both roles, but it does not need to be.
respondent_type: Either balancing_authority or utility, depending on which type of entity we believe was responding to the FERC 714.
respondent_id_ferc714: The integer ID of the responding entity within the FERC 714.
respondent_name_ferc714: The name provided by the respondent in the FERC 714.
balancing_authority_id_eia: If the respondent was identified as a balancing authority, the EIA ID for that balancing authority, taken from the EIA Form 861.
balancing_authority_code_eia: If the respondent was identified as a balancing authority, the EIA short code used to identify the balancing authority, taken from the EIA Form 861.
balancing_authority_name_eia: If the respondent was identified as a balancing authority, the name of the balancing authority, taken from the EIA Form 861.
utility_id_eia: If the respondent was identified as a utility, the EIA utility ID, taken from the EIA Form 861.
utility_name_eia: If the respondent was identified as a utility, the name of the utility, taken from the EIA 861.
FERC 714 Respondent Service Territories
The file ferc714_service_territories.csv describes the historical service territories for FERC 714 respondents for the years 2006-2019. For each respondent and year, their service territory is composed of a collection of counties, identified by their 5-digit FIPS codes. The file contains the following columns, with each row associating a single county with a FERC 714 respondent in a particular year:
respondent_id_ferc714: The FERC Form 714 respondent ID, which is also found in ferc714_respondents.csv.
report_date: The first day of the year for which the service territory is being described.
state: Two-letter abbreviation for the state containing the county, for human readability.
county: The name of the county, for human readability.
state_id_fips: The 2-digit FIPS state code.
county_id_fips: The 5-digit FIPS county code, for use with other geospatial data resources, like the US Census DP1 geodatabase.
State Hourly Electricity Demand Estimates
The file demand.csv contains hourly electricity demand estimates for each US state from 2006-2019. It contains the following columns:
state_id_fips: The 2-digit FIPS state code.
utc_datetime: UTC time at hourly resolution.
demand_mwh: Electricity demand for that state and hour in MWh. This is an allocation of the electricity demand reported directly in the FERC Form 714.
scaled_demand_mwh: Estimated total electricity demand for that state and hour, in MWh. This is the reported FERC Form 714 hourly demand scaled up or down linearly such that the total annual electricity demand matches the total annual electricity sales reported at the state level in the EIA Form 861.
A collection of plots is also included, comparing the original and scaled demand time series for each state.
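The linear scaling described for scaled_demand_mwh amounts to multiplying each hourly value by the ratio of annual EIA 861 sales to the annual sum of allocated FERC 714 demand. A minimal illustration with invented numbers (three "hours" standing in for a full year):

```python
# Sketch of the linear scaling described above: preserve the shape of the
# hourly demand curve while forcing its annual total to match reported sales.
def scale_demand(hourly_demand_mwh, annual_sales_mwh):
    total = sum(hourly_demand_mwh)  # annual total of allocated FERC 714 demand
    factor = annual_sales_mwh / total  # single linear scaling factor
    return [d * factor for d in hourly_demand_mwh]

# Invented numbers: allocated demand sums to 500 MWh, reported sales are 600 MWh.
scaled = scale_demand([100.0, 150.0, 250.0], annual_sales_mwh=600.0)
# scaled now sums to 600 MWh (up to floating point), with the same curve shape.
```

In the actual analysis the factor is computed per state and year, so each state's curve keeps its shape while its annual total matches the EIA 861 sales.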
Acknowledgements
This analysis was funded largely by GridLab, and done in collaboration with researchers at the Lawrence Berkeley National Laboratory, including Umed Paliwal and Nikit Abhyankar.
The data screening methods were originally designed to identify unrealistic data in the electricity demand time series reported to EIA on Form 930, and have been applied here to data from the FERC Form 714.
They are adapted from code published and modified by:
And described at:
The imputation methods were designed for multivariate time series forecasting.
They are adapted from code published by:
And described at:
About PUDL & Catalyst Cooperative
For additional information about this data and PUDL, see the following resources: