Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
based on the ESP32 hardware.
This dataset was created by JAYAPRAKASHPONDY
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides electricity consumption data collected from the building management system of GreEn-ER. This building, located in Grenoble, hosts the Grenoble-INP Ense³ Engineering School and the G2ELab (Grenoble Electrical Engineering Laboratory). It brings together in one place the teaching and research actors around new energy technologies. The electricity consumption of the building is closely monitored with more than 300 meters. The data from each meter is available in one csv file, which contains two columns: one contains the Timestamp and the other contains the electricity consumption in kWh. The sampling rate for all data is 10 min. Data are available for 2017 and 2018. The dataset also contains external temperature data for 2017 and 2018. The files are structured as follows:
- The main folder called "Data" contains 2 sub-folders, each one corresponding to one year (2017 and 2018).
- Each sub-folder contains 3 other sub-folders, each one corresponding to a sector of the building.
- The main folder "Data" also contains the csv files with the electricity consumption data of the whole building and a file called "Temp.csv" with the temperature data.
- The separator used in the csv files is ";".
- The sampling rate is 10 min and the unit of the consumption is kWh. Each sample therefore corresponds to the energy consumed during those 10 minutes. To retrieve the mean power in kW over the period covered by a sample, multiply its value by 6.
- Four Jupyter Notebook files, a format that allows combining text, graphics and Python code, are also available. These files allow exploring all the data within the dataset.
- These Jupyter Notebook files contain all the metadata necessary for understanding the system, such as drawings of the system design and of the building.
- Each file is named by the number of its meter. These numbers can be retrieved from tables and drawings available in the Jupyter Notebooks.
- A couple of csv files with the system design are also available. They are called "TGBT1_n.csv", "TGBT2_n.csv" and "PREDIS-MHI_n.csv".
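The conversion from 10-minute energy samples to mean power described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the header names and the sample values are assumptions made for the example and are not taken from the dataset.

```python
import csv
import io

# Hypothetical excerpt of one meter file: two columns separated by ";".
# The header names and values are assumptions; check the real files first.
SAMPLE = """Timestamp;Consumption
2017-01-01 00:00:00;1.5
2017-01-01 00:10:00;2.0
2017-01-01 00:20:00;0.75
"""

SAMPLES_PER_HOUR = 6  # 10-minute sampling gives 6 samples per hour


def mean_power_kw(csv_text):
    """Return (timestamp, mean power in kW) for each 10-minute energy sample in kWh."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=";")
    next(reader)  # skip the header row
    return [(ts, float(kwh) * SAMPLES_PER_HOUR) for ts, kwh in reader]


power = mean_power_kw(SAMPLE)
for ts, kw in power:
    print(ts, kw)
```

For a real meter file, the same function could be applied to the file contents read with `open(path).read()`, provided the file has a single header row.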
vitaliy-sharandin/energy-consumption-hourly-spain dataset hosted on Hugging Face and contributed by the HF Datasets community
Abstract: Measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years. Different electrical quantities and some sub-metering values are available.
Data Set Characteristics | Number of Instances | Area | Attribute Characteristics | Number of Attributes | Date Donated | Associated Tasks | Missing Values |
---|---|---|---|---|---|---|---|
Multivariate, Time-Series | 2075259 | Physical | Real | 9 | 2012-08-30 | Regression, Clustering | Yes |
Source: Georges Hebrail (georges.hebrail '@' edf.fr), Senior Researcher, EDF R&D, Clamart, France; Alice Berard, TELECOM ParisTech Master of Engineering Internship at EDF R&D, Clamart, France
Data Set Information: This archive contains 2075259 measurements gathered in a house located in Sceaux (7 km from Paris, France) between December 2006 and November 2010 (47 months). Notes:
- (global_active_power*1000/60 - sub_metering_1 - sub_metering_2 - sub_metering_3) represents the active energy consumed every minute (in watt-hour) in the household by electrical equipment not measured in sub-meterings 1, 2 and 3.
- The dataset contains some missing values in the measurements (nearly 1.25% of the rows). All calendar timestamps are present in the dataset, but for some timestamps the measurement values are missing: a missing value is represented by the absence of a value between two consecutive semicolon attribute separators. For instance, the dataset shows missing values on April 28, 2007.
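The formula for the energy not covered by the sub-meterings, together with the missing-value convention, can be sketched as below. The three sample rows are invented for illustration, and the column names follow the attribute list in this description; the header of the actual file may use different capitalization.

```python
import csv
import io

# Hypothetical three-row excerpt; the second row has missing measurements,
# shown as empty fields between consecutive semicolons.
SAMPLE = """date;time;global_active_power;global_reactive_power;voltage;global_intensity;sub_metering_1;sub_metering_2;sub_metering_3
16/12/2006;17:24:00;4.2;0.4;234.8;18.4;0;1;17
28/04/2007;00:21:00;;;;;;;
28/04/2007;00:22:00;1.2;0.1;240.0;5.0;0;0;10
"""

FIELDS = ["global_active_power", "sub_metering_1", "sub_metering_2", "sub_metering_3"]


def unmetered_wh(row):
    """Active energy (Wh) used in one minute outside sub-meterings 1 to 3, or None if missing."""
    if any(row[name] == "" for name in FIELDS):
        return None
    # Minute-averaged power in kW converted to energy in Wh for that minute.
    total_wh = float(row["global_active_power"]) * 1000 / 60
    return total_wh - sum(float(row[name]) for name in FIELDS[1:])


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
other = [unmetered_wh(r) for r in rows]
print(other)
```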
Attribute Information:
- date: date in format dd/mm/yyyy
- time: time in format hh:mm:ss
- global_active_power: household global minute-averaged active power (in kilowatt)
- global_reactive_power: household global minute-averaged reactive power (in kilowatt)
- voltage: minute-averaged voltage (in volt)
- global_intensity: household global minute-averaged current intensity (in ampere)
- sub_metering_1: energy sub-metering No. 1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered).
- sub_metering_2: energy sub-metering No. 2 (in watt-hour of active energy). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.
- sub_metering_3: energy sub-metering No. 3 (in watt-hour of active energy). It corresponds to an electric water-heater and an air-conditioner.
Relevant Papers: N/A
Citation Request: This dataset is made available under the "Creative Commons Attribution 4.0 International (CC BY 4.0)" license
The BuildingsBench datasets consist of:
- Buildings-900K: A large-scale dataset of 900K buildings for pretraining models on the task of short-term load forecasting (STLF). Buildings-900K is statistically representative of the entire U.S. building stock.
- 7 real residential and commercial building datasets for benchmarking two downstream tasks evaluating generalization: zero-shot STLF and transfer learning for STLF.
Buildings-900K can be used for pretraining models on day-ahead STLF for residential and commercial buildings. The specific gap it fills is the lack of large-scale and diverse time series datasets of sufficient size for studying pretraining and finetuning with scalable machine learning models. Buildings-900K consists of synthetically generated energy consumption time series. It is derived from the NREL End-Use Load Profiles (EULP) dataset (see link to this database in the links further below). However, the EULP was not originally developed for the purpose of STLF. Rather, it was developed to "...help electric utilities, grid operators, manufacturers, government entities, and research organizations make critical decisions about prioritizing research and development, utility resource and distribution system planning, and state and local energy planning and regulation." Similar to the EULP, Buildings-900K is a collection of Parquet files and it follows nearly the same Parquet dataset organization as the EULP. As it only contains a single energy consumption time series per building, it is much smaller (~110 GB).
BuildingsBench also provides an evaluation benchmark that is a collection of various open source residential and commercial real building energy consumption datasets. The evaluation datasets, which are provided alongside Buildings-900K below, are collections of CSV files which contain annual energy consumption. The size of the evaluation datasets altogether is less than 1 GB, and they are listed below:
- ElectricityLoadDiagrams20112014
- Building Data Genome Project-2
- Individual household electric power consumption (Sceaux)
- Borealis
- SMART
- IDEAL
- Low Carbon London
A README file providing details about how the data is stored and describing the organization of the datasets can be found within each data lake version under BuildingsBench.
Users can generate reports showing the amount of energy consumed by geographical area and by sector classification (residential, commercial, industrial). The database also allows energy consumption data to be downloaded easily in the comma-separated values (CSV) file format.
Detailed household load and solar generation in minutely to hourly resolution. This data package contains measured time series data for several small businesses and residential households relevant for household- or low-voltage-level power system modeling. The data includes solar power generation as well as electricity consumption (load) in a resolution up to single device consumption. The starting point for the time series, as well as data quality, varies between households, with gaps spanning from a few minutes to entire days. All measurement devices provided cumulative energy consumption/generation over time. Hence overall energy consumption/generation is retained in case of data gaps due to communication problems. Measurements were conducted at 1-minute intervals, with all data made available in an interpolated, uniform and regular time interval. All data gaps are either interpolated linearly or filled with data of prior days. Additionally, data in 15- and 60-minute resolution is provided for compatibility with other time series data. Data processing is conducted in Jupyter Notebooks/Python/pandas.
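Because the meters report cumulative energy, a gap can be bridged without losing the total consumed across it. The sketch below illustrates linear interpolation of cumulative readings followed by differencing; it is an assumption-laden illustration in plain Python, not the package's actual pandas processing, and the readings are invented.

```python
def interpolate_cumulative(readings):
    """Linearly fill None gaps that lie between two known cumulative readings (kWh)."""
    filled = list(readings)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled  # leading or trailing gaps, if any, stay as None


def interval_consumption(cumulative):
    """Energy used in each interval, as differences of consecutive readings."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


# Hypothetical 1-minute cumulative readings with a two-sample gap.
readings = [10.0, 10.5, None, None, 12.0, 12.25]
filled = interpolate_cumulative(readings)
per_minute = interval_consumption(filled)
print(filled)
print(per_minute)
```

The sum of the per-interval values equals the difference between the last and first cumulative readings, which is the property the description relies on when it says overall consumption is retained.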
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
and different customers have different starting times
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains key characteristics about the data described in the Data Descriptor ECD-UY: Detailed household electricity consumption dataset of Uruguay. Contents:
1. human readable metadata summary table in CSV format
2. machine readable metadata file in JSON format
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Name: GoiEner smart meters data
Summary: The dataset contains hourly time series of electricity consumption (kWh) provided by the Spanish electricity retailer GoiEner. The time series are arranged in four compressed files:
raw.tzst, contains raw time series of all GoiEner clients (any date, any length, may have missing samples).
imp-pre.tzst, contains processed time series (imputation of missing samples), longer than one year, collected before March 1, 2020.
imp-in.tzst, contains processed time series (imputation of missing samples), longer than one year, collected between March 1, 2020 and May 30, 2021.
imp-post.tzst, contains processed time series (imputation of missing samples), longer than one year, collected after May 30, 2021.
metadata.csv, contains relevant information for each time series.
License: CC-BY-SA
Acknowledge: These data have been collected in the framework of the WHY project. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 891943.
Disclaimer: The sole responsibility for the content of this publication lies with the authors. It does not necessarily reflect the opinion of the Executive Agency for Small and Medium-sized Enterprises (EASME) or the European Commission (EC). EASME or the EC are not responsible for any use that may be made of the information contained therein.
Collection Date: From November 2, 2014 to June 8, 2022.
Publication Date: December 1, 2022.
DOI: 10.5281/zenodo.7362094
Other repositories: None.
Author: GoiEner, University of Deusto.
Objective of collection: This dataset was originally used to establish a methodology for clustering households according to their electricity consumption.
Description: The meaning of each column is described next for each file.
raw.tzst: (no column names provided)
timestamp;
electricity consumption in kWh.
imp-pre.tzst, imp-in.tzst, imp-post.tzst:
"timestamp": timestamp;
"kWh": electricity consumption in kWh;
"imputed": binary value indicating whether the row has been obtained by imputation.
metadata.csv:
"user": 64-character hash identifying a user;
"start_date": initial timestamp of the time series;
"end_date": final timestamp of the time series;
"length_days": number of days elapsed between the initial and the final timestamps;
"length_years": number of years elapsed between the initial and the final timestamps;
"potential_samples": number of samples that should be between the initial and the final timestamps of the time series if there were no missing values;
"actual_samples": number of actual samples of the time series;
"missing_samples_abs": number of potential samples minus actual samples;
"missing_samples_pct": potential samples minus actual samples as a percentage;
"contract_start_date": contract start date;
"contract_end_date": contract end date;
"contracted_tariff": type of tariff contracted (2.X: households and SMEs, 3.X: SMEs with high consumption, 6.X: industries, large commercial areas, and farms);
"self_consumption_type": the type of self-consumption to which the users are subscribed;
"p1", "p2", "p3", "p4", "p5", "p6": contracted power (in kW) for each of the six time slots;
"province": province where the user is located;
"municipality": municipality where the user is located (municipalities below 50,000 inhabitants have been removed);
"zip_code": post code (post codes of municipalities below 50,000 inhabitants have been removed);
"cnae": CNAE (Clasificación Nacional de Actividades Económicas) code for economic activity classification.
5 star: ⭐⭐⭐
Preprocessing steps: Data cleaning (imputation of missing values using the Last Observation Carried Forward algorithm using weekly seasons); data integration (combination of multiple SIMEL files, i.e. the data sources); data transformation (anonymization, unit conversion, metadata generation).
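One plausible reading of "Last Observation Carried Forward using weekly seasons" is that a missing hourly sample takes the value observed at the same hour one week earlier. The sketch below illustrates that interpretation only; it is an assumption and may differ from the authors' actual script. The function name and the demo series are hypothetical, and the demo uses a season of 3 samples instead of 168 to stay short.

```python
HOURS_PER_WEEK = 7 * 24  # hourly samples in one weekly season


def locf_weekly(series, season=HOURS_PER_WEEK):
    """Fill None values with the value one season earlier; return (values, imputed flags)."""
    values = list(series)
    imputed = [False] * len(values)
    for i, v in enumerate(values):
        if v is None and i >= season and values[i - season] is not None:
            # The earlier value may itself have been carried forward.
            values[i] = values[i - season]
            imputed[i] = True
    return values, imputed


# Hypothetical series with a short season so the behaviour is easy to follow.
demo = [1.0, 2.0, 3.0, None, 2.5, None, None, None, 3.5]
filled, flags = locf_weekly(demo, season=3)
print(filled)
print(flags)
```

The returned flags play the same role as the "imputed" column of the processed files. Samples missing within the first season have nothing to carry forward and stay empty in this sketch.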
Reuse: This dataset is related to datasets:
"A database of features extracted from different electricity load profiles datasets" (DOI 10.5281/zenodo.7382818), where time series feature extraction has been performed.
"Measuring the flexibility achieved by a change of tariff" (DOI 10.5281/zenodo.7382924), where the metadata has been extended to include the results of a socio-economic characterization and the answers to a survey about barriers to adapt to a change of tariff.
Update policy: There might be a single update in mid-2023.
Ethics and legal aspects: The data provided by GoiEner contained values of the CUPS (Meter Point Administration Number), which are personal data. A pre-processing step has been carried out to replace the CUPS by random 64-character hashes.
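Replacing an identifier by a random 64-character string can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the function name and the example identifiers are made up, and `secrets.token_hex(32)` is simply one standard-library way to draw 64 random hexadecimal characters.

```python
import secrets


def pseudonymize(cups_ids):
    """Map each distinct identifier to a random 64-character hexadecimal string."""
    mapping = {}
    for cups in cups_ids:
        if cups not in mapping:
            # 32 random bytes are rendered as 64 hexadecimal characters.
            mapping[cups] = secrets.token_hex(32)
    return mapping


# Made-up identifiers for illustration only; these are not real CUPS values.
mapping = pseudonymize(["ES-EXAMPLE-0001", "ES-EXAMPLE-0002", "ES-EXAMPLE-0001"])
for original, pseudonym in mapping.items():
    print(original, "->", pseudonym)
```

Because the strings are random rather than derived from the identifier, the original value cannot be recomputed from the pseudonym; the mapping table itself would have to be kept private or discarded.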
Technical aspects:
raw.tzst contains a 15.1 GB folder with 25,559 CSV files;
imp-pre.tzst contains a 6.28 GB folder with 12,149 CSV files;
imp-in.tzst contains a 4.36 GB folder with 15,562 CSV files; and
imp-post.tzst contains a 4.01 GB folder with 17,519 CSV files.
Other: None.
Energy consumption readings for a sample of 5,567 London Households that took part in the UK Power Networks led Low Carbon London project between November 2011 and February 2014.
Readings were taken at half hourly intervals. Households have been allocated to a CACI Acorn group (2010). The customers in the trial were recruited as a balanced sample representative of the Greater London population.
The dataset contains energy consumption, in kWh (per half hour), unique household identifier, date and time, and CACI Acorn group. The CSV file is around 10 GB when unzipped and contains around 167 million rows.
Within the data set are two groups of customers. The first is a sub-group of approximately 1100 customers who were subjected to Dynamic Time of Use (dToU) energy prices throughout the 2013 calendar year. The tariff prices were given a day ahead via the Smart Meter IHD (In Home Display) or by text message to a mobile phone. Customers were issued High (67.20p/kWh), Low (3.99p/kWh) or Normal (11.76p/kWh) price signals and the times of day these applied. The dates/times and the price signal schedule are available as part of this dataset. All non-Time of Use customers were on a flat rate tariff of 14.228p/kWh.
The signals given were designed to be representative of the types of signal that may be used in the future to manage high renewable generation (supply following) operation, and also to test the potential of high price signals to relieve local distribution grids during periods of stress.
The energy consumption readings of the remaining sample of approximately 4500 customers were not subject to the dToU tariff.
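As a worked illustration of the two pricing schemes, the sketch below computes the cost of the same half-hourly consumption under the dToU price signals and under the flat rate. The prices come from the description above; the signal labels used as dictionary keys and the sample readings are assumptions for the example.

```python
# Prices in pence per kWh, as quoted in the description.
DTOU_PRICES = {"High": 67.20, "Low": 3.99, "Normal": 11.76}
FLAT_RATE = 14.228


def dtou_cost_pence(readings):
    """Cost under dToU pricing; readings are (kWh in a half hour, price signal) pairs."""
    return sum(kwh * DTOU_PRICES[signal] for kwh, signal in readings)


def flat_cost_pence(readings):
    """Cost of the same consumption under the flat rate tariff."""
    return sum(kwh for kwh, _ in readings) * FLAT_RATE


# Hypothetical half-hourly readings for one household.
day = [(0.5, "Normal"), (1.0, "High"), (2.0, "Low")]
dtou = dtou_cost_pence(day)
flat = flat_cost_pence(day)
print(round(dtou, 3), round(flat, 3))
```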
More information can be found on the Low Carbon London webpage
Some analysis of this data can be seen here.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is the result of a multi-stage processing pipeline applied to raw SIMEL (Sistema de Medidas Eléctricas) files. The pipeline involves splitting the raw SIMEL files into user-specific files, generating intermediate raw consumption time series, and finally imputing missing consumption values. The files included in this dataset are the end results of several of the processing scripts. The scripts used to generate these files can be found on GitHub: https://github.com/quesadagranja/GoiEner-v7. Detailed descriptions of each file and their contents are provided below.
imputed_goiener_v7.tar.zst
simel.tar.zst
raw_goiener_v7.tar.zst
metadata.csv
imputed_samples.csv
This dataset was created by MD. Mehedi Hassan Galib
CAMSL is the first public dataset for a Time-Of-Use (TOU) tariff intervention study using smart-meter data including pre, during and post TOU intervention periods. It includes 1423 households (1023 TOU users and 400 Non-TOU users) in Tokyo between 1st July 2017 and 31st December 2018 (18 months). The dataset also includes raw data of 3337 customers who did not participate in the TOU trial. Each day has 48 half-hourly data points for energy consumption from a smart meter and each household has 579 days between 1 July 2017 and 31 December 2018, comprising a total of 27792 data points for electricity consumption obtained at each household for this dataset. The uniqueness of this dataset is the included online engagement data recorded via web-application usage, which enables further studies related to gamification effects.
- consumption_data.zip: half-hourly consumption data from 1st June 2017 to 31st December 2018
- customer_info.csv: customer information (house_type, number_of_residents, tou), where TOU users == 1 and Control users == 0. As for the selection of Control users, please refer to the article.
- web_info.csv: web activity information (for TOU customers) (sessions, average_session_duration, bounce_rate), already padded if the value is missing.
- temperature_Tokyo.csv: hourly temperature data in Tokyo from 1st June 2017 to 31st December 2018
- holidays.csv: Japanese national holidays
- non_tou.csv.gz: raw consumption data of a total of 3337 customers who did not participate in the TOU trial
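The per-household sample count quoted for this dataset can be reproduced with a short calculation. The sketch assumes the count runs from the 1st June 2017 start date listed for consumption_data.zip and includes both end dates; the variable names are illustrative.

```python
from datetime import date

POINTS_PER_DAY = 48  # half-hourly readings

start, end = date(2017, 6, 1), date(2018, 12, 31)
days = (end - start).days + 1  # count both the first and the last day
points = days * POINTS_PER_DAY
print(days, points)
```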
https://cdla.io/permissive-1-0
The dataset consists of energy consumption and weather data collected continuously during a period of 29 months at Challenger building (France).
Energy consumption data at 10-minute intervals include:
Weather data include:
The data set contains
- Timeseries as CSV files
- Timeseries metadata (description, unit, type,...) as JSON file
The data have been made available from the BMS database, thanks to the BEMServer open-source platform - www.bemserver.org, developed in the HIT2GAP project that has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement n. 680708 - www.hit2gap.eu.
This dataset contains energy usage and total greenhouse gas emissions for city facilities/operations during the year 2013.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a Residential PV generation and consumption data set from an Estonian house. At the time of submission, one year (2023) of data was available. The data was logged at a 10-second resolution. The untouched dataset can be found in the raw data folder, which is separated month-wise. A few missing points in the dataset were filled with a simple KNN algorithm. However, improved data imputation methods based on machine learning are also possible. To carry out the imputing, run the scripts in the script folder one by one in the numerical serial order (SC1..py, SC2..py, etc.).
Data Descriptor (Scientific Data): https://doi.org/10.1038/s41597-025-04747-w
General Information:
Duration: January 2023 to December 2023
Resolution: 10 seconds
Dataset Type: Aggregated consumption and PV generation data
Logging Device: Camile Bauer PQ1000 (×2)
Load/Appliance Information:
Measurement Points:
Measured Parameters:
Script Description:
SC1_PV_auto_sort.py : This fixes timestamp continuity by resampling at the original sampling rate for PV generation data.
SC2_L2_auto_sort.py : This fixes timestamp continuity by resampling at the original sampling rate for meter-side measurement data.
SC3_PV_KNN_impute.py : Filling missing data points by simple KNN for PV generation data.
SC4_L2_KNN_impute.py : Filling missing data points by simple KNN for meter-side measurement data.
SC5_Final_data_gen.py : Merge PV and meter-side measurement data, and calculate load consumption.
The dataset provides all the outcomes (CSV files) from the scripts. All processed variables (PV generation, load, power import, and export) are expressed in kW units.
Update: 'SC1_PV_auto_sort.py' & 'SC2_L2_auto_sort.py' are adequate for cleaning up data and making the missing point visible. 'SC3_PV_KNN_impute.py' & 'SC4_L2_KNN_impute.py' work fine for short-range missing data points; however, these two scripts won't help much for missing data points for a longer period. They are provided as examples of one method of processing data. Future updates will include proper ML-based forecasting to predict missing data points.
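The idea of restoring timestamp continuity so that missing points become visible can be sketched as below. This is not the code of the SC1 or SC2 scripts; it is a simplified illustration that assumes the timestamps already fall on the 10-second grid, and the function name and readings are hypothetical.

```python
from datetime import datetime, timedelta

STEP = timedelta(seconds=10)  # logging resolution stated in the description


def regularize(samples, step=STEP):
    """Place (timestamp, value) samples on a regular grid; missing slots become None."""
    by_time = dict(samples)
    start, end = min(by_time), max(by_time)
    grid = []
    t = start
    while t <= end:
        grid.append((t, by_time.get(t)))
        t += step
    return grid


t0 = datetime(2023, 1, 1, 0, 0, 0)
# Hypothetical PV power readings in kW with one missing 10-second sample.
pv = [(t0, 1.0), (t0 + STEP, 1.2), (t0 + 3 * STEP, 1.1)]
pv_grid = regularize(pv)
for t, value in pv_grid:
    print(t.isoformat(), value)
```

Once the gaps are explicit, an imputation step such as the KNN filling mentioned above can be applied to the None entries.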
Funding Agency and Grant Number: