Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The methodology is the core component of any research work: it describes the methods used to obtain the results. The entire implementation here is done in Python, and the work proceeds through the following steps:
1. Acquire Personality Dataset
Kaggle hosts a collection of datasets and data generators used by the machine learning community for analysis. The personality prediction dataset was acquired from the Kaggle website. It was collected (2016-2018) through an interactive online personality test constructed from the IPIP (International Personality Item Pool). The dataset can be downloaded as a zip file by clicking the link provided, and it consists of two CSV files (test.csv & train.csv). The test.csv file has 0 missing values, 7 attributes, and a final label output, and the dataset has multivariate characteristics. Data preprocessing is then performed to check for inconsistent values or trends.
2. Data preprocessing
After data acquisition, the next step is to clean and preprocess the data. The dataset contains numerical features, and the target is a five-level personality label: serious, lively, responsible, dependable & extraverted. The preprocessed dataset is split into training and testing sets by passing the feature values, target values, and test size to the train_test_split method of the scikit-learn package. The training data is then used to fit the Logistic Regression & SVM models, and the test data is used to estimate the accuracy of the trained models.
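A minimal sketch of this split step (the file and column names below are assumptions for illustration, not taken from the actual CSVs):

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv")            # assumed file name
X = df.drop(columns=["Personality"])     # feature values (assumed target column name)
y = df["Personality"]                    # five-level target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)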
3. Feature Extraction
The following items were presented on one page and each was rated on a five-point scale using radio buttons. The order on the page was EXT1, AGR1, CSN1, EST1, OPN1, EXT2, etc. The scale was labeled 1=Disagree, 3=Neutral, 5=Agree.
EXT1 I am the life of the party.
EXT2 I don't talk a lot.
EXT3 I feel comfortable around people.
EXT4 I am quiet around strangers.
EST1 I get stressed out easily.
EST2 I get irritated easily.
EST3 I worry about things.
EST4 I change my mood a lot.
AGR1 I have a soft heart.
AGR2 I am interested in people.
AGR3 I insult people.
AGR4 I am not really interested in others.
CSN1 I am always prepared.
CSN2 I leave my belongings around.
CSN3 I follow a schedule.
CSN4 I make a mess of things.
OPN1 I have a rich vocabulary.
OPN2 I have difficulty understanding abstract ideas.
OPN3 I do not have a good imagination.
OPN4 I use difficult words.
4. Training the Model
Train/test is a method to measure the accuracy of your model. It is called train/test because you split the data set into two sets: a training set and a testing set, here 80% for training and 20% for testing. You train the model using the training set. In this work the models were trained using linear_model.LogisticRegression() & svm.SVC() from the sklearn package.
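A sketch of this training step, continuing from the split above (default hyperparameters; not necessarily the exact settings used):

from sklearn import linear_model, svm

log_reg = linear_model.LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)

svc = svm.SVC()
svc.fit(X_train, y_train)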
5. Personality Prediction Output
After training, the Logistic Regression & SVM models are evaluated on the test data using cohen_kappa_score & accuracy_score.
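A sketch of this evaluation step, assuming the models and split from the sketches above:

from sklearn.metrics import accuracy_score, cohen_kappa_score

for name, model in [("Logistic Regression", log_reg), ("SVM", svc)]:
    preds = model.predict(X_test)
    print(name,
          "accuracy:", accuracy_score(y_test, preds),
          "kappa:", cohen_kappa_score(y_test, preds))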
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset contains 2,245 cleaned ageing test traces (time vs. MPPT PCE, i.e. maximum power point tracking power conversion efficiency) for perovskite solar cells with various device stacks and architectures, in pickle (.pkl) format.
The dataset can be loaded with the following commands in Python.
import pickle5 as pickle
import pandas as pd
import numpy as np

with open('20230303_mySeriesDrop.pkl', "rb") as fh:
    mySeriesDrop = pickle.load(fh)
The following command can be used to call a specific row (row 0) within the dataset.
mySeriesDrop[0]
The next steps in using the dataset are scaling/normalisation (for instance with sklearn.preprocessing.MaxAbsScaler) and smoothing (for instance with a Savitzky-Golay filter).
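A sketch of these two steps for a single trace, assuming each entry of mySeriesDrop is a pandas Series of PCE values (the window length and polynomial order below are illustrative choices, not values from the published analysis):

import numpy as np
from sklearn.preprocessing import MaxAbsScaler
from scipy.signal import savgol_filter

trace = mySeriesDrop[0].to_numpy().reshape(-1, 1)      # one ageing trace as a column vector
scaled = MaxAbsScaler().fit_transform(trace).ravel()   # scale by the maximum absolute value
smoothed = savgol_filter(scaled, window_length=51, polyorder=3)  # Savitzky-Golay smoothing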
The code to run the complete analysis, including self-organising map clustering, can be accessed here: https://doi.org/10.5281/zenodo.8181602.
CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset presents detailed energy consumption records from various households over a month. With 90,000 rows and features such as temperature, household size, air conditioning usage, and peak-hour consumption, it is well suited to time-series analysis, machine learning, and sustainability research.
| Column Name | Data Type Category | Description |
|---|---|---|
| Household_ID | Categorical (Nominal) | Unique identifier for each household |
| Date | Datetime | The date of the energy usage record |
| Energy_Consumption_kWh | Numerical (Continuous) | Total energy consumed by the household in kWh |
| Household_Size | Numerical (Discrete) | Number of individuals living in the household |
| Avg_Temperature_C | Numerical (Continuous) | Average daily temperature in degrees Celsius |
| Has_AC | Categorical (Binary) | Indicates if the household has air conditioning (Yes/No) |
| Peak_Hours_Usage_kWh | Numerical (Continuous) | Energy consumed during peak hours in kWh |
| Library | Purpose |
|---|---|
| pandas | Reading, cleaning, and transforming tabular data |
| numpy | Numerical operations, working with arrays |
| matplotlib | Creating static plots (line, bar, histograms, etc.) |
| seaborn | Statistical visualizations, heatmaps, boxplots, etc. |
| plotly | Interactive charts (time series, pie, bar, scatter, etc.) |
| scikit-learn | Preprocessing, regression, classification, clustering |
| xgboost / lightgbm | Gradient boosting models for better accuracy |
| sklearn.preprocessing | Encoding categorical features, scaling, normalization |
| datetime / pandas | Date-time conversion and manipulation |
| sklearn.metrics | Accuracy, MAE, RMSE, R² score, confusion matrix, etc. |
✅ These libraries provide a complete toolkit for performing data analysis, modeling, and visualization tasks efficiently.
This dataset is ideal for a wide variety of analytics and machine learning projects.
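For example, a quick-start sketch using the columns above (the CSV file name is an assumption; adjust it to the actual file in this dataset):

import pandas as pd

df = pd.read_csv("household_energy_consumption.csv", parse_dates=["Date"])  # assumed file name

# Average daily consumption across all households
daily = df.groupby("Date")["Energy_Consumption_kWh"].mean()

# Compare households with and without air conditioning
print(df.groupby("Has_AC")["Energy_Consumption_kWh"].mean())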
To use the Facial Expression dataset:
1. Clone it.
2. Create a Python 3.9 environment.
3. Install tensorflow, opencv-python (cv2), scikit-learn, and keras using pip.
4. Run in the same kernel environment (it takes time).
5. The process starts collecting and preprocessing datasets of facial expressions captured in different contexts.
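A rough preprocessing sketch for a single image under that setup (the file path and the 48x48 target size are assumptions for illustration):

import cv2
import numpy as np

img = cv2.imread("data/happy/sample_001.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical path
img = cv2.resize(img, (48, 48)).astype("float32") / 255.0            # resize and normalise to [0, 1]
batch = np.expand_dims(img, axis=(0, -1))                            # shape (1, 48, 48, 1) for a Keras model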
MIT License: https://opensource.org/licenses/MIT
Daily Machine Learning Practice – 1 Commit per Day
Author: Astrid Villalobos
Location: Montréal, QC
LinkedIn: https://www.linkedin.com/in/astridcvr/
Objective The goal of this project is to strengthen Machine Learning and data analysis skills through small, consistent daily contributions. Each commit focuses on a specific aspect of data processing, feature engineering, or modeling using Python, Pandas, and Scikit-learn.
Dataset
Source: Kaggle – Sample Sales Data
File: data/sales_data_sample.csv
Variables: ORDERNUMBER, QUANTITYORDERED, PRICEEACH, SALES, COUNTRY, etc.
Goal: Analyze e-commerce performance, predict sales trends, segment customers, and forecast demand.
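An example of the kind of small daily contribution this project targets, loading the file above and summarising sales by country (a sketch; this Kaggle file usually needs a non-UTF-8 encoding such as latin-1):

import pandas as pd

sales = pd.read_csv("data/sales_data_sample.csv", encoding="latin-1")
print(sales.groupby("COUNTRY")["SALES"].sum().sort_values(ascending=False).head())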
**Project Rules**

| Rule | Description |
|---|---|
| 🟩 1 Commit per Day | Minimum one line of code daily to ensure consistency and discipline |
| 🌍 Bilingual Comments | Code and documentation in English and French |
| 📈 Visible Progress | Daily green squares = daily learning |

🧰 Tech Stack
Languages: Python
Libraries: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn
Tools: Jupyter Notebook, GitHub, Kaggle

Learning Outcomes
By the end of this challenge:
- Develop a stronger understanding of data preprocessing, modeling, and evaluation.
- Build consistent coding habits through daily practice.
- Apply ML techniques to real-world sales data scenarios.
CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset + notebooks demonstrate feature engineering and ML pipelines on the Titanic dataset.
It includes both manual preprocessing (without pipelines) and end-to-end pipelines using Scikit-Learn.
Feature Engineering is a crucial step in Machine Learning.
In this project, I show:
- Handling missing values with SimpleImputer
- Encoding categorical variables with OneHotEncoder
- Building models manually vs using Pipeline (see the sketch after this list)
- Saving models and pipelines with pickle
- Making predictions with and without pipelines
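A minimal sketch of such a pipeline (the column names follow the usual Titanic conventions; the actual pipeline saved as pipe.pkl may differ):

from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression

numeric = ["Age", "SibSp", "Parch", "Fare"]
categorical = ["Sex", "Embarked"]

preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("ohe", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

pipe = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=1000)),
])
# pipe.fit(X_train, y_train); pickle.dump(pipe, open("pipe.pkl", "wb"))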
Saved models:
- pipe.pkl → Complete ML pipeline (recommended for predictions)
- clf.pkl → Classifier without pipeline
- ohe_sex.pkl, ohe_embarked.pkl → Encoders for categorical features

Predict with pipeline

import pickle
pipe = pickle.load(open("/kaggle/input/featureengineering/models/pipe.pkl", "rb"))
sample = [[22, 1, 0, 7.25, 'male', 'S']]
print(pipe.predict(sample))
Predict without pipeline
import pickle
clf = pickle.load(open("/kaggle/input/featureengineering/models/clf.pkl", "rb"))
ohe_sex = pickle.load(open("/kaggle/input/featureengineering/models/ohe_sex.pkl", "rb"))
ohe_embarked = pickle.load(open("/kaggle/input/featureengineering/models/ohe_embarked.pkl", "rb"))
# Preprocess input manually using the encoders, then predict with clf
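A sketch of that manual path (the column order below is an assumption based on the sample used earlier: Age, SibSp, Parch, Fare, then the encoded Sex and Embarked columns; the real order depends on how clf was trained):

import numpy as np

age, sibsp, parch, fare, sex, embarked = 22, 1, 0, 7.25, 'male', 'S'

sex_enc = ohe_sex.transform([[sex]])
embarked_enc = ohe_embarked.transform([[embarked]])

# OneHotEncoder may return a sparse matrix depending on how it was configured
if hasattr(sex_enc, "toarray"):
    sex_enc = sex_enc.toarray()
if hasattr(embarked_enc, "toarray"):
    embarked_enc = embarked_enc.toarray()

X_manual = np.hstack([[[age, sibsp, parch, fare]], sex_enc, embarked_enc])
print(clf.predict(X_manual))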
🎯 Inspiration
Learn difference between manual feature engineering and pipeline-based workflows
Understand how to avoid data leakage using Pipeline
Explore cross-validation with pipelines
Practice model persistence and deployment strategies
✅ Best Practice: Use pipe.pkl (pipeline) for predictions — it automatically handles preprocessing + modeling in one step!
Dataset Creation Code:
import pandas as pd
import numpy as np
import gc
from sklearn.preprocessing import LabelEncoder
import pickle
pickle.HIGHEST_PROTOCOL = 4
BASE_PATH = "./datasets/amex-default-prediction"
# Index file
train_data_index = pd.read_csv(f"{BASE_PATH}/train_labels.csv")
test_data_index = pd.read_csv(f"{BASE_PATH}/sample_submission.csv")
print(train_data_index.shape, test_data_index.shape)
all_ids = np.concatenate([train_data_index["customer_ID"], test_data_index["customer_ID"]])
print(len(all_ids))
# Train an id encoder and save it.
id_encoder = LabelEncoder()
id_encoder.fit(all_ids)
np.save("id_encodings.npy", id_encoder.classes_)
# Make sure we can load it back
loaded_encoder = LabelEncoder()
loaded_encoder.classes_ = np.load("id_encodings.npy", allow_pickle=True)
assert (id_encoder.classes_ == loaded_encoder.classes_).all()
# Make sure we can reverse it (1-index)
print(loaded_encoder.inverse_transform([1, 2]))
print(train_data_index["customer_ID"].values[0: 2])
del loaded_encoder
train_data_index["customer_ID"] = id_encoder.transform(train_data_index["customer_ID"])
test_data_index["customer_ID"] = id_encoder.transform(test_data_index["customer_ID"])
# Encode the index files
train_data_index.to_pickle("id_encoded_train_labels.pkl", protocol=4)
test_data_index.to_pickle("id_encoded_sample_submission.pkl", protocol=4)
del train_data_index
del test_data_index
gc.collect()
# Test files are too large for a Kaggle Notebook
main_train = pd.read_csv(
f"{BASE_PATH}/train_data.csv"
)
main_test = pd.read_csv(
f"{BASE_PATH}/test_data.csv"
)
main_files = [
main_train,
main_test,
]
for main_file in main_files:
    print(main_file.shape)
    main_file["customer_ID"] = id_encoder.transform(main_file["customer_ID"])
main_train.to_pickle("id_encoded_train_data.pkl", protocol=4)
main_test.to_pickle("id_encoded_test_data.pkl", protocol=4)
def reduce_mem_usage(df, use_fp16=False):
    """Iterate through all the columns of a dataframe and modify the data type
    to reduce memory usage.
    """
    start_mem = df.memory_usage().sum() / 1024**2
    print('Memory usage of dataframe is {:.2f} MB'.format(start_mem))
    for col in df.columns:
        col_type = df[col].dtype
        if col_type != object:
            c_min = df[col].min()
            c_max = df[col].max()
            if str(col_type)[:3] == 'int':
                if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                    df[col] = df[col].astype(np.int8)
                elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                    df[col] = df[col].astype(np.int16)
                elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                    df[col] = df[col].astype(np.int32)
                elif c_min > np.iinfo(np.int64).min and c_max < np.iinfo(np.int64).max:
                    df[col] = df[col].astype(np.int64)
            else:
                if c_min > np.finfo(np.float16).min and c_max < np.finfo(np.float16).max:
                    if use_fp16:
                        df[col] = df[col].astype(np.float16)
                    else:
                        df[col] = df[col].astype(np.float32)
                elif c_min > np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:
                    df[col] = df[col].astype(np.float32)
                else:
                    df[col] = df[col].astype(np.float64)
        else:
            df[col] = df[col].astype('category')
    end_mem = df.memory_usage().sum() / 1024**2
    print('Memory usage after optimization is: {:.2f} MB'.format(end_mem))
    print('Decreased by {:.1f}%'.format(100 * (start_mem - end_mem) / start_mem))
    return df
main_train = pd.read_pickle("id_encoded_train_data.pkl")
main_test = pd.read_pickle("id_encoded_test_data.pkl")
reduce_mem_usage(main_train)
reduce_mem_usage(main_test)
main_train.to_pickle("id_encoded_fp32_train_data.pkl", protocol=4)
main_test.to_pickle("id_encoded_fp32_test_data.pkl", protocol=4)
main_train = pd.read_pickle("id_encoded_train_data.pkl")
main_test = pd.read_pickle("id_encoded_test_data.pkl")
reduce_mem_usage(main_train, use_fp16=True)
reduce_mem_usage(main_test, use_fp16=True)
main_train.to_pickle("id_encoded_fp16_train_data.pkl", protocol=4)
main_test.to_pickle("id_encoded_fp16_test_data.pkl", protocol=4)
The Cropped Yale Face Dataset is a widely used benchmark in computer vision and machine learning for face recognition tasks. It consists of grayscale images of human faces captured under varying lighting conditions and expressions. The dataset is well-suited for research in facial recognition, image preprocessing, and machine learning model evaluation.
| Feature | Description |
|---|---|
| Number of subjects | 38 individuals |
| Number of images | 2,414 images |
| Image size | 192 × 168 pixels |
| Color | Grayscale (single channel) |
| Variations | Lighting conditions, facial expressions, and slight head rotations |
| Format | .pgm images (can be converted to .png or .jpg) |
| Common usage | Face recognition, PCA/LDA experiments, image classification |
CroppedYale/
├── yaleB01/
│ ├── yaleB01_P00A+000E+00.pgm
│ ├── yaleB01_P00A+000E+05.pgm
│ └── ...
├── yaleB02/
│ └── ...
└── ...
File names follow the pattern yaleB<subject_id>_P<pose>A<azimuth>E<elevation>.pgm, where A and E encode the azimuth and elevation of the light source. The dataset is well suited to evaluating facial recognition algorithms under controlled lighting and expression variations.
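A minimal loading sketch, assuming the CroppedYale/ layout above and Pillow installed (ambient or corrupted images are not filtered out here):

import os
import numpy as np
from PIL import Image

root = "CroppedYale"
images, labels = [], []
for subject in sorted(os.listdir(root)):
    subject_dir = os.path.join(root, subject)
    if not os.path.isdir(subject_dir):
        continue
    for fname in sorted(os.listdir(subject_dir)):
        if fname.endswith(".pgm"):
            img = Image.open(os.path.join(subject_dir, fname))
            images.append(np.asarray(img, dtype=np.float32))
            labels.append(subject)      # e.g. "yaleB01"

images = np.stack(images)               # shape (n_images, 192, 168)
labels = np.array(labels)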
from sklearn.decomposition import PCA
from sklearn.svm import SVC
import numpy as np
# Flatten each image into a 1-D feature vector (uses the images/labels arrays loaded above)
X = images.reshape(len(images), -1)
y = labels
# Reduce dimensions using PCA
pca = PCA(n_components=100)
X_pca = pca.fit_transform(X)
# Train classifier
clf = SVC(kernel='linear')
clf.fit(X_pca, y)
Due to its moderate image size, the dataset is ideal for testing dimensionality reduction methods like PCA, LDA, or t-SNE.
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
X_embedded = TSNE(n_components=2).fit_transform(X_pca)
plt.scatter(X_embedded[:,0], X_embedded[:,1], c=y)
plt.show()
Researchers can use this dataset to study the effect of lighting conditions and facial expressions on recognition accuracy.
Example file names:
- yaleB01_P00A+000E+00.pgm → frontal illumination (azimuth 0°, elevation 0°)
- yaleB01_P00A+000E+05.pgm → light source raised 5° in elevation
- yaleB01_P00A+010E+00.pgm → light source shifted 10° in azimuth