6 datasets found
  1. 1200 pixels spectral datasets

    • zenodo.org
    zip
    Updated May 21, 2024
    Cite
    Hui Zhang (2024). 1200 pixels spectral datasets [Dataset]. http://doi.org/10.5281/zenodo.11082600
    Explore at:
    Available download formats: zip
    Dataset updated
    May 21, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Hui Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the zip archive, spectral.npy contains the average spectral data of red ginseng, mycotoxins, and interference impurities, and label.npy contains the corresponding labels. The spectral data has shape [1200, 510] and the label data has shape [1200, 1]. An example of data usage (the scikit-learn Python library is used to build a classification model) follows:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import classification_report, accuracy_score

    # Load spectral data and labels
    x = np.load('.../spectral.npy')[:,1:-1]
    y = np.load('.../label.npy').ravel()  # flatten the [1200, 1] labels to 1-D for scikit-learn

    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

    # Data standardization
    scaler = StandardScaler()
    x_train = scaler.fit_transform(x_train)
    x_test = scaler.transform(x_test)

    # Train the KNN model
    knn_model = KNeighborsClassifier(n_neighbors=5)
    knn_model.fit(x_train, y_train)

    # Predict
    y_pred = knn_model.predict(x_test)

    # Print classification reports and accuracy rates
    print("Classification Report:")
    print(classification_report(y_test, y_pred))
    print("Accuracy Score:")
    print(accuracy_score(y_test, y_pred))

  2. Household Energy Consumption

    • kaggle.com
    Updated Apr 5, 2025
    Cite
    Samx_sam (2025). Household Energy Consumption [Dataset]. https://www.kaggle.com/datasets/samxsam/household-energy-consumption
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 5, 2025
    Dataset provided by
    Kaggle
    Authors
    Samx_sam
    License

    CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🏡 Household Energy Consumption - April 2025 (90,000 Records)

    📌 Overview

    This dataset presents detailed daily energy consumption records from various households over April 2025. With 90,000 rows and features such as temperature, household size, air conditioning usage, and peak-hour consumption, it is well suited to time-series analysis, machine learning, and sustainability research.

    Column Name | Data Type Category | Description
    Household_ID | Categorical (Nominal) | Unique identifier for each household
    Date | Datetime | The date of the energy usage record
    Energy_Consumption_kWh | Numerical (Continuous) | Total energy consumed by the household in kWh
    Household_Size | Numerical (Discrete) | Number of individuals living in the household
    Avg_Temperature_C | Numerical (Continuous) | Average daily temperature in degrees Celsius
    Has_AC | Categorical (Binary) | Indicates if the household has air conditioning (Yes/No)
    Peak_Hours_Usage_kWh | Numerical (Continuous) | Energy consumed during peak hours in kWh

    📂 Dataset Summary

    • Rows: 90,000
    • Time Range: April 1, 2025 – April 30, 2025
    • Data Granularity: Daily per household
    • Location: Simulated global coverage
    • Format: CSV (Comma-Separated Values)

    📚 Libraries Used for Working with household_energy_consumption_2025.csv

    🔍 1. Data Manipulation & Analysis

    Library | Purpose
    pandas | Reading, cleaning, and transforming tabular data
    numpy | Numerical operations, working with arrays

    📊 2. Data Visualization

    Library | Purpose
    matplotlib | Creating static plots (line, bar, histograms, etc.)
    seaborn | Statistical visualizations, heatmaps, boxplots, etc.
    plotly | Interactive charts (time series, pie, bar, scatter, etc.)

    📈 3. Machine Learning / Modeling

    Library | Purpose
    scikit-learn | Preprocessing, regression, classification, clustering
    xgboost / lightgbm | Gradient boosting models for better accuracy

    🧹 4. Data Preprocessing

    Library | Purpose
    sklearn.preprocessing | Encoding categorical features, scaling, normalization
    datetime / pandas | Date-time conversion and manipulation

    🧪 5. Model Evaluation

    Library | Purpose
    sklearn.metrics | Accuracy, MAE, RMSE, R² score, confusion matrix, etc.

    ✅ These libraries provide a complete toolkit for performing data analysis, modeling, and visualization tasks efficiently.

    📈 Potential Use Cases

    This dataset is ideal for a wide variety of analytics and machine learning projects:

    🔮 Forecasting & Time Series Analysis

    • Predict future household energy consumption based on previous trends and weather conditions.
    • Identify seasonal and daily consumption patterns.

    💡 Energy Efficiency Analysis

    • Analyze differences in energy consumption between households with and without air conditioning.
    • Compare energy usage efficiency across varying household sizes.

    🌡️ Climate Impact Studies

    • Investigate how temperature affects electricity usage across households.
    • Model the potential impact of climate change on residential energy demand.

    🔌 Peak Load Management

    • Build models to predict and manage energy demand during peak hours.
    • Support research on smart grid technologies and dynamic pricing.

    🧠 Machine Learning Projects

    • Supervised learning (regression/classification) to predict energy consumption.
    • Clustering households by usage patterns for targeted energy programs.
    • Anomaly detection in energy usage for fault detection.

    🛠️ Example Starter Projects

    • Time-series forecasting using Facebook Prophet or ARIMA
    • Regression modeling using XGBoost or LightGBM (see the sketch below)
    • Classification of AC vs. non-AC household behavior
    • Energy-saving recommendation systems
    • Heatmaps of temperature vs. energy usage
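
    For example, the regression starter project could begin as follows. This is a minimal sketch, assuming household_energy_consumption_2025.csv sits in the working directory and using the column names listed above; the feature selection and hyperparameters are illustrative, not prescribed by the dataset author.

    import pandas as pd
    from xgboost import XGBRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error, r2_score

    # Load the daily records and parse the date column
    df = pd.read_csv('household_energy_consumption_2025.csv', parse_dates=['Date'])

    # Encode the binary AC flag and derive a simple calendar feature
    df['Has_AC'] = (df['Has_AC'] == 'Yes').astype(int)
    df['Day_Of_Week'] = df['Date'].dt.dayofweek

    # Peak_Hours_Usage_kWh is left out of the features, since it is a
    # component of the target and would leak information
    features = ['Household_Size', 'Avg_Temperature_C', 'Has_AC', 'Day_Of_Week']
    X = df[features]
    y = df['Energy_Consumption_kWh']

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Gradient-boosted regression (illustrative hyperparameters)
    model = XGBRegressor(n_estimators=200, learning_rate=0.1, random_state=42)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    print("MAE:", mean_absolute_error(y_test, y_pred))
    print("R²:", r2_score(y_test, y_pred))
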
  3. EDGE-IIOTSET Dataset

    • paperswithcode.com
    Updated Oct 16, 2023
    Cite
    (2023). EDGE-IIOTSET Dataset [Dataset]. https://paperswithcode.com/dataset/edge-iiotset
    Explore at:
    Dataset updated
    Oct 16, 2023
    Description

    ABSTRACT In this project, we propose a new comprehensive, realistic cyber security dataset of IoT and IIoT applications, called Edge-IIoTset, which can be used by machine learning-based intrusion detection systems in two different modes, namely, centralized and federated learning. Specifically, the proposed testbed is organized into seven layers: the Cloud Computing Layer, Network Functions Virtualization Layer, Blockchain Network Layer, Fog Computing Layer, Software-Defined Networking Layer, Edge Computing Layer, and IoT and IIoT Perception Layer. In each layer, we propose new emerging technologies that satisfy the key requirements of IoT and IIoT applications, such as the ThingsBoard IoT platform, the OPNFV platform, Hyperledger Sawtooth, digital twins, the ONOS SDN controller, Mosquitto MQTT brokers, and Modbus TCP/IP. The IoT data are generated from more than ten types of IoT devices, such as low-cost digital sensors for sensing temperature and humidity, an ultrasonic sensor, a water level detection sensor, a pH sensor meter, a soil moisture sensor, a heart rate sensor, and a flame sensor. We identify and analyze fourteen attacks related to IoT and IIoT connectivity protocols, categorized into five threats: DoS/DDoS attacks, information gathering, man-in-the-middle attacks, injection attacks, and malware attacks. In addition, we extract features obtained from different sources, including alerts, system resources, logs, and network traffic, and propose 61 new features with high correlations out of 1,176 found features. After processing and analyzing the proposed realistic cyber security dataset, we provide a primary exploratory data analysis and evaluate the performance of machine learning approaches (i.e., traditional machine learning as well as deep learning) in both centralized and federated learning modes.

    Instructions:

    Great news! The Edge-IIoT dataset has been featured as a "Document in the top 1% of Web of Science." This indicates that it is ranked within the top 1% of all publications indexed by the Web of Science (WoS) in terms of citations and impact.

    Please kindly visit the Kaggle page for updates: https://www.kaggle.com/datasets/mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot

    Free use of the Edge-IIoTset dataset for academic research purposes is hereby granted in perpetuity. Use for commercial purposes is allowable after asking the lead author, Dr Mohamed Amine Ferrag, who has asserted his rights under copyright.

    The details of the Edge-IIoT dataset were published in the following paper. For academic/public use of these datasets, users must cite the following paper:

    Mohamed Amine Ferrag, Othmane Friha, Djallel Hamouda, Leandros Maglaras, Helge Janicke, "Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning", IEEE Access, April 2022 (IF: 3.37), DOI: 10.1109/ACCESS.2022.3165809

    Link to paper : https://ieeexplore.ieee.org/document/9751703

    The directories of the Edge-IIoTset dataset include the following:

    •File 1 (Normal traffic)

    -File 1.1 (Distance): This file includes two documents, namely, Distance.csv and Distance.pcap. The IoT sensor (Ultrasonic sensor) is used to capture the IoT data.

    -File 1.2 (Flame_Sensor): This file includes two documents, namely, Flame_Sensor.csv and Flame_Sensor.pcap. The IoT sensor (Flame Sensor) is used to capture the IoT data.

    -File 1.3 (Heart_Rate): This file includes two documents, namely, Heart_Rate.csv and Heart_Rate.pcap. The IoT sensor (Heart Rate Sensor) is used to capture the IoT data.

    -File 1.4 (IR_Receiver): This file includes two documents, namely, IR_Receiver.csv and IR_Receiver.pcap. The IoT sensor (IR (Infrared) Receiver Sensor) is used to capture the IoT data.

    -File 1.5 (Modbus): This file includes two documents, namely, Modbus.csv and Modbus.pcap. The IoT sensor (Modbus Sensor) is used to capture the IoT data.

    -File 1.6 (phValue): This file includes two documents, namely, phValue.csv and phValue.pcap. The IoT sensor (pH-sensor PH-4502C) is used to capture the IoT data.

    -File 1.7 (Soil_Moisture): This file includes two documents, namely, Soil_Moisture.csv and Soil_Moisture.pcap. The IoT sensor (Soil Moisture Sensor v1.2) is used to capture the IoT data.

    -File 1.8 (Sound_Sensor): This file includes two documents, namely, Sound_Sensor.csv and Sound_Sensor.pcap. The IoT sensor (LM393 Sound Detection Sensor) is used to capture the IoT data.

    -File 1.9 (Temperature_and_Humidity): This file includes two documents, namely, Temperature_and_Humidity.csv and Temperature_and_Humidity.pcap. The IoT sensor (DHT11 Sensor) is used to capture the IoT data.

    -File 1.10 (Water_Level): This file includes two documents, namely, Water_Level.csv and Water_Level.pcap. The IoT sensor (Water sensor) is used to capture the IoT data.

    •File 2 (Attack traffic):

    -File 2.1 (Attack traffic (CSV files)): This file includes 14 documents, namely, Backdoor_attack.csv, DDoS_HTTP_Flood_attack.csv, DDoS_ICMP_Flood_attack.csv, DDoS_TCP_SYN_Flood_attack.csv, DDoS_UDP_Flood_attack.csv, MITM_attack.csv, OS_Fingerprinting_attack.csv, Password_attack.csv, Port_Scanning_attack.csv, Ransomware_attack.csv, SQL_injection_attack.csv, Uploading_attack.csv, Vulnerability_scanner_attack.csv, XSS_attack.csv. Each document is specific to one attack.

    -File 2.2 (Attack traffic (PCAP files)): This file includes 14 documents, namely, Backdoor_attack.pcap, DDoS_HTTP_Flood_attack.pcap, DDoS_ICMP_Flood_attack.pcap, DDoS_TCP_SYN_Flood_attack.pcap, DDoS_UDP_Flood_attack.pcap, MITM_attack.pcap, OS_Fingerprinting_attack.pcap, Password_attack.pcap, Port_Scanning_attack.pcap, Ransomware_attack.pcap, SQL_injection_attack.pcap, Uploading_attack.pcap, Vulnerability_scanner_attack.pcap, XSS_attack.pcap. Each document is specific to one attack.

    •File 3 (Selected dataset for ML and DL):

    -File 3.1 (DNN-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating deep learning-based intrusion detection systems.

    -File 3.2 (ML-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating traditional machine learning-based intrusion detection systems.

    Step 1: Downloading the Edge-IIoTset dataset from the Kaggle platform:

    from google.colab import files

    !pip install -q kaggle

    files.upload()

    !mkdir ~/.kaggle

    !cp kaggle.json ~/.kaggle/

    !chmod 600 ~/.kaggle/kaggle.json

    !kaggle datasets download -d mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot -f "Edge-IIoTset dataset/Selected dataset for ML and DL/DNN-EdgeIIoT-dataset.csv"

    !unzip DNN-EdgeIIoT-dataset.csv.zip

    !rm DNN-EdgeIIoT-dataset.csv.zip

    Step 2: Reading the dataset's CSV file into a pandas DataFrame:

    import pandas as pd

    import numpy as np

    df = pd.read_csv('DNN-EdgeIIoT-dataset.csv', low_memory=False)

    Step 3: Exploring some of the DataFrame's contents:

    df.head(5)

    print(df['Attack_type'].value_counts())

    Step 4: Dropping data (columns, duplicated rows, NaN/null values):

    from sklearn.utils import shuffle

    drop_columns = ["frame.time", "ip.src_host", "ip.dst_host", "arp.src.proto_ipv4", "arp.dst.proto_ipv4",
                    "http.file_data", "http.request.full_uri", "icmp.transmit_timestamp",
                    "http.request.uri.query", "tcp.options", "tcp.payload", "tcp.srcport",
                    "tcp.dstport", "udp.port", "mqtt.msg"]

    df.drop(drop_columns, axis=1, inplace=True)

    df.dropna(axis=0, how='any', inplace=True)

    df.drop_duplicates(subset=None, keep="first", inplace=True)

    df = shuffle(df)

    df.isna().sum()

    print(df['Attack_type'].value_counts())

    Step 5: Categorical data encoding (dummy encoding):

    import numpy as np

    from sklearn.model_selection import train_test_split

    from sklearn.preprocessing import StandardScaler

    from sklearn import preprocessing

    def encode_text_dummy(df, name):
        dummies = pd.get_dummies(df[name])
        for x in dummies.columns:
            dummy_name = f"{name}-{x}"
            df[dummy_name] = dummies[x]
        df.drop(name, axis=1, inplace=True)

    encode_text_dummy(df,'http.request.method')

    encode_text_dummy(df,'http.referer')

    encode_text_dummy(df,"http.request.version")

    encode_text_dummy(df,"dns.qry.name.len")

    encode_text_dummy(df,"mqtt.conack.flags")

    encode_text_dummy(df,"mqtt.protoname")

    encode_text_dummy(df,"mqtt.topic")

    Step 6: Creating the preprocessed dataset:

    df.to_csv('preprocessed_DNN.csv', encoding='utf-8')
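
    As a possible next step (not part of the authors' instructions), a minimal sketch of training a classifier on the preprocessed file might look as follows; treating Attack_type as the target follows Step 3, while dropping a possible Attack_label column is an assumption made to avoid target leakage.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report

    # Reload the preprocessed dataset written in Step 6
    df = pd.read_csv('preprocessed_DNN.csv', index_col=0, low_memory=False)

    # Attack_type is the multi-class target; drop a binary Attack_label
    # column if present (assumption, to avoid leaking the label)
    y = df['Attack_type']
    X = df.drop(columns=['Attack_type', 'Attack_label'], errors='ignore')

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)

    clf = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
    clf.fit(X_train, y_train)
    print(classification_report(y_test, clf.predict(X_test)))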

    For more information about the dataset, please contact the lead author of this project, Dr Mohamed Amine Ferrag, by email: mohamed.amine.ferrag@gmail.com

    More information about Dr. Mohamed Amine Ferrag is available at:

    https://www.linkedin.com/in/Mohamed-Amine-Ferrag

    https://dblp.uni-trier.de/pid/142/9937.html

    https://www.researchgate.net/profile/Mohamed_Amine_Ferrag

    https://scholar.google.fr/citations?user=IkPeqxMAAAAJ&hl=fr&oi=ao

    https://www.scopus.com/authid/detail.uri?authorId=56115001200

    https://publons.com/researcher/1322865/mohamed-amine-ferrag/

    https://orcid.org/0000-0002-0632-3172

    Last Updated: 27 Mar. 2023

  4. Perovskite Solar Cells Ageing Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 26, 2023
    Cite
    Ulbrich, Carolin (2023). Perovskite Solar Cells Ageing Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8185882
    Explore at:
    Dataset updated
    Jul 26, 2023
    Dataset provided by
    Ulbrich, Carolin
    Khenkin, Mark
    Schlatmann, Rutger
    Köbler, Hans
    Hartono, Noor Titan Putri
    Graniero, Paolo
    Abate, Antonio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains 2,245 cleaned ageing test traces (time vs. MPPT PCE, i.e., maximum power point tracking power conversion efficiency) for perovskite solar cells with various device stacks and architectures, in pickle (.pkl) format.

    The dataset can be loaded with the following commands in Python:

    import pickle5 as pickle
    import pandas as pd
    import numpy as np

    with open('20230303_mySeriesDrop.pkl', "rb") as fh:
        mySeriesDrop = pickle.load(fh)

    The following command can be used to call a specific row (row 0) within the dataset.

    mySeriesDrop[0]

    The next steps in using the dataset are scaling/normalisation (for instance with sklearn.preprocessing.MaxAbsScaler) and smoothing (for instance with a Savitzky-Golay filter).
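
    A minimal sketch of those two steps on a single trace, assuming mySeriesDrop[0] is a pandas Series of PCE values (the window length and polynomial order are illustrative choices, not values from the authors):

    import numpy as np
    from sklearn.preprocessing import MaxAbsScaler
    from scipy.signal import savgol_filter

    # MaxAbsScaler expects a 2-D array, so reshape the trace first
    trace = mySeriesDrop[0].to_numpy().reshape(-1, 1)
    scaled = MaxAbsScaler().fit_transform(trace).ravel()

    # Savitzky-Golay smoothing; window_length must not exceed the trace length
    smoothed = savgol_filter(scaled, window_length=51, polyorder=3)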

    The code to run the complete analysis, including self-organising map clustering, can be accessed here: https://doi.org/10.5281/zenodo.8181602.

  5. Spatial distribution of particulate matter, collected using low cost...

    • zenodo.org
    bin
    Updated Apr 24, 2025
    Cite
    Janani Venkatraman Jagatha; Christoph Schneider; Sebastian Schubert; Luxi Jin (2025). Spatial distribution of particulate matter, collected using low cost sensors, in Downtown-Singapore [Dataset]. http://doi.org/10.5281/zenodo.14280847
    Explore at:
    Available download formats: bin
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Janani Venkatraman Jagatha; Christoph Schneider; Sebastian Schubert; Luxi Jin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Singapore
    Description

    The dataset consists of particulate matter (PM) concentration and meteorology data measured in Singapore's Chinatown and Central Business District from March 13 to March 16, 2018. The data collectors walked from the Outram district (Chinatown) to the Central Business District. The measurements were carried out using a hand-held air quality sensor ensemble (URBMOBI 3.0).

    The dataset contains information from two URBMOBI 3.0 devices and one reference-grade device (Grimm 1.109). The data from the two sensors and the Grimm are denoted by the subscripts 's1', 's2', and 'gr', respectively.

    singapore_all_pm_25.geojson: The observed PM concentration and meteorology, aggregated using a 25 m buffer around the measurement points.

    Information on working with GeoJSON files can be found in the GeoJSON documentation.
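
    A minimal sketch for loading the file with geopandas (one common choice; the commented plot uses a hypothetical column name, since the exact field names should be checked against the listing below):

    import geopandas as gpd

    # Read the PM / meteorology data aggregated on 25 m buffers
    gdf = gpd.read_file('singapore_all_pm_25.geojson')
    print(gdf.columns)   # inspect the available fields
    print(gdf.head())

    # Map one field, e.g. a hypothetical sensor-1 PM column:
    # gdf.plot(column='PM_s1', legend=True)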

    Units:
    PM : µg/m³
    Scaled_PM_MM : Dimensionless entity scaled using Min-Max-Scaler (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html)
    Scaled_PM_SS : Dimensionless entity scaled using Standard-Scaler (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)
    Air temperature: °C
    Relative humidity: %

    The measurements are part of the project "Effects of heavy precipitation events on near-surface climate and particulate matter concentrations in Singapore", funded by seed funding from Humboldt-Universität zu Berlin for collaborative projects between the National University of Singapore and Humboldt-Universität zu Berlin.

  6. Dataset-EfficientDrivingTimeDeterminationSystem

    • huggingface.co
    Updated Jun 21, 2025
    Cite
    ACHMAD AKBAR (2025). Dataset-EfficientDrivingTimeDeterminationSystem [Dataset]. https://huggingface.co/datasets/jellysquish/Dataset-EfficientDrivingTimeDeterminationSystem
    Explore at:
    Dataset updated
    Jun 21, 2025
    Authors
    ACHMAD AKBAR
    Description

    import re
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.preprocessing import LabelEncoder
    from google.colab import drive
    from sklearn.tree import export_text
    from sklearn.metrics import accuracy_score

      1. Mount Google Drive

    drive.mount('/content/drive')

      2. Read the Excel file

    file_path = '/content/drive/MyDrive/Colab Notebooks/AI_GACOR_Cleaned.xlsx'
    data = pd.read_excel(file_path)

      3. Encode the 'Hari' column

    label_encoder_hari = … (the snippet is truncated here; see the full description on the dataset page: https://huggingface.co/datasets/jellysquish/Dataset-EfficientDrivingTimeDeterminationSystem)
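
    Purely to illustrate where the imports above point, a hypothetical reconstruction (not the author's actual code) of encoding 'Hari' and fitting a decision tree might look like:

    # Hypothetical continuation based only on the imports shown above;
    # the 'Efisien' target column is an illustrative placeholder
    label_encoder_hari = LabelEncoder()
    data['Hari'] = label_encoder_hari.fit_transform(data['Hari'])

    X = data[['Hari']]
    y = data['Efisien']

    model = DecisionTreeClassifier(random_state=42)
    model.fit(X, y)

    print(export_text(model, feature_names=['Hari']))
    print("Accuracy:", accuracy_score(y, model.predict(X)))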

