CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
2,121,458 records
I used Google Colab to check out this dataset and pull the column names using Pandas.
Sample code example (reading a gzip-compressed CSV file into a Pandas DataFrame with Python): https://pastexy.com/106/python-pandas-read-csv-file-compressed-with-gzip-and-load-into-pandas-dataframe
Columns: ['Date received', 'Product', 'Sub-product', 'Issue', 'Sub-issue', 'Consumer complaint narrative', 'Company public response', 'Company', 'State', 'ZIP code', 'Tags', 'Consumer consent provided?', 'Submitted via', 'Date sent to company', 'Company response to consumer', 'Timely response?', 'Consumer disputed?', 'Complaint ID']
I did not modify the dataset.
Use it to practice with dataframes - Pandas or PySpark on Google Colab:
!unzip complaints.csv.zip
import pandas as pd
df = pd.read_csv('complaints.csv')
df.columns
df.head() etc.
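If you prefer PySpark on Colab, here is a minimal sketch (assuming pyspark is installed, e.g. via !pip install pyspark, and the same unzipped complaints.csv):

from pyspark.sql import SparkSession

# Start a local Spark session
spark = SparkSession.builder.appName("complaints").getOrCreate()

# Read the CSV with a header row and let Spark infer column types
sdf = spark.read.csv("complaints.csv", header=True, inferSchema=True)
sdf.printSchema()
sdf.show(5)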
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials
Background
This dataset contains data from monotonic and cyclic loading experiments on structural metallic materials. The materials are primarily structural steels; one iron-based shape memory alloy is also included. Summary files provide an overview of the database, and data from the individual experiments are included as well.
The files included in the database are outlined below and the format of the files is briefly described. Additional information regarding the formatting can be found through the post-processing library (https://github.com/ahartloper/rlmtp/tree/master/protocols).
Usage
Included Files
File Format: Downsampled Data
These are the "LP_
These data files can be easily loaded using the pandas library in Python through:
import pandas
data = pandas.read_csv(data_file, index_col=0)
The data is formatted so it can be used directly in RESSPyLab (https://github.com/AlbanoCastroSousa/RESSPyLab). Note that the column names "e_true" and "Sigma_true" were kept for backwards compatibility reasons with RESSPyLab.
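For a quick visual check of a loaded file, a minimal sketch (assuming matplotlib is available, and using the e_true and Sigma_true column names noted above):

import pandas
import matplotlib.pyplot as plt

# data_file is a placeholder for one of the downsampled CSV files, as in the snippet above
data = pandas.read_csv(data_file, index_col=0)
plt.plot(data['e_true'], data['Sigma_true'])  # true strain vs. true stress
plt.xlabel('True strain')
plt.ylabel('True stress')
plt.show()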
File Format: Unreduced Data
These are the "LP_
The data can be loaded and used similarly to the downsampled data.
File Format: Overall_Summary
The overall summary file provides data on all the test specimens in the database. The columns include:
File Format: Summarized_Mechanical_Props_Campaign
Meant to be loaded in Python as a pandas DataFrame with multi-indexing, e.g.,
tab1 = pd.read_csv('Summarized_Mechanical_Props_Campaign_' + date + version + '.csv',
index_col=[0, 1, 2, 3], skipinitialspace=True, header=[0, 1],
keep_default_na=False, na_values='')
Caveats
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset containing measurements of Linux kernel binary size after compilation. The reported size, in the column "perf", is the size in bytes of the vmlinux file. It also contains a column "active_options" reporting the number of activated options (set to "y"). All other columns, listed in the file "Linux_options.json", are Linux kernel options. The sampling was performed using randconfig. The version of Linux used is 4.13.3.
Not all available options are present. First, the dataset only contains options for the x86, 64-bit version. Then, all non-tristate options have been ignored. Finally, options that do not take more than one value across the whole dataset, due to insufficient variability in the sampling, are ignored. All options are encoded as 0 for the "n" and "m" option values, and 1 for "y".
In Python, importing the dataset with pandas will assign all columns the int64 dtype, which leads to very high memory consumption (~50 GB). The following snippet imports it using less than 1 GB of memory by setting the option columns to int8.
import pandas as pd
import json
import numpy

# Load the list of option columns so they can be read as int8 instead of int64
with open("Linux_options.json", "r") as f:
    linux_options = json.load(f)

df = pd.read_csv("Linux.csv", dtype={opt: numpy.int8 for opt in linux_options})
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Project Description:
Title: Pandas Data Manipulation and File Conversion
Overview: This project aims to demonstrate the basic functionalities of Pandas, a powerful data manipulation library in Python. In this project, we will create a DataFrame, perform some data manipulation operations using Pandas, and then convert the DataFrame into both Excel and CSV formats.
Key Objectives:
Tools and Libraries Used:
Project Implementation:
DataFrame Creation:
Data Manipulation:
File Conversion:
Convert the DataFrame to Excel format with the to_excel() function, and to CSV format with the to_csv() function.
Expected Outcome:
Upon completion of this project, you will have gained a fundamental understanding of how to work with Pandas DataFrames, perform basic data manipulation tasks, and convert DataFrames into different file formats. This knowledge will be valuable for data analysis, preprocessing, and data export tasks in various data science and analytics projects.
Conclusion:
The Pandas library offers powerful tools for data manipulation and file conversion in Python. By completing this project, you will have acquired essential skills that are widely applicable in the field of data science and analytics. You can further extend this project by exploring more advanced Pandas functionalities or integrating it into larger data processing pipelines. In this project, we add a number of records, build a DataFrame from them, save the data to a single Excel file with different sheet names, and then convert that Excel file to a CSV file.
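As a rough sketch of the workflow described above (file and sheet names are placeholders; writing Excel files requires an engine such as openpyxl):

import pandas as pd

# Create two small DataFrames
df1 = pd.DataFrame({'name': ['Alice', 'Bob'], 'score': [85, 92]})
df2 = pd.DataFrame({'city': ['Paris', 'Rome'], 'visits': [3, 5]})

# Save both DataFrames to one Excel file as separate sheets
with pd.ExcelWriter('output.xlsx') as writer:
    df1.to_excel(writer, sheet_name='scores', index=False)
    df2.to_excel(writer, sheet_name='visits', index=False)

# Read the sheets back and convert each one to its own CSV file
sheets = pd.read_excel('output.xlsx', sheet_name=None)
for name, sheet_df in sheets.items():
    sheet_df.to_csv(f'{name}.csv', index=False)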
Custom license: https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.57745/VCALE0
Dataset Description
This dataset is associated with the publication titled "A Distraction Knee-Brace and a Robotic Testbed for Tibiofemoral Load Reduction during Squatting" in IEEE Transactions on Medical Robotics and Bionics. It provides comprehensive data supporting the development and evaluation of a knee distraction brace designed to reduce tibiofemoral contact forces during flexion.
Contents
Cam Profiles: STL files of the initial cam profiles designed based on averaged tibiofemoral contact force data collected from 5 squats of a patient with an instrumented prosthesis (K7L) from the CAMS Knee dataset (accessible via https://orthoload.com/). Optimized cam profiles, corrected based on experimental results, are also included. These profiles enable patient-specific adjustments to account for the non-linear evolution of tibiofemoral contact forces with flexion angles.
Experimental Results: CSV files containing raw results from robotic testbed experiments, testing the knee brace under various initial pneumatic pressures in the actuators. Data is provided for tests conducted without the brace, with the initial cam profiles, and with the optimized cam profiles. Each CSV file corresponds to a specific test condition, detailing forces and kinematics observed during squatting.
3D Models of Bones and Testbed Components: Geometries of the femur head and tibial plateau used in the robotic testbed experiments, provided in STEP, STL, and SLDPRT/SLDASM formats. A README file describes the biomechanical coordinate systems used for force and kinematic control of the robotic testbed, and for result interpretation and visualization.
How to Open and Read the Provided Files
The dataset includes files in CSV, SLDPRT, SLDASM, STL, and IGES formats. Below are recommended software solutions, with a preference for open-source options:
CSV (Comma-Separated Values): can be opened with Microsoft Excel, Google Sheets, or open-source software like LibreOffice Calc or Python (using pandas).
SLDPRT & SLDASM (SolidWorks Parts and Assemblies): these files are native to SolidWorks. For viewing without SolidWorks, use eDrawings Viewer (free) or FreeCAD (limited compatibility).
STL (3D Model Format): can be opened with MeshLab, FreeCAD, or Blender. Most 3D printing software (like Cura or PrusaSlicer) also supports STL.
IGES (3D CAD Exchange Format): can be read with FreeCAD, Fusion 360 (free for personal use), or OpenCascade-based software like CAD Assistant.
For full compatibility, commercial software like SolidWorks or CATIA may be required for SLDPRT and SLDASM files. However, FreeCAD and other open-source tools provide partial support. See the associated publication and the README files included in the dataset for more information.
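For the CSV files, a minimal pandas sketch (the file name below is a placeholder for one of the test-condition files):

import pandas as pd

results = pd.read_csv('experimental_results_example.csv')  # placeholder file name
print(results.columns)
print(results.head())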
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LifeSnaps Dataset Documentation
Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.
The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.
Data Import: Reading CSV
For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
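For example, a minimal sketch (the file name below is a placeholder for one of the provided daily or hourly CSV files):

import pandas as pd

# Placeholder file name; substitute one of the provided daily/hourly CSV files
df = pd.read_csv('lifesnaps_daily.csv')
print(df.head())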
Data Import: Setting up a MongoDB (Recommended)
To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.
To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed.
For the Fitbit data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c fitbit
For the SEMA data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c sema
For surveys data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c surveys
If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
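Once restored, the collections can also be queried directly from Python, e.g. with pymongo (a minimal sketch, assuming the default local MongoDB instance and the database and collection names used above):

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client['rais_anonymized']

# Look at the size and one example document of each collection
for name in ['fitbit', 'sema', 'surveys']:
    collection = db[name]
    print(name, collection.count_documents({}))
    print(collection.find_one())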
Data Availability
The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:
{
_id:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains aggregated and sub-metered power consumption data from a two-person apartment in Germany. Data was collected from March 5 to September 4, 2025, spanning 6 months. It includes an aggregate reading from a main smart meter and individual readings from 40 smart plugs, smart relays, and smart power meters monitoring various appliances.
The dataset can be downloaded here: https://doi.org/10.5281/zenodo.17159850
Because it contains long off periods with zeros, the CSV file compresses well.
To extract it, use: xz -d DARCK.csv.xz
The compression leads to a 97% smaller file size (from 4 GB to 90.9 MB).
To use the dataset in Python, you can, e.g., load the CSV file into a pandas DataFrame:
import pandas as pd
df = pd.read_csv("DARCK.csv", parse_dates=["time"])
The main meter was monitored using an infrared reading head magnetically attached to the infrared interface of the meter. An ESP8266 flashed with Tasmota decodes the binary datagrams and forwards the Watt readings to the MQTT broker. Individual appliances were monitored using a combination of Shelly Plugs (for outlets), Shelly 1PM (for wired-in devices like ceiling lights), and Shelly PM Mini (for each of the three phases of the oven). All devices reported to a central InfluxDB database via Home Assistant running in docker on a Dell OptiPlex 3020M.
File Format (DARCK.csv): The dataset is provided as a single comma-separated values (CSV) file.
| Column Name | Data Type | Unit | Description |
|:--|:--|:--|:--|
| time | datetime | - | Timestamp for the reading in YYYY-MM-DD HH:MM:SS |
| main | float | Watt | Total aggregate power consumption for the apartment, measured at the main electrical panel. |
| [appliance_name] | float | Watt | Power consumption of an individual appliance (e.g., lightbathroom, fridge, sherlockpc). See Section 8 for a full list. |
| Aggregate Columns | | | |
| aggr_chargers | float | Watt | The sum of sherlockcharger, sherlocklaptop, watsoncharger, watsonlaptop, watsonipadcharger, kitchencharger. |
| aggr_stoveplates | float | Watt | The sum of stoveplatel1 and stoveplatel2. |
| aggr_lights | float | Watt | The sum of lightbathroom, lighthallway, lightsherlock, lightkitchen, lightlivingroom, lightwatson, lightstoreroom, fcob, sherlockalarmclocklight, sherlockfloorlamphue, sherlockledstrip, livingfloorlamphue, sherlockglobe, watsonfloorlamp, watsondesklamp and watsonledmap. |
| Analysis Columns | | | |
| inaccuracy | float | Watt | As no electrical device bypasses a power meter, the true inaccuracy can be assessed. It is the absolute error between the sum of individual measurements and the mains reading. A 30 W offset is applied to the sum since the measurement devices themselves draw power which is otherwise unaccounted for. |
The final dataset was generated from two raw data sources (meter.csv and shellies.csv) using a comprehensive postprocessing pipeline.
Aggregate (main) Postprocessing: The aggregate power data required several cleaning steps to ensure accuracy.
Shelly (shellies) Postprocessing: The Shelly devices are not prone to the same burst issue as the ESP8266. They push a new reading at every change in power drawn. If no power change is observed, or the observed change is too small (less than a few Watt), the reading is pushed once a minute, together with a heartbeat. When a device turns on or off, intermediate power values are published, which leads to sub-second values that need to be handled.
Readings were resampled to a regular 1-second time index using .resample('1s').last().ffill(). NaN values (e.g., from before a device was installed) were filled with 0.0, assuming zero consumption. During analysis, two significant unmetered load events were identified and manually corrected to improve the accuracy of the aggregate reading. The error column (inaccuracy) was recalculated after these corrections.
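A minimal sketch of this kind of resampling step in pandas (toy data only; this is not the exact postprocessing code used for the dataset):

import pandas as pd

# Illustrative irregular readings for one device
idx = pd.to_datetime(['2025-03-05 00:00:00.2', '2025-03-05 00:00:03.7', '2025-03-05 00:00:04.1'])
raw = pd.Series([12.0, 55.0, 3.0], index=idx, name='fridge')

# Resample to a regular 1-second grid, keep the last reading per second,
# forward-fill gaps, and treat remaining missing values as zero consumption
regular = raw.resample('1s').last().ffill().fillna(0.0)
print(regular)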
The following table lists the column names with an explanation where needed. As Watson moved at the beginning of June, some metering plugs changed their appliance.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
📦 Ecommerce Dataset (Products & Sizes Included)
🛍️ Essential Data for Building an Ecommerce Website & Analyzing Online Shopping Trends 📌 Overview This dataset contains 1,000+ ecommerce products, including detailed information on pricing, ratings, product specifications, seller details, and more. It is designed to help data scientists, developers, and analysts build product recommendation systems, price prediction models, and sentiment analysis tools.
🔹 Dataset Features
| Column Name | Description |
|:--|:--|
| product_id | Unique identifier for the product |
| title | Product name/title |
| product_description | Detailed product description |
| rating | Average customer rating (0-5) |
| ratings_count | Number of ratings received |
| initial_price | Original product price |
| discount | Discount percentage (%) |
| final_price | Discounted price |
| currency | Currency of the price (e.g., USD, INR) |
| images | URL(s) of product images |
| delivery_options | Available delivery methods (e.g., standard, express) |
| product_details | Additional product attributes |
| breadcrumbs | Category path (e.g., Electronics > Smartphones) |
| product_specifications | Technical specifications of the product |
| amount_of_stars | Distribution of star ratings (1-5 stars) |
| what_customers_said | Customer reviews (sentiments) |
| seller_name | Name of the product seller |
| sizes | Available sizes (for clothing, shoes, etc.) |
| videos | Product video links (if available) |
| seller_information | Seller details, such as location and rating |
| variations | Different variants of the product (e.g., color, size) |
| best_offer | Best available deal for the product |
| more_offers | Other available deals/offers |
| category | Product category |
📊 Potential Use Cases
📌 Build an Ecommerce Website: Use this dataset to design a functional online store with product listings, filtering, and sorting.
🔍 Price Prediction Models: Predict product prices based on features like ratings, category, and discount.
🎯 Recommendation Systems: Suggest products based on user preferences, rating trends, and customer feedback.
🗣 Sentiment Analysis: Analyze what_customers_said to understand customer satisfaction and product popularity.
📈 Market & Competitor Analysis: Track pricing trends, popular categories, and seller performance.
🔍 Why Use This Dataset?
✅ Rich Feature Set: Includes all necessary ecommerce attributes.
✅ Realistic Pricing & Rating Data: Useful for price analysis and recommendations.
✅ Multi-Purpose: Suitable for machine learning, web development, and data visualization.
✅ Structured Format: Easy-to-use CSV format for quick integration.
📂 Dataset Format
CSV file (ecommerce_dataset.csv)
1000+ samples
Multi-category coverage
🔗 How to Use?
Download the dataset from Kaggle.
Load it in Python using Pandas:
import pandas as pd
df = pd.read_csv("ecommerce_dataset.csv")
df.head()
Explore trends & patterns using visualization tools (Seaborn, Matplotlib).
Build models & applications based on the dataset!
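For the exploration step, a small sketch of the kind of analysis possible (column names as in the feature table above; it assumes rating, discount, and final_price are numeric):

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("ecommerce_dataset.csv")

# Distribution of average customer ratings
sns.histplot(df["rating"], bins=20)
plt.title("Distribution of product ratings")
plt.show()

# Relationship between discount and final price
sns.scatterplot(data=df, x="discount", y="final_price", alpha=0.5)
plt.title("Discount vs. final price")
plt.show()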
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
# Baghdad VANET BSM-Based Attack Dataset (F2MD Scenarios) — Raw CSVs (File-level by Attack ID)
## Overview
This package groups the original raw CSV files by attack type at the file level. No content changes were made to the CSVs—files are copied as-is into family folders.
## Dataset Summary
Total files: 17
Total records (all files): 35830975
Attack records (label=1): 2359250
Benign records (label=0): 33471725
Other/unknown labels: 0
## Attack Families
- ConstPos — Frozen constant position
- ConstPosOffset — Constant offset to coordinates
- RandomPos — Random fake positions
- RandomPosOffset — Random offset to the true position
- ConstSpeedOffset — Constant speed bias
- RandomSpeed — Random implausible speeds
- EventualStop — Gradual or sudden stop spoofing
- Disruptive — Protocol fields/values deliberately disruptive
- DataReplay — Replay of past data
- StaleMessages — Old or delayed messages
- DoS — High-rate flooding
- DoSRandom — Randomly fluctuating flooding
- DoSDisruptive — Intermittent aggressive flooding
- GridSybil — Coordinated fake identities (Sybil)
- DoSRandomSybil — Random DoS with Sybil identities
- DoSDisruptiveSybil — Aggressive DoS with Sybil identities
- Unknown — Files not mapped to a specific family
## Per-family Totals
| Family | Files | Records | Attack (label=1) | Benign (label=0) | Other |
|:--|--:|--:|--:|--:|--:|
| ConstPos | 1 | 1327217 | 63504 | 1263713 | 0 |
| ConstPosOffset | 1 | 1305206 | 62749 | 1242457 | 0 |
| ConstSpeedOffset | 1 | 2150356 | 102707 | 2047649 | 0 |
| DataReplay | 1 | 922481 | 43916 | 878565 | 0 |
| Disruptive | 1 | 1063416 | 52038 | 1011378 | 0 |
| DoS | 1 | 1241649 | 167490 | 1074159 | 0 |
| DoSDisruptive | 1 | 705817 | 97649 | 608168 | 0 |
| DoSDisruptiveSybil | 1 | 2113005 | 24365 | 2088640 | 0 |
| DoSRandom | 1 | 6440583 | 867983 | 5572600 | 0 |
| DoSRandomSybil | 1 | 2499578 | 30382 | 2469196 | 0 |
| EventualStop | 1 | 2124546 | 101617 | 2022929 | 0 |
| GridSybil | 1 | 622012 | 108728 | 513284 | 0 |
| RandomPos | 1 | 1087145 | 53253 | 1033892 | 0 |
| RandomPosOffset | 1 | 3258131 | 158686 | 3099445 | 0 |
| RandomSpeed | 1 | 3676823 | 176305 | 3500518 | 0 |
| StaleMessages | 1 | 770026 | 36645 | 733381 | 0 |
| Unknown | 1 | 4522984 | 211233 | 4311751 | 0 |
## Per-file Details
| Family | File | Records | Attack (1) | Benign (0) | Other | Label column present |
|:--|:--|--:|--:|--:|--:|:--:|
| ConstPos | 1.csv | 1327217 | 63504 | 1263713 | 0 | yes |
| ConstPosOffset | 2.csv | 1305206 | 62749 | 1242457 | 0 | yes |
| ConstSpeedOffset | 6.csv | 2150356 | 102707 | 2047649 | 0 | yes |
| DataReplay | 11.csv | 922481 | 43916 | 878565 | 0 | yes |
| Disruptive | 10.csv | 1063416 | 52038 | 1011378 | 0 | yes |
| DoS | 13.csv | 1241649 | 167490 | 1074159 | 0 | yes |
| DoSDisruptive | 15.csv | 705817 | 97649 | 608168 | 0 | yes |
| DoSDisruptiveSybil | 19.csv | 2113005 | 24365 | 2088640 | 0 | yes |
| DoSRandom | 14.csv | 6440583 | 867983 | 5572600 | 0 | yes |
| DoSRandomSybil | 18.csv | 2499578 | 30382 | 2469196 | 0 | yes |
| EventualStop | 9.csv | 2124546 | 101617 | 2022929 | 0 | yes |
| GridSybil | 16.csv | 622012 | 108728 | 513284 | 0 | yes |
| RandomPos | 3.csv | 1087145 | 53253 | 1033892 | 0 | yes |
| RandomPosOffset | 4.csv | 3258131 | 158686 | 3099445 | 0 | yes |
| RandomSpeed | 7.csv | 3676823 | 176305 | 3500518 | 0 | yes |
| StaleMessages | 12.csv | 770026 | 36645 | 733381 | 0 | yes |
| Unknown | 5.csv | 4522984 | 211233 | 4311751 | 0 | yes |
## How to Load (Python)
Use pandas to read any CSV under data/, for example:
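A minimal sketch (the exact label column name is an assumption and should be checked against the CSV header):

import pandas as pd

df = pd.read_csv("data/1.csv")  # ConstPos family, per the tables above
print(df.shape)
print(df.columns)

# If the label column is literally named "label", the class balance can be checked with:
# print(df["label"].value_counts())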
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
3D skeletons UP-Fall Dataset
Difference between Fall and Impact Detection
Overview
This dataset aims to facilitate research in fall detection, particularly focusing on the precise detection of impact moments within fall events. The accuracy and comprehensiveness of the 3D skeleton data make it a valuable resource for developing and benchmarking fall detection algorithms. The dataset contains 3D skeletal data extracted from fall events and daily activities of 5 subjects performing fall scenarios.
Data Collection
The skeletal data was extracted using a pose estimation algorithm, which processes image frames to determine the 3D coordinates of each joint. Sequences with fewer than 100 frames of extracted data were excluded to ensure the quality and reliability of the dataset. As a result, some subjects may have fewer CSV files.
CSV Structure
The data is organized by subjects, and each subject contains CSV files named according to the pattern C1S1A1T1, where:
C: Camera (1 or 2)
S: Subject (1 to 5)
A: Activity (1 to N, representing different activities)
T: Trial (1 to 3)
subject1/: Contains CSV files for Subject 1.
C1S1A1T1.csv: Data from Camera 1, Activity 1, Trial 1 for Subject 1
C1S1A2T1.csv: Data from Camera 1, Activity 2, Trial 1 for Subject 1
C1S1A3T1.csv: Data from Camera 1, Activity 3, Trial 1 for Subject 1
C2S1A1T1.csv: Data from Camera 2, Activity 1, Trial 1 for Subject 1
C2S1A2T1.csv: Data from Camera 2, Activity 2, Trial 1 for Subject 1
C2S1A3T1.csv: Data from Camera 2, Activity 3, Trial 1 for Subject 1
subject2/: Contains CSV files for Subject 2.
C1S2A1T1.csv: Data from Camera 1, Activity 1, Trial 1 for Subject 2
C1S2A2T1.csv: Data from Camera 1, Activity 2, Trial 1 for Subject 2
C1S2A3T1.csv: Data from Camera 1, Activity 3, Trial 1 for Subject 2
C2S2A1T1.csv: Data from Camera 2, Activity 1, Trial 1 for Subject 2
C2S2A2T1.csv: Data from Camera 2, Activity 2, Trial 1 for Subject 2
C2S2A3T1.csv: Data from Camera 2, Activity 3, Trial 1 for Subject 2
subject3/, subject4/, subject5/: Similar structure as above, but may contain fewer CSV files due to the data extraction criteria mentioned above.
Column Descriptions
Each CSV file contains the following columns representing different skeletal joints and their respective coordinates in 3D space:
| Column Name | Description |
|:--|:--|
| joint_1_x | X coordinate of joint 1 |
| joint_1_y | Y coordinate of joint 1 |
| joint_1_z | Z coordinate of joint 1 |
| joint_2_x | X coordinate of joint 2 |
| joint_2_y | Y coordinate of joint 2 |
| joint_2_z | Z coordinate of joint 2 |
| ... | ... |
| joint_n_x | X coordinate of joint n |
| joint_n_y | Y coordinate of joint n |
| joint_n_z | Z coordinate of joint n |
| LABEL | Label indicating impact (1) or non-impact (0) |
Example
Here is an example of what a row in one of the CSV files might look like:
| joint_1_x | joint_1_y | joint_1_z | joint_2_x | joint_2_y | joint_2_z | ... | joint_n_x | joint_n_y | joint_n_z | LABEL |
|--:|--:|--:|--:|--:|--:|:--|--:|--:|--:|--:|
| 0.123 | 0.456 | 0.789 | 0.234 | 0.567 | 0.890 | ... | 0.345 | 0.678 | 0.901 | 0 |
Usage
This data can be used for developing and benchmarking impact fall detection algorithms. It provides detailed information on human posture and movement during falls, making it suitable for machine learning and deep learning applications in impact fall detection and prevention.
Using GitHub
Clone the repository:
git clone https://github.com/Tresor-Koffi/3D_skeletons-UP-Fall-Dataset
Navigate to the directory:
cd 3D_skeletons-UP-Fall-Dataset
Examples
Here's a simple example of how to load and inspect a sample data file using Python:
import pandas as pd
data = pd.read_csv('subject1/C1S1A1T1.csv')
print(data.head())
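Building on that, a short sketch for checking the class balance of the LABEL column (impact vs. non-impact frames):

import pandas as pd

data = pd.read_csv('subject1/C1S1A1T1.csv')

# Count impact (1) vs. non-impact (0) frames and their ratio
print(data['LABEL'].value_counts())
print('Impact ratio:', data['LABEL'].mean())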
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Use this dataset with Misra's Pandas tutorial: How to use the Pandas GroupBy function | Pandas tutorial
The original dataset came from this site: https://data.cityofnewyork.us/City-Government/NYC-Jobs/kpav-sd4t/data
I used Google Colab to filter the columns with the following Pandas commands. Here's a Colab Notebook you can use with the commands listed below: https://colab.research.google.com/drive/17Jpgeytc075CpqDnbQvVMfh9j-f4jM5l?usp=sharing
Once the csv file is uploaded to Google Colab, use these commands to process the file.
import pandas as pd

# load the file and create a pandas dataframe
df = pd.read_csv('/content/NYC_Jobs.csv')

# keep only these columns
df = df[['Job ID', 'Civil Service Title', 'Agency', 'Posting Type', 'Job Category', 'Salary Range From', 'Salary Range To']]

# save the csv file without the index column
df.to_csv('/content/NYC_Jobs_filtered_cols.csv', index=False)
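From there, a small GroupBy sketch in the spirit of the tutorial (using the columns kept above and assuming the salary columns are numeric; the aggregation choice is just an example):

import pandas as pd

df = pd.read_csv('/content/NYC_Jobs_filtered_cols.csv')

# Average posted salary range per agency, highest first
salary_by_agency = df.groupby('Agency')[['Salary Range From', 'Salary Range To']].mean()
print(salary_by_agency.sort_values('Salary Range To', ascending=False).head(10))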
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
To compare baseball player statistics effectively using visualization, we can create some insightful plots. Below are the steps to accomplish this in Python using libraries like Pandas and Matplotlib or Seaborn.
First, we need to load the judge.csv file into a DataFrame. This will allow us to manipulate and analyze the data easily.
Before creating visualizations, it’s good to understand the data structure and identify the columns we want to compare. The relevant columns in your data include pitch_type, release_speed, game_date, and events.
We can create various visualizations, such as: - A bar chart to compare the average release speed of different pitch types. - A line plot to visualize trends over time based on game dates. - A scatter plot to analyze the relationship between release speed and the outcome of the pitches (e.g., strikeouts, home runs).
Here is a sample code to demonstrate how to create these visualizations using Matplotlib and Seaborn:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the data
df = pd.read_csv('judge.csv')
# Display the first few rows of the dataframe
print(df.head())
# Set the style of seaborn
sns.set(style="whitegrid")
# 1. Average Release Speed by Pitch Type
plt.figure(figsize=(12, 6))
avg_speed = df.groupby('pitch_type')['release_speed'].mean().sort_values()
sns.barplot(x=avg_speed.values, y=avg_speed.index, palette="viridis")
plt.title('Average Release Speed by Pitch Type')
plt.xlabel('Average Release Speed (mph)')
plt.ylabel('Pitch Type')
plt.show()
# 2. Trends in Release Speed Over Time
# First, convert the 'game_date' to datetime
df['game_date'] = pd.to_datetime(df['game_date'])
plt.figure(figsize=(14, 7))
sns.lineplot(data=df, x='game_date', y='release_speed', estimator='mean', ci=None)
plt.title('Trends in Release Speed Over Time')
plt.xlabel('Game Date')
plt.ylabel('Average Release Speed (mph)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
# 3. Scatter Plot of Release Speed vs. Events
plt.figure(figsize=(12, 6))
sns.scatterplot(data=df, x='release_speed', y='events', hue='pitch_type', alpha=0.7)
plt.title('Release Speed vs. Events')
plt.xlabel('Release Speed (mph)')
plt.ylabel('Event Type')
plt.legend(title='Pitch Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
These visualizations will help you compare player statistics in a meaningful way. You can customize the plots further based on your specific needs, such as filtering data for specific players or seasons. If you have any specific comparisons in mind or additional data to visualize, let me know!
Overview
This repository contains ready-to-use frequency time series as well as the corresponding pre-processing scripts in python. The data covers three synchronous areas of the European power grid:
This work is part of the paper "Predictability of Power Grid Frequency"[1]. Please cite this paper, when using the data and the code. For a detailed documentation of the pre-processing procedure we refer to the supplementary material of the paper.
Data sources
We downloaded the frequency recordings from publicly available repositories of three different Transmission System Operators (TSOs).
Content of the repository
A) Scripts
The python scripts run with Python 3.7 and with the packages found in "requirements.txt".
B) Data_converted and Data_cleansed
The folder "Data_converted" contains the output of "convert_data_format.py" and "Data_cleansed" contains the output of "clean_corrupted_data.py".
Use cases
We point out that this repository can be used in two different ways:
from helper_functions import *
import numpy as np
import pandas as pd

cleansed_data = pd.read_csv('/Path_to_cleansed_data/data.zip',
                            index_col=0, header=None, squeeze=True,
                            parse_dates=[0])
# true_intervals comes from helper_functions; it returns the bounds and sizes
# of the contiguous intervals where the condition is True
valid_bounds, valid_sizes = true_intervals(~cleansed_data.isnull())
start, end = valid_bounds[np.argmax(valid_sizes)]
data_without_nan = cleansed_data.iloc[start:end]
License
We release the code in the folder "Scripts" under the MIT license [8]. In the case of Nationalgrid and Fingrid, we further release the pre-processed data in the folder "Data_converted" and "Data_cleansed" under the CC-BY 4.0 license [7]. TransnetBW originally did not publish their data under an open license. We have explicitly received the permission to publish the pre-processed version from TransnetBW. However, we cannot publish our pre-processed version under an open license due to the missing license of the original TransnetBW data.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset provides grayscale pixel values for brain tumor MRI images, stored in a CSV format for simplified access and ease of use. The goal is to create a "MNIST-like" dataset for brain tumors, where each row in the CSV file represents the pixel values of a single image in its original resolution. This format makes it convenient for researchers and developers to quickly load and analyze MRI data for brain tumor detection, classification, and segmentation tasks without needing to handle large image files directly.
Brain tumor classification and segmentation are critical tasks in medical imaging, and datasets like these are valuable for developing and testing machine learning and deep learning models. While there are several publicly available brain tumor image datasets, they often consist of large image files that can be challenging to process. This CSV-based dataset addresses that by providing a compact and accessible format. Potential use cases include: - Tumor Classification: Identifying different types of brain tumors, such as glioma, meningioma, and pituitary tumors, or distinguishing between tumor and non-tumor images. - Tumor Segmentation: Applying pixel-level classification and segmentation techniques for tumor boundary detection. - Educational and Rapid Prototyping: Ideal for educational purposes or quick experimentation without requiring large image processing capabilities.
This dataset is structured as a single CSV file where each row represents an image, and each column represents a grayscale pixel value. The pixel values are stored as integers ranging from 0 (black) to 255 (white).
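A minimal loading sketch (the file name is a placeholder; reshaping a row back into a 2-D image requires knowing that image's original width and height, and if the CSV includes a label column it should be excluded before scaling):

import pandas as pd

# Placeholder file name for the provided CSV
pixels = pd.read_csv('brain_tumor_pixels.csv')

# Scale grayscale values from 0-255 to 0-1
pixels_normalized = pixels / 255.0
print(pixels_normalized.shape)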
This dataset is intended for research and educational purposes only. Users are encouraged to cite and credit the original data sources if using this dataset in any publications or projects. This is a derived CSV version aimed to simplify access and usability for machine learning and data science applications.
This resource contains a draft Jupyter Notebook that has example code snippets showing how to access HydroShare resource files using HydroShare S3 buckets. The user_account.py is a utility to read user hydroshare cached account information in any of the JupyterHub instances that HydroShare has access to. The example notebook uses this utility so that you don't have to enter your hydroshare account information in order to access hydroshare buckets.
Here are the 3 notebooks in this resource:
The above notebook has examples showing how to upload/download resource files from the resource bucket. It also contains examples of how to list the files and folders of a resource in a bucket.
The above notebook has examples of reading a raster and a shapefile from a bucket using GDAL, without the need to download the file from the bucket to local disk.
The above notebook has examples of using h5netcdf and xarray for reading a netCDF file directly from a bucket. It also contains examples of using rioxarray to read a raster file, and pandas to read a CSV file from HydroShare buckets.
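For the CSV case, one generic pattern is pandas with s3fs (a sketch only; the bucket path, endpoint, and credentials below are placeholders, and the real values come from the notebooks and the user_account.py utility described above):

import pandas as pd

# Hypothetical bucket path and credentials
df = pd.read_csv(
    "s3://example-bucket/example-resource/data/contents/data.csv",
    storage_options={
        "key": "ACCESS_KEY",
        "secret": "SECRET_KEY",
        "client_kwargs": {"endpoint_url": "https://example-s3-endpoint"},
    },
)
print(df.head())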
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of four years of technical language annotations from two paper machines in northern Sweden, structured as a Pandas dataframe. The same data is also available as a semicolon-separated .csv file. The data consists of two columns, where the first column corresponds to annotation note contents, and the second column corresponds to annotation titles. The annotations are in Swedish, and processed so that all mentions of personal information are replaced with the string ‘egennamn’, meaning “personal name” in Swedish. Each row corresponds to one annotation with the corresponding title.
Data can be accessed in Python with:
import pandas as pd
annotations_df = pd.read_pickle("Technical_Language_Annotations.pkl")
annotation_contents = annotations_df['noteComment']
annotation_titles = annotations_df['title']
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Raw data for the article: Gradient boosted decision trees reveal nuances of auditory discrimination behaviour (PLOS Computational Biology). This data repository contains the csv files after extraction of the raw MATLAB metadata files into pandas (Python) dataframes (helper function author: Jules Lebert). The csv files can easily be loaded back into dataframe objects using pandas before the subsampling steps (as documented in the paper, we used subsampling to ensure the number of F0-roved and control F0 trials were relatively equal) are completed.
Link to GitHub repository to run the models on this data: https://github.com/carlacodes/boostmodels
A full description of each of the variables within the dataframe can be found in the data_description_instructions_for_datasets_plos_bio.pdf.
Abstract: Animal psychophysics can generate rich behavioral datasets, often comprised of many 1000s of trials for an individual subject. Gradient-boosted models are a promising machine learning approach for analyzing such data, partly due to the tools that allow users to gain insight into how the model makes predictions. We trained ferrets to report a target word’s presence, timing, and lateralization within a stream of consecutively presented non-target words. To assess the animals’ ability to generalize across pitch, we manipulated the fundamental frequency (F0) of the speech stimuli across trials, and to assess the contribution of pitch to streaming, we roved the F0 from word token-to-token. We then implemented gradient-boosted regression and decision trees on the trial outcome and reaction time data to understand the behavioral factors behind the ferrets’ decision-making. We visualized model contributions by implementing SHAPs feature importance and partial dependency plots. While ferrets could accurately perform the task across all pitch-shifted conditions, our models reveal subtle effects of shifting F0 on performance, with within-trial pitch shifting elevating false alarms and extending reaction times. Our models identified a subset of non-target words that animals commonly false alarmed to. Follow-up analysis demonstrated that the spectrotemporal similarity of target and non-target words rather than similarity in duration or amplitude waveform was the strongest predictor of the likelihood of false alarming. Finally, we compared the results with those obtained with traditional mixed effects models, revealing equivalent or better performance for the gradient-boosted models over these approaches.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Water discharge and temperature of the springs monitored during the WABEsense project (UTF 642.21.20). The data correspond to all field measurements performed between Feb. 2021 and Dec. 2023. Data for each spring might not cover the whole period. For each spring there are two files: *.csv and *.meta. The .csv file contains the recorded data. The .meta file contains further information about the spring (e.g. location) and the data.
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Estimating the distributional impacts of energy subsidy removal and compensation schemes in Ecuador based on input-output and household data.
Import files:
- Dictionary Categories.csv, Dictionary ENI-IOT.csv, and Dictionary Subcategories.csv, based on [1]
- Dictionary IOT.csv and IOT_2012.csv (cannot be redistributed), based on [2]
- Dictionary Taxes.csv and Dictionary Transfers.csv, based on [3]
- ENIGHUR11_GASTOS_V.csv, ENIGHUR11_HOGARES_AGREGADOS.csv, and ENIGHUR11_PERSONAS_INGRESOS.csv, based on [4]
- Price increase scenarios.csv, based on [5]
Further basic files and documents:
[1] 4_M&D_Mapping ENIGHUR expenditures to IOT_180605.xlsm
[2] Input-output table 2012 (https://contenido.bce.fin.ec/documentos/PublicacionesNotas/Catalogo/CuentasNacionales/Anuales/Dolares/MIP2012Ampliada.xls). Save the sheet with the IOT 2012 (Matriz simétrica) as IOT_2012.csv and edit the format: first column and row: IOT labels.
[3] 4_M&D_ENIGHUR income_180606.xlsx
[4] ENIGHUR data can be retrieved from http://www.ecuadorencifras.gob.ec/encuesta-nacional-de-ingresos-y-gastos-de-los-hogares-urbanos-y-rurales/ Household datasets are only available in SPSS file format and the free software PSPP is used to convert .sav- to .csv-files, as this format can be read directly and efficiently into a Python Pandas DataFrame. See PSPP syntax below:
save translate
/outfile = filename
/type = CSV
/textoptions decimal = DOT
/textoptions delimiter = ';'
/fieldnames
/cells=values
/replace.
[5] 3_Ecuador_Energy subsidies and 4_M&D_Price scenarios_180610.xlsx
GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0.html
Data pulled from Traffy Fondue by accessing the Traffy Fondue Open API. Date range: January 2022 until January 2025.
The following code pulled the data:
import os
import json
import requests
from datetime import datetime, timedelta
import time
class TraffyDataFetcher:
    def __init__(self, start_date, subfolder='traffyfonduedata'):
self.url = "https://publicapi.traffy.in.th/share/teamchadchart/search"
self.query = {'offset': '0'}
self.payload = {}
self.headers = {}
self.start_date = datetime.strptime(start_date, '%Y-%m-%d')
self.end_date = datetime.now()
self.subfolder = subfolder
self.max_requests_per_minute = 99
if not os.path.exists(self.subfolder):
os.makedirs(self.subfolder)
def add_days_to_date(self, start_date_str, days_to_add):
start_date = datetime.strptime(start_date_str, '%Y-%m-%d')
new_date = start_date + timedelta(days=days_to_add)
return new_date.strftime('%Y-%m-%d')
def fetch_data(self):
current_date = self.start_date
index = 0
while current_date <= self.end_date:
start_time = datetime.now()
self.query['start'] = current_date.strftime('%Y-%m-%d')
new_date = self.add_days_to_date(self.query['start'], 10)
self.query['end'] = new_date
response = requests.request("GET", self.url, headers=self.headers, data=self.payload, params=self.query)
print(f"offset: {index} response: {response.status_code}")
filename = f"traffy_{current_date.strftime('%Y-%m-%d')}.json"
file_path = os.path.join(self.subfolder, filename)
with open(file_path, "w") as outfile:
json_object = json.dumps(response.json(), indent=4)
outfile.write(json_object)
end_time = datetime.now()
elapsed_time = (end_time - start_time).total_seconds()
print(f"Elapsed time: {elapsed_time} s")
index += 950
current_date = datetime.strptime(new_date, '%Y-%m-%d') + timedelta(days=1)
if index % self.max_requests_per_minute == 0:
time.sleep(60 - elapsed_time)
if __name__ == "__main__":
fetcher = TraffyDataFetcher(start_date='2022-01-01')
fetcher.fetch_data()
--
And the following code converted the json to CSV files
import os
import glob
import json
import pandas as pd
#import numpy as np
class TraffyJSONFixer:
    def __init__(self, path_to_json='*.json', subfolder='traffyfonduedata'):
self.path_to_json = path_to_json
self.subfolder = subfolder
self.outputfolder = 'fixedjson'
self.excelfolder = 'exceloutput'
self.file_path = os.path.join(self.subfolder, self.path_to_json)
self.json_files = glob.glob(self.file_path)
# Ensure the subfolder exists
if not os.path.exists(self.subfolder):
os.makedirs(self.subfolder)
# Ensure the outputfolder exists
if not os.path.exists(self.outputfolder):
os.makedirs(self.outputfolder)
# Ensure the excelfolder exists
if not os.path.exists(self.excelfolder):
os.makedirs(self.excelfolder)
# Debugging: Print the current working directory and the list of JSON files
print(f"Current working directory: {os.getcwd()}")
print(f"Found JSON files: {self.json_files}")
def fix_json_files(self):
for count, ele in enumerate(self.json_files):
new_file_name = os.path.join(self.outputfolder, f"data_{os.path.basename(ele)}")
try:
with open(ele, 'r', encoding='utf-8') as f:
data = json.load(f)
# Debugging: Print the type of data
print(f"Processing file: {ele}")
print(f"Type of data: {type(data)}")
# Handle different JSON structures
if isinstance(data, dict) and "results" in data:
results = data["results"]
elif isinstance(data, list):
results = data
else:
print(f"Unexpected JSON structure in file: {ele}")
continue
# Ensure results is a list or dict before writing
if isinstance(results, (list, dict)):
with open(new_file_name, 'w', encoding='utf-8') as f:
f.write(json.dumps(results, indent=4))
else:
print(f"Unexpected type for results in file: {ele}")
except (json.JSONDecodeError, KeyError) as e:
print(f"Error processing file {ele}: {e}")
def jsontoexcel(self):
jsonfile_path = os.path.join(self.out...