84 datasets found
  1. US Consumer Complaints Against Businesses

    • kaggle.com
    zip
    Updated Oct 9, 2022
    Cite
    Jeffery Mandrake (2022). US Consumer Complaints Against Businesses [Dataset]. https://www.kaggle.com/jefferymandrake/us-consumer-complaints-dataset-through-2019
    Explore at:
    Available download formats: zip (343188956 bytes)
    Dataset updated
    Oct 9, 2022
    Authors
    Jeffery Mandrake
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    2,121,458 records

    I used Google Colab to check out this dataset and pull the column names using Pandas.

    Sample code example: Python Pandas read csv file compressed with gzip and load into Pandas dataframe https://pastexy.com/106/python-pandas-read-csv-file-compressed-with-gzip-and-load-into-pandas-dataframe

    Columns: ['Date received', 'Product', 'Sub-product', 'Issue', 'Sub-issue', 'Consumer complaint narrative', 'Company public response', 'Company', 'State', 'ZIP code', 'Tags', 'Consumer consent provided?', 'Submitted via', 'Date sent to company', 'Company response to consumer', 'Timely response?', 'Consumer disputed?', 'Complaint ID']

    I did not modify the dataset.

    Use it to practice with dataframes - Pandas or PySpark on Google Colab:

    !unzip complaints.csv.zip

    import pandas as pd

    df = pd.read_csv('complaints.csv')
    df.columns
    df.head()
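
    If you would rather practice with PySpark on Colab, a minimal sketch (assuming the unzipped complaints.csv sits in the working directory) could look like this:

    from pyspark.sql import SparkSession

    # Start (or reuse) a local Spark session
    spark = SparkSession.builder.getOrCreate()

    # Read the CSV with a header row and inferred column types
    sdf = spark.read.csv('complaints.csv', header=True, inferSchema=True)
    sdf.printSchema()
    sdf.show(5)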

  2. Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, zip
    Updated Dec 24, 2022
    Cite
    Alexander R. Hartloper; Selimcan Ozden; Albano de Castro e Sousa; Dimitrios G. Lignos (2022). Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials [Dataset]. http://doi.org/10.5281/zenodo.6965147
    Explore at:
    Available download formats: bin, zip, csv
    Dataset updated
    Dec 24, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alexander R. Hartloper; Selimcan Ozden; Albano de Castro e Sousa; Dimitrios G. Lignos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials

    Background

    This dataset contains data from monotonic and cyclic loading experiments on structural metallic materials. The materials are primarily structural steels; one iron-based shape memory alloy is also included. Summary files provide an overview of the database, and the data from the individual experiments are included as well.

    The files included in the database are outlined below and the format of the files is briefly described. Additional information regarding the formatting can be found through the post-processing library (https://github.com/ahartloper/rlmtp/tree/master/protocols).

    Usage

    • The data is licensed through the Creative Commons Attribution 4.0 International.
    • If you have used our data and are publishing your work, we ask that you please reference both:
      1. this database through its DOI, and
      2. any publication that is associated with the experiments. See the Overall_Summary and Database_References files for the associated publication references.

    Included Files

    • Overall_Summary_2022-08-25_v1-0-0.csv: summarises the specimen information for all experiments in the database.
    • Summarized_Mechanical_Props_Campaign_2022-08-25_v1-0-0.csv: summarises the average initial yield stress and average initial elastic modulus per campaign.
    • Unreduced_Data-#_v1-0-0.zip: contains the original (not downsampled) data
      • Where # is one of: 1, 2, 3, 4, 5, 6. The unreduced data is broken into separate archives because of upload limitations to Zenodo. Together they provide all the experimental data.
      • We recommend you un-zip all the folders and place them in one "Unreduced_Data" directory, similar to the "Clean_Data" directory.
      • The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.
      • There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the unreduced data.
      • The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.
    • Clean_Data_v1-0-0.zip: contains all the downsampled data
      • The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.
      • There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the clean data.
      • The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.
    • Database_References_v1-0-0.bib
      • Contains a bibtex reference for many of the experiments in the database. Corresponds to the "citekey" entry in the summary files.

    File Format: Downsampled Data

    These are the "LP_

    • The header of the first column is empty: the first column corresponds to the index of the sample point in the original (unreduced) data
    • Time[s]: time in seconds since the start of the test
    • e_true: true strain
    • Sigma_true: true stress in MPa
    • (optional) Temperature[C]: the surface temperature in degC

    These data files can be easily loaded using the pandas library in Python through:

    import pandas
    data = pandas.read_csv(data_file, index_col=0)

    The data is formatted so it can be used directly in RESSPyLab (https://github.com/AlbanoCastroSousa/RESSPyLab). Note that the column names "e_true" and "Sigma_true" were kept for backwards compatibility reasons with RESSPyLab.

    File Format: Unreduced Data

    These are the "LP_

    • The first column is the index of each data point
    • S/No: sample number recorded by the DAQ
    • System Date: Date and time of sample
    • Time[s]: time in seconds since the start of the test
    • C_1_Force[kN]: load cell force
    • C_1_Déform1[mm]: extensometer displacement
    • C_1_Déplacement[mm]: cross-head displacement
    • Eng_Stress[MPa]: engineering stress
    • Eng_Strain[]: engineering strain
    • e_true: true strain
    • Sigma_true: true stress in MPa
    • (optional) Temperature[C]: specimen surface temperature in degC

    The data can be loaded and used similarly to the downsampled data.

    File Format: Overall_Summary

    The overall summary file provides data on all the test specimens in the database. The columns include:

    • hidden_index: internal reference ID
    • grade: material grade
    • spec: specifications for the material
    • source: base material for the test specimen
    • id: internal name for the specimen
    • lp: load protocol
    • size: type of specimen (M8, M12, M20)
    • gage_length_mm_: unreduced section length in mm
    • avg_reduced_dia_mm_: average measured diameter for the reduced section in mm
    • avg_fractured_dia_top_mm_: average measured diameter of the top fracture surface in mm
    • avg_fractured_dia_bot_mm_: average measured diameter of the bottom fracture surface in mm
    • fy_n_mpa_: nominal yield stress
    • fu_n_mpa_: nominal ultimate stress
    • t_a_deg_c_: ambient temperature in degC
    • date: date of test
    • investigator: person(s) who conducted the test
    • location: laboratory where test was conducted
    • machine: setup used to conduct test
    • pid_force_k_p, pid_force_t_i, pid_force_t_d: PID parameters for force control
    • pid_disp_k_p, pid_disp_t_i, pid_disp_t_d: PID parameters for displacement control
    • pid_extenso_k_p, pid_extenso_t_i, pid_extenso_t_d: PID parameters for extensometer control
    • citekey: reference corresponding to the Database_References.bib file
    • yield_stress_mpa_: computed yield stress in MPa
    • elastic_modulus_mpa_: computed elastic modulus in MPa
    • fracture_strain: computed average true strain across the fracture surface
    • c,si,mn,p,s,n,cu,mo,ni,cr,v,nb,ti,al,b,zr,sn,ca,h,fe: chemical compositions in units of %mass
    • file: file name of corresponding clean (downsampled) stress-strain data
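
    A minimal sketch for loading this summary with pandas and pulling out a few of the columns listed above (assuming the file sits in the working directory and the column names are spelled exactly as in the list):

    import pandas as pd

    summary = pd.read_csv('Overall_Summary_2022-08-25_v1-0-0.csv')
    # e.g. inspect the computed mechanical properties per specimen
    print(summary[['grade', 'lp', 'yield_stress_mpa_', 'elastic_modulus_mpa_']].head())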

    File Format: Summarized_Mechanical_Props_Campaign

    Meant to be loaded in Python as a pandas DataFrame with multi-indexing, e.g.,

    import pandas as pd

    date, version = '2022-08-25', '_v1-0-0'  # matches the released file name
    tab1 = pd.read_csv('Summarized_Mechanical_Props_Campaign_' + date + version + '.csv',
              index_col=[0, 1, 2, 3], skipinitialspace=True, header=[0, 1],
              keep_default_na=False, na_values='')
    • citekey: reference in "Campaign_References.bib".
    • Grade: material grade.
    • Spec.: specifications (e.g., J2+N).
    • Yield Stress [MPa]: initial yield stress in MPa
      • size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign
    • Elastic Modulus [MPa]: initial elastic modulus in MPa
      • size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign

    Caveats

    • The specimens in the following directories were tested before the protocol was established. Therefore, only the true stress-strain data is available for each:
      • A500
      • A992_Gr50
      • BCP325
      • BCR295
      • HYP400
      • S460NL
      • S690QL/25mm
      • S355J2_Plates/S355J2_N_25mm and S355J2_N_50mm
  3. Linux Kernel binary size

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jun 14, 2021
    Cite
    Hugo MARTIN; Mathieu ACHER (2021). Linux Kernel binary size [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4943883
    Explore at:
    Dataset updated
    Jun 14, 2021
    Dataset provided by
    IRISA
    Authors
    Hugo MARTIN; Mathieu ACHER
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset containing measurements of the Linux kernel binary size after compilation. The reported size, in the column "perf", is the size in bytes of the vmlinux file. It also contains a column "active_options" reporting the number of activated options (set to "y"). All other columns, listed in the file "Linux_options.json", are Linux kernel options. The sampling was done using randconfig. The version of Linux used is 4.13.3.

    Not all available options are present. First, the dataset only contains options for the x86 64-bit version. Second, all non-tristate options have been ignored. Finally, options that do not take more than one value across the whole dataset, due to insufficient variability in the sampling, are ignored. All options are encoded as 0 for the "n" and "m" option values, and 1 for "y".

    In Python, importing the dataset with pandas will assign all columns the int64 dtype, which leads to very high memory consumption (~50 GB). The snippet below imports it using less than 1 GB of memory by setting the option columns to int8.

    import pandas as pd
    import json
    import numpy

    with open("Linux_options.json", "r") as f:
        linux_options = json.load(f)

    # Load the csv, setting the option columns to int8 to save a lot of memory
    df = pd.read_csv("Linux.csv", dtype={f: numpy.int8 for f in linux_options})
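
    As a quick sanity check on the memory savings (not part of the original description), pandas can report the in-memory footprint of the resulting dataframe:

    # Report the deep memory usage of the int8-typed dataframe
    df.info(memory_usage="deep")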

  4. Merge number of excel file,convert into csv file

    • kaggle.com
    zip
    Updated Mar 30, 2024
    Cite
    Aashirvad pandey (2024). Merge number of excel file,convert into csv file [Dataset]. https://www.kaggle.com/datasets/aashirvadpandey/merge-number-of-excel-fileconvert-into-csv-file
    Explore at:
    Available download formats: zip (6731 bytes)
    Dataset updated
    Mar 30, 2024
    Authors
    Aashirvad pandey
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Project Description:

    Title: Pandas Data Manipulation and File Conversion

    Overview: This project aims to demonstrate the basic functionalities of Pandas, a powerful data manipulation library in Python. In this project, we will create a DataFrame, perform some data manipulation operations using Pandas, and then convert the DataFrame into both Excel and CSV formats.

    Key Objectives:

    1. DataFrame Creation: Utilize Pandas to create a DataFrame with sample data.
    2. Data Manipulation: Perform basic data manipulation tasks such as adding columns, filtering data, and performing calculations.
    3. File Conversion: Convert the DataFrame into Excel (.xlsx) and CSV (.csv) file formats.

    Tools and Libraries Used:

    • Python
    • Pandas

    Project Implementation:

    1. DataFrame Creation:

      • Import the Pandas library.
      • Create a DataFrame using either a dictionary, a list of dictionaries, or by reading data from an external source like a CSV file.
      • Populate the DataFrame with sample data representing various data types (e.g., integer, float, string, datetime).
    2. Data Manipulation:

      • Add new columns to the DataFrame representing derived data or computations based on existing columns.
      • Filter the DataFrame to include only specific rows based on certain conditions.
      • Perform basic calculations or transformations on the data, such as aggregation functions or arithmetic operations.
    3. File Conversion:

      • Utilize Pandas to convert the DataFrame into an Excel (.xlsx) file using the to_excel() function.
      • Convert the DataFrame into a CSV (.csv) file using the to_csv() function.
      • Save the generated files to the local file system for further analysis or sharing (see the sketch after this list).
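
    A minimal sketch of the three steps above (the sample data, sheet names, and output file names are illustrative assumptions, and writing Excel files assumes an engine such as openpyxl is installed):

    import pandas as pd

    # 1. DataFrame creation with sample data
    df = pd.DataFrame({
        "name": ["Alice", "Bob", "Carol"],
        "quantity": [3, 5, 2],
        "unit_price": [9.99, 4.50, 12.00],
    })

    # 2. Data manipulation: a derived column and a filter
    df["total"] = df["quantity"] * df["unit_price"]
    big_orders = df[df["total"] > 20]

    # 3. File conversion: one Excel file with two sheets, plus a CSV
    with pd.ExcelWriter("orders.xlsx") as writer:
        df.to_excel(writer, sheet_name="all_orders", index=False)
        big_orders.to_excel(writer, sheet_name="big_orders", index=False)
    df.to_csv("orders.csv", index=False)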

    Expected Outcome:

    Upon completion of this project, you will have gained a fundamental understanding of how to work with Pandas DataFrames, perform basic data manipulation tasks, and convert DataFrames into different file formats. This knowledge will be valuable for data analysis, preprocessing, and data export tasks in various data science and analytics projects.

    Conclusion:

    The Pandas library offers powerful tools for data manipulation and file conversion in Python. By completing this project, you will have acquired essential skills that are widely applicable in the field of data science and analytics. You can further extend this project by exploring more advanced Pandas functionalities or integrating it into larger data processing pipelines. In this dataset, several tables were combined into DataFrames, saved into a single Excel file as separate sheets, and then converted into a CSV file.

  5. Evaluation results of a knee distraction unloader brace on a robotic test bench

    • entrepot.recherche.data.gouv.fr
    bin, pdf +1
    Updated Sep 2, 2025
    Cite
    Lea Boillereaux; Simon Le Floc'h; Franck Jourdan; Arnaud Tanguy; Gille Camp; Anaïs Vaysse; Gilles Dusfour; Abderrahmane Kheddar (2025). Evaluation results of a knee distraction unloader brace on a robotic test bench – CSV files of the data obtained from the test bench – Model of the bones – Model of the tested cam of the distraction brace [Dataset]. http://doi.org/10.57745/VCALE0
    Explore at:
    Available download formats: text/comma-separated-values, bin, and pdf (individual files range from roughly 36 KB to 100 MB)
    Dataset updated
    Sep 2, 2025
    Dataset provided by
    Recherche Data Gouv
    Authors
    Lea Boillereaux; Simon Le Floc'h; Franck Jourdan; Arnaud Tanguy; Gille Camp; Anaïs Vaysse; Gilles Dusfour; Abderrahmane Kheddar
    License

    Custom license: https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.57745/VCALE0

    Time period covered
    Oct 1, 2021 - Aug 31, 2025
    Description

    Dataset Description

    This dataset is associated with the publication titled "A Distraction Knee-Brace and a Robotic Testbed for Tibiofemoral Load Reduction during Squatting" in IEEE Transactions on Medical Robotics and Bionics. It provides comprehensive data supporting the development and evaluation of a knee distraction brace designed to reduce tibiofemoral contact forces during flexion.

    Contents

    • Cam Profiles: STL files of the initial cam profiles designed based on averaged tibiofemoral contact force data collected from 5 squats of a patient with an instrumented prosthesis (K7L) from the CAMS Knee dataset (accessible via https://orthoload.com/). Optimized cam profiles, corrected based on experimental results, are also included. These profiles enable patient-specific adjustments to account for the non-linear evolution of tibiofemoral contact forces with flexion angles.
    • Experimental Results: CSV files containing raw results from robotic testbed experiments, testing the knee brace under various initial pneumatic pressures in the actuators. Data is provided for tests conducted without the brace, with the initial cam profiles, and with the optimized cam profiles. Each CSV file corresponds to a specific test condition, detailing forces and kinematics observed during squatting.
    • 3D Models of Bones and Testbed Components: Geometries of the femur head and tibial plateau used in the robotic testbed experiments, provided in STEP, STL, and SLDPRT/SLDASM formats. A README file describes the biomechanical coordinate systems used for force and kinematic control of the robotic testbed, and for result interpretation and visualization.

    How to Open and Read the Provided Files

    The dataset includes files in CSV, SLDPRT, SLDASM, STL, and IGES formats. Below are recommended software solutions, with a preference for open-source options:

    • CSV (Comma-Separated Values): Can be opened with Microsoft Excel, Google Sheets, or open-source software like LibreOffice Calc or Python (using pandas).
    • SLDPRT & SLDASM (SolidWorks Parts and Assemblies): These files are native to SolidWorks. For viewing without SolidWorks, use eDrawings Viewer (free) or FreeCAD (limited compatibility).
    • STL (3D Model Format): Can be opened with MeshLab, FreeCAD, or Blender. Most 3D printing software (like Cura or PrusaSlicer) also supports STL.
    • IGES (3D CAD Exchange Format): Can be read with FreeCAD, Fusion 360 (free for personal use), or OpenCascade-based software like CAD Assistant.

    For full compatibility, commercial software like SolidWorks or CATIA may be required for SLDPRT and SLDASM files; however, FreeCAD and other open-source tools provide partial support. See the associated publication and the README files included in the dataset for more information.
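
    For the CSV results specifically, a minimal pandas sketch (the file name below is a placeholder; substitute any of the test-bench CSV files):

    import pandas as pd

    # "test_condition.csv" is a placeholder for one of the test-bench CSV files
    df = pd.read_csv("test_condition.csv")
    print(df.columns.tolist())
    print(df.head())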

  6. Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild

    • zenodo.org
    • data.europa.eu
    zip
    Updated Oct 20, 2022
    Cite
    Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. http://doi.org/10.5281/zenodo.6832242
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 20, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LifeSnaps Dataset Documentation

    Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

    The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

    Data Import: Reading CSV

    For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
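
    For instance, a minimal sketch (the file name is a placeholder for one of the provided daily/hourly CSV files):

    import pandas as pd

    # replace with an actual CSV file name from the archive
    daily = pd.read_csv("daily_fitbit.csv")
    print(daily.head())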

    Data Import: Setting up a MongoDB (Recommended)

    To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.

    To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed.

    For the Fitbit data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c fitbit 

    For the SEMA data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c sema 

    For surveys data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c surveys 

    If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
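
    Once restored, the collections can also be queried from Python, for example with pymongo (a minimal sketch assuming the default local MongoDB instance and no access control):

    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    db = client["rais_anonymized"]

    # Count documents and peek at one record from the fitbit collection
    print(db["fitbit"].count_documents({}))
    print(db["fitbit"].find_one())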

    Data Availability

    The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:

    {
      _id: 
  7. The Device Activity Report with Complete Knowledge (DARCK) for NILM

    • zenodo.org
    bin, xz
    Updated Sep 19, 2025
    Cite
    Anonymous (2025). The Device Activity Report with Complete Knowledge (DARCK) for NILM [Dataset]. http://doi.org/10.5281/zenodo.17159850
    Explore at:
    Available download formats: bin, xz
    Dataset updated
    Sep 19, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    1. Abstract

    This dataset contains aggregated and sub-metered power consumption data from a two-person apartment in Germany. Data was collected from March 5 to September 4, 2025, spanning 6 months. It includes an aggregate reading from a main smart meter and individual readings from 40 smart plugs, smart relays, and smart power meters monitoring various appliances.

    2. Dataset Overview

    • Apartment: Two-person apartment, approx. 58m², located in Aachen, Germany.
    • Aggregate Meter: eBZ DD3
    • Sub-meters: 31 Shelly Plus Plug S, 6 Shelly Plus 1PM, 3 Shelly Plus PM Mini Gen3
    • Sampling Rate: 1 Hz
    • Measured Quantity: Active Power
    • Unit of Measurement: Watt
    • Duration: 6 months
    • Format: Single CSV file (`DARCK.csv`)
    • Structure: Timestamped rows with columns for the aggregate meter and each sub-metered appliance.
    • Completeness: The main power meter has a completeness of 99.3%. Missing values were linearly interpolated.

    3. Download and Usage

    The dataset can be downloaded here: https://doi.org/10.5281/zenodo.17159850

    Because it contains long off periods with zeros, the CSV file compresses well. To extract it, use: xz -d DARCK.csv.xz. The compression reduces the file size by about 97% (from 4 GB to 90.9 MB).


    To use the dataset in Python, you can, e.g., load the csv file into a pandas dataframe:

    import pandas as pd

    df = pd.read_csv("DARCK.csv", parse_dates=["time"])

    4. Measurement Setup

    The main meter was monitored using an infrared reading head magnetically attached to the infrared interface of the meter. An ESP8266 flashed with Tasmota decodes the binary datagrams and forwards the Watt readings to the MQTT broker. Individual appliances were monitored using a combination of Shelly Plugs (for outlets), Shelly 1PM (for wired-in devices like ceiling lights), and Shelly PM Mini (for each of the three phases of the oven). All devices reported to a central InfluxDB database via Home Assistant running in docker on a Dell OptiPlex 3020M.

    5. File Format (DARCK.csv)

    The dataset is provided as a single comma-separated value (CSV) file.

    • The first row is a header containing the column names.
    • All power values are rounded to the first decimal place.
    • There are no missing values in the final dataset.
    • Each row represents 1 second, from start of measuring in March until the end in September.

    Column Descriptions

    | Column Name | Data Type | Unit | Description |
    |:--|:--|:--|:--|
    | time | datetime | - | Timestamp for the reading in YYYY-MM-DD HH:MM:SS |
    | main | float | Watt | Total aggregate power consumption for the apartment, measured at the main electrical panel. |
    | [appliance_name] | float | Watt | Power consumption of an individual appliance (e.g., lightbathroom, fridge, sherlockpc). See Section 8 for a full list. |
    | Aggregate Columns | | | |
    | aggr_chargers | float | Watt | The sum of sherlockcharger, sherlocklaptop, watsoncharger, watsonlaptop, watsonipadcharger, kitchencharger. |
    | aggr_stoveplates | float | Watt | The sum of stoveplatel1 and stoveplatel2. |
    | aggr_lights | float | Watt | The sum of lightbathroom, lighthallway, lightsherlock, lightkitchen, lightlivingroom, lightwatson, lightstoreroom, fcob, sherlockalarmclocklight, sherlockfloorlamphue, sherlockledstrip, livingfloorlamphue, sherlockglobe, watsonfloorlamp, watsondesklamp and watsonledmap. |
    | Analysis Columns | | | |
    | inaccuracy | float | Watt | As no electrical device bypasses a power meter, the true inaccuracy can be assessed. It is the absolute error between the sum of individual measurements and the mains reading. A 30W offset is applied to the sum since the measurement devices themselves draw power which is otherwise unaccounted for. |

    6. Data Postprocessing Pipeline

    The final dataset was generated from two raw data sources (meter.csv and shellies.csv) using a comprehensive postprocessing pipeline.

    6.1. Main Meter (main) Postprocessing

    The aggregate power data required several cleaning steps to ensure accuracy.

    1. Outlier Removal: Readings below 10W or above 10,000W were removed (merely 3 occurrences).
    2. Timestamp Burst Correction: The source data contained bursts of delayed readings. A custom algorithm was used to identify these bursts (large time gap followed by rapid readings) and back-fill the timestamps to create an evenly spaced time series.
    3. Alignment & Interpolation: The smart meter pushes a new value via infrared every second. To align those to the whole seconds, it was resampled to a 1-second frequency by taking the mean of all readings within each second (in 99.5% only 1 value). Any resulting gaps (0.7% outage ratio) were filled using linear interpolation.

    6.2. Sub-metered Devices (shellies) Postprocessing

    The Shelly devices are not prone to the same burst issue as the ESP8266. They push a new reading at every change in power drawn. If no power change is observed, or the change observed is too small (less than a few watts), the reading is pushed once a minute, together with a heartbeat. When a device turns on or off, intermediate power values are published, which leads to sub-second values that need to be handled.

    1. Grouping: Data was grouped by the unique device identifier.
    2. Resampling & Filling: The data for each device was resampled to a 1-second frequency using .resample('1s').last().ffill().
      This method was chosen to, first, capture the last known state of the device within each second, handling rapid on/off events, and second, to forward-fill the last state across periods with no new data, modeling that the device's consumption remained constant until a new reading was sent (see the sketch below).
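
    A hedged sketch of this resampling step (the raw shellies.csv column names "device", "time", and "power" are assumptions for illustration):

    import pandas as pd

    shellies = pd.read_csv("shellies.csv", parse_dates=["time"])
    frames = []
    for device, g in shellies.groupby("device"):
        s = (g.set_index("time")["power"]
               .resample("1s").last()  # last known state within each second
               .ffill())               # hold that state until a new reading arrives
        frames.append(s.rename(device))
    resampled = pd.concat(frames, axis=1)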

    6.3. Merging and Finalization

    1. Merge: The cleaned main meter and all sub-metered device dataframes were merged into a single dataframe on the time index.
    2. Final Fill: Any remaining NaN values (e.g., from before a device was installed) were filled with 0.0, assuming zero consumption.

    7. Manual Corrections and Known Data Issues

    During analysis, two significant unmetered load events were identified and manually corrected to improve the accuracy of the aggregate reading. The error column (inaccuracy) was recalculated after these corrections.

    1. March 10th - Unmetered Bulb: An unmetered 107W bulb was active. It was subtracted from the main reading as if it never happened.
    2. May 31st - Unmetered Air Pump: An unmetered 101W pump for an air mattress was used directly in an outlet with no intermediary plug and hence manually added to the respective plug.

    8. Appliance Details and Multipurpose Plugs

    The following table lists the column names with an explanation where needed. As Watson moved at the beginning of June, some metering plugs changed their appliance.

  8. Ecommerce Dataset (Products & Sizes Included)

    • kaggle.com
    zip
    Updated Nov 13, 2025
    Cite
    Anvit kumar (2025). Ecommerce Dataset (Products & Sizes Included) [Dataset]. https://www.kaggle.com/datasets/anvitkumar/shopping-dataset
    Explore at:
    Available download formats: zip (1274856 bytes)
    Dataset updated
    Nov 13, 2025
    Authors
    Anvit kumar
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    📦 Ecommerce Dataset (Products & Sizes Included)

    🛍️ Essential Data for Building an Ecommerce Website & Analyzing Online Shopping Trends

    📌 Overview

    This dataset contains 1,000+ ecommerce products, including detailed information on pricing, ratings, product specifications, seller details, and more. It is designed to help data scientists, developers, and analysts build product recommendation systems, price prediction models, and sentiment analysis tools.

    🔹 Dataset Features

    | Column Name | Description |
    |:--|:--|
    | product_id | Unique identifier for the product |
    | title | Product name/title |
    | product_description | Detailed product description |
    | rating | Average customer rating (0-5) |
    | ratings_count | Number of ratings received |
    | initial_price | Original product price |
    | discount | Discount percentage (%) |
    | final_price | Discounted price |
    | currency | Currency of the price (e.g., USD, INR) |
    | images | URL(s) of product images |
    | delivery_options | Available delivery methods (e.g., standard, express) |
    | product_details | Additional product attributes |
    | breadcrumbs | Category path (e.g., Electronics > Smartphones) |
    | product_specifications | Technical specifications of the product |
    | amount_of_stars | Distribution of star ratings (1-5 stars) |
    | what_customers_said | Customer reviews (sentiments) |
    | seller_name | Name of the product seller |
    | sizes | Available sizes (for clothing, shoes, etc.) |
    | videos | Product video links (if available) |
    | seller_information | Seller details, such as location and rating |
    | variations | Different variants of the product (e.g., color, size) |
    | best_offer | Best available deal for the product |
    | more_offers | Other available deals/offers |
    | category | Product category |

    📊 Potential Use Cases

    📌 Build an Ecommerce Website: Use this dataset to design a functional online store with product listings, filtering, and sorting.
    🔍 Price Prediction Models: Predict product prices based on features like ratings, category, and discount.
    🎯 Recommendation Systems: Suggest products based on user preferences, rating trends, and customer feedback.
    🗣 Sentiment Analysis: Analyze what_customers_said to understand customer satisfaction and product popularity.
    📈 Market & Competitor Analysis: Track pricing trends, popular categories, and seller performance.

    🔍 Why Use This Dataset?

    ✅ Rich Feature Set: Includes all necessary ecommerce attributes.
    ✅ Realistic Pricing & Rating Data: Useful for price analysis and recommendations.
    ✅ Multi-Purpose: Suitable for machine learning, web development, and data visualization.
    ✅ Structured Format: Easy-to-use CSV format for quick integration.

    📂 Dataset Format

    CSV file (ecommerce_dataset.csv), 1000+ samples, multi-category coverage.

    🔗 How to Use?

    Download the dataset from Kaggle, then load it in Python using Pandas:

    import pandas as pd

    df = pd.read_csv("ecommerce_dataset.csv")
    df.head()

    Explore trends & patterns using visualization tools (Seaborn, Matplotlib). Build models & applications based on the dataset!

  9. VANET-IRAQ-BSM-Attacks

    • zenodo.org
    application/gzip
    Updated Sep 21, 2025
    Cite
    mohammad abbas alkifaee; Fahad Ghalib Abdulkadhim (2025). VANET-IRAQ-BSM-Attacks [Dataset]. http://doi.org/10.5281/zenodo.17167970
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Sep 21, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    mohammad abbas alkifaee; Fahad Ghalib Abdulkadhim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Sep 21, 2025
    Area covered
    Iraq
    Description

    # Baghdad VANET BSM-Based Attack Dataset (F2MD Scenarios) — Raw CSVs (File-level by Attack ID)

    University of Kufa

    by Mohammad Abbas Shareef & Dr. Fahad Ghalib

    ## Overview
    This package groups the original raw CSV files by attack type at the file level. No content changes were made to the CSVs—files are copied as-is into family folders.

    ## Dataset Summary
    Total files: 17

    Total records (all files): 35830975

    Attack records (label=1): 2359250

    Benign records (label=0): 33471725

    Other/unknown labels: 0

    ## Attack Families
    - ConstPos — Frozen constant position
    - ConstPosOffset — Constant offset to coordinates
    - RandomPos — Random fake positions
    - RandomPosOffset — Random offset to the true position
    - ConstSpeedOffset — Constant speed bias
    - RandomSpeed — Random implausible speeds
    - EventualStop — Gradual or sudden stop spoofing
    - Disruptive — Protocol fields/values deliberately disruptive
    - DataReplay — Replay of past data
    - StaleMessages — Old or delayed messages
    - DoS — High-rate flooding
    - DoSRandom — Randomly fluctuating flooding
    - DoSDisruptive — Intermittent aggressive flooding
    - GridSybil — Coordinated fake identities (Sybil)
    - DoSRandomSybil — Random DoS with Sybil identities
    - DoSDisruptiveSybil — Aggressive DoS with Sybil identities
    - Unknown — Files not mapped to a specific family

    ## Per-family Totals
    | Family | Files | Records | Attack (label=1) | Benign (label=0) | Other |
    |:--|--:|--:|--:|--:|--:|
    | ConstPos | 1 | 1327217 | 63504 | 1263713 | 0 |
    | ConstPosOffset | 1 | 1305206 | 62749 | 1242457 | 0 |
    | ConstSpeedOffset | 1 | 2150356 | 102707 | 2047649 | 0 |
    | DataReplay | 1 | 922481 | 43916 | 878565 | 0 |
    | Disruptive | 1 | 1063416 | 52038 | 1011378 | 0 |
    | DoS | 1 | 1241649 | 167490 | 1074159 | 0 |
    | DoSDisruptive | 1 | 705817 | 97649 | 608168 | 0 |
    | DoSDisruptiveSybil | 1 | 2113005 | 24365 | 2088640 | 0 |
    | DoSRandom | 1 | 6440583 | 867983 | 5572600 | 0 |
    | DoSRandomSybil | 1 | 2499578 | 30382 | 2469196 | 0 |
    | EventualStop | 1 | 2124546 | 101617 | 2022929 | 0 |
    | GridSybil | 1 | 622012 | 108728 | 513284 | 0 |
    | RandomPos | 1 | 1087145 | 53253 | 1033892 | 0 |
    | RandomPosOffset | 1 | 3258131 | 158686 | 3099445 | 0 |
    | RandomSpeed | 1 | 3676823 | 176305 | 3500518 | 0 |
    | StaleMessages | 1 | 770026 | 36645 | 733381 | 0 |
    | Unknown | 1 | 4522984 | 211233 | 4311751 | 0 |

    ## Per-file Details
    | Family | File | Records | Attack (1) | Benign (0) | Other | Label column present |
    |:--|:--|--:|--:|--:|--:|:--:|
    | ConstPos | 1.csv | 1327217 | 63504 | 1263713 | 0 | yes |
    | ConstPosOffset | 2.csv | 1305206 | 62749 | 1242457 | 0 | yes |
    | ConstSpeedOffset | 6.csv | 2150356 | 102707 | 2047649 | 0 | yes |
    | DataReplay | 11.csv | 922481 | 43916 | 878565 | 0 | yes |
    | Disruptive | 10.csv | 1063416 | 52038 | 1011378 | 0 | yes |
    | DoS | 13.csv | 1241649 | 167490 | 1074159 | 0 | yes |
    | DoSDisruptive | 15.csv | 705817 | 97649 | 608168 | 0 | yes |
    | DoSDisruptiveSybil | 19.csv | 2113005 | 24365 | 2088640 | 0 | yes |
    | DoSRandom | 14.csv | 6440583 | 867983 | 5572600 | 0 | yes |
    | DoSRandomSybil | 18.csv | 2499578 | 30382 | 2469196 | 0 | yes |
    | EventualStop | 9.csv | 2124546 | 101617 | 2022929 | 0 | yes |
    | GridSybil | 16.csv | 622012 | 108728 | 513284 | 0 | yes |
    | RandomPos | 3.csv | 1087145 | 53253 | 1033892 | 0 | yes |
    | RandomPosOffset | 4.csv | 3258131 | 158686 | 3099445 | 0 | yes |
    | RandomSpeed | 7.csv | 3676823 | 176305 | 3500518 | 0 | yes |
    | StaleMessages | 12.csv | 770026 | 36645 | 733381 | 0 | yes |
    | Unknown | 5.csv | 4522984 | 211233 | 4311751 | 0 | yes |

    ## How to Load (Python)
    Use pandas to read any CSV under data/, as in the example below.
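
    A minimal example (the family folder and file name follow the per-file table above; the name of the label column is an assumption):

    import pandas as pd

    # e.g. the ConstPos family, stored as 1.csv according to the per-file table
    df = pd.read_csv("data/ConstPos/1.csv")
    print(df.shape)
    # label = 1 marks attack records, 0 benign (exact column name may differ)
    print(df["label"].value_counts())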

  10. 3D skeletons UP-Fall Dataset

    • data.niaid.nih.gov
    Updated Jul 20, 2024
    Cite
    KOFFI, Tresor (2024). 3D skeletons UP-Fall Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12773012
    Explore at:
    Dataset updated
    Jul 20, 2024
    Dataset provided by
    CESI LINEACT
    Authors
    KOFFI, Tresor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    3D skeletons UP-Fall Dataset

    Difference between fall detection and impact detection

    Overview

    This dataset aims to facilitate research in fall detection, particularly focusing on the precise detection of impact moments within fall events. The accuracy and comprehensiveness of the 3D skeleton data make it a valuable resource for developing and benchmarking fall detection algorithms. The dataset contains 3D skeletal data extracted from fall events and daily activities of 5 subjects performing fall scenarios.

    Data Collection

    The skeletal data was extracted using a pose estimation algorithm, which processes image frames to determine the 3D coordinates of each joint. Sequences with fewer than 100 frames of extracted data were excluded to ensure the quality and reliability of the dataset. As a result, some subjects may have fewer CSV files.

    CSV Structure

    The data is organized by subjects, and each subject contains CSV files named according to the pattern C1S1A1T1, where:

    C: Camera (1 or 2)

    S: Subject (1 to 5)

    A: Activity (1 to N, representing different activities)

    T: Trial (1 to 3)

    subject1/: Contains CSV files for Subject 1.

    C1S1A1T1.csv: Data from Camera 1, Activity 1, Trial 1 for Subject 1

    C1S1A2T1.csv: Data from Camera 1, Activity 2, Trial 1 for Subject 1

    C1S1A3T1.csv: Data from Camera 1, Activity 3, Trial 1 for Subject 1

    C2S1A1T1.csv: Data from Camera 2, Activity 1, Trial 1 for Subject 1

    C2S1A2T1.csv: Data from Camera 2, Activity 2, Trial 1 for Subject 1

    C2S1A3T1.csv: Data from Camera 2, Activity 3, Trial 1 for Subject 1

    subject2/: Contains CSV files for Subject 2.

    C1S2A1T1.csv: Data from Camera 1, Activity 1, Trial 1 for Subject 2

    C1S2A2T1.csv: Data from Camera 1, Activity 2, Trial 1 for Subject 2

    C1S2A3T1.csv: Data from Camera 1, Activity 3, Trial 1 for Subject 2

    C2S2A1T1.csv: Data from Camera 2, Activity 1, Trial 1 for Subject 2

    C2S2A2T1.csv: Data from Camera 2, Activity 2, Trial 1 for Subject 2

    C2S2A3T1.csv: Data from Camera 2, Activity 3, Trial 1 for Subject 2

    subject3/, subject4/, subject5/: Similar structure as above, but may contain fewer CSV files due to the data extraction criteria mentioned above.

    Column Descriptions

    Each CSV file contains the following columns representing different skeletal joints and their respective coordinates in 3D space:

    | Column Name | Description |
    |:--|:--|
    | joint_1_x | X coordinate of joint 1 |
    | joint_1_y | Y coordinate of joint 1 |
    | joint_1_z | Z coordinate of joint 1 |
    | joint_2_x | X coordinate of joint 2 |
    | joint_2_y | Y coordinate of joint 2 |
    | joint_2_z | Z coordinate of joint 2 |
    | ... | ... |
    | joint_n_x | X coordinate of joint n |
    | joint_n_y | Y coordinate of joint n |
    | joint_n_z | Z coordinate of joint n |
    | LABEL | Label indicating impact (1) or non-impact (0) |

    Example

    Here is an example of what a row in one of the CSV files might look like:

    | joint_1_x | joint_1_y | joint_1_z | joint_2_x | joint_2_y | joint_2_z | ... | joint_n_x | joint_n_y | joint_n_z | LABEL |
    |--:|--:|--:|--:|--:|--:|:--|--:|--:|--:|--:|
    | 0.123 | 0.456 | 0.789 | 0.234 | 0.567 | 0.890 | ... | 0.345 | 0.678 | 0.901 | 0 |

    Usage

    This data can be used for developing and benchmarking impact fall detection algorithms. It provides detailed information on human posture and movement during falls, making it suitable for machine learning and deep learning applications in impact fall detection and prevention.

    Using GitHub

    1. Clone the repository:

       git clone https://github.com/Tresor-Koffi/3D_skeletons-UP-Fall-Dataset

    2. Navigate to the directory:

       cd 3D_skeletons-UP-Fall-Dataset

    Examples

    Here's a simple example of how to load and inspect a sample data file using Python:

    import pandas as pd

    # Load a sample data file for Subject 1, Camera 1, Activity 1, Trial 1
    data = pd.read_csv('subject1/C1S1A1T1.csv')
    print(data.head())
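
    As a small follow-up toward the impact-detection use case, a hedged sketch for splitting features and labels (the LABEL column is documented above; everything else assumes the file loaded as data):

    # Separate the joint coordinates (features) from the impact label
    X = data.drop(columns=['LABEL'])
    y = data['LABEL']
    print(X.shape, y.value_counts())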

  11. NYC Jobs Dataset (Filtered Columns)

    • kaggle.com
    zip
    Updated Oct 5, 2022
    Cite
    Jeffery Mandrake (2022). NYC Jobs Dataset (Filtered Columns) [Dataset]. https://www.kaggle.com/datasets/jefferymandrake/nyc-jobs-filtered-cols
    Explore at:
    Available download formats: zip (93408 bytes)
    Dataset updated
    Oct 5, 2022
    Authors
    Jeffery Mandrake
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    New York
    Description

    Use this dataset with Misra's Pandas tutorial: How to use the Pandas GroupBy function | Pandas tutorial

    The original dataset came from this site: https://data.cityofnewyork.us/City-Government/NYC-Jobs/kpav-sd4t/data

    I used Google Colab to filter the columns with the following Pandas commands. Here's a Colab Notebook you can use with the commands listed below: https://colab.research.google.com/drive/17Jpgeytc075CpqDnbQvVMfh9j-f4jM5l?usp=sharing

    Once the csv file is uploaded to Google Colab, use these commands to process the file.

    import pandas as pd

    # load the file and create a pandas dataframe
    df = pd.read_csv('/content/NYC_Jobs.csv')

    # keep only these columns
    df = df[['Job ID', 'Civil Service Title', 'Agency', 'Posting Type', 'Job Category',
             'Salary Range From', 'Salary Range To']]

    # save the csv file without the index column
    df.to_csv('/content/NYC_Jobs_filtered_cols.csv', index=False)
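
    Since the dataset accompanies a GroupBy tutorial, here is a hedged sketch of a typical aggregation on the filtered columns (assuming the salary columns load as numeric values):

    jobs = pd.read_csv('/content/NYC_Jobs_filtered_cols.csv')

    # Average posted salary range per agency, highest first
    salary_by_agency = (jobs.groupby('Agency')[['Salary Range From', 'Salary Range To']]
                            .mean()
                            .sort_values('Salary Range To', ascending=False))
    print(salary_by_agency.head())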

  12. Compare Baseball Player Statistics using Visualiza

    • kaggle.com
    zip
    Updated Sep 28, 2024
    Cite
    Abdelaziz Sami (2024). Compare Baseball Player Statistics using Visualiza [Dataset]. https://www.kaggle.com/datasets/abdelazizsami/compare-baseball-player-statistics-using-visualiza
    Explore at:
    Available download formats: zip (1030978 bytes)
    Dataset updated
    Sep 28, 2024
    Authors
    Abdelaziz Sami
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    To compare baseball player statistics effectively using visualization, we can create some insightful plots. Below are the steps to accomplish this in Python using libraries like Pandas and Matplotlib or Seaborn.

    1. Load the Data

    First, we need to load the judge.csv file into a DataFrame. This will allow us to manipulate and analyze the data easily.

    2. Explore the Data

    Before creating visualizations, it’s good to understand the data structure and identify the columns we want to compare. The relevant columns in your data include pitch_type, release_speed, game_date, and events.

    3. Visualization

    We can create various visualizations, such as:

    • A bar chart to compare the average release speed of different pitch types.
    • A line plot to visualize trends over time based on game dates.
    • A scatter plot to analyze the relationship between release speed and the outcome of the pitches (e.g., strikeouts, home runs).

    Example Code

    Here is a sample code to demonstrate how to create these visualizations using Matplotlib and Seaborn:

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    # Load the data
    df = pd.read_csv('judge.csv')
    
    # Display the first few rows of the dataframe
    print(df.head())
    
    # Set the style of seaborn
    sns.set(style="whitegrid")
    
    # 1. Average Release Speed by Pitch Type
    plt.figure(figsize=(12, 6))
    avg_speed = df.groupby('pitch_type')['release_speed'].mean().sort_values()
    sns.barplot(x=avg_speed.values, y=avg_speed.index, palette="viridis")
    plt.title('Average Release Speed by Pitch Type')
    plt.xlabel('Average Release Speed (mph)')
    plt.ylabel('Pitch Type')
    plt.show()
    
    # 2. Trends in Release Speed Over Time
    # First, convert the 'game_date' to datetime
    df['game_date'] = pd.to_datetime(df['game_date'])
    
    plt.figure(figsize=(14, 7))
    sns.lineplot(data=df, x='game_date', y='release_speed', estimator='mean', ci=None)
    plt.title('Trends in Release Speed Over Time')
    plt.xlabel('Game Date')
    plt.ylabel('Average Release Speed (mph)')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
    
    # 3. Scatter Plot of Release Speed vs. Events
    plt.figure(figsize=(12, 6))
    sns.scatterplot(data=df, x='release_speed', y='events', hue='pitch_type', alpha=0.7)
    plt.title('Release Speed vs. Events')
    plt.xlabel('Release Speed (mph)')
    plt.ylabel('Event Type')
    plt.legend(title='Pitch Type', bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.show()
    

    Explanation of the Code

    • Data Loading: The CSV file is loaded into a Pandas DataFrame.
    • Average Release Speed: A bar chart shows the average release speed for each pitch type.
    • Trends Over Time: A line plot illustrates the trend in release speed over time, which can indicate changes in performance or strategy.
    • Scatter Plot: A scatter plot visualizes the relationship between release speed and different events, providing insight into performance outcomes.

    Conclusion

    These visualizations will help you compare player statistics in a meaningful way. You can customize the plots further based on your specific needs, such as filtering data for specific players or seasons. If you have any specific comparisons in mind or additional data to visualize, let me know!

  13. Pre-Processed Power Grid Frequency Time Series

    • zenodo.org
    bin, zip
    Updated Jul 15, 2021
    Cite
    Johannes Kruse; Benjamin Schäfer; Dirk Witthaut (2021). Pre-Processed Power Grid Frequency Time Series [Dataset]. http://doi.org/10.5281/zenodo.3744121
    Explore at:
    Available download formats: zip, bin
    Dataset updated
    Jul 15, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Johannes Kruse; Benjamin Schäfer; Dirk Witthaut
    Description

    Overview
    This repository contains ready-to-use frequency time series as well as the corresponding pre-processing scripts in python. The data covers three synchronous areas of the European power grid:

    • Continental Europe
    • Great Britain
    • Nordic

    This work is part of the paper "Predictability of Power Grid Frequency" [1]. Please cite this paper when using the data and the code. For a detailed documentation of the pre-processing procedure we refer to the supplementary material of the paper.

    Data sources
    We downloaded the frequency recordings from publicly available repositories of three different Transmission System Operators (TSOs).

    • Continental Europe [2]: We downloaded the data from the German TSO TransnetBW GmbH, which retains the Copyright on the data, but allows re-publishing it upon request [3].
    • Great Britain [4]: The download was supported by National Grid ESO Open Data, which belongs to the British TSO National Grid. They publish the frequency recordings under the NGESO Open License [5].
    • Nordic [6]: We obtained the data from the Finnish TSO Fingrid, which provides the data under the open license CC-BY 4.0 [7].

    Content of the repository

    A) Scripts

    1. In the "Download_scripts" folder you will find three scripts to automatically download frequency data from the TSO's websites.
    2. In "convert_data_format.py" we save the data with corrected timestamp formats. Missing data is marked as NaN (processing step (1) in the supplementary material of [1]).
    3. In "clean_corrupted_data.py" we load the converted data and identify corrupted recordings. We mark them as NaN and clean some of the resulting data holes (processing step (2) in the supplementary material of [1]).

    The python scripts run with Python 3.7 and with the packages found in "requirements.txt".

    B) Data_converted and Data_cleansed
    The folder "Data_converted" contains the output of "convert_data_format.py" and "Data_cleansed" contains the output of "clean_corrupted_data.py".

    • File type: The files are zipped csv-files, where each file comprises one year.
    • Data format: The files contain two columns. The first one represents the time stamps in the format Year-Month-Day Hour-Minute-Second, which is given as naive local time. The second column contains the frequency values in Hz.
    • NaN representation: We mark corrupted and missing data as "NaN" in the csv-files.

    Use cases
    We point out that this repository can be used in two different ways:

    • Use pre-processed data: You can directly use the converted or the cleansed data. Note however that both data sets include segments of NaN-values due to missing and corrupted recordings. Only a very small part of the NaN-values were eliminated in the cleansed data to not manipulate the data too much. If your application cannot deal with NaNs, you could build upon the following commands to select the longest interval of valid data from the cleansed data:
    from helper_functions import *
    import pandas as pd
    import numpy as np

    cleansed_data = pd.read_csv('/Path_to_cleansed_data/data.zip',
                index_col=0, header=None, squeeze=True,
                parse_dates=[0])
    valid_bounds, valid_sizes = true_intervals(~cleansed_data.isnull())
    start, end = valid_bounds[np.argmax(valid_sizes)]
    data_without_nan = cleansed_data.iloc[start:end]
    • Produce your own cleansed data: Depending on your application, you might want to cleanse the data in a custom way. You can easily add your custom cleansing procedure in "clean_corrupted_data.py" and then produce cleansed data from the raw data in "Data_converted"; a brief illustrative sketch of such a step follows below.
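    A brief, hypothetical sketch of such a custom cleansing step is shown below; it is not the repository's actual procedure, and the 50 Hz reference, deviation threshold, and gap length are illustrative assumptions only.

    import numpy as np
    import pandas as pd

    def custom_cleanse(frequency_series: pd.Series,
                       max_abs_deviation=0.5, max_gap_fill=5) -> pd.Series:
        # Illustrative thresholds only; they are not the values used in [1].
        cleansed = frequency_series.copy()
        # Flag recordings that deviate implausibly far from the 50 Hz nominal value as NaN
        cleansed[(cleansed - 50.0).abs() > max_abs_deviation] = np.nan
        # Fill only short gaps by interpolation; longer gaps remain NaN
        return cleansed.interpolate(limit=max_gap_fill, limit_direction="both")

    A function like this could be applied to the converted data before saving your own cleansed version.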

    License
    We release the code in the folder "Scripts" under the MIT license [8]. In the case of National Grid and Fingrid, we further release the pre-processed data in the folders "Data_converted" and "Data_cleansed" under the CC-BY 4.0 license [7]. TransnetBW originally did not publish their data under an open license. We have explicitly received permission from TransnetBW to publish the pre-processed version; however, we cannot publish our pre-processed version under an open license due to the missing license of the original TransnetBW data.

  14. Brain Tumor CSV

    • kaggle.com
    zip
    Updated Oct 30, 2024
    Cite
    Akash Nath (2024). Brain Tumor CSV [Dataset]. https://www.kaggle.com/datasets/akashnath29/brain-tumor-csv/code
    Explore at:
    zip(538175483 bytes)Available download formats
    Dataset updated
    Oct 30, 2024
    Authors
    Akash Nath
    License

    Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    This dataset provides grayscale pixel values for brain tumor MRI images, stored in a CSV format for simplified access and ease of use. The goal is to create a "MNIST-like" dataset for brain tumors, where each row in the CSV file represents the pixel values of a single image in its original resolution. This format makes it convenient for researchers and developers to quickly load and analyze MRI data for brain tumor detection, classification, and segmentation tasks without needing to handle large image files directly.

    Motivation and Use Cases

    Brain tumor classification and segmentation are critical tasks in medical imaging, and datasets like these are valuable for developing and testing machine learning and deep learning models. While there are several publicly available brain tumor image datasets, they often consist of large image files that can be challenging to process. This CSV-based dataset addresses that by providing a compact and accessible format. Potential use cases include:

    • Tumor Classification: Identifying different types of brain tumors, such as glioma, meningioma, and pituitary tumors, or distinguishing between tumor and non-tumor images.
    • Tumor Segmentation: Applying pixel-level classification and segmentation techniques for tumor boundary detection.
    • Educational and Rapid Prototyping: Ideal for educational purposes or quick experimentation without requiring large image-processing capabilities.

    Data Structure

    This dataset is structured as a single CSV file where each row represents an image, and each column represents a grayscale pixel value. The pixel values are stored as integers ranging from 0 (black) to 255 (white).

    CSV File Contents

    • Pixel Values: Each row contains the pixel values of a single grayscale image, flattened into a 1-dimensional array. The original image dimensions vary, and rows in the CSV will correspondingly vary in length.
    • Simplified Access: By using a CSV format, this dataset avoids the need for specialized image processing libraries and can be easily loaded into data analysis and machine learning frameworks like Pandas, Scikit-Learn, and TensorFlow.

    How to Use This Dataset

    1. Loading the Data: The CSV can be loaded using standard data analysis libraries, making it compatible with Python, R, and other platforms.
    2. Data Preprocessing: Users may normalize pixel values (e.g., between 0 and 1) for deep learning applications.
    3. Splitting Data: While this dataset does not predefine training and testing splits, users can separate rows into training, validation, and test sets.
    4. Reshaping for Models: If needed, each row can be reshaped to the original dimensions (retrieved from the subfolder structure) to view or process as an image; a minimal loading sketch follows this list.
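    A minimal sketch of the loading and preprocessing steps is given below; the file name brain_tumor_pixels.csv is a hypothetical placeholder, and the csv module is used row by row because row lengths vary between images.

    import csv
    import numpy as np

    images = []
    with open("brain_tumor_pixels.csv", newline="") as f:   # hypothetical file name
        for row in csv.reader(f):
            # Each row is one flattened grayscale image; lengths differ between images.
            pixels = np.asarray(row, dtype=np.float32) / 255.0  # normalize to [0, 1]
            images.append(pixels)

    print(f"Loaded {len(images)} images; the first has {images[0].size} pixels")
    # Reshaping to 2-D requires the original height and width, which the row itself does not store.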

    Technical Details

    • Image Format: Grayscale MRI images, with pixel values ranging from 0 to 255.
    • Resolution: Original resolution, no resizing applied.
    • Size: Each row’s length varies according to the original dimensions of each MRI image.
    • Data Type: CSV file with integer pixel values.

    Acknowledgments

    This dataset is intended for research and educational purposes only. Users are encouraged to cite and credit the original data sources if using this dataset in any publications or projects. This is a derived CSV version aimed to simplify access and usability for machine learning and data science applications.

  15. Using HydroShare Buckets to Access Resource Files

    • search.dataone.org
    Updated Aug 9, 2025
    + more versions
    Cite
    Pabitra Dash (2025). Using HydroShare Buckets to Access Resource Files [Dataset]. https://search.dataone.org/view/sha256%3Ab25a0f5e5d62530d70ecd6a86f1bd3fa2ab804a8350dc7ba087327839fcb1fb1
    Explore at:
    Dataset updated
    Aug 9, 2025
    Dataset provided by
    Hydroshare
    Authors
    Pabitra Dash
    Description

    This resource contains draft Jupyter Notebooks with example code snippets showing how to access HydroShare resource files through HydroShare S3 buckets. The user_account.py script is a utility that reads a user's cached HydroShare account information in any of the JupyterHub instances that HydroShare has access to. The example notebooks use this utility so that you do not have to enter your HydroShare account information in order to access HydroShare buckets.

    Here are the 3 notebooks in this resource:

    • hydroshare_s3_bucket_access_examples.ipynb:

    The above notebook has examples showing how to upload and download resource files to and from the resource bucket. It also contains examples of how to list the files and folders of a resource in a bucket.

    • python-modules-direct-read-from-bucket/hs_bucket_access_gdal_example.ipynb:

    The above notebook has examples of reading raster and shapefile data from a bucket using GDAL, without needing to download the files from the bucket to local disk.

    • python-modules-direct-read-from-bucket/hs_bucket_access_non_gdal_example.ipynb

    The above notebook has examples of using h5netcdf and xarray to read NetCDF files directly from a bucket. It also contains examples of using rioxarray to read raster files and pandas to read CSV files from HydroShare buckets. A rough access sketch is given after this list.
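    As a rough illustration of the bucket-based access these notebooks demonstrate, the sketch below reads a CSV resource file with pandas over S3 via s3fs. The endpoint URL, bucket name, resource ID, file path, and credentials are hypothetical placeholders, not HydroShare's documented values; the notebooks in this resource show the actual conventions.

    import pandas as pd

    # All of the following values are hypothetical placeholders.
    endpoint_url = "https://s3.example-hydroshare-host.org"  # assumed S3 endpoint
    bucket = "example-user-bucket"                           # assumed bucket name
    resource_id = "abc123"                                   # assumed resource id

    # pandas hands "s3://" paths to s3fs; credentials could instead come from user_account.py
    df = pd.read_csv(
        f"s3://{bucket}/{resource_id}/data/contents/example.csv",
        storage_options={
            "key": "YOUR_ACCESS_KEY",
            "secret": "YOUR_SECRET_KEY",
            "client_kwargs": {"endpoint_url": endpoint_url},
        },
    )
    print(df.head())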

  16. Dataset with four years of condition monitoring technical language annotations from paper machine industries in northern Sweden

    • researchdata.se
    Updated Jun 17, 2025
    + more versions
    Cite
    Karl Löwenmark; Fredrik Sandin; Marcus Liwicki; Stephan Schnabel (2025). Dataset with four years of condition monitoring technical language annotations from paper machine industries in northern Sweden [Dataset]. http://doi.org/10.5878/hafd-ms27
    Explore at:
    (74859)Available download formats
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Luleå University of Technology
    Authors
    Karl Löwenmark; Fredrik Sandin; Marcus Liwicki; Stephan Schnabel
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2018 - 2022
    Area covered
    Sweden
    Description

    This dataset consists of four years of technical language annotations from two paper machines in northern Sweden, structured as a Pandas dataframe. The same data is also available as a semicolon-separated .csv file. The data consists of two columns, where the first column corresponds to annotation note contents, and the second column corresponds to annotation titles. The annotations are in Swedish, and processed so that all mentions of personal information are replaced with the string ‘egennamn’, meaning “personal name” in Swedish. Each row corresponds to one annotation with the corresponding title.

    Data can be accessed in Python with:

    import pandas as pd

    annotations_df = pd.read_pickle("Technical_Language_Annotations.pkl")
    annotation_contents = annotations_df['noteComment']
    annotation_titles = annotations_df['title']
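    Since the same data is also distributed as a semicolon-separated .csv file, it could presumably be read as sketched below; the file name is an assumption mirroring the pickle above, and the column names are assumed to match the dataframe.

    import pandas as pd

    # Hypothetical file name for the csv export of the same data
    annotations_df = pd.read_csv("Technical_Language_Annotations.csv", sep=";", encoding="utf-8")
    annotation_contents = annotations_df["noteComment"]
    annotation_titles = annotations_df["title"]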

  17. Data for Gradient boosted decision trees reveal nuances of auditory discrimination behavior

    • rdr.ucl.ac.uk
    txt
    Updated Mar 22, 2024
    Cite
    Carla Griffiths; Jennifer Bizley; Jules Lebert; Joseph Sollini (2024). Data for Gradient boosted decision trees reveal nuances of auditory discrimination behavior [Dataset]. http://doi.org/10.5522/04/25386565.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Mar 22, 2024
    Dataset provided by
    University College London
    Authors
    Carla Griffiths; Jennifer Bizley; Jules Lebert; Joseph Sollini
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Raw data for the article: Gradient boosted decision trees reveal nuances of auditory discrimination behaviour (PLOS Computational Biology).

    This data repository contains the csv files after extraction of the raw MATLAB metadata files into pandas (Python) dataframes (helper function author: Jules Lebert). The csv files can easily be loaded back into dataframe objects using pandas before the subsampling steps are completed (as documented in the paper, we used subsampling to ensure that the numbers of F0-roved and control F0 trials were relatively equal).

    Link to the GitHub repository for running the models on this data: https://github.com/carlacodes/boostmodels

    A full description of each of the variables within the dataframe can be found in data_description_instructions_for_datasets_plos_bio.pdf.

    Abstract: Animal psychophysics can generate rich behavioral datasets, often comprising many thousands of trials for an individual subject. Gradient-boosted models are a promising machine learning approach for analyzing such data, partly due to the tools that allow users to gain insight into how the model makes predictions. We trained ferrets to report a target word’s presence, timing, and lateralization within a stream of consecutively presented non-target words. To assess the animals’ ability to generalize across pitch, we manipulated the fundamental frequency (F0) of the speech stimuli across trials, and to assess the contribution of pitch to streaming, we roved the F0 from word token to token. We then implemented gradient-boosted regression and decision trees on the trial outcome and reaction time data to understand the behavioral factors behind the ferrets’ decision-making. We visualized model contributions by implementing SHAP feature importance and partial dependency plots. While ferrets could accurately perform the task across all pitch-shifted conditions, our models reveal subtle effects of shifting F0 on performance, with within-trial pitch shifting elevating false alarms and extending reaction times. Our models identified a subset of non-target words that animals commonly false alarmed to. Follow-up analysis demonstrated that the spectrotemporal similarity of target and non-target words, rather than similarity in duration or amplitude waveform, was the strongest predictor of the likelihood of false alarming. Finally, we compared the results with those obtained with traditional mixed effects models, revealing equivalent or better performance for the gradient-boosted models over these approaches.
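    A minimal sketch of reloading one of these csv files into a dataframe is shown below; the file name is a hypothetical placeholder, since the actual file names are listed in the repository itself.

    import pandas as pd

    # Hypothetical file name; substitute one of the csv files in this repository.
    trials_df = pd.read_csv("ferret_trial_outcomes.csv")

    # Inspect the behavioral variables (described in the accompanying pdf) before any subsampling
    print(trials_df.columns.tolist())
    print(trials_df.head())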

  18. Discharge of springs monitored during the WABEsense project

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 7, 2023
    Cite
    Juan Pablo Carbajal; Housseini, Reza; Lippuner, Jeannette; Lippuner, Daniela (2023). Discharge of springs monitored during the WABEsense project [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8239367
    Explore at:
    Dataset updated
    Dec 7, 2023
    Dataset provided by
    Uli Lippuner AG
    OST - Eastern Switzerland University of Applied Sciences
    Authors
    Juan Pablo Carbajal; Housseini, Reza; Lippuner, Jeannette; Lippuner, Daniela
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Water discharge and temperature of the springs monitored during the WABEsense project (UTF 642.21.20). The data correspond to all field measurements performed between Feb. 2021 and Dec. 2023. Data for each spring might not cover the whole period. For each spring there are two files: *.csv and *.meta. The .csv file contains the recorded data. The .meta file contains further information about the spring (e.g. location) and the data.

    List of springs:

    • Bonaduz BS Paliu Fravi
    • Bonaduz BS Salums Friedrich
    • Bonaduz BS Salums Leo
    • Bonaduz SS Leo Friedrich
    • Bregaglia BS Acqua d'Balz 1
    • Bregaglia BS Acqua d'Balz 2
    • Hergiswil BS Rossmoos
    • Hergiswil SS Muesli
    • Oberriet SS Ulrika
    • Schiers BS Chalta Wasser
    • Schiers BS Grapp rechts
    • Susch BS Prada bella suot
    • Susch BS Prada bella sura
    • Zernez BS Sarsura
    • Zug SS Nidfuren

    More details about each spring are given in the corresponding .meta file.

    File formats

    All files are plain text. The .csv files are UTF-8 encoded and separated by ";" (semicolon). The .meta files follow the YAML format. An example of loading the data in Python using pandas is given below:

    import pandas as pd

    data = pd.read_csv("Bonaduz.BS.Paliu_Fravi_discharge.csv", encoding="utf-8", sep=";")
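    Because the .meta files follow the YAML format, they could presumably be read with a standard YAML parser, as sketched below; the .meta file name mirrors the .csv example above and may differ from the actual naming, and the available keys are assumptions based on the description.

    import yaml  # PyYAML

    # Hypothetical companion file to the .csv example above
    with open("Bonaduz.BS.Paliu_Fravi_discharge.meta", encoding="utf-8") as f:
        meta = yaml.safe_load(f)

    # The keys depend on the file; the description mentions e.g. the spring's location.
    print(meta)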
  19. Data for: Can government transfers make energy subsidy reform socially acceptable? A case study on Ecuador

    • data.mendeley.com
    Updated Mar 31, 2020
    Cite
    Filip Schaffitzel (2020). Data for: Can government transfers make energy subsidy reform socially acceptable? A case study on Ecuador [Dataset]. http://doi.org/10.17632/z35m76mf9g.1
    Explore at:
    Dataset updated
    Mar 31, 2020
    Authors
    Filip Schaffitzel
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Area covered
    Ecuador
    Description

    Estimating the distributional impacts of energy subsidy removal and compensation schemes in Ecuador based on input-output and household data.

    Import files:

    • Dictionary Categories.csv, Dictionary ENI-IOT.csv, and Dictionary Subcategories.csv, based on [1]
    • Dictionary IOT.csv and IOT_2012.csv (cannot be redistributed), based on [2]
    • Dictionary Taxes.csv and Dictionary Transfers.csv, based on [3]
    • ENIGHUR11_GASTOS_V.csv, ENIGHUR11_HOGARES_AGREGADOS.csv, and ENIGHUR11_PERSONAS_INGRESOS.csv, based on [4]
    • Price increase scenarios.csv, based on [5]

    Further basic files and documents:

    [1] 4_M&D_Mapping ENIGHUR expenditures to IOT_180605.xlsm
    [2] Input-output table 2012 (https://contenido.bce.fin.ec/documentos/PublicacionesNotas/Catalogo/CuentasNacionales/Anuales/Dolares/MIP2012Ampliada.xls). Save the sheet with the IOT 2012 (Matriz simétrica) as IOT_2012.csv and edit the format so that the first column and first row contain the IOT labels.
    [3] 4_M&D_ENIGHUR income_180606.xlsx
    [4] ENIGHUR data can be retrieved from http://www.ecuadorencifras.gob.ec/encuesta-nacional-de-ingresos-y-gastos-de-los-hogares-urbanos-y-rurales/. The household datasets are only available in SPSS file format, and the free software PSPP is used to convert the .sav files to .csv files, as this format can be read directly and efficiently into a Python Pandas DataFrame. See the PSPP syntax below:

        save translate
          /outfile = filename
          /type = CSV
          /textoptions decimal = DOT
          /textoptions delimiter = ';'
          /fieldnames
          /cells=values
          /replace.

    [5] 3_Ecuador_Energy subsidies and 4_M&D_Price scenarios_180610.xlsx
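    Once a household file has been converted with PSPP, it could be loaded along the lines of the sketch below; the semicolon delimiter and DOT decimal come from the PSPP syntax above, while the encoding defaults are an assumption.

    import pandas as pd

    # ENIGHUR household expenditures converted from .sav to .csv with the PSPP syntax above
    expenditures = pd.read_csv(
        "ENIGHUR11_GASTOS_V.csv",
        sep=";",      # delimiter set in the PSPP 'save translate' command
        decimal=".",  # PSPP option 'decimal = DOT'
    )
    print(expenditures.shape)
    print(expenditures.head())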

  20. converted json to CSV Traffy Fondue data

    • kaggle.com
    zip
    Updated Jan 15, 2025
    Cite
    Hansen (2025). converted json to CSV Traffy Fondue data [Dataset]. https://www.kaggle.com/datasets/motethansen/converted-json-to-csv-traffy-fondue-data
    Explore at:
    zip(31705770 bytes)Available download formats
    Dataset updated
    Jan 15, 2025
    Authors
    Hansen
    License

    https://www.gnu.org/licenses/gpl-3.0.htmlhttps://www.gnu.org/licenses/gpl-3.0.html

    Description

    Traffy Fondue Data

    Data pulled from Traffy Fondue by accessing the Traffy Fondue Open API. Date range: January 2022 until January 2025.

    The following code pulled the data:

    
    import os
    import json
    import requests
    from datetime import datetime, timedelta
    import time
    
    class TraffyDataFetcher:
      def __init__(self, start_date, subfolder='traffyfonduedata'):
        self.url = "https://publicapi.traffy.in.th/share/teamchadchart/search"
        self.query = {'offset': '0'}
        self.payload = {}
        self.headers = {}
        self.start_date = datetime.strptime(start_date, '%Y-%m-%d')
        self.end_date = datetime.now()
        self.subfolder = subfolder
        self.max_requests_per_minute = 99
    
        if not os.path.exists(self.subfolder):
          os.makedirs(self.subfolder)
    
      def add_days_to_date(self, start_date_str, days_to_add):
        start_date = datetime.strptime(start_date_str, '%Y-%m-%d')
        new_date = start_date + timedelta(days=days_to_add)
        return new_date.strftime('%Y-%m-%d')
    
      def fetch_data(self):
        current_date = self.start_date
        index = 0
    
        while current_date <= self.end_date:
          start_time = datetime.now()
    
          self.query['start'] = current_date.strftime('%Y-%m-%d')
          new_date = self.add_days_to_date(self.query['start'], 10)
          self.query['end'] = new_date
          response = requests.request("GET", self.url, headers=self.headers, data=self.payload, params=self.query)
          print(f"offset: {index} response: {response.status_code}")
    
          filename = f"traffy_{current_date.strftime('%Y-%m-%d')}.json"
          file_path = os.path.join(self.subfolder, filename)
    
          with open(file_path, "w") as outfile:
            json_object = json.dumps(response.json(), indent=4)
            outfile.write(json_object)
    
          end_time = datetime.now()
          elapsed_time = (end_time - start_time).total_seconds()
          print(f"Elapsed time: {elapsed_time} s")
    
          index += 950
          current_date = datetime.strptime(new_date, '%Y-%m-%d') + timedelta(days=1)
    
          if index % self.max_requests_per_minute == 0:
            time.sleep(60 - elapsed_time)
    
    if __name__ == "__main__":
      fetcher = TraffyDataFetcher(start_date='2022-01-01')
      fetcher.fetch_data()
    

    --

    And the following code converted the JSON files to CSV:

    import os
    import glob
    import json
    import pandas as pd
    #import numpy as np
    
    class TraffyJSONFixer:
      def __init__(self, path_to_json='*.json', subfolder='traffyfonduedata'):
        self.path_to_json = path_to_json
        self.subfolder = subfolder
        self.outputfolder = 'fixedjson'
        self.excelfolder = 'exceloutput'
        self.file_path = os.path.join(self.subfolder, self.path_to_json)
        self.json_files = glob.glob(self.file_path)
        
        # Ensure the subfolder exists
        if not os.path.exists(self.subfolder):
          os.makedirs(self.subfolder)
        # Ensure the outputfolder exists
        if not os.path.exists(self.outputfolder):
          os.makedirs(self.outputfolder)
        # Ensure the excelfolder exists
        if not os.path.exists(self.excelfolder):
          os.makedirs(self.excelfolder)
        
        # Debugging: Print the current working directory and the list of JSON files
        print(f"Current working directory: {os.getcwd()}")
        print(f"Found JSON files: {self.json_files}")
        
      def fix_json_files(self):
        for count, ele in enumerate(self.json_files):
          new_file_name = os.path.join(self.outputfolder, f"data_{os.path.basename(ele)}")
          
          try:
            with open(ele, 'r', encoding='utf-8') as f:
              data = json.load(f)
    
            # Debugging: Print the type of data
            print(f"Processing file: {ele}")
            print(f"Type of data: {type(data)}")
            
            # Handle different JSON structures
            if isinstance(data, dict) and "results" in data:
              results = data["results"]
            elif isinstance(data, list):
              results = data
            else:
              print(f"Unexpected JSON structure in file: {ele}")
              continue
    
            # Ensure results is a list or dict before writing
            if isinstance(results, (list, dict)):
              with open(new_file_name, 'w', encoding='utf-8') as f:
                f.write(json.dumps(results, indent=4))
            else:
              print(f"Unexpected type for results in file: {ele}")
          except (json.JSONDecodeError, KeyError) as e:
            print(f"Error processing file {ele}: {e}")
    
      def jsontoexcel(self):
        jsonfile_path = os.path.join(self.out...
    
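    To work with the converted output, the resulting CSV could be loaded into pandas as sketched below; the file name is a hypothetical placeholder, since the actual output names are produced by the (truncated) jsontoexcel method above.

    import pandas as pd

    # Hypothetical output file name from the conversion step above
    traffy_df = pd.read_csv("traffy_data_2022-01-01.csv")
    print(traffy_df.columns.tolist())
    print(traffy_df.head())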