100+ datasets found

US Consumer Complaints Against Businesses
kaggle.com
zip
Updated Oct 9, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jeffery Mandrake (2022). US Consumer Complaints Against Businesses [Dataset]. https://www.kaggle.com/jefferymandrake/us-consumer-complaints-dataset-through-2019
Explore at:
zip(343188956 bytes)Available download formats
Dataset updated
Oct 9, 2022
Authors
Jeffery Mandrake
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
2,121,458 records

I used Google Colab to check out this dataset and pull the column names using Pandas.

Sample code example: Python Pandas read csv file compressed with gzip and load into Pandas dataframe https://pastexy.com/106/python-pandas-read-csv-file-compressed-with-gzip-and-load-into-pandas-dataframe

Columns: ['Date received', 'Product', 'Sub-product', 'Issue', 'Sub-issue', 'Consumer complaint narrative', 'Company public response', 'Company', 'State', 'ZIP code', 'Tags', 'Consumer consent provided?', 'Submitted via', 'Date sent to company', 'Company response to consumer', 'Timely response?', 'Consumer disputed?', 'Complaint ID']

I did not modify the dataset.

Use it to practice with dataframes - Pandas or PySpark on Google Colab:

!unzip complaints.csv.zip

import pandas as pd df = pd.read_csv('complaints.csv') df.columns

df.head() etc.
Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic...
zenodo.org
data.niaid.nih.gov
bin, csv, zip
Updated Dec 24, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alexander R. Hartloper; Alexander R. Hartloper; Selimcan Ozden; Albano de Castro e Sousa; Dimitrios G. Lignos; Dimitrios G. Lignos; Selimcan Ozden; Albano de Castro e Sousa (2022). Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials [Dataset]. http://doi.org/10.5281/zenodo.6965147
Explore at:
bin, zip, csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6965147
Dataset updated
Dec 24, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Alexander R. Hartloper; Alexander R. Hartloper; Selimcan Ozden; Albano de Castro e Sousa; Dimitrios G. Lignos; Dimitrios G. Lignos; Selimcan Ozden; Albano de Castro e Sousa
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials

Background

This dataset contains data from monotonic and cyclic loading experiments on structural metallic materials. The materials are primarily structural steels and one iron-based shape memory alloy is also included. Summary files are included that provide an overview of the database and data from the individual experiments is also included.

The files included in the database are outlined below and the format of the files is briefly described. Additional information regarding the formatting can be found through the post-processing library (https://github.com/ahartloper/rlmtp/tree/master/protocols).

Usage

The data is licensed through the Creative Commons Attribution 4.0 International.

If you have used our data and are publishing your work, we ask that you please reference both:

this database through its DOI, and

any publication that is associated with the experiments. See the Overall_Summary and Database_References files for the associated publication references.

Included Files

Overall_Summary_2022-08-25_v1-0-0.csv: summarises the specimen information for all experiments in the database.

Summarized_Mechanical_Props_Campaign_2022-08-25_v1-0-0.csv: summarises the average initial yield stress and average initial elastic modulus per campaign.

Unreduced_Data-#_v1-0-0.zip: contain the original (not downsampled) data

Where # is one of: 1, 2, 3, 4, 5, 6. The unreduced data is broken into separate archives because of upload limitations to Zenodo. Together they provide all the experimental data.

We recommend you un-zip all the folders and place them in one "Unreduced_Data" directory similar to the "Clean_Data"

The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.

There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the unreduced data.

The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.

Clean_Data_v1-0-0.zip: contains all the downsampled data

The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.

There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the clean data.

The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.

Database_References_v1-0-0.bib

Contains a bibtex reference for many of the experiments in the database. Corresponds to the "citekey" entry in the summary files.

File Format: Downsampled Data

These are the "LP_

The header of the first column is empty: the first column corresponds to the index of the sample point in the original (unreduced) data

Time[s]: time in seconds since the start of the test

e_true: true strain

Sigma_true: true stress in MPa

(optional) Temperature[C]: the surface temperature in degC

These data files can be easily loaded using the pandas library in Python through:

import pandas data = pandas.read_csv(data_file, index_col=0)

The data is formatted so it can be used directly in RESSPyLab (https://github.com/AlbanoCastroSousa/RESSPyLab). Note that the column names "e_true" and "Sigma_true" were kept for backwards compatibility reasons with RESSPyLab.

File Format: Unreduced Data

These are the "LP_

The first column is the index of each data point

S/No: sample number recorded by the DAQ

System Date: Date and time of sample

Time[s]: time in seconds since the start of the test

C_1_Force[kN]: load cell force

C_1_Déform1[mm]: extensometer displacement

C_1_Déplacement[mm]: cross-head displacement

Eng_Stress[MPa]: engineering stress

Eng_Strain[]: engineering strain

e_true: true strain

Sigma_true: true stress in MPa

(optional) Temperature[C]: specimen surface temperature in degC

The data can be loaded and used similarly to the downsampled data.

File Format: Overall_Summary

The overall summary file provides data on all the test specimens in the database. The columns include:

hidden_index: internal reference ID

grade: material grade

spec: specifications for the material

source: base material for the test specimen

id: internal name for the specimen

lp: load protocol

size: type of specimen (M8, M12, M20)

gage_length_mm_: unreduced section length in mm

avg_reduced_dia_mm_: average measured diameter for the reduced section in mm

avg_fractured_dia_top_mm_: average measured diameter of the top fracture surface in mm

avg_fractured_dia_bot_mm_: average measured diameter of the bottom fracture surface in mm

fy_n_mpa_: nominal yield stress

fu_n_mpa_: nominal ultimate stress

t_a_deg_c_: ambient temperature in degC

date: date of test

investigator: person(s) who conducted the test

location: laboratory where test was conducted

machine: setup used to conduct test

pid_force_k_p, pid_force_t_i, pid_force_t_d: PID parameters for force control

pid_disp_k_p, pid_disp_t_i, pid_disp_t_d: PID parameters for displacement control

pid_extenso_k_p, pid_extenso_t_i, pid_extenso_t_d: PID parameters for extensometer control

citekey: reference corresponding to the Database_References.bib file

yield_stress_mpa_: computed yield stress in MPa

elastic_modulus_mpa_: computed elastic modulus in MPa

fracture_strain: computed average true strain across the fracture surface

c,si,mn,p,s,n,cu,mo,ni,cr,v,nb,ti,al,b,zr,sn,ca,h,fe: chemical compositions in units of %mass

file: file name of corresponding clean (downsampled) stress-strain data

File Format: Summarized_Mechanical_Props_Campaign

Meant to be loaded in Python as a pandas DataFrame with multi-indexing, e.g.,

tab1 = pd.read_csv('Summarized_Mechanical_Props_Campaign_' + date + version + '.csv', index_col=[0, 1, 2, 3], skipinitialspace=True, header=[0, 1], keep_default_na=False, na_values='')

citekey: reference in "Campaign_References.bib".

Grade: material grade.

Spec.: specifications (e.g., J2+N).

Yield Stress [MPa]: initial yield stress in MPa

size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign

Elastic Modulus [MPa]: initial elastic modulus in MPa

size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign

Caveats

The files in the following directories were tested before the protocol was established. Therefore, only the true stress-strain is available for each:

A500

A992_Gr50

BCP325

BCR295

HYP400

S460NL

S690QL/25mm

S355J2_Plates/S355J2_N_25mm and S355J2_N_50mm
Shopping Mall
kaggle.com
zip
Updated Dec 15, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anshul Pachauri (2023). Shopping Mall [Dataset]. https://www.kaggle.com/datasets/anshulpachauri/shopping-mall
Explore at:
zip(22852 bytes)Available download formats
Dataset updated
Dec 15, 2023
Authors
Anshul Pachauri
Description
Libraries Import:

Importing necessary libraries such as pandas, seaborn, matplotlib, scikit-learn's KMeans, and warnings. Data Loading and Exploration:

Reading a dataset named "Mall_Customers.csv" into a pandas DataFrame (df). Displaying the first few rows of the dataset using df.head(). Conducting univariate analysis by calculating descriptive statistics with df.describe(). Univariate Analysis:

Visualizing the distribution of the 'Annual Income (k$)' column using sns.distplot. Looping through selected columns ('Age', 'Annual Income (k$)', 'Spending Score (1-100)') and plotting individual distribution plots. Bivariate Analysis:

Creating a scatter plot for 'Annual Income (k$)' vs 'Spending Score (1-100)' using sns.scatterplot. Generating a pair plot for selected columns with gender differentiation using sns.pairplot. Gender-Based Analysis:

Grouping the data by 'Gender' and calculating the mean for selected columns. Computing the correlation matrix for the grouped data and visualizing it using a heatmap. Univariate Clustering:

Applying KMeans clustering with 3 clusters based on 'Annual Income (k$)' and adding the 'Income Cluster' column to the DataFrame. Plotting the elbow method to determine the optimal number of clusters. Bivariate Clustering:

Applying KMeans clustering with 5 clusters based on 'Annual Income (k$)' and 'Spending Score (1-100)' and adding the 'Spending and Income Cluster' column. Plotting the elbow method for bivariate clustering and visualizing the cluster centers on a scatter plot. Displaying a normalized cross-tabulation between 'Spending and Income Cluster' and 'Gender'. Multivariate Clustering:

Performing multivariate clustering by creating dummy variables, scaling selected columns, and applying KMeans clustering. Plotting the elbow method for multivariate clustering. Result Saving:

Saving the modified DataFrame with cluster information to a CSV file named "Result.csv". Saving the multivariate clustering plot as an image file ("Multivariate_figure.png").
Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...
zenodo.org
data.europa.eu
zip
Updated Oct 20, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sofia Yfantidou; Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Stefanos Efstathiou; Athena Vakali; Athena Vakali; Joao Palotti; Joao Palotti; Dimitrios Panteleimon Giakatos; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas; Šarūnas Girdzijauskas; Christina Karagianni; Andrei Kazlouski; Elena Ferrari (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. http://doi.org/10.5281/zenodo.6832242
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6832242
Dataset updated
Oct 20, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Sofia Yfantidou; Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Stefanos Efstathiou; Athena Vakali; Athena Vakali; Joao Palotti; Joao Palotti; Dimitrios Panteleimon Giakatos; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas; Šarūnas Girdzijauskas; Christina Karagianni; Andrei Kazlouski; Elena Ferrari
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
LifeSnaps Dataset Documentation

Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

Data Import: Reading CSV

For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.

Data Import: Setting up a MongoDB (Recommended)

To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.

To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have MongoDB Database Tools installed from here.

For the Fitbit data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c fitbit

For the SEMA data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c sema

For surveys data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c surveys

If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.

Data Availability

The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:

{ _id:

The Device Activity Report with Complete Knowledge (DARCK) for NILM

zenodo.org

bin, xz

Updated Sep 19, 2025

Facebook

Twitter

Click to copy link

Link copied

Cite

Anonymous Anonymous; Anonymous Anonymous (2025). The Device Activity Report with Complete Knowledge (DARCK) for NILM [Dataset]. http://doi.org/10.5281/zenodo.17159850

Explore at:

bin, xzAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.17159850

Dataset updated

Sep 19, 2025

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Anonymous Anonymous; Anonymous Anonymous

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

1. Abstract

This dataset contains aggregated and sub-metered power consumption data from a two-person apartment in Germany. Data was collected from March 5 to September 4, 2025, spanning 6 months. It includes an aggregate reading from a main smart meter and individual readings from 40 smart plugs, smart relays, and smart power meters monitoring various appliances.

2. Dataset Overview

Apartment: Two-person apartment, approx. 58m², located in Aachen, Germany.
Aggregate Meter: eBZ DD3
Sub-meters: 31 Shelly Plus Plug S, 6 Shelly Plus 1PM, 3 Shelly Plus PM Mini Gen3
Sampling Rate: 1 Hz
Measured Quantity: Active Power
Unit of Measurement: Watt
Duration: 6 months
Format: Single CSV file (`DARCK.csv`)
Structure: Timestamped rows with columns for the aggregate meter and each sub-metered appliance.
Completeness: The main power meter has a completeness of 99.3%. Missing values were linearly interpolated.

3. Download and Usage

The dataset can be downloaded here: https://doi.org/10.5281/zenodo.17159850

As it contains longer off periods with zeros, the CSV file is nicely compressible.

To extract it use: xz -d DARCK.csv.xz.
The compression leads to a 97% smaller file size (From 4GB to 90.9MB).

To use the dataset in python, you can, e.g., load the csv file into a pandas dataframe.

python
import pandas as pd

df = pd.read_csv("DARCK.csv", parse_dates=["time"])

4. Measurement Setup

The main meter was monitored using an infrared reading head magnetically attached to the infrared interface of the meter. An ESP8266 flashed with Tasmota decodes the binary datagrams and forwards the Watt readings to the MQTT broker. Individual appliances were monitored using a combination of Shelly Plugs (for outlets), Shelly 1PM (for wired-in devices like ceiling lights), and Shelly PM Mini (for each of the three phases of the oven). All devices reported to a central InfluxDB database via Home Assistant running in docker on a Dell OptiPlex 3020M.

5. File Format (`DARCK.csv`)

The dataset is provided as a single comma-separated value (CSV) file.

The first row is a header containing the column names.
All power values are rounded to the first decimal place.
There are no missing values in the final dataset.
Each row represents 1 second, from start of measuring in March until the end in September.

Column Descriptions

Column Name	Data Type	Unit	Description
`time`	datetime	-	Timestamp for the reading in `YYYY-MM-DD HH:MM:SS`
`main`	float	Watt	Total aggregate power consumption for the apartment, measured at the main electrical panel.
`[appliance_name]`	float	Watt	Power consumption of an individual appliance (e.g., `lightbathroom`, `fridge`, `sherlockpc`). See Section 8 for a full list.
Aggregate Columns
`aggr_chargers`	float	Watt	The sum of `sherlockcharger`, `sherlocklaptop`, `watsoncharger`, `watsonlaptop`, `watsonipadcharger`, `kitchencharger`.
`aggr_stoveplates`	float	Watt	The sum of `stoveplatel1` and `stoveplatel2`.
`aggr_lights`	float	Watt	The sum of `lightbathroom`, `lighthallway`, `lightsherlock`, `lightkitchen`, `lightlivingroom`, `lightwatson`, `lightstoreroom`, `fcob`, `sherlockalarmclocklight`, `sherlockfloorlamphue`, `sherlockledstrip`, `livingfloorlamphue`, `sherlockglobe`, `watsonfloorlamp`, `watsondesklamp` and `watsonledmap`.
Analysis Columns
`inaccuracy`	float	Watt	As no electrical device bypasses a power meter, the true inaccuracy can be assessed. It is the absolute error between the sum of individual measurements and the mains reading. A 30W offset is applied to the sum since the measurement devices themselves draw power which is otherwise unaccounted for.

6. Data Postprocessing Pipeline

The final dataset was generated from two raw data sources (meter.csv and shellies.csv) using a comprehensive postprocessing pipeline.

6.1. Main Meter (`main`) Postprocessing

The aggregate power data required several cleaning steps to ensure accuracy.

Outlier Removal: Readings below 10W or above 10,000W were removed (merely 3 occurrences).
Timestamp Burst Correction: The source data contained bursts of delayed readings. A custom algorithm was used to identify these bursts (large time gap followed by rapid readings) and back-fill the timestamps to create an evenly spaced time series.
Alignment & Interpolation: The smart meter pushes a new value via infrared every second. To align those to the whole seconds, it was resampled to a 1-second frequency by taking the mean of all readings within each second (in 99.5% only 1 value). Any resulting gaps (0.7% outage ratio) were filled using linear interpolation.

6.2. Sub-metered Devices (`shellies`) Postprocessing

The Shelly devices are not prone to the same burst issue as the ESP8266 is. They push a new reading at every change in power drawn. If no power change is observed or the one observed is too small (less than a few Watt), the reading is pushed once a minute, together with a heartbeat. When a device turns on or off, intermediate power values are published, which leads to sub-second values that need to be handled.

Grouping: Data was grouped by the unique device identifier.
Resampling & Filling: The data for each device was resampled to a 1-second frequency using .resample('1s').last().ffill().
This method was chosen to firstly, capture the last known state of the device within each second, handling rapid on/off events. Secondly, to forward-fill the last state across periods of no new data, modeling that the device's consumption remained constant until a new reading was sent.

6.3. Merging and Finalization

Merge: The cleaned main meter and all sub-metered device dataframes were merged into a single dataframe on the time index.
Final Fill: Any remaining NaN values (e.g., from before a device was installed) were filled with 0.0, assuming zero consumption.

7. Manual Corrections and Known Data Issues

During analysis, two significant unmetered load events were identified and manually corrected to improve the accuracy of the aggregate reading. The error column (inaccuracy) was recalculated after these corrections.

March 10th - Unmetered Bulb: An unmetered 107W bulb was active. It was subtracted from the main reading as if it never happened.
May 31st - Unmetered Air Pump: An unmetered 101W pump for an air mattress was used directly in an outlet with no intermediary plug and hence manually added to the respective plug.

8. Appliance Details and Multipurpose Plugs

The following table lists the column names with an explanation where needed. As Watson moved at the beginning of June, some metering plugs changed their appliance.

h
descriptor_prediction
huggingface.co
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yuanhao Qu, descriptor_prediction [Dataset]. https://huggingface.co/datasets/yhqu/descriptor_prediction
Explore at:
Authors
Yuanhao Qu
Description
Descriptor Prediction Dataset

This dataset is part of the Deep Principle Bench collection.

Files

descriptor_prediction.csv: Main dataset file

Usage

import pandas as pd from datasets import load_dataset

Load the dataset

dataset = load_dataset("yhqu/descriptor_prediction")

Or load directly as pandas DataFrame

df = pd.read_csv("hf://datasets/yhqu/descriptor_prediction/descriptor_prediction.csv")

Citation

Please cite this work if you use… See the full description on the dataset page: https://huggingface.co/datasets/yhqu/descriptor_prediction.
h
property_based_matching
huggingface.co
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yuanhao Qu, property_based_matching [Dataset]. https://huggingface.co/datasets/yhqu/property_based_matching
Explore at:
Authors
Yuanhao Qu
Description
Property Based Matching Dataset

This dataset is part of the Deep Principle Bench collection.

Files

property_based_matching.csv: Main dataset file

Usage

import pandas as pd from datasets import load_dataset

Load the dataset

dataset = load_dataset("yhqu/property_based_matching")

Or load directly as pandas DataFrame

df = pd.read_csv("hf://datasets/yhqu/property_based_matching/property_based_matching.csv")

Citation

Please cite this work if… See the full description on the dataset page: https://huggingface.co/datasets/yhqu/property_based_matching.
Z
3D skeletons UP-Fall Dataset
data.niaid.nih.gov
Updated Jul 20, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
KOFFI, Tresor (2024). 3D skeletons UP-Fall Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12773012
Explore at:
Dataset updated
Jul 20, 2024
Dataset provided by
CESI LINEACT
Authors
KOFFI, Tresor
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
3D skeletons UP-Fall Dataset

Different between Fall and Impact detection

Overview

This dataset aims to facilitate research in fall detection, particularly focusing on the precise detection of impact moments within fall events. The 3D skeletons data accuracy and comprehensiveness make it a valuable resource for developing and benchmarking fall detection algorithms. The dataset contains 3D skeletal data extracted from fall events and daily activities of 5 subjects performing fall scenarios

Data Collection

The skeletal data was extracted using a pose estimation algorithm, which processes images frames to determine the 3D coordinates of each joint. Sequences with less than 100 frames of extracted data were excluded to ensure the quality and reliability of the dataset. As a result, some subjects may have fewer CSV files.

CSV Structure

The data is organized by subjects, and each subject contains CSV files named according to the pattern C1S1A1T1, where:

C: Camera (1 or 2)

S: Subject (1 to 5)

A: Activity (1 to N, representing different activities)

T: Trial (1 to 3)

subject1/`: Contains CSV files for Subject 1.

C1S1A1T1.csv: Data from Camera 1, Activity 1, Trial 1 for Subject 1

C1S1A2T1.csv: Data from Camera 1, Activity 2, Trial 1 for Subject 1

C1S1A3T1.csv: Data from Camera 1, Activity 3, Trial 1 for Subject 1

C2S1A1T1.csv: Data from Camera 2, Activity 1, Trial 1 for Subject 1

C2S1A2T1.csv: Data from Camera 2, Activity 2, Trial 1 for Subject 1

C2S1A3T1.csv: Data from Camera 2, Activity 3, Trial 1 for Subject 1

subject2/`: Contains CSV files for Subject 2.

C1S2A1T1.csv: Data from Camera 1, Activity 1, Trial 1 for Subject 2

C1S2A2T1.csv: Data from Camera 1, Activity 2, Trial 1 for Subject 2

C1S2A3T1.csv: Data from Camera 1, Activity 3, Trial 1 for Subject 2

C2S2A1T1.csv: Data from Camera 2, Activity 1, Trial 1 for Subject 2

C2S2A2T1.csv: Data from Camera 2, Activity 2, Trial 1 for Subject 2

C2S2A3T1.csv: Data from Camera 2, Activity 3, Trial 1 for Subject 2

subject3/, subject4/, subject5/: Similar structure as above, but may contain fewer CSV files due to the data extraction criteria mentioned above.

Column Descriptions

Each CSV file contains the following columns representing different skeletal joints and their respective coordinates in 3D space:

Column Name

Description

joint_1_x

X coordinate of joint 1

joint_1_y

Y coordinate of joint 1

joint_1_z

Z coordinate of joint 1

joint_2_x

X coordinate of joint 2

joint_2_y

Y coordinate of joint 2

joint_2_z

Z coordinate of joint 2

...

...

joint_n_x

X coordinate of joint n

joint_n_y

Y coordinate of joint n

joint_n_z

Z coordinate of joint n

LABEL

Label indicating impact (1) or non-impact (0)

Example

Here is an example of what a row in one of the CSV files might look like:

joint_1_x

joint_1_y

joint_1_z

joint_2_x

joint_2_y

joint_2_z

...

joint_n_x

joint_n_y

joint_n_33

LABEL

0.123

0.456

0.789

0.234

0.567

0.890

...

0.345

0.678

0.901

0

Usage

This data can be used for developing and benchmarking impact fall detection algorithms. It provides detailed information on human posture and movement during falls, making it suitable for machine learning and deep learning applications in impact fall detection and prevention.

Using github

Clone the repository:

-bash git clone

https://github.com/Tresor-Koffi/3D_skeletons-UP-Fall-Dataset

Navigate to the directory:

-bash -cd 3D_skeletons-UP-Fall-Dataset

Examples

Here's a simple example of how to load and inspect a sample data file using Python:```pythonimport pandas as pd

Load a sample data file for Subject 1, Camera 1, Activity 1, Trial 1

data = pd.read_csv('subject1/C1S1A1T1.csv')print(data.head())
R
Evaluation results of a knee distraction unloader brace on a robotic test...
entrepot.recherche.data.gouv.fr
bin, pdf +1
Updated Sep 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lea Boillereaux; Lea Boillereaux; Simon Le Floc'h; Simon Le Floc'h; Franck Jourdan; Franck Jourdan; Arnaud Tanguy; Gille Camp; Gille Camp; Anaïs Vaysse; Gilles Dusfour; Gilles Dusfour; Abderrahmane Kheddar; Abderrahmane Kheddar; Arnaud Tanguy; Anaïs Vaysse (2025). Evaluation results of a knee distraction unloader brace on a robotic test bench – CSV files of the data obtained from the test bench – Model of the bones – Model of the tested cam of the distraction brace [Dataset]. http://doi.org/10.57745/VCALE0
Explore at:
text/comma-separated-values(36000614), text/comma-separated-values(36241909), text/comma-separated-values(36286698), text/comma-separated-values(36159297), bin(4615519), bin(14218738), text/comma-separated-values(36103576), text/comma-separated-values(37097577), bin(2350527), text/comma-separated-values(36014331), bin(4807364), text/comma-separated-values(36279419), text/comma-separated-values(35882565), pdf(93247), bin(2377437), text/comma-separated-values(36068067), bin(5916146), bin(2482876), text/comma-separated-values(36424785), bin(6588074), text/comma-separated-values(36224058), text/comma-separated-values(36024776), bin(9872504), text/comma-separated-values(36369481), text/comma-separated-values(36353970), pdf(36547), text/comma-separated-values(36507716), text/comma-separated-values(36462611), bin(96967), bin(78659), text/comma-separated-values(36124222), text/comma-separated-values(37042098), text/comma-separated-values(36136856), text/comma-separated-values(36391516), text/comma-separated-values(36913802), text/comma-separated-values(36596530), bin(7450705), text/comma-separated-values(36107973), text/comma-separated-values(36907525), text/comma-separated-values(12156782), text/comma-separated-values(36091853), text/comma-separated-values(36307863), text/comma-separated-values(37583326), bin(15096627), text/comma-separated-values(37182113), bin(8016924), text/comma-separated-values(36347087), text/comma-separated-values(36450906), text/comma-separated-values(36562118), text/comma-separated-values(36137120), text/comma-separated-values(36238235), text/comma-separated-values(35987625), text/comma-separated-values(36089715), bin(45229806), bin(3162170), text/comma-separated-values(36628692), text/comma-separated-values(36231435), text/comma-separated-values(35946191), text/comma-separated-values(36226364), bin(12725469), text/comma-separated-values(35858211), bin(15181492), text/comma-separated-values(36174476), bin(100215316), pdf(74165), text/comma-separated-values(36967243), bin(4917993), text/comma-separated-values(36255041)Available download formats
Unique identifier
https://doi.org/10.57745/VCALE0
Dataset updated
Sep 2, 2025
Dataset provided by
Recherche Data Gouv
Authors
Lea Boillereaux; Lea Boillereaux; Simon Le Floc'h; Simon Le Floc'h; Franck Jourdan; Franck Jourdan; Arnaud Tanguy; Gille Camp; Gille Camp; Anaïs Vaysse; Gilles Dusfour; Gilles Dusfour; Abderrahmane Kheddar; Abderrahmane Kheddar; Arnaud Tanguy; Anaïs Vaysse
License
https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.57745/VCALE0https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.57745/VCALE0
Time period covered
Oct 1, 2021 - Aug 31, 2025
Description
Dataset Description Dataset Description This dataset is associated with the publication titled "A Distraction Knee-Brace and a Robotic Testbed for Tibiofemoral Load Reduction during Squatting" in IEEE Transactions on Medical Robotics and Bionics. It provides comprehensive data supporting the development and evaluation of a knee distraction brace designed to reduce tibiofemoral contact forces during flexion. Contents Cam Profiles STL files of the initial cam profiles designed based on averaged tibiofemoral contact force data collected from 5 squats of a patient with an instrumented prosthesis (K7L) from the CAMS Knee dataset (accessible via https://orthoload.com/). Optimized cam profiles, corrected based on experimental results, are also included. These profiles enable patient-specific adjustments to account for the non-linear evolution of tibiofemoral contact forces with flexion angles. Experimental Results CSV files containing raw results from robotic testbed experiments, testing the knee brace under various initial pneumatic pressures in the actuators. Data is provided for tests conducted: Without the brace, With the initial cam profiles, With the optimized cam profiles. Each CSV file corresponds to a specific test condition, detailing forces and kinematics observed during squatting. 3D Models of Bones and Testbed Components Geometries of the femur head and tibial plateau used in the robotic testbed experiments, provided in STEP, STL, and SLDPRT/SLDASM formats. A README file describes the biomechanical coordinate systems used for: Force and kinematic control of the robotic testbed, Result interpretation and visualization. How to Open and Read the Provided Files The dataset includes files in CSV, SLDPRT, SLDASM, STL, and IGES formats. Below are recommended software solutions, with a preference for open-source options: CSV (Comma-Separated Values): Can be opened with Microsoft Excel, Google Sheets, or open-source software like LibreOffice Calc or Python (using pandas). SLDPRT & SLDASM (SolidWorks Parts and Assemblies): These files are native to SolidWorks. For viewing without SolidWorks, use eDrawings Viewer (free) or FreeCAD (limited compatibility). STL (3D Model Format): Can be opened with MeshLab, FreeCAD, or Blender. Most 3D printing software (like Cura or PrusaSlicer) also support STL. IGES (3D CAD Exchange Format): Can be read with FreeCAD, Fusion 360 (free for personal use), or OpenCascade-based software like CAD Assistant. For full compatibility, commercial software like SolidWorks or CATIA may be required for SLDPRT and SLDASM files. However, FreeCAD and other open-source tools provide partial support. See the associated publication and the README files included in the dataset for more information.
Z
Linux Kernel binary size
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Jun 14, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hugo MARTIN; Mathieu ACHER (2021). Linux Kernel binary size [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4943883
Explore at:
Dataset updated
Jun 14, 2021
Dataset provided by
IRISA
Authors
Hugo MARTIN; Mathieu ACHER
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset containing measurements of Linux Kernel binary size after compilation. The reported size, in the column "perf", is the size in bytes of the vmlinux file. In contains also a column "active_options" reporting the number of activated options (set at "y"). All other columns, the list being reported in the file "Linux_options.json", are Linux kernel options. The sampling have been made using randconfig. The version of Linux used is 4.13.3.

Not all available options are present. First, it only contains options about the x86 and 64 bits version. Then, all non-tristate options have been ignored. Finally, options not having multiple value through the whole dataset, due to not enough variability in the sampling, are ignored. All options are encoded as 0 for "n" and "m" options value, and 1 for "y".

In python, importing the dataset using pandas will attribute all columns to int64, which will lead to a great consumption of memory (~50GB). We provide this way to import it using less than 1 GB of memory by setting options columns to int8.

import pandas as pd import json import numpy

with open("Linux_options.json","r") as f: linux_options = json.load(f)

Load csv by setting options as int8 to save a lot of memory

return pd.read_csv("Linux.csv", dtype={f:numpy.int8 for f in linux_options})
h
Data from: English-French-Translation
huggingface.co
kaggle.com
Updated Oct 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FrancophonIA (2024). English-French-Translation [Dataset]. https://huggingface.co/datasets/FrancophonIA/English-French-Translation
Explore at:
Dataset updated
Oct 12, 2024
Dataset authored and provided by
FrancophonIA
Area covered
French
Description
[!NOTE] Dataset origin: https://www.kaggle.com/datasets/adewoleakorede/english-french-translation

Dataset

I used this dataset for my project on translating from English to French using the transformer architecture. To load the CSV file with pandas, use the parameter encoding_errors='ignore' - I couldn't fix the issues with the encoding.
NYC Jobs Dataset (Filtered Columns)
kaggle.com
zip
Updated Oct 5, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jeffery Mandrake (2022). NYC Jobs Dataset (Filtered Columns) [Dataset]. https://www.kaggle.com/datasets/jefferymandrake/nyc-jobs-filtered-cols
Explore at:
zip(93408 bytes)Available download formats
Dataset updated
Oct 5, 2022
Authors
Jeffery Mandrake
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Area covered
New York
Description
Use this dataset with Misra's Pandas tutorial: How to use the Pandas GroupBy function | Pandas tutorial

The original dataset came from this site: https://data.cityofnewyork.us/City-Government/NYC-Jobs/kpav-sd4t/data

I used Google Colab to filter the columns with the following Pandas commands. Here's a Colab Notebook you can use with the commands listed below: https://colab.research.google.com/drive/17Jpgeytc075CpqDnbQvVMfh9j-f4jM5l?usp=sharing

Once the csv file is uploaded to Google Colab, use these commands to process the file.

import pandas as pd # load the file and create a pandas dataframe df = pd.read_csv('/content/NYC_Jobs.csv') # keep only these columns df = df[['Job ID', 'Civil Service Title', 'Agency', 'Posting Type', 'Job Category', 'Salary Range From', 'Salary Range To' ]] # save the csv file without the index column df.to_csv('/content/NYC_Jobs_filtered_cols.csv', index=False)
Ecommerce Dataset (Products & Sizes Included)
kaggle.com
zip
Updated Nov 13, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anvit kumar (2025). Ecommerce Dataset (Products & Sizes Included) [Dataset]. https://www.kaggle.com/datasets/anvitkumar/shopping-dataset
Explore at:
zip(1274856 bytes)Available download formats
Dataset updated
Nov 13, 2025
Authors
Anvit kumar
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
📦 Ecommerce Dataset (Products & Sizes Included)

🛍️ Essential Data for Building an Ecommerce Website & Analyzing Online Shopping Trends 📌 Overview This dataset contains 1,000+ ecommerce products, including detailed information on pricing, ratings, product specifications, seller details, and more. It is designed to help data scientists, developers, and analysts build product recommendation systems, price prediction models, and sentiment analysis tools.

🔹 Dataset Features

Column Name Description product_id Unique identifier for the product title Product name/title product_description Detailed product description rating Average customer rating (0-5) ratings_count Number of ratings received initial_price Original product price discount Discount percentage (%) final_price Discounted price currency Currency of the price (e.g., USD, INR) images URL(s) of product images delivery_options Available delivery methods (e.g., standard, express) product_details Additional product attributes breadcrumbs Category path (e.g., Electronics > Smartphones) product_specifications Technical specifications of the product amount_of_stars Distribution of star ratings (1-5 stars) what_customers_said Customer reviews (sentiments) seller_name Name of the product seller sizes Available sizes (for clothing, shoes, etc.) videos Product video links (if available) seller_information Seller details, such as location and rating variations Different variants of the product (e.g., color, size) best_offer Best available deal for the product more_offers Other available deals/offers category Product category

📊 Potential Use Cases

📌 Build an Ecommerce Website: Use this dataset to design a functional online store with product listings, filtering, and sorting. 🔍 Price Prediction Models: Predict product prices based on features like ratings, category, and discount. 🎯 Recommendation Systems: Suggest products based on user preferences, rating trends, and customer feedback. 🗣 Sentiment Analysis: Analyze what_customers_said to understand customer satisfaction and product popularity. 📈 Market & Competitor Analysis: Track pricing trends, popular categories, and seller performance. 🔍 Why Use This Dataset? ✅ Rich Feature Set: Includes all necessary ecommerce attributes. ✅ Realistic Pricing & Rating Data: Useful for price analysis and recommendations. ✅ Multi-Purpose: Suitable for machine learning, web development, and data visualization. ✅ Structured Format: Easy-to-use CSV format for quick integration.

📂 Dataset Format CSV file (ecommerce_dataset.csv) 1000+ samples Multi-category coverage 🔗 How to Use? Download the dataset from Kaggle. Load it in Python using Pandas: python Copy Edit import pandas as pd
df = pd.read_csv("ecommerce_dataset.csv")
df.head() Explore trends & patterns using visualization tools (Seaborn, Matplotlib). Build models & applications based on the dataset!
Merge number of excel file,convert into csv file
kaggle.com
zip
Updated Mar 30, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aashirvad pandey (2024). Merge number of excel file,convert into csv file [Dataset]. https://www.kaggle.com/datasets/aashirvadpandey/merge-number-of-excel-fileconvert-into-csv-file
Explore at:
zip(6731 bytes)Available download formats
Dataset updated
Mar 30, 2024
Authors
Aashirvad pandey
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Project Description:

Title: Pandas Data Manipulation and File Conversion

Overview: This project aims to demonstrate the basic functionalities of Pandas, a powerful data manipulation library in Python. In this project, we will create a DataFrame, perform some data manipulation operations using Pandas, and then convert the DataFrame into both Excel and CSV formats.

Key Objectives:

DataFrame Creation: Utilize Pandas to create a DataFrame with sample data.

Data Manipulation: Perform basic data manipulation tasks such as adding columns, filtering data, and performing calculations.

File Conversion: Convert the DataFrame into Excel (.xlsx) and CSV (.csv) file formats.

Tools and Libraries Used:

Python

Pandas

Project Implementation:

DataFrame Creation:

Import the Pandas library.

Create a DataFrame using either a dictionary, a list of dictionaries, or by reading data from an external source like a CSV file.

Populate the DataFrame with sample data representing various data types (e.g., integer, float, string, datetime).

Data Manipulation:

Add new columns to the DataFrame representing derived data or computations based on existing columns.

Filter the DataFrame to include only specific rows based on certain conditions.

Perform basic calculations or transformations on the data, such as aggregation functions or arithmetic operations.

File Conversion:

Utilize Pandas to convert the DataFrame into an Excel (.xlsx) file using the to_excel() function.

Convert the DataFrame into a CSV (.csv) file using the to_csv() function.

Save the generated files to the local file system for further analysis or sharing.

Expected Outcome:

Upon completion of this project, you will have gained a fundamental understanding of how to work with Pandas DataFrames, perform basic data manipulation tasks, and convert DataFrames into different file formats. This knowledge will be valuable for data analysis, preprocessing, and data export tasks in various data science and analytics projects.

Conclusion:

The Pandas library offers powerful tools for data manipulation and file conversion in Python. By completing this project, you will have acquired essential skills that are widely applicable in the field of data science and analytics. You can further extend this project by exploring more advanced Pandas functionalities or integrating it into larger data processing pipelines.in this data we add number of data and make that data a data frame.and save in single excel file as different sheet name and then convert that excel file in csv file .
g
Dataset with four years of condition monitoring technical language...
gimi9.com
Updated Jan 8, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Dataset with four years of condition monitoring technical language annotations from paper machine industries in northern Sweden | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_https-doi-org-10-5878-hafd-ms27/
Explore at:
Dataset updated
Jan 8, 2024
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Sweden
Description
This dataset consists of four years of technical language annotations from two paper machines in northern Sweden, structured as a Pandas dataframe. The same data is also available as a semicolon-separated .csv file. The data consists of two columns, where the first column corresponds to annotation note contents, and the second column corresponds to annotation titles. The annotations are in Swedish, and processed so that all mentions of personal information are replaced with the string ‘egennamn’, meaning “personal name” in Swedish. Each row corresponds to one annotation with the corresponding title. Data can be accessed in Python with: import pandas as pd annotations_df = pd.read_pickle("Technical_Language_Annotations.pkl") annotation_contents = annotations_df['noteComment'] annotation_titles = annotations_df['title']
Z
F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in...
data-staging.niaid.nih.gov
Updated Jun 10, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Antici, Francesco; Bartolini, Andrea; Domke, Jens; Kiziltan, Zeynep; Yamamoto, Keiji (2024). F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_11467482
Explore at:
Dataset updated
Jun 10, 2024
Dataset provided by
University of Bologna
RIKEN Center for Computational Science
Authors
Antici, Francesco; Bartolini, Andrea; Domke, Jens; Kiziltan, Zeynep; Yamamoto, Keiji
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
F-DATA is a novel workload dataset containing the data of around 24 million jobs executed on Supercomputer Fugaku, over the three years of public system usage (March 2021-April 2024). Each job data contains an extensive set of features, such as exit code, duration, power consumption and performance metrics (e.g. #flops, memory bandwidth, operational intensity and memory/compute bound label), which allows for a multitude of job characteristics prediction. The full list of features can be found in the file feature_list.csv.

The sensitive data appears both in anonymized and encoded versions. The encoding is based on a Natural Language Processing model and retains sensitive but useful job information for prediction purposes, without violating data privacy. The scripts used to generate the dataset are available in the F-DATA GitHub repository, along with a series of plots and instruction on how to load the data.

F-DATA is composed of 38 files, with each YY_MM.parquet file containing the data of the jobs submitted in the month MM of the year YY.

The files of F-DATA are saved as .parquet files. It is possible to load such files as dataframes by leveraging the pandas APIs, after installing pyarrow (pip install pyarrow). A single file can be read with the following Python instrcutions:

Importing pandas library

import pandas as pd

Read the 21_01.parquet file in a dataframe format

df = pd.read_parquet("21_01.parquet")

df.head()
Z
SELTO Dataset
data.niaid.nih.gov
Updated May 23, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dittmer, Sören; Erzmann, David; Harms, Henrik; Falck, Rielson; Gosch, Marco (2023). SELTO Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7034898
Explore at:
Dataset updated
May 23, 2023
Dataset provided by
University of Bremen, University of Cambridge
ArianeGroup GmbH
University of Bremen
Authors
Dittmer, Sören; Erzmann, David; Harms, Henrik; Falck, Rielson; Gosch, Marco
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A Benchmark Dataset for Deep Learning for 3D Topology Optimization

This dataset represents voxelized 3D topology optimization problems and solutions. The solutions have been generated in cooperation with the Ariane Group and Synera using the Altair OptiStruct implementation of SIMP within the Synera software. The SELTO dataset consists of four different 3D datasets for topology optimization, called disc simple, disc complex, sphere simple and sphere complex. Each of these datasets is further split into a training and a validation subset.

The following paper provides full documentation and examples:

Dittmer, S., Erzmann, D., Harms, H., Maass, P., SELTO: Sample-Efficient Learned Topology Optimization (2022) https://arxiv.org/abs/2209.05098.

The Python library DL4TO (https://github.com/dl4to/dl4to) can be used to download and access all SELTO dataset subsets. Each TAR.GZ file container consists of multiple enumerated pairs of CSV files. Each pair describes a unique topology optimization problem and contains an associated ground truth solution. Each problem-solution pair consists of two files, where one contains voxel-wise information and the other file contains scalar information. For example, the i-th sample is stored in the files i.csv and i_info.csv, where i.csv contains all voxel-wise information and i_info.csv contains all scalar information. We define all spatially varying quantities at the center of the voxels, rather than on the vertices or surfaces. This allows for a shape-consistent tensor representation.

For the i-th sample, the columns of i_info.csv correspond to the following scalar information:

E - Young's modulus [Pa]

ν - Poisson's ratio [-]

σ_ys - a yield stress [Pa]

h - discretization size of the voxel grid [m]

The columns of i.csv correspond to the following voxel-wise information:

x, y, z - the indices that state the location of the voxel within the voxel mesh

Ω_design - design space information for each voxel. This is a ternary variable that indicates the type of density constraint on the voxel. 0 and 1 indicate that the density is fixed at 0 or 1, respectively. -1 indicates the absence of constraints, i.e., the density in that voxel can be freely optimized

Ω_dirichlet_x, Ω_dirichlet_y, Ω_dirichlet_z - homogeneous Dirichlet boundary conditions for each voxel. These are binary variables that define whether the voxel is subject to homogeneous Dirichlet boundary constraints in the respective dimension

F_x, F_y, F_z - floating point variables that define the three spacial components of external forces applied to each voxel. All forces are body forces given in [N/m^3]

density - defines the binary voxel-wise density of the ground truth solution to the topology optimization problem

How to Import the Dataset

with DL4TO: With the Python library DL4TO (https://github.com/dl4to/dl4to) it is straightforward to download and access the dataset as a customized PyTorch torch.utils.data.Dataset object. As shown in the tutorial this can be done via:

from dl4to.datasets import SELTODataset

dataset = SELTODataset(root=root, name=name, train=train)

Here, root is the path where the dataset should be saved. name is the name of the SELTO subset and can be one of "disc_simple", "disc_complex", "sphere_simple" and "sphere_complex". train is a boolean that indicates whether the corresponding training or validation subset should be loaded. See here for further documentation on the SELTODataset class.

without DL4TO: After downloading and unzipping, any of the i.csv files can be manually imported into Python as a Pandas dataframe object:

import pandas as pd

root = ... file_path = f'{root}/{i}.csv' columns = ['x', 'y', 'z', 'Ω_design','Ω_dirichlet_x', 'Ω_dirichlet_y', 'Ω_dirichlet_z', 'F_x', 'F_y', 'F_z', 'density'] df = pd.read_csv(file_path, names=columns)

Similarly, we can import a i_info.csv file via:

file_path = f'{root}/{i}_info.csv' info_column_names = ['E', 'ν', 'σ_ys', 'h'] df_info = pd.read_csv(file_path, names=info_columns)

We can extract PyTorch tensors from the Pandas dataframe df using the following function:

import torch

def get_torch_tensors_from_dataframe(df, dtype=torch.float32): shape = df[['x', 'y', 'z']].iloc[-1].values.astype(int) + 1 voxels = [df['x'].values, df['y'].values, df['z'].values]

Ω_design = torch.zeros(1, *shape, dtype=int) Ω_design[:, voxels[0], voxels[1], voxels[2]] = torch.from_numpy(data['Ω_design'].values.astype(int)) Ω_Dirichlet = torch.zeros(3, *shape, dtype=dtype) Ω_Dirichlet[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['Ω_dirichlet_x'].values, dtype=dtype) Ω_Dirichlet[1, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['Ω_dirichlet_y'].values, dtype=dtype) Ω_Dirichlet[2, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['Ω_dirichlet_z'].values, dtype=dtype) F = torch.zeros(3, *shape, dtype=dtype) F[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['F_x'].values, dtype=dtype) F[1, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['F_y'].values, dtype=dtype) F[2, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['F_z'].values, dtype=dtype) density = torch.zeros(1, *shape, dtype=dtype) density[:, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['density'].values, dtype=dtype) return Ω_design, Ω_Dirichlet, F, density
NYC Yellow Taxi 2024 Data in CSV format
kaggle.com
zip
Updated Nov 25, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ishaan Singh Shekhawat (2025). NYC Yellow Taxi 2024 Data in CSV format [Dataset]. https://www.kaggle.com/datasets/ishaansinghshekhawat/nyc-yellow-taxi-2024-data-in-csv-format
Explore at:
zip(847185464 bytes)Available download formats
Dataset updated
Nov 25, 2025
Authors
Ishaan Singh Shekhawat
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Area covered
New York
Description
This dataset contains NYC Yellow Taxi trip records for the full year 2024, consolidated into three CSV files for easier loading, analysis, and modeling. The data originates from the official NYC Taxi & Limousine Commission (TLC) Trip Record Data releases, which provide detailed information about every yellow taxi trip recorded in New York City.

Each trip includes fields such as pickup and dropoff timestamps, locations, distances, fares, taxes, surcharges, passenger counts, and payment information.

The original TLC files are provided as monthly Parquet files. They have been cleaned and consolidated into three manageable CSV files for ease of use in Kaggle kernels, pandas, SQL, and Spark workflows.

Source: NYC Taxi & Limousine Commission (TLC) Original Data Link: https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page

This is public data provided by NYC TLC and may be used freely under open data guidelines.
VANET-IRAQ-BSM-Attacks
zenodo.org
application/gzip
Updated Sep 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
mohammad abbas alkifaee; mohammad abbas alkifaee; Fahad Ghalib Abdulkadhim; Fahad Ghalib Abdulkadhim (2025). VANET-IRAQ-BSM-Attacks [Dataset]. http://doi.org/10.5281/zenodo.17167970
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.17167970
Dataset updated
Sep 21, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
mohammad abbas alkifaee; mohammad abbas alkifaee; Fahad Ghalib Abdulkadhim; Fahad Ghalib Abdulkadhim
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Sep 21, 2025
Area covered
Iraq
Description
# Baghdad VANET BSM-Based

Attack Dataset (F2MD Scenarios) — Raw CSVs (File-level by Attack ID)

🆄🅽🅸🆅🅴🆁🆂🅸🆃🆈 🅾🅵 🅺🆄🅵🅰

ʙʏ Mᴏʜᴀᴍᴍᴀᴅ Aʙʙᴀs Sʜᴀʀᴇᴇғ & Dʀ.Fᴀʜᴀᴅ Gʜᴀʟɪʙ

## Overview
This package groups the original raw CSV files by attack type at the file level. No content changes were made to the CSVs—files are copied as-is into family folders.

## Dataset Summary
Total files: 17

Total records (all files): 35830975

Attack records (label=1): 2359250

Benign records (label=0): 33471725

Other/unknown labels: 0

## Attack Families
- ConstPos — Frozen constant position
- ConstPosOffset — Constant offset to coordinates
- RandomPos — Random fake positions
- RandomPosOffset — Random offset to the true position
- ConstSpeedOffset — Constant speed bias
- RandomSpeed — Random implausible speeds
- EventualStop — Gradual or sudden stop spoofing
- Disruptive — Protocol fields/values deliberately disruptive
- DataReplay — Replay of past data
- StaleMessages — Old or delayed messages
- DoS — High-rate flooding
- DoSRandom — Randomly fluctuating flooding
- DoSDisruptive — Intermittent aggressive flooding
- GridSybil — Coordinated fake identities (Sybil)
- DoSRandomSybil — Random DoS with Sybil identities
- DoSDisruptiveSybil — Aggressive DoS with Sybil identities
- Unknown — Files not mapped to a specific family

## Per-family Totals
| Family | Files | Records | Attack (label=1) | Benign (label=0) | Other |
|:--|--:|--:|--:|--:|--:|
| ConstPos | 1 | 1327217 | 63504 | 1263713 | 0 |
| ConstPosOffset | 1 | 1305206 | 62749 | 1242457 | 0 |
| ConstSpeedOffset | 1 | 2150356 | 102707 | 2047649 | 0 |
| DataReplay | 1 | 922481 | 43916 | 878565 | 0 |
| Disruptive | 1 | 1063416 | 52038 | 1011378 | 0 |
| DoS | 1 | 1241649 | 167490 | 1074159 | 0 |
| DoSDisruptive | 1 | 705817 | 97649 | 608168 | 0 |
| DoSDisruptiveSybil | 1 | 2113005 | 24365 | 2088640 | 0 |
| DoSRandom | 1 | 6440583 | 867983 | 5572600 | 0 |
| DoSRandomSybil | 1 | 2499578 | 30382 | 2469196 | 0 |
| EventualStop | 1 | 2124546 | 101617 | 2022929 | 0 |
| GridSybil | 1 | 622012 | 108728 | 513284 | 0 |
| RandomPos | 1 | 1087145 | 53253 | 1033892 | 0 |
| RandomPosOffset | 1 | 3258131 | 158686 | 3099445 | 0 |
| RandomSpeed | 1 | 3676823 | 176305 | 3500518 | 0 |
| StaleMessages | 1 | 770026 | 36645 | 733381 | 0 |
| Unknown | 1 | 4522984 | 211233 | 4311751 | 0 |

## Per-file Details
| Family | File | Records | Attack (1) | Benign (0) | Other | Label column present |
|:--|:--|--:|--:|--:|--:|:--:|
| ConstPos | 1.csv | 1327217 | 63504 | 1263713 | 0 | yes |
| ConstPosOffset | 2.csv | 1305206 | 62749 | 1242457 | 0 | yes |
| ConstSpeedOffset | 6.csv | 2150356 | 102707 | 2047649 | 0 | yes |
| DataReplay | 11.csv | 922481 | 43916 | 878565 | 0 | yes |
| Disruptive | 10.csv | 1063416 | 52038 | 1011378 | 0 | yes |
| DoS | 13.csv | 1241649 | 167490 | 1074159 | 0 | yes |
| DoSDisruptive | 15.csv | 705817 | 97649 | 608168 | 0 | yes |
| DoSDisruptiveSybil | 19.csv | 2113005 | 24365 | 2088640 | 0 | yes |
| DoSRandom | 14.csv | 6440583 | 867983 | 5572600 | 0 | yes |
| DoSRandomSybil | 18.csv | 2499578 | 30382 | 2469196 | 0 | yes |
| EventualStop | 9.csv | 2124546 | 101617 | 2022929 | 0 | yes |
| GridSybil | 16.csv | 622012 | 108728 | 513284 | 0 | yes |
| RandomPos | 3.csv | 1087145 | 53253 | 1033892 | 0 | yes |
| RandomPosOffset | 4.csv | 3258131 | 158686 | 3099445 | 0 | yes |
| RandomSpeed | 7.csv | 3676823 | 176305 | 3500518 | 0 | yes |
| StaleMessages | 12.csv | 770026 | 36645 | 733381 | 0 | yes |
| Unknown | 5.csv | 4522984 | 211233 | 4311751 | 0 | yes |

## How to Load (Python)
Use pandas to read any CSV under data/
h
crispr_delivery
huggingface.co
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yuanhao Qu, crispr_delivery [Dataset]. https://huggingface.co/datasets/yhqu/crispr_delivery
Explore at:
Authors
Yuanhao Qu
Description
Crispr Delivery Dataset

This dataset is part of the Deep Principle Bench collection.

Files

crispr_delivery.csv: Main dataset file

Usage

import pandas as pd from datasets import load_dataset

Load the dataset

dataset = load_dataset("yhqu/crispr_delivery")

Or load directly as pandas DataFrame

df = pd.read_csv("hf://datasets/yhqu/crispr_delivery/crispr_delivery.csv")

Citation

Please cite this work if you use this dataset in your research.

Facebook

Twitter

Click to copy link

Link copied

Cite

Jeffery Mandrake (2022). US Consumer Complaints Against Businesses [Dataset]. https://www.kaggle.com/jefferymandrake/us-consumer-complaints-dataset-through-2019

US Consumer Complaints Against Businesses

2 Million records including product, company name, issue details and response

Explore at:

zip(343188956 bytes)Available download formats

Dataset updated

Oct 9, 2022

Authors

Jeffery Mandrake

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

2,121,458 records

I used Google Colab to check out this dataset and pull the column names using Pandas.

Sample code example: Python Pandas read csv file compressed with gzip and load into Pandas dataframe https://pastexy.com/106/python-pandas-read-csv-file-compressed-with-gzip-and-load-into-pandas-dataframe

Columns: ['Date received', 'Product', 'Sub-product', 'Issue', 'Sub-issue', 'Consumer complaint narrative', 'Company public response', 'Company', 'State', 'ZIP code', 'Tags', 'Consumer consent provided?', 'Submitted via', 'Date sent to company', 'Company response to consumer', 'Timely response?', 'Consumer disputed?', 'Complaint ID']

I did not modify the dataset.

Use it to practice with dataframes - Pandas or PySpark on Google Colab:

!unzip complaints.csv.zip

import pandas as pd df = pd.read_csv('complaints.csv') df.columns

df.head() etc.

Clear search

Close search

Google apps

Main menu

US Consumer Complaints Against Businesses

Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic...

Shopping Mall

Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...

The Device Activity Report with Complete Knowledge (DARCK) for NILM

1. Abstract

2. Dataset Overview

3. Download and Usage

4. Measurement Setup

5. File Format (DARCK.csv)

Column Descriptions

Column Name

Data Type

Unit

Description

6. Data Postprocessing Pipeline

6.1. Main Meter (main) Postprocessing

6.2. Sub-metered Devices (shellies) Postprocessing

6.3. Merging and Finalization

7. Manual Corrections and Known Data Issues

8. Appliance Details and Multipurpose Plugs

descriptor_prediction

Load the dataset

Or load directly as pandas DataFrame

property_based_matching

Load the dataset

Or load directly as pandas DataFrame

3D skeletons UP-Fall Dataset

Load a sample data file for Subject 1, Camera 1, Activity 1, Trial 1

Evaluation results of a knee distraction unloader brace on a robotic test...

Linux Kernel binary size

Load csv by setting options as int8 to save a lot of memory

Data from: English-French-Translation

NYC Jobs Dataset (Filtered Columns)

Ecommerce Dataset (Products & Sizes Included)

Merge number of excel file,convert into csv file

Dataset with four years of condition monitoring technical language...

F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in...

Importing pandas library

Read the 21_01.parquet file in a dataframe format

SELTO Dataset

NYC Yellow Taxi 2024 Data in CSV format

VANET-IRAQ-BSM-Attacks

🆄🅽🅸🆅🅴🆁🆂🅸🆃🆈 🅾🅵 🅺🆄🅵🅰

ʙʏ Mᴏʜᴀᴍᴍᴀᴅ Aʙʙᴀs Sʜᴀʀᴇᴇғ & Dʀ.Fᴀʜᴀᴅ Gʜᴀʟɪʙ

crispr_delivery

Load the dataset

Or load directly as pandas DataFrame

US Consumer Complaints Against Businesses

2 Million records including product, company name, issue details and response

5. File Format (`DARCK.csv`)

6.1. Main Meter (`main`) Postprocessing

6.2. Sub-metered Devices (`shellies`) Postprocessing