100+ datasets found

Shopping Mall
kaggle.com
zip
Updated Dec 15, 2023
Cite
Anshul Pachauri (2023). Shopping Mall [Dataset]. https://www.kaggle.com/datasets/anshulpachauri/shopping-mall
Explore at:
Available download formats: zip (22852 bytes)
Dataset updated
Dec 15, 2023
Authors
Anshul Pachauri
Description
Libraries Import:

Importing necessary libraries such as pandas, seaborn, matplotlib, scikit-learn's KMeans, and warnings.

Data Loading and Exploration:

Reading a dataset named "Mall_Customers.csv" into a pandas DataFrame (df). Displaying the first few rows of the dataset using df.head(). Conducting univariate analysis by calculating descriptive statistics with df.describe().

Univariate Analysis:

Visualizing the distribution of the 'Annual Income (k$)' column using sns.distplot. Looping through selected columns ('Age', 'Annual Income (k$)', 'Spending Score (1-100)') and plotting individual distribution plots.

Bivariate Analysis:

Creating a scatter plot for 'Annual Income (k$)' vs 'Spending Score (1-100)' using sns.scatterplot. Generating a pair plot for selected columns with gender differentiation using sns.pairplot.

Gender-Based Analysis:

Grouping the data by 'Gender' and calculating the mean for selected columns. Computing the correlation matrix for the grouped data and visualizing it using a heatmap.

Univariate Clustering:

Applying KMeans clustering with 3 clusters based on 'Annual Income (k$)' and adding the 'Income Cluster' column to the DataFrame. Plotting the elbow method to determine the optimal number of clusters.

Bivariate Clustering:

Applying KMeans clustering with 5 clusters based on 'Annual Income (k$)' and 'Spending Score (1-100)' and adding the 'Spending and Income Cluster' column. Plotting the elbow method for bivariate clustering and visualizing the cluster centers on a scatter plot. Displaying a normalized cross-tabulation between 'Spending and Income Cluster' and 'Gender'.

Multivariate Clustering:

Performing multivariate clustering by creating dummy variables, scaling selected columns, and applying KMeans clustering. Plotting the elbow method for multivariate clustering.

Result Saving:

Saving the modified DataFrame with cluster information to a CSV file named "Result.csv". Saving the multivariate clustering plot as an image file ("Multivariate_figure.png").
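The gender-based grouping and the normalized cross-tabulation described above can be sketched roughly as follows. This is only an illustration: the rows are made up rather than taken from Mall_Customers.csv, and the cluster labels are hard-coded as a hypothetical stand-in for the KMeans output, so that the example depends on pandas alone.

```python
import pandas as pd

# Synthetic stand-in for Mall_Customers.csv (all values are invented).
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Age": [19, 21, 35, 40, 52, 60],
    "Annual Income (k$)": [15, 16, 60, 62, 110, 115],
    "Spending Score (1-100)": [39, 81, 50, 45, 20, 90],
    # Hypothetical labels; in the notebook these would come from KMeans.
    "Spending and Income Cluster": [0, 1, 2, 2, 3, 4],
})

numeric_cols = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]

# Mean of the selected columns per gender.
gender_means = df.groupby("Gender")[numeric_cols].mean()

# Share of each gender within every cluster (each row sums to 1).
cluster_by_gender = pd.crosstab(
    df["Spending and Income Cluster"], df["Gender"], normalize="index"
)

print(gender_means)
print(cluster_by_gender)
```

With normalize="index", each row of the cross-tabulation is a proportion within one cluster, which makes clusters of different sizes comparable.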
Dataframe of Significant Stems.csv
psycharchives.org
Updated Oct 8, 2019
Cite
(2019). Dataframe of Significant Stems.csv [Dataset]. https://www.psycharchives.org/en/item/84d5c4b2-579d-48a0-8d4e-f02f2ae99192
Explore at:
Dataset updated
Oct 8, 2019
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Systematic reviews are the method of choice to synthesize research evidence. To identify main topics (so-called hot spots) relevant to large corpora of original publications in need of a synthesis, one must address the “three Vs” of big data (volume, velocity, and variety), especially in loosely defined or fragmented disciplines. For this purpose, text mining and predictive modeling are very helpful. Thus, we applied these methods to a compilation of documents related to digitalization in aesthetic, arts, and cultural education, as a prototypical, loosely defined, fragmented discipline, and particularly to quantitative research within it (QRD-ACE). By broadly querying the abstract and citation database Scopus with terms indicative of QRD-ACE, we identified a corpus of N = 55,553 publications for the years 2013–2017. As the result of an iterative approach of text mining, priority screening, and predictive modeling, we identified n = 8,304 potentially relevant publications of which n = 1,666 were included after priority screening. Analysis of the subject distribution of the included publications revealed video games as a first hot spot of QRD-ACE. Topic modeling resulted in aesthetics and cultural activities on social media as a second hot spot, related to 4 of k = 8 identified topics. This way, we were able to identify current hot spots of QRD-ACE by screening less than 15% of the corpus. We discuss implications for harnessing text mining, predictive modeling, and priority screening in future research syntheses and avenues for future original research on QRD-ACE.

Dataset for: Christ, A., Penthin, M., & Kröner, S. (2019). Big Data and Digital Aesthetic, Arts, and Cultural Education: Hot Spots of Current Quantitative Research. Social Science Computer Review, 089443931988845. https://doi.org/10.1177/0894439319888455
Merge number of excel file,convert into csv file
kaggle.com
zip
Updated Mar 30, 2024
Cite
Aashirvad pandey (2024). Merge number of excel file,convert into csv file [Dataset]. https://www.kaggle.com/datasets/aashirvadpandey/merge-number-of-excel-fileconvert-into-csv-file
Explore at:
Available download formats: zip (6731 bytes)
Dataset updated
Mar 30, 2024
Authors
Aashirvad pandey
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Project Description:

Title: Pandas Data Manipulation and File Conversion

Overview: This project aims to demonstrate the basic functionalities of Pandas, a powerful data manipulation library in Python. In this project, we will create a DataFrame, perform some data manipulation operations using Pandas, and then convert the DataFrame into both Excel and CSV formats.

Key Objectives:

DataFrame Creation: Utilize Pandas to create a DataFrame with sample data.

Data Manipulation: Perform basic data manipulation tasks such as adding columns, filtering data, and performing calculations.

File Conversion: Convert the DataFrame into Excel (.xlsx) and CSV (.csv) file formats.

Tools and Libraries Used:

Python

Pandas

Project Implementation:

DataFrame Creation:

Import the Pandas library.

Create a DataFrame using either a dictionary, a list of dictionaries, or by reading data from an external source like a CSV file.

Populate the DataFrame with sample data representing various data types (e.g., integer, float, string, datetime).

Data Manipulation:

Add new columns to the DataFrame representing derived data or computations based on existing columns.

Filter the DataFrame to include only specific rows based on certain conditions.

Perform basic calculations or transformations on the data, such as aggregation functions or arithmetic operations.

File Conversion:

Utilize Pandas to convert the DataFrame into an Excel (.xlsx) file using the to_excel() function.

Convert the DataFrame into a CSV (.csv) file using the to_csv() function.

Save the generated files to the local file system for further analysis or sharing.

Expected Outcome:

Upon completion of this project, you will have gained a fundamental understanding of how to work with Pandas DataFrames, perform basic data manipulation tasks, and convert DataFrames into different file formats. This knowledge will be valuable for data analysis, preprocessing, and data export tasks in various data science and analytics projects.

Conclusion:

The Pandas library offers powerful tools for data manipulation and file conversion in Python. By completing this project, you will have acquired essential skills that are widely applicable in the field of data science and analytics. You can further extend this project by exploring more advanced Pandas functionalities or integrating it into larger data processing pipelines.

In this dataset, we take several sets of data, make each one a DataFrame, save them in a single Excel file under different sheet names, and then convert that Excel file into a CSV file.
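The workflow described above (several sheets combined, a derived column, a filter, an aggregation, and a CSV export) might look roughly like the sketch below. The sheet names, columns, and values are hypothetical. To stay self-contained, the example keeps the sheets in a dictionary instead of reading an .xlsx file and writes the CSV to an in-memory buffer; with a real workbook, the dictionary could instead come from pd.read_excel(path, sheet_name=None), which needs an Excel engine such as openpyxl.

```python
import io

import pandas as pd

# Hypothetical sheets, standing in for the worksheets of one Excel workbook.
sheets = {
    "january": pd.DataFrame({"item": ["pen", "book"], "price": [1.5, 12.0], "qty": [10, 2]}),
    "february": pd.DataFrame({"item": ["pen", "lamp"], "price": [1.5, 30.0], "qty": [4, 1]}),
}

# Stack all sheets into one DataFrame, keeping the sheet name as a column.
merged = (
    pd.concat(sheets, names=["sheet", "row"])
    .reset_index(level="sheet")
    .reset_index(drop=True)
)

# Derived column, row filter, and a simple aggregation.
merged["total"] = merged["price"] * merged["qty"]
expensive = merged[merged["total"] > 10]
per_sheet = merged.groupby("sheet")["total"].sum()

# Export to CSV (in memory here; pass a file path to write to disk).
buffer = io.StringIO()
merged.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```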
AI4Code Train Dataframe
kaggle.com
zip
Updated May 12, 2022
Cite
Darien Schettler (2022). AI4Code Train Dataframe [Dataset]. https://www.kaggle.com/datasets/dschettler8845/ai4code-train-dataframe
Explore at:
Available download formats: zip (622120487 bytes)
Dataset updated
May 12, 2022
Authors
Darien Schettler
Description
[EDIT/UPDATE]

There are a few important updates.

When SAVING the pd.DataFrame as a .csv, the following command should be used to avoid improper interpretation of newline character(s).

import csv

train_df.to_csv(
    "train.csv",
    index=False,
    encoding="utf-8",
    quoting=csv.QUOTE_NONNUMERIC,  # <== THIS IS REQUIRED
)

When LOADING the .csv as a pd.DataFrame, the following command must be used to avoid misinterpretation of NaN-like strings (null, nan, ...) as NaN values.

import pandas as pd

train_df = pd.read_csv(
    "/kaggle/input/ai4code-train-dataframe/train.csv",
    keep_default_na=False,  # <== THIS IS REQUIRED
)
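To see why both settings matter, here is a small round trip on made-up data. The column names and cell contents are assumptions chosen for illustration, and an in-memory buffer replaces the train.csv file. One cell contains a newline and two cells contain the literal strings "null" and "nan".

```python
import csv
import io

import pandas as pd

# Made-up rows: one multi-line code cell and two cells whose text looks like NaN.
train_df = pd.DataFrame({
    "cell_id": [1, 2, 3],
    "source": ["x = 1\nprint(x)", "null", "nan"],
})

# Save: quote every non-numeric field so embedded newlines stay inside one field.
buffer = io.StringIO()
train_df.to_csv(buffer, index=False, quoting=csv.QUOTE_NONNUMERIC)

# Load: keep_default_na=False leaves strings such as "null" and "nan" untouched.
buffer.seek(0)
loaded = pd.read_csv(buffer, keep_default_na=False)

print(loaded["source"].tolist())
```

Without keep_default_na=False, pandas applies its default list of missing-value markers, which includes "null" and "nan", so such cells would no longer come back as the original text.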
Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...
zenodo.org
data.europa.eu
zip
Updated Oct 20, 2022
Cite
Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. http://doi.org/10.5281/zenodo.6832242
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6832242
Dataset updated
Oct 20, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
LifeSnaps Dataset Documentation

Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

Data Import: Reading CSV

For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.

Data Import: Setting up a MongoDB (Recommended)

To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.

To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have MongoDB Database Tools installed from here.

For the Fitbit data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c fitbit

For the SEMA data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c sema

For surveys data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c surveys

If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.

Data Availability

The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:

{ _id:
property_based_matching
huggingface.co
Cite
Yuanhao Qu, property_based_matching [Dataset]. https://huggingface.co/datasets/yhqu/property_based_matching
Explore at:
Authors
Yuanhao Qu
Description
Property Based Matching Dataset

This dataset is part of the Deep Principle Bench collection.

Files

property_based_matching.csv: Main dataset file

Usage

import pandas as pd
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("yhqu/property_based_matching")

# Or load directly as pandas DataFrame
df = pd.read_csv("hf://datasets/yhqu/property_based_matching/property_based_matching.csv")

Citation

Please cite this work if… See the full description on the dataset page: https://huggingface.co/datasets/yhqu/property_based_matching.
Pandas Example
kaggle.com
zip
Updated Jan 24, 2021
Cite
Francisco Marques (2021). Pandas Example [Dataset]. https://www.kaggle.com/franciscomcm/pandas-example
Explore at:
Available download formats: zip (1342 bytes)
Dataset updated
Jan 24, 2021
Authors
Francisco Marques
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Pandas is a powerful package to process tabular data. This dataset provides the absolute minimum amount of data needed to start exploring the capabilities of this package.

Content

This dataset contains very basic examples to explore handy operations with Pandas DataFrames. There are 3 CSV files in the dataset:
- thermometer_A.csv and thermometer_B.csv contain synthetic data representing temperature measurements over a full day by two devices.
- fertiliser_plant_growth.csv contains synthetic data representing the growth of 3 groups of plants (control, fertilizer A and fertilizer B).
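As an example of the kind of operation these files are suited for, the sketch below compares the three plant groups with a groupby. The column names (group, height_cm) and all values are assumptions made for illustration; the actual layout of fertiliser_plant_growth.csv may differ.

```python
import io

import pandas as pd

# Invented rows with hypothetical column names, standing in for
# fertiliser_plant_growth.csv.
growth_csv = io.StringIO(
    "group,height_cm\n"
    "control,10.0\n"
    "control,12.0\n"
    "fertilizer_A,15.0\n"
    "fertilizer_A,17.0\n"
    "fertilizer_B,20.0\n"
    "fertilizer_B,22.0\n"
)
growth = pd.read_csv(growth_csv)

# Average growth per treatment group.
mean_height = growth.groupby("group")["height_cm"].mean()
print(mean_height)
```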

Acknowledgments

Banner image by Sid Balachandran on Unsplash
Longitudinal corpus of privacy policies
data.niaid.nih.gov
Updated Dec 12, 2022
Cite
Wagner, Isabel (2022). Longitudinal corpus of privacy policies [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5841138
Explore at:
Dataset updated
Dec 12, 2022
Dataset provided by
University of Basel
Authors
Wagner, Isabel
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a corpus of 56,416 unique privacy policy texts spanning the years 1996-2021.

policy-texts.zip contains a directory of text files with the policy texts. File names are the hashes of the policy text.

policy-metadata.zip contains two CSV files (can be imported into a pandas dataframe) with policy metadata including readability measures for each policy text.

labeled-policies.zip contains CSV files with content labels for each policy. Labeling was done using a BERT classifier.

Details on the methodology can be found in the accompanying paper.
descriptor_prediction
huggingface.co
Share
Cite
Yuanhao Qu, descriptor_prediction [Dataset]. https://huggingface.co/datasets/yhqu/descriptor_prediction
Explore at:
Authors
Yuanhao Qu
Description
Descriptor Prediction Dataset

This dataset is part of the Deep Principle Bench collection.

Files

descriptor_prediction.csv: Main dataset file

Usage

import pandas as pd
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("yhqu/descriptor_prediction")

# Or load directly as pandas DataFrame
df = pd.read_csv("hf://datasets/yhqu/descriptor_prediction/descriptor_prediction.csv")

Citation

Please cite this work if you use… See the full description on the dataset page: https://huggingface.co/datasets/yhqu/descriptor_prediction.

The Device Activity Report with Complete Knowledge (DARCK) for NILM

zenodo.org

bin, xz

Updated Sep 19, 2025

Cite

Anonymous Anonymous (2025). The Device Activity Report with Complete Knowledge (DARCK) for NILM [Dataset]. http://doi.org/10.5281/zenodo.17159850

Explore at:

Available download formats: bin, xz

Unique identifier

https://doi.org/10.5281/zenodo.17159850

Dataset updated

Sep 19, 2025

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Anonymous Anonymous

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

1. Abstract

This dataset contains aggregated and sub-metered power consumption data from a two-person apartment in Germany. Data was collected from March 5 to September 4, 2025, spanning 6 months. It includes an aggregate reading from a main smart meter and individual readings from 40 smart plugs, smart relays, and smart power meters monitoring various appliances.

2. Dataset Overview

Apartment: Two-person apartment, approx. 58m², located in Aachen, Germany.
Aggregate Meter: eBZ DD3
Sub-meters: 31 Shelly Plus Plug S, 6 Shelly Plus 1PM, 3 Shelly Plus PM Mini Gen3
Sampling Rate: 1 Hz
Measured Quantity: Active Power
Unit of Measurement: Watt
Duration: 6 months
Format: Single CSV file (`DARCK.csv`)
Structure: Timestamped rows with columns for the aggregate meter and each sub-metered appliance.
Completeness: The main power meter has a completeness of 99.3%. Missing values were linearly interpolated.

3. Download and Usage

The dataset can be downloaded here: https://doi.org/10.5281/zenodo.17159850

As it contains longer off periods with zeros, the CSV file is nicely compressible.

To extract it use: xz -d DARCK.csv.xz.
The compression leads to a 97% smaller file size (From 4GB to 90.9MB).

To use the dataset in Python, you can, e.g., load the CSV file into a pandas DataFrame.

import pandas as pd

df = pd.read_csv("DARCK.csv", parse_dates=["time"])

4. Measurement Setup

The main meter was monitored using an infrared reading head magnetically attached to the infrared interface of the meter. An ESP8266 flashed with Tasmota decodes the binary datagrams and forwards the Watt readings to the MQTT broker. Individual appliances were monitored using a combination of Shelly Plugs (for outlets), Shelly 1PM (for wired-in devices like ceiling lights), and Shelly PM Mini (for each of the three phases of the oven). All devices reported to a central InfluxDB database via Home Assistant running in docker on a Dell OptiPlex 3020M.

5. File Format (`DARCK.csv`)

The dataset is provided as a single comma-separated value (CSV) file.

The first row is a header containing the column names.
All power values are rounded to the first decimal place.
There are no missing values in the final dataset.
Each row represents 1 second, from start of measuring in March until the end in September.

Column Descriptions

Column Name	Data Type	Unit	Description
`time`	datetime	-	Timestamp for the reading in `YYYY-MM-DD HH:MM:SS`
`main`	float	Watt	Total aggregate power consumption for the apartment, measured at the main electrical panel.
`[appliance_name]`	float	Watt	Power consumption of an individual appliance (e.g., `lightbathroom`, `fridge`, `sherlockpc`). See Section 8 for a full list.
Aggregate Columns
`aggr_chargers`	float	Watt	The sum of `sherlockcharger`, `sherlocklaptop`, `watsoncharger`, `watsonlaptop`, `watsonipadcharger`, `kitchencharger`.
`aggr_stoveplates`	float	Watt	The sum of `stoveplatel1` and `stoveplatel2`.
`aggr_lights`	float	Watt	The sum of `lightbathroom`, `lighthallway`, `lightsherlock`, `lightkitchen`, `lightlivingroom`, `lightwatson`, `lightstoreroom`, `fcob`, `sherlockalarmclocklight`, `sherlockfloorlamphue`, `sherlockledstrip`, `livingfloorlamphue`, `sherlockglobe`, `watsonfloorlamp`, `watsondesklamp` and `watsonledmap`.
Analysis Columns
`inaccuracy`	float	Watt	As no electrical device bypasses a power meter, the true inaccuracy can be assessed. It is the absolute error between the sum of individual measurements and the mains reading. A 30W offset is applied to the sum since the measurement devices themselves draw power which is otherwise unaccounted for.
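The `inaccuracy` column described above could be recomputed along the following lines. This is a sketch on three made-up rows with only three appliance columns; in the real file the sum would run over all individual appliance columns. Adding the 30 W offset to the sub-meter sum (rather than subtracting it from `main`) is an assumption based on the wording of the description.

```python
import pandas as pd

# Three invented one-second rows; the real file has many more appliance columns.
df = pd.DataFrame({
    "main": [150.0, 400.0, 95.0],
    "fridge": [60.0, 60.0, 0.0],
    "lightbathroom": [0.0, 40.0, 0.0],
    "sherlockpc": [55.0, 280.0, 70.0],
})

# Assumed handling of the 30 W drawn by the measurement devices themselves.
METER_OVERHEAD_W = 30.0
appliance_cols = ["fridge", "lightbathroom", "sherlockpc"]

# Sum of individual measurements plus the offset, compared with the mains reading.
submeter_sum = df[appliance_cols].sum(axis=1) + METER_OVERHEAD_W
inaccuracy = (submeter_sum - df["main"]).abs()
print(inaccuracy.tolist())
```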

6. Data Postprocessing Pipeline

The final dataset was generated from two raw data sources (meter.csv and shellies.csv) using a comprehensive postprocessing pipeline.

6.1. Main Meter (`main`) Postprocessing

The aggregate power data required several cleaning steps to ensure accuracy.

Outlier Removal: Readings below 10W or above 10,000W were removed (merely 3 occurrences).
Timestamp Burst Correction: The source data contained bursts of delayed readings. A custom algorithm was used to identify these bursts (large time gap followed by rapid readings) and back-fill the timestamps to create an evenly spaced time series.
Alignment & Interpolation: The smart meter pushes a new value via infrared every second. To align the readings to whole seconds, the series was resampled to a 1-second frequency by taking the mean of all readings within each second (99.5% of seconds contain only one value). Any resulting gaps (0.7% outage ratio) were filled using linear interpolation.
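A minimal pandas sketch of the outlier removal, alignment, and interpolation steps is shown below, using five invented readings. The custom timestamp burst correction is not reproduced here, and the variable names are hypothetical.

```python
import pandas as pd

# Invented raw readings with sub-second timestamps; 5.0 W is an outlier (< 10 W)
# and there is no reading at all during second 3.
raw = pd.Series(
    [300.0, 310.0, 5.0, 320.0, 360.0],
    index=pd.to_datetime([
        "2025-03-05 00:00:00.2",
        "2025-03-05 00:00:00.8",
        "2025-03-05 00:00:01.5",
        "2025-03-05 00:00:02.1",
        "2025-03-05 00:00:04.3",
    ]),
)

# Outlier removal: keep readings between 10 W and 10,000 W.
cleaned = raw[(raw >= 10) & (raw <= 10_000)]

# Alignment: mean of all readings within each whole second (empty seconds become NaN).
per_second = cleaned.resample("1s").mean()

# Interpolation: fill the remaining gaps linearly.
main = per_second.interpolate(method="linear")
print(main)
```

Second 0 holds two readings and is averaged, while seconds 1 and 3 have no valid reading and are filled from their neighbours.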

6.2. Sub-metered Devices (`shellies`) Postprocessing

The Shelly devices are not prone to the same burst issue as the ESP8266 is. They push a new reading at every change in power drawn. If no power change is observed or the one observed is too small (less than a few Watt), the reading is pushed once a minute, together with a heartbeat. When a device turns on or off, intermediate power values are published, which leads to sub-second values that need to be handled.

Grouping: Data was grouped by the unique device identifier.
Resampling & Filling: The data for each device was resampled to a 1-second frequency using .resample('1s').last().ffill().
This method was chosen for two reasons: first, it captures the last known state of the device within each second, handling rapid on/off events; second, it forward-fills the last state across periods with no new data, modeling that the device's consumption remained constant until a new reading was sent.
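The effect of .resample('1s').last().ffill() can be seen on a handful of invented readings for a single device. The timestamps and power values below are made up for illustration.

```python
import pandas as pd

# Invented event-driven readings for one device: two values arrive within
# second 2 (switch-on transient), and nothing arrives in seconds 1, 3 and 4.
events = pd.Series(
    [0.0, 35.0, 1200.0, 1180.0],
    index=pd.to_datetime([
        "2025-03-05 00:00:00.0",
        "2025-03-05 00:00:02.1",
        "2025-03-05 00:00:02.6",
        "2025-03-05 00:00:05.0",
    ]),
)

# last(): keep the final state reported within each second.
# ffill(): carry that state forward through seconds without any reading.
per_second = events.resample("1s").last().ffill()
print(per_second)
```

The intermediate 35 W value is discarded in favour of the last value in that second, and the 1200 W state is held until the next reading arrives.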

6.3. Merging and Finalization

Merge: The cleaned main meter and all sub-metered device dataframes were merged into a single dataframe on the time index.
Final Fill: Any remaining NaN values (e.g., from before a device was installed) were filled with 0.0, assuming zero consumption.
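The merge and final fill could be sketched as follows. The two series are invented, the column name `fridge` is borrowed from the appliance examples, and pd.concat along the columns is just one way to join on the time index; the original pipeline may have used a different call.

```python
import pandas as pd

idx = pd.date_range("2025-03-05 00:00:00", periods=4, freq="1s")

# Invented per-second series; the fridge plug is "installed" two seconds late.
main = pd.Series([100.0, 120.0, 110.0, 105.0], index=idx, name="main")
fridge = pd.Series([60.0, 60.0], index=idx[2:], name="fridge")

# Merge on the time index; timestamps missing for a device become NaN.
merged = pd.concat([main, fridge], axis=1)

# Final fill: treat the time before installation as zero consumption.
merged = merged.fillna(0.0)
print(merged)
```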

7. Manual Corrections and Known Data Issues

During analysis, two significant unmetered load events were identified and manually corrected to improve the accuracy of the aggregate reading. The error column (inaccuracy) was recalculated after these corrections.

March 10th - Unmetered Bulb: An unmetered 107W bulb was active. It was subtracted from the main reading as if it never happened.
May 31st - Unmetered Air Pump: An unmetered 101W pump for an air mattress was used directly in an outlet with no intermediary plug and hence manually added to the respective plug.

8. Appliance Details and Multipurpose Plugs

The following table lists the column names with an explanation where needed. As Watson moved at the beginning of June, some metering plugs changed their appliance.

oldIT2modIT
huggingface.co
Share
Cite
Massimo Romano, oldIT2modIT [Dataset]. https://huggingface.co/datasets/cybernetic-m/oldIT2modIT
Explore at:
Authors
Massimo Romano
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Download the dataset

At the moment, to download the dataset you should use a Pandas DataFrame:

import pandas as pd
df = pd.read_csv("https://huggingface.co/datasets/cybernetic-m/oldIT2modIT/resolve/main/oldIT2modIT_dataset.csv")

You can visualize the dataset with:

df.head()

To convert it into a Hugging Face dataset:

from datasets import Dataset
dataset = Dataset.from_pandas(df)

Dataset Description

This is an Italian dataset formed by 200 old (ancient) Italian sentences and… See the full description on the dataset page: https://huggingface.co/datasets/cybernetic-m/oldIT2modIT.
Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic...
zenodo.org
data.niaid.nih.gov
bin, csv, zip
Updated Dec 24, 2022
Cite
Alexander R. Hartloper; Selimcan Ozden; Albano de Castro e Sousa; Dimitrios G. Lignos (2022). Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials [Dataset]. http://doi.org/10.5281/zenodo.6965147
Explore at:
Available download formats: bin, zip, csv
Unique identifier
https://doi.org/10.5281/zenodo.6965147
Dataset updated
Dec 24, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Alexander R. Hartloper; Selimcan Ozden; Albano de Castro e Sousa; Dimitrios G. Lignos
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials

Background

This dataset contains data from monotonic and cyclic loading experiments on structural metallic materials. The materials are primarily structural steels and one iron-based shape memory alloy is also included. Summary files are included that provide an overview of the database and data from the individual experiments is also included.

The files included in the database are outlined below and the format of the files is briefly described. Additional information regarding the formatting can be found through the post-processing library (https://github.com/ahartloper/rlmtp/tree/master/protocols).

Usage

The data is licensed through the Creative Commons Attribution 4.0 International.

If you have used our data and are publishing your work, we ask that you please reference both:

this database through its DOI, and

any publication that is associated with the experiments. See the Overall_Summary and Database_References files for the associated publication references.

Included Files

Overall_Summary_2022-08-25_v1-0-0.csv: summarises the specimen information for all experiments in the database.

Summarized_Mechanical_Props_Campaign_2022-08-25_v1-0-0.csv: summarises the average initial yield stress and average initial elastic modulus per campaign.

Unreduced_Data-#_v1-0-0.zip: contain the original (not downsampled) data

Where # is one of: 1, 2, 3, 4, 5, 6. The unreduced data is broken into separate archives because of upload limitations to Zenodo. Together they provide all the experimental data.

We recommend you un-zip all the folders and place them in one "Unreduced_Data" directory similar to the "Clean_Data"

The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.

There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the unreduced data.

The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.

Clean_Data_v1-0-0.zip: contains all the downsampled data

The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.

There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the clean data.

The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.

Database_References_v1-0-0.bib

Contains a bibtex reference for many of the experiments in the database. Corresponds to the "citekey" entry in the summary files.

File Format: Downsampled Data

These are the "LP_

The header of the first column is empty: the first column corresponds to the index of the sample point in the original (unreduced) data

Time[s]: time in seconds since the start of the test

e_true: true strain

Sigma_true: true stress in MPa

(optional) Temperature[C]: the surface temperature in degC

These data files can be easily loaded using the pandas library in Python through:

import pandas
data = pandas.read_csv(data_file, index_col=0)

The data is formatted so it can be used directly in RESSPyLab (https://github.com/AlbanoCastroSousa/RESSPyLab). Note that the column names "e_true" and "Sigma_true" were kept for backwards compatibility reasons with RESSPyLab.
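The loading call above can be tried on a tiny in-memory file that mimics the downsampled layout (empty first header, then Time[s], e_true, Sigma_true). The three rows are invented and only illustrate how index_col=0 turns the unnamed first column into the index of the sample point in the unreduced data.

```python
import io

import pandas as pd

# Invented rows in the downsampled layout; the first header cell is empty.
data_file = io.StringIO(
    ",Time[s],e_true,Sigma_true\n"
    "0,0.0,0.000,0.0\n"
    "12,1.5,0.001,210.0\n"
    "40,4.0,0.002,355.0\n"
)

# index_col=0 uses the unnamed first column (original sample index) as the index.
data = pd.read_csv(data_file, index_col=0)

# Example query: largest true stress (MPa) in this test.
peak_stress = data["Sigma_true"].max()
print(data)
print(peak_stress)
```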

File Format: Unreduced Data

These are the "LP_

The first column is the index of each data point

S/No: sample number recorded by the DAQ

System Date: Date and time of sample

Time[s]: time in seconds since the start of the test

C_1_Force[kN]: load cell force

C_1_Déform1[mm]: extensometer displacement

C_1_Déplacement[mm]: cross-head displacement

Eng_Stress[MPa]: engineering stress

Eng_Strain[]: engineering strain

e_true: true strain

Sigma_true: true stress in MPa

(optional) Temperature[C]: specimen surface temperature in degC

The data can be loaded and used similarly to the downsampled data.

File Format: Overall_Summary

The overall summary file provides data on all the test specimens in the database. The columns include:

hidden_index: internal reference ID

grade: material grade

spec: specifications for the material

source: base material for the test specimen

id: internal name for the specimen

lp: load protocol

size: type of specimen (M8, M12, M20)

gage_length_mm_: unreduced section length in mm

avg_reduced_dia_mm_: average measured diameter for the reduced section in mm

avg_fractured_dia_top_mm_: average measured diameter of the top fracture surface in mm

avg_fractured_dia_bot_mm_: average measured diameter of the bottom fracture surface in mm

fy_n_mpa_: nominal yield stress

fu_n_mpa_: nominal ultimate stress

t_a_deg_c_: ambient temperature in degC

date: date of test

investigator: person(s) who conducted the test

location: laboratory where test was conducted

machine: setup used to conduct test

pid_force_k_p, pid_force_t_i, pid_force_t_d: PID parameters for force control

pid_disp_k_p, pid_disp_t_i, pid_disp_t_d: PID parameters for displacement control

pid_extenso_k_p, pid_extenso_t_i, pid_extenso_t_d: PID parameters for extensometer control

citekey: reference corresponding to the Database_References.bib file

yield_stress_mpa_: computed yield stress in MPa

elastic_modulus_mpa_: computed elastic modulus in MPa

fracture_strain: computed average true strain across the fracture surface

c,si,mn,p,s,n,cu,mo,ni,cr,v,nb,ti,al,b,zr,sn,ca,h,fe: chemical compositions in units of %mass

file: file name of corresponding clean (downsampled) stress-strain data

File Format: Summarized_Mechanical_Props_Campaign

Meant to be loaded in Python as a pandas DataFrame with multi-indexing, e.g.,

tab1 = pd.read_csv('Summarized_Mechanical_Props_Campaign_' + date + version + '.csv', index_col=[0, 1, 2, 3], skipinitialspace=True, header=[0, 1], keep_default_na=False, na_values='')

citekey: reference in "Campaign_References.bib".

Grade: material grade.

Spec.: specifications (e.g., J2+N).

Yield Stress [MPa]: initial yield stress in MPa

size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign

Elastic Modulus [MPa]: initial elastic modulus in MPa

size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign
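
To show what the multi-indexed layout looks like once loaded, the sketch below builds a small stand-in table in memory, writes it to CSV, and reads it back with the same `read_csv` arguments as above. All values are hypothetical, and the fourth index level (`lp`) is an assumption, since only three index names are described here.

```python
import io
import pandas as pd

# Hypothetical stand-in for the campaign summary; "lp" is an assumed 4th index level.
index = pd.MultiIndex.from_tuples(
    [("ref_a", "S355", "J2+N", "LP1"), ("ref_b", "S460", "NL", "LP1")],
    names=["citekey", "Grade", "Spec.", "lp"],
)
columns = pd.MultiIndex.from_product(
    [["Yield Stress [MPa]", "Elastic Modulus [MPa]"],
     ["size", "count", "mean", "coefvar"]]
)
values = [
    [3, 3, 360.0, 0.02, 3, 3, 205000.0, 0.01],
    [2, 2, 480.0, 0.03, 2, 2, 210000.0, 0.02],
]
demo = pd.DataFrame(values, index=index, columns=columns)

# Round-trip through CSV text so the read_csv call mirrors the one shown above.
buffer = io.StringIO()
demo.to_csv(buffer)
buffer.seek(0)
tab1 = pd.read_csv(buffer, index_col=[0, 1, 2, 3], skipinitialspace=True,
                   header=[0, 1], keep_default_na=False, na_values='')

# Select one statistic of one property with a (property, statistic) tuple.
mean_yield = tab1[("Yield Stress [MPa]", "mean")]
# Select all rows of one grade via the named index level.
s355_rows = tab1.xs("S355", level="Grade")
print(mean_yield)
```

Column selection takes a two-part key because the header has two rows: the property name and the statistic.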

Caveats

The specimens in the following directories were tested before the protocol was established. Therefore, only the true stress-strain data are available for each:

A500

A992_Gr50

BCP325

BCR295

HYP400

S460NL

S690QL/25mm

S355J2_Plates/S355J2_N_25mm and S355J2_N_50mm
Excel_file_inluding_measured_values
dataverse.harvard.edu
Updated Sep 30, 2025
Cite
Elham Akbari (2025). Excel_file_inluding_measured_values [Dataset]. http://doi.org/10.7910/DVN/1BNFJ5
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/1BNFJ5
Dataset updated
Sep 30, 2025
Dataset provided by
Harvard Dataverse
Authors
Elham Akbari
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
The Excel files contain the measured values (mostly particle properties extracted from microscopic images) in .csv format. The files were exported from pandas DataFrames and can be read back into pandas. Files with intensity values contain the intensities from the maximum z-stack projection of fluorescent micrographs of the particles inside the DLD device.
National Water Model RouteLinks CSV
beta.hydroshare.org
hydroshare.org
+2 more
zip
Updated Oct 15, 2021
Cite
Jason A Regina; Austin Raney (2021). National Water Model RouteLinks CSV [Dataset]. http://doi.org/10.4211/hs.d154f19f762c4ee9b74be55f504325d3
Explore at:
Available download formats: zip (1.1 MB)
Unique identifier
https://doi.org/10.4211/hs.d154f19f762c4ee9b74be55f504325d3
Dataset updated
Oct 15, 2021
Dataset provided by
HydroShare
Authors
Jason A Regina; Austin Raney
License
https://mit-license.org/
Time period covered
Apr 12, 2019 - Oct 14, 2021
Description
This resource contains "RouteLink" files for version 2.1.6 of the National Water Model, which are used to associate feature identifiers for computational reaches with relevant metadata. These data are important for comparing NWM feature data to USGS streamflow and lake observations. The original RouteLink files are in NetCDF format and available here: https://www.nco.ncep.noaa.gov/pmb/codes/nwprod

This resource includes the files in a human-friendlier CSV format for easier use, and a machine-friendlier file in HDF5 format which contains a single pandas.DataFrame. The scripts and supporting utilities are also included for users who wish to rebuild these files. Source code is hosted here: https://github.com/jarq6c/NWM_RouteLinks
GENEActiv accelerometer file related to the #120 OxWearables / stepcount...
data.niaid.nih.gov
zenodo.org
Updated Nov 25, 2024
Cite
Wattelez, Guillaume (2024). GENEActiv accelerometer file related to the #120 OxWearables / stepcount issue [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11557420
Explore at:
Dataset updated
Nov 25, 2024
Dataset provided by
University of New Caledonia
Authors
Wattelez, Guillaume
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
An example of a .bin file that raises an IndexError when processed.

See issue #120 of OxWearables / stepcount for more details.

The .csv files are 1-second epoch conversions from the .bin file and contain time, x, y, z columns. The conversion was done by:

reading the .bin with the GENEAread R package.

keeping only the time, x, y and z columns.

saving the data.frame into a .csv file.

The only difference between the .csv files is the column format used for the time column before saving:

time column in XXXXXX_....csv had a string class

time column in XXXXXT....csv had a "POSIXct" "POSIXt" class
Exploring the Relationship between Lipid Profile Changes, Growth and...
zenodo.org
Updated Nov 3, 2023
Cite
Saúl Fernandes; Diana Ilyaskina (2023). Exploring the Relationship between Lipid Profile Changes, Growth and Reproduction in Folsomia candida Exposed to Teflubenzuron Over Time [Dataset]. http://doi.org/10.5281/zenodo.10069317
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.10069317
Dataset updated
Nov 3, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Saúl Fernandes; Diana Ilyaskina
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This submission provides the csv data files from a comprehensive study investigating the effects of sublethal concentrations of the insecticide teflubenzuron on the survival, growth, reproduction, and lipid changes of the Collembola Folsomia candida over different exposure periods.
The dataset files are provided in CSV (comma-separated values) format:
Survival_Growth_Reproduction_FolsomiaCandida_RawData.csv
Main_Lipid_Class_Log10_RawData.csv
Lipid_Categories_RawData.csv
Bioaccumulation_SoilQuantification_RawData.csv
Description of the files
The csv file Survival_Growth_Reproduction_FolsomiaCandida_RawData.csv contains the dataframe in vertical format with the data for survival of Folsomia candida, changes in biomass and reproduction.
The csv file Main_Lipid_Class_Log10_RawData provides the dataframe in horizontal format with the log10 transformed total lipid content of the main lipid classes in Folsomia candida. These data were used to produce Figure 5 of the manuscript. Full names of the lipid abbreviations are provided in the supplementary information of the manuscript.
The csv file Lipid_Categories_RawData provides the dataframe with lipid categories in Folsomia candida. These data were not used in the data analysis described in the manuscript.
The csv file Bioaccumulation_SoilQuantification_RawData.csv contains the dataframe in vertical format with the data for quantification of the insecticide in the soil and in the animals, and the calculation of the bioaccumulation factor.
Variables in the files:
File 1:
sample: sample unique ID
days: day of sampling
dose: dose of teflubenzuron (insecticide) in soil (mg a.s. kg soil -1)
solvent: solvent used in the soil (acetone or water)
age: age of the animals Folsomia candida at the day of sampling
survival: number of surviving adults of Folsomia candida
total.biomass(mg): total biomass in mg of the pool of animals in each sample
biomass.individual(ng): "total.biomass(mg)" divided by the "survival" and converted to ng.
offspring: number of offspring produced in each sample (NA for samples where this number could not be determined)
Files 2 and 3:
sample: sample unique ID
days: day of sampling
dose: dose of teflubenzuron (insecticide) in soil (mg a.s. kg soil -1)
File 4:
sample: sample unique ID
days: day of sampling
dose.nominal(ng/g): nominal dose of teflubenzuron (insecticide) in soil (mg a.s. kg soil -1)
dose.measured(ng/g): measured dose of teflubenzuron (insecticide) in soil (mg a.s. kg soil -1)
biomass.total.wet(g): "total.biomass(mg)" divided by the "survival" and converted to ng.
number.animals: number of surviving adults of Folsomia candida in each sample
biomass.individual.dry(g): "total.biomass(mg)" divided by the "number.animals" and converted to ng.
measured.insecticide.animals(ng): measured amount of teflubenzuron (insecticide) in the pool of animals (mg a.s. kg soil -1)
accumulation.insecticide(ng/g of dry body weight): "measured.insecticide.animals(ng)" divided by "biomass.total.wet(g)"
baf: "accumulation.insecticide" divided by "dose.measured(ng/g)"
[NA stands for samples lost/ not measured]
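
The two derived File 4 columns can be reproduced from their stated definitions. The pandas sketch below follows those definitions literally (accumulation = measured insecticide in the animals divided by "biomass.total.wet(g)", baf = accumulation divided by the measured dose). The numbers are hypothetical and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical values for two samples; column names follow the File 4 description.
df = pd.DataFrame({
    "sample": ["s1", "s2"],
    "dose.measured(ng/g)": [10.0, 20.0],
    "biomass.total.wet(g)": [0.002, 0.004],
    "measured.insecticide.animals(ng)": [0.1, 0.4],
})

# Accumulation: insecticide amount in the animal pool per gram of biomass.
accumulation = df["measured.insecticide.animals(ng)"] / df["biomass.total.wet(g)"]
df["accumulation.insecticide"] = accumulation

# Bioaccumulation factor: accumulation relative to the measured soil dose.
df["baf"] = accumulation / df["dose.measured(ng/g)"]

print(df[["sample", "accumulation.insecticide", "baf"]])
```

Rows with NA in any input column would simply yield NA in the derived columns, which matches the NA convention noted above.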
This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 859891.
This publication reflects only the authors' view and the European Commission is not responsible for any use that may be made of the information it contains.
Myrstener et al. (2025) Downstream temperature effects of boreal forest...
researchdata.se
su.figshare.com
Updated Feb 17, 2025
Cite
Caroline Greiser; Lenka Kuglerová; Maria Myrstener (2025). Myrstener et al. (2025) Downstream temperature effects of boreal forest clearcutting vary with riparian buffer width - Data and Code [Dataset]. http://doi.org/10.17045/STHLMUNI.27188004
Explore at:
Unique identifier
https://doi.org/10.17045/STHLMUNI.27188004
Dataset updated
Feb 17, 2025
Dataset provided by
Stockholm University
Authors
Caroline Greiser; Lenka Kuglerová; Maria Myrstener
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Please read the readme.txt!

This repository contains raw and clean data (.csv), as well as the R scripts (.r) that process the data and create the plots and the models.

We recommend going through the R scripts in their numbered order.

Code was developed in the R software:

R version 4.4.1 (2024-06-14 ucrt) -- "Race for Your Life" Copyright (C) 2024 The R Foundation for Statistical Computing Platform: x86_64-w64-mingw32/x64

****** List of files ********************************

Data

---raw

72 files from 72 Hobo data loggers

names: site_position_medium.csv

example: "20_20_down_water.csv" (site = 20, position = 20 m downstream, medium = water)

---clean

site_logger_position_medium.csv list of all sites, their loggers, their position and medium in which they were placed

loggerdata_compiled.csv all raw logger data (see above) compiled into one dataframe, for column names see below

Daily_loggerdata.csv all data aggregated to daily mean, max and min values, for column names see below

CG_site_distance_pairs.csv all logger positions for each stream and their pairwise geographical distance in meters

Discharge_site7.csv Discharge data for the same season as logger data from a reference stream

buffer_width_eniro_CG.csv measured and averaged buffer widths for each of the studied streams (in m)

Scripts

01_compile_clean_loggerdata.r

02_aggregate_loggerdata.r

03_model_stream_temp_summer.r

03b_model_stream_temp_autumn.r

04_calculate_warming_cooling_rates_summer.r

04b_calculate_warming_cooling_rates_autumn.r

05_model_air_temp_summer.r

05b_model_air_temp_autumn.r

06_plot_representative_time_series_temp_discharge.r
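
For readers who work in Python rather than R, the aggregation step that produces Daily_loggerdata.csv can be approximated as below. This is only a pandas sketch, not a port of 02_aggregate_loggerdata.r. It uses the column names given in the readme ("Logger.SN", "Timestamp", "Temp", "date", "mean_temp", "min_temp", "max_temp"), and the readings are hypothetical.

```python
import pandas as pd

# Hypothetical readings from one logger over two days.
raw = pd.DataFrame({
    "Logger.SN": [1, 1, 1, 1],
    "Timestamp": pd.to_datetime([
        "2024-07-01 00:00:00", "2024-07-01 12:00:00",
        "2024-07-02 00:00:00", "2024-07-02 12:00:00",
    ]),
    "Temp": [10.0, 14.0, 11.0, 17.0],
})

# Derive the calendar date, then aggregate per logger and day.
raw["date"] = raw["Timestamp"].dt.date
daily = (
    raw.groupby(["Logger.SN", "date"])["Temp"]
    .agg(mean_temp="mean", min_temp="min", max_temp="max")
    .reset_index()
)

print(daily)
```

The deviation columns ("Temp.max.dev" and similar) would additionally require joining each logger to its upstream reference logger, which is left out here.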

****** Column names ********************************

Most column names are self-explanatory, and are also explained in the R code.

Below is some detailed information on two dataframes (.csv). The column names are similar in the other csv files.

File "loggerdata_compiled.csv" [in Data/clean/ ]

"Logger.SN" Logger serial number

"Timestamp" Datetime, YYYY-MM-DD HH:MM:SS

"Temp" temperature in °C

"Illum" light in lux

"Year" YYYY

"Month" MM

"Day" DD

"Hour" HH

"Minute" MM

"Second" SS

"tz" time zone

"path" file path

"site" stream/site ID

"file" file name

"medium" "water" or "air"

"position" one of 6 positions along the stream: up, mid, end, 20, 70, 150

"date" YYYY-MM-DD

File "Daily_loggerdata.csv" [in Data/clean/ ]

"date" ... (see above)

"Logger.SN" Logger serial number

"mean_temp" mean daily temperature

"min_temp" minimum daily temperature

"max_temp" maximum daily temperature

"path" ...

"site" ...

"file" ...

"medium" ...

"position" ...

"buffer" one of 3 buffer categories: no, thin, wide

"Temp.max.ref" maximum daily temperature of the upstream reference logger

"Temp.min.ref" minimum daily temperature of the upstream reference logger

"Temp.mean.ref" mean daily temperature of the upstream reference logger

"Temp.max.dev" max. temperature difference to upstream reference

"Temp.min.dev" min. temperature difference to upstream reference

"Temp.mean.dev" mean temperature difference to upstream reference

Paper abstract:

Clearcutting increases temperatures of forest streams, and in temperate zones, the effects can extend far downstream. Here, we studied whether similar patterns are found in colder, boreal zones and if riparian buffers can prevent stream water from heating up. We recorded temperature at 45 locations across nine streams with varying buffer widths. In these streams, we compared upstream (control) reaches with reaches in clearcuts and up to 150 m downstream. In summer, we found daily maximum water temperature increases on clearcuts up to 4.1 °C with the warmest week ranging from 12.0 to 18.6 °C. We further found that warming was sustained downstream of clearcuts to 150 m in three out of six streams with buffers < 10 m. Surprisingly, temperature patterns in autumn resembled those in summer, yet with lower absolute temperatures (maximum warming was 1.9 °C in autumn). Clearcuts in boreal forests can indeed warm streams, and because these temperature effects are propagated downstream, we risk catchment-scale effects and cumulative warming when streams pass through several clearcuts. In this study, riparian buffers wider than 15 m protected against water temperature increases; hence, we call for a general increase of riparian buffer width along small streams in boreal forests.
Datasets for the Carpentry-style RNA-seq lesson
zenodo.org
application/gzip, tsv +1
Updated Mar 22, 2023
Cite
tijs bliek; marc galland (2023). Datasets for the Carpentry-style RNA-seq lesson [Dataset]. http://doi.org/10.5281/zenodo.6205896
Explore at:
Available download formats: tsv, application/gzip, txt
Unique identifier
https://doi.org/10.5281/zenodo.6205896
Dataset updated
Mar 22, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
tijs bliek; marc galland
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Lesson files

For all compressed files, go to the Shell and uncompress using `tar -xzvf myarchive.tar.gz`.

1) Bioinformatic files: bioinformatic_tutorial_files.tar.gz

This archive contains the following datasets:

FASTQ files from Arabidopsis leaf RNA-seq:

Arabidopsis_sample3.fq.gz

Arabidopsis_sample1.fq.gz

Arabidopsis_sample4.fq.gz

Arabidopsis_sample2.fq.gz

Arabidopsis thaliana genome assembly and genome annotation:

AtChromosome1.fa.gz

ath_annotation.gff3.gz

The sequence of sequencing adapters in adapters.fasta.

2) Gene counts usable with DESeq2 and R: tutorial.tar.gz

This archive contains the following datasets:

raw_counts.csv: a dataframe of the sample raw counts. It is a comma-separated values file, so the data are separated by commas ','.

samples_to_conditions.csv: a dataframe that indicates the correspondence between samples and experimental conditions (e.g. control, treated).

differential_genes.csv: a dataframe that contains the result of the DESeq2 analysis, specifying this contrast in the `DESeq2::results()` function: `contrast = c("infected", "Pseudomonas_syringae_DC3000", "mock")`

The raw_counts.csv file was obtained by running the `v0.1.1` version of an RNA-Seq bioinformatic pipeline on the mRNA-Seq sequencing files from Vogel et al. (2016): https://www.ebi.ac.uk/ena/data/view/PRJEB13938.

Please read the original study (Vogel et al. 2016): https://nph.onlinelibrary.wiley.com/doi/full/10.1111/nph.14036

====

Exercise files

1) NASA spaceflight

The NASA GeneLab experiment GLDS-38 performed transcriptomics and proteomics of Arabidopsis seedlings in microgravity by sending seedlings to the International Space Station (ISS).

The raw counts, scaled counts and sample to conditions files are available in the ZIP archive

2) Deforges 2019 hormone-treatments: deforges_2019.tar.gz

This archive contains:

arabidopsis_root_hormones_raw_counts.csv

arabidopsis_root_hormones_sample2condition.csv

dataset01_IAA_arabidopsis_root_raw_counts.csv

dataset02_ABA_arabidopsis_root_raw_counts.csv

dataset03_ACC_arabidopsis_root_raw_counts.csv

dataset04_MeJA_arabidopsis_root_raw_counts.csv

The arabidopsis_root_hormones_raw_counts.csv file contains all gene counts from all hormones. Separate datasets were made for each hormone for convenience.
Restaurant Dish Orders in Power BI
kaggle.com
zip
Updated Oct 30, 2024
Cite
Fords (2024). Restaurant Dish Orders in Power BI [Dataset]. https://www.kaggle.com/datasets/fords001/restaurant-dish-orders
Explore at:
Available download formats: zip (620177 bytes)
Dataset updated
Oct 30, 2024
Authors
Fords
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This analysis uses the ‘Restaurant Orders’ dataset from https://mavenanalytics.io/data-playground, which is released under a Public Domain license. Public domain work is free for anyone to use for any purpose without restriction under copyright law, since no one owns or controls the material in any way. The dataset has 3 dataframes in csv format: ‘restaurant_db_data_dictionary.csv’, which describes the relationships between the tables; ‘order_details.csv’, with columns order_details_id, order_id, order_date, order_time, item_id; and ‘menu_items.csv’, with columns menu_item_id, item_name, category, price.

Using the 3 dataframes, we create a new dataframe ‘order_details_table’ (the result dataframe in the Power BI file restaurant_orders_result.pbix). Based on this new dataframe, we generate various chart visualizations in the file restaurant_orders_result_charts.pbix, and the charts are also attached here. Below is a more detailed description of how I created the new dataframe ‘order_details_table’ and the visualizations, including bar charts and pie charts.

I use Power BI in this project.
1. Delete all rows of the dataframe ‘order_details’ where the value in column ‘item_id’ is ‘NULL’. For this, I use the ‘Keep Rows’ function in Power Query Editor and keep all rows except those with ‘NULL’ values.
2. Combine the 2 columns ‘order_date’ and ‘order_time’ into 1 column ‘order_date_time’ in the format MM/DD/YY HH:MM:SS.
3. Merge two dataframes into one dataframe ‘order_details_table’ using the ‘Merge Queries’ function in Power Query Editor with an inner join (only matching rows). The dataframe ‘restaurant_db_data_dictionary.csv’ states that column ‘item_id’ in the ‘order_details’ table matches ‘menu_item_id’ in the ‘menu_items’ table, so the 2 tables are combined on these two id columns.
4. Remove the columns that are not needed and create a new ‘order_id’ with a unique number for each order.

As a result, the new dataframe ‘order_details_table’ has 6 columns: order_details_id: a unique identifier for each dish within an order; order_id: the unique identifier for each order or transaction; order_date_time: the date and time when the order was created, in the format MM/DD/YY HH:MM:SS; menu_item_category: the category to which the dish belongs; menu_item_name: the name of the dish on the menu; menu_item_price: the price of the dish.
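
The same transformation can be approximated outside Power BI. The pandas sketch below is an analogue of the Power Query steps, not the author's workflow. It uses the column names listed in the description, hypothetical rows, and omits the renumbering of ‘order_id’.

```python
import pandas as pd

# Hypothetical rows; column names follow the dataset description.
order_details = pd.DataFrame({
    "order_details_id": [1, 2, 3],
    "order_id": [1, 1, 2],
    "order_date": ["01/01/23", "01/01/23", "01/02/23"],
    "order_time": ["11:38:36", "11:38:36", "12:05:10"],
    "item_id": [101.0, None, 102.0],  # None stands in for a 'NULL' item
})
menu_items = pd.DataFrame({
    "menu_item_id": [101, 102],
    "item_name": ["Hamburger", "Tofu Pad Thai"],
    "category": ["American", "Asian"],
    "price": [12.95, 14.50],
})

# Step 1: drop rows without an item_id.
cleaned = order_details.dropna(subset=["item_id"]).copy()
cleaned["item_id"] = cleaned["item_id"].astype(int)

# Step 2: combine date and time into a single timestamp column.
cleaned["order_date_time"] = pd.to_datetime(
    cleaned["order_date"] + " " + cleaned["order_time"],
    format="%m/%d/%y %H:%M:%S",
)

# Step 3: inner join on item_id = menu_item_id (only matching rows are kept).
merged = cleaned.merge(menu_items, how="inner",
                       left_on="item_id", right_on="menu_item_id")

# Step 4: keep and rename the six result columns.
order_details_table = merged.rename(columns={
    "category": "menu_item_category",
    "item_name": "menu_item_name",
    "price": "menu_item_price",
})[["order_details_id", "order_id", "order_date_time",
    "menu_item_category", "menu_item_name", "menu_item_price"]]

print(order_details_table)
```

An inner join is used here, as in the description, so an order line whose item_id has no match in the menu table would be dropped rather than kept with empty menu fields.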

Table ‘order_details_table’ from the Power BI file restaurant_orders_result.pbix: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13670445%2F1098315c0e34255b67ad3419aa113bf0%2Fdataframe.png?generation=1730269164808705&alt=media

I have also created bar charts and pie charts to display the results from the new dataframe. These plots are included in the file ‘restaurant_orders_result_charts.pbix’, and pictures of the charts are linked below.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13670445%2F4254696bbd3d7e0fc5f456c226c39114%2Fpicture_1.png?generation=1730269227195114&alt=media

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13670445%2F71092cf769862cf7364fe1ccac9fad83%2Fpicture_2.png?generation=1730269249147687&alt=media

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13670445%2F528ef51ecf21f006b0c21b65503e03fa%2Fpicture_3.png?generation=1730269284640753&alt=media

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13670445%2F147c240da4be5bfe9da057a8bc5d5939%2Fpicture_4.png?generation=1730269300346146&alt=media

I also attached the original and new files to this project, thank you.
Dataset with four years of condition monitoring technical language...
gimi9.com
Updated Jan 8, 2024
Cite
(2024). Dataset with four years of condition monitoring technical language annotations from paper machine industries in northern Sweden | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_https-doi-org-10-5878-hafd-ms27/
Explore at:
Dataset updated
Jan 8, 2024
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Sweden
Description
This dataset consists of four years of technical language annotations from two paper machines in northern Sweden, structured as a Pandas dataframe. The same data is also available as a semicolon-separated .csv file. The data consists of two columns, where the first column corresponds to annotation note contents, and the second column corresponds to annotation titles. The annotations are in Swedish, and processed so that all mentions of personal information are replaced with the string ‘egennamn’, meaning “personal name” in Swedish. Each row corresponds to one annotation with the corresponding title. Data can be accessed in Python with:

import pandas as pd
annotations_df = pd.read_pickle("Technical_Language_Annotations.pkl")
annotation_contents = annotations_df['noteComment']
annotation_titles = annotations_df['title']
