100+ datasets found
  1. Shopping Mall

    • kaggle.com
    zip
    Updated Dec 15, 2023
    Cite
    Anshul Pachauri (2023). Shopping Mall [Dataset]. https://www.kaggle.com/datasets/anshulpachauri/shopping-mall
    Explore at:
    zip(22852 bytes)Available download formats
    Dataset updated
    Dec 15, 2023
    Authors
    Anshul Pachauri
    Description

    Libraries Import:

    Importing necessary libraries such as pandas, seaborn, matplotlib, scikit-learn's KMeans, and warnings.

    Data Loading and Exploration:

    Reading a dataset named "Mall_Customers.csv" into a pandas DataFrame (df). Displaying the first few rows of the dataset using df.head(). Conducting univariate analysis by calculating descriptive statistics with df.describe().

    Univariate Analysis:

    Visualizing the distribution of the 'Annual Income (k$)' column using sns.distplot. Looping through selected columns ('Age', 'Annual Income (k$)', 'Spending Score (1-100)') and plotting individual distribution plots.

    Bivariate Analysis:

    Creating a scatter plot for 'Annual Income (k$)' vs 'Spending Score (1-100)' using sns.scatterplot. Generating a pair plot for selected columns with gender differentiation using sns.pairplot.

    Gender-Based Analysis:

    Grouping the data by 'Gender' and calculating the mean for selected columns. Computing the correlation matrix for the grouped data and visualizing it using a heatmap.

    Univariate Clustering:

    Applying KMeans clustering with 3 clusters based on 'Annual Income (k$)' and adding the 'Income Cluster' column to the DataFrame. Plotting the elbow method to determine the optimal number of clusters.

    Bivariate Clustering:

    Applying KMeans clustering with 5 clusters based on 'Annual Income (k$)' and 'Spending Score (1-100)' and adding the 'Spending and Income Cluster' column. Plotting the elbow method for bivariate clustering and visualizing the cluster centers on a scatter plot. Displaying a normalized cross-tabulation between 'Spending and Income Cluster' and 'Gender'.

    Multivariate Clustering:

    Performing multivariate clustering by creating dummy variables, scaling selected columns, and applying KMeans clustering. Plotting the elbow method for multivariate clustering.

    Result Saving:

    Saving the modified DataFrame with cluster information to a CSV file named "Result.csv". Saving the multivariate clustering plot as an image file ("Multivariate_figure.png").
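
    A minimal sketch of the workflow described above (file path, random seed, and KMeans parameters are assumptions; this is not the author's original notebook):

    import pandas as pd
    from sklearn.cluster import KMeans

    # Load the customer data (path assumed; the dataset ships Mall_Customers.csv).
    df = pd.read_csv("Mall_Customers.csv")

    # Univariate clustering: 3 clusters on annual income.
    km_income = KMeans(n_clusters=3, n_init=10, random_state=42)
    df["Income Cluster"] = km_income.fit_predict(df[["Annual Income (k$)"]])

    # Bivariate clustering: 5 clusters on income and spending score.
    cols = ["Annual Income (k$)", "Spending Score (1-100)"]
    km_bi = KMeans(n_clusters=5, n_init=10, random_state=42)
    df["Spending and Income Cluster"] = km_bi.fit_predict(df[cols])

    # Elbow method: inertia for k = 1..10 on the bivariate features.
    inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(df[cols]).inertia_
                for k in range(1, 11)]

    # Normalized cross-tabulation of the bivariate clusters against gender.
    crosstab = pd.crosstab(df["Spending and Income Cluster"], df["Gender"], normalize="index")

    # Save the DataFrame with cluster labels, as described above.
    df.to_csv("Result.csv", index=False)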

  2. Dataframe of Significant Stems.csv

    • psycharchives.org
    Updated Oct 8, 2019
    Cite
    (2019). Dataframe of Significant Stems.csv [Dataset]. https://www.psycharchives.org/en/item/84d5c4b2-579d-48a0-8d4e-f02f2ae99192
    Explore at:
    Dataset updated
    Oct 8, 2019
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Systematic reviews are the method of choice to synthesize research evidence. To identify main topics (so-called hot spots) relevant to large corpora of original publications in need of a synthesis, one must address the “three Vs” of big data (volume, velocity, and variety), especially in loosely defined or fragmented disciplines. For this purpose, text mining and predictive modeling are very helpful. Thus, we applied these methods to a compilation of documents related to digitalization in aesthetic, arts, and cultural education, as a prototypical, loosely defined, fragmented discipline, and particularly to quantitative research within it (QRD-ACE). By broadly querying the abstract and citation database Scopus with terms indicative of QRD-ACE, we identified a corpus of N = 55,553 publications for the years 2013–2017. As the result of an iterative approach of text mining, priority screening, and predictive modeling, we identified n = 8,304 potentially relevant publications of which n = 1,666 were included after priority screening. Analysis of the subject distribution of the included publications revealed video games as a first hot spot of QRD-ACE. Topic modeling resulted in aesthetics and cultural activities on social media as a second hot spot, related to 4 of k = 8 identified topics. This way, we were able to identify current hot spots of QRD-ACE by screening less than 15% of the corpus. We discuss implications for harnessing text mining, predictive modeling, and priority screening in future research syntheses and avenues for future original research on QRD-ACE. Dataset for: Christ, A., Penthin, M., & Kröner, S. (2019). Big Data and Digital Aesthetic, Arts, and Cultural Education: Hot Spots of Current Quantitative Research. Social Science Computer Review, 089443931988845. https://doi.org/10.1177/0894439319888455:

  3. Merge number of excel file,convert into csv file

    • kaggle.com
    zip
    Updated Mar 30, 2024
    Cite
    Aashirvad pandey (2024). Merge number of excel file,convert into csv file [Dataset]. https://www.kaggle.com/datasets/aashirvadpandey/merge-number-of-excel-fileconvert-into-csv-file
    Explore at:
    zip(6731 bytes)Available download formats
    Dataset updated
    Mar 30, 2024
    Authors
    Aashirvad pandey
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Project Description:

    Title: Pandas Data Manipulation and File Conversion

    Overview: This project aims to demonstrate the basic functionalities of Pandas, a powerful data manipulation library in Python. In this project, we will create a DataFrame, perform some data manipulation operations using Pandas, and then convert the DataFrame into both Excel and CSV formats.

    Key Objectives:

    1. DataFrame Creation: Utilize Pandas to create a DataFrame with sample data.
    2. Data Manipulation: Perform basic data manipulation tasks such as adding columns, filtering data, and performing calculations.
    3. File Conversion: Convert the DataFrame into Excel (.xlsx) and CSV (.csv) file formats.

    Tools and Libraries Used:

    • Python
    • Pandas

    Project Implementation:

    1. DataFrame Creation:

      • Import the Pandas library.
      • Create a DataFrame using either a dictionary, a list of dictionaries, or by reading data from an external source like a CSV file.
      • Populate the DataFrame with sample data representing various data types (e.g., integer, float, string, datetime).
    2. Data Manipulation:

      • Add new columns to the DataFrame representing derived data or computations based on existing columns.
      • Filter the DataFrame to include only specific rows based on certain conditions.
      • Perform basic calculations or transformations on the data, such as aggregation functions or arithmetic operations.
    3. File Conversion:

      • Utilize Pandas to convert the DataFrame into an Excel (.xlsx) file using the to_excel() function.
      • Convert the DataFrame into a CSV (.csv) file using the to_csv() function.
      • Save the generated files to the local file system for further analysis or sharing.

    Expected Outcome:

    Upon completion of this project, you will have gained a fundamental understanding of how to work with Pandas DataFrames, perform basic data manipulation tasks, and convert DataFrames into different file formats. This knowledge will be valuable for data analysis, preprocessing, and data export tasks in various data science and analytics projects.

    Conclusion:

    The Pandas library offers powerful tools for data manipulation and file conversion in Python. By completing this project, you will have acquired essential skills that are widely applicable in the field of data science and analytics. You can further extend this project by exploring more advanced Pandas functionalities or integrating it into larger data processing pipelines. In this dataset, we take several data tables, turn each into a DataFrame, save them into a single Excel file as differently named sheets, and then convert that Excel file into a CSV file.
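
    A minimal sketch of this workflow (the sheet names and sample data below are assumptions, not taken from the dataset; writing .xlsx files requires an Excel engine such as openpyxl):

    import pandas as pd

    # Two sample DataFrames standing in for the merged data (illustrative values only).
    df_a = pd.DataFrame({"id": [1, 2], "value": [10.5, 20.1]})
    df_b = pd.DataFrame({"id": [3, 4], "value": [30.2, 40.9]})

    # Save both DataFrames into a single Excel workbook as separately named sheets.
    with pd.ExcelWriter("combined.xlsx") as writer:
        df_a.to_excel(writer, sheet_name="sheet_a", index=False)
        df_b.to_excel(writer, sheet_name="sheet_b", index=False)

    # Read every sheet back, concatenate, and convert the result to a single CSV file.
    sheets = pd.read_excel("combined.xlsx", sheet_name=None)  # dict of {sheet name: DataFrame}
    merged = pd.concat(sheets.values(), ignore_index=True)
    merged.to_csv("combined.csv", index=False)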

  4. AI4Code Train Dataframe

    • kaggle.com
    zip
    Updated May 12, 2022
    Cite
    Darien Schettler (2022). AI4Code Train Dataframe [Dataset]. https://www.kaggle.com/datasets/dschettler8845/ai4code-train-dataframe
    Explore at:
    zip(622120487 bytes)Available download formats
    Dataset updated
    May 12, 2022
    Authors
    Darien Schettler
    Description

    [EDIT/UPDATE]

    There are a few important updates.

    1. When SAVING the pd.DataFrame as a .csv, the following command should be used to avoid improper interpretation of newline character(s).
    import csv  # required for csv.QUOTE_NONNUMERIC below

    train_df.to_csv(
      "train.csv", index=False, 
      encoding='utf-8', 
      quoting=csv.QUOTE_NONNUMERIC  # <== THIS IS REQUIRED
    )
    
    2. When LOADING the .csv as a pd.DataFrame, the following command must be used to avoid misinterpretation of NaN-like strings (null, nan, ...) as pd.NaN values.
    train_df = pd.read_csv(
      "/kaggle/input/ai4code-train-dataframe/train.csv", 
      keep_default_na=False  # <== THIS IS REQUIRED
    )
    
  5. Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...

    • zenodo.org
    • data.europa.eu
    zip
    Updated Oct 20, 2022
    + more versions
    Cite
    Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Ơarƫnas Girdzijauskas (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. http://doi.org/10.5281/zenodo.6832242
    Explore at:
    zipAvailable download formats
    Dataset updated
    Oct 20, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Athena Vakali; Joao Palotti; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Ơarƫnas Girdzijauskas
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LifeSnaps Dataset Documentation

    Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

    The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

    Data Import: Reading CSV

    For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
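
    For example, a daily-granularity file could be loaded along these lines (the file name below is a placeholder, not the actual name in the release):

    import pandas as pd

    # Substitute the actual daily or hourly CSV file from the LifeSnaps release.
    daily = pd.read_csv("lifesnaps_daily.csv")
    print(daily.head())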

    Data Import: Setting up a MongoDB (Recommended)

    To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data by importing the LifeSnaps MongoDB database.

    To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have MongoDB Database Tools installed from here.

    For the Fitbit data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c fitbit 

    For the SEMA data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c sema 

    For surveys data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c surveys 

    If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
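
    Once restored, the collections can also be queried from Python; below is a minimal sketch using pymongo (not part of the original documentation), with the host, database, and collection names taken from the commands above:

    import pandas as pd
    from pymongo import MongoClient

    # Connect to the restored database (add username/password if access control is enabled).
    client = MongoClient("localhost", 27017)
    db = client["rais_anonymized"]

    # Pull a small sample of Fitbit documents into a DataFrame for inspection.
    sample = list(db["fitbit"].find().limit(100))
    fitbit_df = pd.DataFrame(sample)
    print(fitbit_df.columns)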

    Data Availability

    The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:

    {
      _id: 
  6. property_based_matching

    • huggingface.co
    Cite
    Yuanhao Qu, property_based_matching [Dataset]. https://huggingface.co/datasets/yhqu/property_based_matching
    Explore at:
    Authors
    Yuanhao Qu
    Description

    Property Based Matching Dataset

    This dataset is part of the Deep Principle Bench collection.

      Files
    

    property_based_matching.csv: Main dataset file

      Usage
    

    import pandas as pd
    from datasets import load_dataset

    # Load the dataset
    dataset = load_dataset("yhqu/property_based_matching")

    # Or load directly as a pandas DataFrame
    df = pd.read_csv("hf://datasets/yhqu/property_based_matching/property_based_matching.csv")

      Citation
    

    Please cite this work if
 See the full description on the dataset page: https://huggingface.co/datasets/yhqu/property_based_matching.

  7. Pandas Example

    • kaggle.com
    zip
    Updated Jan 24, 2021
    Cite
    Francisco Marques (2021). Pandas Example [Dataset]. https://www.kaggle.com/franciscomcm/pandas-example
    Explore at:
    zip(1342 bytes)Available download formats
    Dataset updated
    Jan 24, 2021
    Authors
    Francisco Marques
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Pandas is a powerful package to process tabular data. This dataset provides the absolute minimum amount of data needed to start exploring the capabilities of this package.

    Content

    This dataset contains very basic examples to explore handy operations with Pandas DataFrames. There are 3 CSV files in the dataset:

    • thermometer_A.csv and thermometer_B.csv contain synthetic data representing temperature measurements over a full day by two devices.
    • fertiliser_plant_growth.csv contains synthetic data representing the growth of 3 groups of plants (control, fertilizer A and fertilizer B).
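
    A minimal sketch of the kind of starter operations these files support (the exact column names are not listed in the description, so nothing is assumed about them here):

    import pandas as pd

    # Load the two thermometer files and inspect them.
    a = pd.read_csv("thermometer_A.csv")
    b = pd.read_csv("thermometer_B.csv")
    print(a.head())
    print(a.describe())

    # Stack both devices into one DataFrame, tagging each row with its device.
    combined = pd.concat([a.assign(device="A"), b.assign(device="B")], ignore_index=True)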

    Acknowledgments

    Banner image by Sid Balachandran on Unsplash

  8. Longitudinal corpus of privacy policies

    • data.niaid.nih.gov
    Updated Dec 12, 2022
    Cite
    Wagner, Isabel (2022). Longitudinal corpus of privacy policies [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5841138
    Explore at:
    Dataset updated
    Dec 12, 2022
    Dataset provided by
    University of Basel
    Authors
    Wagner, Isabel
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a corpus of 56,416 unique privacy policy texts spanning the years 1996-2021.

    policy-texts.zip contains a directory of text files with the policy texts. File names are the hashes of the policy text.

    policy-metadata.zip contains two CSV files (can be imported into a pandas dataframe) with policy metadata including readability measures for each policy text.

    labeled-policies.zip contains CSV files with content labels for each policy. Labeling was done using a BERT classifier.

    Details on the methodology can be found in the accompanying paper.
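
    As an illustration (not from the original documentation), the metadata CSVs can be pulled into pandas after unzipping; the directory layout below is an assumption:

    import pandas as pd
    from pathlib import Path

    # After unzipping policy-metadata.zip into "policy-metadata/", read every CSV it contains
    # (the exact file names inside the archive are not documented here).
    metadata_frames = [pd.read_csv(p) for p in Path("policy-metadata").glob("*.csv")]
    for frame in metadata_frames:
        print(frame.shape, list(frame.columns)[:5])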

  9. descriptor_prediction

    • huggingface.co
    Cite
    Yuanhao Qu, descriptor_prediction [Dataset]. https://huggingface.co/datasets/yhqu/descriptor_prediction
    Explore at:
    Authors
    Yuanhao Qu
    Description

    Descriptor Prediction Dataset

    This dataset is part of the Deep Principle Bench collection.

      Files
    

    descriptor_prediction.csv: Main dataset file

      Usage
    

    import pandas as pd
    from datasets import load_dataset

    # Load the dataset
    dataset = load_dataset("yhqu/descriptor_prediction")

    # Or load directly as a pandas DataFrame
    df = pd.read_csv("hf://datasets/yhqu/descriptor_prediction/descriptor_prediction.csv")

      Citation
    

    Please cite this work if you use
 See the full description on the dataset page: https://huggingface.co/datasets/yhqu/descriptor_prediction.

  10. The Device Activity Report with Complete Knowledge (DARCK) for NILM

    • zenodo.org
    bin, xz
    Updated Sep 19, 2025
    Cite
    Anonymous Anonymous; Anonymous Anonymous (2025). The Device Activity Report with Complete Knowledge (DARCK) for NILM [Dataset]. http://doi.org/10.5281/zenodo.17159850
    Explore at:
    bin, xzAvailable download formats
    Dataset updated
    Sep 19, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Anonymous Anonymous; Anonymous Anonymous
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    1. Abstract

    This dataset contains aggregated and sub-metered power consumption data from a two-person apartment in Germany. Data was collected from March 5 to September 4, 2025, spanning 6 months. It includes an aggregate reading from a main smart meter and individual readings from 40 smart plugs, smart relays, and smart power meters monitoring various appliances.

    2. Dataset Overview

    • Apartment: Two-person apartment, approx. 58 m², located in Aachen, Germany.
    • Aggregate Meter: eBZ DD3
    • Sub-meters: 31 Shelly Plus Plug S, 6 Shelly Plus 1PM, 3 Shelly Plus PM Mini Gen3
    • Sampling Rate: 1 Hz
    • Measured Quantity: Active Power
    • Unit of Measurement: Watt
    • Duration: 6 months
    • Format: Single CSV file (`DARCK.csv`)
    • Structure: Timestamped rows with columns for the aggregate meter and each sub-metered appliance.
    • Completeness: The main power meter has a completeness of 99.3%. Missing values were linearly interpolated.

    3. Download and Usage

    The dataset can be downloaded here: https://doi.org/10.5281/zenodo.17159850

    As it contains longer off periods with zeros, the CSV file is nicely compressible.


    To extract it use: xz -d DARCK.csv.xz.
    The compression leads to a 97% smaller file size (from 4 GB to 90.9 MB).


    To use the dataset in Python, you can, e.g., load the CSV file into a pandas DataFrame:

    import pandas as pd

    df = pd.read_csv("DARCK.csv", parse_dates=["time"])

    4. Measurement Setup

    The main meter was monitored using an infrared reading head magnetically attached to the infrared interface of the meter. An ESP8266 flashed with Tasmota decodes the binary datagrams and forwards the Watt readings to the MQTT broker. Individual appliances were monitored using a combination of Shelly Plugs (for outlets), Shelly 1PM (for wired-in devices like ceiling lights), and Shelly PM Mini (for each of the three phases of the oven). All devices reported to a central InfluxDB database via Home Assistant running in docker on a Dell OptiPlex 3020M.

    5. File Format (DARCK.csv)

    The dataset is provided as a single comma-separated value (CSV) file.

    • The first row is a header containing the column names.
    • All power values are rounded to the first decimal place.
    • There are no missing values in the final dataset.
    • Each row represents 1 second, from start of measuring in March until the end in September.

    Column Descriptions

    • time (datetime): Timestamp for the reading in YYYY-MM-DD HH:MM:SS
    • main (float, Watt): Total aggregate power consumption for the apartment, measured at the main electrical panel.
    • [appliance_name] (float, Watt): Power consumption of an individual appliance (e.g., lightbathroom, fridge, sherlockpc). See Section 8 for a full list.

    Aggregate Columns

    • aggr_chargers (float, Watt): The sum of sherlockcharger, sherlocklaptop, watsoncharger, watsonlaptop, watsonipadcharger, kitchencharger.
    • aggr_stoveplates (float, Watt): The sum of stoveplatel1 and stoveplatel2.
    • aggr_lights (float, Watt): The sum of lightbathroom, lighthallway, lightsherlock, lightkitchen, lightlivingroom, lightwatson, lightstoreroom, fcob, sherlockalarmclocklight, sherlockfloorlamphue, sherlockledstrip, livingfloorlamphue, sherlockglobe, watsonfloorlamp, watsondesklamp and watsonledmap.

    Analysis Columns

    • inaccuracy (float, Watt): As no electrical device bypasses a power meter, the true inaccuracy can be assessed. It is the absolute error between the sum of individual measurements and the mains reading. A 30W offset is applied to the sum since the measurement devices themselves draw power which is otherwise unaccounted for.

    6. Data Postprocessing Pipeline

    The final dataset was generated from two raw data sources (meter.csv and shellies.csv) using a comprehensive postprocessing pipeline.

    6.1. Main Meter (main) Postprocessing

    The aggregate power data required several cleaning steps to ensure accuracy.

    1. Outlier Removal: Readings below 10W or above 10,000W were removed (merely 3 occurrences).
    2. Timestamp Burst Correction: The source data contained bursts of delayed readings. A custom algorithm was used to identify these bursts (large time gap followed by rapid readings) and back-fill the timestamps to create an evenly spaced time series.
    3. Alignment & Interpolation: The smart meter pushes a new value via infrared every second. To align those to the whole seconds, it was resampled to a 1-second frequency by taking the mean of all readings within each second (in 99.5% only 1 value). Any resulting gaps (0.7% outage ratio) were filled using linear interpolation.

    6.2. Sub-metered Devices (shellies) Postprocessing

    The Shelly devices are not prone to the same burst issue as the ESP8266 is. They push a new reading at every change in power drawn. If no power change is observed or the one observed is too small (less than a few Watt), the reading is pushed once a minute, together with a heartbeat. When a device turns on or off, intermediate power values are published, which leads to sub-second values that need to be handled.

    1. Grouping: Data was grouped by the unique device identifier.
    2. Resampling & Filling: The data for each device was resampled to a 1-second frequency using .resample('1s').last().ffill().
      This method was chosen to firstly, capture the last known state of the device within each second, handling rapid on/off events. Secondly, to forward-fill the last state across periods of no new data, modeling that the device's consumption remained constant until a new reading was sent.
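
    A minimal sketch of this per-device resampling step (the device name and power values are made up for illustration; the pandas calls mirror those quoted in step 2):

    import pandas as pd

    # Illustrative raw Shelly readings: irregular timestamps, one row per reported change.
    raw = pd.DataFrame({
        "time": pd.to_datetime(["2025-03-05 00:00:00.4", "2025-03-05 00:00:02.7",
                                "2025-03-05 00:00:02.9", "2025-03-05 00:01:02.9"]),
        "device": ["fridge"] * 4,
        "power": [45.0, 0.0, 1.2, 1.2],
    }).set_index("time")

    # Resample each device to a 1-second grid: keep the last value seen within each second,
    # then forward-fill across seconds in which no new reading arrived.
    per_device = (raw.groupby("device")["power"]
                     .resample("1s").last()
                     .ffill())
    print(per_device.head())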

    6.3. Merging and Finalization

    1. Merge: The cleaned main meter and all sub-metered device dataframes were merged into a single dataframe on the time index.
    2. Final Fill: Any remaining NaN values (e.g., from before a device was installed) were filled with 0.0, assuming zero consumption.

    7. Manual Corrections and Known Data Issues

    During analysis, two significant unmetered load events were identified and manually corrected to improve the accuracy of the aggregate reading. The error column (inaccuracy) was recalculated after these corrections.

    1. March 10th - Unmetered Bulb: An unmetered 107W bulb was active. It was subtracted from the main reading as if it never happened.
    2. May 31st - Unmetered Air Pump: An unmetered 101W pump for an air mattress was used directly in an outlet with no intermediary plug and hence manually added to the respective plug.

    8. Appliance Details and Multipurpose Plugs

    The following table lists the column names with an explanation where needed. As Watson moved at the beginning of June, some metering plugs changed their appliance.

  11. oldIT2modIT

    • huggingface.co
    Cite
    Massimo Romano, oldIT2modIT [Dataset]. https://huggingface.co/datasets/cybernetic-m/oldIT2modIT
    Explore at:
    Authors
    Massimo Romano
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Download the dataset

    At the moment, to download the dataset you should use a pandas DataFrame:

    import pandas as pd
    df = pd.read_csv("https://huggingface.co/datasets/cybernetic-m/oldIT2modIT/resolve/main/oldIT2modIT_dataset.csv")

    You can visualize the dataset with df.head().

    To convert it into a Hugging Face dataset:

    from datasets import Dataset
    dataset = Dataset.from_pandas(df)

      Dataset Description
    

    This is an Italian dataset formed by 200 old (ancient) Italian sentences and
 See the full description on the dataset page: https://huggingface.co/datasets/cybernetic-m/oldIT2modIT.

  12. Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, zip
    Updated Dec 24, 2022
    Cite
    Alexander R. Hartloper; Selimcan Ozden; Albano de Castro e Sousa; Dimitrios G. Lignos (2022). Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials [Dataset]. http://doi.org/10.5281/zenodo.6965147
    Explore at:
    bin, zip, csvAvailable download formats
    Dataset updated
    Dec 24, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Alexander R. Hartloper; Selimcan Ozden; Albano de Castro e Sousa; Dimitrios G. Lignos
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials

    Background

    This dataset contains data from monotonic and cyclic loading experiments on structural metallic materials. The materials are primarily structural steels and one iron-based shape memory alloy is also included. Summary files are included that provide an overview of the database and data from the individual experiments is also included.

    The files included in the database are outlined below and the format of the files is briefly described. Additional information regarding the formatting can be found through the post-processing library (https://github.com/ahartloper/rlmtp/tree/master/protocols).

    Usage

    • The data is licensed through the Creative Commons Attribution 4.0 International.
    • If you have used our data and are publishing your work, we ask that you please reference both:
      1. this database through its DOI, and
      2. any publication that is associated with the experiments. See the Overall_Summary and Database_References files for the associated publication references.

    Included Files

    • Overall_Summary_2022-08-25_v1-0-0.csv: summarises the specimen information for all experiments in the database.
    • Summarized_Mechanical_Props_Campaign_2022-08-25_v1-0-0.csv: summarises the average initial yield stress and average initial elastic modulus per campaign.
    • Unreduced_Data-#_v1-0-0.zip: contain the original (not downsampled) data
      • Where # is one of: 1, 2, 3, 4, 5, 6. The unreduced data is broken into separate archives because of upload limitations to Zenodo. Together they provide all the experimental data.
      • We recommend you un-zip all the folders and place them in one "Unreduced_Data" directory similar to the "Clean_Data"
      • The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.
      • There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the unreduced data.
      • The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.
    • Clean_Data_v1-0-0.zip: contains all the downsampled data
      • The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.
      • There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the clean data.
      • The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.
    • Database_References_v1-0-0.bib
      • Contains a bibtex reference for many of the experiments in the database. Corresponds to the "citekey" entry in the summary files.

    File Format: Downsampled Data

    These are the "LP_

    • The header of the first column is empty: the first column corresponds to the index of the sample point in the original (unreduced) data
    • Time[s]: time in seconds since the start of the test
    • e_true: true strain
    • Sigma_true: true stress in MPa
    • (optional) Temperature[C]: the surface temperature in degC

    These data files can be easily loaded using the pandas library in Python through:

    import pandas
    data = pandas.read_csv(data_file, index_col=0)

    The data is formatted so it can be used directly in RESSPyLab (https://github.com/AlbanoCastroSousa/RESSPyLab). Note that the column names "e_true" and "Sigma_true" were kept for backwards compatibility reasons with RESSPyLab.

    File Format: Unreduced Data

    These are the "LP_

    • The first column is the index of each data point
    • S/No: sample number recorded by the DAQ
    • System Date: Date and time of sample
    • Time[s]: time in seconds since the start of the test
    • C_1_Force[kN]: load cell force
    • C_1_Déform1[mm]: extensometer displacement
    • C_1_Déplacement[mm]: cross-head displacement
    • Eng_Stress[MPa]: engineering stress
    • Eng_Strain[]: engineering strain
    • e_true: true strain
    • Sigma_true: true stress in MPa
    • (optional) Temperature[C]: specimen surface temperature in degC

    The data can be loaded and used similarly to the downsampled data.

    File Format: Overall_Summary

    The overall summary file provides data on all the test specimens in the database. The columns include:

    • hidden_index: internal reference ID
    • grade: material grade
    • spec: specifications for the material
    • source: base material for the test specimen
    • id: internal name for the specimen
    • lp: load protocol
    • size: type of specimen (M8, M12, M20)
    • gage_length_mm_: unreduced section length in mm
    • avg_reduced_dia_mm_: average measured diameter for the reduced section in mm
    • avg_fractured_dia_top_mm_: average measured diameter of the top fracture surface in mm
    • avg_fractured_dia_bot_mm_: average measured diameter of the bottom fracture surface in mm
    • fy_n_mpa_: nominal yield stress
    • fu_n_mpa_: nominal ultimate stress
    • t_a_deg_c_: ambient temperature in degC
    • date: date of test
    • investigator: person(s) who conducted the test
    • location: laboratory where test was conducted
    • machine: setup used to conduct test
    • pid_force_k_p, pid_force_t_i, pid_force_t_d: PID parameters for force control
    • pid_disp_k_p, pid_disp_t_i, pid_disp_t_d: PID parameters for displacement control
    • pid_extenso_k_p, pid_extenso_t_i, pid_extenso_t_d: PID parameters for extensometer control
    • citekey: reference corresponding to the Database_References.bib file
    • yield_stress_mpa_: computed yield stress in MPa
    • elastic_modulus_mpa_: computed elastic modulus in MPa
    • fracture_strain: computed average true strain across the fracture surface
    • c,si,mn,p,s,n,cu,mo,ni,cr,v,nb,ti,al,b,zr,sn,ca,h,fe: chemical compositions in units of %mass
    • file: file name of corresponding clean (downsampled) stress-strain data

    File Format: Summarized_Mechanical_Props_Campaign

    Meant to be loaded in Python as a pandas DataFrame with multi-indexing, e.g.,

    tab1 = pd.read_csv('Summarized_Mechanical_Props_Campaign_' + date + version + '.csv',
              index_col=[0, 1, 2, 3], skipinitialspace=True, header=[0, 1],
              keep_default_na=False, na_values='')
    • citekey: reference in "Campaign_References.bib".
    • Grade: material grade.
    • Spec.: specifications (e.g., J2+N).
    • Yield Stress [MPa]: initial yield stress in MPa
      • size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign
    • Elastic Modulus [MPa]: initial elastic modulus in MPa
      • size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign

    Caveats

    • The files in the following directories were tested before the protocol was established. Therefore, only the true stress-strain is available for each:
      • A500
      • A992_Gr50
      • BCP325
      • BCR295
      • HYP400
      • S460NL
      • S690QL/25mm
      • S355J2_Plates/S355J2_N_25mm and S355J2_N_50mm
  13. Excel_file_inluding_measured_values

    • dataverse.harvard.edu
    Updated Sep 30, 2025
    Cite
    Elham Akbari (2025). Excel_file_inluding_measured_values [Dataset]. http://doi.org/10.7910/DVN/1BNFJ5
    Explore at:
    Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Sep 30, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Elham Akbari
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Excel files contain the measured values (mostly particle properties from microscopic images) in .csv format. The files were exported with, and are readable by, pandas DataFrames. Files with intensity values contain the intensity values from the maximum z-stack projection of fluorescent micrographs taken of the particles inside the DLD device.

  14. National Water Model RouteLinks CSV

    • beta.hydroshare.org
    • hydroshare.org
    • +2more
    zip
    Updated Oct 15, 2021
    Cite
    Jason A Regina; Austin Raney (2021). National Water Model RouteLinks CSV [Dataset]. http://doi.org/10.4211/hs.d154f19f762c4ee9b74be55f504325d3
    Explore at:
    zip(1.1 MB)Available download formats
    Dataset updated
    Oct 15, 2021
    Dataset provided by
    HydroShare
    Authors
    Jason A Regina; Austin Raney
    License

    https://mit-license.org/

    Time period covered
    Apr 12, 2019 - Oct 14, 2021
    Area covered
    Description

    This resource contains "RouteLink" files for version 2.1.6 of the National Water Model which are used to associate feature identifiers for computational reaches to relevant metadata. These data are important for comparing NWM feature data to USGS streamflow and lake observations. The original RouteLink files are in NetCDF format and available here: https://www.nco.ncep.noaa.gov/pmb/codes/nwprod

    This resource includes the files in a human-friendlier CSV format for easier use, and a machine-friendlier file in HDF5 format which contains a single pandas.DataFrame. The scripts and supporting utilities are also included for users that wish to rebuild these files. Source code is hosted here: https://github.com/jarq6c/NWM_RouteLinks
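
    Either file can be read with pandas; the sketch below is illustrative only, and the file names and HDF5 key are assumptions rather than the actual names in the resource (reading HDF5 additionally requires the tables package):

    import pandas as pd

    # CSV flavour (file name assumed).
    routelinks_csv = pd.read_csv("routelink.csv")

    # HDF5 flavour: the store holds a single pandas.DataFrame; list the keys to find it.
    with pd.HDFStore("routelink.h5", mode="r") as store:
        print(store.keys())               # the key name is not documented here
        routelinks = store[store.keys()[0]]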

  15. GENEActiv accelerometer file related to the #120 OxWearables / stepcount...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 25, 2024
    Cite
    Wattelez, Guillaume (2024). GENEActiv accelerometer file related to the #120 OxWearables / stepcount issue [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11557420
    Explore at:
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    University of New Caledonia
    Authors
    Wattelez, Guillaume
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An example of a .bin file that raises an IndexError when processed.

    See the #120 OxWearables / stepcount issue for more details.

    The .csv files are 1-second epoch conversions from the .bin file and contain time, x, y, z columns. The conversion was done by:

    • reading the .bin with the GENEAread R package,
    • keeping only the time, x, y and z columns,
    • saving the data.frame into a .csv file.

    The only difference between the .csv files is the column format used for the time column before saving:

    • the time column in XXXXXX_....csv had a string class
    • the time column in XXXXXT....csv had a "POSIXct" "POSIXt" class

  16. Exploring the Relationship between Lipid Profile Changes, Growth and...

    • zenodo.org
    Updated Nov 3, 2023
    + more versions
    Cite
    Saúl Fernandes; Diana Ilyaskina (2023). Exploring the Relationship between Lipid Profile Changes, Growth and Reproduction in Folsomia candida Exposed to Teflubenzuron Over Time [Dataset]. http://doi.org/10.5281/zenodo.10069317
    Explore at:
    Dataset updated
    Nov 3, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Saúl Fernandes; Diana Ilyaskina
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This submission provides csv files with the data from a comprehensive study aimed at investigating the effects of sublethal concentrations of the insecticide teflubenzuron on the survival, growth, reproduction, and lipid changes of the Collembola Folsomia candida over different exposure periods.

    The dataset files are provided in CSV (comma-separated values) format:

    1. Survival_Growth_Reproduction_FolsomiaCandida_RawData.csv
    2. Main_Lipid_Class_Log10_RawData.csv
    3. Lipid_Categories_RawData.csv
    4. Bioaccumulation_SoilQuantification_RawData.csv

    Description of the files

    1. The csv file Survival_Growth_Reproduction_FolsomiaCandida_RawData.csv contains the dataframe in vertical format with the data for survival of Folsomia candida, changes in biomass, and reproduction.
    2. The csv file Main_Lipid_Class_Log10_RawData provides the dataframe in horizontal format with the log10 transformed total lipid content of the main lipid classes in Folsomia candida. Data used to produce Figure 5 of the manuscript. Full names of the lipid abbreviations are provided in the supplementary information of the manuscript.
    3. The csv file Lipid_Categories_RawData provides the dataframe with lipid categories in Folsomia candida. This data was not used in the data analysis described in the manuscript.
    4. The csv file Bioaccumulation_SoilQuantification_RawData.csv contains the dataframe in vertical format with the data for soil quantification of the insecticide in the soil, in the animals, and the calculated bioaccumulation factor.

    Variables in the files:

    File 1:

    • sample: sample unique ID
    • days: day of sampling
    • dose: dose of teflubenzuron (insecticide) in soil (mg a.s. kg soil -1)
    • solvent: solvent used in the soil (acetone or water)
    • age: age of the animals Folsomia candida at the day of sampling
    • survival: number of surviving adults of Folsomia candida
    • total.biomass(mg): total biomass in mg of the pool of animals in each sample
    • biomass.individual(ng): "total.biomass(mg)" divided by "survival" and converted to ng
    • offspring: number of offspring produced in each sample (NA for samples where this number is not possible to assess)

    Files 2 and 3:

    • sample: sample unique ID
    • days: day of sampling
    • dose: dose of teflubenzuron (insecticide) in soil (mg a.s. kg soil -1)

    File 4:

    • sample: sample unique ID
    • days: day of sampling
    • dose.nominal(ng/g): nominal dose of teflubenzuron (insecticide) in soil (mg a.s. kg soil -1)
    • dose.measured(ng/g): measured dose of teflubenzuron (insecticide) in soil (mg a.s. kg soil -1)
    • biomass.total.wet(g): "total.biomass(mg)" divided by "survival" and converted to ng
    • number.animals: number of surviving adults of Folsomia candida in each sample
    • biomass.individual.dry(g): "total.biomass(mg)" divided by "number.animals" and converted to ng
    • measured.insecticide.animals(ng): measured amount of teflubenzuron (insecticide) in the pool of animals (mg a.s. kg soil -1)
    • accumulation.insecticide(ng/g of dry body weight): "measured.insecticide.animals(ng)" divided by "biomass.total.wet(g)"
    • baf: "accumulation.insecticide" divided by "dose.measured(ng/g)"

    [NA stands for samples lost/ not measured]

    This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 859891.

    This publication reflects only the authors' view and the European Commission is not responsible for any use that may be made of the information it contains.

  17. Myrstener et al. (2025) Downstream temperature effects of boreal forest...

    • researchdata.se
    • su.figshare.com
    Updated Feb 17, 2025
    + more versions
    Cite
    Caroline Greiser; Lenka Kuglerová; Maria Myrstener (2025). Myrstener et al. (2025) Downstream temperature effects of boreal forest clearcutting vary with riparian buffer width - Data and Code [Dataset]. http://doi.org/10.17045/STHLMUNI.27188004
    Explore at:
    Dataset updated
    Feb 17, 2025
    Dataset provided by
    Stockholm University
    Authors
    Caroline Greiser; Lenka Kuglerová; Maria Myrstener
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please read the readme.txt !

    This depository contains raw and clean data (.csv), as well as the R-scripts (.r) that process the data, create the plots and the models.

    We recommend going through the R-scripts in their chronological order.

    Code was developed in the R software:

    R version 4.4.1 (2024-06-14 ucrt) -- "Race for Your Life" Copyright (C) 2024 The R Foundation for Statistical Computing Platform: x86_64-w64-mingw32/x64

    ****** List of files ********************************

    • Data

    ---raw

    72 files from 72 Hobo data loggers

    names: site_position_medium.csv

    example: "20_20_down_water.csv" (site = 20, position = 20 m downstream, medium = water)

    ---clean

    site_logger_position_medium.csv list of all sites, their loggers, their position and medium in which they were placed

    loggerdata_compiled.csv all raw logger data (see above) compiled into one dataframe, for column names see below

    Daily_loggerdata.csv all data aggregated to daily mean, max and min values, for column names see below

    CG_site_distance_pairs.csv all logger positions for each stream and their pairwise geographical distance in meters

    Discharge_site7.csv Discharge data for the same season as logger data from a reference stream

    buffer_width_eniro_CG.csv measured and averaged buffer widths for each of the studied streams (in m)

    • Scripts

    01_compile_clean_loggerdata.r

    02_aggregate_loggerdata.r

    03_model_stream_temp_summer.r

    03b_model_stream_temp_autumn.r

    04_calculate_warming_cooling_rates_summer.r

    04b_calculate_warming_cooling_rates_autumn.r

    05_model_air_temp_summer.r

    05b_model_air_temp_autumn.r

    06_plot_representative_time_series_temp_discharge.r

    ****** Column names ********************************

    Most column names are self explaining, and are also explained in the R code.

    Below some detailed info on two dataframes (.csv) - the column names are similar in other csv files

    File "loggerdata_compiled.csv" [in Data/clean/ ]

    "Logger.SN" Logger serial number

    "Timestamp" Datetime, YYYY-MM-DD HH:MM:SS

    "Temp" temperature in °C

    "Illum" light in lux

    "Year" YYYY

    "Month" MM

    "Day" DD

    "Hour" HH

    "Minute" MM

    "Second" SS

    "tz" time zone

    "path" file path

    "site" stream/site ID

    "file" file name

    "medium" "water" or "air"

    "position" one of 6 positions along the stream: up, mid, end, 20, 70, 150

    "date" YYYY-MM-DD

    File "Daily_loggerdata.csv" [in Data/clean/ ]

    "date" ... (see above)

    "Logger.SN" Logger serial number

    "mean_temp" mean daily temperature

    "min_temp" minimum daily temperature

    "max_temp" maximum daily temperature

    "path" ...

    "site" ...

    "file" ...

    "medium" ...

    "position" ...

    "buffer" one of 3 buffer categories: no, thin, wide

    "Temp.max.ref" maximum daily temperature of the upstream reference logger

    "Temp.min.ref" minimum daily temperature of the upstream reference logger

    "Temp.mean.ref" mean daily temperature of the upstream reference logger

    "Temp.max.dev" max. temperature difference to upstream reference

    "Temp.min.dev" min. temperature difference to upstream reference

    "Temp.mean.dev" mean temperature difference to upstream reference

    Paper abstract:

    Clearcutting increases temperatures of forest streams, and in temperate zones, the effects can extend far downstream. Here, we studied whether similar patterns are found in colder, boreal zones and if riparian buffers can prevent stream water from heating up. We recorded temperature at 45 locations across nine streams with varying buffer widths. In these streams, we compared upstream (control) reaches with reaches in clearcuts and up to 150 m downstream. In summer, we found daily maximum water temperature increases on clearcuts up to 4.1 °C with the warmest week ranging from 12.0 to 18.6 °C. We further found that warming was sustained downstream of clearcuts to 150 m in three out of six streams with buffers < 10 m. Surprisingly, temperature patterns in autumn resembled those in summer, yet with lower absolute temperatures (maximum warming was 1.9 °C in autumn). Clearcuts in boreal forests can indeed warm streams, and because these temperature effects are propagated downstream, we risk catchment-scale effects and cumulative warming when streams pass through several clearcuts. In this study, riparian buffers wider than 15 m protected against water temperature increases; hence, we call for a general increase of riparian buffer width along small streams in boreal forests.

  18. Datasets for the Carpentry-style RNA-seq lesson

    • zenodo.org
    application/gzip, tsv +1
    Updated Mar 22, 2023
    Cite
    tijs bliek; marc galland (2023). Datasets for the Carpentry-style RNA-seq lesson [Dataset]. http://doi.org/10.5281/zenodo.6205896
    Explore at:
    tsv, application/gzip, txtAvailable download formats
    Dataset updated
    Mar 22, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    tijs bliek; marc galland
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Lesson files

    For all compressed files, go to the Shell and uncompress using `tar -xzvf myarchive.tar.gz`.

    1) Bioinformatic files: bioinformatic_tutorial_files.tar.gz

    This archive contains the following datasets:

    FASTQ files from Arabidopsis leaf RNA-seq:

    • Arabidopsis_sample3.fq.gz
    • Arabidopsis_sample1.fq.gz
    • Arabidopsis_sample4.fq.gz
    • Arabidopsis_sample2.fq.gz

    Arabidopsis thaliana genome assembly and genome annotation:

    • AtChromosome1.fa.gz
    • ath_annotation.gff3.gz

    The sequence of sequencing adapters in adapters.fasta.

    2) Gene counts usable with DESeq2 and R: tutorial.tar.gz

    This archive contains the following datasets:

    • raw_counts.csv: a dataframe of the sample raw counts. It is a comma-separated values file, therefore data are separated by commas ','.
    • samples_to_conditions.csv: a dataframe that indicates the correspondence between samples and experimental conditions (e.g. control, treated).
    • differential_genes.csv: a dataframe that contains the result of the DESeq2 analysis specifying this contrast in the `DESeq2::results()` function: `contrast = c("infected", "Pseudomonas_syringae_DC3000", "mock")`

    The raw_counts.csv file was obtained by running the `v0.1.1` version of a RNA-Seq bioinformatic pipeline on the mRNA-Seq sequencing files from Vogel et al. (2016): https://www.ebi.ac.uk/ena/data/view/PRJEB13938.

    Please read the original study (Vogel et al. 2016): https://nph.onlinelibrary.wiley.com/doi/full/10.1111/nph.14036

    ====

    Exercise files

    1) NASA spaceflight

    The NASA GeneLab experiment GLDS-38 performed transcriptomics and proteomics of Arabidopsis seedlings in microgravity by sending seedlings to the International Space Station (ISS).

    The raw counts, scaled counts and sample to conditions files are available in the ZIP archive

    2) Deforges 2019 hormone-treatments: deforges_2019.tar.gz

    This archive contains:

    • arabidopsis_root_hormones_raw_counts.csv
    • arabidopsis_root_hormones_sample2condition.csv
    • dataset01_IAA_arabidopsis_root_raw_counts.csv
    • dataset02_ABA_arabidopsis_root_raw_counts.csv
    • dataset03_ACC_arabidopsis_root_raw_counts.csv
    • dataset04_MeJA_arabidopsis_root_raw_counts.csv

    The arabidopsis_root_hormones_raw_counts.csv file contains all gene counts from all hormones. Separate datasets were made for each hormone for convenience.

  19. Restaurant Dish Orders in Power BI

    • kaggle.com
    zip
    Updated Oct 30, 2024
    Cite
    Fords (2024). Restaurant Dish Orders in Power BI [Dataset]. https://www.kaggle.com/datasets/fords001/restaurant-dish-orders
    Explore at:
    zip(620177 bytes)Available download formats
    Dataset updated
    Oct 30, 2024
    Authors
    Fords
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    In this data analysis, I used the dataset 'Restaurant Orders' from https://mavenanalytics.io/data-playground, which is released under a Public Domain license. Public domain work is free for anyone to use for any purpose without restriction under copyright law; it is the most open/free form of licensing, since no one owns or controls the material in any way. The 'Restaurant Orders' dataset has 3 dataframes in csv format: 'restaurant_db_data_dictionary.csv' describes the relationships between tables; 'order_details.csv' has the columns order_details_id, order_id, order_date, order_time, item_id; 'menu_items.csv' has the columns menu_item_id, item_name, category, price.

    Using these 3 dataframes we will create a new dataframe 'order_details_table' (the result dataframe in the Power BI file restaurant_orders_result.pbix). Based on this new dataframe, we will generate various chart visualizations in the file restaurant_orders_result_charts.pbix and also attach the charts here. Below is a more detailed description of how I created the new dataframe 'order_details_table' and the visualizations, including bar charts and pie charts.

    I will use Power BI in this project.

    1. Delete all rows whose value is 'NULL' in the column 'item_id' of the dataframe 'order_details'. For this, I use the Power Query Editor and the 'Keep Rows' function, keeping all rows except 'NULL' values.
    2. Combine the 2 columns 'order_date' and 'order_time' into 1 column 'order_date_time' in the format MM/DD/YY HH:MM:SS.
    3. Merge two dataframes into one dataframe 'order_details_table' using the 'Merge Queries' function in the Power Query Editor with an inner join (only matching rows). In the dataframe 'restaurant_db_data_dictionary.csv' we find that the column 'item_id' from the 'order_details' table matches 'menu_item_id' in the 'menu_items' table, so we combine the 2 tables on the common columns 'menu_item_id' and 'item_id'.
    4. Remove the columns we don't need and create a new 'order_id' with a unique number for each order.

    As a result we have 6 columns in the new dataframe 'order_details_table':

    • order_details_id: a unique identifier for each dish within an order
    • order_id: the unique identifier for each order or transaction
    • order_date_time: the date when the order was created, in the format MM/DD/YY HH:MM:SS
    • menu_item_category: the category to which the dish belongs
    • menu_item_name: the name of the dish on the menu
    • menu_item_price: the price of the dish

    Table 'order_details_table' from the Power BI file restaurant_orders_result.pbix [screenshot: dataframe.png]

    I have also created bar charts and pie charts to display the results from the new dataframe. These plots are included in the file 'restaurant_orders_result_charts.pbix', and you can find pictures of the charts below.

    [Chart screenshots: picture_1.png, picture_2.png, picture_3.png, picture_4.png]

    I also attached the original and new files to this project, thank you.

  20. Dataset with four years of condition monitoring technical language...

    • gimi9.com
    Updated Jan 8, 2024
    + more versions
    Cite
    (2024). Dataset with four years of condition monitoring technical language annotations from paper machine industries in northern Sweden | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_https-doi-org-10-5878-hafd-ms27/
    Explore at:
    Dataset updated
    Jan 8, 2024
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Sweden
    Description

    This dataset consists of four years of technical language annotations from two paper machines in northern Sweden, structured as a Pandas dataframe. The same data is also available as a semicolon-separated .csv file. The data consists of two columns, where the first column corresponds to annotation note contents, and the second column corresponds to annotation titles. The annotations are in Swedish, and processed so that all mentions of personal information are replaced with the string 'egennamn', meaning "personal name" in Swedish. Each row corresponds to one annotation with the corresponding title. Data can be accessed in Python with:

    import pandas as pd
    annotations_df = pd.read_pickle("Technical_Language_Annotations.pkl")
    annotation_contents = annotations_df['noteComment']
    annotation_titles = annotations_df['title']
