100+ datasets found
  1. Ecommerce Dataset for Data Analysis

    • kaggle.com
    zip
    Updated Sep 19, 2024
    Cite
    Shrishti Manja (2024). Ecommerce Dataset for Data Analysis [Dataset]. https://www.kaggle.com/datasets/shrishtimanja/ecommerce-dataset-for-data-analysis/code
    Explore at:
    Available download formats: zip (2028853 bytes)
    Dataset updated
    Sep 19, 2024
    Authors
    Shrishti Manja
    Description

    This dataset contains 55,000 entries of synthetic customer transactions, generated using Python's Faker library. The goal behind creating this dataset was to provide a resource for learners like myself to explore, analyze, and apply various data analysis techniques in a context that closely mimics real-world data.

    About the Dataset:
    • CID (Customer ID): A unique identifier for each customer.
    • TID (Transaction ID): A unique identifier for each transaction.
    • Gender: The gender of the customer, categorized as Male or Female.
    • Age Group: Age group of the customer, divided into several ranges.
    • Purchase Date: The timestamp of when the transaction took place.
    • Product Category: The category of the product purchased, such as Electronics, Apparel, etc.
    • Discount Availed: Indicates whether the customer availed any discount (Yes/No).
    • Discount Name: Name of the discount applied (e.g., FESTIVE50).
    • Discount Amount (INR): The amount of discount availed by the customer.
    • Gross Amount: The total amount before applying any discount.
    • Net Amount: The final amount after applying the discount.
    • Purchase Method: The payment method used (e.g., Credit Card, Debit Card, etc.).
    • Location: The city where the purchase took place.

    Use Cases:
    1. Exploratory Data Analysis (EDA): This dataset is ideal for conducting EDA, allowing users to practice techniques such as summary statistics, visualizations, and identifying patterns within the data.
    2. Data Preprocessing and Cleaning: Learners can work on handling missing data, encoding categorical variables, and normalizing numerical values to prepare the dataset for analysis.
    3. Data Visualization: Use tools like Python’s Matplotlib, Seaborn, or Power BI to visualize purchasing trends, customer demographics, or the impact of discounts on purchase amounts.
    4. Machine Learning Applications: After applying feature engineering, this dataset is suitable for supervised learning models, such as predicting whether a customer will avail a discount or forecasting purchase amounts based on the input features.
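
    As a first cleaning step of the kind listed above, one might check that Net Amount equals Gross Amount minus Discount Amount (INR). The sketch below is only an illustration: the sample rows are invented, and the assumption that the CSV headers match the field names in the description exactly has not been verified against the Kaggle file.

```python
import csv
import io

# Invented sample rows that only mimic the described columns;
# they are NOT taken from the actual Kaggle file.
SAMPLE_CSV = """CID,TID,Discount Availed,Discount Amount (INR),Gross Amount,Net Amount
1001,9001,Yes,50.0,500.0,450.0
1002,9002,No,0.0,120.0,120.0
1003,9003,Yes,30.0,300.0,275.0
"""


def find_inconsistent_rows(csv_text, tolerance=0.01):
    """Return the TIDs whose Net Amount differs from Gross Amount - Discount Amount."""
    bad = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        gross = float(row["Gross Amount"])
        discount = float(row["Discount Amount (INR)"])
        net = float(row["Net Amount"])
        if abs((gross - discount) - net) > tolerance:
            bad.append(row["TID"])
    return bad


inconsistent = find_inconsistent_rows(SAMPLE_CSV)
print(inconsistent)
```

    In the invented sample, only the third transaction is flagged, because 300 minus 30 is not 275.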

    This dataset provides an excellent sandbox for honing skills in data analysis, machine learning, and visualization in a structured but flexible manner.

    This is not a real dataset. This dataset was generated using Python's Faker library for the sole purpose of learning.

  2. Orange dataset table

    • figshare.com
    xlsx
    Updated Mar 4, 2022
    Cite
    Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Mar 4, 2022
    Dataset provided by
    figshare
    Figshare: http://figshare.com/
    Authors
    Rui Simões
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced and contains no missing values, and the data were standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.
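
    The description does not say which standardization was applied. A common choice is per-feature z-scoring, sketched below under that assumption; the input values are hypothetical and the function name is illustrative only.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        mu = mean(col)
        sigma = pstdev(col)
        # A constant column has sigma == 0; map it to zeros instead of dividing.
        scaled_columns.append([(v - mu) / sigma if sigma else 0.0 for v in col])
    return [list(r) for r in zip(*scaled_columns)]


# Hypothetical measurements: 4 samples x 2 features (not from the dataset).
raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
scaled = standardize_columns(raw)
print(scaled)
```

    After this transformation every feature contributes on the same scale, which matters for distance-based steps such as clustering with Euclidean distances.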

    Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
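
    For readers unfamiliar with the metrics named above, the following sketch computes accuracy together with one-vs-rest precision, recall, and F-measure for a single class. It is not the Orange implementation; the labels and predictions are invented for illustration.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy plus one-vs-rest precision, recall and F-measure for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Invented labels and predictions, loosely named after three of the classes.
y_true = ["Control", "Control", "6-OHDA 25", "6-OHDA 25", "rotenone 0.25", "rotenone 0.25"]
y_pred = ["Control", "6-OHDA 25", "6-OHDA 25", "6-OHDA 25", "rotenone 0.25", "Control"]
metrics = classification_metrics(y_true, y_pred, positive="6-OHDA 25")
print(metrics)
```

    With nine classes, such per-class values would typically be averaged across classes; the averaging scheme used in the study is not stated here.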

  3. Data from: Supplementary Material for "Sonification for Exploratory Data...

    • pub.uni-bielefeld.de
    • search.datacite.org
    Updated Feb 5, 2019
    Cite
    Thomas Hermann (2019). Supplementary Material for "Sonification for Exploratory Data Analysis" [Dataset]. https://pub.uni-bielefeld.de/record/2920448
    Explore at:
    Dataset updated
    Feb 5, 2019
    Authors
    Thomas Hermann
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Sonification for Exploratory Data Analysis

    Chapter 8: Sonification Models

    In Chapter 8 of the thesis, 6 sonification models are presented to give some examples for the framework of Model-Based Sonification, developed in Chapter 7. Sonification models determine the rendering of the sonification and possible interactions. The "model in mind" helps the user to interpret the sound with respect to the data.

    8.1 Data Sonograms

    Data Sonograms use spherical expanding shock waves to excite linear oscillators which are represented by point masses in model space.

    • Table 8.2, page 87: Sound examples for Data Sonograms
    File:
    Iris dataset: started in plot (a) at S0 (https://pub.uni-bielefeld.de/download/2920448/2920454) (b) at S1 (c) at S2
    10d noisy circle dataset: started in plot (c) at S0 (mean) (https://pub.uni-bielefeld.de/download/2920448/2920451) (d) at S1 (edge)
    10d Gaussian: plot (d) started at S0
    3 clusters: Example 1
    3 clusters: invisible columns used as output variables: Example 2 (https://pub.uni-bielefeld.de/download/2920448/2920450)
    Description:
    Data Sonogram Sound examples for synthetic datasets and the Iris dataset
    Duration:
    about 5 s
    8.2 Particle Trajectory Sonification Model

    This sonification model explores features of a data distribution by computing the trajectories of test particles which are injected into model space and move according to Newton's laws of motion in a potential given by the dataset.

    • Sound example: page 93, PTSM-Ex-1 Audification of 1 particle in the potential of phi(x).
    • Sound example: page 93, PTSM-Ex-2 Audification of a sequence of 15 particles in the potential of a dataset with 2 clusters.
    • Sound example: page 94, PTSM-Ex-3 Audification of 25 particles simultaneously in a potential of a dataset with 2 clusters.
    • Sound example: page 94, PTSM-Ex-4 Audification of 25 particles simultaneously in a potential of a dataset with 1 cluster.
    • Sound example: page 95, PTSM-Ex-5 sigma-step sequence for a mixture of three Gaussian clusters
    • Sound example: page 95, PTSM-Ex-6 sigma-step sequence for a Gaussian cluster
    • Sound example: page 96, PTSM-Iris-1 Sonification for the Iris Dataset with 20 particles per step.
    • Sound example: page 96, PTSM-Iris-2 Sonification for the Iris Dataset with 3 particles per step.
    • Sound example: page 96, PTSM-Tetra-1 Sonification for a 4d tetrahedron clusters dataset.
    8.3 Markov chain Monte Carlo Sonification

    The McMC Sonification Model defines an exploratory process in the domain of a given density p such that the acoustic representation summarizes features of p, particularly concerning the modes of p by sound.
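
    To make the idea of an exploratory process in the domain of a density p concrete, here is a minimal random-walk Metropolis sketch. It covers only the sampling step, not any sound rendering, and it is not taken from the thesis: the two-mode density, the step size, and all function names are assumptions chosen for illustration.

```python
import math
import random


def density(x):
    """Unnormalized example density with modes near -2 and +2 (an assumption)."""
    return math.exp(-0.5 * (x + 2.0) ** 2) + math.exp(-0.5 * (x - 2.0) ** 2)


def metropolis(p, start, steps, step_size, rng):
    """Random-walk Metropolis chain targeting the density p."""
    x = start
    samples = []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < p(proposal) / p(x):
            x = proposal
        samples.append(x)
    return samples


rng = random.Random(0)  # fixed seed so the run is repeatable
chain = metropolis(density, start=0.0, steps=5000, step_size=1.0, rng=rng)

# Fraction of the chain that lies within distance 1 of either mode.
near_modes = sum(1 for x in chain if abs(abs(x) - 2.0) < 1.0) / len(chain)
print(round(near_modes, 2))
```

    Because the chain spends most of its time near the modes of p, any signal derived from the visited positions would be dominated by those regions, which is the property the model description emphasizes.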

    • Sound Example: page 105, MCMC-Ex-1 McMC Sonification, stabilization of amplitudes.
    • Sound Example: page 106, MCMC-Ex-2 Trajectory Audification for 100 McMC steps in 3 cluster dataset
    • McMC Sonification for Cluster Analysis, dataset with three clusters, page 107
    • McMC Sonification for Cluster
  4. 2010 Census: Iowa Population by ZCTA

    • kaggle.com
    zip
    Updated May 2, 2020
    Cite
    Mark Mucchetti (2020). 2010 Census: Iowa Population by ZCTA [Dataset]. https://www.kaggle.com/markmucchetti/2010-census-iowa-population-by-zcta
    Explore at:
    Available download formats: zip (8057 bytes)
    Dataset updated
    May 2, 2020
    Authors
    Mark Mucchetti
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Iowa
    Description

    Context

    This is a sample dataset derived from 2010 U.S. Government Census data. It is intended to be used in combination with example analyses on the public dataset "Iowa Liquor Sales", available as a Google Public Dataset, on Kaggle, and at https://data.iowa.gov/Sales-Distribution/Iowa-Liquor-Sales/m3tr-qhgy.

    Usage

    This dataset is intended for use as an example. Columns have purposely been left unfiltered (no string manipulation has been applied) so that users can explore joining data between two pandas DataFrames and doing further processing.

    Because this data is at the Zip Code Tabulation Area (ZCTA) level, additional processing is required to join it with general-purpose datasets, which may be specified at the zip code, county name, county FIPS code, or coordinate level. This is intentional.
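
    The kind of additional processing described above can be sketched with plain dictionaries. Everything in the example is hypothetical: the rows, the column names, and in particular the assumed "ZCTA5 50001" label format, which may not match the actual file. ZCTAs and ZIP codes also do not always coincide, so unmatched rows should be expected.

```python
# Hypothetical rows; neither the values nor the key formats come from the dataset.
population_by_zcta = [
    {"zcta": "ZCTA5 50001", "population": 1200},
    {"zcta": "ZCTA5 50002", "population": 800},
]
sales_by_zip = [
    {"zip_code": "50001", "bottles_sold": 340},
    {"zip_code": "50003", "bottles_sold": 95},
]


def normalize_zcta(label):
    """Keep only the trailing 5-character code, e.g. 'ZCTA5 50001' -> '50001'."""
    return label.strip()[-5:]


def left_join(sales, population):
    """Attach the population to each sales row; None when no ZCTA matches."""
    pop_lookup = {normalize_zcta(r["zcta"]): r["population"] for r in population}
    joined = []
    for row in sales:
        merged = dict(row)
        merged["population"] = pop_lookup.get(row["zip_code"])
        joined.append(merged)
    return joined


joined = left_join(sales_by_zip, population_by_zcta)
print(joined)
```

    With pandas, the same idea would be to derive a normalized key column on one DataFrame and then merge the two frames on it.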

  5. Data from: The Often-Overlooked Power of Summary Statistics in Exploratory...

    • acs.figshare.com
    xlsx
    Updated Jun 8, 2023
    Cite
    Tahereh G. Avval; Behnam Moeini; Victoria Carver; Neal Fairley; Emily F. Smith; Jonas Baltrusaitis; Vincent Fernandez; Bonnie. J. Tyler; Neal Gallagher; Matthew R. Linford (2023). The Often-Overlooked Power of Summary Statistics in Exploratory Data Analysis: Comparison of Pattern Recognition Entropy (PRE) to Other Summary Statistics and Introduction of Divided Spectrum-PRE (DS-PRE) [Dataset]. http://doi.org/10.1021/acs.jcim.1c00244.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    ACS Publications
    Authors
    Tahereh G. Avval; Behnam Moeini; Victoria Carver; Neal Fairley; Emily F. Smith; Jonas Baltrusaitis; Vincent Fernandez; Bonnie. J. Tyler; Neal Gallagher; Matthew R. Linford
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Unsupervised exploratory data analysis (EDA) is often the first step in understanding complex data sets. While summary statistics are among the most efficient and convenient tools for exploring and describing sets of data, they are often overlooked in EDA. In this paper, we show multiple case studies that compare the performance, including clustering, of a series of summary statistics in EDA. The summary statistics considered here are pattern recognition entropy (PRE), the mean, standard deviation (STD), 1-norm, range, sum of squares (SSQ), and X4, which are compared with principal component analysis (PCA), multivariate curve resolution (MCR), and/or cluster analysis. PRE and the other summary statistics are direct methods for analyzing data; they are not factor-based approaches. To quantify the performance of summary statistics, we use the concept of the “critical pair,” which is employed in chromatography. The data analyzed here come from different analytical methods. Hyperspectral images, including one of a biological material, are also analyzed. In general, PRE outperforms the other summary statistics, especially in image analysis, although a suite of summary statistics is useful in exploring complex data sets. While PRE results were generally comparable to those from PCA and MCR, PRE is easier to apply. For example, there is no need to determine the number of factors that describe a data set. Finally, we introduce the concept of divided spectrum-PRE (DS-PRE) as a new EDA method. DS-PRE increases the discrimination power of PRE. We also show that DS-PRE can be used to provide the inputs for the k-nearest neighbor (kNN) algorithm. We recommend PRE and DS-PRE as rapid new tools for unsupervised EDA.
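
    Several of the summary statistics named above reduce a spectrum to a single number and are simple to compute. The sketch below is an illustration, not the authors' code. It assumes that PRE is the Shannon entropy (base 2) of a non-negative spectrum normalized to unit sum and that STD is the population standard deviation; neither definition is spelled out in this abstract. X4 is omitted because it is not defined here.

```python
import math
from statistics import mean, pstdev


def summary_statistics(spectrum):
    """Single-number summaries of one non-negative spectrum."""
    total = sum(spectrum)
    # Assumed PRE definition: Shannon entropy of the unit-sum spectrum.
    probs = [v / total for v in spectrum if v > 0]
    return {
        "mean": mean(spectrum),
        "std": pstdev(spectrum),
        "one_norm": sum(abs(v) for v in spectrum),
        "range": max(spectrum) - min(spectrum),
        "ssq": sum(v * v for v in spectrum),
        "pre": -sum(p * math.log2(p) for p in probs),
    }


# Two hypothetical four-channel spectra with the same total intensity.
flat = summary_statistics([1.0, 1.0, 1.0, 1.0])
peaked = summary_statistics([0.0, 0.0, 4.0, 0.0])
print(flat["pre"], peaked["pre"])
```

    Under these assumptions the flat spectrum has the largest possible entropy for four channels, while the single-peak spectrum has entropy zero, even though both share the same mean and 1-norm.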

  6. Data from: Multivariate Outliers and the O3 Plot

    • figshare.com
    • tandf.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Antony Unwin (2023). Multivariate Outliers and the O3 Plot [Dataset]. http://doi.org/10.6084/m9.figshare.7792115.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Antony Unwin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Identifying and dealing with outliers is an important part of data analysis. A new visualization, the O3 plot, is introduced to aid in the display and understanding of patterns of multivariate outliers. It uses the results of identifying outliers for every possible combination of dataset variables to provide insight into why particular cases are outliers. The O3 plot can be used to compare the results from up to six different outlier identification methods. There is an R package, OutliersO3, implementing the plot. The article is illustrated with outlier analyses of German demographic and economic data. Supplementary materials for this article are available online.
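
    The central idea, running an outlier check on every possible combination of variables, can be sketched in a few lines. The example below is written in Python rather than R and does not reproduce the OutliersO3 package. Its flagging rule (root mean squared z-score above a threshold) is a deliberately crude stand-in chosen for brevity, and the data are invented.

```python
from itertools import combinations
from statistics import mean, pstdev


def zscores(values):
    """Population z-scores; a constant column maps to zeros."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def outliers_by_combination(data, threshold=2.0):
    """Map each non-empty variable combination to the set of flagged case indices."""
    z = {name: zscores(vals) for name, vals in data.items()}
    names = sorted(data)
    n_cases = len(data[names[0]])
    result = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            # Stand-in rule: mean squared z-score over the combination
            # exceeds threshold squared.
            result[combo] = {
                i for i in range(n_cases)
                if sum(z[v][i] ** 2 for v in combo) > k * threshold ** 2
            }
    return result


# Invented data: case 8 is extreme on variable "a" only.
data = {
    "a": [1, 2, 3, 2, 1, 2, 3, 2, 20],
    "b": [5, 6, 5, 6, 5, 6, 5, 6, 5],
}
flags = outliers_by_combination(data)
print(flags)
```

    Tabulating which cases are flagged for which combinations shows that case 8 stands out whenever variable "a" is involved, which is the sort of explanation the plot is meant to support.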

  7. Calculus Video Worked Example Data

    • kaggle.com
    zip
    Updated Aug 15, 2023
    + more versions
    Cite
    Jocelyn Dumlao (2023). Calculus Video Worked Example Data [Dataset]. https://www.kaggle.com/datasets/jocelyndumlao/calculus-video-worked-example-data
    Explore at:
    Available download formats: zip (2757 bytes)
    Dataset updated
    Aug 15, 2023
    Authors
    Jocelyn Dumlao
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Description:

    Summary data from a Calculus II class where students were required to watch an instructional video before or after the lecture. The dataset includes gender (1=female; 2=male), vgroup (-1=before lecture; 1=after lecture), binary flag for 26 individual videos (1=watched 80% or more of length of video; 0=not watched), videosum (sum of number of videos watched), final_raw (raw grade student received on cumulative final course exam), sat_math (scaled SAT-Math score out of 800), math_place (institutional calculus readiness score out of 100), watched20 (grouping flag for students who watched 20 or more videos).
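
    The two derived fields follow directly from the 26 per-video flags, as the sketch below illustrates. The 1/0 coding of watched20 and the function name are assumptions, and the example student is hypothetical.

```python
def derive_video_fields(flags):
    """flags: 26 zeros/ones, one per video (1 = watched 80% or more of the video)."""
    if len(flags) != 26 or any(f not in (0, 1) for f in flags):
        raise ValueError("expected 26 binary flags")
    videosum = sum(flags)
    # watched20 groups students who watched 20 or more videos (1/0 coding assumed).
    return {"videosum": videosum, "watched20": int(videosum >= 20)}


# Hypothetical student who watched 21 of the 26 videos.
student = [1] * 21 + [0] * 5
derived = derive_video_fields(student)
print(derived)
```

    Recomputing these fields from the raw flags is a quick way to validate the file before relating them to final_raw, sat_math, or math_place.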

    Categories:

    Mathematics Education

    Acknowledgements:

    DeFranco, Thomas; Judd, Jamison

  8. Whistlerlib: a distributed computing library for exploratory data analysis...

    • repositorio.observatoriogeo.mx
    Updated Oct 21, 2025
    Cite
    (2025). Whistlerlib: a distributed computing library for exploratory data analysis on large social network datasets - Dataset - Repositorio del Observatorio Metropolitano CentroGeo [Dataset]. http://repositorio.observatoriogeo.mx/dataset/1ee805b50082
    Explore at:
    Dataset updated
    Oct 21, 2025
    Description

    At least 350k posts are published on X, 510k comments are posted on Facebook, and 66k pictures and videos are shared on Instagram each minute. These large datasets require substantial processing power, even if only a percentage is collected for analysis and research. To face this challenge, data scientists can now use computer clusters deployed on various IaaS and PaaS services in the cloud. However, scientists still have to master the design of distributed algorithms and be familiar with using distributed computing programming frameworks. It is thus essential to generate tools that provide analysis methods to leverage the advantages of computer clusters for processing large amounts of social network text. This paper presents Whistlerlib, a new Python library for conducting exploratory analysis on large text datasets on social networks. Whistlerlib implements distributed versions of various social media, sentiment, and social network analysis methods that can run atop computer clusters. We experimentally demonstrate the scalability of the various Whistlerlib distributed methods when deployed on a public cloud platform. We also present a practical example of the analysis of posts on the social network X about the Mexico City subway to showcase the features of Whistlerlib in scenarios where social network analysis tools are needed to address issues with a social dimension.

  9. Exploratory data analysis of infrared spectra from 3D-printing polymers

    • researchdata.edu.au
    Updated Oct 31, 2025
    Cite
    Simon Lewis; Michael V. Adamos; Kari Pitts; Georgina Sauzier (2025). Exploratory data analysis of infrared spectra from 3D-printing polymers [Dataset]. http://doi.org/10.25917/FN6A-AZ80
    Explore at:
    Dataset updated
    Oct 31, 2025
    Dataset provided by
    Curtin University
    Authors
    Simon Lewis; Michael V. Adamos; Kari Pitts; Georgina Sauzier
    Description

    Data description: This dataset consists of spectroscopic data files and associated R-scripts for exploratory data analysis. Attenuated total reflectance Fourier transform infrared (ATR-FTIR) spectra were collected from 67 samples of polymer filaments potentially used to produce illicit 3D-printed items. Principal component analysis (PCA) was used to determine if any individual filaments gave distinctive spectral signatures, potentially allowing traceability of 3D-printed items for forensic purposes. The project also investigated potential chemical variations induced by the filament manufacturing or 3D-printing process. Data was collected and analysed by Michael Adamos at Curtin University (Perth, Western Australia), under the supervision of Dr Georgina Sauzier and Prof. Simon Lewis and with specialist input from Dr Kari Pitts.

    Data collection time details: 2024
    Number of files/types: 3 .R files, 702 .JDX files
    Geographic information (if relevant): Australia
    Keywords: 3D printing, polymers, infrared spectroscopy, forensic science

  10. ftmsRanalysis: An R package for exploratory data analysis and interactive...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Jun 1, 2023
    Cite
    Lisa M. Bramer; Amanda M. White; Kelly G. Stratton; Allison M. Thompson; Daniel Claborne; Kirsten Hofmockel; Lee Ann McCue (2023). ftmsRanalysis: An R package for exploratory data analysis and interactive visualization of FT-MS data [Dataset]. http://doi.org/10.1371/journal.pcbi.1007654
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Lisa M. Bramer; Amanda M. White; Kelly G. Stratton; Allison M. Thompson; Daniel Claborne; Kirsten Hofmockel; Lee Ann McCue
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The high resolution and mass accuracy of Fourier transform mass spectrometry (FT-MS) have made it an increasingly popular technique for discerning the composition of soil, plant and aquatic samples containing complex mixtures of proteins, carbohydrates, lipids, lignins, hydrocarbons, phytochemicals and other compounds. Thus, there is a growing demand for informatics tools to analyze FT-MS data that will aid investigators seeking to understand the availability of carbon compounds to biotic and abiotic oxidation and to compare fundamental chemical properties of complex samples across groups. We present ftmsRanalysis, an R package which provides an extensive collection of data formatting and processing, filtering, visualization, and sample and group comparison functionalities. The package provides a suite of plotting methods and enables expedient, flexible and interactive visualization of complex datasets through functions which link to a powerful and interactive visualization user interface, Trelliscope. Example analysis using FT-MS data from a soil microbiology study demonstrates the core functionality of the package and highlights the capabilities for producing interactive visualizations.

  11. Robust Reproducible Network Exploration

    • tandf.figshare.com
    pdf
    Updated Oct 1, 2025
    Cite
    Masaki Toyoda; Yoshimasa Uematsu (2025). Robust Reproducible Network Exploration [Dataset]. http://doi.org/10.6084/m9.figshare.30259039.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Oct 1, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Masaki Toyoda; Yoshimasa Uematsu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We propose a novel methodology for discovering the presence of relationships realized as binary time series between variables in high dimension. To make it visually intuitive, we regard the existence of a relationship as an edge connection, and call a collection of such edges a network. Our objective is thus rephrased as uncovering the network by selecting relevant edges, referred to as the network exploration. Our methodology is based on multiple testing for the presence or absence of each edge, designed to ensure statistical reproducibility via controlling the false discovery rate (FDR). In particular, we carefully construct p-variables, and apply the Benjamini-Hochberg (BH) procedure. We show that the BH with our p-variables controls the FDR under arbitrary dependence structure with any sample size and dimension, and has asymptotic power one under mild conditions. The validity is also confirmed by simulations and a real data example.
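
    The Benjamini-Hochberg step-up rule mentioned above is short enough to sketch. The example implements only the textbook procedure applied to ready-made values; it does not reproduce the authors' construction of p-variables. The input numbers are invented, and each index would stand for one candidate edge.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices rejected by the BH step-up rule at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Remember the largest rank whose p-value is below its threshold.
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    # Every hypothesis ranked at or below the cutoff is rejected.
    return sorted(order[:cutoff_rank])


# Invented p-values, one per candidate edge.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
selected = benjamini_hochberg(p_vals, alpha=0.05)
print(selected)
```

    The rule is step-up: a hypothesis can be rejected even if its own p-value exceeds its threshold, provided a larger p-value further down the ranking meets its own threshold.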

  12. FPS/Status: ASL data and calculations

    • figshare.com
    pdf
    Updated Dec 22, 2024
    Cite
    Kiva Bennett (2024). FPS/Status: ASL data and calculations [Dataset]. http://doi.org/10.6084/m9.figshare.27146745.v3
    Explore at:
    Available download formats: pdf
    Dataset updated
    Dec 22, 2024
    Dataset provided by
    figshare
    Figshare: http://figshare.com/
    Authors
    Kiva Bennett
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains the files used in the ASL analyses of my study: all of the data and calculations for my primary analysis, my exploratory analyses (except the one using a video from The Daily Moth, which can be found in a separate folder), and the ASL portions of my secondary analysis. As described in my dissertation, I am not sharing the original video files in order to protect the privacy of those who participated in my study.

    Each file is shared in one or more of the formats listed below, as appropriate:
    • PDF
    • .csv files (one file for each sheet)
    • Link to my Google Sheets file

  13. Data from: MAIN MINERALS AND ORGANIC COMPOUNDS IN COMMERCIAL ROASTED AND...

    • datasetcatalog.nlm.nih.gov
    • scielo.figshare.com
    Updated Mar 24, 2021
    Cite
    Flores, Eder Lisandro Moraes; Leite, Oldair Donizete; Canan, Cristiane; Kalschne, Daneysa Lahis; de Toledo Benassi, Marta; Silva, Nathalia Karen (2021). MAIN MINERALS AND ORGANIC COMPOUNDS IN COMMERCIAL ROASTED AND GROUND COFFEE: AN EXPLORATORY DATA ANALYSIS [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000870684
    Explore at:
    Dataset updated
    Mar 24, 2021
    Authors
    Flores, Eder Lisandro Moraes; Leite, Oldair Donizete; Canan, Cristiane; Kalschne, Daneysa Lahis; de Toledo Benassi, Marta; Silva, Nathalia Karen
    Description

    Coffee is one of the most popular beverages in the world; however, little information is available on the mineral composition of commercial roasted and ground (RG) coffees and its correlation with organic bioactive compounds. Twenty-one commercial Brazilian RG coffee brands, 9 traditional (T) and 12 extra strong (ES) roasted ones, were analyzed for the minerals Cu, Ca, Mn, Mg, K, Zn, and Fe and for their caffeine, 5-caffeoylquinic acid (5-CQA), and melanoidin contents. For mineral determination by flame atomic absorption spectrometry (FAAS), the samples were decomposed by microwave-assisted wet digestion. Caffeine and 5-CQA were determined by liquid chromatography and melanoidins by molecular absorption spectrometry. The association between mineral and organic compound contents in RG coffee was examined by principal component analysis. The thermostable compounds (minerals and caffeine) were related to dimensions 1 and 2, while 5-CQA and melanoidins were related to dimension 3, allowing the T coffees to be separated from the ES ones.

  14. Streaming Service Data

    • kaggle.com
    Updated Dec 19, 2024
    Cite
    Chad Wambles (2024). Streaming Service Data [Dataset]. https://www.kaggle.com/datasets/chadwambles/streaming-service-data
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Chad Wambles
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    A dataset I generated to showcase a sample set of user data for a fictional streaming service. This data is great for practicing SQL, Excel, Tableau, or Power BI.

    1000 rows and 25 columns of connected data.

    See below for column descriptions.

    Enjoy :)

  15. Data from: BikeShare Dataset

    • kaggle.com
    zip
    Updated Apr 18, 2022
    Cite
    Kenniss Dillon (2022). BikeShare Dataset [Dataset]. https://www.kaggle.com/datasets/kennissdillon/bikeshare-dataset/data
    Explore at:
    Available download formats: zip (207350631 bytes)
    Dataset updated
    Apr 18, 2022
    Authors
    Kenniss Dillon
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This data set is made available through the Google Analytics Coursera course. This data set is a part of a case study example, meant to showcase skills learned throughout the course.

  16. Data and analyses files for "To boldly go where no one has gone before –...

    • figshare.com
    png
    Updated May 31, 2023
    Cite
    Timothy Holt; Guido Grimm (2023). Data and analyses files for "To boldly go where no one has gone before – networks of moons" [Dataset]. http://doi.org/10.6084/m9.figshare.6555071.v1
    Explore at:
    Available download formats: png
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Figshare: http://figshare.com/
    Authors
    Timothy Holt; Guido Grimm
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This fileset provides the basic data and analysis files used for a blogpost on the Genealogical World of Phylogenetic Networks by Guido Grimm and Timothy Holt entitled "To boldly go where no one has gone before – networks of moons".

    Content

    Figures shown in the blogpost and a 7z-archive (7-zip.org) including:
    • different versions of the basic data matrices, including versions with code lines for the performed analysis with PAUP* (in JupiterMatrix99.simple.nex the code lines are explained to facilitate use by newbies)
    • results of the distance-based and parsimony analyses

    See Readme.txt for labelling conventions, format, and further information.

    !!Important note!! If you re-use the data provided here, make sure to (also) cite the original publication:
    Holt TR, Brown AJ, Nesvorný D, Horner J, Carter B (2018) Cladistical analysis of the Jovian and Saturnian satellite systems. Astrophysical Journal 859(2): 97, 20 pp
    Pre-print version at arXiv: 1706.0142

  17. Data from: Best Practices for Your Exploratory Factor Analysis: A Factor...

    • scielo.figshare.com
    tiff
    Updated Jun 16, 2023
    Cite
    Pablo Rogers (2023). Best Practices for Your Exploratory Factor Analysis: A Factor Tutorial [Dataset]. http://doi.org/10.6084/m9.figshare.20337249.v1
    Explore at:
    Available download formats: tiff
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    SciELO: http://www.scielo.org/
    Authors
    Pablo Rogers
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT Context: exploratory factor analysis (EFA) is one of the statistical methods most widely used in administration; however, its current practice coexists with rules of thumb and heuristics given half a century ago. Objective: the purpose of this article is to present the best practices and recent recommendations for a typical EFA in administration through a practical solution accessible to researchers. Methods: in this sense, in addition to discussing current practices versus recommended practices, a tutorial with real data on Factor is illustrated. The Factor software is still little known in the administration area, but is freeware, easy-to-use (point and click), and powerful. The step-by-step tutorial illustrated in the article, in addition to the discussions raised and an additional example, is also available in the format of tutorial videos. Conclusion: through the proposed didactic methodology (article-tutorial + video-tutorial), we encourage researchers/methodologists who have mastered a particular technique to do the same. Specifically about EFA, we hope that the presentation of the Factor software, as a first solution, can transcend the current outdated rules of thumb and heuristics, by making best practices accessible to administration researchers.

  18. Data: Anscombe's quintet

    • kaggle.com
    zip
    Updated Apr 17, 2025
    Cite
    Carl McBride Ellis (2025). Data: Anscombe's quintet [Dataset]. https://www.kaggle.com/datasets/carlmcbrideellis/data-anscombes-quartet/code
    Explore at:
    Available download formats: zip (712 bytes)
    Dataset updated
    Apr 17, 2025
    Authors
    Carl McBride Ellis
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This file is the data set from the famous publication by Francis J. Anscombe, "Graphs in Statistical Analysis", The American Statistician 27, pp. 17-21 (1973) (doi: 10.1080/00031305.1973.10478966). It consists of four data sets of 11 points each. Note the peculiarity that the same 'x' values are used for the first three data sets, and I have followed this exactly as in the original publication (originally done to save space), i.e. the first column (x123) serves as the 'x' for the next three 'y' columns: y1, y2 and y3.

    In the dataset Anscombe_quintet_data.csv there is a new column (y5) as an example of Simpson's paradox (C. McBride Ellis, "Anscombe dataset No. 5: Simpson's paradox", Zenodo, doi: 10.5281/zenodo.15209087 (2025)).
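
A minimal sketch of why these data sets are instructive: the first three share the x123 column, and their summary statistics come out nearly identical even though their scatter plots look very different. The values below are typed in from Anscombe's 1973 paper rather than read from the CSV, so the file's exact layout is not assumed here, and the helper names `mean` and `pearson` are illustrative only.

```python
from math import sqrt

# Shared x values for the first three data sets (column x123).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

# y columns as published in Anscombe (1973).
ys = {
    "y1": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "y2": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "y3": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
}


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(sxx * syy)


# Map each y column to (mean of y, correlation with x123).
stats = {name: (mean(y), pearson(x123, y)) for name, y in ys.items()}

for name, (m, r) in stats.items():
    print(f"{name}: mean={m:.2f}, r={r:.3f}")
```

Each column reports a mean close to 7.5 and a correlation with x123 close to 0.82, which is the reason plotting the data is recommended before trusting summary statistics.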

  19. Design concepts identified by experts.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jan 14, 2025
    Cite
    Daniele Pretolesi; Ilaria Stanzani; Stefano Ravera; Andrea Vian; Annalisa Barla (2025). Design concepts identified by experts. [Dataset]. http://doi.org/10.1371/journal.pone.0315216.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jan 14, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Daniele Pretolesi; Ilaria Stanzani; Stefano Ravera; Andrea Vian; Annalisa Barla
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this paper, we explore the application of Artificial Intelligence and network science methodologies in characterizing interdisciplinary disciplines, with a specific focus on the field of Italian design, taken as a paradigmatic example. Exploratory data analysis and the study of academic collaboration networks highlight how the field is evolving towards increased collaboration. Text analysis and semantic topic modelling identified the evolution of research interest over time, defining a ranking of pairs of keywords and three prominent research topics: User-Centric Experience Design, Innovative Product Design and Sustainable Service Design. Our results revealed a significant transformation in the field, with a shift from individual to collaborative research, as evidenced by the increasing complexity and collaboration within groups. We acknowledge the limitations faced by this work, suggesting that the methodology may be primarily suitable for bibliometric and more silos-like disciplines. However, we emphasize the urgency for the scientific community to address the future of research not indexed by large open-access databases like OpenAlex.

  20. Data for "Best Practices for Your Exploratory Factor Analysis: Factor...

    • data.mendeley.com
    Updated Jul 16, 2021
    Cite
    Pablo Rogers (2021). Data for "Best Practices for Your Exploratory Factor Analysis: Factor Tutorial" published by RAC-Revista de Administração Contemporânea [Dataset]. http://doi.org/10.17632/rdky78bk8r.1
    Explore at:
    Dataset updated
    Jul 16, 2021
    Authors
    Pablo Rogers
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains material related to the analysis performed in the article "Best Practices for Your Exploratory Factor Analysis: Factor Tutorial". The material includes the data used in the analyses in .dat format, the labels (.txt) of the variables used in the Factor software, the outputs (.txt) evaluated in the article, and videos (.mp4 with English subtitles) recorded for the purpose of explaining the article. The videos can also be accessed in the following playlist: https://youtube.com/playlist?list=PLDfyRtHbxiZ3R-T3H1cY8dusz273aUFVe. Below is a summary of the article:

    "Exploratory Factor Analysis (EFA) is one of the statistical methods most widely used in Administration, however, its current practice coexists with rules of thumb and heuristics given half a century ago. The purpose of this article is to present the best practices and recent recommendations for a typical EFA in Administration through a practical solution accessible to researchers. In this sense, in addition to discussing current practices versus recommended practices, a tutorial with real data on Factor is illustrated, a software that is still little known in the Administration area, but freeware, easy to use (point and click) and powerful. The step-by-step illustrated in the article, in addition to the discussions raised and an additional example, is also available in the format of tutorial videos. Through the proposed didactic methodology (article-tutorial + video-tutorial), we encourage researchers/methodologists who have mastered a particular technique to do the same. Specifically, about EFA, we hope that the presentation of the Factor software, as a first solution, can transcend the current outdated rules of thumb and heuristics, by making best practices accessible to Administration researchers"

Cite
Shrishti Manja (2024). Ecommerce Dataset for Data Analysis [Dataset]. https://www.kaggle.com/datasets/shrishtimanja/ecommerce-dataset-for-data-analysis/code

Ecommerce Dataset for Data Analysis

Exploratory Data Analysis, Data Visualisation and Machine Learning

Explore at:
Available download formats: zip (2028853 bytes)
Dataset updated
Sep 19, 2024
Authors
Shrishti Manja
Description

This dataset contains 55,000 entries of synthetic customer transactions, generated using Python's Faker library. The goal behind creating this dataset was to provide a resource for learners like myself to explore, analyze, and apply various data analysis techniques in a context that closely mimics real-world data.

About the Dataset:
- CID (Customer ID): A unique identifier for each customer.
- TID (Transaction ID): A unique identifier for each transaction.
- Gender: The gender of the customer, categorized as Male or Female.
- Age Group: Age group of the customer, divided into several ranges.
- Purchase Date: The timestamp of when the transaction took place.
- Product Category: The category of the product purchased, such as Electronics, Apparel, etc.
- Discount Availed: Indicates whether the customer availed any discount (Yes/No).
- Discount Name: Name of the discount applied (e.g., FESTIVE50).
- Discount Amount (INR): The amount of discount availed by the customer.
- Gross Amount: The total amount before applying any discount.
- Net Amount: The final amount after applying the discount.
- Purchase Method: The payment method used (e.g., Credit Card, Debit Card, etc.).
- Location: The city where the purchase took place.

Use Cases:
1. Exploratory Data Analysis (EDA): This dataset is ideal for conducting EDA, allowing users to practice techniques such as summary statistics, visualizations, and identifying patterns within the data.
2. Data Preprocessing and Cleaning: Learners can work on handling missing data, encoding categorical variables, and normalizing numerical values to prepare the dataset for analysis.
3. Data Visualization: Use tools like Python’s Matplotlib, Seaborn, or Power BI to visualize purchasing trends, customer demographics, or the impact of discounts on purchase amounts.
4. Machine Learning Applications: After applying feature engineering, this dataset is suitable for supervised learning models, such as predicting whether a customer will avail a discount or forecasting purchase amounts based on the input features.
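
As a rough illustration of the EDA and cleaning use cases, the sketch below summarizes transactions per product category and flags rows where Net Amount does not equal Gross Amount minus Discount Amount (INR). It uses only the Python standard library. The sample rows are invented for the example, and the column headers are assumed to match the field names in the description, which may differ from the headers in the actual CSV. The `summarize` helper is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; headers are assumed from the dataset description
# and may not match the real file exactly.
SAMPLE = """CID,TID,Product Category,Discount Availed,Discount Amount (INR),Gross Amount,Net Amount
1001,1,Electronics,Yes,50.0,500.0,450.0
1002,2,Apparel,No,0.0,200.0,200.0
1003,3,Electronics,No,0.0,300.0,300.0
1004,4,Apparel,Yes,20.0,120.0,100.0
"""


def summarize(rows, tolerance=0.01):
    """Return per-category statistics and the TIDs of inconsistent rows."""
    totals = defaultdict(lambda: {"n": 0, "discounted": 0, "net": 0.0})
    inconsistent = []
    for row in rows:
        gross = float(row["Gross Amount"])
        discount = float(row["Discount Amount (INR)"])
        net = float(row["Net Amount"])
        # Cleaning check: net should equal gross minus discount.
        if abs(gross - discount - net) > tolerance:
            inconsistent.append(row["TID"])
        bucket = totals[row["Product Category"]]
        bucket["n"] += 1
        bucket["discounted"] += row["Discount Availed"] == "Yes"
        bucket["net"] += net
    summary = {
        category: {
            "transactions": b["n"],
            "discount_rate": b["discounted"] / b["n"],
            "mean_net": b["net"] / b["n"],
        }
        for category, b in totals.items()
    }
    return summary, inconsistent


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
summary, inconsistent = summarize(rows)

for category, stats in summary.items():
    print(category, stats)
print("Inconsistent transaction IDs:", inconsistent)
```

To run this against the downloaded file, the `io.StringIO(SAMPLE)` argument could be swapped for an opened CSV file handle, after confirming the real header names.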

This dataset provides an excellent sandbox for honing skills in data analysis, machine learning, and visualization in a structured but flexible manner.

This is not a real dataset. It was generated using Python's Faker library for the sole purpose of learning.
