62 datasets found
  1. Data from: pmartR: Quality Control and Statistics for Mass...

    • acs.figshare.com
    • figshare.com
    xlsx
    Updated May 31, 2023
    Cite
    Kelly G. Stratton; Bobbie-Jo M. Webb-Robertson; Lee Ann McCue; Bryan Stanfill; Daniel Claborne; Iobani Godinez; Thomas Johansen; Allison M. Thompson; Kristin E. Burnum-Johnson; Katrina M. Waters; Lisa M. Bramer (2023). pmartR: Quality Control and Statistics for Mass Spectrometry-Based Biological Data [Dataset]. http://doi.org/10.1021/acs.jproteome.8b00760.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 31, 2023
    Dataset provided by
    ACS Publications
    Authors
    Kelly G. Stratton; Bobbie-Jo M. Webb-Robertson; Lee Ann McCue; Bryan Stanfill; Daniel Claborne; Iobani Godinez; Thomas Johansen; Allison M. Thompson; Kristin E. Burnum-Johnson; Katrina M. Waters; Lisa M. Bramer
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Prior to statistical analysis of mass spectrometry (MS) data, quality control (QC) of the identified biomolecule peak intensities is imperative for reducing process-based sources of variation and extreme biological outliers. Without this step, statistical results can be biased. Additionally, liquid chromatography–MS proteomics data present inherent challenges due to large amounts of missing data that require special consideration during statistical analysis. While a number of R packages exist to address these challenges individually, there is no single R package that addresses all of them. We present pmartR, an open-source R package, for QC (filtering and normalization), exploratory data analysis (EDA), visualization, and statistical analysis robust to missing data. Example analysis using proteomics data from a mouse study comparing smoke exposure to control demonstrates the core functionality of the package and highlights the capabilities for handling missing data. In particular, using a combined quantitative and qualitative statistical test, 19 proteins whose statistical significance would have been missed by a quantitative test alone were identified. The pmartR package provides a single software tool for QC, EDA, and statistical comparisons of MS data that is robust to missing data and includes numerous visualization capabilities.
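    pmartR itself is an R package; as a language-neutral illustration of one common QC step it describes (normalization robust to missing data), the sketch below median-centers each sample's peak intensities in plain Python while skipping missing values. This is not pmartR's API, and the intensity values are made up.

```python
def median_normalize(samples):
    """Center each sample's peak intensities on its own median,
    ignoring missing values (None), so samples become comparable."""
    normalized = []
    for sample in samples:
        observed = sorted(v for v in sample if v is not None)
        mid = len(observed) // 2
        if len(observed) % 2:
            med = observed[mid]
        else:
            med = (observed[mid - 1] + observed[mid]) / 2
        normalized.append([None if v is None else v - med for v in sample])
    return normalized

# Two hypothetical samples of log2 peak intensities with missing values.
raw = [[10.0, 12.0, None, 14.0],
       [11.0, None, 13.0, 15.0]]
print(median_normalize(raw))
```

    A real pipeline would normalize after filtering and would choose the normalization function from diagnostics, as the package's QC workflow describes.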

  2. COVID 19 Dataset

    • kaggle.com
    Updated Sep 23, 2020
    + more versions
    Cite
    Rahul Gupta (2020). COVID 19 Dataset [Dataset]. https://www.kaggle.com/rahulgupta21/datahub-covid19/kernels
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Sep 23, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rahul Gupta
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Coronavirus disease 2019 (COVID-19) time series listing confirmed cases, reported deaths, and reported recoveries. Data is disaggregated by country (and sometimes subregion). Coronavirus disease (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and has had a worldwide effect. On March 11, 2020, the World Health Organization (WHO) declared it a pandemic, pointing to the over 118,000 cases of the illness in over 110 countries and territories around the world at the time.

    This dataset includes time series data tracking the number of people affected by COVID-19 worldwide, including:

    • confirmed tested cases of Coronavirus infection
    • the number of people who have reportedly died while sick with Coronavirus
    • the number of people who have reportedly recovered from it

    Content

    Data is in CSV format and updated daily. It is sourced from this upstream repository maintained by the amazing team at Johns Hopkins University Center for Systems Science and Engineering (CSSE) who have been doing a great public service from an early point by collating data from around the world.

    We have cleaned and normalized that data, for example tidying dates and consolidating several files into normalized time series. We have also added some metadata, such as column descriptions, and packaged the data.
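    The tidying step described above, melting per-date columns into a normalized time series, can be sketched in plain Python. The column names and values below are hypothetical, loosely mimicking the upstream wide CSV format:

```python
import csv
import io

# A toy slice of an upstream wide format: one row per region,
# one column per date (names and counts are invented).
wide = """Country/Region,1/22/20,1/23/20,1/24/20
Afghanistan,0,0,1
Albania,0,2,4
"""

def to_tidy(wide_csv):
    """Melt the wide per-date columns into tidy (country, date, count) rows."""
    reader = csv.reader(io.StringIO(wide_csv))
    header = next(reader)
    dates = header[1:]
    tidy = []
    for row in reader:
        country = row[0]
        for date, count in zip(dates, row[1:]):
            tidy.append({"Country": country, "Date": date, "Confirmed": int(count)})
    return tidy

rows = to_tidy(wide)
print(rows[2])
```

    The same reshaping is usually done with a dataframe library's melt operation; the point here is only the wide-to-long transformation itself.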

  3. Data from: Supplementary Material for "Sonification for Exploratory Data...

    • search.datacite.org
    • pub.uni-bielefeld.de
    Updated Feb 5, 2019
    Cite
    Thomas Hermann (2019). Supplementary Material for "Sonification for Exploratory Data Analysis" [Dataset]. http://doi.org/10.4119/unibi/2920448
    Explore at:
    Dataset updated
    Feb 5, 2019
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Bielefeld University
    Authors
    Thomas Hermann
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Sonification for Exploratory Data Analysis

    #### Chapter 8: Sonification Models

    In Chapter 8 of the thesis, 6 sonification models are presented to give some examples for the framework of Model-Based Sonification, developed in Chapter 7. Sonification models determine the rendering of the sonification and possible interactions. The "model in mind" helps the user to interpret the sound with respect to the data.

    ##### 8.1 Data Sonograms

    Data Sonograms use spherical expanding shock waves to excite linear oscillators which are represented by point masses in model space.

    * Table 8.2, page 87: Sound examples for Data Sonograms
      File: Iris dataset: started in plot (a) at S0, (b) at S1, (c) at S2
      10d noisy circle dataset: started in plot (c) at S0 (mean), (d) at S1 (edge)
      10d Gaussian: plot (d) started at S0
      3 clusters: Example 1
      3 clusters, invisible columns used as output variables: Example 2
      Description: Data Sonogram sound examples for synthetic datasets and the Iris dataset
      Duration: about 5 s

    ##### 8.2 Particle Trajectory Sonification Model

    This sonification model explores features of a data distribution by computing the trajectories of test particles which are injected into model space and move according to Newton's laws of motion in a potential given by the dataset.

    * Sound example: page 93, PTSM-Ex-1: audification of 1 particle in the potential of phi(x).
    * Sound example: page 93, PTSM-Ex-2: audification of a sequence of 15 particles in the potential of a dataset with 2 clusters.
    * Sound example: page 94, PTSM-Ex-3: audification of 25 simultaneous particles in a potential of a dataset with 2 clusters.
    * Sound example: page 94, PTSM-Ex-4: audification of 25 simultaneous particles in a potential of a dataset with 1 cluster.
    * Sound example: page 95, PTSM-Ex-5: sigma-step sequence for a mixture of three Gaussian clusters.
    * Sound example: page 95, PTSM-Ex-6: sigma-step sequence for a Gaussian cluster.
    * Sound example: page 96, PTSM-Iris-1: sonification for the Iris dataset with 20 particles per step.
    * Sound example: page 96, PTSM-Iris-2: sonification for the Iris dataset with 3 particles per step.
    * Sound example: page 96, PTSM-Tetra-1: sonification for a 4d tetrahedron clusters dataset.

    ##### 8.3 Markov chain Monte Carlo Sonification

    The McMC Sonification Model defines an exploratory process in the domain of a given density p such that the acoustic representation summarizes features of p by sound, particularly concerning the modes of p.

    * Sound example: page 105, MCMC-Ex-1: McMC sonification, stabilization of amplitudes.
    * Sound example: page 106, MCMC-Ex-2: trajectory audification for 100 McMC steps in a 3-cluster dataset.
    * McMC Sonification for Cluster Analysis, dataset with three clusters, page 107: Stream 1 MCMC-Ex-3.1, Stream 2 MCMC-Ex-3.2, Stream 3 MCMC-Ex-3.3, Mix MCMC-Ex-3.4
    * McMC Sonification for Cluster Analysis, dataset with three clusters, T = 0.002 s, page 107: Stream 1 MCMC-Ex-4.1, Stream 2 MCMC-Ex-4.2, Stream 3 MCMC-Ex-4.3, Mix MCMC-Ex-4.4
    * McMC Sonification for Cluster Analysis, density with 6 modes, T = 0.008 s, page 107: Stream 1 MCMC-Ex-5.1, Stream 2 MCMC-Ex-5.2, Stream 3 MCMC-Ex-5.3, Mix MCMC-Ex-5.4
    * McMC Sonification for the Iris dataset, page 108: MCMC-Ex-6.1, MCMC-Ex-6.2, MCMC-Ex-6.3, MCMC-Ex-6.4, MCMC-Ex-6.5, MCMC-Ex-6.6, MCMC-Ex-6.7, MCMC-Ex-6.8

    ##### 8.4 Principal Curve Sonification

    Principal Curve Sonification represents data by synthesizing the soundscape while a virtual listener moves along the principal curve of the dataset through the model space.

    * Noisy spiral dataset: PCS-Ex-1.1, page 113
    * Noisy spiral dataset with variance modulation: PCS-Ex-1.2, page 114
    * 9d tetrahedron cluster dataset (10 clusters): PCS-Ex-2, page 114
    * Iris dataset, class label used as pitch of auditory grains: PCS-Ex-3, page 114

    ##### 8.5 Data Crystallization Sonification Model

    * Table 8.6, page 122: Sound examples for Crystallization Sonification for a 5d Gaussian distribution
      File: DCS started at center, in tail, from far outside
      Description: DCS for a dataset sampled from N(0, I_5), excited at different locations
      Duration: 1.4 s
    * Mixture of 2 Gaussians, page 122: DCS started at point A (DCS-Ex1A), DCS started at point B (DCS-Ex1B)
    * Table 8.7, page 124: Sound examples for DCS on variation of the harmonics factor
      File: h_omega = 1, 2, 3, 4, 5, 6
      Description: DCS for a mixture of two Gaussians with varying harmonics factor
      Duration: 1.4 s
    * Table 8.8, page 124: Sound examples for DCS on variation of the energy decay time
      File: tau_(1/2) = 0.001, 0.005, 0.01, 0.05, 0.1, 0.2
      Description: DCS for a mixture of two Gaussians varying the energy decay time tau_(1/2)
      Duration: 1.4 s
    * Table 8.9, page 125: Sound examples for DCS on variation of the sonification time
      File: T = 0.2, 0.5, 1, 2, 4, 8
      Description: DCS for a mixture of two Gaussians on varying the duration T
      Duration: 0.2 s to 8 s
    * Table 8.10, page 125: Sound examples for DCS on variation of model space dimension
      File: selected columns of the dataset: (x0), (x0,x1), (x0,...,x2), (x0,...,x3), (x0,...,x4), (x0,...,x5)
      Description: DCS for a mixture of two Gaussians varying the dimension
      Duration: 1.4 s
    * Table 8.11, page 126: Sound examples for DCS for different excitation locations
      File: starting point: C0, C1, C2
      Description: DCS for a mixture of three Gaussians in 10d space with different rank(S) = {2,4,8}
      Duration: 1.9 s
    * Table 8.12, page 126: Sound examples for DCS for the mixture of a 2d distribution and a 5d cluster
      File: condensation nucleus in the (x0,x1)-plane at: (-6,0)=C1, (-3,0)=C2, (0,0)=C0
      Description: DCS for a mixture of a uniform 2d and a 5d Gaussian
      Duration: 2.16 s
    * Table 8.13, page 127: Sound examples for DCS for the cancer dataset
      File: condensation nucleus in the (x0,x1)-plane at: benign 1, benign 2, malignant 1, malignant 2
      Description: DCS for a mixture of a uniform 2d and a 5d Gaussian
      Duration: 2.16 s

    ##### 8.6 Growing Neural Gas Sonification

    * Table 8.14, page 133: Sound examples for GNGS probing
      File: Cluster C0 (2d): a, b, c; Cluster C1 (4d): a, b, c; Cluster C2 (8d): a, b, c
      Description: GNGS for a mixture of 3 Gaussians in 10d space
      Duration: 1 s
    * Table 8.15, page 134: Sound examples for GNGS for the noisy spiral dataset
      File: (a) GNG with 3 neurons: 1, 2
      (b) GNG with 20 neurons: end, middle, inner end
      (c) GNG with 45 neurons: outer end, middle, close to inner end, at inner end
      (d) GNG with 150 neurons: outer end, in the middle, inner end
      (e) GNG with 20 neurons: outer end, in the middle, inner end
      (f) GNG with 45 neurons: outer end, in the middle, inner end
      Description: GNG probing sonification for the 2d noisy spiral dataset
      Duration: 1 s
    * Table 8.16, page 136: Sound examples for GNG Process Monitoring Sonification for different data distributions
      File: Noisy spiral with 1 rotation: sound; Noisy spiral with 2 rotations: sound; Gaussian in 5d: sound; Mixture of 5d and 2d distributions: sound
      Description: GNG process sonification examples
      Duration: 5 s

    #### Chapter 9: Extensions

    In this chapter, two extensions for Parameter Mapping

  4. Data from: Research and exploratory analysis driven - time-data...

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Jan 30, 2022
    Cite
    John Del Gaizo; Kenneth Catchpole; Alexander Alekseyenko (2022). Research and exploratory analysis driven - time-data visualization (read-tv) software [Dataset]. http://doi.org/10.5061/dryad.d51c5b02g
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 30, 2022
    Dataset provided by
    Dryad
    Authors
    John Del Gaizo; Kenneth Catchpole; Alexander Alekseyenko
    Time period covered
    2021
    Description

    This section does not describe the methods of read-tv software development, which can be found in the associated manuscript from JAMIA Open (JAMIO-2020-0121.R1). It describes the methods involved in collecting the surgical workflow disruption data. A curated version of this dataset, free of protected health information (PHI), was used as a use case for this manuscript.

    Observer training

    Trained human factors researchers conducted each observation following the completion of observer training. The researchers were two full-time research assistants based in the department of surgery at site 3 who visited the other two sites to collect data. Human Factors experts guided and trained each observer in the identification and standardized collection of FDs. The observers were also trained in the basic components of robotic surgery in order to be able to tangibly isolate and describe such disruptive events.

    Comprehensive observer training was ensured with both classroom and floor train...

  5. Datasets to accompany Resilience, where to begin? A lay theories approach....

    • figshare.le.ac.uk
    bin
    Updated Oct 24, 2019
    Cite
    John Maltby (2019). Datasets to accompany Resilience, where to begin? A lay theories approach. (Currently under submission) [Dataset]. http://doi.org/10.25392/leicester.data.9632213.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Oct 24, 2019
    Dataset provided by
    University of Leicester
    Authors
    John Maltby
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Samples relating to 12 analyses of lay-theories of resilience among participants from USA, New Zealand, India, Iran, Russia (Moscow; Kazan). Central variables relate to participant endorsements of resilience descriptors. Demographic data includes (though not for all samples) Sex/Gender, Age, Ethnicity, Work, and Educational Status.

    Analysis 1. USA Exploratory Factor Analysis data
    Analysis 2. New Zealand Exploratory Factor Analysis data
    Analysis 3. India Exploratory Factor Analysis data
    Analysis 4. Iran Exploratory Factor Analysis data
    Analysis 5. Russian (Moscow) Exploratory Factor Analysis data
    Analysis 6. Russian (Kazan) Exploratory Factor Analysis data
    Analysis 7. USA Confirmatory Factor Analysis data
    Analysis 8. New Zealand Confirmatory Factor Analysis data
    Analysis 9. India Confirmatory Factor Analysis data
    Analysis 10. Iran Confirmatory Factor Analysis data
    Analysis 11. Russian (Moscow) Confirmatory Factor Analysis data
    Analysis 12. Russian (Kazan) Confirmatory Factor Analysis data

  6. Supplementary materials for the article: Exploratory factor analysis with...

    • figshare.com
    • data.4tu.nl
    • +1 more
    zip
    Updated Jun 2, 2023
    Cite
    Joost de Winter; Dimitra Dodou (2023). Supplementary materials for the article: Exploratory factor analysis with small sample sizes [Dataset]. http://doi.org/10.4121/uuid:de29c01b-d8a3-44b4-a6d1-45af4c61a919
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    4TU.ResearchData
    Authors
    Joost de Winter; Dimitra Dodou
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Supplementary materials for the article: De Winter, J. C. F., Dodou, D., & Wieringa, P. A. (2009). Exploratory factor analysis with small sample sizes. Multivariate Behavioral Research, 44, 147–181.

  7. ftmsRanalysis: An R package for exploratory data analysis and interactive...

    • plos.figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    Lisa M. Bramer; Amanda M. White; Kelly G. Stratton; Allison M. Thompson; Daniel Claborne; Kirsten Hofmockel; Lee Ann McCue (2023). ftmsRanalysis: An R package for exploratory data analysis and interactive visualization of FT-MS data [Dataset]. http://doi.org/10.1371/journal.pcbi.1007654
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Lisa M. Bramer; Amanda M. White; Kelly G. Stratton; Allison M. Thompson; Daniel Claborne; Kirsten Hofmockel; Lee Ann McCue
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The high-resolution and mass accuracy of Fourier transform mass spectrometry (FT-MS) has made it an increasingly popular technique for discerning the composition of soil, plant and aquatic samples containing complex mixtures of proteins, carbohydrates, lipids, lignins, hydrocarbons, phytochemicals and other compounds. Thus, there is a growing demand for informatics tools to analyze FT-MS data that will aid investigators seeking to understand the availability of carbon compounds to biotic and abiotic oxidation and to compare fundamental chemical properties of complex samples across groups. We present ftmsRanalysis, an R package which provides an extensive collection of data formatting and processing, filtering, visualization, and sample and group comparison functionalities. The package provides a suite of plotting methods and enables expedient, flexible and interactive visualization of complex datasets through functions which link to a powerful and interactive visualization user interface, Trelliscope. Example analysis using FT-MS data from a soil microbiology study demonstrates the core functionality of the package and highlights the capabilities for producing interactive visualizations.

  8. Bank Marketing Classification Dataset

    • kaggle.com
    Updated Aug 26, 2024
    + more versions
    Cite
    BALAJI VARA PRASAD DEGA (2024). Bank Marketing Classification Dataset [Dataset]. http://doi.org/10.34740/kaggle/ds/5532086
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Aug 26, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    BALAJI VARA PRASAD DEGA
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The data is related with direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was required, in order to assess whether the product (bank term deposit) would be subscribed ('yes') or not ('no').

    There are four datasets:
    1) bank-additional-full.csv with all examples (41188) and 20 inputs, ordered by date (from May 2008 to November 2010), very close to the data analyzed in [Moro et al., 2014]
    2) bank-additional.csv with 10% of the examples (4119), randomly selected from 1), and 20 inputs.
    3) bank-full.csv with all examples and 17 inputs, ordered by date (an older version of this dataset with fewer inputs).
    4) bank.csv with 10% of the examples and 17 inputs, randomly selected from 3) (an older version of this dataset with fewer inputs).
    The smallest datasets are provided to test more computationally demanding machine learning algorithms (e.g., SVM).

    The classification goal is to predict whether the client will subscribe to a term deposit (variable y: yes/no).
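    A minimal sketch of that classification goal: logistic regression trained by plain gradient descent on made-up numeric features (the feature meanings are assumptions for illustration, not columns from the dataset, and real use would start from the CSV files above):

```python
import math

def train_logistic(X, y, lr=0.1, epochs=1000):
    """Plain stochastic-gradient-descent logistic regression."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(subscribe)
            g = p - yi                       # gradient of log-loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, xi):
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1 if z >= 0 else 0

# Hypothetical features, e.g. (call duration in minutes, number of contacts),
# with y = 1 meaning the client subscribed to the term deposit.
X = [[1.0, 5.0], [2.0, 4.0], [8.0, 1.0], [9.0, 2.0]]
y = [0, 0, 1, 1]
w, b = train_logistic(X, y)
print([predict(w, b, xi) for xi in X])
```

    In practice one would use a library implementation (e.g. an SVM, as the dataset description suggests) with proper encoding of the 17 or 20 inputs and a held-out test split.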

  9. Data from: Visualization for Interval Data

    • tandf.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Muzi Zhang; Dennis K. J. Lin (2023). Visualization for Interval Data [Dataset]. http://doi.org/10.6084/m9.figshare.19617396.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Muzi Zhang; Dennis K. J. Lin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Interval data are widely used in many fields, notably in economics, industry, and health areas. Analogous to the scatterplot for single-value data, the rectangle plot and cross plot are the conventional visualization methods for the relationship between two variables in interval forms. These methods do not provide much information to assess complicated relationships, however. In this article, we propose two visualization methods: Segment and Dandelion plots. They offer much more information than the existing visualization methods and allow us to have a much better understanding of the relationship between two variables in interval forms. A general guide for reading these plots is provided. Relevant theoretical support is developed. Both empirical and real data examples are provided to demonstrate the advantages of the proposed visualization methods. Supplementary materials for this article are available online.

  10. Customer Personality Analysis

    • kaggle.com
    zip
    Updated Aug 22, 2021
    + more versions
    Cite
    Akash Patel (2021). Customer Personality Analysis [Dataset]. https://www.kaggle.com/imakash3011/customer-personality-analysis
    Explore at:
    Available download formats: zip (63450 bytes)
    Dataset updated
    Aug 22, 2021
    Authors
    Akash Patel
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Problem Statement

    Customer Personality Analysis is a detailed analysis of a company’s ideal customers. It helps a business to better understand its customers and makes it easier for them to modify products according to the specific needs, behaviors and concerns of different types of customers.

    Customer personality analysis helps a business to modify its product based on its target customers from different types of customer segments. For example, instead of spending money to market a new product to every customer in the company’s database, a company can analyze which customer segment is most likely to buy the product and then market the product only on that particular segment.

    Content

    Attributes

    People

    • ID: Customer's unique identifier
    • Year_Birth: Customer's birth year
    • Education: Customer's education level
    • Marital_Status: Customer's marital status
    • Income: Customer's yearly household income
    • Kidhome: Number of children in customer's household
    • Teenhome: Number of teenagers in customer's household
    • Dt_Customer: Date of customer's enrollment with the company
    • Recency: Number of days since customer's last purchase
    • Complain: 1 if the customer complained in the last 2 years, 0 otherwise

    Products

    • MntWines: Amount spent on wine in last 2 years
    • MntFruits: Amount spent on fruits in last 2 years
    • MntMeatProducts: Amount spent on meat in last 2 years
    • MntFishProducts: Amount spent on fish in last 2 years
    • MntSweetProducts: Amount spent on sweets in last 2 years
    • MntGoldProds: Amount spent on gold in last 2 years

    Promotion

    • NumDealsPurchases: Number of purchases made with a discount
    • AcceptedCmp1: 1 if customer accepted the offer in the 1st campaign, 0 otherwise
    • AcceptedCmp2: 1 if customer accepted the offer in the 2nd campaign, 0 otherwise
    • AcceptedCmp3: 1 if customer accepted the offer in the 3rd campaign, 0 otherwise
    • AcceptedCmp4: 1 if customer accepted the offer in the 4th campaign, 0 otherwise
    • AcceptedCmp5: 1 if customer accepted the offer in the 5th campaign, 0 otherwise
    • Response: 1 if customer accepted the offer in the last campaign, 0 otherwise

    Place

    • NumWebPurchases: Number of purchases made through the company’s website
    • NumCatalogPurchases: Number of purchases made using a catalogue
    • NumStorePurchases: Number of purchases made directly in stores
    • NumWebVisitsMonth: Number of visits to company’s website in the last month

    Target

    The task is to perform clustering to summarize customer segments.
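    The clustering target can be sketched with a minimal k-means in plain Python. The customer coordinates below are invented, loosely echoing the Income and MntWines attributes; a real analysis would scale all attributes first:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Minimal k-means: returns a cluster label for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j]))
                  for p in points]
        # Move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return labels

# Hypothetical (income in k$, wine spend) pairs for six customers.
pts = [(20, 5), (22, 8), (21, 6), (80, 300), (85, 320), (82, 310)]
labels = kmeans(pts, 2)
print(labels)
```

    The two well-separated groups end up in different clusters; choosing k and the features is the substantive part of the segmentation exercise.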

    Acknowledgement

    The dataset for this project is provided by Dr. Omar Romero-Hernandez.

    Solution

    To learn more about one approach to solving this problem, visit this URL.

    Inspiration

    Happy learning!

    Hope you like this dataset; if so, please don't forget to upvote it.

  11. Data from: An Exploratory Analysis of Barriers to Usage of the USDA Dietary...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Mar 30, 2024
    + more versions
    Cite
    Agricultural Research Service (2024). Data from: An Exploratory Analysis of Barriers to Usage of the USDA Dietary Guidelines for Americans [Dataset]. https://catalog.data.gov/dataset/data-from-an-exploratory-analysis-of-barriers-to-usage-of-the-usda-dietary-guidelines-for--bb6c7
    Explore at:
    Dataset updated
    Mar 30, 2024
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    The average American’s diet does not align with the Dietary Guidelines for Americans (DGA) provided by the U.S. Department of Agriculture and the U.S. Department of Health and Human Services (2020). The present study aimed to compare fruit and vegetable consumption among those who had and had not heard of the DGA, identify characteristics of DGA users, and identify barriers to DGA use. A nationwide survey of 943 Americans revealed that those who had heard of the DGA ate more fruits and vegetables than those who had not. Men, African Americans, and those who have more education had greater odds of using the DGA as a guide when preparing meals relative to their respective counterparts. Disinterest, effort, and time were among the most cited reasons for not using the DGA. Future research should examine how to increase DGA adherence among those unaware of or who do not use the DGA. Comparative analyses of fruit and vegetable consumption among those who were aware/unaware and use/do not use the DGA were completed using independent samples t tests. Fruit and vegetable consumption variables were log-transformed for analysis. Binary logistic regression was used to examine whether demographic features (race, gender, and age) predict DGA awareness and usage. Data were analyzed using SPSS version 28.1 and SAS/STAT® version 9.4 TS1M7 (2023 SAS Institute Inc).
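    The comparative analyses described above (log-transforming consumption variables, then independent-samples t tests) can be sketched generically. This is a plain Welch's t statistic on made-up daily serving counts, not the study's SPSS/SAS analysis:

```python
import math

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two
    independent samples with possibly unequal variances."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se2a, se2b = va / len(a), vb / len(b)
    t = (ma - mb) / math.sqrt(se2a + se2b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2a + se2b) ** 2 / (se2a ** 2 / (len(a) - 1)
                               + se2b ** 2 / (len(b) - 1))
    return t, df

# Hypothetical daily servings; log-transform to reduce right skew
# before testing, as the study describes.
aware = [math.log(x) for x in [3.0, 4.0, 2.5, 5.0, 3.5]]
unaware = [math.log(x) for x in [2.0, 1.5, 2.5, 1.0, 2.0]]
t, df = welch_t(aware, unaware)
print(round(t, 2), round(df, 1))
```

    A p-value would then come from the t distribution with df degrees of freedom; statistics libraries provide that step directly.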

  12. Funding Covid-19 research: Insights from an exploratory analysis using open...

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1 more
    Updated Jul 7, 2022
    Cite
    Mugabushaka, Alexis-Michel (2022). Funding Covid-19 research: Insights from an exploratory analysis using open data infrastructures - Supplementary material [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6112761
    Explore at:
    Dataset updated
    Jul 7, 2022
    Dataset provided by
    Mugabushaka, Alexis-Michel
    Waltman, Ludo
    Van Eck, Nees Jan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains supplementary material for the paper 'Funding Covid-19 research: Insights from an exploratory analysis using open data infrastructures' by Alexis-Michel Mugabushaka, Nees Jan van Eck, and Ludo Waltman.

    supplementary_material_1_dataset.ods: Dataset of Covid-19 publications.

    supplementary_material_2_sample.ods: Samples of publications used to assess the accuracy of funding data in the different databases.

    supplementary_material_3_tables_and_figures.ods: Statistics underlying the tables and figures presented in the paper.

  13. Experimental Data Set for the study "Exploratory Landscape Analysis is...

    • zenodo.org
    • explore.openaire.eu
    csv, text/x-python, +1 more
    Updated Jan 28, 2021
    Cite
    Quentin Renau; Carola Doerr; Carola Doerr; Johann Dreo; Johann Dreo; Benjamin Doerr; Benjamin Doerr; Quentin Renau (2021). Experimental Data Set for the study "Exploratory Landscape Analysis is Strongly Sensitive to the Sampling Strategy" [Dataset]. http://doi.org/10.5281/zenodo.3886816
    Explore at:
    Available download formats: text/x-python, csv, zip
    Dataset updated
    Jan 28, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Quentin Renau; Carola Doerr; Carola Doerr; Johann Dreo; Johann Dreo; Benjamin Doerr; Benjamin Doerr; Quentin Renau
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These are the feature values used in the study "Exploratory Landscape Analysis is Strongly Sensitive to the Sampling Strategy".

    The dataset collects values for every "cheap" feature available in the R package flacco, computed using 5 sampling strategies in dimension d = 5:

    1. Random: the classical Mersenne-Twister algorithm;
    2. Randu: a random number generator that is notoriously bad;
    3. LHS: a centered Latin Hypercube Design;
    4. iLHS: an improved Latin Hypercube Design;
    5. Sobol: points extracted from a Sobol' low-discrepancy sequence.
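    As an illustration of strategy 3 above, a centered Latin Hypercube design partitions each axis into n equal strata, places exactly one point per stratum per axis, and uses the stratum centers. A minimal plain-Python sketch (not the code from sampling_ppsn.py):

```python
import random

def centered_lhs(n, d, seed=0):
    """Centered Latin Hypercube sample: n points in [0,1)^d with one
    point per axis-aligned stratum, placed at the stratum centers."""
    rng = random.Random(seed)
    cols = []
    for _ in range(d):
        perm = list(range(n))
        rng.shuffle(perm)  # independent stratum permutation per axis
        cols.append([(i + 0.5) / n for i in perm])
    return [tuple(col[k] for col in cols) for k in range(n)]

pts = centered_lhs(4, 2)
print(pts)
```

    Projected onto any single axis, the points cover all n strata exactly once, which is what distinguishes LHS from plain random sampling; Sobol' sequences achieve low discrepancy by a different, deterministic construction.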

    The csv file features_summury_dim_5_ppsn.csv contains 100 values for every feature, whereas features_summury_dim_5_ppsn_median.csv contains, for every feature, the median of those 100 values.

    In the folder PPSN_feature_plots are the histograms of feature values on the 24 COCO functions for 3 sampling strategies: Random, LHS and Sobol.

    The Python file sampling_ppsn.py is the code used to generate the sample points from which the feature values are computed.

    The file stats50_knn_dt.csv provides the raw data of medians and IQRs (interquartile ranges) for the heatmaps and boxplots in the paper.

    Finally, the files results_classif_knn100.csv (resp. dt) provide the accuracy of 100 classifications for every setting.

  14. Enterprise-Driven Open Source Software

    • zenodo.org
    application/gzip
    Updated Apr 22, 2020
    + more versions
    Cite
    Diomidis Spinellis; Zoe Kotti; Konstantinos Kravvaritis; Georgios Theodorou; Panos Louridas (2020). Enterprise-Driven Open Source Software [Dataset]. http://doi.org/10.5281/zenodo.3653878
    Explore at:
    application/gzip
    Available download formats
    Dataset updated
    Apr 22, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Diomidis Spinellis; Zoe Kotti; Konstantinos Kravvaritis; Georgios Theodorou; Panos Louridas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present a dataset of open source software developed mainly by enterprises rather than volunteers. It can be used to address known generalizability concerns and also to perform research on open source business software development. Based on the premise that an enterprise's employees are likely to contribute to a project developed by their organization using the email account it provides, we mine domain names associated with enterprises from open data sources as well as through white- and blacklisting, and use them in three heuristics to identify 17,252 enterprise GitHub projects. We provide these as a dataset detailing their provenance and properties. A manual evaluation of a dataset sample shows an identification accuracy of 89%. Through an exploratory data analysis we found that projects are staffed by a plurality of enterprise insiders, who appear to be pulling more than their weight, and that in a small percentage of relatively large projects development happens exclusively through enterprise insiders.
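The dominant-domain idea underlying these heuristics can be sketched as follows (toy data and a hypothetical helper; the authors' actual mining pipeline is far more involved):

```python
from collections import Counter

def dominant_domain(committer_emails):
    """Return the most common email domain among committers and its share."""
    domains = [e.split("@")[-1].lower() for e in committer_emails]
    (domain, count), = Counter(domains).most_common(1)
    return domain, count / len(domains)

# A project where most committers share one (enterprise-looking) domain.
emails = ["alice@corp.com", "bob@corp.com", "carol@corp.com", "dan@gmail.com"]
domain, share = dominant_domain(emails)
# domain == "corp.com", share == 0.75
```

A real pipeline would additionally validate the domain against enterprise lists (whitelists, SEC filings, Fortune 500 data) before flagging the project.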

    The main dataset is provided as a 17,252 record tab-separated file named enterprise_projects.txt with the following 27 fields.

    • url: the project's GitHub URL
    • project_id: the project's GHTorrent identifier
    • sdtc: true if selected using the same domain top committers heuristic (9,006 records)
    • mcpc: true if selected using the multiple committers from a probable company heuristic (8,289 records)
    • mcve: true if selected using the multiple committers from a valid enterprise heuristic (7,990 records)
    • star_number: number of GitHub watchers
    • commit_count: number of commits
    • files: number of files in current main branch
    • lines: corresponding number of lines in text files
    • pull_requests: number of pull requests
    • most_recent_commit: date of the most recent commit
    • committer_count: number of different committers
    • author_count: number of different authors
    • dominant_domain: the project's dominant email domain
    • dominant_domain_committer_commits: number of commits made by committers whose email matches the project's dominant domain
    • dominant_domain_author_commits: corresponding number for commit authors
    • dominant_domain_committers: number of committers whose email matches the project's dominant domain
    • dominant_domain_authors: corresponding number of commit authors
    • cik: SEC's EDGAR "central index key"
    • fg500: true if this is a Fortune Global 500 company (2,232 records)
    • sec10k: true if the company files SEC 10-K forms (4,178 records)
    • sec20f: true if the company files SEC 20-F forms (429 records)
    • project_name: GitHub project name
    • owner_login: GitHub project's owner login
    • company_name: company name as derived from the SEC and Fortune 500 data
    • owner_company: GitHub project's owner company name
    • license: SPDX license identifier

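A minimal pandas sketch of loading such a tab-separated file and querying it (shown here on a tiny in-memory stand-in, since the layout beyond "tab-separated with the fields above" is assumed):

```python
import io
import pandas as pd

# Tiny stand-in for enterprise_projects.txt (tab-separated, header row assumed).
tsv = (
    "url\tfg500\tcommit_count\tcompany_name\n"
    "https://github.com/a/x\tTrue\t120\tAcme\n"
    "https://github.com/b/y\tFalse\t300\tOther\n"
)
projects = pd.read_csv(io.StringIO(tsv), sep="\t")

# Example query: Fortune Global 500 projects, most active first.
fg500 = projects[projects["fg500"]].sort_values("commit_count", ascending=False)
```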
    The file cohort_project_details.txt provides the full set of 309,531 cohort projects that are not part of the enterprise dataset but have comparable quality attributes.

    • url: the project's GitHub URL
    • project_id: the project's GHTorrent identifier
    • stars: number of GitHub watchers
    • commit_count: number of commits
  15. EXPLORATORY POLLEN ANALYSIS OF SAMPLES FROM SAN PEDRO SIRIS, BELIZE

    • search.dataone.org
    Updated Dec 13, 2012
    Cite
    Cummings, Linda Scott (PaleoResearch Institute) (2012). EXPLORATORY POLLEN ANALYSIS OF SAMPLES FROM SAN PEDRO SIRIS, BELIZE [Dataset]. http://doi.org/10.6067/XCV8W66K7Q
    Explore at:
    Dataset updated
    Dec 13, 2012
    Dataset provided by
    the Digital Archaeological Record
    Authors
    Cummings, Linda Scott (PaleoResearch Institute)
    Description

    Ten pollen samples collected from sediments in the East Field system at San Pedro Siris, situated in the Yalbac area of the Cayo District of central Belize, were examined for pollen evidence of crops.

  16. EXPLORATORY POLLEN ANALYSIS OF THE LOWEST STRATIGRAPHIC SAMPLE FROM HELL GAP, SOUTHEASTERN WYOMING

    • search.dataone.org
    Updated Aug 21, 2012
    Cite
    Cummings, Linda Scott (PaleoResearch Institute) (2012). EXPLORATORY POLLEN ANALYSIS OF THE LOWEST STRATIGRAPHIC SAMPLE FROM HELL GAP, SOUTHEASTERN WYOMING [Dataset]. http://doi.org/10.6067/XCV82F7MQP
    Explore at:
    Dataset updated
    Aug 21, 2012
    Dataset provided by
    the Digital Archaeological Record
    Authors
    Cummings, Linda Scott (PaleoResearch Institute)
    Description

    The lowest sample collected from a stratigraphic column at Hell Gap was submitted for exploratory pollen analysis. Exploratory pollen analysis of this sample includes a pollen count, evaluation of the condition of the pollen and concentration of pollen in this sediment, and recommendations for the future.

  17. Degree of Geochemical Similarity (DOGS) using correlation of log-transformed multi-element concentrations: methodology and applications

    • ecat.ga.gov.au
    Updated Jan 1, 2015
    Cite
    Commonwealth of Australia (Geoscience Australia) (2015). Degree of Geochemical Similarity (DOGS) using correlation of log-transformed multi-element concentrations: methodology and applications [Dataset]. https://ecat.ga.gov.au/geonetwork/srv/api/records/19b6faf4-4ede-c716-e053-12a3070a651b
    Explore at:
    Dataset updated
    Jan 1, 2015
    Dataset provided by
    Geoscience Australia (http://ga.gov.au/)
    Area covered
    Asia
    Description

    The 42 element, 1190 sample Mobile Metal Ion subset of the National Geochemical Survey of Australia database was used to develop and illustrate the concept of Degree of Geochemical Similarity of soil samples. Element concentrations were unified to parts per million units and log(10)-transformed. The degrees of similarity of pairs of samples of known provenance in the Yilgarn Craton were obtained using least squares linear regression analysis, demonstrating that the method successfully assessed the degree of similarity of soils related to granitoid and greenstone lithologies. Exploratory Data Analysis symbol maps of all remaining samples in the database against various reference samples were used to obtain correlation maps not only for granitoid- and greenstone-related soil types, but also to distinguish between, for example, samples derived from marine vs regolith carbonate. Likewise, the distribution of soil samples having a geochemical fingerprint similar to mineralised provinces (e.g., Mt Isa) can be mapped, and this can be used as a first-order prospection tool. Sensitivity analysis confirmed that the method produces robust results without undue influence from either single elements with anomalous concentrations or elements with a high proportion of censored values.
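The core similarity measure can be sketched with NumPy on synthetic concentrations (not Geoscience Australia's code; the abstract describes least-squares regression of log-transformed concentrations, whose strength is the correlation coefficient of the logs used here):

```python
import numpy as np

rng = np.random.default_rng(0)

def dogs_similarity(sample_a_ppm, sample_b_ppm):
    """Pearson correlation of log10-transformed element concentrations (ppm)."""
    la, lb = np.log10(sample_a_ppm), np.log10(sample_b_ppm)
    return np.corrcoef(la, lb)[0, 1]

# Synthetic soil samples, 42 elements each, concentrations in ppm.
ref = 10 ** rng.normal(1.0, 1.0, 42)
similar = ref * 10 ** rng.normal(0.0, 0.05, 42)   # small log-space perturbation
different = 10 ** rng.normal(1.0, 1.0, 42)        # unrelated sample

# A geochemically similar sample scores higher against the reference.
assert dogs_similarity(ref, similar) > dogs_similarity(ref, different)
```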

  18. Facebook Audience Insight on Food Choice

    • data.mendeley.com
    • narcis.nl
    Updated Nov 12, 2020
    Cite
    Lidia Mayangsari (2020). Facebook Audience Insight on Food Choice [Dataset]. http://doi.org/10.17632/9dbd9jcdvs.1
    Explore at:
    Dataset updated
    Nov 12, 2020
    Authors
    Lidia Mayangsari
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data presented in this paper are used to examine the behavioral factors that influence food preferences in Indonesia, and the segmentation of Indonesian audiences behind those preferences, as captured by social media data. We collected the data through an online platform by performing a query search on Facebook Audience Insights Interests. The keywords used in the query search are based on the United Nations Food and Agriculture Organisation (FAO) Food Balance Sheet (FBS), retrieved from FAOStat in May 2020. The data were gathered between 15 May and 2 July 2020. We limited our sample to Indonesia, with a sample size of 100-150 million viewers, or about 36.95-55.43 per cent of Indonesia's 2019 population. The dataset is made up of ten tables that can be analyzed separately. For each table, we carried out exploratory data analysis (EDA) to provide more insights. Such data could be of interest to various fields, including food scientists, government and policymakers, data scientists/analysts, and marketers. The data could also serve as a complementary source given the scarcity of food survey data from the government, particularly on behavioral aspects.
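Per-table EDA of the kind described can be sketched with pandas (toy table with hypothetical columns, not the dataset's actual schema):

```python
import pandas as pd

# Hypothetical miniature of one audience-interest table.
audience = pd.DataFrame({
    "food_item": ["rice", "cassava", "tempeh", "rice"],
    "audience_size_millions": [120, 35, 60, 110],
})

# Basic EDA: summary statistics plus a per-item aggregate.
summary = audience["audience_size_millions"].describe()
by_item = audience.groupby("food_item")["audience_size_millions"].mean()
```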

  19. Data from: Exploratory and confirmatory analyses in sentence processing: A case study of number interference in German

    • osf.io
    Updated Apr 18, 2022
    Cite
    Bruno Nicenboim; Shravan Vasishth; Felix Engelmann; Katja Suckow (2022). Exploratory and confirmatory analyses in sentence processing: A case study of number interference in German [Dataset]. http://doi.org/10.17605/OSF.IO/MMR7S
    Explore at:
    Dataset updated
    Apr 18, 2022
    Dataset provided by
    Center For Open Science
    Authors
    Bruno Nicenboim; Shravan Vasishth; Felix Engelmann; Katja Suckow
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Given the replication crisis in cognitive science, it is important to consider what researchers need to do in order to report results that are reliable. We consider three changes in current practice that have the potential to deliver more realistic and robust claims. First, the planned experiment should be divided up into two stages, an exploratory stage and a confirmatory stage. This clear separation allows the researcher to check whether any results found in the exploratory stage are robust. The second change is to carry out adequately powered studies. We show that this is imperative if we want to obtain realistic estimates of effects in psycholinguistics. The third change is to use Bayesian data-analytic methods rather than frequentist ones; the Bayesian framework allows us to focus on the best estimates we can obtain of the effect, rather than rejecting a strawman null. As a case study, we investigate number interference effects in German. Number feature interference is predicted by cue-based retrieval models of sentence processing (Van Dyke & Lewis, 2003; Vasishth & Lewis, 2006), but has shown inconsistent results. We show that by implementing the three changes mentioned, suggestive evidence emerges that is consistent with the predicted number interference effects.
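The second recommendation, running adequately powered studies, can be illustrated with a small simulation (SciPy; the effect and sample sizes are illustrative, not the study's):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def power(n, effect=0.2, sims=2000, alpha=0.05):
    """Proportion of simulated two-sample t-tests that detect a true effect."""
    hits = 0
    for _ in range(sims):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(effect, 1.0, n)
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / sims

# A small effect (d = 0.2) is badly underpowered at n = 30 per group...
low = power(30)
# ...and reasonably powered at n = 400 per group.
high = power(400)
assert low < 0.3 < 0.7 < high
```

With low power, a "significant" result is as likely to reflect noise and inflated effect estimates as a real effect, which is the paper's motivation for the exploratory/confirmatory split.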

  20. Enhanced interpretation of regional geochemical stream sediment data from Yukon: catchment basin analysis and weighted sums modeling

    • open.canada.ca
    • ouvert.canada.ca
    html
    Updated Oct 30, 2024
    + more versions
    Cite
    Government of Yukon (2024). Enhanced interpretation of regional geochemical stream sediment data from Yukon: catchment basin analysis and weighted sums modeling [Dataset]. https://open.canada.ca/data/en/dataset/d7a9dc77-f36b-08d5-9f68-5e0de251b234
    Explore at:
    html
    Available download formats
    Dataset updated
    Oct 30, 2024
    Dataset provided by
    Government of Yukon
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Area covered
    Yukon
    Description

    Geochemical data from regional geochemistry survey samples from Yukon have undergone exploratory data analysis and principal component analysis. The results of these analyses clearly demonstrate geological control on the distribution of a number of important commodity and mineral deposit pathfinder elements. Catchment basins have been delineated for the samples and the dominant simplified geological unit in each catchment basin used to level the geochemical data where appropriate. Levelling the geochemical data in this fashion generally fails to fully account for enrichments in many commodity and mineral deposit pathfinder elements in the bedrock due to practical limitations on the resolution of the mapping and knowledge of the relative contributions of different geological units, although the resulting data interpretation is an improvement on one based solely upon raw geochemical data. Weighted sums models have been generated for the deposit types that either exist within the individual map areas covered by this report or are considered by the authors to be of exploration significance. Separate catchment maps showing the distribution of stream water pH and the concentration of elements inferred to have accumulated through hydromorphic dispersion are also provided. An additional series of maps has been generated to display weighted sums models calculated using regression of commodity and mineral deposit pathfinder elements against those principal components containing the same elements that show the strongest spatial associations with bedrock geology. Both model types have been iteratively tested using known mineral occurrences in the relevant map areas and, for the most part, are compatible with the distribution of known mineralization where sampling coverage is adequate. Geochemical anomalies unrelated to known mineral occurrences are evident in both data sets and provide possible targets for further investigation.
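The principal component analysis step described above can be sketched with NumPy on synthetic log-transformed concentrations (an independent illustration, not the report's workflow):

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic stand-in: 200 stream-sediment samples x 10 elements (log10 ppm),
# with one dominant "lithology" gradient driving correlated element levels.
gradient = rng.normal(0.0, 1.0, (200, 1))
X = gradient @ rng.normal(0.0, 1.0, (1, 10)) + rng.normal(0.0, 0.3, (200, 10))

# Principal components via SVD of the centred data matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / (s**2).sum()   # fraction of variance per component
scores = Xc @ Vt.T                # sample scores on each PC

# The injected geological gradient dominates the first component.
assert explained[0] > 0.5
```

In the real workflow, PC scores with strong spatial associations to bedrock geology feed the regression-based weighted sums models described above.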
