Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Prior to statistical analysis of mass spectrometry (MS) data, quality control (QC) of the identified biomolecule peak intensities is imperative for reducing process-based sources of variation and extreme biological outliers. Without this step, statistical results can be biased. Additionally, liquid chromatography–MS proteomics data present inherent challenges due to large amounts of missing data that require special consideration during statistical analysis. While a number of R packages exist to address these challenges individually, there is no single R package that addresses all of them. We present pmartR, an open-source R package, for QC (filtering and normalization), exploratory data analysis (EDA), visualization, and statistical analysis robust to missing data. Example analysis using proteomics data from a mouse study comparing smoke exposure to control demonstrates the core functionality of the package and highlights the capabilities for handling missing data. In particular, using a combined quantitative and qualitative statistical test, 19 proteins whose statistical significance would have been missed by a quantitative test alone were identified. The pmartR package provides a single software tool for QC, EDA, and statistical comparisons of MS data that is robust to missing data and includes numerous visualization capabilities.
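pmartR itself is an R package; as a language-neutral illustration of the kind of QC filtering described above (not pmartR's actual API), a minimal sketch that drops biomolecules observed in too few samples might look like this:

```python
# Minimal sketch of a molecule filter: drop any biomolecule (row) observed
# in fewer than `min_obs` samples. Illustrative only -- pmartR is an R
# package and its real filter functions and defaults differ.
def filter_min_observed(intensities, min_obs=2):
    """intensities: dict mapping biomolecule id -> list of peak
    intensities, with None marking a missing value."""
    return {
        mol: vals
        for mol, vals in intensities.items()
        if sum(v is not None for v in vals) >= min_obs
    }

data = {
    "P1": [10.2, None, 9.8, 10.1],   # observed 3 times -> kept
    "P2": [None, None, 8.4, None],   # observed once -> dropped
}
kept = filter_min_observed(data, min_obs=2)
```

Filters of this kind reduce the missing-data burden before normalization and statistical testing.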
https://creativecommons.org/publicdomain/zero/1.0/
Coronavirus disease 2019 (COVID-19) time series listing confirmed cases, reported deaths and reported recoveries. Data is disaggregated by country (and sometimes subregion). Coronavirus disease (COVID-19) is caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) and has had a worldwide effect. On March 11, 2020, the World Health Organization (WHO) declared it a pandemic, citing the over 118,000 cases of Coronavirus illness in over 110 countries and territories around the world at the time.
This dataset includes time series data tracking the number of people affected by COVID-19 worldwide, including:
- confirmed tested cases of Coronavirus infection
- the number of people who have reportedly died while sick with Coronavirus
- the number of people who have reportedly recovered from it
Data is in CSV format and updated daily. It is sourced from this upstream repository maintained by the amazing team at Johns Hopkins University Center for Systems Science and Engineering (CSSE) who have been doing a great public service from an early point by collating data from around the world.
We have cleaned and normalized that data, for example tidying dates and consolidating several files into normalized time series. We have also added some metadata, such as column descriptions, and packaged the data.
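The "tidying" step above can be sketched as a pivot from the upstream wide format (one row per region, one column per date) into a long time series (one row per region-date observation). The miniature input below is a hypothetical stand-in for the upstream CSV, not its exact column layout:

```python
import csv
import io

# Hypothetical miniature of the upstream wide format: one row per region,
# one column per date. Tidying pivots it into one row per observation.
wide = io.StringIO(
    "Country/Region,1/22/20,1/23/20\n"
    "Afghanistan,0,0\n"
    "Albania,0,1\n"
)

def tidy(fileobj):
    reader = csv.reader(fileobj)
    header = next(reader)
    dates = header[1:]
    rows = []
    for rec in reader:
        for date, count in zip(dates, rec[1:]):
            rows.append({"region": rec[0], "date": date,
                         "confirmed": int(count)})
    return rows

long_rows = tidy(wide)
```

The long form makes per-country time series trivial to filter and join with other normalized files.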
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This section does not describe the methods of read-tv software development, which can be found in the associated manuscript from JAMIA Open (JAMIO-2020-0121.R1). This section describes the methods involved in collecting the surgical workflow disruption data. A curated version of this dataset, free of protected health information (PHI), was used as a use case for this manuscript.
Observer training
Trained human factors researchers conducted each observation following the completion of observer training. The researchers were two full-time research assistants based in the Department of Surgery at site 3 who visited the other two sites to collect data. Human factors experts guided and trained each observer in the identification and standardized collection of flow disruptions (FDs). The observers were also trained in the basic components of robotic surgery so that they could concretely isolate and describe such disruptive events.
Comprehensive observer training was ensured with both classroom and floor train...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Samples relating to 12 analyses of lay theories of resilience among participants from the USA, New Zealand, India, Iran, and Russia (Moscow; Kazan). Central variables relate to participant endorsements of resilience descriptors. Demographic data includes (though not for all samples) Sex/Gender, Age, Ethnicity, Work, and Educational Status.
Analysis 1. USA Exploratory Factor Analysis data
Analysis 2. New Zealand Exploratory Factor Analysis data
Analysis 3. India Exploratory Factor Analysis data
Analysis 4. Iran Exploratory Factor Analysis data
Analysis 5. Russian (Moscow) Exploratory Factor Analysis data
Analysis 6. Russian (Kazan) Exploratory Factor Analysis data
Analysis 7. USA Confirmatory Factor Analysis data
Analysis 8. New Zealand Confirmatory Factor Analysis data
Analysis 9. India Confirmatory Factor Analysis data
Analysis 10. Iran Confirmatory Factor Analysis data
Analysis 11. Russian (Moscow) Confirmatory Factor Analysis data
Analysis 12. Russian (Kazan) Confirmatory Factor Analysis data
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Supplementary materials for the article: De Winter, J. C. F., Dodou, D., & Wieringa, P. A. (2009). Exploratory factor analysis with small sample sizes. Multivariate Behavioral Research, 44, 147–181.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The high-resolution and mass accuracy of Fourier transform mass spectrometry (FT-MS) has made it an increasingly popular technique for discerning the composition of soil, plant and aquatic samples containing complex mixtures of proteins, carbohydrates, lipids, lignins, hydrocarbons, phytochemicals and other compounds. Thus, there is a growing demand for informatics tools to analyze FT-MS data that will aid investigators seeking to understand the availability of carbon compounds to biotic and abiotic oxidation and to compare fundamental chemical properties of complex samples across groups. We present ftmsRanalysis, an R package which provides an extensive collection of data formatting and processing, filtering, visualization, and sample and group comparison functionalities. The package provides a suite of plotting methods and enables expedient, flexible and interactive visualization of complex datasets through functions which link to a powerful and interactive visualization user interface, Trelliscope. Example analysis using FT-MS data from a soil microbiology study demonstrates the core functionality of the package and highlights the capabilities for producing interactive visualizations.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The data is related to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was required, in order to assess whether the product (bank term deposit) would be subscribed ('yes') or not ('no').
There are four datasets:
1) bank-additional-full.csv with all examples (41188) and 20 inputs, ordered by date (from May 2008 to November 2010), very close to the data analyzed in [Moro et al., 2014]
2) bank-additional.csv with 10% of the examples (4119), randomly selected from 1), and 20 inputs.
3) bank-full.csv with all examples and 17 inputs, ordered by date (an older version of this dataset with fewer inputs).
4) bank.csv with 10% of the examples and 17 inputs, randomly selected from 3) (an older version of this dataset with fewer inputs).
The smallest datasets are provided to test more computationally demanding machine learning algorithms (e.g., SVM).
The classification goal is to predict whether the client will subscribe to a term deposit (yes/no, variable y).
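Before modeling, a first exploratory step is to tabulate the subscription rate of y by a categorical input. A minimal stdlib sketch, using a tiny inline stand-in for the semicolon-delimited bank.csv (the real file has many more columns and rows):

```python
import csv
import io
from collections import defaultdict

# Tiny inline stand-in for bank.csv; the real file is semicolon-delimited
# with a "y" column taking values "yes"/"no".
sample = io.StringIO(
    "job;y\n"
    "admin.;no\n"
    "admin.;yes\n"
    "technician;no\n"
)

def rate_by(fileobj, feature, target="y"):
    """Subscription rate of `target` grouped by one categorical feature."""
    yes = defaultdict(int)
    total = defaultdict(int)
    for row in csv.DictReader(fileobj, delimiter=";"):
        total[row[feature]] += 1
        yes[row[feature]] += (row[target] == "yes")
    return {k: yes[k] / total[k] for k in total}

rates = rate_by(sample, "job")
```

Rates of this kind hint at which inputs are predictive before a classifier is fit.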
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Interval data are widely used in many fields, notably in economics, industry, and health areas. Analogous to the scatterplot for single-value data, the rectangle plot and cross plot are the conventional visualization methods for the relationship between two variables in interval forms. These methods do not provide much information to assess complicated relationships, however. In this article, we propose two visualization methods: Segment and Dandelion plots. They offer much more information than the existing visualization methods and allow us to have a much better understanding of the relationship between two variables in interval forms. A general guide for reading these plots is provided. Relevant theoretical support is developed. Both empirical and real data examples are provided to demonstrate the advantages of the proposed visualization methods. Supplementary materials for this article are available online.
https://creativecommons.org/publicdomain/zero/1.0/
Problem Statement
Customer Personality Analysis is a detailed analysis of a company’s ideal customers. It helps a business to better understand its customers and makes it easier for them to modify products according to the specific needs, behaviors and concerns of different types of customers.
Customer personality analysis helps a business to modify its product based on its target customers from different types of customer segments. For example, instead of spending money to market a new product to every customer in the company’s database, a company can analyze which customer segment is most likely to buy the product and then market the product only on that particular segment.
Attributes: People, Products, Promotion, Place
The task is to perform clustering to summarize customer segments.
The dataset for this project is provided by Dr. Omar Romero-Hernandez.
You can take help from the following link to learn more about the approach to solving this problem: Visit this URL
The average American’s diet does not align with the Dietary Guidelines for Americans (DGA) provided by the U.S. Department of Agriculture and the U.S. Department of Health and Human Services (2020). The present study aimed to compare fruit and vegetable consumption among those who had and had not heard of the DGA, identify characteristics of DGA users, and identify barriers to DGA use. A nationwide survey of 943 Americans revealed that those who had heard of the DGA ate more fruits and vegetables than those who had not. Men, African Americans, and those who have more education had greater odds of using the DGA as a guide when preparing meals relative to their respective counterparts. Disinterest, effort, and time were among the most cited reasons for not using the DGA. Future research should examine how to increase DGA adherence among those unaware of or who do not use the DGA. Comparative analyses of fruit and vegetable consumption among those who were aware/unaware and use/do not use the DGA were completed using independent samples t tests. Fruit and vegetable consumption variables were log-transformed for analysis. Binary logistic regression was used to examine whether demographic features (race, gender, and age) predict DGA awareness and usage. Data were analyzed using SPSS version 28.1 and SAS/STAT® version 9.4 TS1M7 (2023 SAS Institute Inc).
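The comparison described above (independent-samples t tests on log-transformed fruit and vegetable consumption) can be sketched with the standard library. This is an illustrative Welch's t statistic on made-up numbers, not the SPSS/SAS pipeline or data the authors used:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent
    samples, without assuming equal variances."""
    va, vb = variance(a), variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Log-transform the skewed consumption values before testing, as in the
# study; the values themselves are hypothetical.
aware = [math.log(x) for x in (3.0, 4.5, 5.0, 2.5)]
unaware = [math.log(x) for x in (2.0, 2.5, 3.0, 1.5)]
t, df = welch_t(aware, unaware)
```

The t statistic and Welch-Satterthwaite degrees of freedom are then compared against the t distribution to obtain a p value.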
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains supplementary material for the paper 'Funding Covid-19 research: Insights from an exploratory analysis using open data infrastructures' by Alexis-Michel Mugabushaka, Nees Jan van Eck, and Ludo Waltman.
supplementary_material_1_dataset.ods: Dataset of Covid-19 publications.
supplementary_material_2_sample.ods: Samples of publications used to assess the accuracy of funding data in the different databases.
supplementary_material_3_tables_and_figures.ods: Statistics underlying the tables and figures presented in the paper.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are the feature values used in the study "Exploratory Landscape Analysis is Strongly Sensitive to the Sampling Strategy".
The dataset regroups feature values for every "cheap" feature available in the R package flacco, computed using 5 sampling strategies in dimension d = 5.
The csv file features_summury_dim_5_ppsn.csv regroups 100 values for every feature, whereas features_summury_dim_5_ppsn_median.csv regroups, for every feature, the median of the 100 values.
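The median file can be reproduced from the raw file by collapsing each feature's repeated values. A minimal sketch (the feature names and values below are illustrative, and 5 repetitions stand in for the 100):

```python
from statistics import median

# Collapse repeated feature evaluations to their medians, mirroring how
# the *_median.csv summarizes the raw 100-value file.
def summarise(feature_values):
    """feature_values: dict mapping feature name -> list of values."""
    return {name: median(vals) for name, vals in feature_values.items()}

runs = {
    "ela_meta.lin_simple.adj_r2": [0.91, 0.88, 0.93, 0.90, 0.89],
    "disp.ratio_mean_02": [1.1, 1.3, 1.2, 1.0, 1.4],
}
medians = summarise(runs)
```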
In the folder PPSN_feature_plots are the histograms of feature values on the 24 COCO functions for 3 sampling strategies: Random, LHS and Sobol.
The Python file sampling_ppsn.py is the code used to generate the sample points from which the feature values are computed.
The file stats50_knn_dt.csv provides the raw data of medians and IQRs (interquartile ranges) for the heatmaps and boxplots available in the paper.
Finally, the files results_classif_knn100.csv (resp. dt) provide the accuracy of 100 classifications for every setting.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present a dataset of open source software developed mainly by enterprises rather than volunteers. This can be used to address known generalizability concerns, and, also, to perform research on open source business software development. Based on the premise that an enterprise's employees are likely to contribute to a project developed by their organization using the email account provided by it, we mine domain names associated with enterprises from open data sources as well as through white- and blacklisting, and use them through three heuristics to identify 17,252 enterprise GitHub projects. We provide these as a dataset detailing their provenance and properties. A manual evaluation of a dataset sample shows an identification accuracy of 89%. Through an exploratory data analysis we found that projects are staffed by a plurality of enterprise insiders, who appear to be pulling more than their weight, and that in a small percentage of relatively large projects development happens exclusively through enterprise insiders.
The main dataset is provided as a 17,252-record tab-separated file named enterprise_projects.txt with the following 27 fields.
The file cohost_project_details.txt provides the full set of 309,531 cohort projects that are not part of the enterprise dataset but have comparable quality attributes.
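The core premise above (enterprise employees commit using their employer's email domain) can be sketched as a simple classification heuristic. The domain list and threshold here are illustrative placeholders, not the mined lists or the three heuristics used to build the published dataset:

```python
# Flag a project as enterprise-developed when a sufficient share of its
# commit author emails use a known enterprise domain. The domain set and
# threshold are hypothetical examples.
ENTERPRISE_DOMAINS = {"microsoft.com", "google.com", "redhat.com"}

def is_enterprise(commit_emails, threshold=0.5):
    """True if at least `threshold` of authors use an enterprise domain."""
    domains = [e.rsplit("@", 1)[-1].lower() for e in commit_emails]
    hits = sum(d in ENTERPRISE_DOMAINS for d in domains)
    return hits / len(domains) >= threshold

emails = ["alice@redhat.com", "bob@redhat.com", "carol@gmail.com"]
flag = is_enterprise(emails)
```

In practice such heuristics need white- and blacklisting of domains, as the abstract notes, since generic providers like gmail.com would otherwise dilute or pollute the signal.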
Ten pollen samples collected from sediments in the East Field system at San Pedro Siris, situated in the Yalbac area of the Cayo District of central Belize, were examined for pollen evidence of crops.
The lowest sample collected from a stratigraphic column at Hell Gap was submitted for exploratory pollen analysis. Exploratory pollen analysis of this sample includes a pollen count, evaluation of the condition of the pollen and concentration of pollen in this sediment, and recommendations for the future.
The 42 element, 1190 sample Mobile Metal Ion subset of the National Geochemical Survey of Australia database was used to develop and illustrate the concept of Degree of Geochemical Similarity of soil samples. Element concentrations were unified to parts per million units and log(10)-transformed. The degree of similarity of pairs of samples of known provenance in the Yilgarn Craton were obtained using least squares linear regression analysis and demonstrated that the method successfully assessed the degree of similarity of soils related to granitoid and greenstone lithologies. Exploratory Data Analysis symbol maps of all remaining samples in the database against various reference samples were used to obtain correlation maps for not only granitoid- and greenstone-related soil types, but also to distinguish between for example samples derived from marine vs regolith carbonate. Likewise, the distribution of soil samples having a geochemical fingerprint similar to mineralised provinces (e.g., Mt Isa) can be mapped and this can be used as a first order prospection tool. Sensitivity analysis confirmed the method to produce robust results without undue influence from either single elements with anomalous concentrations or elements with a high proportion of censored values.
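The degree-of-similarity idea described above can be sketched in a few lines: log10-transform two samples' element concentrations (in ppm) and take the least-squares R² of one against the other as the similarity score. The element values below are hypothetical, and this is only the core computation, not the full workflow of the study:

```python
import math

# Degree of geochemical similarity between two soil samples, sketched as
# the R^2 of a least-squares fit of log10 concentrations. Values are
# hypothetical ppm concentrations for the same elements in each sample.
def similarity(sample_a, sample_b):
    xs = [math.log10(v) for v in sample_a]
    ys = [math.log10(v) for v in sample_b]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)  # R^2 of the least-squares fit

granitoid_a = [120.0, 3.5, 40.0, 0.8]
granitoid_b = [110.0, 4.0, 38.0, 0.9]
r2 = similarity(granitoid_a, granitoid_b)
```

Samples from similar lithologies give R² near 1; comparing every sample against a fixed reference sample yields the correlation maps described above.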
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data presented in this paper is used to examine the behavioral factors that influence food preferences in Indonesia, and the segmentation of Indonesian audiences behind those preferences, as provided by social media data. We collected the data through an online platform by performing a query search on Facebook Audience Insights Interests. The keywords used in the query search are based on the United Nations Food and Agriculture Organization (FAO) Food Balance Sheet (FBS), retrieved from FAOStat in May 2020. The data was gathered between 15 May and 2 July 2020. We limited our sample to Indonesia, with a sample size of 100-150 million viewers, or about 36.95 to 55.43 per cent of Indonesia's 2019 population. The dataset is made up of ten tables that can be analyzed separately. For each table, we carry out exploratory data analysis (EDA) to provide more insights. Such data could be of interest to various fields, including food scientists, government and policymakers, data scientists/analysts, and marketers. This data could also be a complementary source given the scarcity of government food survey data, particularly on behavioral aspects.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Given the replication crisis in cognitive science, it is important to consider what researchers need to do in order to report results that are reliable. We consider three changes in current practice that have the potential to deliver more realistic and robust claims. First, the planned experiment should be divided up into two stages, an exploratory stage and a confirmatory stage. This clear separation allows the researcher to check whether any results found in the exploratory stage are robust. The second change is to carry out adequately powered studies. We show that this is imperative if we want to obtain realistic estimates of effects in psycholinguistics. The third change is to use Bayesian data-analytic methods rather than frequentist ones; the Bayesian framework allows us to focus on the best estimates we can obtain of the effect, rather than rejecting a strawman null. As a case study, we investigate number interference effects in German. Number feature interference is predicted by cue-based retrieval models of sentence processing (Van Dyke & Lewis, 2003; Vasishth & Lewis, 2006), but has shown inconsistent results. We show that by implementing the three changes mentioned, suggestive evidence emerges that is consistent with the predicted number interference effects.
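The second recommendation above (adequately powered studies) can be illustrated with a toy simulation: with a small true effect and a small sample, only a fraction of replications reach significance. All numbers below are illustrative and unrelated to the German interference data:

```python
import math
import random
import statistics

# Toy power simulation: fraction of replications in which a two-sample
# z-style test on simulated data reaches |z| > 1.96.
def simulated_power(effect, n, reps=2000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(effect, 1.0) for _ in range(n)]
        se = math.sqrt(statistics.variance(a) / n +
                       statistics.variance(b) / n)
        if abs((statistics.mean(b) - statistics.mean(a)) / se) > 1.96:
            hits += 1
    return hits / reps

low_power = simulated_power(effect=0.3, n=20)    # small-n study
high_power = simulated_power(effect=0.3, n=200)  # larger study
```

A small-n design detects the same effect far less often, which is exactly why underpowered studies produce unreliable, hard-to-replicate estimates.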
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Geochemical data from regional geochemistry survey samples from Yukon have undergone exploratory data analysis and principal component analysis. The results of these analyses clearly demonstrate geological control on the distribution of a number of important commodity and mineral deposit pathfinder elements. Catchment basins have been delineated for the samples and the dominant simplified geological unit in each catchment basin used to level the geochemical data where appropriate. Levelling the geochemical data in this fashion generally fails to fully account for enrichments in many commodity and mineral deposit pathfinder elements in the bedrock due to practical limitations on the resolution of the mapping and knowledge of the relative contributions of different geological units, although the resulting data interpretation is an improvement on one based solely upon raw geochemical data. Weighted sums models have been generated for the deposit types that either exist within the individual map areas covered by this report or are considered by the authors to be of exploration significance. Separate catchment maps showing the distribution of stream water pH and the concentration of elements inferred to have accumulated through hydromorphic dispersion are also provided. An additional series of maps has been generated to display weighted sums models calculated using regression of commodity and mineral deposit pathfinder elements against those principal components containing the same elements that show the strongest spatial associations with bedrock geology. Both model types have been iteratively tested using known mineral occurrences in the relevant map areas and, for the most part, are compatible with the distribution of known mineralization where sampling coverage is adequate. Geochemical anomalies unrelated to known mineral occurrences are evident in both data sets and provide possible targets for further investigation.