100+ datasets found

m
Raw data outputs 1-18
bridges.monash.edu
researchdata.edu.au
xlsx
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie (2023). Raw data outputs 1-18 [Dataset]. http://doi.org/10.26180/21259491.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.26180/21259491.v1
Dataset updated
May 30, 2023
Dataset provided by
Monash University
Authors
Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Raw data outputs 1-18 Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs as well as in TCGA AML cancer samples compared with normal ones. This data was generated based on the results of AML microarray and TCGA data analysis. Raw data output 2. Commonly and uniquely differentially expressed genes in AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. This data was generated based on the results of AML microarray and TCGA data analysis. Raw data output 3. Common differentially expressed genes between training and test set samples the microarray dataset. This data was generated based on the results of AML microarray data analysis. Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study. Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs as well as in TCGA BRCA cancer samples compared with normal ones. Raw data output 6. Commonly and uniquely differentially expressed genes in breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. This data was generated based on the results of breast cancer microarray and TCGA BRCA data analysis. CSC, and GTC are abbreviations of cancer stem cell, and general tumor cell, respectively. Raw data output 7. Differential and common co-expression and protein-protein interaction of genes between CSC and GTC samples. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. CSC, and GTC are abbreviations of cancer stem cell, and general tumor cell, respectively. Raw data output 8. Differentially expressed genes between AML dormant and active CSCs. This data was generated based on the results of AML scRNA-seq data analysis. Raw data output 9. Uniquely expressed genes in dormant or active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis. Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and differentially expressed genes between AML CSCs vs GTCs and between dormant and active AML CSCs or the uniquely expressed genes in either class of CSCs. Raw data output 11. Targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 12. CSC-specific targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 13. The protein-protein interactions between AML key CSC genes with themselves and their targeting transcription factors. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. Raw data output 14. The previously confirmed associations of genes having the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers’ (stem) cells as well as hematopoietic stem cells. These data were generated based on a PubMed database-based literature mining. Raw data output 15. Drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 16. CSC-specific drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 17. Candidate drugs for experimental validation. These drugs were selected based on their respective (CSC-specific) drug scores. CSC is the abbreviation of cancer stem cell. Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.
Scooter Sales - Excel Project
kaggle.com
Updated Jun 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ann Truong (2023). Scooter Sales - Excel Project [Dataset]. https://www.kaggle.com/datasets/bvanntruong/scooter-sales-excel-project
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 8, 2023
Dataset provided by
Kaggle
Authors
Ann Truong
Description
The link for the Excel project to download can be found on GitHub here. It includes the raw data, Pivot Tables, and an interactive dashboard with Pivot Charts and Slicers. The project also includes business questions and the formulas I used to answer. The image below is included for ease. https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F12904052%2F61e460b5f6a1fa73cfaaa33aa8107bd5%2FBusinessQuestions.png?generation=1686190703261971&alt=media" alt=""> The link for the Tableau adjusted dashboard can be found here.

A screenshot of the interactive Excel dashboard is also included below for ease. https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F12904052%2Fe581f1fce8afc732f7823904da9e4cce%2FScooter%20Dashboard%20Image.png?generation=1686190815608343&alt=media" alt="">
HR Analytics Dataset
kaggle.com
zip
Updated Oct 27, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
anshika2301 (2023). HR Analytics Dataset [Dataset]. https://www.kaggle.com/datasets/anshika2301/hr-analytics-dataset
Explore at:
zip(213690 bytes)Available download formats
Dataset updated
Oct 27, 2023
Authors
anshika2301
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
HR analytics, also referred to as people analytics, workforce analytics, or talent analytics, involves gathering together, analyzing, and reporting HR data. It is the collection and application of talent data to improve critical talent and business outcomes. It enables your organization to measure the impact of a range of HR metrics on overall business performance and make decisions based on data. They are primarily responsible for interpreting and analyzing vast datasets.

Download the data CSV files here ; https://drive.google.com/drive/folders/18mQalCEyZypeV8TJeP3SME_R6qsCS2Og
Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset is having the data of 1K+ Amazon Product's Ratings and Reviews as per their details listed on the official website of Amazon. The data was scraped in the month of January 2023 from the Official Website of Amazon.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5331 rows contains only negative samples and the last 5331 rows contain only positive samples, thus the data should be shuffled before usage.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed amazon product review data of Gen3EcoDot (Alexa) scrapped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
----------- Amazon Earphones Reviews ----------------
This dataset consists of a 9930 Amazon reviews, star ratings, for 10 latest (as of mid-2019) bluetooth earphone devices for learning how to train Machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review (raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
B
Data Cleaning Sample
borealisdata.ca
dataone.org
Updated Jul 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP3/ZCN177
Dataset updated
Jul 13, 2023
Dataset provided by
Borealis
Authors
Rong Luo
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Sample data for exercises in Further Adventures in Data Cleaning.
18 excel spreadsheets by species and year giving reproduction and growth...
catalog.data.gov
data.wu.ac.at
Updated Aug 17, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2024). 18 excel spreadsheets by species and year giving reproduction and growth data. One excel spreadsheet of herbicide treatment chemistry. [Dataset]. https://catalog.data.gov/dataset/18-excel-spreadsheets-by-species-and-year-giving-reproduction-and-growth-data-one-excel-sp
Explore at:
Dataset updated
Aug 17, 2024
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
Excel spreadsheets by species (4 letter code is abbreviation for genus and species used in study, year 2010 or 2011 is year data collected, SH indicates data for Science Hub, date is date of file preparation). The data in a file are described in a read me file which is the first worksheet in each file. Each row in a species spreadsheet is for one plot (plant). The data themselves are in the data worksheet. One file includes a read me description of the column in the date set for chemical analysis. In this file one row is an herbicide treatment and sample for chemical analysis (if taken). This dataset is associated with the following publication: Olszyk , D., T. Pfleeger, T. Shiroyama, M. Blakely-Smith, E. Lee , and M. Plocher. Plant reproduction is altered by simulated herbicide drift toconstructed plant communities. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, 36(10): 2799-2813, (2017).
Supply Chain DataSet
kaggle.com
zip
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Amir Motefaker (2023). Supply Chain DataSet [Dataset]. https://www.kaggle.com/datasets/amirmotefaker/supply-chain-dataset
Explore at:
zip(9340 bytes)Available download formats
Dataset updated
Jun 1, 2023
Authors
Amir Motefaker
Description
Supply chain analytics is a valuable part of data-driven decision-making in various industries such as manufacturing, retail, healthcare, and logistics. It is the process of collecting, analyzing and interpreting data related to the movement of products and services from suppliers to customers.
RAW Data Excel and SPSS
figshare.com
xlsx
Updated Sep 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jamil Ahmed (2024). RAW Data Excel and SPSS [Dataset]. http://doi.org/10.6084/m9.figshare.27101149.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.27101149.v1
Dataset updated
Sep 25, 2024
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Jamil Ahmed
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This cross-sectional study aimed to determine the prevalence of obesity and perceived barriers to weight loss in 1453 Bahraini adults who had used any intervention to lose weight in the past year. We found a high prevalence (78.2%) of overweight and obesity. Females were more likely to have obesity compared to males (81.4% vs. 66.7%). Older individuals aged 36-45 were 3.37 times, and 45 or older were 3.56 times more likely to have obesity. Married participants had higher odds of obesity compared to single participants (OR=1.79). Participants with obesity were more likely to be unemployed compared to students (OR=1.49). The most common contributing factors to weight gain were lack of physical activity (29.5%) and unhealthy diet (29.2%). Participants with obesity were more likely to have relied on dieting (OR=2.53) or exercise (OR=1.47) for weight loss and used medication (OR=5.23). This study highlights the complex relationship between sociodemographic factors, lifestyle behaviors, and obesity and sustaining weight loss.
Data from: Raw data files
figshare.com
bin
Updated Mar 26, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ronen Schuster (2021). Raw data files [Dataset]. http://doi.org/10.6084/m9.figshare.14319758.v1
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.14319758.v1
Dataset updated
Mar 26, 2021
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Ronen Schuster
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Raw data tables and the statistical analysis applied to the data. Files are labeled by figure number. Within each file, each table and linked graph and analysis is annotated by figure number and panel letter. All files are generated in graphpad prism.
CSV file used in statistical analyses
data.csiro.au
researchdata.edu.au
+1more
Updated Oct 13, 2014
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
CSIRO (2014). CSV file used in statistical analyses [Dataset]. http://doi.org/10.4225/08/543B4B4CA92E6
Explore at:
Unique identifier
https://doi.org/10.4225/08/543B4B4CA92E6
Dataset updated
Oct 13, 2014
Dataset authored and provided by
CSIROhttp://www.csiro.au/
License
https://research.csiro.au/dap/licences/csiro-data-licence/https://research.csiro.au/dap/licences/csiro-data-licence/
Time period covered
Mar 14, 2008 - Jun 9, 2009
Dataset funded by
CSIROhttp://www.csiro.au/
Description
A csv file containing the tidal frequencies used for statistical analyses in the paper "Estimating Freshwater Flows From Tidally-Affected Hydrographic Data" by Dan Pagendam and Don Percival.
Carrier Safety Measurement System (CSMS, or SMS) - Raw Data - Download...
catalog.data.gov
Updated Jun 26, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Federal Motor Carrier Safety Administration (2024). Carrier Safety Measurement System (CSMS, or SMS) - Raw Data - Download Current Data [Dataset]. https://catalog.data.gov/dataset/carrier-safety-measurement-system-csms-or-sms-raw-data-download-current-data
Explore at:
Dataset updated
Jun 26, 2024
Dataset provided by
Federal Motor Carrier Safety Administrationhttps://www.fmcsa.dot.gov/
Description
The Federal Motor Carrier Safety Administration's (FMCSA) Safety Management System (SMS) is an automated data system used by FMCSA to monitor motor carrier on-road safety performance. FMCSA analyzes safety performance by grouping carrier data in the SMS into seven Behavioral Analysis and Safety Improvement Categories (BASICs) which are, in turn, used to identify potential safety problems with individual carriers and determine when an enforcement intervention might be appropriate.
Product Revenue Data - Excel Project
kaggle.com
zip
Updated Jul 31, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sharmaine Wong (2024). Product Revenue Data - Excel Project [Dataset]. https://www.kaggle.com/datasets/swsw1717/product-line-revenue-data-excel-project/data
Explore at:
zip(62947 bytes)Available download formats
Dataset updated
Jul 31, 2024
Authors
Sharmaine Wong
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset illustrates sales data from a company and its three product lines - boats, cars, and planes. It contains information such as historical and sales data. This is fictional data, created and used for data exploration and profit margin analysis.

The link for the Excel project to download can be found at this GitHub Repository. It includes the raw data, statistical analysis, Pivot Tables, and a dashboard with Pivot Charts for interaction.

Below is a screenshot of the charts for ease. https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10624788%2Fc945ef4223f1b0b6c2dfe7ade798e34e%2FWeekly%20Revenue%20by%20Product%20Line.png?generation=1722385095875351&alt=media" alt=""> https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F10624788%2Fd3be2fd1f741b0899e79b9c50c7e29a0%2FRevenue%20and%20Profit%20by%20Quarter.png?generation=1722385108310009&alt=media" alt="">
Data from: Raw Data Files
figshare.com
application/x-rar
Updated May 21, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Manuel Ralph Uhlig; Daniel Martin-Jimenez; Ricardo Garcia (2019). Raw Data Files [Dataset]. http://doi.org/10.6084/m9.figshare.8157899.v1
Explore at:
application/x-rarAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.8157899.v1
Dataset updated
May 21, 2019
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Manuel Ralph Uhlig; Daniel Martin-Jimenez; Ricardo Garcia
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Content:The archive contains the raw data used to generate the Figures 1-3 as well as the Supplementary Figures 1-9. Software:The data was created Igor Pro 6 and the Asylum Research Software 14. For best performance of data visualization these proprietary softwares are recommended. However, all the files can be read by Gwyddion, a freely available SPM data analysis tool (http://gwyddion.net/).Credit:When using this data, please cite the original publication.For further questions, please consult the article text or get in touch with the corresponding author.
d
COVID Impact Survey - Public Data
data.world
csv, zip
Updated Oct 16, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Associated Press (2024). COVID Impact Survey - Public Data [Dataset]. https://data.world/associatedpress/covid-impact-survey-public-data
Explore at:
csv, zipAvailable download formats
Dataset updated
Oct 16, 2024
Authors
The Associated Press
Description
Overview

The Associated Press is sharing data from the COVID Impact Survey, which provides statistics about physical health, mental health, economic security and social dynamics related to the coronavirus pandemic in the United States.

Conducted by NORC at the University of Chicago for the Data Foundation, the probability-based survey provides estimates for the United States as a whole, as well as in 10 states (California, Colorado, Florida, Louisiana, Minnesota, Missouri, Montana, New York, Oregon and Texas) and eight metropolitan areas (Atlanta, Baltimore, Birmingham, Chicago, Cleveland, Columbus, Phoenix and Pittsburgh).

The survey is designed to allow for an ongoing gauge of public perception, health and economic status to see what is shifting during the pandemic. When multiple sets of data are available, it will allow for the tracking of how issues ranging from COVID-19 symptoms to economic status change over time.

The survey is focused on three core areas of research:

Physical Health: Symptoms related to COVID-19, relevant existing conditions and health insurance coverage.

Economic and Financial Health: Employment, food security, and government cash assistance.

Social and Mental Health: Communication with friends and family, anxiety and volunteerism. (Questions based on those used on the U.S. Census Bureau’s Current Population Survey.) ## Using this Data - IMPORTANT This is survey data and must be properly weighted during analysis: DO NOT REPORT THIS DATA AS RAW OR AGGREGATE NUMBERS!!

Instead, use our queries linked below or statistical software such as R or SPSS to weight the data.

Queries

If you'd like to create a table to see how people nationally or in your state or city feel about a topic in the survey, use the survey questionnaire and codebook to match a question (the variable label) to a variable name. For instance, "How often have you felt lonely in the past 7 days?" is variable "soc5c".

Nationally: Go to this query and enter soc5c as the variable. Hit the blue Run Query button in the upper right hand corner.

Local or State: To find figures for that response in a specific state, go to this query and type in a state name and soc5c as the variable, and then hit the blue Run Query button in the upper right hand corner.

The resulting sentence you could write out of these queries is: "People in some states are less likely to report loneliness than others. For example, 66% of Louisianans report feeling lonely on none of the last seven days, compared with 52% of Californians. Nationally, 60% of people said they hadn't felt lonely."

Margin of Error

The margin of error for the national and regional surveys is found in the attached methods statement. You will need the margin of error to determine if the comparisons are statistically significant. If the difference is:

At least twice the margin of error, you can report there is a clear difference.

At least as large as the margin of error, you can report there is a slight or apparent difference.

Less than or equal to the margin of error, you can report that the respondents are divided or there is no difference. ## A Note on Timing Survey results will generally be posted under embargo on Tuesday evenings. The data is available for release at 1 p.m. ET Thursdays.

About the Data

The survey data will be provided under embargo in both comma-delimited and statistical formats.

Each set of survey data will be numbered and have the date the embargo lifts in front of it in the format of: 01_April_30_covid_impact_survey. The survey has been organized by the Data Foundation, a non-profit non-partisan think tank, and is sponsored by the Federal Reserve Bank of Minneapolis and the Packard Foundation. It is conducted by NORC at the University of Chicago, a non-partisan research organization. (NORC is not an abbreviation, it part of the organization's formal name.)

Data for the national estimates are collected using the AmeriSpeak Panel, NORC’s probability-based panel designed to be representative of the U.S. household population. Interviews are conducted with adults age 18 and over representing the 50 states and the District of Columbia. Panel members are randomly drawn from AmeriSpeak with a target of achieving 2,000 interviews in each survey. Invited panel members may complete the survey online or by telephone with an NORC telephone interviewer.

Once all the study data have been made final, an iterative raking process is used to adjust for any survey nonresponse as well as any noncoverage or under and oversampling resulting from the study specific sample design. Raking variables include age, gender, census division, race/ethnicity, education, and county groupings based on county level counts of the number of COVID-19 deaths. Demographic weighting variables were obtained from the 2020 Current Population Survey. The count of COVID-19 deaths by county was obtained from USA Facts. The weighted data reflect the U.S. population of adults age 18 and over.

Data for the regional estimates are collected using a multi-mode address-based (ABS) approach that allows residents of each area to complete the interview via web or with an NORC telephone interviewer. All sampled households are mailed a postcard inviting them to complete the survey either online using a unique PIN or via telephone by calling a toll-free number. Interviews are conducted with adults age 18 and over with a target of achieving 400 interviews in each region in each survey.Additional details on the survey methodology and the survey questionnaire are attached below or can be found at https://www.covid-impact.org.

Attribution

Results should be credited to the COVID Impact Survey, conducted by NORC at the University of Chicago for the Data Foundation.

AP Data Distributions

To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Bike Buyers - Excel Project
kaggle.com
zip
Updated Jun 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ann Truong (2023). Bike Buyers - Excel Project [Dataset]. https://www.kaggle.com/datasets/bvanntruong/bike-buyers
Explore at:
zip(11866 bytes)Available download formats
Dataset updated
Jun 8, 2023
Authors
Ann Truong
Description
This dataset illustrates customer data from bike sales. It contains information such as Income, Occupation, Age, Commute, Gender, Children, and more. This is fictional data, created and used for data exploration and cleaning.

The link for the Excel project to download can be found on GitHub here. It includes the raw data, the cleaned data, Pivot Tables, and a dashboard with Pivot Charts and Slicers for interaction. This allows the interactive dashboard to filter by Marital Status, Region, and Education.

Below is a screenshot of the dashboard for ease. https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F12904052%2Fcbc9db6fe00f3201c64e4fdb668ce9d1%2FBikeBuyers%20Dashboard%20Image.png?generation=1686186378985936&alt=media" alt="">
Z
Dataset for Seismic waveform tomography of the Central and Eastern...
data.niaid.nih.gov
zenodo.org
Updated Mar 31, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Blom, Nienke Alexandra; Gokhberg, Alexey; Fichtner, Andreas (2020). Dataset for Seismic waveform tomography of the Central and Eastern Mediterranean upper mantle [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3538038
Explore at:
Dataset updated
Mar 31, 2020
Dataset provided by
ETH Zürich
University of Cambridge
Authors
Blom, Nienke Alexandra; Gokhberg, Alexey; Fichtner, Andreas
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Eastern Mediterranean
Description
Dataset corresponding to the Seismic waveform tomography of the Central and Eastern Mediterranean upper mantle

This dataset belongs to the seismic waveform tomography of the Central and Eastern Mediterranean by Blom, Gokhberg and Fichtner, Solid Earth (Discussions), 2019. Seismic tomography is an inverse problem where the internal elastic structure of the Earth (the upper ~500 km) is determined from seismograms (the vibrations of the Earth as a result of earthquakes, as recorded by seismometers at the Earth's surface). This inverse problem is cast as an optimisation where the misfit between observed and synthetic seismograms is minimised: waveform tomography (often referred to as full waveform inversion or FWI). Synthetic seismograms are produced by simulating the elastic wavefield of earthquakes within the Earth. The optimisation problem is solved by iterative, deterministic, gradient-based inversion. Gradients are computed using the adjoint method, which requires one forward wavefield simulation and one adjoint wavefield simulation per earthquake used in the project.

The inversion was carried out over several frequency bands, starting with the longest periods and including a progressively broader frequency band. Within each frequency band, ~10-20 iterations were carried out, totalling to a hundred iterations. Synthetic seismograms and iteration information are stored for a subset of iterations, notably those where human interaction (i.e. the selection of events / data windows) took place.

Here, we describe:

The contents of this package

How to set up the package such that all the data can be accessed and used, and reproduce the figures.

Contents of this package

Data that was used for the seismic waveform inversion: raw and processed seismograms, station information, earthquake information, as well as the window selection (designating the parts of the data that were actually used at each stage in the inversion) and synthetic seismograms produced during various stages of the inversion. This information is gathered in the LASIF project "EMed_full.complete.tar".

Models and misfit development across the iterations, as well as models relating to model testing, as carried out after the inversion. This information is gathered in the tarball "MODEL_FILES.tar". Model files are both given in the ses3d ascii format (text file drho, dvsv, dvsh, dvp and block_x, block_y, block_z) and in bundled .vtu format. Conversion to .vtu was done using the tools in SCRIPTS. These vtu files can be viewed using Paraview.

information on the tools and code that was used to do the inversion:

ses3d: a seismic wave propagation spectral element code in spherical coordinates. This will run both forward and adjoint simulations. This is available publicly through the developers on https://cos.ethz.ch/software/production/ses3d.html. See Gokhberg & Fichtner, 2016.

LASIF: a waveform inversion workflow managing package, where we have made small adaptations to make it suitable for our workflow. The original package is available via www.lasif.net and on github (see Krischer et al, 2015), the modified version is added to this package as 'LASIF-master.zip'.

LASIF_scripts: bespoke scripts in order to interact with the LASIF project and generate different types of analyses and plots that are used in the publication. This is included in the tarball 'LASIF_scripts.tar'

SCRIPTS: containing some modified tools that were originally written for ses3d, as well as some additional tools - notably to interact with models converted to the VTK format. This is included in the tarball 'SCRIPTS.tar'

A description of the conda environment named lasif_ext (which is used for all the data analysis), in the form of the yml file 'lasif_ext.yml'

An additional LASIF project which is used just to compute sensitivity kernels for different windows within the same trace: 'EMed_window_kernels.tar'. This is used as an example in one of the manuscript figures.

How to set up the data package

Download the entire data package. We will assume it is located in ~/Downloads/.

Get miniconda or anaconda if you don't have it.

Install LASIF. This can be done using the instructions from the LASIF website, but with a few adaptations, which are detailed in the lasif_ext.yml file. This amounts to the following:

Add the channel conda-forge to your standard channels

Name the environment "lasif_ext"

Manually replace the files in the LASIF source directory with those in LASIF-master.zip.

Install the specific version of pyqt=4.11.

Install the additional packages jupyter, vtk=7.0.0, pandas=0.23.4 (these are the ones that work for me).

Extract the LASIF_scripts.tar to the site-packages directory of your conda environment:

tar -xf ~/Downloads/LASIF_scripts.tar -C [/path/to/conda/environments]/lasif_ext/lib/python2.7/site-packages/

Make a project directory and extract all needed packages into it:

make project directory

mkdir CEMed_project_Blometal cd CEMed_project_Blometal

extract data tarballs into it

tar -xf ~/Downloads/EMed_full.complete.tar tar -xf ~/Downloads/EMed_window_kernels.tar tar -xf ~/Downloads/MODEL_FILES.tar

make scripts directory and extract scripts into it

mkdir conda_stuff tar -xf ~/Downloads/SCRIPTS.tar -C conda_stuff

make data analysis directory

mkdir data_analysis cd data_analysis

extract analysis tools

tar -xf ~/Downloads/NPY_FILES.tar tar -xf ~/Downloads/FIGURE_SCRIPTS.tar tar -xf ~/Downloads/figs_png.tar

Now the project should be ready for inspection. The following things can be done, for example:

Reproduce the figures in the manuscript. All scripts for this are located in CEMed_project_Blometal/data_analysis/FIGURE_SCRIPTS/.

conda activate lasif_ext cd CEMed_project_Blometal jupyter notebook

This should open up a browser tab that shows the directory structure. Navigate to data_analysis/FIGURE_scripts and click on one of the .ipynb files to open it. If you press 'Kernel' > 'Restart kernel and run all' at the top, all cells will be launched automatically. This should work out of the box.

Interact with the lasif project. For this, refer to the LASIF website. Note that above jupyter notebooks do so extensively, using the lasif communicator.

Build additional analysis tools, using the tools supplied in SCRIPTS and LASIF_scripts.

References:

Blom, N., Gokhberg, A., and Fichtner, A.: Seismic waveform tomography of the Central and Eastern Mediterranean upper mantle, Solid Earth Discuss., https://doi.org/10.5194/se-2019-152, in review, 2019.

Gokhberg, A., Fichtner, A., 2016. Full-waveform inversion on heterogeneous HPC systems. Comp. & Geosci. 89, 260-268. https://doi.org/10.1016/j.cageo.2015.12.013

Krischer, L., Fichtner, A., Zukauskaitė, S., and Igel, H. (2015), Large‐Scale Seismic Inversion Framework, Seismological Research Letters, 86(4), 1198–1207. doi:10.1785/0220140248
m
KU-HAR: An Open Dataset for Human Activity Recognition
data.mendeley.com
Updated Feb 16, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abdullah-Al Nahid (2021). KU-HAR: An Open Dataset for Human Activity Recognition [Dataset]. http://doi.org/10.17632/45f952y38r.5
Explore at:
Unique identifier
https://doi.org/10.17632/45f952y38r.5
Dataset updated
Feb 16, 2021
Authors
Abdullah-Al Nahid
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
(Always use the latest version of the dataset. )

Human Activity Recognition (HAR) refers to the capacity of machines to perceive human actions. This dataset contains information on 18 different activities collected from 90 participants (75 male and 15 female) using smartphone sensors (Accelerometer and Gyroscope). It has 1945 raw activity samples collected directly from the participants, and 20750 subsamples extracted from them. The activities are:

Stand➞ Standing still (1 min) Sit➞ Sitting still (1 min) Talk-sit➞ Talking with hand movements while sitting (1 min) Talk-stand➞ Talking with hand movements while standing or walking(1 min) Stand-sit➞ Repeatedly standing up and sitting down (5 times) Lay➞ Laying still (1 min) Lay-stand➞ Repeatedly standing up and laying down (5 times) Pick➞ Picking up an object from the floor (10 times) Jump➞ Jumping repeatedly (10 times) Push-up➞ Performing full push-ups (5 times) Sit-up➞ Performing sit-ups (5 times) Walk➞ Walking 20 meters (≈12 s) Walk-backward➞ Walking backward for 20 meters (≈20 s) Walk-circle➞ Walking along a circular path (≈ 20 s) Run➞ Running 20 meters (≈7 s) Stair-up➞ Ascending on a set of stairs (≈1 min) Stair-down➞ Descending from a set of stairs (≈50 s) Table-tennis➞ Playing table tennis (1 min)

Contents of the attached .zip files are: 1.Raw_time_domian_data.zip➞ Originally collected 1945 time-domain samples in separate .csv files. The arrangement of information in each .csv file is: Column 1, 5➞ exact time (elapsed since the start) when the Accelerometer & Gyro output was recorded (in ms) Col. 2, 3, 4➞ Acceleration along X,Y,Z axes (in m/s^2) Col. 6, 7, 8➞ Rate of rotation around X,Y,Z axes (in rad/s)

2.Trimmed_interpolated_raw_data.zip➞ Unnecessary parts of the samples were trimmed (only from the beginning and the end). The samples were interpolated to keep a constant sampling rate of 100 Hz. The arrangement of information is the same as above.

3.Time_domain_subsamples.zip➞ 20750 subsamples extracted from the 1945 collected samples provided in a single .csv file. Each of them contains 3 seconds of non-overlapping data of the corresponding activity. Arrangement of information: Col. 1–300, 301–600, 601–900➞ Acc.meter X, Y, Z axes readings Col. 901–1200, 1201–1500, 1501–1800➞ Gyro X, Y, Z axes readings Col. 1801➞ Class ID (0 to 17, in the order mentioned above) Col. 1802➞ length of the each channel data in the subsample Col. 1803➞ serial no. of the subsample

Gravity acceleration was omitted from the Acc.meter data, and no filter was applied to remove noise. The dataset is free to download, modify, and use.

More information is provided in the data paper which is currently under review: N. Sikder, A.-A. Nahid, KU-HAR: An open dataset for heterogeneous human activity recognition, Pattern Recognit. Lett. (submitted).

A preprint will be available soon.

Backup: drive.google.com/drive/folders/1yrG8pwq3XMlyEGYMnM-8xnrd6js0oXA7
d
Data from: Who's downloading pirated papers? Everyone
datadryad.org
search.dataone.org
zip
Updated Apr 22, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alexandra Elbakyan; John Bohannon (2017). Who's downloading pirated papers? Everyone [Dataset]. http://doi.org/10.5061/dryad.q447c
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.q447c
Dataset updated
Apr 22, 2017
Dataset provided by
Dryad
Authors
Alexandra Elbakyan; John Bohannon
Time period covered
Apr 21, 2016
Area covered
global, Global
Description
Sci-Hub download dataThese data include 28 million download request events from the server logs of Sci-Hub from 1 September 2015 through 29 February 2016. The uncompressed 2.7 gigabytes of data are separated into 6 data files, one for each month, in tab-delimited text format.scihub_data.zipIPython Notebook for Sci-Hub raw dataIPython Notebook used to process the raw server log data (processing the GIS files into CSV, scraping DOI metadata, etc.).Sci-Hub.htmlSci-Hub.ipynbSci-Hub publisher DOI prefixesData scraped from the CrossRef website which can be used to replicate the analysis of downloads by publisher.publisher_DOI_prefixes.csv
Z
Data from: A dataset to model Levantine landcover and land-use change...
data.niaid.nih.gov
data-staging.niaid.nih.gov
+1more
Updated Dec 16, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kempf, Michael (2023). A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10396147
Explore at:
Dataset updated
Dec 16, 2023
Dataset provided by
University of Basel
Authors
Kempf, Michael
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Levant
Description
Overview

This dataset is the repository for the following paper submitted to Data in Brief:

Kempf, M. A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19. Data in Brief (submitted: December 2023).

The Data in Brief article contains the supplement information and is the related data paper to:

Kempf, M. Climate change, the Arab Spring, and COVID-19 - Impacts on landcover transformations in the Levant. Journal of Arid Environments (revision submitted: December 2023).

Description/abstract

The Levant region is highly vulnerable to climate change, experiencing prolonged heat waves that have led to societal crises and population displacement. Since 2010, the area has been marked by socio-political turmoil, including the Syrian civil war and currently the escalation of the so-called Israeli-Palestinian Conflict, which strained neighbouring countries like Jordan due to the influx of Syrian refugees and increases population vulnerability to governmental decision-making. Jordan, in particular, has seen rapid population growth and significant changes in land-use and infrastructure, leading to over-exploitation of the landscape through irrigation and construction. This dataset uses climate data, satellite imagery, and land cover information to illustrate the substantial increase in construction activity and highlights the intricate relationship between climate change predictions and current socio-political developments in the Levant.

Folder structure

The main folder after download contains all data, in which the following subfolders are stored are stored as zipped files:

“code” stores the above described 9 code chunks to read, extract, process, analyse, and visualize the data.

“MODIS_merged” contains the 16-days, 250 m resolution NDVI imagery merged from three tiles (h20v05, h21v05, h21v06) and cropped to the study area, n=510, covering January 2001 to December 2022 and including January and February 2023.

“mask” contains a single shapefile, which is the merged product of administrative boundaries, including Jordan, Lebanon, Israel, Syria, and Palestine (“MERGED_LEVANT.shp”).

“yield_productivity” contains .csv files of yield information for all countries listed above.

“population” contains two files with the same name but different format. The .csv file is for processing and plotting in R. The .ods file is for enhanced visualization of population dynamics in the Levant (Socio_cultural_political_development_database_FAO2023.ods).

“GLDAS” stores the raw data of the NASA Global Land Data Assimilation System datasets that can be read, extracted (variable name), and processed using code “8_GLDAS_read_extract_trend” from the respective folder. One folder contains data from 1975-2022 and a second the additional January and February 2023 data.

“built_up” contains the landcover and built-up change data from 1975 to 2022. This folder is subdivided into two subfolder which contain the raw data and the already processed data. “raw_data” contains the unprocessed datasets and “derived_data” stores the cropped built_up datasets at 5 year intervals, e.g., “Levant_built_up_1975.tif”.

Code structure

1_MODIS_NDVI_hdf_file_extraction.R

This is the first code chunk that refers to the extraction of MODIS data from .hdf file format. The following packages must be installed and the raw data must be downloaded using a simple mass downloader, e.g., from google chrome. Packages: terra. Download MODIS data from after registration from: https://lpdaac.usgs.gov/products/mod13q1v061/ or https://search.earthdata.nasa.gov/search (MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061, last accessed, 09th of October 2023). The code reads a list of files, extracts the NDVI, and saves each file to a single .tif-file with the indication “NDVI”. Because the study area is quite large, we have to load three different (spatially) time series and merge them later. Note that the time series are temporally consistent.

2_MERGE_MODIS_tiles.R

In this code, we load and merge the three different stacks to produce large and consistent time series of NDVI imagery across the study area. We further use the package gtools to load the files in (1, 2, 3, 4, 5, 6, etc.). Here, we have three stacks from which we merge the first two (stack 1, stack 2) and store them. We then merge this stack with stack 3. We produce single files named NDVI_final_*consecutivenumber*.tif. Before saving the final output of single merged files, create a folder called “merged” and set the working directory to this folder, e.g., setwd("your directory_MODIS/merged").

3_CROP_MODIS_merged_tiles.R

Now we want to crop the derived MODIS tiles to our study area. We are using a mask, which is provided as .shp file in the repository, named "MERGED_LEVANT.shp". We load the merged .tif files and crop the stack with the vector. Saving to individual files, we name them “NDVI_merged_clip_*consecutivenumber*.tif. We now produced single cropped NDVI time series data from MODIS. The repository provides the already clipped and merged NDVI datasets.

4_TREND_analysis_NDVI.R

Now, we want to perform trend analysis from the derived data. The data we load is tricky as it contains 16-days return period across a year for the period of 22 years. Growing season sums contain MAM (March-May), JJA (June-August), and SON (September-November). December is represented as a single file, which means that the period DJF (December-February) is represented by 5 images instead of 6. For the last DJF period (December 2022), the data from January and February 2023 can be added. The code selects the respective images from the stack, depending on which period is under consideration. From these stacks, individual annually resolved growing season sums are generated and the slope is calculated. We can then extract the p-values of the trend and characterize all values with high confidence level (0.05). Using the ggplot2 package and the melt function from reshape2 package, we can create a plot of the reclassified NDVI trends together with a local smoother (LOESS) of value 0.3.To increase comparability and understand the amplitude of the trends, z-scores were calculated and plotted, which show the deviation of the values from the mean. This has been done for the NDVI values as well as the GLDAS climate variables as a normalization technique.

5_BUILT_UP_change_raster.R

Let us look at the landcover changes now. We are working with the terra package and get raster data from here: https://ghsl.jrc.ec.europa.eu/download.php?ds=bu (last accessed 03. March 2023, 100 m resolution, global coverage). Here, one can download the temporal coverage that is aimed for and reclassify it using the code after cropping to the individual study area. Here, I summed up different raster to characterize the built-up change in continuous values between 1975 and 2022.

6_POPULATION_numbers_plot.R

For this plot, one needs to load the .csv-file “Socio_cultural_political_development_database_FAO2023.csv” from the repository. The ggplot script provided produces the desired plot with all countries under consideration.

7_YIELD_plot.R

In this section, we are using the country productivity from the supplement in the repository “yield_productivity” (e.g., "Jordan_yield.csv". Each of the single country yield datasets is plotted in a ggplot and combined using the patchwork package in R.

8_GLDAS_read_extract_trend

The last code provides the basis for the trend analysis of the climate variables used in the paper. The raw data can be accessed https://disc.gsfc.nasa.gov/datasets?keywords=GLDAS%20Noah%20Land%20Surface%20Model%20L4%20monthly&page=1 (last accessed 9th of October 2023). The raw data comes in .nc file format and various variables can be extracted using the [“^a variable name”] command from the spatraster collection. Each time you run the code, this variable name must be adjusted to meet the requirements for the variables (see this link for abbreviations: https://disc.gsfc.nasa.gov/datasets/GLDAS_CLSM025_D_2.0/summary, last accessed 09th of October 2023; or the respective code chunk when reading a .nc file with the ncdf4 package in R) or run print(nc) from the code or use names(the spatraster collection). Choosing one variable, the code uses the MERGED_LEVANT.shp mask from the repository to crop and mask the data to the outline of the study area.From the processed data, trend analysis are conducted and z-scores were calculated following the code described above. However, annual trends require the frequency of the time series analysis to be set to value = 12. Regarding, e.g., rainfall, which is measured as annual sums and not means, the chunk r.sum=r.sum/12 has to be removed or set to r.sum=r.sum/1 to avoid calculating annual mean values (see other variables). Seasonal subset can be calculated as described in the code. Here, 3-month subsets were chosen for growing seasons, e.g. March-May (MAM), June-July (JJA), September-November (SON), and DJF (December-February, including Jan/Feb of the consecutive year).From the data, mean values of 48 consecutive years are calculated and trend analysis are performed as describe above. In the same way, p-values are extracted and 95 % confidence level values are marked with dots on the raster plot. This analysis can be performed with a much longer time series, other variables, ad different spatial extent across the globe due to the availability of the GLDAS variables.

(9_workflow_diagramme) this simple code can be used to plot a workflow diagram and is detached from the actual analysis.

Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data Curation, Writing - Original Draft, Writing - Review & Editing, Visualization, Supervision, Project administration, and Funding acquisition: Michael
u
Data from: Data on xylem sap proteins from Mn- and Fe-deficient tomato...
agdatacommons.nal.usda.gov
datasets.ai
+3more
bin
Updated Nov 21, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Laura Ceballos-Laita; Elain Gutierrez-Carbonell; Daisuke Takahashi; Anunciación Abadía; Matsuo Uemura; Javier Abadía; Ana Flor López-Millán (2025). Data from: Data on xylem sap proteins from Mn- and Fe-deficient tomato plants obtained using shotgun proteomics [Dataset]. http://doi.org/10.1016/j.dib.2018.01.034
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.1016/j.dib.2018.01.034
Dataset updated
Nov 21, 2025
Dataset provided by
ProteomeXchange
Authors
Laura Ceballos-Laita; Elain Gutierrez-Carbonell; Daisuke Takahashi; Anunciación Abadía; Matsuo Uemura; Javier Abadía; Ana Flor López-Millán
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This article contains consolidated proteomic data obtained from xylem sap collected from tomato plants grown in Fe- and Mn-sufficient control, as well as Fe-deficient and Mn-deficient conditions. Data presented here cover proteins identified and quantified by shotgun proteomics and Progenesis LC-MS analyses: proteins identified with at least two peptides and showing changes statistically significant (ANOVA; p ≤ 0.05) and above a biologically relevant selected threshold (fold ≥ 2) between treatments are listed. The comparison between Fe-deficient, Mn-deficient and control xylem sap samples using a multivariate statistical data analysis (Principal Component Analysis, PCA) is also included. Data included in this article are discussed in depth in "Effects of Fe and Mn deficiencies on the protein profiles of tomato (Solanum lycopersicum) xylem sap as revealed by shotgun analyses", Ceballos-Laita et al., J. Proteomics, 2018. This dataset is made available to support the cited study as well to extend analyses at a later stage. Resources in this dataset:Resource Title: ProteomeExchange submission PXD007517. Xylem sap shotgun proteomics from Fe- and Mn-deficient and Mn-toxic tomato plants. . File Name: Web Page, url: http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD007517 The MS proteomics data have been deposited to the ProteomeXchange Consortium via the Pride partner repository with the data set identifier PXD007517. Also includes FTP location. Files available at https://www.ebi.ac.uk/pride/archive/projects/PXD007517 via HTML, FTP, or Fast (Aspera) download : 1 SEARCH.xml file, 1 Peak file, 24 RAW files, 1 Mascot information.xlsx file. Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2018.01.034

Facebook

Twitter

Click to copy link

Link copied

Cite

Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie (2023). Raw data outputs 1-18 [Dataset]. http://doi.org/10.26180/21259491.v1

Raw data outputs 1-18

Explore at:

xlsxAvailable download formats

Unique identifier

https://doi.org/10.26180/21259491.v1

Dataset updated

May 30, 2023

Dataset provided by

Monash University

Authors

Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Raw data outputs 1-18 Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs as well as in TCGA AML cancer samples compared with normal ones. This data was generated based on the results of AML microarray and TCGA data analysis. Raw data output 2. Commonly and uniquely differentially expressed genes in AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. This data was generated based on the results of AML microarray and TCGA data analysis. Raw data output 3. Common differentially expressed genes between training and test set samples the microarray dataset. This data was generated based on the results of AML microarray data analysis. Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study. Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs as well as in TCGA BRCA cancer samples compared with normal ones. Raw data output 6. Commonly and uniquely differentially expressed genes in breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. This data was generated based on the results of breast cancer microarray and TCGA BRCA data analysis. CSC, and GTC are abbreviations of cancer stem cell, and general tumor cell, respectively. Raw data output 7. Differential and common co-expression and protein-protein interaction of genes between CSC and GTC samples. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. CSC, and GTC are abbreviations of cancer stem cell, and general tumor cell, respectively. Raw data output 8. Differentially expressed genes between AML dormant and active CSCs. This data was generated based on the results of AML scRNA-seq data analysis. Raw data output 9. Uniquely expressed genes in dormant or active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis. Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and differentially expressed genes between AML CSCs vs GTCs and between dormant and active AML CSCs or the uniquely expressed genes in either class of CSCs. Raw data output 11. Targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 12. CSC-specific targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 13. The protein-protein interactions between AML key CSC genes with themselves and their targeting transcription factors. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. Raw data output 14. The previously confirmed associations of genes having the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers’ (stem) cells as well as hematopoietic stem cells. These data were generated based on a PubMed database-based literature mining. Raw data output 15. Drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 16. CSC-specific drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 17. Candidate drugs for experimental validation. These drugs were selected based on their respective (CSC-specific) drug scores. CSC is the abbreviation of cancer stem cell. Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.

Clear search

Close search

Google apps

Main menu

Raw data outputs 1-18

Scooter Sales - Excel Project

HR Analytics Dataset

Datasets for Sentiment Analysis

Data Cleaning Sample

18 excel spreadsheets by species and year giving reproduction and growth...

Supply Chain DataSet

RAW Data Excel and SPSS

Data from: Raw data files

CSV file used in statistical analyses

Carrier Safety Measurement System (CSMS, or SMS) - Raw Data - Download...

Product Revenue Data - Excel Project

Data from: Raw Data Files

COVID Impact Survey - Public Data

Overview

Queries

Margin of Error

About the Data

Attribution

AP Data Distributions

Bike Buyers - Excel Project

Dataset for Seismic waveform tomography of the Central and Eastern...

make project directory

extract data tarballs into it

make scripts directory and extract scripts into it

make data analysis directory

extract analysis tools

KU-HAR: An Open Dataset for Human Activity Recognition

Data from: Who's downloading pirated papers? Everyone

Data from: A dataset to model Levantine landcover and land-use change...

Data from: Data on xylem sap proteins from Mn- and Fe-deficient tomato...

Raw data outputs 1-18