100+ datasets found

Excel dataset
kaggle.com
zip
Updated Jun 29, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pinky Verma (2023). Excel dataset [Dataset]. https://www.kaggle.com/datasets/pinkyverma0256/excel-dataset
Explore at:
zip(13123 bytes)Available download formats
Dataset updated
Jun 29, 2023
Authors
Pinky Verma
Description
Dataset

This dataset was created by Pinky Verma

Contents
Data from: Current and projected research data storage needs of Agricultural...
catalog.data.gov
agdatacommons.nal.usda.gov
+2more
Updated Apr 21, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Servicehttps://www.ars.usda.gov/
Description
The USDA Agricultural Research Service (ARS) recently established SCINet , which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey. Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values. Resources in this dataset:Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdfResource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat,url: https://get.adobe.com/reader/ Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csvResource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is that same data as in the Excel spreadsheet (also provided).Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsxResource Description: MS Excel worksheet that Includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel
B
Data Cleaning Sample
borealisdata.ca
dataone.org
Updated Jul 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP3/ZCN177
Dataset updated
Jul 13, 2023
Dataset provided by
Borealis
Authors
Rong Luo
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Sample data for exercises in Further Adventures in Data Cleaning.
eCommerce Transactions
kaggle.com
zip
Updated Jan 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Chad Wambles (2025). eCommerce Transactions [Dataset]. https://www.kaggle.com/datasets/chadwambles/ecommerce-transactions
Explore at:
zip(245430 bytes)Available download formats
Dataset updated
Jan 3, 2025
Authors
Chad Wambles
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
This data set is perfect for practicing your analytical skills for Power BI, Tableau, Excel, or transform it into a CSV to practice SQL.

This use case mimics transactions for a fictional eCommerce website named EverMart Online. The 3 tables in this data set are all logically connected together with IDs.

My Power BI Use Case Explanation - Using Microsoft Power BI, I made dynamic data visualizations for revenue reporting and customer behavior reporting.

Revenue Reporting Visuals - Data Card Visual that dynamically shows Total Products Listed, Total Unique Customers, Total Transactions, and Total Revenue by Total Sales, Product Sales, or Categorical Sales. - Line Graph Visual that shows Total Revenue by Month of the entire year. This graph also changes to calculate Total Revenue by Month for the Total Sales by Product and Total Sales by Category if selected. - Bar Graph Visual showcasing Total Sales by Product. - Donut Chart Visual showcasing Total Sales by Category of Product.

Customer Behavior Reporting Visuals - Data Card Visual that dynamically shows Total Products Listed, Total Unique Customers, Total Transactions, and Total Revenue by Total or by continent selected on the map. - Interactive Map Visual showing key statistics for the continent selected. - The key statistics are presented on the tool tip when you select a continent, and the following statistics show for that continent: - Continent Name - Customer Total - Percentage of Products Sold - Percentage of Total Customers - Percentage of Total Transactions - Percentage of Total Revenue
18 excel spreadsheets by species and year giving reproduction and growth...
catalog.data.gov
data.wu.ac.at
Updated Aug 17, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2024). 18 excel spreadsheets by species and year giving reproduction and growth data. One excel spreadsheet of herbicide treatment chemistry. [Dataset]. https://catalog.data.gov/dataset/18-excel-spreadsheets-by-species-and-year-giving-reproduction-and-growth-data-one-excel-sp
Explore at:
Dataset updated
Aug 17, 2024
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
Excel spreadsheets by species (4 letter code is abbreviation for genus and species used in study, year 2010 or 2011 is year data collected, SH indicates data for Science Hub, date is date of file preparation). The data in a file are described in a read me file which is the first worksheet in each file. Each row in a species spreadsheet is for one plot (plant). The data themselves are in the data worksheet. One file includes a read me description of the column in the date set for chemical analysis. In this file one row is an herbicide treatment and sample for chemical analysis (if taken). This dataset is associated with the following publication: Olszyk , D., T. Pfleeger, T. Shiroyama, M. Blakely-Smith, E. Lee , and M. Plocher. Plant reproduction is altered by simulated herbicide drift toconstructed plant communities. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, 36(10): 2799-2813, (2017).
🛒 Online Shopping Dataset 📊📉📈
kaggle.com
zip
Updated Nov 12, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jackson Divakar R (2023). 🛒 Online Shopping Dataset 📊📉📈 [Dataset]. https://www.kaggle.com/datasets/jacksondivakarr/online-shopping-dataset
Explore at:
zip(5404165 bytes)Available download formats
Dataset updated
Nov 12, 2023
Authors
Jackson Divakar R
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset: Online Shopping Dataset;

CustomerID

Description: Unique identifier for each customer. Data Type: Numeric;

Gender:

Description: Gender of the customer (e.g., Male, Female). Data Type: Categorical;

Location:

Description: Location or address information of the customer. Data Type: Text;

Tenure_Months:

Description: Number of months the customer has been associated with the platform. Data Type: Numeric;

Transaction_ID:

Description: Unique identifier for each transaction. Data Type: Numeric;

Transaction_Date:

Description: Date of the transaction. Data Type: Date;

Product_SKU:

Description: Stock Keeping Unit (SKU) identifier for the product. Data Type: Text;

Product_Description:

Description: Description of the product. Data Type: Text;

Product_Category:

Description: Category to which the product belongs. Data Type: Categorical;

Quantity:

Description: Quantity of the product purchased in the transaction. Data Type: Numeric;

Avg_Price:

Description: Average price of the product. Data Type: Numeric;

Delivery_Charges:

Description: Charges associated with the delivery of the product. Data Type: Numeric;

Coupon_Status:

Description: Status of the coupon associated with the transaction. Data Type: Categorical;

GST:

Description: Goods and Services Tax associated with the transaction. Data Type: Numeric;

Date:

Description: Date of the transaction (potentially redundant with Transaction_Date). Data Type: Date;

Offline_Spend:

Description: Amount spent offline by the customer. Data Type: Numeric;

Online_Spend:

Description: Amount spent online by the customer. Data Type: Numeric;

Month:

Description: Month of the transaction. Data Type: Categorical;

Coupon_Code:

Description: Code associated with a coupon, if applicable. Data Type: Text;

Discount_pct:

Description: Percentage of discount applied to the transaction. Data Type: Numeric;
Dataset for numerical analysis
figshare.com
data.mendeley.com
zip
Updated Nov 28, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shi Chen; Dong Chen; Jyh-Horng Lin (2023). Dataset for numerical analysis [Dataset]. http://doi.org/10.6084/m9.figshare.24648945.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.24648945.v1
Dataset updated
Nov 28, 2023
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Shi Chen; Dong Chen; Jyh-Horng Lin
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains one Excel sheet and five Word documents. In this dataset, Simulation.xlsx describes the parameter values used for the numerical analysis based on empirical data. In this Excel sheet, we calculated the values of each capped call-option model parameter. Computation of Table 2.docx and other documents show the results of the comparative statistics.
s
Data from: Fostering cultures of open qualitative research: Dataset 1 –...
orda.shef.ac.uk
docx
Updated Oct 8, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Matthew Hanchard; Itzel San Roman Pineda (2025). Fostering cultures of open qualitative research: Dataset 1 – Survey Responses [Dataset]. http://doi.org/10.15131/shef.data.23567250.v1
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.15131/shef.data.23567250.v1
Dataset updated
Oct 8, 2025
Dataset provided by
The University of Sheffield
Authors
Matthew Hanchard; Itzel San Roman Pineda
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 23-Jun-2023 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman Institute.

The dataset forms part of three outputs from a project titled ‘Fostering cultures of open qualitative research’ which ran from January 2023 to June 2023:

· Fostering cultures of open qualitative research: Dataset 1 – Survey Responses · Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts · Fostering cultures of open qualitative research: Dataset 3 – Coding Book

The project was funded with £13,913.85 Research England monies held internally by the University of Sheffield - as part of their ‘Enhancing Research Cultures’ scheme 2022-2023.

The dataset aligns with ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee (ref: 051118) on 23-Jan-2021.This includes due concern for participant anonymity and data management.

ORDA has full permission to store this dataset and to make it open access for public re-use on the basis that no commercial gain will be made form reuse. It has been deposited under a CC-BY-NC license.

This dataset comprises one spreadsheet with N=91 anonymised survey responses .xslx format. It includes all responses to the project survey which used Google Forms between 06-Feb-2023 and 30-May-2023. The spreadsheet can be opened with Microsoft Excel, Google Sheet, or open-source equivalents.

The survey responses include a random sample of researchers worldwide undertaking qualitative, mixed-methods, or multi-modal research.

The recruitment of respondents was initially purposive, aiming to gather responses from qualitative researchers at research-intensive (targetted Russell Group) Universities. This involved speculative emails and a call for participant on the University of Sheffield ‘Qualitative Open Research Network’ mailing list. As result, the responses include a snowball sample of scholars from elsewhere.

The spreadsheet has two tabs/sheets: one labelled ‘SurveyResponses’ contains the anonymised and tidied set of survey responses; the other, labelled ‘VariableMapping’, sets out each field/column in the ‘SurveyResponses’ tab/sheet against the original survey questions and responses it relates to.

The survey responses tab/sheet includes a field/column labelled ‘RespondentID’ (using randomly generated 16-digit alphanumeric keys) which can be used to connect survey responses to interview participants in the accompanying ‘Fostering cultures of open qualitative research: Dataset 2 – Interview transcripts’ files.

A set of survey questions gathering eligibility criteria detail and consent are not listed with in this dataset, as below. All responses provide in the dataset gained a ‘Yes’ response to all the below questions (with the exception of one question, marked with an asterisk (*) below):

· I am aged 18 or over · I have read the information and consent statement and above. · I understand how to ask questions and/or raise a query or concern about the survey. · I agree to take part in the research and for my responses to be part of an open access dataset. These will be anonymised unless I specifically ask to be named. · I understand that my participation does not create a legally binding agreement or employment relationship with the University of Sheffield · I understand that I can withdraw from the research at any time. · I assign the copyright I hold in materials generated as part of this project to The University of Sheffield. · * I am happy to be contacted after the survey to take part in an interview.

The project was undertaken by two staff: Co-investigator: Dr. Itzel San Roman Pineda ORCiD ID: 0000-0002-3785-8057 i.sanromanpineda@sheffield.ac.uk

Postdoctoral Research Assistant Principal Investigator (corresponding dataset author): Dr. Matthew Hanchard ORCiD ID: 0000-0003-2460-8638 m.s.hanchard@sheffield.ac.uk Research Associate iHuman Institute, Social Research Institutes, Faculty of Social Science
m
Dataset related to online distance learning
data.mendeley.com
Updated Apr 19, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lee Chaw (2022). Dataset related to online distance learning [Dataset]. http://doi.org/10.17632/9gbr7sjk32.1
Explore at:
Unique identifier
https://doi.org/10.17632/9gbr7sjk32.1
Dataset updated
Apr 19, 2022
Authors
Lee Chaw
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset in Excel spreadsheet accompanying this article consists of 207 rows and 24 columns. Each row represents an individual responses to questionnaire's items.
New 1000 Sales Records Data 2
kaggle.com
zip
Updated Jan 12, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Calvin Oko Mensah (2023). New 1000 Sales Records Data 2 [Dataset]. https://www.kaggle.com/datasets/calvinokomensah/new-1000-sales-records-data-2
Explore at:
zip(49305 bytes)Available download formats
Dataset updated
Jan 12, 2023
Authors
Calvin Oko Mensah
Description
This is a dataset downloaded off excelbianalytics.com created off of random VBA logic. I recently performed an extensive exploratory data analysis on it and I included new columns to it, namely: Unit margin, Order year, Order month, Order weekday and Order_Ship_Days which I think can help with analysis on the data. I shared it because I thought it was a great dataset to practice analytical processes on for newbies like myself.
d
Data from: Delta Neighborhood Physical Activity Study
catalog.data.gov
agdatacommons.nal.usda.gov
+1more
Updated Jun 5, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). Delta Neighborhood Physical Activity Study [Dataset]. https://catalog.data.gov/dataset/delta-neighborhood-physical-activity-study-f82d7
Explore at:
Dataset updated
Jun 5, 2025
Dataset provided by
Agricultural Research Service
Description
The Delta Neighborhood Physical Activity Study was an observational study designed to assess characteristics of neighborhood built environments associated with physical activity. It was an ancillary study to the Delta Healthy Sprouts Project and therefore included towns and neighborhoods in which Delta Healthy Sprouts participants resided. The 12 towns were located in the Lower Mississippi Delta region of Mississippi. Data were collected via electronic surveys between August 2016 and September 2017 using the Rural Active Living Assessment (RALA) tools and the Community Park Audit Tool (CPAT). Scale scores for the RALA Programs and Policies Assessment and the Town-Wide Assessment were computed using the scoring algorithms provided for these tools via SAS software programming. The Street Segment Assessment and CPAT do not have associated scoring algorithms and therefore no scores are provided for them. Because the towns were not randomly selected and the sample size is small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi. Dataset one contains data collected with the RALA Programs and Policies Assessment (PPA) tool. Dataset two contains data collected with the RALA Town-Wide Assessment (TWA) tool. Dataset three contains data collected with the RALA Street Segment Assessment (SSA) tool. Dataset four contains data collected with the Community Park Audit Tool (CPAT). [Note : title changed 9/4/2020 to reflect study name] Resources in this dataset:Resource Title: Dataset One RALA PPA Data Dictionary. File Name: RALA PPA Data Dictionary.csvResource Description: Data dictionary for dataset one collected using the RALA PPA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Two RALA TWA Data Dictionary. File Name: RALA TWA Data Dictionary.csvResource Description: Data dictionary for dataset two collected using the RALA TWA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Three RALA SSA Data Dictionary. File Name: RALA SSA Data Dictionary.csvResource Description: Data dictionary for dataset three collected using the RALA SSA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Four CPAT Data Dictionary. File Name: CPAT Data Dictionary.csvResource Description: Data dictionary for dataset four collected using the CPAT.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset One RALA PPA. File Name: RALA PPA Data.csvResource Description: Data collected using the RALA PPA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Two RALA TWA. File Name: RALA TWA Data.csvResource Description: Data collected using the RALA TWA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Three RALA SSA. File Name: RALA SSA Data.csvResource Description: Data collected using the RALA SSA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Four CPAT. File Name: CPAT Data.csvResource Description: Data collected using the CPAT.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Data Dictionary. File Name: DataDictionary_RALA_PPA_SSA_TWA_CPAT.csvResource Description: This is a combined data dictionary from each of the 4 dataset files in this set.
Data from: Excel Templates: A Helpful Tool for Teaching Statistics
tandf.figshare.com
zip
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alejandro Quintela-del-Río; Mario Francisco-Fernández (2023). Excel Templates: A Helpful Tool for Teaching Statistics [Dataset]. http://doi.org/10.6084/m9.figshare.3408052.v2
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.3408052.v2
Dataset updated
May 30, 2023
Dataset provided by
Taylor & Francishttps://taylorandfrancis.com/
Authors
Alejandro Quintela-del-Río; Mario Francisco-Fernández
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This article describes a free, open-source collection of templates for the popular Excel (2013, and later versions) spreadsheet program. These templates are spreadsheet files that allow easy and intuitive learning and the implementation of practical examples concerning descriptive statistics, random variables, confidence intervals, and hypothesis testing. Although they are designed to be used with Excel, they can also be employed with other free spreadsheet programs (changing some particular formulas). Moreover, we exploit some possibilities of the ActiveX controls of the Excel Developer Menu to perform interactive Gaussian density charts. Finally, it is important to note that they can be often embedded in a web page, so it is not necessary to employ Excel software for their use. These templates have been designed as a useful tool to teach basic statistics and to carry out data analysis even when the students are not familiar with Excel. Additionally, they can be used as a complement to other analytical software packages. They aim to assist students in learning statistics, within an intuitive working environment. Supplementary materials with the Excel templates are available online.
Z
Dairy Supply Chain Sales Dataset
data.niaid.nih.gov
zenodo.org
Updated Jul 12, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dimitris Iatropoulos; Konstantinos Georgakidis; Ilias Siniosoglou; Christos Chaschatzis; Anna Triantafyllou; Athanasios Liatifis; Dimitrios Pliatsios; Thomas Lagkas; Vasileios Argyriou; Panagiotis Sarigiannidis (2024). Dairy Supply Chain Sales Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7853252
Explore at:
Dataset updated
Jul 12, 2024
Authors
Dimitris Iatropoulos; Konstantinos Georgakidis; Ilias Siniosoglou; Christos Chaschatzis; Anna Triantafyllou; Athanasios Liatifis; Dimitrios Pliatsios; Thomas Lagkas; Vasileios Argyriou; Panagiotis Sarigiannidis
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
1.Introduction

Sales data collection is a crucial aspect of any manufacturing industry as it provides valuable insights about the performance of products, customer behaviour, and market trends. By gathering and analysing this data, manufacturers can make informed decisions about product development, pricing, and marketing strategies in Internet of Things (IoT) business environments like the dairy supply chain.

One of the most important benefits of the sales data collection process is that it allows manufacturers to identify their most successful products and target their efforts towards those areas. For example, if a manufacturer could notice that a particular product is selling well in a certain region, this information could be utilised to develop new products, optimise the supply chain or improve existing ones to meet the changing needs of customers.

This dataset includes information about 7 of MEVGAL’s products [1]. According to the above information the data published will help researchers to understand the dynamics of the dairy market and its consumption patterns, which is creating the fertile ground for synergies between academia and industry and eventually help the industry in making informed decisions regarding product development, pricing and market strategies in the IoT playground. The use of this dataset could also aim to understand the impact of various external factors on the dairy market such as the economic, environmental, and technological factors. It could help in understanding the current state of the dairy industry and identifying potential opportunities for growth and development.

Citation

Please cite the following papers when using this dataset:

I. Siniosoglou, K. Xouveroudis, V. Argyriou, T. Lagkas, S. K. Goudos, K. E. Psannis and P. Sarigiannidis, "Evaluating the Effect of Volatile Federated Timeseries on Modern DNNs: Attention over Long/Short Memory," in the 12th International Conference on Circuits and Systems Technologies (MOCAST 2023), April 2023, Accepted

Dataset Modalities

The dataset includes data regarding the daily sales of a series of dairy product codes offered by MEVGAL. In particular, the dataset includes information gathered by the logistics division and agencies within the industrial infrastructures overseeing the production of each product code. The products included in this dataset represent the daily sales and logistics of a variety of yogurt-based stock. Each of the different files include the logistics for that product on a daily basis for three years, from 2020 to 2022.

3.1 Data Collection

The process of building this dataset involves several steps to ensure that the data is accurate, comprehensive and relevant.

The first step is to determine the specific data that is needed to support the business objectives of the industry, i.e., in this publication’s case the daily sales data.

Once the data requirements have been identified, the next step is to implement an effective sales data collection method. In MEVGAL’s case this is conducted through direct communication and reports generated each day by representatives & selling points.

It is also important for MEVGAL to ensure that the data collection process conducted is in an ethical and compliant manner, adhering to data privacy laws and regulation. The industry also has a data management plan in place to ensure that the data is securely stored and protected from unauthorised access.

The published dataset is consisted of 13 features providing information about the date and the number of products that have been sold. Finally, the dataset was anonymised in consideration to the privacy requirement of the data owner (MEVGAL).

File

Period

Number of Samples (days)

product 1 2020.xlsx

01/01/2020–31/12/2020

363

product 1 2021.xlsx

01/01/2021–31/12/2021

364

product 1 2022.xlsx

01/01/2022–31/12/2022

365

product 2 2020.xlsx

01/01/2020–31/12/2020

363

product 2 2021.xlsx

01/01/2021–31/12/2021

364

product 2 2022.xlsx

01/01/2022–31/12/2022

365

product 3 2020.xlsx

01/01/2020–31/12/2020

363

product 3 2021.xlsx

01/01/2021–31/12/2021

364

product 3 2022.xlsx

01/01/2022–31/12/2022

365

product 4 2020.xlsx

01/01/2020–31/12/2020

363

product 4 2021.xlsx

01/01/2021–31/12/2021

364

product 4 2022.xlsx

01/01/2022–31/12/2022

364

product 5 2020.xlsx

01/01/2020–31/12/2020

363

product 5 2021.xlsx

01/01/2021–31/12/2021

364

product 5 2022.xlsx

01/01/2022–31/12/2022

365

product 6 2020.xlsx

01/01/2020–31/12/2020

362

product 6 2021.xlsx

01/01/2021–31/12/2021

364

product 6 2022.xlsx

01/01/2022–31/12/2022

365

product 7 2020.xlsx

01/01/2020–31/12/2020

362

product 7 2021.xlsx

01/01/2021–31/12/2021

364

product 7 2022.xlsx

01/01/2022–31/12/2022

365

3.2 Dataset Overview

The following table enumerates and explains the features included across all of the included files.

Feature

Description

Unit

Day

day of the month

-

Month

Month

-

Year

Year

-

daily_unit_sales

Daily sales - the amount of products, measured in units, that during that specific day were sold

units

previous_year_daily_unit_sales

Previous Year’s sales - the amount of products, measured in units, that during that specific day were sold the previous year

units

percentage_difference_daily_unit_sales

The percentage difference between the two above values

%

daily_unit_sales_kg

The amount of products, measured in kilograms, that during that specific day were sold

kg

previous_year_daily_unit_sales_kg

Previous Year’s sales - the amount of products, measured in kilograms, that during that specific day were sold, the previous year

kg

percentage_difference_daily_unit_sales_kg

The percentage difference between the two above values

kg

daily_unit_returns_kg

The percentage of the products that were shipped to selling points and were returned

%

previous_year_daily_unit_returns_kg

The percentage of the products that were shipped to selling points and were returned the previous year

%

points_of_distribution

The amount of sales representatives through which the product was sold to the market for this year

previous_year_points_of_distribution

The amount of sales representatives through which the product was sold to the market for the same day for the previous year

Table 1 – Dataset Feature Description

Structure and Format

4.1 Dataset Structure

The provided dataset has the following structure:

Where:

Name

Type

Property

Readme.docx

Report

A File that contains the documentation of the Dataset.

product X

Folder

A folder containing the data of a product X.

product X YYYY.xlsx

Data file

An excel file containing the sales data of product X for year YYYY.

Table 2 - Dataset File Description

Acknowledgement

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 957406 (TERMINET).

References

[1] MEVGAL is a Greek dairy production company
Z
Dataset: A Systematic Literature Review on the topic of High-value datasets
data.niaid.nih.gov
zenodo.org
Updated Jun 23, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anastasija Nikiforova; Nina Rizun; Magdalena Ciesielska; Charalampos Alexopoulos; Andrea Miletič (2023). Dataset: A Systematic Literature Review on the topic of High-value datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7944424
Explore at:
Dataset updated
Jun 23, 2023
Dataset provided by
University of Tartu
University of the Aegean
Gdańsk University of Technology
University of Zagreb
Authors
Anastasija Nikiforova; Nina Rizun; Magdalena Ciesielska; Charalampos Alexopoulos; Andrea Miletič
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains data collected during a study ("Towards High-Value Datasets determination for data-driven development: a systematic literature review") conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb) It being made public both to act as supplementary data for "Towards High-Value Datasets determination for data-driven development: a systematic literature review" paper (pre-print is available in Open Access here -> https://arxiv.org/abs/2305.10234) and in order for other researchers to use these data in their own work.

The protocol is intended for the Systematic Literature review on the topic of High-value Datasets with the aim to gather information on how the topic of High-value datasets (HVD) and their determination has been reflected in the literature over the years and what has been found by these studies to date, incl. the indicators used in them, involved stakeholders, data-related aspects, and frameworks. The data in this dataset were collected in the result of the SLR over Scopus, Web of Science, and Digital Government Research library (DGRL) in 2023.

Methodology

To understand how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, all relevant literature covering this topic has been studied. To this end, the SLR was carried out to by searching digital libraries covered by Scopus, Web of Science (WoS), Digital Government Research library (DGRL).

These databases were queried for keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), which were applied to the article title, keywords, and abstract to limit the number of papers to those, where these objects were primary research objects rather than mentioned in the body, e.g., as a future work. After deduplication, 11 articles were found unique and were further checked for relevance. As a result, a total of 9 articles were further examined. Each study was independently examined by at least two authors.

To attain the objective of our study, we developed the protocol, where the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information.

Test procedure Each study was independently examined by at least two authors, where after the in-depth examination of the full-text of the article, the structured protocol has been filled for each study. The structure of the survey is available in the supplementary file available (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx) The data collected for each study by two researchers were then synthesized in one final version by the third researcher.

Description of the data in this data set

Protocol_HVD_SLR provides the structure of the protocol Spreadsheets #1 provides the filled protocol for relevant studies. Spreadsheet#2 provides the list of results after the search over three indexing databases, i.e. before filtering out irrelevant studies

The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information

Descriptive information
1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet 2) Complete reference - the complete source information to refer to the study 3) Year of publication - the year in which the study was published 4) Journal article / conference paper / book chapter - the type of the paper -{journal article, conference paper, book chapter} 5) DOI / Website- a link to the website where the study can be found 6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science 7) Availability in OA - availability of an article in the Open Access 8) Keywords - keywords of the paper as indicated by the authors 9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}

Approach- and research design-related information 10) Objective / RQ - the research objective / aim, established research questions 11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analy-sis (country, organisation, specific unit that has been ana-lysed, e.g., the number of use-cases, scope of the SLR etc.) 12) Contributions - the contributions of the study 13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach? 14) Availability of the underlying research data- whether there is a reference to the publicly available underly-ing research data e.g., transcriptions of interviews, collected data, or explanation why these data are not shared? 15) Period under investigation - period (or moment) in which the study was conducted 16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study?

Quality- and relevance- related information
17) Quality concerns - whether there are any quality concerns (e.g., limited infor-mation about the research methods used)? 18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused around the HVD determination, sec-ondary - mentioned but not studied (e.g., as part of discus-sion, future work etc.))

HVD determination-related information
19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term? 20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, “input -> output") 21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the rela-tionships between these components? (detailed description) 22) Stakeholders and their roles - what stakeholders or actors does HVD determination in-volve? What are their roles? 23) Data - what data do HVD cover? 24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)

Format of the file .xls, .csv (for the first spreadsheet only), .odt, .docx

Licenses or restrictions CC-BY

For more info, see README.txt
SQuAD v2.0 Dataset (complete excel file)
kaggle.com
Updated Sep 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dr. Pragya Katyayan (2023). SQuAD v2.0 Dataset (complete excel file) [Dataset]. https://www.kaggle.com/datasets/pragyakatyayan/squad-v20-dataset-complete-excel-file
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 13, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Dr. Pragya Katyayan
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
SQuAD_v2.0_Dataset

Stanford Question-Answering Dataset (v2.0) - parsed completely in MS-excel file.

The SQuAD 2.0 dataset is available online in JSON format Link There are several csv files for SQuAD 2.0 available for download on Kaggle etc. but none of them have all the attributes related to a question (as given in json files). Also, most of the datasets tend to ignore the unanswerable questions of SQuAD 2.0. The parsing of JSON files can be a task for beginners, so I have taken the opportunity of uploading the completely parsed SQuAD 2.0 dataset in excel files (both training and development sets).

The Training data has following columns:

Title

Context

Question

Id

Answer

Answer start

Plausible Answer

Plausible Answer Start

Is_impossible

The Development set has following columns:

Title

Context

Question

Id

Answer

Answer start

Plausible Answer

Plausible Answer Start

Is_impossible

The SQuAD 2.0 dataset has about 50,000 questions that are unanswerable and for such questions, the training and development dataset has plausible answer and plausible answer start options.

Train_set details: Total: 130319 unique questions Unanswerable questions: 43498 Answerable: 86821

Dev_set details: Total: 11873 unique questions Unanswerable questions: 5945 Answerable questions: 5928 15 unanswerable questions have no plausible answers given in the dataset.

Feel free to use the dataset for R&D purposes! Thank you!

Important Note: This dataset is the original property of Rajpurkar et al. (2018) and I haven't made any new changes in it so don't forget to cite Rajpurkar et al. (2018) when using this dataset.

Cite: Rajpurkar, P., Jia, R., & Liang, P. (2018). Know what you don't know: Unanswerable questions for SQuAD. arXiv preprint arXiv:1806.03822.
f
UC_vs_US Statistic Analysis.xlsx
figshare.com
xlsx
Updated Jul 9, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.23644/uu.12631628.v1
Dataset updated
Jul 9, 2020
Dataset provided by
Utrecht University
Authors
F. (Fabiano) Dalpiaz
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the used measures described in the paper. For each subject, it includes multiple columns: A. a sequential student ID B an ID that defines a random group label and the notation C. the used notation: user Story or use Cases D. the case they were assigned to: IFA, Sim, or Hos E. the subject's exam grade (total points out of 100). Empty cells mean that the subject did not take the first exam F. a categorical representation of the grade L/M/H, where H is greater or equal to 80, M is between 65 included and 80 excluded, L otherwise G. the total number of classes in the student's conceptual model H. the total number of relationships in the student's conceptual model I. the total number of classes in the expert's conceptual model J. the total number of relationships in the expert's conceptual model K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below) P. the researchers' judgement on how well the derivation process explanation was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping ), or not present.

Tagging scheme: Aligned (AL) - A concept is represented as a class in both models, either

with the same name or using synonyms or clearly linkable names; Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than class, or (ii) using a generic term (e.g., user'' instead ofurban planner''); System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent legacy system or the system under design (portal, simulator) are legitimate; Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud; Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.

All the calculations and information provided in the following sheets

originate from that raw data.

Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,

including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

Sheet 3 (Size-Ratio):

The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade) . The primary focus in this study is on the number of classes. However, we also provided the size ratio for the number of relationships between student and expert model.

Sheet 4 (Overall):

Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that is fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR) and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.

For sheet 4 as well as for the following four sheets, diverging stacked bar

charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated witch solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

Sheet 5 (By-Notation):

Model correctness and model completeness is compared by notation - UC, US.

Sheet 6 (By-Case):

Model correctness and model completeness is compared by case - SIM, HOS, IFA.

Sheet 7 (By-Process):

Model correctness and model completeness is compared by how well the derivation process is explained - well explained, partially explained, not present.

Sheet 8 (By-Grade):

Model correctness and model completeness is compared by the exam grades, converted to categorical values High, Low , and Medium.
s
Data from: Fostering cultures of open qualitative research: Dataset 2 –...
orda.shef.ac.uk
xlsx
Updated Oct 8, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Matthew Hanchard; Itzel San Roman Pineda (2025). Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts [Dataset]. http://doi.org/10.15131/shef.data.23567223.v2
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.15131/shef.data.23567223.v2
Dataset updated
Oct 8, 2025
Dataset provided by
The University of Sheffield
Authors
Matthew Hanchard; Itzel San Roman Pineda
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 23-Jun-2023 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman Institute. The dataset forms part of three outputs from a project titled ‘Fostering cultures of open qualitative research’ which ran from January 2023 to June 2023:

· Fostering cultures of open qualitative research: Dataset 1 – Survey Responses · Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts · Fostering cultures of open qualitative research: Dataset 3 – Coding Book

The project was funded with £13,913.85 of Research England monies held internally by the University of Sheffield - as part of their ‘Enhancing Research Cultures’ scheme 2022-2023.

The dataset aligns with ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee (ref: 051118) on 23-Jan-2021. This includes due concern for participant anonymity and data management.

ORDA has full permission to store this dataset and to make it open access for public re-use on the basis that no commercial gain will be made form reuse. It has been deposited under a CC-BY-NC license. Overall, this dataset comprises:

· 15 x Interview transcripts - in .docx file format which can be opened with Microsoft Word, Google Doc, or an open-source equivalent.

All participants have read and approved their transcripts and have had an opportunity to retract details should they wish to do so.

Participants chose whether to be pseudonymised or named directly. The pseudonym can be used to identify individual participant responses in the qualitative coding held within the ‘Fostering cultures of open qualitative research: Dataset 3 – Coding Book’ files.

For recruitment, 14 x participants we selected based on their responses to the project survey., whilst one participant was recruited based on specific expertise.

· 1 x Participant sheet – in .csv format which may by opened with Microsoft Excel, Google Sheet, or an open-source equivalent.

The provides socio-demographic detail on each participant alongside their main field of research and career stage. It includes a RespondentID field/column which can be used to connect interview participants with their responses to the survey questions in the accompanying ‘Fostering cultures of open qualitative research: Dataset 1 – Survey Responses’ files.

The project was undertaken by two staff:

Co-investigator: Dr. Itzel San Roman Pineda ORCiD ID: 0000-0002-3785-8057 i.sanromanpineda@sheffield.ac.uk Postdoctoral Research Assistant Labelled as ‘Researcher 1’ throughout the dataset

Principal Investigator (corresponding dataset author): Dr. Matthew Hanchard ORCiD ID: 0000-0003-2460-8638 m.s.hanchard@sheffield.ac.uk Research Associate iHuman Institute, Social Research Institutes, Faculty of Social Science Labelled as ‘Researcher 2’ throughout the dataset
m
Abbreviated FOMO and social media dataset
figshare.mq.edu.au
researchdata.edu.au
txt
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Danielle Einstein; Carol Dabb; Madeleine Ferrari; Anne McMaugh; Peter McEvoy; Ron Rapee; Eyal Karin; Maree J. Abbott (2023). Abbreviated FOMO and social media dataset [Dataset]. http://doi.org/10.25949/20188298.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.25949/20188298.v1
Dataset updated
May 30, 2023
Dataset provided by
Macquarie University
Authors
Danielle Einstein; Carol Dabb; Madeleine Ferrari; Anne McMaugh; Peter McEvoy; Ron Rapee; Eyal Karin; Maree J. Abbott
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This database is comprised of 951 participants who provided self-report data online in their school classrooms. The data was collected in 2016 and 2017. The dataset is comprised of 509 males (54%) and 442 females (46%). Their ages ranged from 12 to 16 years (M = 13.69, SD = 0.72). Seven participants did not report their age. The majority were born in Australia (N = 849, 89%). The next most common countries of birth were China (N = 24, 2.5%), the UK (N = 23, 2.4%), and the USA (N = 9, 0.9%). Data were drawn from students at five Australian independent secondary schools. The data contains item responses for the Spence Children’s Anxiety Scale (SCAS; Spence, 1998) which is comprised of 44 items. The Social media question asked about frequency of use with the question “How often do you use social media?”. The response options ranged from constantly to once a week or less. Items measuring Fear of Missing Out were included and incorporated the following five questions based on the APS Stress and Wellbeing in Australia Survey (APS, 2015). These were “When I have a good time it is important for me to share the details online; I am afraid that I will miss out on something if I don’t stay connected to my online social networks; I feel worried and uncomfortable when I can’t access my social media accounts; I find it difficult to relax or sleep after spending time on social networking sites; I feel my brain burnout with the constant connectivity of social media. Internal consistency for this measure was α = .81. Self compassion was measured using the 12-item short-form of the Self-Compassion Scale (SCS-SF; Raes et al., 2011). The data set has the option of downloading an excel file (composed of two worksheet tabs) or CSV files 1) Data and 2) Variable labels. References: Australian Psychological Society. (2015). Stress and wellbeing in Australia survey. https://www.headsup.org.au/docs/default-source/default-document-library/stress-and-wellbeing-in-australia-report.pdf?sfvrsn=7f08274d_4 Raes, F., Pommier, E., Neff, K. D., & Van Gucht, D. (2011). Construction and factorial validation of a short form of the self-compassion scale. Clinical Psychology and Psychotherapy, 18(3), 250-255. https://doi.org/10.1002/cpp.702 Spence, S. H. (1998). A measure of anxiety symptoms among children. Behaviour Research and Therapy, 36(5), 545-566. https://doi.org/10.1016/S0005-7967(98)00034-5
Escape Excel: A tool for preventing gene symbol and accession conversion...
plos.figshare.com
xlsx
Updated Jun 5, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Eric A. Welsh; Paul A. Stewart; Brent M. Kuenzi; James A. Eschrich (2023). Escape Excel: A tool for preventing gene symbol and accession conversion errors [Dataset]. http://doi.org/10.1371/journal.pone.0185207
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0185207
Dataset updated
Jun 5, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Eric A. Welsh; Paul A. Stewart; Brent M. Kuenzi; James A. Eschrich
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
BackgroundMicrosoft Excel automatically converts certain gene symbols, database accessions, and other alphanumeric text into dates, scientific notation, and other numerical representations. These conversions lead to subsequent, irreversible, corruption of the imported text. A recent survey of popular genomic literature estimates that one-fifth of all papers with supplementary gene lists suffer from this issue.ResultsHere, we present an open-source tool, Escape Excel, which prevents these erroneous conversions by generating an escaped text file that can be safely imported into Excel. Escape Excel is implemented in a variety of formats (http://www.github.com/pstew/escape_excel), including a command line based Perl script, a Windows-only Excel Add-In, an OS X drag-and-drop application, a simple web-server, and as a Galaxy web environment interface. Test server implementations are accessible as a Galaxy interface (http://apostl.moffitt.org) and simple non-Galaxy web server (http://apostl.moffitt.org:8000/).ConclusionsEscape Excel detects and escapes a wide variety of problematic text strings so that they are not erroneously converted into other representations upon importation into Excel. Examples of problematic strings include date-like strings, time-like strings, leading zeroes in front of numbers, and long numeric and alphanumeric identifiers that should not be automatically converted into scientific notation. It is hoped that greater awareness of these potential data corruption issues, together with diligent escaping of text files prior to importation into Excel, will help to reduce the amount of Excel-corrupted data in scientific analyses and publications.
f
Repeated Measures data files
auckland.figshare.com
zip
Updated Nov 9, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gavin T. L. Brown (2020). Repeated Measures data files [Dataset]. http://doi.org/10.17608/k6.auckland.13211120.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.17608/k6.auckland.13211120.v1
Dataset updated
Nov 9, 2020
Dataset provided by
The University of Auckland
Authors
Gavin T. L. Brown
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This zip file contains data files for 3 activities described in the accompanying PPT slides 1. an excel spreadsheet for analysing gain scores in a 2 group, 2 times data array. this activity requires access to –https://campbellcollaboration.org/research-resources/effect-size-calculator.html to calculate effect size.2. an AMOS path model and SPSS data set for an autoregressive, bivariate path model with cross-lagging. This activity is related to the following article: Brown, G. T. L., & Marshall, J. C. (2012). The impact of training students how to write introductions for academic essays: An exploratory, longitudinal study. Assessment & Evaluation in Higher Education, 37(6), 653-670. doi:10.1080/02602938.2011.5632773. an AMOS latent curve model and SPSS data set for a 3-time latent factor model with an interaction mixed model that uses GPA as a predictor of the LCM start and slope or change factors. This activity makes use of data reported previously and a published data analysis case: Peterson, E. R., Brown, G. T. L., & Jun, M. C. (2015). Achievement emotions in higher education: A diary study exploring emotions across an assessment event. Contemporary Educational Psychology, 42, 82-96. doi:10.1016/j.cedpsych.2015.05.002andBrown, G. T. L., & Peterson, E. R. (2018). Evaluating repeated diary study responses: Latent curve modeling. In SAGE Research Methods Cases Part 2. Retrieved from http://methods.sagepub.com/case/evaluating-repeated-diary-study-responses-latent-curve-modeling doi:10.4135/9781526431592