24 datasets found
  1. Graph Input Data Example.xlsx

    • figshare.com
    xlsx
    Updated Dec 26, 2018
    Cite
    Dr Corynen (2018). Graph Input Data Example.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.7506209.v1
    Explore at:
    xlsx (available download formats)
    Dataset updated
    Dec 26, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Dr Corynen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The various performance criteria applied in this analysis include the probability of reaching the ultimate target, the costs, elapsed times and system vulnerability resulting from any intrusion. This Excel file contains all the logical, probabilistic and statistical data entered by a user, and required for the evaluation of the criteria. It also reports the results of all the computations.

  2. Time-Series Matrix (TSMx): A visualization tool for plotting multiscale...

    • dataverse.harvard.edu
    Updated Jul 8, 2024
    Cite
    Georgios Boumis; Brad Peter (2024). Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends [Dataset]. http://doi.org/10.7910/DVN/ZZDYM9
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 8, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Georgios Boumis; Brad Peter
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends TSMx is an R script that was developed to facilitate multi-temporal-scale visualizations of time-series data. The script requires only a two-column CSV of years and values to plot the slope of the linear regression line for all possible year combinations from the supplied temporal range. The outputs include a time-series matrix showing slope direction based on the linear regression, slope values plotted with colors indicating magnitude, and results of a Mann-Kendall test. The start year is indicated on the y-axis and the end year is indicated on the x-axis. In the example below, the cell in the top-right corner is the direction of the slope for the temporal range 2001–2019. The red line corresponds with the temporal range 2010–2019 and an arrow is drawn from the cell that represents that range. One cell is highlighted with a black border to demonstrate how to read the chart—that cell represents the slope for the temporal range 2004–2014. This publication entry also includes an excel template that produces the same visualizations without a need to interact with any code, though minor modifications will need to be made to accommodate year ranges other than what is provided. TSMx for R was developed by Georgios Boumis; TSMx was originally conceptualized and created by Brad G. Peter in Microsoft Excel. Please refer to the associated publication: Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624. https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624 TSMx sample chart from the supplied Excel template. Data represent the productivity of rice agriculture in Vietnam as measured via EVI (enhanced vegetation index) from the NASA MODIS data product (MOD13Q1.V006). 
TSMx R script:

# import packages
library(dplyr)
library(readr)
library(ggplot2)
library(tibble)
library(tidyr)
library(forcats)
library(Kendall)

options(warn = -1) # disable warnings

# read data (.csv file with "Year" and "Value" columns)
data <- read_csv("EVI.csv")

# prepare row/column names for output matrices
years <- data %>% pull("Year")
r.names <- years[-length(years)]
c.names <- years[-1]
years <- years[-length(years)]

# initialize output matrices
sign.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
pval.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
slope.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))

# function to return remaining years given a start year
getRemain <- function(start.year) {
  years <- data %>% pull("Year")
  start.ind <- which(data[["Year"]] == start.year) + 1
  remain <- years[start.ind:length(years)]
  return(remain)
}

# function to subset data for a start/end year combination
splitData <- function(end.year, start.year) {
  keep <- which(data[['Year']] >= start.year & data[['Year']] <= end.year)
  batch <- data[keep,]
  return(batch)
}

# function to fit linear regression and return slope direction
fitReg <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(sign(slope))
}

# function to fit linear regression and return slope magnitude
fitRegv2 <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(slope)
}

# function to implement Mann-Kendall (MK) trend test and return significance
# the test is implemented only for n >= 8
getMann <- function(batch) {
  if (nrow(batch) >= 8) {
    mk <- MannKendall(batch[['Value']])
    pval <- mk[['sl']]
  } else {
    pval <- NA
  }
  return(pval)
}

# function to return slope direction for all combinations given a start year
getSign <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  signs <- lapply(combs, fitReg)
  return(signs)
}

# function to return MK significance for all combinations given a start year
getPval <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  pvals <- lapply(combs, getMann)
  return(pvals)
}

# function to return slope magnitude for all combinations given a start year
getMagn <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  magns <- lapply(combs, fitRegv2)
  return(magns)
}

# retrieve slope direction, MK significance, and slope magnitude
signs <- lapply(years, getSign)
pvals <- lapply(years, getPval)
magns <- lapply(years, getMagn)

# fill in output matrices
dimension <- nrow(sign.matrix)
for (i in 1:dimension) {
  sign.matrix[i, i:dimension] <- unlist(signs[i])
  pval.matrix[i, i:dimension] <- unlist(pvals[i])
  slope.matrix[i, i:dimension] <- unlist(magns[i])
}
sign.matrix <-...
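A minimal sketch (not part of the original script) of how the slope matrix computed above could be rendered as a coloured year-by-year grid with ggplot2; it assumes slope.matrix, r.names and c.names exactly as defined in the script:

# sketch only: plot slope magnitude for every start/end year combination
rownames(slope.matrix) <- r.names
colnames(slope.matrix) <- c.names
plot.df <- as.data.frame(as.table(slope.matrix))   # columns: Var1, Var2, Freq
names(plot.df) <- c("StartYear", "EndYear", "Slope")
ggplot(plot.df, aes(x = EndYear, y = StartYear, fill = Slope)) +
  geom_tile() +
  scale_fill_gradient2(low = "red", mid = "white", high = "blue", na.value = "grey90") +
  labs(x = "End year", y = "Start year") +
  theme_bw()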

  3. Petre_Slide_CategoricalScatterplotFigShare.pptx

    • figshare.com
    pptx
    Updated Sep 19, 2016
    Cite
    Benj Petre; Aurore Coince; Sophien Kamoun (2016). Petre_Slide_CategoricalScatterplotFigShare.pptx [Dataset]. http://doi.org/10.6084/m9.figshare.3840102.v1
    Explore at:
    pptx (available download formats)
    Dataset updated
    Sep 19, 2016
    Dataset provided by
    figshare
    Authors
    Benj Petre; Aurore Coince; Sophien Kamoun
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Categorical scatterplots with R for biologists: a step-by-step guide

    Benjamin Petre1, Aurore Coince2, Sophien Kamoun1

    1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK

    Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.

    Protocol

    • Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed are indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.

    • Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide and paste it into the R console (a minimal sketch of such a script is given after this protocol). Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.

    • Step 3: save the graph as a .pdf file. Resize the window at your convenience and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
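    A minimal end-to-end sketch of the protocol above (this is not the script from the PowerPoint slide; the column names follow step 1, everything else is an assumption):

    # sketch only: read the three-column .csv from step 1 and draw the plot of step 2
    library(ggplot2)
    dataset <- read.csv(file.choose())      # dialog box to pick the input .csv file
    graph <- ggplot(dataset, aes(x = Condition, y = Value))
    graph + geom_boxplot(outlier.colour = 'black', colour = 'black') +
      geom_jitter(aes(col = Replicate)) +
      theme_bw()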

    Notes

    • Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, Packages & Data -> Package Installer -> enter ‘ggplot2’ in the Package Search space and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.

    • Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.

    # 7 Display the graph in a separate window. Dot colors indicate replicates
    graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()

    References

    Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.

    Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035

    Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128

    https://cran.r-project.org/

    http://ggplot2.org/

  4. Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm

    • plos.figshare.com
    docx
    Updated May 31, 2023
    Cite
    Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic (2023). Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm [Dataset]. http://doi.org/10.1371/journal.pbio.1002128
    Explore at:
    docx (available download formats)
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.

  5. An Extensive Dataset for the Heart Disease Classification System

    • data.mendeley.com
    Updated Feb 15, 2022
    + more versions
    Cite
    Sozan S. Maghdid (2022). An Extensive Dataset for the Heart Disease Classification System [Dataset]. http://doi.org/10.17632/65gxgy2nmg.1
    Explore at:
    Dataset updated
    Feb 15, 2022
    Authors
    Sozan S. Maghdid
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Finding a good data source is the first step toward creating a database. Cardiovascular diseases (CVDs) are the major cause of death worldwide. CVDs include coronary heart disease, cerebrovascular disease, rheumatic heart disease, and other heart and blood vessel problems. According to the World Health Organization, 17.9 million people die from CVDs each year. Heart attacks and strokes account for more than four out of every five CVD deaths, with one-third of these deaths occurring before the age of 70. A comprehensive database of factors that contribute to a heart attack has been constructed. The main purpose here is to collect characteristics of heart attacks, or factors that contribute to them, so a form was created in Microsoft Excel to do this. Figure 1 depicts the form, which has nine fields: eight input fields and one output field. Age, gender, heart rate, systolic BP, diastolic BP, blood sugar, CK-MB, and troponin test represent the input fields, while the output field records the presence of a heart attack, divided into two categories (negative and positive): negative refers to the absence of a heart attack, while positive refers to its presence. Table 1 shows detailed information, including the minimum and maximum values of the attributes, for the 1319 cases in the whole database. To confirm the validity of the data, we examined the patient files in the hospital archive and compared them with the data stored in the laboratory system; we also interviewed the patients and specialized doctors. Table 2 is a sample from the whole database showing 44 cases and the factors that lead to a heart attack.

    After collecting the data, we checked it for null values (invalid values) and for errors made during data collection. A value is null if it is unknown; null values require special treatment, because they indicate that the target is not a valid data element. When trying to retrieve data that is not present, you can come across the keyword null in processing, and arithmetic operations on a numeric column with one or more null values return null. An example of null-value processing is shown in Figure 2. The data used in this investigation were scaled between 0 and 1 to guarantee that all inputs and outputs receive equal attention and to eliminate their dimensionality. Prior to the use of AI models, data normalization has two major advantages: the first is to prevent attributes in larger numeric ranges from overshadowing those in smaller numeric ranges, and the second is to avoid numerical problems during processing. After completing the normalization, we split the data set into two parts, using 1060 cases for training and 259 for testing. Modeling was then implemented using the input and output variables.
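    A minimal sketch in R of the preprocessing described above (the file name and column handling are assumptions, not the authors' code): min-max scaling of the numeric fields to [0, 1] and a 1060/259 train/test split.

    library(readr)
    library(dplyr)
    heart <- read_csv("heart_attack.csv")                 # hypothetical file name
    rescale01 <- function(x) (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
    heart_scaled <- heart %>%
      mutate(across(where(is.numeric), rescale01))        # scale every numeric column to [0, 1]
    set.seed(1)                                           # reproducible split
    train_idx <- sample(nrow(heart_scaled), 1060)         # 1060 training cases
    train <- heart_scaled[train_idx, ]
    test  <- heart_scaled[-train_idx, ]                   # remaining 259 test cases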

  6. 18 excel spreadsheets by species and year giving reproduction and growth...

    • catalog.data.gov
    • data.wu.ac.at
    Updated Aug 17, 2024
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2024). 18 excel spreadsheets by species and year giving reproduction and growth data. One excel spreadsheet of herbicide treatment chemistry. [Dataset]. https://catalog.data.gov/dataset/18-excel-spreadsheets-by-species-and-year-giving-reproduction-and-growth-data-one-excel-sp
    Explore at:
    Dataset updated
    Aug 17, 2024
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Excel spreadsheets by species (the 4-letter code is an abbreviation for the genus and species used in the study; the year, 2010 or 2011, is the year the data were collected; SH indicates data for Science Hub; the date is the date of file preparation). The data in each file are described in a read-me file, which is the first worksheet in each file. Each row in a species spreadsheet is for one plot (plant). The data themselves are in the data worksheet. One file includes a read-me description of the columns in the data set for chemical analysis; in this file, one row is an herbicide treatment and sample for chemical analysis (if taken). This dataset is associated with the following publication: Olszyk, D., T. Pfleeger, T. Shiroyama, M. Blakely-Smith, E. Lee, and M. Plocher. Plant reproduction is altered by simulated herbicide drift to constructed plant communities. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, 36(10): 2799-2813, (2017).

  7. Sample Student Data

    • figshare.com
    xls
    Updated Aug 2, 2022
    Cite
    Carrie Ellis (2022). Sample Student Data [Dataset]. http://doi.org/10.6084/m9.figshare.20419434.v1
    Explore at:
    xls (available download formats)
    Dataset updated
    Aug 2, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Carrie Ellis
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In "Sample Student Data", there are 6 sheets. There are three sheets with sample datasets, one for each of the three different exercise protocols described (CrP Sample Dataset, Glycolytic Dataset, Oxidative Dataset). Additionally, there are three sheets with sample graphs created using one of the three datasets (CrP Sample Graph, Glycolytic Graph, Oxidative Graph). Each dataset and graph pairs are from different subjects. · CrP Sample Dataset and CrP Sample Graph: This is an example of a dataset and graph created from an exercise protocol designed to stress the creatine phosphate system. Here, the subject was a track and field athlete who threw the shot put for the DeSales University track team. The NIRS monitor was placed on the right triceps muscle, and the student threw the shot put six times with a minute rest in between throws. Data was collected telemetrically by the NIRS device and then downloaded after the student had completed the protocol. · Glycolytic Dataset and Glycolytic Graph: This is an example of a dataset and graph created from an exercise protocol designed to stress the glycolytic energy system. In this example, the subject performed continuous squat jumps for 30 seconds, followed by a 90 second rest period, for a total of three exercise bouts. The NIRS monitor was place on the left gastrocnemius muscle. Here again, data was collected telemetrically by the NIRS device and then downloaded after he had completed the protocol. · Oxidative Dataset and Oxidative Graph: In this example, the dataset and graph are from an exercise protocol designed to stress the oxidative system. Here, the student held a sustained, light-intensity, isometric biceps contraction (pushing against a table). The NIRS monitor was attached to the left biceps muscle belly. Here, data was collected by a student observing the SmO2 values displayed on a secondary device; specifically, a smartphone with the IPSensorMan APP displaying data. The recorder student observed and recorded the data on an Excel Spreadsheet, and marked the times that exercise began and ended on the Spreadsheet.

  8. Analysis of CBCS publications for Open Access, data availability statements...

    • figshare.scilifelab.se
    • researchdata.se
    txt
    Updated Jan 15, 2025
    Cite
    Theresa Kieselbach (2025). Analysis of CBCS publications for Open Access, data availability statements and persistent identifiers for supplementary data [Dataset]. http://doi.org/10.17044/scilifelab.23641749.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Umeå University
    Authors
    Theresa Kieselbach
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General description
    This dataset contains some markers of Open Science in the publications of the Chemical Biology Consortium Sweden (CBCS) between 2010 and July 2023. The sample of CBCS publications during this period consists of 188 articles. Every publication was visited manually at its DOI URL to answer the following questions.
    1. Is the research article an Open Access publication?
    2. Does the research article have a Creative Commons license or a similar license?
    3. Does the research article contain a data availability statement?
    4. Did the authors submit data of their study to a repository such as EMBL, GenBank, Protein Data Bank (PDB), Cambridge Crystallographic Data Centre (CCDC), Dryad or a similar repository?
    5. Does the research article contain supplementary data?
    6. Do the supplementary data have a persistent identifier that makes them citable as a defined research output?

    Variables
    The data were compiled in a Microsoft Excel 365 document that includes the following variables.
    1. DOI URL of research article
    2. Year of publication
    3. Research article published with Open Access
    4. License for research article
    5. Data availability statement in article
    6. Supplementary data added to article
    7. Persistent identifier for supplementary data
    8. Authors submitted data to NCBI or EMBL or PDB or Dryad or CCDC

    Visualization
    Parts of the data were visualized in two figures as bar diagrams using Microsoft Excel 365. The first figure displays the number of publications during a year, the number of publications published with open access, and the number of publications that contain a data availability statement (Figure 1). The second figure shows the number of publications per year and how many publications contain supplementary data; it also shows how many of the supplementary datasets have a persistent identifier (Figure 2).

    File formats and software
    The file formats used in this dataset are:
    .csv (text file)
    .docx (Microsoft Word 365 file)
    .jpg (JPEG image file)
    .pdf/A (Portable Document Format for archiving)
    .png (Portable Network Graphics image file)
    .pptx (Microsoft PowerPoint 365 file)
    .txt (text file)
    .xlsx (Microsoft Excel 365 file)
    All files can be opened with Microsoft Office 365 and likely also work with the older versions Office 2019 and 2016.

    MD5 checksums
    Here is a list of all files of this dataset and of their MD5 checksums.
    1. Readme.txt (MD5: 795f171be340c13d78ba8608dafb3e76)
    2. Manifest.txt (MD5: 46787888019a87bb9d897effdf719b71)
    3. Materials_and_methods.docx (MD5: 0eedaebf5c88982896bd1e0fe57849c2)
    4. Materials_and_methods.pdf (MD5: d314bf2bdff866f827741d7a746f063b)
    5. Materials_and_methods.txt (MD5: 26e7319de89285fc5c1a503d0b01d08a)
    6. CBCS_publications_until_date_2023_07_05.xlsx (MD5: 532fec0bd177844ac0410b98de13ca7c)
    7. CBCS_publications_until_date_2023_07_05.csv (MD5: 2580410623f79959c488fdfefe8b4c7b)
    8. Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.xlsx (MD5: 9c67dd84a6b56a45e1f50a28419930e5)
    9. Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.csv (MD5: fb3ac69476bfc57a8adc734b4d48ea2b)
    10. Aggregated_data_from_CBCS_publications_until_2023_07_05.xlsx (MD5: 6b6cbf3b9617fa8960ff15834869f793)
    11. Aggregated_data_from_CBCS_publications_until_2023_07_05.csv (MD5: b2b8dd36ba86629ed455ae5ad2489d6e)
    12. Figure_1_CBCS_publications_until_2023_07_05_Open_Access_and_data_availablitiy_statement.xlsx (MD5: 9c0422cf1bbd63ac0709324cb128410e)
    13. Figure_1.pptx (MD5: 55a1d12b2a9a81dca4bb7f333002f7fe)
    14. Image_of_figure_1.jpg (MD5: 5179f69297fbbf2eaaf7b641784617d7)
    15. Image_of_figure_1.png (MD5: 8ec94efc07417d69115200529b359698)
    16. Figure_2_CBCS_publications_until_2023_07_05_supplementary_data_and_PID_for_supplementary_data.xlsx (MD5: f5f0d6e4218e390169c7409870227a0a)
    17. Figure_2.pptx (MD5: 0fd4c622dc0474549df88cf37d0e9d72)
    18. Image_of_figure_2.jpg (MD5: c6c68b63b7320597b239316a1c15e00d)
    19. Image_of_figure_2.png (MD5: 24413cc7d292f468bec0ac60cbaa7809)
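    A minimal sketch (not part of the dataset) of checking downloaded files against the MD5 checksums listed above, assuming the files sit in the R working directory:

    # sketch only: extend the two vectors to cover all 19 files
    files    <- c("Readme.txt", "Manifest.txt", "Materials_and_methods.docx")
    expected <- c("795f171be340c13d78ba8608dafb3e76",
                  "46787888019a87bb9d897effdf719b71",
                  "0eedaebf5c88982896bd1e0fe57849c2")
    observed <- tools::md5sum(files)                      # named vector of MD5 digests
    data.frame(file = files, ok = unname(observed) == expected)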

  9. In-Air Hand-Drawn Number and Shape Dataset

    • orda.shef.ac.uk
    zip
    Updated Jul 14, 2025
    Cite
    Basheer Alwaely; Charith Abhayaratne (2025). In-Air Hand-Drawn Number and Shape Dataset [Dataset]. http://doi.org/10.15131/shef.data.7381472.v2
    Explore at:
    zip (available download formats)
    Dataset updated
    Jul 14, 2025
    Dataset provided by
    The University of Sheffield
    Authors
    Basheer Alwaely; Charith Abhayaratne
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the in-air hand-written number and shape data used in the paper: B. Alwaely and C. Abhayaratne, "Graph Spectral Domain Feature Learning With Application to in-Air Hand-Drawn Number and Shape Recognition," IEEE Access, vol. 7, pp. 159661-159673, 2019, doi: 10.1109/ACCESS.2019.2950643.
    The dataset contains the following:
    - Readme.txt
    - InAirNumberShapeDataset.zip, containing:
      - Number folder (with 2 sub-folders for Matlab and Excel)
      - Shapes folder (with 2 sub-folders for Matlab and Excel)
    The datasets include the in-air drawn number and shape hand-movement paths captured by a Kinect sensor. The number sub-dataset includes 500 instances for each number 0 to 9, for a total of 5000 number data instances. Similarly, the shape sub-dataset includes 500 instances for each of 10 different arbitrary 2D shapes, for a total of 5000 shape instances. The dataset provides the X, Y, Z coordinates of the hand movement path in Matlab (M-file) and Excel formats, together with their corresponding labels. The dataset creation received The University of Sheffield ethics approval under application #023005, granted on 19/10/2018.
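    A minimal sketch (hypothetical file name and column layout, not the authors' code) of loading one instance's X, Y, Z hand-path coordinates from the Excel format and plotting its X-Y trajectory:

    library(readxl)
    library(ggplot2)
    path <- read_excel("number_3_instance_001.xlsx")      # hypothetical file; columns assumed X, Y, Z
    ggplot(path, aes(x = X, y = Y)) +
      geom_path() +                                       # draw the trajectory in recorded order
      coord_equal() +
      theme_bw()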

  10. Field Variable Permeability Tests (Slug Tests) in Boreholes Made by Driven...

    • borealisdata.ca
    • search.dataone.org
    Updated Oct 29, 2024
    Cite
    Robert P. Chapuis (2024). Field Variable Permeability Tests (Slug Tests) in Boreholes Made by Driven Flush-Joint Casings, or Driven Flush-Joint Casing Permeameters, or Between Packers in Cored Rock Boreholes, or in Monitoring Wells ― Overdamped Response / Essais de perméabilité à niveau variable (Slug Tests) dans des forages faits avec un tubage battu à joints lisses, ou un perméamètre battu à joints lisses, ou entre des obturateurs dans un trou foré dans le roc, ou dans un puits de surveillance ― Cas de la réponse suramortie [Dataset]. http://doi.org/10.5683/SP2/YUAUGX
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 29, 2024
    Dataset provided by
    Borealis
    Authors
    Robert P. Chapuis
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Civil and geological engineers have used field variable-head permeability tests (VH tests or slug tests) for over one century to assess the local hydraulic conductivity of tested soils and rocks. The water level in the pipe or riser casing reaches, after some rest time, a static position or elevation, z2. Then, the water level position is changed rapidly, by adding or removing some water volume, or by inserting or removing a solid slug. Afterward, the water level position or elevation z1(t) is recorded vs. time t, yielding a difference in hydraulic head or water column defined as Z(t) = z1(t) - z2. The water level at rest is assumed to be the piezometric level or PL for the tested zone, before drilling a hole and installing test equipment. All equations use Z(t) or Z*(t) = Z(t) / Z(t=0). The water-level response vs. time may be a slow return to equilibrium (overdamped test), or an oscillation back to equilibrium (underdamped test). This document deals exclusively with overdamped tests. Their data may be analyzed using several methods, known to yield different results for the hydraulic conductivity. The methods fit in three groups: group 1 neglects the influence of the solid matrix strain, group 2 is for tests in aquitards with delayed strain caused by consolidation, and group 3 takes into account some elastic and instant solid matrix strain. This document briefly explains what is wrong with certain theories and why. It shows three ways to plot the data, which are the three diagnostic graphs. According to experience with thousands of tests, most test data are biased by an incorrect estimate z2 of the piezometric level at rest. The derivative or velocity plot does not depend upon this assumed piezometric level, but can verify its correctness. The document presents experimental results and explains the three-diagnostic graphs approach, which unifies the theories and, most important, yields a user-independent result. Two free spreadsheet files are provided. The spreadsheet "Lefranc-Test-English-Model" follows the Canadian standards and is used to explain how to treat correctly the test data to reach a user-independent result. The user does not modify this model spreadsheet but can make as many copies as needed, with different names. The user can treat any other data set in a copy, and can also modify any copy if needed. The second Excel spreadsheet contains several sets of data that can be used to practice with the copies of the model spreadsheet. En génie civil et géologique, on a utilisé depuis plus d'un siècle les essais in situ de perméabilité à niveau variable (essais VH ou slug tests), afin d'évaluer la conductivité hydraulique locale des sols et rocs testés. Le niveau d'eau dans le tuyau ou le tubage prend, après une période de repos, une position ou élévation statique, z2. Ensuite, on modifie rapidement la position du niveau d'eau, en ajoutant ou en enlevant rapi-dement un volume d'eau, ou en insérant ou retirant un objet solide. La position ou l'élévation du niveau d'eau, z1(t), est alors notée en fonction du temps, t, ce qui donne une différence de charge hydraulique définie par Z(t) = z1(t) - z2. Le niveau d'eau au repos est supposé être le niveau piézométrique pour la zone testée, avant de forer un trou et d'installer l'équipement pour un essai. Toutes les équations utilisent Z(t) ou Z*(t) = Z(t) / Z(t=0). La réponse du niveau d'eau avec le temps peut être soit un lent retour à l'équilibre (cas suramorti) soit une oscillation amortie retournant à l'équilibre (cas sous-amorti). 
Ce document ne traite que des cas suramortis. Leurs données peuvent être analysées à l'aide de plusieurs méthodes, connues pour donner des résultats différents pour la conductivité hydraulique. Les méthodes appartiennent à trois groupes : le groupe 1 néglige l'influence de la déformation de la matrice solide, le groupe 2 est pour les essais dans des aquitards avec une déformation différée causée par la consolidation, et le groupe 3 prend en compte une certaine déformation élastique et instantanée de la matrice solide. Ce document explique brièvement ce qui est incorrect dans les théories et pourquoi. Il montre trois façons de tracer les données, qui sont les trois graphiques de diagnostic. Selon l'expérience de milliers d'essais, la plupart des données sont biaisées par un estimé incorrect de z2, le niveau piézométrique supposé. Le graphe de la dérivée ou graphe des vitesses ne dépend pas de la valeur supposée pour le niveau piézomé-trique, mais peut vérifier son exactitude. Le document présente des résultats expérimentaux et explique le diagnostic à trois graphiques, qui unifie les théories et donne un résultat indépendant de l'utilisateur, ce qui est important. Deux fichiers Excel gratuits sont fournis. Le fichier"Lefranc-Test-English-Model" suit les normes canadiennes : il sert à expliquer comment traiter correctement les données d'essai pour avoir un résultat indépendant de l'utilisateur. Celui-ci ne modifie pas ce...
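    A minimal sketch in R of the quantities defined in the description above (the data file, column names and static level are assumptions; the provided spreadsheets, not this sketch, implement the full method): compute Z(t) = z1(t) - z2 and Z*(t) = Z(t) / Z(t = 0) for an overdamped test, and build the velocity (derivative) diagnostic plot.

    library(readr)
    slug <- read_csv("slug_test.csv")        # assumed columns: t (time, s), z1 (water level, m)
    z2   <- 10.00                            # assumed static water level at rest
    slug$Z     <- slug$z1 - z2               # head difference Z(t)
    slug$Zstar <- slug$Z / slug$Z[1]         # normalized head Z*(t), assuming the first row is t = 0
    slug$velocity <- c(NA, diff(slug$Z) / diff(slug$t))   # finite-difference estimate of dZ/dt
    plot(slug$Z, slug$velocity, xlab = "Z (m)", ylab = "dZ/dt (m/s)",
         main = "Velocity (derivative) diagnostic plot")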

  11. Niagara Open Data

    • catalog.civicdataecosystem.org
    Cite
    Niagara Open Data [Dataset]. https://catalog.civicdataecosystem.org/dataset/niagara-open-data
    Explore at:
    Description

    The Ontario government generates and maintains thousands of datasets. Since 2012, we have shared data with Ontarians via a data catalogue. Open data is data that is shared with the public. Click here to learn more about open data and why Ontario releases it.

    Ontario’s Open Data Directive states that all data must be open, unless there is good reason for it to remain confidential. Ontario’s Chief Digital and Data Officer also has the authority to make certain datasets available publicly. Datasets listed in the catalogue that are not open will have one of the following labels: If you want to use data you find in the catalogue, that data must have a licence – a set of rules that describes how you can use it. A licence: Most of the data available in the catalogue is released under Ontario’s Open Government Licence. However, each dataset may be shared with the public under other kinds of licences or no licence at all. If a dataset doesn’t have a licence, you don’t have the right to use the data. If you have questions about how you can use a specific dataset, please contact us.

    The Ontario Data Catalogue endeavors to publish open data in a machine readable format. For machine readable datasets, you can simply retrieve the file you need using the file URL. The Ontario Data Catalogue is built on CKAN, which means the catalogue has the following features you can use when building applications. APIs (Application programming interfaces) let software applications communicate directly with each other. If you are using the catalogue in a software application, you might want to extract data from the catalogue through the catalogue API. Note: All Datastore API requests to the Ontario Data Catalogue must be made server-side. The catalogue's collection of dataset metadata (and dataset files) is searchable through the CKAN API. The Ontario Data Catalogue has more than just CKAN's documented search fields. You can also search these custom fields. You can also use the CKAN API to retrieve metadata about a particular dataset and check for updated files. Read the complete documentation for CKAN's API. Some of the open data in the Ontario Data Catalogue is available through the Datastore API. You can also search and access the machine-readable open data that is available in the catalogue. How to use the API feature: Read the complete documentation for CKAN's Datastore API.

    The Ontario Data Catalogue contains a record for each dataset that the Government of Ontario possesses. Some of these datasets will be available to you as open data. Others will not be available to you. This is because the Government of Ontario is unable to share data that would break the law or put someone's safety at risk.

    You can search for a dataset with a word that might describe a dataset or topic. Use words like “taxes” or “hospital locations” to discover what datasets the catalogue contains. You can search for a dataset from 3 spots on the catalogue: the homepage, the dataset search page, or the menu bar available across the catalogue. On the dataset search page, you can also filter your search results. You can select filters on the left hand side of the page to limit your search for datasets with your favourite file format, datasets that are updated weekly, datasets released by a particular organization, or datasets that are released under a specific licence. Go to the dataset search page to see the filters that are available to make your search easier.
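    A minimal sketch of the CKAN API access described above, from R (the dataset and resource identifiers are placeholders; package_show and datastore_search are standard CKAN actions):

    library(httr)
    library(jsonlite)
    base <- "https://data.ontario.ca"                     # assumed catalogue host
    # dataset metadata via package_show
    meta <- fromJSON(content(GET(paste0(base, "/api/3/action/package_show"),
                                 query = list(id = "example-dataset-id")),      # placeholder id
                             as = "text", encoding = "UTF-8"))
    # first five records of a machine-readable resource via the Datastore API
    recs <- fromJSON(content(GET(paste0(base, "/api/3/action/datastore_search"),
                                 query = list(resource_id = "example-resource-id", limit = 5)),
                             as = "text", encoding = "UTF-8"))
    recs$result$records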
You can also do a quick search by selecting one of the catalogue’s categories on the homepage. These categories can help you see the types of data we have on key topic areas. When you find the dataset you are looking for, click on it to go to the dataset record. Each dataset record will tell you whether the data is available, and, if so, tell you about the data available. An open dataset might contain several data files. These files might represent different periods of time, different sub-sets of the dataset, different regions, language translations, or other breakdowns. You can select a file and either download it or preview it. Make sure to read the licence agreement to make sure you have permission to use it the way you want. Read more about previewing data. A non-open dataset may be not available for many reasons. Read more about non-open data. Read more about restricted data. Data that is non-open may still be subject to freedom of information requests. The catalogue has tools that enable all users to visualize the data in the catalogue without leaving the catalogue – no additional software needed. Have a look at our walk-through of how to make a chart in the catalogue. Get automatic notifications when datasets are updated. You can choose to get notifications for individual datasets, an organization’s datasets or the full catalogue. You don’t have to provide and personal information – just subscribe to our feeds using any feed reader you like using the corresponding notification web addresses. Copy those addresses and paste them into your reader. Your feed reader will let you know when the catalogue has been updated. The catalogue provides open data in several file formats (e.g., spreadsheets, geospatial data, etc). Learn about each format and how you can access and use the data each file contains. A file that has a list of items and values separated by commas without formatting (e.g. colours, italics, etc.) or extra visual features. This format provides just the data that you would display in a table. XLSX (Excel) files may be converted to CSV so they can be opened in a text editor. How to access the data: Open with any spreadsheet software application (e.g., Open Office Calc, Microsoft Excel) or text editor. Note: This format is considered machine-readable, it can be easily processed and used by a computer. Files that have visual formatting (e.g. bolded headers and colour-coded rows) can be hard for machines to understand, these elements make a file more human-readable and less machine-readable. A file that provides information without formatted text or extra visual features that may not follow a pattern of separated values like a CSV. How to access the data: Open with any word processor or text editor available on your device (e.g., Microsoft Word, Notepad). A spreadsheet file that may also include charts, graphs, and formatting. How to access the data: Open with a spreadsheet software application that supports this format (e.g., Open Office Calc, Microsoft Excel). Data can be converted to a CSV for a non-proprietary format of the same data without formatted text or extra visual features. A shapefile provides geographic information that can be used to create a map or perform geospatial analysis based on location, points/lines and other data about the shape and features of the area. It includes required files (.shp, .shx, .dbt) and might include corresponding files (e.g., .prj). How to access the data: Open with a geographic information system (GIS) software program (e.g., QGIS). 
A package of files and folders. The package can contain any number of different file types. How to access the data: Open with an unzipping software application (e.g., WinZIP, 7Zip). Note: If a ZIP file contains .shp, .shx, and .dbt file types, it is an ArcGIS ZIP: a package of shapefiles which provide information to create maps or perform geospatial analysis that can be opened with ArcGIS (a geographic information system software program). A file that provides information related to a geographic area (e.g., phone number, address, average rainfall, number of owl sightings in 2011 etc.) and its geospatial location (i.e., points/lines). How to access the data: Open using a GIS software application to create a map or do geospatial analysis. It can also be opened with a text editor to view raw information. Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. A text-based format for sharing data in a machine-readable way that can store data with more unconventional structures such as complex lists. How to access the data: Open with any text editor (e.g., Notepad) or access through a browser. Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. A text-based format to store and organize data in a machine-readable way that can store data with more unconventional structures (not just data organized in tables). How to access the data: Open with any text editor (e.g., Notepad). Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. A file that provides information related to an area (e.g., phone number, address, average rainfall, number of owl sightings in 2011 etc.) and its geospatial location (i.e., points/lines). How to access the data: Open with a geospatial software application that supports the KML format (e.g., Google Earth). Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. This format contains files with data from tables used for statistical analysis and data visualization of Statistics Canada census data. How to access the data: Open with the Beyond 20/20 application. A database which links and combines data from different files or applications (including HTML, XML, Excel, etc.). The database file can be converted to a CSV/TXT to make the data machine-readable, but human-readable formatting will be lost. How to access the data: Open with Microsoft Office Access (a database management system used to develop application software). A file that keeps the original layout and

  12. Superstore Sales Analysis

    • kaggle.com
    Updated Oct 21, 2023
    Cite
    Ali Reda Elblgihy (2023). Superstore Sales Analysis [Dataset]. https://www.kaggle.com/datasets/aliredaelblgihy/superstore-sales-analysis
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 21, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ali Reda Elblgihy
    Description

    Analyzing sales data is essential for any business looking to make informed decisions and optimize its operations. In this project, we will utilize Microsoft Excel and Power Query to conduct a comprehensive analysis of Superstore sales data. Our primary objectives will be to establish meaningful connections between various data sheets, ensure data quality, and calculate critical metrics such as the Cost of Goods Sold (COGS) and discount values. Below are the key steps and elements of this analysis:

    1- Data Import and Transformation:

    • Gather and import relevant sales data from various sources into Excel.
    • Utilize Power Query to clean, transform, and structure the data for analysis.
    • Merge and link different data sheets to create a cohesive dataset, ensuring that all data fields are connected logically.

    2- Data Quality Assessment:

    • Perform data quality checks to identify and address issues like missing values, duplicates, outliers, and data inconsistencies.
    • Standardize data formats and ensure that all data is in a consistent, usable state.

    3- Calculating COGS:

    • Determine the Cost of Goods Sold (COGS) for each product sold by considering factors like purchase price, shipping costs, and any additional expenses.
    • Apply appropriate formulas and calculations to determine COGS accurately.

    4- Discount Analysis:

    • Analyze the discount values offered on products to understand their impact on sales and profitability.
    • Calculate the average discount percentage, identify trends, and visualize the data using charts or graphs.

    5- Sales Metrics:

    • Calculate and analyze various sales metrics, such as total revenue, profit margins, and sales growth.
    • Utilize Excel functions to compute these metrics and create visuals for better insights.

    6- Visualization:

    • Create visualizations, such as charts, graphs, and pivot tables, to present the data in an understandable and actionable format.
    • Visual representations can help identify trends, outliers, and patterns in the data.

    7- Report Generation:

    • Compile the findings and insights into a well-structured report or dashboard, making it easy for stakeholders to understand and make informed decisions.

    Throughout this analysis, the goal is to provide a clear and comprehensive understanding of the Superstore's sales performance. By using Excel and Power Query, we can efficiently manage and analyze the data, ensuring that the insights gained contribute to the store's growth and success.
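    A minimal sketch in R of the kind of calculations listed in steps 3-5 (the project itself uses Excel and Power Query; the column names and the COGS approximation below are assumptions, not the project's formulas):

    library(dplyr)
    orders <- read.csv("superstore_orders.csv")           # hypothetical export of the orders sheet
    metrics <- orders %>%
      mutate(
        cogs           = Sales - Profit,                  # one simple COGS approximation
        discount_value = Sales * Discount                 # discount expressed as a currency amount
      ) %>%
      group_by(Category) %>%
      summarise(
        total_revenue    = sum(Sales),
        total_cogs       = sum(cogs),
        avg_discount_pct = mean(Discount) * 100,
        profit_margin    = sum(Profit) / sum(Sales)
      )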

  13. Data set published in the IEEE TCAD article "Custom Multi-Cache...

    • data.niaid.nih.gov
    Updated Aug 4, 2024
    Cite
    Winterstein, Felix (2024). Data set published in the IEEE TCAD article "Custom Multi-Cache Architectures for Heap-Manipulating Programs" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_61614
    Explore at:
    Dataset updated
    Aug 4, 2024
    Dataset authored and provided by
    Winterstein, Felix
    License

    BSD 3-Clause License, https://opensource.org/licenses/BSD-3-Clause

    Description

    This data set contains the results presented in the paper "Custom Multi-Cache Architectures for Heap-Manipulating Programs", published in the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) in 2016.

    The data set consists of two parts, a Microsoft Excel file ('FPGA_implementation_results.xlsx') and a Matlab script ('plot_cache_performance.m', in combination with measurement results in an ascii file).

    The Excel file contains the FPGA resource utilisation, execution time measurements, hit rate measurements of the multi-cache system, and power measurements of different FPGA designs with different on-chip cache configurations. The resource utilisation is split into FPGA slices, LUTs, flip-flops, DSP slices and block RAMs. Results in this file can be found in Tables I-IV in the paper. Please refer to the paper for more information or email f.winterstein12@imperial.ac.uk.

    The Matlab script loads a data file ('cache_performance_N16384_L1') containing the hit rate measurements for different cache sizes of two direct-mapped caches with a 64-bit line width. The script produces a 3D 'skyscraper' plot, i.e. a grid of coloured bars. Each bar corresponds to the hit rate measured at that particular cache size configuration. The plot is saved in the file 'surf.pdf'. The script was used to produce Figure 4 of the paper. Please refer to the paper for more information or email f.winterstein12@imperial.ac.uk.
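    For readers without Matlab, a minimal sketch in R of an equivalent view (the layout of the ascii file is an assumption; the dataset's own script is the Matlab file described above):

    library(ggplot2)
    hits <- as.matrix(read.table("cache_performance_N16384_L1"))   # assumed: rows/columns index the two cache sizes
    df <- expand.grid(size_cache1 = seq_len(nrow(hits)), size_cache2 = seq_len(ncol(hits)))
    df$hit_rate <- as.vector(hits)                         # column-major order matches expand.grid
    ggplot(df, aes(size_cache1, size_cache2, fill = hit_rate)) +
      geom_tile() +
      labs(x = "Cache 1 size step", y = "Cache 2 size step", fill = "Hit rate") +
      theme_bw()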

    In addition to this description, we include an author copy of the paper. Note that this is not the official version of the paper. Please cite the original IEEE TCAD article if you use the data.

  14. Respiration_chambers/raw_log_files and combined datasets of biomass and...

    • researchdata.edu.au
    • data.aad.gov.au
    • +1 more
    Updated Dec 3, 2018
    Cite
    Black, J.G. (2018). Respiration_chambers/raw_log_files and combined datasets of biomass and chamber data, and physical parameters [Dataset]. https://researchdata.edu.au/respirationchambersrawlogfiles-combined-datasets-physical-parameters/2817678
    Explore at:
    Dataset updated
    Dec 3, 2018
    Dataset provided by
    Australian Ocean Data Network
    Australian Antarctic Data Centre
    Authors
    Black, J.G.
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 27, 2015 - Feb 23, 2015
    Description

    General overview
    The following datasets are described by this metadata record, and are available for download from the provided URL.

    - Raw log files, physical parameters raw log files
    - Raw excel files, respiration/PAM chamber raw excel spreadsheets
    - Processed and cleaned excel files, respiration chamber biomass data
    - Raw rapid light curve excel files (this is duplicated from Raw log files), combined dataset pH, temperature, oxygen, salinity, velocity for experiment
    - Associated R script file for pump cycles of respiration chambers

    ####

    Physical parameters raw log files

    Raw log files
    1) DATE=
    2) Time= UTC+11
    3) PROG=Automated program to control sensors and collect data
    4) BAT=Amount of battery remaining
    5) STEP=check aquation manual
    6) SPIES=check aquation manual
    7) PAR=Photoactive radiation
    8) Levels=check aquation manual
    9) Pumps= program for pumps
    10) WQM=check aquation manual

    ####

    Respiration/PAM chamber raw excel spreadsheets

    Abbreviations in headers of datasets
    Note: Two data sets are provided in different formats, raw and cleaned (adj). These are the same data, with the PAR column moved over to PAR.all for analysis. All headers are the same. The cleaned (adj) dataframe will work with the R syntax below; alternatively, add code to do the cleaning in R.

    Date: ISO 1986 - Check
    Time:UTC+11 unless otherwise stated
    DATETIME: UTC+11 unless otherwise stated
    ID (of instrument in respiration chambers)
    ID43=Pulse amplitude fluorescence measurement of control
    ID44=Pulse amplitude fluorescence measurement of acidified chamber
    ID=1 Dissolved oxygen
    ID=2 Dissolved oxygen
    ID3= PAR
    ID4= PAR
    PAR=Photo active radiation umols
    F0=minimal florescence from PAM
    Fm=Maximum fluorescence from PAM
    Yield=(F0 – Fm)/Fm
    rChl=an estimate of chlorophyll (Note this is uncalibrated and is an estimate only)
    Temp=Temperature degrees C
    PAR=Photo active radiation
    PAR2= Photo active radiation2
    DO=Dissolved oxygen
    %Sat= Saturation of dissolved oxygen
    Notes=This is the program of the underwater submersible logger with the following abbreviations:
    Notes-1) PAM=
    Notes-2) PAM=Gain level set (see aquation manual for more detail)
    Notes-3) Acclimatisation= Program of slowly introducing treatment water into chamber
    Notes-4) Shutter start up 2 sensors+sample…= Shutter PAMs automatic set up procedure (see aquation manual)
    Notes-5) Yield step 2=PAM yield measurement and calculation of control
    Notes-6) Yield step 5= PAM yield measurement and calculation of acidified
    Notes-7) Abatus respiration DO and PAR step 1= Program to measure dissolved oxygen and PAR (see aquation manual). Steps 1-4 are different stages of this program including pump cycles, DO and PAR measurements.

    8) Rapid light curve data
    Pre LC: A yield measurement prior to the following measurement
    After 10.0 sec at 0.5% to 8%: Level of each of the 8 steps of the rapid light curve
    Odessey PAR (only in some deployments): An extra measure of PAR (umols) using an Odessey data logger
    Dataflow PAR: An extra measure of PAR (umols) using a Dataflow sensor.
    PAM PAR: This is copied from the PAR or PAR2 column
    PAR all: This is the complete PAR file and should be used
    Deployment: Identifying which deployment the data came from

    ####

    Respiration chamber biomass data

    The data are chlorophyll a biomass from cores from the respiration chambers. The headers are: Depth (mm); Treat (acidified or control); Chl a (pigment and indicator of biomass); Core (5 cores were collected from each chamber, three were analysed for chl a). These are pseudoreplicates/subsamples from the chambers and should not be treated as replicates.

    ####

    Associated R script file for pump cycles of respiration chambers

    Associated respiration chamber data to determine the times when respiration chamber pumps delivered treatment water to chambers. Determined from Aquation log files (see associated files). Use the chamber cut times to determine net production rates. Note: Users need to avoid the times when the respiration chambers are delivering water as this will give incorrect results. The headers that get used in the attached/associated R file are start regression and end regression. The remaining headers are not used unless called for in the associated R script. The last columns of these datasets (intercept, ElapsedTimeMincoef) are determined from the linear regressions described below.

    To determine the rate of change of net production, coefficients of the regression of oxygen consumption in discrete 180 minute data blocks were determined. R squared values for fitted regressions of these coefficients were consistently high (greater than 0.9). We make two assumptions with calculation of net production rates: the first is that heterotrophic community members do not change their metabolism under OA; and the second is that the heterotrophic communities are similar between treatments.
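    A minimal sketch in R of the block-wise regressions described above (the file and column names are assumptions; this is not the associated R script): fit oxygen vs. elapsed time within discrete 180-minute blocks and collect the slope of each block.

    chamber <- read.csv("chamber_control.csv")             # hypothetical cleaned export with DATETIME and DO columns
    t0 <- min(as.POSIXct(chamber$DATETIME))
    chamber$ElapsedTimeMin <- as.numeric(difftime(as.POSIXct(chamber$DATETIME), t0, units = "mins"))
    chamber$block <- chamber$ElapsedTimeMin %/% 180        # 180-minute block index
    rates <- do.call(rbind, lapply(split(chamber, chamber$block), function(b) {
      fit <- lm(DO ~ ElapsedTimeMin, data = b)             # linear regression within one block
      data.frame(block = b$block[1],
                 slope = coef(fit)[2],                     # oxygen change per minute
                 r2    = summary(fit)$r.squared)
    }))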

    ####

    Combined dataset pH, temperature, oxygen, salinity, velocity for experiment

    This data is rapid light curve data generated from a Shutter PAM fluorimeter. There are eight steps in each rapid light curve. Note: The software component of the Shutter PAM fluorimeter for sensor 44 appeared to be damaged and would not cycle through the PAR cycles. Therefore the rapid light curves and recovery curves should only be used for the control chambers (sensor ID43).

    The headers are
    PAR: Photoactive radiation
    relETR: F0/Fm x PAR
    Notes: Stage/step of light curve
    Treatment: Acidified or control


    The associated light treatments in each stage. Each actinic light intensity is held for 10 seconds, then a saturating pulse is taken (see PAM methods).

    After 10.0 sec at 0.5% = 1 umols PAR
    After 10.0 sec at 0.7% = 1 umols PAR
    After 10.0 sec at 1.1% = 0.96 umols PAR
    After 10.0 sec at 1.6% = 4.32 umols PAR
    After 10.0 sec at 2.4% = 4.32 umols PAR
    After 10.0 sec at 3.6% = 8.31 umols PAR
    After 10.0 sec at 5.3% = 15.78 umols PAR
    After 10.0 sec at 8.0% = 25.75 umols PAR
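
    For orientation only, relETR at each step is the measured yield multiplied by the PAR level listed above. A minimal Python sketch using that step-to-PAR mapping (the yield values are placeholders, not measurements):

    # PAR (umols) delivered at each rapid light curve step, as listed above.
    step_par = {
        "0.5%": 1.0, "0.7%": 1.0, "1.1%": 0.96, "1.6%": 4.32,
        "2.4%": 4.32, "3.6%": 8.31, "5.3%": 15.78, "8.0%": 25.75,
    }

    def rel_etr(yield_value: float, par: float) -> float:
        """relETR as defined in the header description above: yield (F0/Fm) x PAR."""
        return yield_value * par

    # Placeholder yields, purely for illustration.
    example_yields = {"0.5%": 0.62, "8.0%": 0.35}
    for step, y in example_yields.items():
        print(step, rel_etr(y, step_par[step]))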

    Note: this dataset appears to be missing data; the D5 rows may not contain usable information.

    See the Word document in the download for more information.

  15. Data underlying the publication: Inter-Spheroid proximity and matrix...

    • data.4tu.nl
    zip
    Updated Jun 24, 2025
    Cite
    Pranav Mehta; Ankur Deep Bordoloi; Cor Ravensbergen; Kristen David; Wilma Mesker; Gerrit Jan Liefers; Peter ten Dijke; Pouyan Boukany (2025). Data underlying the publication: Inter-Spheroid proximity and matrix remodelling determine CAF mediated cancer cell invasion [Dataset]. http://doi.org/10.4121/c8025882-83fb-4d4f-bf20-80789cb5e776.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jun 24, 2025
    Dataset provided by
    4TU.ResearchData
    Authors
    Pranav Mehta; Ankur Deep Bordoloi; Cor Ravensbergen; Kristen David; Wilma Mesker; Gerrit Jan Liefers; Peter ten Dijke; Pouyan Boukany
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2023 - 2025
    Dataset funded by
    ZonMW
    Description

    README

    Data Underlying: Inter-spheroid Proximity and Matrix Remodelling Determines CAF-mediated Cancer Cell Invasion

    Status: Manuscript submitted and currently under review


    Overview

    This dataset supports the manuscript titled “Inter-spheroid Proximity and Matrix Remodelling Determines CAF-mediated Cancer Cell Invasion”. This work investigates the interaction between cancer-associated fibroblast (CAF) and Luminal A breast cancer cell (MCF7) spheroids in collagen matrices, with the objective of elucidating the interaction mechanisms that enable CAFs to promote cancer cell invasion. Timelapse imaging and analysis reveal a systematic interaction pattern between the two cell types. This dataset includes raw and processed data from 3D spheroid invasion assays of MCF7 breast cancer cell spheroids (mCherry-labelled), 19TT cancer-associated fibroblast (CAF) spheroids (H2B-GFP-labelled), and heterospheroids consisting of a 1:1 ratio of MCF7 (mCherry-labelled) and 19TT CAF cells (H2B-GFP-labelled). All experiments were conducted in 3 mg/mL type 1 collagen matrices. Invasion assays were run for 48 h (Figs 2, 5, and 6) and MMP inhibition experiments for 24 h (Figs 3 and 4). Imaging was performed using time-lapse fluorescence microscopy, with reflection imaging to visualize collagen structure.


    Folder Structure

    Main folders are named Data_Fig1, Data_Fig2 through Data_Fig6. Within these folders, subfolders include Datasets, Fig Images, Graphs, and Movies. The Datasets folder contains the raw files used to analyse and generate images and graphs. Datasets used in Fig 1 (IF_pCK_aSMA and PSR_FL_slide_scans) can only be opened using the QuPath software. Datasets for the rest of the figures can be opened using FIJI/ImageJ2. Dataset folders are further divided into sub-folders by condition (e.g. Control, BB94). The Fig Images folder contains images used in manuscript figures. The Graphs folder contains vector image files of graphs (.pdf), Excel files (.xlsx) with data values, and Prism Graphpad files (.pzfx). The Movies folder contains video files (.avi) for datasets used in figures.

    Sample Analysis contains subfolders with sample analysis files and intermediates. Each type of analysis is shown once using data from a specific figure. For example, the Fig2C_Expansion_Analysis folder contains two datasets, intermediates, and results for the dArea graphs in Fig2C.

    Figure Files includes the graphical abstract, main text figures, and supporting submission figures in vector format (.pdf) and Keynote format (.key).

    Submission Videos contains all movies linked to the manuscript, organized by figure.

    Scripts contains all MATLAB code used to analyse and quantify data. These include scripts for collagen fibre orientation analysis (colfibre_orientation.m, colfibre_mcf.m), CAF invasion index quantification (invasive_index.m), and collagen void fraction and pixel intensity measurement (Pxq_VF.m). To use these scripts, extract individual channels from composite tiff images and save them as image sequences via ImageJ. Scripts are modular, annotated, and require MATLAB's Image Processing Toolbox. Some use custom fibre analysis functions. Custom functions saved in this folder include CAF_center.m, Centroid_3d.m, Circfit.m, Histwc.m, Imcrop.m, and PDF_function3.m.

    Supporting Information contains datasets and figure files used for supporting figures. The folder structure mirrors the main data folders. FigS2_CAF02-03-06 contains .mrxs files for CAF02, CAF03, and CAF06 tumors. Graphs are saved as Excel (.xlsx) and Prism Graphpad (.pzfx) files.

    Text contains various manuscript-related documents including the main manuscript (.docx), figure captions (.docx), electronic supporting information (.docx), declaration of competing interests (.docx), statement of significance (.docx), and supporting information text (.docx).

    Filetypes included in the dataset are text files (.docx), Excel spreadsheets (.xlsx) containing quantified data, image files (.tiff) of raw datasets and figures, MATLAB scripts (.m), native graph files (.pzfx) from Prism, slide scanner image files (.mrxs), Keynote files (.key), and high-resolution vector images (.pdf).

  16. Data from: Wind Turbine / Reviewed Data

    • catalog.data.gov
    • data.openei.org
    Updated Apr 26, 2022
    Cite
    Wind Energy Technologies Office (WETO) (2022). Wind Turbine / Reviewed Data [Dataset]. https://catalog.data.gov/dataset/snl-sonic-convective-ttu
    Explore at:
    Dataset updated
    Apr 26, 2022
    Dataset provided by
    Wind Energy Technologies Office (WETO)
    Description

    Overview: The SUMR-D CART2 turbine data are recorded by the CART2 wind turbine's supervisory control and data acquisition (SCADA) system for the Advanced Research Projects Agency–Energy (ARPA-E) SUMR-D project located at the National Renewable Energy Laboratory (NREL) Flatirons Campus. For the project, the CART2 wind turbine was outfitted with a highly flexible rotor specifically designed and constructed for the project. More details about the project can be found here: https://sumrwind.com/. The data include power, loads, and meteorological information from the turbine during startup, operation, and shutdown, and when it was parked and idle.

    Data Details: Additional files are attached:
    sumr_d_5-Min_Database.mat - a database file in MATLAB format of this dataset, which can be used to search for desired data files;
    sumr_d_5-Min_Database.xlsx - a database file in Microsoft Excel format of this dataset, which can be used to search for desired data files;
    loadcartU.m - loads a CART data file into your workspace as a MATLAB matrix (you can call this script from your own MATLAB scripts to do your own analysis);
    charts.mat - a dependency file needed for the other scripts (it allows you to make custom preselections for cartPlotU.m);
    cartLoadHdrU.m - loads the header information for a data file (the header is embedded at the beginning of each data file);
    cartPlotU.m - a graphical user interface (GUI) that allows you to interactively look at different channels (to use it, run the script in MATLAB and load in the data file(s) of interest; from there, you can select different channels and plot them against each other; note that this script has issues with later versions of MATLAB; the preferred version is R2011b).

    Data Quality: Wind turbine blade loading data were calibrated using blade gravity calibrations prior to data collection and throughout the data collection period. Blade loading was also checked for data quality following data collection, as strain gauge measurements drifted throughout the collection period. These drifts were removed in post-processing.

  17. Replication data for: Job-to-Job Mobility and Inflation

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    Cite
    Faccini, Renato; Melosi, Leonardo (2023). Replication data for: Job-to-Job Mobility and Inflation [Dataset]. http://doi.org/10.7910/DVN/SMQFGS
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Faccini, Renato; Melosi, Leonardo
    Description

    Replication files for "Job-to-Job Mobility and Inflation"
    Authors: Renato Faccini and Leonardo Melosi
    Review of Economics and Statistics
    Date: February 2, 2023

    Order of topics:
    Section 1. We explain the code used to replicate all the figures in the paper (except Figure 6).
    Section 2. We explain how Figure 6 is constructed.
    Section 3. We explain how the data are constructed.

    SECTION 1

    Replication_Main.m is used to reproduce all the figures of the paper except Figure 6. All the primitive variables are defined in the code and all the steps are commented in the code to facilitate the replication of our results. Replication_Main.m should be run in MATLAB. The authors tested it on a DELL XPS 15 7590 laptop with the following characteristics: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz, 64.0 GB RAM, 64-bit operating system, x64-based processor. It took 2 minutes and 57 seconds for this machine to construct Figures 1, 2, 3, 4a, 4b, 5, 7a, and 7b.

    The following versions of MATLAB and MATLAB toolboxes were used for the test: MATLAB Version 9.7.0.1190202 (R2019b), MATLAB License Number 363305, on Microsoft Windows 10 Enterprise Version 10.0 (Build 19045), with Java Version 1.8.0_202-b08 (Oracle Corporation Java HotSpot(TM) 64-Bit Server VM, mixed mode); Financial Toolbox Version 5.14 (R2019b); Optimization Toolbox Version 8.4 (R2019b); Statistics and Machine Learning Toolbox Version 11.6 (R2019b); Symbolic Math Toolbox Version 8.4 (R2019b).

    The replication code uses auxiliary files and saves the figures in various subfolders:
    \JL_models: contains the equations describing the model, including the observation equations, and the routine used to solve the model. To do so, the routine in this folder calls other routines located in some of the subfolders below.
    \gensystoama: contains a set of codes that allow us to solve linear rational expectations models. We use the AMA solver. More information is provided in the file AMASOLVE.m. The codes in this subfolder have been developed by Alejandro Justiniano.
    \filters: contains the Kalman filter augmented with a routine to make sure that the zero lower bound constraint for the nominal interest rate is satisfied in every period of our sample.
    \SteadyStateSolver: contains a set of routines used to solve the steady state of the model numerically.
    \NLEquations: contains some of the equations of the model that are log-linearized using the Symbolic Math Toolbox of MATLAB.
    \NberDates: contains a set of routines that add shaded areas to graphs to denote NBER recessions.
    \Graphics: contains useful codes enabling features to construct some of the graphs in the paper.
    \Data: contains the data set used in the paper.
    \Params: contains a spreadsheet with the values attributed to the model parameters.
    \VAR_Estimation: contains the forecasts implied by the Bayesian VAR model of Section 2.
    The output of Replication_Main.m is the set of figures of the paper, which are stored in the subfolder \Figures.

    SECTION 2

    The Excel file "Figure-6.xlsx" is used to create the charts in Figure 6. All three panels of the charts (A, B, and C) plot a measure of unexpected wage inflation against the unemployment rate, then fit separate linear regressions for the periods 1960-1985, 1986-2007, and 2008-2009. Unexpected wage inflation is given by the difference between wage growth and a measure of expected wage growth. In all three panels, the unemployment rate used is the civilian unemployment rate (UNRATE), seasonally adjusted, from the BLS. The sheet "Panel A" uses quarterly manufacturing sector average hourly earnings growth data, seasonally adjusted (CES3000000008), from the Bureau of Labor Statistics (BLS) Employment Situation report as the measure of wage inflation. The unexpected wage inflation is given by the difference between earnings growth at time t and the average of earnings growth across the previous four months. Growth rates are annualized quarterly values. The sheet "Panel B" uses quarterly Nonfarm Business Sector Compensation Per Hour, seasonally adjusted (COMPNFB), from the BLS Productivity and Costs report as its measure of wage inflation. As in Panel A, expected wage inflation is given by the... Visit https://dataone.org/datasets/sha256%3A44c88fe82380bfff217866cac93f85483766eb9364f66cfa03f1ebdaa0408335 for complete metadata about this dataset.
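
    As an illustration of the Panel A construction described above (not the authors' spreadsheet), unexpected wage inflation can be computed as current wage growth minus the mean of the preceding four observations, and then regressed on the unemployment rate by sub-period; the file and column names below are assumptions:

    import numpy as np
    import pandas as pd

    # Hypothetical input; column names are assumptions (date, wage_growth, unrate).
    df = pd.read_csv("panel_a.csv", parse_dates=["date"])

    # Expected wage growth = average growth over the previous four observations;
    # unexpected wage inflation = actual growth minus that expectation.
    df["expected"] = df["wage_growth"].shift(1).rolling(4).mean()
    df["unexpected"] = df["wage_growth"] - df["expected"]

    # Separate linear fits of unexpected wage inflation on the unemployment rate per sub-period.
    for start, end in [("1960-01-01", "1985-12-31"), ("1986-01-01", "2007-12-31"), ("2008-01-01", "2009-12-31")]:
        sub = df[(df["date"] >= start) & (df["date"] <= end)].dropna()
        if len(sub) > 1:
            slope, intercept = np.polyfit(sub["unrate"], sub["unexpected"], 1)
            print(f"{start[:4]}-{end[:4]}: slope = {slope:.3f}, intercept = {intercept:.3f}")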

  18. UC_vs_US Statistic Analysis.xlsx

    • figshare.com
    xlsx
    Updated Jul 9, 2020
    Cite
    F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Utrecht University
    Authors
    F. (Fabiano) Dalpiaz
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sheet 1 (Raw-Data): The raw data of the study are provided, presenting the tagging results for the measures described in the paper. For each subject, it includes multiple columns:
    A. a sequential student ID
    B. an ID that defines a random group label and the notation
    C. the used notation: Use Cases or User Stories
    D. the case they were assigned to: IFA, Sim, or Hos
    E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
    F. a categorical representation of the grade as L/M/H, where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
    G. the total number of classes in the student's conceptual model
    H. the total number of relationships in the student's conceptual model
    I. the total number of classes in the expert's conceptual model
    J. the total number of relationships in the expert's conceptual model
    K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
    P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.

    Tagging scheme:
    Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
    Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
    System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate;
    Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
    Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.

    All the calculations and information provided in the following sheets originate from that raw data.

    Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection, including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

    Sheet 3 (Size-Ratio): The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes; however, we also provided the size ratio for the number of relationships between student and expert model.

    Sheet 4 (Overall): Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
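
    The two ratios defined above reduce to simple arithmetic on the per-subject tag counts; a minimal Python sketch, assuming Sheet 1 has been exported to a CSV with the tag abbreviations as column names:

    import pandas as pd

    # Hypothetical CSV export of Sheet 1 with one row per subject and the tag counts as columns.
    df = pd.read_csv("raw_data.csv")  # assumed columns: AL, WR, SO, OM, MI

    # Correctness: aligned classes over all student classes tagged against the expert model.
    df["correctness"] = df["AL"] / (df["AL"] + df["OM"] + df["SO"] + df["WR"])

    # Completeness: expert classes represented (correctly or not) over all expert classes.
    df["completeness"] = (df["AL"] + df["WR"]) / (df["AL"] + df["WR"] + df["OM"])

    print(df[["correctness", "completeness"]].describe())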

    For Sheet 4, as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

    Sheet 5 (By-Notation): Model correctness and model completeness are compared by notation - UC, US.

    Sheet 6 (By-Case): Model correctness and model completeness are compared by case - SIM, HOS, IFA.

    Sheet 7 (By-Process): Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.

    Sheet 8 (By-Grade): Model correctness and model completeness are compared by exam grade, converted to the categorical values High, Medium, and Low.
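
    The Hedges' g reported at the bottom of Sheets 5-8 (computed by the authors with the online tool mentioned above) follows the standard formula and can also be reproduced directly; a minimal Python sketch with made-up scores:

    import numpy as np

    def hedges_g(x, y):
        """Hedges' g for two independent samples: Cohen's d with a small-sample correction."""
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        nx, ny = len(x), len(y)
        pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2))
        d = (x.mean() - y.mean()) / pooled_sd
        correction = 1 - 3 / (4 * (nx + ny) - 9)
        return correction * d

    # Illustrative call with made-up correctness scores for two groups (not the study data).
    print(hedges_g([0.80, 0.70, 0.90, 0.75], [0.60, 0.65, 0.70, 0.55]))
    # The accompanying T-test can be obtained with scipy.stats.ttest_ind.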

  19. E. Coli Growth over 4 hours

    • figshare.com
    png
    Updated Jun 8, 2023
    Cite
    Anthony Salvagno; alexandria haddad (2023). E. Coli Growth over 4 hours [Dataset]. http://doi.org/10.6084/m9.figshare.91671.v2
    Explore at:
    pngAvailable download formats
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Anthony Salvagno; alexandria haddad
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In preparation for some deuterium-effect experiments on E. coli and S. cerevisiae, I grew a starter culture and diluted it to 3 different concentrations: 1:10, 1:5, and 1:2. These dilutions were then grown at 37C for 4 hours and an absorption measurement was taken every hour. This fileset contains the raw data and some manipulated data, along with some figures made in Excel from the data. The file labeled "arb-ecoli-growth.png" is a figure made from manipulated data: I tried to combine the three data sets into one graph to see if I could extract some sort of growth information. I'm pretty sure I didn't do it right, but I included the image here nonetheless. In the 1:10 dilution sample, the cells doubled in slightly less than one hour, every hour. In the 1:2 dilution, the growth rate was much slower and seemed to peak rather early in the trial. The 1:5 dilution overlaps the growth of both the 1:10 and 1:2 dilutions; I don't know what to make of that. Also included in the fileset is an image of the absorbance spectrum from the nanodrop for every sample (including blanks taken every hour).
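
    To put a number on the growth described above, a doubling time can be estimated from hourly absorbance readings during exponential growth; a minimal Python sketch with placeholder OD values (not the actual measurements):

    import numpy as np

    # Placeholder hourly OD readings for one dilution (not the real data).
    hours = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    od = np.array([0.05, 0.11, 0.23, 0.45, 0.80])

    # During exponential growth ln(OD) is linear in time; the slope is the specific growth rate,
    # and the doubling time is ln(2) divided by that rate.
    rate, _ = np.polyfit(hours, np.log(od), 1)
    print(f"growth rate = {rate:.3f} per hour, doubling time = {np.log(2) / rate:.2f} h")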

  20. Football Players Data

    • kaggle.com
    Updated Nov 13, 2023
    Cite
    Masood Ahmed (2023). Football Players Data [Dataset]. http://doi.org/10.34740/kaggle/dsv/6960429
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 13, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Masood Ahmed
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Description:

    This comprehensive dataset offers detailed information on approximately 17,000 FIFA football players, meticulously scraped from SoFIFA.com.

    It encompasses a wide array of player-specific data points, including but not limited to player names, nationalities, clubs, player ratings, potential, positions, ages, and various skill attributes. This dataset is ideal for football enthusiasts, data analysts, and researchers seeking to conduct in-depth analysis, statistical studies, or machine learning projects related to football players' performance, characteristics, and career progressions.

    Features:

    • name: Name of the player.
    • full_name: Full name of the player.
    • birth_date: Date of birth of the player.
    • age: Age of the player.
    • height_cm: Player's height in centimeters.
    • weight_kgs: Player's weight in kilograms.
    • positions: Positions the player can play.
    • nationality: Player's nationality.
    • overall_rating: Overall rating of the player in FIFA.
    • potential: Potential rating of the player in FIFA.
    • value_euro: Market value of the player in euros.
    • wage_euro: Weekly wage of the player in euros.
    • preferred_foot: Player's preferred foot.
    • international_reputation(1-5): International reputation rating from 1 to 5.
    • weak_foot(1-5): Rating of the player's weaker foot from 1 to 5.
    • skill_moves(1-5): Skill moves rating from 1 to 5.
    • body_type: Player's body type.
    • release_clause_euro: Release clause of the player in euros.
    • national_team: National team of the player.
    • national_rating: Rating in the national team.
    • national_team_position: Position in the national team.
    • national_jersey_number: Jersey number in the national team.
    • crossing: Rating for crossing ability.
    • finishing: Rating for finishing ability.
    • heading_accuracy: Rating for heading accuracy.
    • short_passing: Rating for short passing ability.
    • volleys: Rating for volleys.
    • dribbling: Rating for dribbling.
    • curve: Rating for curve shots.
    • freekick_accuracy: Rating for free kick accuracy.
    • long_passing: Rating for long passing.
    • ball_control: Rating for ball control.
    • acceleration: Rating for acceleration.
    • sprint_speed: Rating for sprint speed.
    • agility: Rating for agility.
    • reactions: Rating for reactions.
    • balance: Rating for balance.
    • shot_power: Rating for shot power.
    • jumping: Rating for jumping.
    • stamina: Rating for stamina.
    • strength: Rating for strength.
    • long_shots: Rating for long shots.
    • aggression: Rating for aggression.
    • interceptions: Rating for interceptions.
    • positioning: Rating for positioning.
    • vision: Rating for vision.
    • penalties: Rating for penalties.
    • composure: Rating for composure.
    • marking: Rating for marking.
    • standing_tackle: Rating for standing tackle.
    • sliding_tackle: Rating for sliding tackle.

    Use Case:

    This dataset is ideal for data analysis, predictive modeling, and machine learning projects. It can be used for:

    • Player performance analysis and comparison.
    • Market value assessment and wage prediction.
    • Team composition and strategy planning.
    • Machine learning models to predict future player potential and career trajectories.
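
    For example, a minimal pandas sketch for loading the dataset and ranking players by overall rating (the file name is an assumption; column names follow the feature list above):

    import pandas as pd

    # Hypothetical file name for the downloaded dataset.
    players = pd.read_csv("fifa_players.csv")

    # Top ten players by overall rating, with market value and wage for context.
    top10 = (
        players.sort_values("overall_rating", ascending=False)
               .loc[:, ["name", "age", "nationality", "overall_rating", "value_euro", "wage_euro"]]
               .head(10)
    )
    print(top10)

    # Quick check of how strongly potential tracks the overall rating.
    print(players[["overall_rating", "potential"]].corr())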

    Note:

    Please ensure to adhere to the terms of service of SoFIFA.com and relevant data protection laws when using this dataset. The dataset is intended for educational and research purposes only and should not be used for commercial gains without proper authorization.
