https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-3451
We evaluated several animations for transitions between scatter plots in a crowd-sourcing study. We published the results in a paper and provide additional information within this supplemental material. Contents:
- Tables that did not fit into the original paper due to page limits.
- An anonymized print-out of the preregistration. The original preregistration is available at OSF (DOI) and on the Internet Archive.
- Videos demonstrating the tasks used in the study: to record samples for the study, for participant training, and to detect distracted participants and bots.
- An interactive demonstration of all study tasks (including training and attention checks). The source code is contained within the directory ./interactive-demo/ of this supplemental material and is also available at GitHub.
- The animation library that we used for the study, including a test page for readers to use with their own data sets. The source code is contained within the directory ./animation-library/ of this supplemental material and is also available at GitHub.
- The list of nonsensical statements that we used for attention checks on Prolific.
- The statistical tests with the recorded study data, some of which we reported in the main paper. We also provide reports from the preliminary power analysis that we performed to determine the number of participants for the study.
- The recorded pseudo-anonymized study data for further analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT The desertification process causes soil degradation and a reduction in vegetation. The absence of visualisation techniques and the broad spatial and temporal dimension of the data hamper the identification of desertification and rapid decision-making by multidisciplinary teams. The 2D scatter plot is a two-dimensional visual analysis of reflectances in the red (630-690 nm) and near-infrared (760-900 nm) bands used to visualise the spectral response of the vegetation. The hypothesis of this study is that visualising the reflectances of the vegetation by means of a 2D scatter plot will allow desertification to be inferred. The aim of this study was to identify desertified areas and characterise the spatial and temporal dynamics of the vegetation and soil during dry (DP) and rainy (RP) periods between 2000 and 2008, using a 2D scatter plot. The 2D scatter plot generated by the ENVI® 4.8 software and the reflectances in bands 3 and 4 of the TM5 sensor were used within communities in the Irauçuba hub (Ceará, Brazil). The concentration densities of the near-infrared reflectances of the vegetation pixels were observed. Each community presented pixel concentrations with reflectances of less than 0.4 (40%) during each of the periods under evaluation, indicating little vegetation development, with further degradation caused by deforestation, the use of fire, and overgrazing. The 2D scatter plot was able to show vegetation with low reflectance in the near infrared during both dry and rainy periods between 2000 and 2008, thereby inferring the occurrence of desertification.
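The core of the method is simple enough to sketch: plot band 3 (red) against band 4 (NIR) reflectance and flag pixels whose NIR reflectance falls below the 0.4 threshold reported above. The following is a minimal Python illustration with synthetic reflectances standing in for the ENVI-exported TM5 bands; it is not the ENVI workflow itself.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins for TM5 band 3 (red) and band 4 (NIR) reflectances
# on a 0-1 scale; in the study these come from the ENVI-exported imagery.
rng = np.random.default_rng(0)
red = rng.uniform(0.05, 0.35, 5000)
nir = rng.uniform(0.10, 0.55, 5000)

low_vigour = nir < 0.4  # threshold reported in the abstract

plt.scatter(red[~low_vigour], nir[~low_vigour], s=2, c="green", label="NIR >= 0.4")
plt.scatter(red[low_vigour], nir[low_vigour], s=2, c="brown", label="NIR < 0.4")
plt.xlabel("Red reflectance (630-690 nm)")
plt.ylabel("NIR reflectance (760-900 nm)")
plt.legend()
plt.title("2D scatter plot of vegetation reflectance")
plt.show()
```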
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides comprehensive insights into US regional sales data across different sales channels, including In-Store, Online, Distributor, and Wholesale. With a total of 17,992 rows and 15 columns, this dataset encompasses a wide range of information, from order and product details to sales performance metrics. It offers a detailed overview of sales transactions and customer interactions, enabling deep analysis of sales patterns, trends, and potential opportunities.
Columns in the dataset:
- OrderNumber: A unique identifier for each order.
- Sales Channel: The channel through which the sale was made (In-Store, Online, Distributor, Wholesale).
- WarehouseCode: Code representing the warehouse involved in the order.
- ProcuredDate: Date when the products were procured.
- OrderDate: Date when the order was placed.
- ShipDate: Date when the order was shipped.
- DeliveryDate: Date when the order was delivered.
- SalesTeamID: Identifier for the sales team involved.
- CustomerID: Identifier for the customer.
- StoreID: Identifier for the store.
- ProductID: Identifier for the product.
- Order Quantity: Quantity of products ordered.
- Discount Applied: Applied discount for the order.
- Unit Cost: Cost of a single unit of the product.
- Unit Price: Price at which the product was sold.
This dataset serves as a valuable resource for analysing sales trends, identifying popular products, assessing the performance of different sales channels, and optimising pricing strategies for different regions.
Visualization Ideas:
Data Modelling and Machine Learning Ideas (Price Prediction):
- Linear Regression: Build a linear regression model to predict the unit price based on features such as order quantity, discount applied, and unit cost.
- Random Forest Regression: Use a random forest regression model to predict the price, taking into account multiple features and their interactions.
- Neural Networks: Train a neural network to predict unit price using deep learning techniques, which can capture complex relationships in the data.
- Feature Importance Analysis: Identify the most influential features affecting price prediction using techniques like feature importance scores from tree-based models.
- Time Series Forecasting: Develop a time series forecasting model to predict future prices based on historical sales data.
These visualisation and modelling ideas can help you gain valuable insights from the sales data and create predictive models to optimise pricing strategies and improve sales performance.
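As a concrete starting point, here is a hedged sketch of the random forest idea in Python with scikit-learn; the filename "US_Regional_Sales_Data.csv" is a placeholder for however you export this dataset, while the column names come from the list above.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Placeholder filename; column names match the dataset description above.
df = pd.read_csv("US_Regional_Sales_Data.csv")

features = ["Order Quantity", "Discount Applied", "Unit Cost"]
X, y = df[features], df["Unit Price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))

# Feature importance analysis, as suggested above.
for name, score in zip(features, model.feature_importances_):
    print(f"{name}: {score:.3f}")
```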
10,000 sets of digital chart Q&A data, covering categories such as line charts, bar charts, pie charts, scatter plots, composite types, and tables. Each image has two rounds of Q&A: one for numerical reading and the other for numerical calculation.
SoilExcel workflow, a tool designed to optimize soil data analysis. It covers data preparation, statistical analysis methods, and result visualization. SoilExcel integrates various environmental data types and applies advanced techniques to enhance accuracy in soil studies. The results demonstrate its effectiveness in interpreting complex data, aiding decision-making in environmental management projects.
Background
Understanding the intricate relationships and patterns within soil samples is crucial for various environmental and agricultural applications. Principal Component Analysis (PCA) serves as a powerful tool in unraveling the complexity of multivariate soil datasets. Soil datasets often consist of numerous variables representing diverse physicochemical properties, making PCA an invaluable method for:
∙Dimensionality Reduction: Simplifying the analysis without compromising data integrity by reducing the dimensionality of large soil datasets.
∙Identification of Dominant Patterns: Revealing dominant patterns or trends within the data, providing insights into key factors contributing to overall variability.
∙Exploration of Variable Interactions: Enabling the exploration of complex interactions between different soil attributes, enhancing understanding of their relationships.
∙Interpretability of Data Variance: Clarifying how much variance is explained by each principal component, aiding in discerning the significance of different components and variables.
∙Visualization of Data Structure: Facilitating intuitive comprehension of data structure through plots such as scatter plots of principal components, helping identify clusters, trends, and outliers.
∙Decision Support for Subsequent Analyses: Providing a foundation for subsequent analyses by guiding decision-making, whether in identifying influential variables, understanding data patterns, or selecting components for further modeling.
Introduction
The motivation behind this workflow is rooted in the imperative need to conduct a thorough analysis of a diverse soil dataset, characterized by an array of physicochemical variables. Comprising multiple rows, each representing distinct soil samples, the dataset encompasses variables such as percentage of coarse sands, percentage of organic matter, hydrophobicity, and others. The intricacies of this dataset demand a strategic approach to preprocessing, analysis, and visualization. To lay the groundwork, the workflow begins with the transformation of an initial Excel file into a CSV format, ensuring improved compatibility and ease of use throughout subsequent analyses. Furthermore, the workflow is designed to empower users in the selection of relevant variables, a task facilitated by user-defined parameters. This flexibility allows for a focused and tailored dataset, essential for meaningful analysis. Acknowledging the inherent challenges of missing data, the workflow offers options for data quality improvement, including optional interpolation of missing values or the removal of rows containing such values. Standardizing the dataset and specifying the target variable are crucial, establishing a robust foundation for subsequent statistical analyses. Incorporating PCA offers a sophisticated approach, enabling users to explore inherent patterns and structures within the data. The adaptability of PCA allows users to customize the analysis by specifying the number of components or desired variance.
The workflow concludes with practical graphical representations, including covariance and correlation matrices, a scree plot, and a scatter plot, offering users valuable visual insights into the complexities of the soil dataset.
Aims
The primary objectives of this workflow are tailored to address specific challenges and goals inherent in the analysis of diverse soil samples:
∙Data transformation: Efficiently convert the initial Excel file into a CSV format to enhance compatibility and ease of use.
∙Variable selection: Empower users to extract relevant variables based on user-defined parameters, facilitating a focused and tailored dataset.
∙Data quality improvement: Provide options for interpolation or removal of missing values to ensure dataset integrity for downstream analyses.
∙Standardization and target specification: Standardize the dataset values and designate the target variable, laying the groundwork for subsequent statistical analyses.
∙PCA: Conduct PCA with flexibility, allowing users to specify the number of components or desired variance for a comprehensive understanding of data variance and patterns.
∙Graphical representations: Generate visual outputs, including covariance and correlation matrices, a scree plot, and a scatter plot, enhancing the interpretability of the soil dataset.
Scientific questions
This workflow addresses critical scientific questions related to soil analysis:
∙Variable importance: Identify variables contributing significantly to principal components through the covariance matrix and PCA.
∙Data structure: Explore correlations between variables and gain insights from the correlation matrix.
∙Optimal component number: Determine the optimal number of principal components using the scree plot for effective representation of data variance.
∙Target-related patterns: Analyze how selected principal components correlate with the target variable in the scatter plot, revealing patterns based on target variable values.
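For readers who want to reproduce the core of the analysis outside the workflow, the steps map onto a few lines of Python with pandas and scikit-learn. This is a minimal sketch, not the SoilExcel implementation; the filename "soil_samples.csv" and the "target" column name are placeholders.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder CSV produced from the initial Excel file.
df = pd.read_csv("soil_samples.csv")
X = df.drop(columns=["target"]).interpolate().dropna()  # optional interpolation, then drop leftovers
target = df["target"].loc[X.index]                      # assumed numeric target column

X_std = StandardScaler().fit_transform(X)  # standardize the dataset values

pca = PCA(n_components=0.95)  # keep enough components to explain 95% of the variance
scores = pca.fit_transform(X_std)

# Scree plot of the explained variance per component.
plt.figure()
plt.plot(range(1, pca.n_components_ + 1), pca.explained_variance_ratio_, "o-")
plt.xlabel("Principal component")
plt.ylabel("Explained variance ratio")

# Scatter plot of the first two components, coloured by the target variable.
plt.figure()
plt.scatter(scores[:, 0], scores[:, 1], c=target)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
```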
Chart Viewer allows app viewers to explore your map beside charts related to your data. App authors can display multiple data-based graphics configured in Map Viewer to complement information in the map. Up to ten charts can be included in the app, and each can be viewed alongside your map or side by side with other charts for comparison.
Examples:
- Present a bar chart representing average property value by county for a given area
- Compare charts based on multiple population statistics in your dataset
- Display an interactive scatter plot based on two values in your dataset along with an essential set of map exploration tools
Data Requirements
This app requires a map with at least one chart configured. For more information, see the Charts help topic.
Key App Capabilities
- Multiple layout options: Choose to display your charts stacked with the map or side by side with the map
- Manage charts: Reorder, rename, or turn charts in the app off and on
- Multiselect chart: Compare two charts in the panel at the same time
- Bookmarks: Enable bookmarks configured in the Map Viewer to include a collection of preset extents
- Home, Zoom Controls, Legend, Layer List, Search
Supportability
This web app is designed responsively to be used in browsers on desktops, mobile phones, and tablets. We are committed to ongoing efforts towards making our apps as accessible as possible. Please feel free to leave a comment on how we can improve the accessibility of our apps for those who use assistive technologies.
This data annex contains the supplementary data to the IEA PVPS Task 16 report "Worldwide benchmark of modeled solar irradiance data" from 2023. The dataset includes visualizations and tables of the results as well as information concerning the reference stations. The dataset contains the following types of files:
- StationList.xlsx: a list of all stations, including their coordinates, climate zone, station code, continent, altitude AMSL, data source, number of available test data sets, station type (Tier-1 or Tier-2), and available calibration record.
- Result tables in the folder “ResultTables”: the folders “climate_zones” and “continents” contain the tables described in Section 5.3. The filenames are “Component_metric_in_subgroup.html”, with “component” being DNI or GHI, “metric” describing the metric (see Table 3), and “subgroup” describing the continent or climate zone.
- World maps: the folder “Resultmaps” contains world maps of the metrics described in Section 5.2. Either four or three metrics, depending on the map, are included in each PDF. A legend describing the meaning of the point size is also included.
- Scatter plots of test vs. reference irradiance: the folder “Scatterplots” contains two folders, “DNI” and “GHI”, for the two investigated components, each with three subfolders. The subfolders “plotsPerSiteYear” contain plots named “scatOverviewCOMPONENT_SITEYYYY.png”, where “COMPONENT” is either DNI or GHI, SITE is the three-letter site abbreviation, and YYYY is the evaluated year; these PNGs include the scatterplots for all test data sets evaluated for the case specified by the filename. The subfolders “plotsPerTestdataProvider” contain plots named “scatOverviewTESTDATASET_COMPONENTYYYY.png”, where “TESTDATASET” describes the test data set, “COMPONENT” is either DNI or GHI, and YYYY is the evaluated year; these PNGs include the scatterplots for all sites evaluated for the case specified by the filename. The subfolders “plotsPerTestdataProviderSamePosPerStat” contain the same scatterplots as “plotsPerTestdataProvider”, but with a slightly different visualization method: the position of each scatterplot for a given site within the plot is always the same. Although this yields many empty subplots and small scatterplots, it can be helpful for rapidly browsing through the plots if only one or a few stations are of interest.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This database supports a study of performance inconsistency in biomass HHV models based on ultimate analysis. The research null hypothesis is that the rank of a biomass HHV model is consistent across datasets. Fifteen biomass models are trained and tested on four datasets. In each dataset, the rank invariability of these 15 models indicates performance consistency.
The database includes the datasets and source code to analyze the performance consistency of the biomass HHV models. The datasets are stored in tabular form in an Excel workbook. The source code implements the biomass HHV machine learning models using MATLAB object-oriented programming (OOP). These machine learning models consist of eight regressions, four supervised learning methods, and three neural networks.
An Excel workbook, "BiomassDataSetUltimate.xlsx," collects the research datasets in six worksheets. The first worksheet, "Ultimate," contains 908 HHV data points from 20 pieces of literature. The column names of the worksheet indicate the elements of the ultimate analysis on a % dry basis. The HHV column refers to the higher heating value in MJ/kg. The following worksheet, "Full Residuals," backs up the residuals of the model testing based on the 20-fold cross-validations. The article (Kijkarncharoensin & Innet, 2021) verifies the performance consistency through these residuals. The other worksheets present the literature datasets used to train and test model performance.
A file named "SourceCodeUltimate.rar" collects the MATLAB machine learning models implemented in the article. The folder structure in this file mirrors the class structure of the machine learning models. These classes extend the features of MATLAB's Statistics and Machine Learning Toolbox to support, e.g., k-fold cross-validation. The MATLAB script named "runStudyUltimate.m" is the article's main program to analyze the performance consistency of the biomass HHV models through the ultimate analysis. The script loads the datasets from the Excel workbook and automatically fits the biomass models through the OOP classes.
The first section of the MATLAB script generates the most accurate model by optimizing the models' hyperparameters. The first run takes a few hours to train the machine learning models via this trial-and-error process. The trained models can be saved in a MATLAB .mat file and loaded back into the MATLAB workspace. The remaining script, separated by a script section break, performs the residual analysis to inspect the performance consistency. Furthermore, a 3D scatter plot of the biomass data and box plots of the prediction residuals are produced. Finally, the interpretations of these results are examined in the author's article.
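For orientation, the cross-validation residual workflow can be sketched in a few lines. The sketch below is in Python rather than the MATLAB OOP classes the archive actually ships; only the workbook filename and the "Ultimate" and "HHV" names come from the description above, and the random forest is a stand-in for the 15 models studied.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

# Worksheet and column names from the dataset description (requires openpyxl).
df = pd.read_excel("BiomassDataSetUltimate.xlsx", sheet_name="Ultimate")
X, y = df.drop(columns=["HHV"]), df["HHV"]

# Collect out-of-fold residuals from a 20-fold cross-validation,
# mirroring the "Full Residuals" worksheet described above.
residuals = pd.Series(index=df.index, dtype=float)
for train_idx, test_idx in KFold(n_splits=20, shuffle=True, random_state=0).split(X):
    model = RandomForestRegressor(random_state=0)  # stand-in for one of the 15 models
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    residuals.iloc[test_idx] = y.iloc[test_idx] - model.predict(X.iloc[test_idx])

residuals.plot.box()  # box plot of prediction residuals, as in the main script
plt.show()
```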
Reference : Kijkarncharoensin, A., & Innet, S. (2022). Performance inconsistency of the Biomass Higher Heating Value (HHV) Models derived from Ultimate Analysis [Manuscript in preparation]. University of the Thai Chamber of Commerce.
ATMOWeb (as part of the FTPBrowser interface) provides graphical browsing, subsetting, and retrieval capabilities for selected ionospheric and atmospheric data. Data can be displayed as time series plots; filtering and scatter plot options are also included for a few spacecraft. The Space Physics Data Facility (SPDF) is the archive of non-solar data for the Heliospheric Science Division (HSD) at NASA's Goddard Space Flight Center.
Note: the data are organized into .zip folders; we suggest maintaining this format when downloading.
- Figure S5A: Binned binding qF3 curve data for VBH3 proteins that failed to bind sufficiently to CMcl-1.
- Figure S5B: Simulated Shape Ratio data and MATLAB code.
- Figure S5C: Shape Ratios of known binding partners of VBH3 proteins with CBclXL or CBcl2 and for VBH3. These values correspond to exported sRatios for control wells included in the source data for Figures 3, 4, 5, and 7.
- Figure S5D: Not shown here; Fig S5D is a copy of data in the summarized results of Figures 3, 4, 5, and 7.
- Figure S5E: Binding curve data, combined from 4 biological replicates of the "mim2_BclW screen" or "mim2_Bcl2_screen". The mCerulean3-fused anti-apoptotic protein and transfected Venus-fused BH3 are given in the filename. Displayed graphs can be generated by plotting the first 2 columns as a scatter plot.
- Figure S5F: Binding curve data, combined from 3 biological replicates of the "BH3-swap screen". The mCerulean3-fused anti-apoptotic protein and transfected Venus-fused BH3 are given in the filename. Displayed graphs can be generated by plotting the first 2 columns as a scatter plot.
NASCArrays is the Nottingham Arabidopsis Stock Centre's microarray database. Currently, most of the data is for Arabidopsis thaliana experiments run by the NASC Affymetrix Facility; there are also experiments from other species and experiments run by other centres. NASCArrays is an Affymetrix microarray database. It contains free Affymetrix microarray data and features a series of tools allowing you to query that data in powerful ways. Most of the data currently comes from NASC's Affymetrix Service; it also includes data from other sources, notably the AtGenExpress project. NASC currently distributes over 30,000 tubes of seed a year. The following data mining tools are available; all of them allow you to type in a gene (or genes) of interest and identify experiments or slides that you might be interested in:
- Spot History: This tool allows you to see the pattern of gene expression over all slides in the database. Easily identify slides (and therefore experimental treatments) where genes are highly, lowly, or unusually expressed.
- Two Gene Scatter Plot: This tool allows you to see the pattern of gene expression over all slides for two genes as a scatter plot. If you are interested in two genes, you can find out if they act in tandem and highlight slides (and therefore experimental conditions) where these two genes behave in an unusual manner.
- Gene Swinger: If you have a gene of interest, this tool will show you in which experiment the gene expression varied most.
- Bulk Gene Download: This tool allows you to download the expression of a list of genes over all experiments. You can get all genes over all experiments (the entire database!) from the Super Bulk Gene Download.
Sponsors: This is a BBSRC-funded consortium to provide services to the Arabidopsis community.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
The H1B is an employment-based visa category for temporary foreign workers in the United States. Every year, the US immigration department receives over 200,000 petitions and selects 85,000 applications through a random process; the U.S. employer must submit a petition for an H1B visa to the US immigration department. This is the most common visa status applied for by international students once they complete college or higher education and begin working in a full-time position. The project provides essential information on job titles, preferred regions of settlement, and trends among foreign applicants and employers for H1B visa applications. Locations, employers, job titles, and salary ranges make up most of the H1B petition data, so different visualization tools are used to analyze and interpret trends in the H1B visa and provide recommendations to applicants. This report is the basis of the project for the Visualization of Complex Data class at the George Washington University. Some examples in this project analyze the relevant variables (Case Status, Employer Name, SOC Name, Job Title, Prevailing Wage, Worksite, and latitude and longitude information) from Kaggle and the Office of Foreign Labor Certification (OFLC) in order to see how the H1B visa has changed over the past several decades.
Keywords: H1B visa, Data Analysis, Visualization of Complex Data, HTML, JavaScript, CSS, Tableau, D3.js
Dataset
The dataset contains 10 columns and covers a total of 3 million records spanning 2011-2016. The relevant columns include case status, employer name, SOC name, job title, full-time position, prevailing wage, year, worksite, and latitude and longitude information.
Link to dataset: https://www.kaggle.com/nsharan/h-1b-visa
Link to dataset (FY2017): https://www.foreignlaborcert.doleta.gov/performancedata.cfm
Running the code
Open Index.html
Data Processing
- Do some data preprocessing to transform the raw data into an understandable format.
- Find and combine other external datasets to enrich the analysis, such as the FY2017 dataset.
- Develop the variables needed for the visualizations and compile them into the visualization programs.
- Draw a geo map and scatter plot to compare the fastest growth in fixed values and in percentages.
- Extract some aspects and analyze the changes in employers' preferences as well as forecasts for future trends.
Visualizations
- Combo chart: shows the overall volume of receipts and the approval rate.
- Scatter plot: shows the beneficiary country of birth.
- Geo map: shows all states' H1B petitions filed.
- Line chart: shows the top 10 states for H1B petitions filed.
- Pie chart: compares education levels and occupations for petitions, FY2011 vs FY2017.
- Tree map: shows the top employers who submitted the greatest number of applications.
- Side-by-side bar chart: shows an overall comparison of Data Scientist and Data Analyst.
- Highlight table: shows the mean wage of a Data Scientist and Data Analyst with case status certified.
- Bubble chart: shows the top 10 companies for Data Scientist and Data Analyst.
Related Research
- The H-1B Visa Debate, Explained (Harvard Business Review): https://hbr.org/2017/05/the-h-1b-visa-debate-explained
- Foreign Labor Certification Data Center: https://www.foreignlaborcert.doleta.gov
- Key facts about the U.S. H-1B visa program: http://www.pewresearch.org/fact-tank/2017/04/27/key-facts-about-the-u-s-h-1b-visa-program/
- H1B visa News and Updates from The Economic Times: https://economictimes.indiatimes.com/topic/H1B-visa/news
- H-1B visa (Wikipedia): https://en.wikipedia.org/wiki/H-1B_visa
Key Findings
- From the analysis, the government cut down the number of H1B approvals in 2017.
- In the past decade, due to the nature of demand for high-skilled workers, visa holders have clustered in STEM fields and come mostly from countries in Asia such as China and India.
- Technical jobs, such as Computer Systems Analyst and Software Developer, make up the majority of the top 10 jobs among foreign workers.
- Employers located in metro areas strive to find a foreign workforce to fill the technical positions in their organizations.
- States like California, New York, Washington, New Jersey, Massachusetts, Illinois, and Texas are prime locations for foreign workers and provide many job opportunities.
- Top companies such as Infosys, Tata, and IBM India, which submit the most H1B visa applications, are companies based in India associated with software and IT services.
- The Data Scientist position has experienced exponential growth in terms of H1B visa applications, with jobs clustered mostly in the West region.
Visualization programs
HTML, JavaScript, CSS, D3.js, Google API, Python, R, and Tableau
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the MITgcm model output data presented in Ashley, K.E. et al., 2021. This dataset includes simulated spatial changes in sea surface salinity (SSS), time series data of salinity, and scatter plot data of SSS changes against meltwater discharge.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
* Distinct Flow Graphs and Data (Using Categories)
* *Distinct Flows by Malicious Category Using Full Flow Names* This graph depicts the frequency of flows appearing within each malicious category defined by the MalGenome project, and includes the popular applications we processed under the "Normal" category for comparison purposes. The frequency is defined by the number of applications within a category that use a particular flow, divided by the total number of applications in that category, and is represented by the size of the marks in the scatter plot (a sketch of this computation follows the attribute list below).
* *Distinct Flow Categories by Malicious Category, Level 1* This graph, similar to the one described above, depicts the frequency of flows minus one level of distinction. For example, the flow sources android.location.Location:getLatitude and android.location.Location:getLongitude are now grouped under android.location.Location, as are their corresponding sinks.
* *Distinct Flow Categories by Malicious Category, Level 2* This graph, similar to the first one described above, depicts the frequency of flows minus two levels of distinction. For example, the flow sources android.location.Location:getLatitude and android.location.Location:getLongitude are now grouped under android.location, as are their corresponding sinks.
* *Distinct Flow Categories by Malicious Category, Level 3* This graph, similar to the first one described above, depicts the frequency of flows minus three levels of distinction. For example, the flow sources android.location.Location:getLatitude and android.location.Location:getLongitude are now grouped under android, as are their corresponding sinks.
* Distinct Flow Graphs and Data (General Malware Vs. Normal)
* *Distinct Flows Using Full Flow Names* This graph depicts the frequency of flows appearing within the general malware and "Normal" groups, where the "Normal" group contains the popular applications we processed for comparison purposes. The frequency is defined by the number of applications within a category that use a particular flow, divided by the total number of applications in that category, and is represented by the size of the marks in the scatter plot.
* *Distinct Flow Categories, Level 1* This graph, similar to the first graph, depicts the frequency of flows minus one level of distinction. For example, the flow sources android.location.Location:getLatitude and android.location.Location:getLongitude are now grouped under android.location.Location, as are their corresponding sinks.
* *Distinct Flow Categories, Level 2* This graph, similar to the first graph, depicts the frequency of flows minus two levels of distinction. For example, the flow sources android.location.Location:getLatitude and android.location.Location:getLongitude are now grouped under android.location, as are their corresponding sinks.
* *Distinct Flow Categories, Level 3* This graph, similar to the first graph, depicts the frequency of flows minus three levels of distinction. For example, the flow sources android.location.Location:getLatitude and android.location.Location:getLongitude are now grouped under android, as are their corresponding sinks.
Attribute info
1. Category
2. Flow Source
3. Flow Sink
4. Distinct APK count
5. Total Distinct APKs
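Given these attributes, the frequency metric can be reproduced with a short script. The following is a hedged Python sketch; "distinct_flows.csv" is a hypothetical flat export with the five columns listed above, not a file shipped with the dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical flat export with the attributes listed above.
df = pd.read_csv("distinct_flows.csv")

# Frequency = applications in a category using the flow / total applications in that category.
df["Flow"] = df["Flow Source"] + " -> " + df["Flow Sink"]
df["Frequency"] = df["Distinct APK count"] / df["Total Distinct APKs"]

# Scatter plot with mark size encoding the frequency, as in the graphs above.
cats = {c: i for i, c in enumerate(df["Category"].unique())}
flows = {f: i for i, f in enumerate(df["Flow"].unique())}
plt.scatter(df["Category"].map(cats), df["Flow"].map(flows), s=500 * df["Frequency"])
plt.xticks(list(cats.values()), list(cats.keys()), rotation=90)
plt.yticks(list(flows.values()), list(flows.keys()), fontsize=6)
plt.tight_layout()
plt.show()
```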
Software Product Lines (SPLs) typically provide a large number of configurations to cater to a set of diverse requirements of specific markets. This large number of configurations makes it infeasible to test them all individually. Instead, Combinatorial Interaction Testing (CIT) computes a representative sample according to criteria on the interactions of features in the configurations. We performed an empirical study using task performance and eye-tracking technologies to analyze the effectiveness of two visualization techniques at conveying the test coverage of ten case studies of varying complexity. Our evaluation considered accuracy, execution time, metacognitive monitoring, and visual attention. The study revealed clear advantages of one visualization technique over the other in three evaluation aspects, with a reverse effect depending on the strength of the coverage and distinct areas of visual attention.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description: This dataset includes all 22 built-in datasets from the Seaborn library, a widely used Python data visualization tool. Seaborn's built-in datasets are essential resources for anyone interested in practicing data analysis, visualization, and machine learning. They span a wide range of topics, from classic datasets like the Iris flower classification to real-world data such as Titanic survival records and diamond characteristics.
This complete collection serves as an excellent starting point for anyone looking to improve their data science skills, offering a wide array of datasets suitable for both beginners and advanced users.
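Getting started takes only a few lines, since seaborn fetches its built-in datasets by name (network access is needed on first use, as load_dataset pulls from the seaborn-data repository):

```python
import seaborn as sns
import matplotlib.pyplot as plt

print(sns.get_dataset_names())  # list all built-in dataset names

# Load a classic and draw a quick scatter plot.
iris = sns.load_dataset("iris")
sns.scatterplot(data=iris, x="sepal_length", y="petal_length", hue="species")
plt.show()
```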
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Recent development of high-throughput analytical techniques has made it possible to qualitatively identify a number of metabolites simultaneously. Correlation and multivariate analyses such as principal component analysis have been widely used to analyse those data and evaluate correlations among the metabolic profiles. However, these analyses cannot simultaneously carry out identification of metabolic reaction networks and prediction of dynamic behaviour of metabolites in the networks. The present study, therefore, proposes a new approach consisting of a combination of statistical technique and mathematical modelling approach to identify and predict a probable metabolic reaction network from time-series data of metabolite concentrations and simultaneously construct its mathematical model. Firstly, regression functions are fitted to experimental data by the locally estimated scatter plot smoothing method. Secondly, the fitted result is analysed by the bivariate Granger causality test to determine which metabolites cause the change in other metabolite concentrations and remove less related metabolites. Thirdly, S-system equations are formed by using the remaining metabolites within the framework of biochemical systems theory. Finally, parameters including rate constants and kinetic orders are estimated by the Levenberg–Marquardt algorithm. The estimation is iterated by setting insignificant kinetic orders at zero, i.e., removing insignificant metabolites. Consequently, a reaction network structure is identified and its mathematical model is obtained. Our approach is validated using a generic inhibition and activation model and its practical application is tested using a simplified model of the glycolysis of Lactococcus lactis MG1363, for which actual time-series data of metabolite concentrations are available. The results indicate the usefulness of our approach and suggest a probable pathway for the production of lactate and acetate. The results also indicate that the approach pinpoints a probable strong inhibition of lactate on the glycolysis pathway.
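The first two steps of the approach (LOESS fitting and the bivariate Granger causality test) are available off the shelf in statsmodels; the sketch below illustrates them on synthetic two-metabolite data and makes no attempt at the S-system modelling and Levenberg–Marquardt estimation stages.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess
from statsmodels.tsa.stattools import grangercausalitytests

# Synthetic concentration time series for two metabolites; in the study these
# would be measured profiles (e.g. from the L. lactis glycolysis data).
t = np.linspace(0, 10, 60)
rng = np.random.default_rng(1)
met_a = np.exp(-0.3 * t) + rng.normal(0, 0.02, t.size)
met_b = 1 - np.exp(-0.3 * (t - 1).clip(0)) + rng.normal(0, 0.02, t.size)

# Step 1: fit regression functions to the noisy data by LOESS smoothing.
smooth_a = lowess(met_a, t, frac=0.3, return_sorted=False)
smooth_b = lowess(met_b, t, frac=0.3, return_sorted=False)

# Step 2: bivariate Granger causality test - does metabolite A help predict B?
# Column order: first the effect series, then the candidate cause.
data = np.column_stack([smooth_b, smooth_a])
grangercausalitytests(data, maxlag=2)
```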
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Partially interactive HTMLs of vector space and dataset projection scatter plots.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
#Overview
This dataset supports the study on Africa-CORDEX Regional Climate Models' performance evaluation in simulating air temperature (tasmax and tasmin) and precipitation in the Melka-Wakena watershed, Ethiopia.
The evaluation spans from 1991 to 2005 and includes observed daily data and raw CORDEX AFR44 daily data for tasmax, tasmin, and precipitation. The dataset was analyzed using scatter plots, empirical cumulative distribution functions (ECDF), Taylor diagrams, and multi-metric performance evaluations.
#Files and Structure
1. Data/
>>This directory contains all input and processed datasets used in the study.
#Observed_Data/
>>Observed_Tasmax.csv: Observed daily maximum temperature (tasmax) data.
>>Observed_Tasmin.csv: Observed daily minimum temperature (tasmin) data.
>>Observed_Precipitation.csv: Observed daily precipitation data.
#CORDEX_Raw_Data/
>>CORDEX_Tasmax_Raw.csv: Daily tasmax data from CORDEX AFR44 models.
>>CORDEX_Tasmin_Raw.csv: Daily tasmin data from CORDEX AFR44 models.
>>CORDEX_Precipitation_Raw.csv: Daily precipitation data from CORDEX AFR44 models.
#Processed_CORDEX_Data/
>>CORDEX_Tasmax_Processed.csv: Preprocessed daily tasmax data for analysis (e.g., aggregated and formatted).
>>CORDEX_Tasmin_Processed.csv: Preprocessed daily tasmin data for analysis.
>>CORDEX_Precipitation_Processed.csv: Preprocessed daily precipitation data for analysis.
2. Scripts/
>>This directory contains Python scripts used for preprocessing, evaluation, and visualization of the data.
#Data_Preprocessing_Scripts/
>>Preprocess_Tasmax.py: Script to preprocess daily tasmax data.
>>Preprocess_Tasmin.py: Script to preprocess daily tasmin data.
>>Preprocess_Precipitation.py: Script to preprocess daily precipitation data.
#Evaluation_Scripts/
>>Scatter_Plot_Script.py: Script for scatter plot visualizations comparing observed and model data.
>>ECDF_Script.py: Script for generating empirical cumulative distribution functions (ECDF).
>>Taylor_Diagram_Script.py: Script for generating Taylor diagrams to evaluate model performance.
>>Performance_Metrics_Script.py: Script to compute evaluation metrics.
>>Approach_Comparison_Script.py: Script for comparing different model evaluation approaches using multi-metric weighted ranking.
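As an illustration of what the ECDF comparison produces, a minimal Python sketch is shown below; it is not the shipped ECDF_Script.py, and the "precip" column name is an assumption about the CSV layout.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Filenames from this archive; the "precip" column name is assumed.
obs = pd.read_csv("Observed_Precipitation.csv")["precip"].to_numpy()
mod = pd.read_csv("CORDEX_Precipitation_Processed.csv")["precip"].to_numpy()

def ecdf(values):
    """Return sorted values and their empirical cumulative probabilities."""
    x = np.sort(values)
    return x, np.arange(1, x.size + 1) / x.size

for series, label in [(obs, "Observed"), (mod, "CORDEX AFR44")]:
    x, y = ecdf(series)
    plt.step(x, y, where="post", label=label)
plt.xlabel("Precipitation (mm/day)")
plt.ylabel("ECDF")
plt.legend()
plt.show()
```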
#Metadata
>>Study Area: Melka-Wakena watershed, Ethiopia.
>>Time Period: 1991–2005.
#Data Source:
>>Observed data from local meteorological stations.
>>CORDEX AFR44 model data downloaded from the Earth System Grid Federation (ESGF).
#Variables:
Tasmax: Daily maximum temperature (°C).
Tasmin: Daily minimum temperature (°C).
Precipitation: Daily precipitation (mm/day).
Evaluation Metrics: RMSE, MAE, R², NSE, Percent Bias (PBIAS), and others; a sketch of these formulas follows below.
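The metrics can be computed with the standard textbook formulas; the Python sketch below is illustrative and not necessarily identical to Performance_Metrics_Script.py.

```python
import numpy as np

def evaluation_metrics(obs, sim):
    """RMSE, MAE, R2, NSE, and PBIAS between observed and simulated series."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    err = sim - obs
    return {
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAE": np.mean(np.abs(err)),
        "R2": np.corrcoef(obs, sim)[0, 1] ** 2,
        "NSE": 1 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2),
        "PBIAS": 100 * err.sum() / obs.sum(),
    }

print(evaluation_metrics([10.0, 12.5, 9.8], [9.5, 13.0, 10.2]))
```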
#How to Use
#Download the dataset:
All required data files are organized in the Data/ folder.
#Run the preprocessing scripts:
>>If using new datasets, preprocess the raw data using the scripts in Data_Preprocessing_Scripts/. This step formats the data and ensures compatibility with the evaluation scripts.
#Conduct evaluation:
>>Use the Evaluation_Scripts/ to replicate scatter plots, ECDF, Taylor diagrams, and compute performance metrics.
>>Use Approach_Comparison_Script.py for multi-metric weighted ranking comparisons of model performance.
#Citation
>>When using this dataset, please cite the following:
#The dataset:
"Dataset for CORDEX AFR44 Model Evaluation in the Melka-Wakena Watershed, Ethiopia."
DOI: https://doi.org/10.5281/zenodo.14208274
#The source of CORDEX data:
>>CORDEX AFR44 data, available from the Earth System Grid Federation (ESGF).
#Contact
For questions or additional information, contact:
Tadele: t4shgeresu@gmail.com.