Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The various performance criteria applied in this analysis include the probability of reaching the ultimate target, the costs, elapsed times and system vulnerability resulting from any intrusion. This Excel file contains all the logical, probabilistic and statistical data entered by a user, and required for the evaluation of the criteria. It also reports the results of all the computations.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Categorical scatterplots with R for biologists: a step-by-step guide
Benjamin Petre1, Aurore Coince2, Sophien Kamoun1
1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK
Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.
Protocol
• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide; a hypothetical example layout is given after this protocol. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed are indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains the continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.
• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide and paste it into the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.
• Step 3: save the graph as a .pdf file. Resize the window as needed and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
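For illustration, a data set formatted as in step 1 might look like the following once saved as a .csv file (values are hypothetical):

Replicate,Condition,Value
Dec_2015,WT,7.82
Dec_2015,mutant_A,3.95
Dec_2015,mutant_B,12.40
Jan_2016,WT,8.13
Jan_2016,mutant_A,4.26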
Notes
• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, go to Packages & Data -> Package Installer, enter ‘ggplot2’ in the Package Search space, and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.
• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()
References
Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.
Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035.
Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128.
Analyzing sales data is essential for any business looking to make informed decisions and optimize its operations. In this project, we will utilize Microsoft Excel and Power Query to conduct a comprehensive analysis of Superstore sales data. Our primary objectives will be to establish meaningful connections between various data sheets, ensure data quality, and calculate critical metrics such as the Cost of Goods Sold (COGS) and discount values. Below are the key steps and elements of this analysis:
1- Data Import and Transformation:
2- Data Quality Assessment:
3- Calculating COGS:
4- Discount Analysis:
5- Sales Metrics:
6- Visualization:
7- Report Generation:
Throughout this analysis, the goal is to provide a clear and comprehensive understanding of the Superstore's sales performance. By using Excel and Power Query, we can efficiently manage and analyze the data, ensuring that the insights gained contribute to the store's growth and success.
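As a rough sketch of how the COGS and discount calculations might look outside Power Query, the following Python/pandas fragment can serve as a reference; the file name, sheet name, and column names (Quantity, Unit Cost, Sales, Discount, Category) are assumptions and may differ in the actual Superstore workbook:

import pandas as pd

# Load the orders sheet (file and sheet names are assumptions).
orders = pd.read_excel("Superstore.xlsx", sheet_name="Orders")

# COGS as quantity times unit cost (hypothetical column names).
orders["COGS"] = orders["Quantity"] * orders["Unit Cost"]

# Discount value as the discount rate applied to gross sales.
orders["Discount Value"] = orders["Sales"] * orders["Discount"]

# A basic sales metric: totals per category.
print(orders.groupby("Category")[["Sales", "COGS", "Discount Value"]].sum())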
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The program PanPlot 2 was developed as a visualization tool for the information system PANGAEA. It can be used as a stand-alone application to plot data versus depth or time. The data input format is tab-delimited ASCII (e.g. by export from MS-Excel or from PANGAEA). The default scales and graphic features can be individually modified. PanPlot 2 graphs can be exported in several image formats (BMP, PNG, PDF, and SVG) which can be imported into graphic software for further processing.
PanPlot has been retired since 2017. It is free of charge, is no longer being actively developed or supported, and is provided as-is without warranty.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data set has been downloaded via the OAI-PMH endpoint of the Berlin State Library/Staatsbibliothek zu Berlin’s Digitized Collections (https://digital.staatsbibliothek-berlin.de/oai) on March 1, 2019 and converted into common tabular formats on the basis of the provided Dublin Core metadata. It contains 146,000 records.
In addition to the bibliographic metadata, representative images of the works have been downloaded, resized to a 512 pixel maximum thumbnail image and saved in JPEG format. The image data is split into title pages and first pages. Title pages have been derived from structural metadata created by scan operators and librarians. If this information was not available, first pages of the media have been downloaded. In case of multi-volume media, title pages are not available.
In total, 141,206 title/first-page images are available.
Furthermore, the tabular data has been cleaned and extended with geo-spatial coordinates provided by the OpenStreetMap project (https://www.openstreetmap.org). The actual data processing steps are summarized in the next section. For the sake of transparency and reproducibility, the original data taken from the OAI-PMH endpoint is still present in the table.
Finally, various graphs in GML file format are available that can be loaded directly into graph analysis tools such as Gephi (https://gephi.org/); a programmatic loading sketch is given below.
The implementation of the data processing steps (incl. graph creation) is available as a Jupyter notebook provided at https://github.com/elektrobohemian/SBBrowse2018/blob/master/DataProcessing.ipynb.
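As an alternative to Gephi, the GML graphs can be inspected programmatically. A minimal sketch using the Python networkx library; the file name inside graphs.zip is an assumption:

import networkx as nx

# Load one of the pre-computed GML graphs (file name is hypothetical).
G = nx.read_gml("graphs/example.gml")

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")

# Quick sanity check: the five highest-degree nodes.
print(sorted(G.degree, key=lambda kv: kv[1], reverse=True)[:5])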
Tabular Metadata
The metadata is available in Excel (cleanedData.xlsx) and CSV (cleanedData.csv) file formats with equal content.
The table contains the following columns; a minimal loading sketch follows the list. Columns rendered in italics in the original documentation have not been processed.
· title The title of the medium
· creator Its creator (family name, first name)
· subject A collection’s name as provided by the library
· type The type of medium
· format A MIME type for full metadata download
· identifier An additional identifier (most often the PPN)
· language A 3-letter language code of the medium
· date The date of creation/publication or a time span
· relation A relation to a project or collection a medium has been digitized for
· coverage The location of publication or origin (ranging from cities to continents)
· publisher The publisher of the medium
· rights Copyright information
· PPN The unique identifier that can be used to find more information about the current medium in all information systems of Berlin State Library/Staatsbibliothek zu Berlin.
· spatialClean In case of multiple entries in coverage, only the first place of origin has been extracted. Additionally, characters such as question marks, brackets, or the like have been removed. The entries have been normalized regarding whitespaces and writing variants with the help of regular expressions.
· dateClean As the original date may contain various format variants to indicate unclear creation dates (e.g., time spans or question marks), this field contains a mapping to a certain point in time.
· spatialCluster The cluster ID determined with the help of the Jaro-Winkler distance on the spatialClean string. This step is needed because the spatialClean fields still contain a huge amount of orthographic variants and latinizations of geographic names.
· spatialClusterName A verbal cluster name (controlled manually).
· latitude The latitude provided by OpenStreetMap of the spatialClusterName if the location could be found.
· longitude The longitude provided by OpenStreetMap of the spatialClusterName if the location could be found.
· century A century derived from the date.
· textCluster A text cluster ID on the basis of a k-means clustering relying on the title field with a vocabulary size of 125,000 using the tf*idf model and k=5,000.
· creatorCluster A text cluster ID based on the creator field with k=20,000.
· titleImage The path to the first/title page relative to the img/ subdirectory or None in case of a multi-volume work.
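A minimal loading sketch in Python/pandas, using the column names listed above:

import pandas as pd

# CSV and Excel variants have equal content; the CSV is used here.
df = pd.read_csv("cleanedData.csv")

# Example: records per century, and the most frequent normalized places of origin.
print(df["century"].value_counts().head())
print(df.groupby("spatialClusterName").size().sort_values(ascending=False).head())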
Other Data
graphs.zip
Various pre-computed graphs.
img.zip
First and title pages in JPEG format.
json.zip
JSON files for each record in the following format:
ppn "PPN57346250X"
dateClean "1625"
title "M. Georgii Gutkii, Gymnasii Berlinensis Rectoris Habitus Primorum Principiorum, Seu Intelligentia; Annexae Sunt Appendicis loco Disputationes super eodem habitu tum in Academia Wittebergensi, tum in Gymnasio Berlinensi ventilatae"
creator "Gutke, Georg"
spatialClusterName "Berlin"
spatialClean "Berolini"
spatialRaw "Berolini"
mediatype "monograph"
subject "Historische Drucke"
publisher "Kallius"
lat "52.5170365"
lng "13.3888599"
textCluster "45"
creatorCluster "5040"
titleImage "titlepages/PPN57346250X.jpg"
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains all the files used in developing the Tree-KG, the knowledge graph to capture the tree annotations in the works of Vladimir Nabokov.
In the Ontology Versions folder, four ontology (TAV) files in Turtle (.ttl) format are provided. They are all numbered and dated to represent their different versions. The competency questions (CQs) and sample SPARQL queries are provided in .txt files; a query sketch follows the list below. The KG was developed in Protégé.
(1) contains the schema for the TAV vocabulary without the linking to external vocabularies.
(2) contains the schema for TAV vocabulary with the links to external terms.
(3) contains the Tree-KG along with the data from three Nabokov novels (Mary; King, Queen, Knave; Glory) in a self-contained way.
(4) contains the Tree-KG that reflects the data from three novels (Mary; King, Queen, Knave; Glory) in a linked data way.
(5) contains some of the CQs used to develop TAV (a .txt file).
(6) contains some sample SPARQL queries (a .txt file).
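A minimal query sketch with the Python rdflib library; the file name is an assumption, and the query is a generic probe rather than one of the provided CQs:

import rdflib

# Load one of the numbered TAV ontology files (file name is hypothetical).
g = rdflib.Graph()
g.parse("tav_schema.ttl", format="turtle")

# Generic probe: list the classes declared in the vocabulary.
query = """
SELECT DISTINCT ?cls WHERE { ?cls a <http://www.w3.org/2002/07/owl#Class> . }
"""
for row in g.query(query):
    print(row.cls)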
In the Trees of Nabokov-Annotated Dataset folder, six spreadsheets in Excel (.xlsx) format are provided. They are numbered. Note that the annotated data are all in English, as the consulted works are the English translations of the literary works of Nabokov.
(1) contains the tree annotations from the novels originally written in Russian by Vladimir Nabokov.
(2) contains the tree annotations from the novels originally written in English by Vladimir Nabokov.
(3) contains the tree annotations from the short stories originally written in Russian and English by Vladimir Nabokov.
(4) is the knowledge base (KB) developed to link the annotated trees to Wikidata and DBPedia.
(5) contains the benchmarking results of some entity recognition tools. It also includes the relevant passages from Nabokov's novels that were used in the experiments.
(6) represents the complete bibliographic details of the works of Vladimir Nabokov (https://thenabokovian.org/abbreviations).
*** Fake News on Twitter ***
These 5 datasets are the results of an empirical study on the spreading process of newly emerged fake news on Twitter. In particular, we have focused on those fake news stories that gave rise to a truth spreading simultaneously against them. The story of each fake news item is as follows:
1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.
2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."
3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.
4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.
5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.
The data collection was done in two stages, each providing a new dataset: 1- obtaining the Dataset of Diffusion (DD), which includes information on fake news/truth tweets and retweets; 2- querying the neighbors of the spreaders of tweets, which provides us with the Dataset of Graph (DG).
DD
DD for each fake news story is an Excel file, named FNx_DD where x is the number of the fake news story, with the following structure:
Each row belongs to one captured tweet/retweet related to the rumor, and each column presents a specific piece of information about that tweet/retweet. From left to right, the columns contain the following (a loading sketch follows this list):
User ID (user who has posted the current tweet/retweet)
The profile description of the user who has published the tweet/retweet
The number of tweets/retweets published by the user at the time of posting the current tweet/retweet
Date and time of creation of the account from which the current tweet/retweet has been posted
Language of the tweet/retweet
Number of followers
Number of followings (friends)
Date and time of posting the current tweet/retweet
Number of likes (favorites) the current tweet had acquired before crawling it
Number of times the current tweet had been retweeted before crawling it
Whether another tweet is embedded in the current tweet/retweet (for example, when the current tweet is a quote, reply, or retweet)
The source (device/OS) from which the current tweet/retweet was posted
Tweet/Retweet ID
Retweet ID (if the post is a retweet, this field gives the ID of the tweet being retweeted)
Quote ID (if the post is a quote, this field gives the ID of the tweet being quoted)
Reply ID (if the post is a reply, this field gives the ID of the tweet being replied to)
Frequency of tweet occurrences, i.e., the number of times the current tweet is repeated in the dataset (for example, the number of times a tweet exists in the dataset in the form of retweets posted by others)
State of the tweet, which can be one of the following forms (decided by agreement between the annotators):
r : The tweet/retweet is a fake news post
a : The tweet/retweet is a truth post
q : The tweet/retweet questions the fake news, neither confirming nor denying it
n : The tweet/retweet is not related to the fake news (it contains queries related to the rumor but does not refer to the given fake news)
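Because the columns are positional, a loading sketch can assign shorthand names when reading a DD file; the names below are our own labels for the 18 columns described above (not headers from the dataset), and the .xlsx extension and header=None setting are assumptions:

import pandas as pd

cols = ["user_id", "profile_description", "tweet_count", "account_created",
        "language", "followers", "followings", "posted_at", "likes",
        "retweet_count", "has_embedded_tweet", "source", "tweet_id",
        "retweet_id", "quote_id", "reply_id", "frequency", "state"]

dd = pd.read_excel("FN1_DD.xlsx", header=None, names=cols)

# Example: number of captured posts annotated as fake news posts ('r').
print((dd["state"] == "r").sum())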
DG
DG for each fake news story contains two files:
A file in graph format (.graph) which includes the structure of the graph, i.e., who is linked to whom (this file is named FNx_DG.graph, where x is the number of the fake news story).
A file in JSON Lines format (.jsonl) which includes the real user IDs of the nodes in the graph file (this file is named FNx_Labels.jsonl, where x is the number of the fake news story).
In the graph file, the label of each node is the order in which it entered the graph. For example, if the node with user ID 12345637 is the first node entered into the graph file, then its label in the graph is 0 and its real ID (12345637) is at row number 1 of the jsonl file (row number 0 holds the column labels); the other node IDs follow in subsequent rows, one user ID per row. Therefore, to find the user ID of, say, node 200 (labeled 200 in the graph), look at row number 201 of the jsonl file. A lookup sketch is given below.
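A lookup sketch under this convention (row 0 holds the column labels, so graph label k sits at row k + 1); the per-row content format of the jsonl file is an assumption:

import json

def user_id_for_label(jsonl_path, label):
    # Hypothetical helper: return the real user ID for a graph node label.
    with open(jsonl_path) as f:
        rows = f.read().splitlines()
    return json.loads(rows[label + 1])  # row 0 is the header row

print(user_id_for_label("FN1_Labels.jsonl", 200))  # user ID of node 200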
The user IDs of the spreaders in DG (those who have a post in DD) are available in DD, where extra information about them and their tweets/retweets can be found. The other user IDs in DG are the neighbors of these spreaders and might not exist in DD.
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
NHTSA's Corporate Average Fuel Economy (CAFE) program requires manufacturers of passenger cars and light trucks, produced for sale in the U.S., to meet CAFE standards, expressed in miles per gallon (mpg). The purpose of the CAFE program is to reduce the nation's energy consumption by increasing the fuel economy of cars and light trucks. The CAFE Public Information Center (PIC) is the authoritative source for Corporate Average Fuel Economy (CAFE) program data. This site allows fuel economy data to be viewed in report and/or graph format. The data can be sorted and filtered to produce custom reports which can also be downloaded as Excel or pdf files. NHTSA periodically updates the CAFE data in the PIC and, therefore, each report and graph is date stamped to indicate the last time NHTSA made updates.
This resource is a compilation of individual Excel workbooks containing temperature-depth log data and graphic profiles for wells in Florida counties. The files are provided in zipped archival folders. Each file contains a Resource Provider worksheet with contact information for the data source, and additional worksheets for each temperature-depth dataset and profile by site ID. Each set of data and profile includes information related to the log date, site name, site ID, county, location (lat/long), and information source. In addition to the temp-depth profile data, a graph is provided. The data are available as Excel workbooks for download; they were provided by the Florida Geological Survey and made available for distribution through the National Geothermal Data System.
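A minimal sketch for enumerating the worksheets of one downloaded workbook with Python/pandas; the file name is an assumption:

import pandas as pd

# sheet_name=None returns every worksheet as a dict of DataFrames.
sheets = pd.read_excel("county_temperature_depth.xlsx", sheet_name=None)

for name, sheet in sheets.items():
    # One Resource Provider sheet plus one sheet per site ID.
    print(name, sheet.shape)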
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SciQA benchmark of questions and queries.
The data dump is in N-Triples format (RDF NT), taken from the ORKG system on 14.03.2023 at 02:04 PM.
The dump can be imported into a Virtuoso endpoint or any RDF engine so that it can be queried; a loading sketch is given below.
The questions/queries are provided as spreadsheets (Excel and CSV formats); train and test files are also provided.
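Besides Virtuoso, the dump can also be queried in memory. A minimal sketch with the Python rdflib library, assuming the dump file is named sciqa_dump.nt (hypothetical):

import rdflib

g = rdflib.Graph()
g.parse("sciqa_dump.nt", format="nt")
print(len(g), "triples loaded")

# Generic probe: a few distinct predicates used in the dump.
for row in g.query("SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 5"):
    print(row.p)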
Types of questions and queries:
More details on certain columns:
"Classification rationale" It may contain the following values:
Explanation of Rationale for Non-factoid:
Tax on Construction, Installations and Works (ICIO): self-assessment information is presented since 2005. In CSV format, the number of self-assessments per year and the amounts per year are presented. The same data are presented in Excel format, as reports with totals and graphs for the last ten years.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the data set for the essay "Automatic merging of separated construction plans of hydraulic structures" submitted for Bautechnik 5/22. The data set is structured as follows:
- The ZIP file "01 Original Data" contains 233 folders (named after the TU IDs) with the associated partial recordings in TIF format. The TIFs are binary compressed in CCITT Group 4 (fax) format. 219 TUs are divided into two parts and 14 into three parts; the original data therefore consist of 480 partial recordings.
- The ZIP file "02 Interim Results" contains 233 folders (named after the TU IDs) with relevant intermediate results generated during stitching. This includes the input images scaled to 10 MP, the visualization of the feature assignment(s), and the result in downscaled resolution with visualized seam lines.
- The ZIP file "03_Results" contains the 170 successfully merged plans in high resolution in TIF format.
- The Excel file "Dataset" contains metadata on the 233 examined TUs, including the DOT graph of the assignment described in the work, the correctness rating of the results, and the assignment to the presented sources of error.
The data set was generated with the following metadata query in the IT system Digital Management of Technical Documents (DVtU):
Microfilm metadata - TA (partial recording) - Number: "> 1"
Document metadata - Object part: "130 (Wehrwangen, Wehrpillars)" - Object ID no.: "213 (Weir systems)" - Detail: "*[Bb]wehrung*" - Version: "01.00.00"
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figure 12, Figure 13, and Figure 14 show selected imaginary and real part capacitance spectra (graphs a and b, respectively) recorded over the freezing period for the TVIS vial containing 5% sucrose, 5% sucrose and 0.26% NaCl, and 5% sucrose and 0.55% NaCl. The right side shows the derived parameters of the log of the peak frequency (graph c) and the peak amplitude (graph d) for the principal relaxation peak observed in the imaginary capacitance spectra, i.e.:
Maxwell-Wagner process (MW) for the 5% sucrose solution (low conductivity).
Interfacial capacitance (IC) for the 5% sucrose solutions with 0.26% and 0.55% NaCl (high conductivity).
Dielectric relaxation of ice for the frozen state of all three solutions.
The right side of Figure 12, Figure 13, and Figure 14 also shows the derived parameters of the real part capacitance in the limits of low frequency (10 Hz) and high frequency (0.2 MHz) (graphs e and f, respectively). Overlaid on these profiles are arrows pointing from left to right, drawn to indicate the differences in the temperature dependencies of the real part capacitance in the limits of low and high frequency.
The period marked TP is the transition period in which either of two conditions applies: (i) the peaks of any of the processes are no longer visible within the experimental frequency region of the TVIS instrument, or (ii) the observed peak is a hybrid of multiple processes, for example a peak that comprises contributions from the MW relaxation and from the dielectric relaxation of ice.
The other Excel files give a comprehensive set of data, for all three samples, 5% w/v sucrose (with 0% w/v NaCl), 5% w/v sucrose (with 0.26% w/v NaCl), and 5% w/v sucrose (with 0.55% w/v NaCl), to support Figures 12, 13, and 14.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Saint Vincent and the Grenadines' time required to start a business is 10 days, which is the 111th highest in the world ranking. Transition graphs on the time required to start a business in Saint Vincent and the Grenadines and comparison bar charts (USA vs. China vs. Japan vs. Saint Vincent and the Grenadines; Grenada vs. Tonga vs. Saint Vincent and the Grenadines) are used for easy understanding. Various data can be downloaded and output in CSV format for use in Excel free of charge.
GNU General Public License 3.0 (GPL-3.0)https://www.gnu.org/licenses/gpl-3.0
The program PanPlot was developed as a visualization tool for the information system PANGAEA. It can be used as a stand-alone application to plot data versus depth or time, or in a ternary view. The data input format is tab-delimited ASCII (e.g. by export from MS-Excel or from PANGAEA). The default scales and graphic features can be individually modified. PanPlot graphs can be exported in platform-specific interchange formats (EMF, PICT) which can be imported into graphic software for further processing.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Aichi's number of persons who began work in the past year is 247,600, which is the 4th highest in Japan (by prefecture). Transition graphs and comparison charts between Aichi and Osaka (Osaka) and Saitama (Saitama), the closest prefectures in population, are available. Various data can be downloaded and output in CSV format for use in Excel free of charge.