Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The various performance criteria applied in this analysis include the probability of reaching the ultimate target, the costs, elapsed times and system vulnerability resulting from any intrusion. This Excel file contains all the logical, probabilistic and statistical data entered by a user, and required for the evaluation of the criteria. It also reports the results of all the computations.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Categorical scatterplots with R for biologists: a step-by-step guide
Benjamin Petre1, Aurore Coince2, Sophien Kamoun1
1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK
Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.
Protocol
• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide; a hypothetical example layout is given after this protocol. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed are indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains the continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.
• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide and paste it into the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.
• Step 3: save the graph as a .pdf file. Resize the window as needed and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
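For illustration, a data set formatted as in step 1 might look like the following once saved as a .csv file (values are hypothetical):

Replicate,Condition,Value
Dec_2015,WT,7.82
Dec_2015,mutant_A,3.95
Dec_2015,mutant_B,12.40
Jan_2016,WT,8.13
Jan_2016,mutant_A,4.26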
Notes
• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, go to Packages & Data -> Package Installer, enter ‘ggplot2’ in the Package Search space, and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.
• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()
References
Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.
Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035.
Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128.
Analyzing sales data is essential for any business looking to make informed decisions and optimize its operations. In this project, we will utilize Microsoft Excel and Power Query to conduct a comprehensive analysis of Superstore sales data. Our primary objectives will be to establish meaningful connections between various data sheets, ensure data quality, and calculate critical metrics such as the Cost of Goods Sold (COGS) and discount values. Below are the key steps and elements of this analysis:
1- Data Import and Transformation:
2- Data Quality Assessment:
3- Calculating COGS:
4- Discount Analysis:
5- Sales Metrics:
6- Visualization:
7- Report Generation:
Throughout this analysis, the goal is to provide a clear and comprehensive understanding of the Superstore's sales performance. By using Excel and Power Query, we can efficiently manage and analyze the data, ensuring that the insights gained contribute to the store's growth and success.
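As a rough sketch of how the COGS and discount calculations might look outside Power Query, the following Python/pandas fragment can serve as a reference; the file name, sheet name, and column names (Quantity, Unit Cost, Sales, Discount, Category) are assumptions and may differ in the actual Superstore workbook:

import pandas as pd

# Load the orders sheet (file and sheet names are assumptions).
orders = pd.read_excel("Superstore.xlsx", sheet_name="Orders")

# COGS as quantity times unit cost (hypothetical column names).
orders["COGS"] = orders["Quantity"] * orders["Unit Cost"]

# Discount value as the discount rate applied to gross sales.
orders["Discount Value"] = orders["Sales"] * orders["Discount"]

# A basic sales metric: totals per category.
print(orders.groupby("Category")[["Sales", "COGS", "Discount Value"]].sum())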
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The program PanPlot 2 was developed as a visualization tool for the information system PANGAEA. It can be used as a stand-alone application to plot data versus depth or time. The data input format is tab-delimited ASCII (e.g. by export from MS-Excel or from PANGAEA). The default scales and graphic features can be individually modified. PanPlot 2 graphs can be exported in several image formats (BMP, PNG, PDF, and SVG) which can be imported into graphic software for further processing.
PanPlot has been retired since 2017. It is free of charge, is no longer being actively developed or supported, and is provided as-is without warranty.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data set has been downloaded via the OAI-PMH endpoint of the Berlin State Library/Staatsbibliothek zu Berlin’s Digitized Collections (https://digital.staatsbibliothek-berlin.de/oai) on March 1, 2019 and converted into common tabular formats on the basis of the provided Dublin Core metadata. It contains 146,000 records.
In addition to the bibliographic metadata, representative images of the works have been downloaded, resized to a 512 pixel maximum thumbnail image and saved in JPEG format. The image data is split into title pages and first pages. Title pages have been derived from structural metadata created by scan operators and librarians. If this information was not available, first pages of the media have been downloaded. In case of multi-volume media, title pages are not available.
In total, 141,206 title/first-page images are available.
Furthermore, the tabular data has been cleaned and extended with geo-spatial coordinates provided by the OpenStreetMap project (https://www.openstreetmap.org). The actual data processing steps are summarized in the next section. For the sake of transparency and reproducibility, the original data taken from the OAI-PMH endpoint is still present in the table.
Finally, various graphs in GML file format are available that can be loaded directly into graph analysis tools such as Gephi (https://gephi.org/); a programmatic loading sketch is given below.
The implementation of the data processing steps (incl. graph creation) is available as a Jupyter notebook provided at https://github.com/elektrobohemian/SBBrowse2018/blob/master/DataProcessing.ipynb.
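As an alternative to Gephi, the GML graphs can be inspected programmatically. A minimal sketch using the Python networkx library; the file name inside graphs.zip is an assumption:

import networkx as nx

# Load one of the pre-computed GML graphs (file name is hypothetical).
G = nx.read_gml("graphs/example.gml")

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")

# Quick sanity check: the five highest-degree nodes.
print(sorted(G.degree, key=lambda kv: kv[1], reverse=True)[:5])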
Tabular Metadata
The metadata is available in Excel (cleanedData.xlsx) and CSV (cleanedData.csv) file formats with equal content.
The table contains the following columns; a minimal loading sketch follows the list. Columns rendered in italics in the original documentation have not been processed.
· title The title of the medium
· creator Its creator (family name, first name)
· subject A collection’s name as provided by the library
· type The type of medium
· format A MIME type for full metadata download
· identifier An additional identifier (most often the PPN)
· language A 3-letter language code of the medium
· date The date of creation/publication or a time span
· relation A relation to a project or collection a medium has been digitized for
· coverage The location of publication or origin (ranging from cities to continents)
· publisher The publisher of the medium
· rights Copyright information
· PPN The unique identifier that can be used to find more information about the current medium in all information systems of Berlin State Library/Staatsbibliothek zu Berlin.
· spatialClean In case of multiple entries in coverage, only the first place of origin has been extracted. Additionally, characters such as question marks, brackets, or the like have been removed. The entries have been normalized regarding whitespaces and writing variants with the help of regular expressions.
· dateClean As the original date may contain various format variants to indicate unclear creation dates (e.g., time spans or question marks), this field contains a mapping to a certain point in time.
· spatialCluster The cluster ID determined with the help of the Jaro-Winkler distance on the spatialClean string. This step is needed because the spatialClean fields still contain a huge amount of orthographic variants and latinizations of geographic names.
· spatialClusterName A verbal cluster name (controlled manually).
· latitude The latitude provided by OpenStreetMap of the spatialClusterName if the location could be found.
· longitude The longitude provided by OpenStreetMap of the spatialClusterName if the location could be found.
· century A century derived from the date.
· textCluster A text cluster ID on the basis of a k-means clustering relying on the title field with a vocabulary size of 125,000 using the tf*idf model and k=5,000.
· creatorCluster A text cluster ID based on the creator field with k=20,000.
· titleImage The path to the first/title page relative to the img/ subdirectory or None in case of a multi-volume work.
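A minimal loading sketch in Python/pandas, using the column names listed above:

import pandas as pd

# CSV and Excel variants have equal content; the CSV is used here.
df = pd.read_csv("cleanedData.csv")

# Example: records per century, and the most frequent normalized places of origin.
print(df["century"].value_counts().head())
print(df.groupby("spatialClusterName").size().sort_values(ascending=False).head())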
Other Data
graphs.zip
Various pre-computed graphs.
img.zip
First and title pages in JPEG format.
json.zip
JSON files for each record in the following format:
ppn "PPN57346250X"
dateClean "1625"
title "M. Georgii Gutkii, Gymnasii Berlinensis Rectoris Habitus Primorum Principiorum, Seu Intelligentia; Annexae Sunt Appendicis loco Disputationes super eodem habitu tum in Academia Wittebergensi, tum in Gymnasio Berlinensi ventilatae"
creator "Gutke, Georg"
spatialClusterName "Berlin"
spatialClean "Berolini"
spatialRaw "Berolini"
mediatype "monograph"
subject "Historische Drucke"
publisher "Kallius"
lat "52.5170365"
lng "13.3888599"
textCluster "45"
creatorCluster "5040"
titleImage "titlepages/PPN57346250X.jpg"
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains all the files used in developing the Tree-KG, the knowledge graph to capture the tree annotations in the works of Vladimir Nabokov.
In the Ontology Versions folder, four ontology (TAV) files in Turtle (.ttl) format are provided. They are all numbered and dated to represent their different versions. The competency questions (CQs) and sample SPARQL queries are provided in .txt files; a query sketch follows the list below. The KG was developed in Protégé.
(1) contains the schema for the TAV vocabulary without the linking to external vocabularies.
(2) contains the schema for TAV vocabulary with the links to external terms.
(3) contains the Tree-KG along with the data from three Nabokov novels (Mary; King, Queen, Knave; Glory) in a self-contained way.
(4) contains the Tree-KG that reflects the data from three novels (Mary; King, Queen, Knave; Glory) in a linked data way.
(5) contains some of the CQs used to develop TAV (a .txt file).
(6) contains some sample SPARQL queries (a .txt file).
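A minimal query sketch with the Python rdflib library; the file name is an assumption, and the query is a generic probe rather than one of the provided CQs:

import rdflib

# Load one of the numbered TAV ontology files (file name is hypothetical).
g = rdflib.Graph()
g.parse("tav_schema.ttl", format="turtle")

# Generic probe: list the classes declared in the vocabulary.
query = """
SELECT DISTINCT ?cls WHERE { ?cls a <http://www.w3.org/2002/07/owl#Class> . }
"""
for row in g.query(query):
    print(row.cls)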
In the Trees of Nabokov-Annotated Dataset folder, six spreadsheets in Excel (.xlsx) format are provided. They are numbered. Note that the annotated data are all in English, as the consulted works are the English translations of the literary works of Nabokov.
(1) contains the tree annotations from the novels originally written in Russian by Vladimir Nabokov.
(2) contains the tree annotations from the novels originally written in English by Vladimir Nabokov.
(3) contains the tree annotations from the short stories originally written in Russian and English by Vladimir Nabokov.
(4) is the knowledge base (KB) developed to link the annotated trees to Wikidata and DBPedia.
(5) contains the benchmarking results of some entity recognition tools. It also includes the relevant passages from Nabokov's novels that were used in the experiments.
(6) represents the complete bibliographic details of the works of Vladimir Nabokov (https://thenabokovian.org/abbreviations).
*** Fake News on Twitter ***
These 5 datasets are the results of an empirical study on the spreading process of newly emerged fake news on Twitter. In particular, we have focused on those fake news stories that gave rise to a truth spreading simultaneously against them. The story of each fake news item is as follows:
1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.
2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."
3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.
4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.
5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.
The data collection was done in two stages, each providing a new dataset: 1- obtaining the Dataset of Diffusion (DD), which includes information on fake news/truth tweets and retweets; 2- querying the neighbors of the spreaders of tweets, which provides us with the Dataset of Graph (DG).
DD
DD for each fake news story is an Excel file, named FNx_DD where x is the number of the fake news story, with the following structure:
Each row belongs to one captured tweet/retweet related to the rumor, and each column presents a specific piece of information about that tweet/retweet. From left to right, the columns contain the following (a loading sketch follows this list):
User ID (user who has posted the current tweet/retweet)
The profile description of the user who has published the tweet/retweet
The number of tweets/retweets published by the user at the time of posting the current tweet/retweet
Date and time of creation of the account from which the current tweet/retweet has been posted
Language of the tweet/retweet
Number of followers
Number of followings (friends)
Date and time of posting the current tweet/retweet
Number of likes (favorites) the current tweet had acquired before crawling it
Number of times the current tweet had been retweeted before crawling it
Whether another tweet is embedded in the current tweet/retweet (for example, when the current tweet is a quote, reply, or retweet)
The source (device/OS) from which the current tweet/retweet was posted
Tweet/Retweet ID
Retweet ID (if the post is a retweet, this field gives the ID of the tweet being retweeted)
Quote ID (if the post is a quote, this field gives the ID of the tweet being quoted)
Reply ID (if the post is a reply, this field gives the ID of the tweet being replied to)
Frequency of tweet occurrences, i.e., the number of times the current tweet is repeated in the dataset (for example, the number of times a tweet exists in the dataset in the form of retweets posted by others)
State of the tweet, which can be one of the following forms (decided by agreement between the annotators):
r : The tweet/retweet is a fake news post
a : The tweet/retweet is a truth post
q : The tweet/retweet questions the fake news, neither confirming nor denying it
n : The tweet/retweet is not related to the fake news (it contains queries related to the rumor but does not refer to the given fake news)
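Because the columns are positional, a loading sketch can assign shorthand names when reading a DD file; the names below are our own labels for the 18 columns described above (not headers from the dataset), and the .xlsx extension and header=None setting are assumptions:

import pandas as pd

cols = ["user_id", "profile_description", "tweet_count", "account_created",
        "language", "followers", "followings", "posted_at", "likes",
        "retweet_count", "has_embedded_tweet", "source", "tweet_id",
        "retweet_id", "quote_id", "reply_id", "frequency", "state"]

dd = pd.read_excel("FN1_DD.xlsx", header=None, names=cols)

# Example: number of captured posts annotated as fake news posts ('r').
print((dd["state"] == "r").sum())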
DG
DG for each fake news story contains two files:
A file in graph format (.graph) which includes the structure of the graph, i.e., who is linked to whom (this file is named FNx_DG.graph, where x is the number of the fake news story).
A file in JSON Lines format (.jsonl) which includes the real user IDs of the nodes in the graph file (this file is named FNx_Labels.jsonl, where x is the number of the fake news story).
In the graph file, the label of each node is the order in which it entered the graph. For example, if the node with user ID 12345637 is the first node entered into the graph file, then its label in the graph is 0 and its real ID (12345637) is at row number 1 of the jsonl file (row number 0 holds the column labels); the other node IDs follow in subsequent rows, one user ID per row. Therefore, to find the user ID of, say, node 200 (labeled 200 in the graph), look at row number 201 of the jsonl file. A lookup sketch is given below.
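A lookup sketch under this convention (row 0 holds the column labels, so graph label k sits at row k + 1); the per-row content format of the jsonl file is an assumption:

import json

def user_id_for_label(jsonl_path, label):
    # Hypothetical helper: return the real user ID for a graph node label.
    with open(jsonl_path) as f:
        rows = f.read().splitlines()
    return json.loads(rows[label + 1])  # row 0 is the header row

print(user_id_for_label("FN1_Labels.jsonl", 200))  # user ID of node 200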
The user IDs of the spreaders in DG (those who have a post in DD) are available in DD, where extra information about them and their tweets/retweets can be found. The other user IDs in DG are the neighbors of these spreaders and might not exist in DD.
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
NHTSA's Corporate Average Fuel Economy (CAFE) program requires manufacturers of passenger cars and light trucks, produced for sale in the U.S., to meet CAFE standards, expressed in miles per gallon (mpg). The purpose of the CAFE program is to reduce the nation's energy consumption by increasing the fuel economy of cars and light trucks. The CAFE Public Information Center (PIC) is the authoritative source for Corporate Average Fuel Economy (CAFE) program data. This site allows fuel economy data to be viewed in report and/or graph format. The data can be sorted and filtered to produce custom reports which can also be downloaded as Excel or pdf files. NHTSA periodically updates the CAFE data in the PIC and, therefore, each report and graph is date stamped to indicate the last time NHTSA made updates.
This resource is a compilation of individual Excel workbooks containing temperature-depth log data and graphic profiles for wells in Florida counties. The files are provided in zipped archival folders. Each file contains a Resource Provider worksheet with contact information for the data source, and additional worksheets for each temperature-depth dataset and profile by site ID. Each set of data and profile includes information related to the log date, site name, site ID, county, location (lat/long), and information source. In addition to the temp-depth profile data, a graph is provided. The data are available as Excel workbooks for download; they were provided by the Florida Geological Survey and made available for distribution through the National Geothermal Data System.
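A minimal sketch for enumerating the worksheets of one downloaded workbook with Python/pandas; the file name is an assumption:

import pandas as pd

# sheet_name=None returns every worksheet as a dict of DataFrames.
sheets = pd.read_excel("county_temperature_depth.xlsx", sheet_name=None)

for name, sheet in sheets.items():
    # One Resource Provider sheet plus one sheet per site ID.
    print(name, sheet.shape)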
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SciQA benchmark of questions and queries.
The data dump is in N-Triples format (RDF NT), taken from the ORKG system on 14.03.2023 at 02:04 PM.
The dump can be imported into a Virtuoso endpoint or any RDF engine so that it can be queried; a loading sketch is given below.
The questions/queries are provided as spreadsheets (Excel and CSV formats); train and test files are also provided.
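Besides Virtuoso, the dump can also be queried in memory. A minimal sketch with the Python rdflib library, assuming the dump file is named sciqa_dump.nt (hypothetical):

import rdflib

g = rdflib.Graph()
g.parse("sciqa_dump.nt", format="nt")
print(len(g), "triples loaded")

# Generic probe: a few distinct predicates used in the dump.
for row in g.query("SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 5"):
    print(row.p)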
Types of questions and queries:
More details on certain columns:
"Classification rationale" It may contain the following values:
Explanation of Rationale for Non-factoid:
Tax on Construction, Installations and Works (ICIO): self-assessment information is presented since 2005. In CSV format, the number of self-assessments per year and the amounts per year are presented. The same data are presented in Excel format, as reports with totals and graphs for the last ten years.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the data set for the essay "Automatic merging of separated construction plans of hydraulic structures" submitted for Bautechnik 5/22. The data set is structured as follows:
- The ZIP file "01 Original Data" contains 233 folders (named after the TU IDs) with the associated partial recordings in TIF format. The TIFs are binary compressed in CCITT Group 4 (fax) format. 219 TUs are divided into two parts and 14 into three parts; the original data therefore consist of 480 partial recordings.
- The ZIP file "02 Interim Results" contains 233 folders (named after the TU IDs) with relevant intermediate results generated during stitching. This includes the input images scaled to 10 MP, the visualization of the feature assignment(s), and the result in downscaled resolution with visualized seam lines.
- The ZIP file "03_Results" contains the 170 successfully merged plans in high resolution in TIF format.
- The Excel file "Dataset" contains metadata on the 233 examined TUs, including the DOT graph of the assignment described in the work, the correctness rating of the results, and the assignment to the presented sources of error.
The data set was generated with the following metadata query in the IT system Digital Management of Technical Documents (DVtU):
Microfilm metadata - TA (partial recording) - Number: "> 1"
Document metadata - Object part: "130 (Wehrwangen, Wehrpillars)" - Object ID no.: "213 (Weir systems)" - Detail: "*[Bb]wehrung*" - Version: "01.00.00"
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figure 12, Figure 13, and Figure 14 show selected imaginary and real part capacitance spectra (graphs a and b, respectively) recorded over the freezing period for the TVIS vial containing 5% sucrose, 5% sucrose and 0.26% NaCl, and 5% sucrose and 0.55% NaCl. The right side shows the derived parameters of the log of the peak frequency (graph c) and the peak amplitude (graph d) for the principal relaxation peak observed in the imaginary capacitance spectra, i.e.:
Maxwell-Wagner process (MW) for the 5% sucrose solution (low conductivity).
Interfacial capacitance (IC) for the 5% sucrose solutions with 0.26% and 0.55% NaCl (high conductivity).
Dielectric relaxation of ice for the frozen state of all three solutions.
The right side of Figure 12, Figure 13, and Figure 14 also shows the derived parameters of the real part capacitance in the limits of low frequency (10 Hz) and high frequency (0.2 MHz) (graphs e and f, respectively). Overlaid on these profiles are arrows pointing from left to right, drawn to indicate the differences in the temperature dependencies of the real part capacitance in the limits of low and high frequency.
The period marked TP is the transition period in which either of two conditions applies: (i) the peaks of any of the processes are no longer visible within the experimental frequency region of the TVIS instrument, or (ii) the observed peak is a hybrid of multiple processes, for example a peak that comprises contributions from the MW relaxation and from the dielectric relaxation of ice.
The other Excel files give a comprehensive set of data, for all three samples, 5% w/v sucrose (with 0% w/v NaCl), 5% w/v sucrose (with 0.26% w/v NaCl), and 5% w/v sucrose (with 0.55% w/v NaCl), to support Figures 12, 13, and 14.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Saint Vincent and the Grenadines' time required to start a business is 10 days, which is the 111th highest in the world ranking. Transition graphs on the time required to start a business in Saint Vincent and the Grenadines and comparison bar charts (USA vs. China vs. Japan vs. Saint Vincent and the Grenadines; Grenada vs. Tonga vs. Saint Vincent and the Grenadines) are used for easy understanding. Various data can be downloaded and output in CSV format for use in Excel free of charge.
GNU General Public License 3.0 (GPL-3.0)https://www.gnu.org/licenses/gpl-3.0
The program PanPlot was developed as a visualization tool for the information system PANGAEA. It can be used as a stand-alone application to plot data versus depth or time, or in a ternary view. The data input format is tab-delimited ASCII (e.g. by export from MS-Excel or from PANGAEA). The default scales and graphic features can be individually modified. PanPlot graphs can be exported in platform-specific interchange formats (EMF, PICT) which can be imported into graphic software for further processing.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Aichi's number of persons who began work in the past year is 247,600, which is the 4th highest in Japan (by prefecture). Transition graphs and comparison charts between Aichi and Osaka (Osaka) and Saitama (Saitama), the closest prefectures in population, are available. Various data can be downloaded and output in CSV format for use in Excel free of charge.