This dataset consists of cartographic data in digital line graph (DLG) form for the northeastern states (Connecticut, Maine, Massachusetts, New Hampshire, New York, Rhode Island, and Vermont). Information is presented on two planimetric base categories, political boundaries and administrative boundaries, each available in two formats: the topologically structured format and a simpler format optimized for graphic display. These DLG data can be used to plot base maps and for various kinds of spatial analysis. They may also be combined with other geographically referenced data, such as the Geographic Names Information System, to facilitate analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This example can be viewed by uploading S1 Data into the web-based tool (http://statistika.mfub.bg.ac.rs/interactive-graph/). (XML)
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends

TSMx is an R script that was developed to facilitate multi-temporal-scale visualizations of time-series data. The script requires only a two-column CSV of years and values to plot the slope of the linear regression line for all possible year combinations from the supplied temporal range. The outputs include a time-series matrix showing slope direction based on the linear regression, slope values plotted with colors indicating magnitude, and results of a Mann-Kendall test. The start year is indicated on the y-axis and the end year is indicated on the x-axis. In the example below, the cell in the top-right corner is the direction of the slope for the temporal range 2001–2019. The red line corresponds to the temporal range 2010–2019, and an arrow is drawn from the cell that represents that range. One cell is highlighted with a black border to demonstrate how to read the chart: that cell represents the slope for the temporal range 2004–2014.

This publication entry also includes an Excel template that produces the same visualizations without any need to interact with code, though minor modifications will be needed to accommodate year ranges other than those provided. TSMx for R was developed by Georgios Boumis; TSMx was originally conceptualized and created by Brad G. Peter in Microsoft Excel. Please refer to the associated publication: Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624. https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624

TSMx sample chart from the supplied Excel template.
Data represent the productivity of rice agriculture in Vietnam as measured via EVI (enhanced vegetation index) from the NASA MODIS data product (MOD13Q1.V006).

TSMx R script:

# import packages
library(dplyr)
library(readr)
library(ggplot2)
library(tibble)
library(tidyr)
library(forcats)
library(Kendall)

options(warn = -1) # disable warnings

# read data (.csv file with "Year" and "Value" columns)
data <- read_csv("EVI.csv")

# prepare row/column names for output matrices
years <- data %>% pull("Year")
r.names <- years[-length(years)]
c.names <- years[-1]
years <- years[-length(years)]

# initialize output matrices
sign.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
pval.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
slope.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))

# function to return remaining years given a start year
getRemain <- function(start.year) {
  years <- data %>% pull("Year")
  start.ind <- which(data[["Year"]] == start.year) + 1
  remain <- years[start.ind:length(years)]
  return(remain)
}

# function to subset data for a start/end year combination
splitData <- function(end.year, start.year) {
  keep <- which(data[["Year"]] >= start.year & data[["Year"]] <= end.year)
  batch <- data[keep, ]
  return(batch)
}

# function to fit linear regression and return slope direction
fitReg <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(sign(slope))
}

# function to fit linear regression and return slope magnitude
fitRegv2 <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(slope)
}

# function to implement Mann-Kendall (MK) trend test and return significance
# the test is implemented only for n >= 8
getMann <- function(batch) {
  if (nrow(batch) >= 8) {
    mk <- MannKendall(batch[["Value"]])
    pval <- mk[["sl"]]
  } else {
    pval <- NA
  }
  return(pval)
}

# function to return slope direction for all combinations given a start year
getSign <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  signs <- lapply(combs, fitReg)
  return(signs)
}

# function to return MK significance for all combinations given a start year
getPval <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  pvals <- lapply(combs, getMann)
  return(pvals)
}

# function to return slope magnitude for all combinations given a start year
getMagn <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  magns <- lapply(combs, fitRegv2)
  return(magns)
}

# retrieve slope direction, MK significance, and slope magnitude
signs <- lapply(years, getSign)
pvals <- lapply(years, getPval)
magns <- lapply(years, getMagn)

# fill in output matrices
dimension <- nrow(sign.matrix)
for (i in 1:dimension) {
  sign.matrix[i, i:dimension] <- unlist(signs[i])
  pval.matrix[i, i:dimension] <- unlist(pvals[i])
  slope.matrix[i, i:dimension] <- unlist(magns[i])
}
sign.matrix <-...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains bitcoin transfer transactions extracted from the Bitcoin Mainnet blockchain.
Part1 is available at https://zenodo.org/deposit/7157356
Part3 is available at https://zenodo.org/deposit/7158133
Part4 is available at https://zenodo.org/deposit/7158328
Details of the datasets are given below:
FILENAME FORMAT:
The filenames have the following format:
btc-tx-<start_block>-<end_block>-<part>.bz2
where <start_block> is the starting block number, <end_block> is the final block number, and <part> is the split part of the file.
For example, the file btc-tx-100000-149999-aa.bz2 and the rest of its parts, if any, contain transactions from
block 100000 to block 149999 inclusive.
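The naming scheme above can be parsed with a short sketch; the helper below is hypothetical (not part of the dataset tooling) and assumes the part suffix is lowercase letters as in the example.

```python
import re

# Hypothetical helper: parse btc-tx-<start_block>-<end_block>-<part>.bz2
FILENAME_RE = re.compile(r"btc-tx-(\d+)-(\d+)-([a-z]+)\.bz2")

def parse_filename(name):
    m = FILENAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected filename: {name}")
    start, end, part = m.groups()
    return int(start), int(end), part
```

For example, parse_filename("btc-tx-100000-149999-aa.bz2") returns (100000, 149999, "aa").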
The files are compressed with bzip2 and can be uncompressed with the bunzip2 command.
TRANSACTION FORMAT:
Each line in a file corresponds to a transaction. The transaction has the following format:
Type of transaction (i.e. BTC-IN or BTC-OUT).
Number of the block which contains the transaction.
Position of the transaction in the block (i.e. transaction number in the block).
Source bitcoin address/transaction of the transfer.
Destination bitcoin address/transaction of the transfer.
Amount of transfer.
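A transaction line listing the six fields above could be parsed roughly as follows. This is a sketch only: the field separator is assumed to be whitespace, and the amount is kept as a string because its unit and precision are not specified above.

```python
# Sketch: parse one transaction line, assuming whitespace-separated
# fields in the order listed above (an assumption, not a documented spec).
def parse_tx_line(line):
    tx_type, block, pos, source, dest, amount = line.split()
    return {
        "type": tx_type,      # BTC-IN or BTC-OUT
        "block": int(block),  # number of the containing block
        "index": int(pos),    # transaction number in the block
        "source": source,     # source address/transaction
        "dest": dest,         # destination address/transaction
        "amount": amount,     # transfer amount (unit not specified; kept raw)
    }
```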
BLOCK TIME FORMAT:
The block time file has the following format:
Number of the block.
Unix timestamp at which the block is mined as a hexadecimal number.
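Since the mining time is stored as a hexadecimal Unix timestamp, converting it back to a date is a one-liner; the hex value in the example below is illustrative, not taken from the dataset.

```python
from datetime import datetime, timezone

# Convert a hexadecimal Unix timestamp (as stored in the block time file)
# into a timezone-aware datetime.
def block_time(hex_ts):
    return datetime.fromtimestamp(int(hex_ts, 16), tz=timezone.utc)
```

For example, block_time("5E0BE100") gives 2020-01-01 00:00:00 UTC.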
IMPORTANT NOTE:
Public Bitcoin Mainnet blockchain data is open and can be obtained by connecting as a node on the blockchain or by using block explorer web sites such as https://btcscan.org . Downloaders and users of this dataset accept full responsibility for using the data in a manner compliant with the GDPR and any other applicable regulations. The data is provided as is, and we cannot be held responsible for its use.
NOTE:
If you use this dataset, please do not forget to include the DOI in the citation.
If you use our dataset in your research, please also cite our paper: https://link.springer.com/chapter/10.1007/978-3-030-94590-9_14
@incollection{kilicc2022analyzing,
  title={Analyzing Large-Scale Blockchain Transaction Graphs for Fraudulent Activities},
  author={K{\i}l{\i}{\c{c}}, Baran and {\"O}zturan, Can and {\c{S}}en, Alper},
  booktitle={Big Data and Artificial Intelligence in Digital Finance},
  pages={253--267},
  year={2022},
  publisher={Springer, Cham}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Patient-drug-disease (PDD) graph dataset, built from electronic medical records (EMRs) and biomedical knowledge graphs. The novel framework used to construct the PDD graph is described in the associated publication. PDD is an RDF graph consisting of PDD facts, where a PDD fact is represented by an RDF triple indicating that a patient takes a drug or is diagnosed with a disease, for instance (pdd:274671, pdd:diagnosed, sepsis).

Data files are in the .nt N-Triples format, a line-based syntax for an RDF graph, and can be opened with any text editor.

diagnose_icd_information.nt - contains RDF triples mapping patients to diagnoses. For example: (pdd:18740, pdd:diagnosed, icd99592), where pdd:18740 is a patient entity and icd99592 is the ICD-9 code for sepsis.

drug_patients.nt - contains RDF triples mapping patients to drugs. For example: (pdd:18740, pdd:prescribed, aspirin), where pdd:18740 is a patient entity and aspirin is the drug's name.

Background: Electronic medical records contain multi-format electronic medical data that comprise an abundance of medical knowledge. Faced with patients' symptoms, experienced caregivers make sound medical decisions based on professional knowledge that accurately grasps the relationships between symptoms, diagnoses, and corresponding treatments. In the associated paper, we aim to capture these relationships by constructing a large, high-quality heterogeneous graph linking patients, diseases, and drugs (PDD) in EMRs. Specifically, we propose a novel framework to extract important medical entities from MIMIC-III (Medical Information Mart for Intensive Care III) and automatically link them to existing biomedical knowledge graphs, including the ICD-9 ontology and DrugBank.

The PDD graph presented in this paper is accessible on the Web via a SPARQL endpoint as well as in .nt format in this repository, and provides a pathway for medical discovery and applications such as effective treatment recommendations.

De-identification: MIMIC-III contains clinical information about patients. Although the protected health information was de-identified, researchers who seek to use more clinical data should complete an online training course and then apply for permission to download the complete MIMIC-III dataset: https://mimic.physionet.org/
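Because N-Triples is line-based, the .nt files can be scanned with a minimal sketch like the one below; for real work an RDF library such as rdflib is the safer choice, since this regex ignores literals with escapes and other corner cases.

```python
import re

# Minimal sketch: split one N-Triples statement into its three terms.
# Not a full N-Triples parser; assumes simple, well-formed lines.
TRIPLE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$")

def parse_triple(line):
    m = TRIPLE_RE.match(line)
    if m is None:
        raise ValueError(f"not an N-Triples statement: {line!r}")
    return m.groups()  # (subject, predicate, object)
```

The URIs in the usage below are illustrative placeholders, not the dataset's actual namespaces.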
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In this project, we aimed to map the design space of visualisations embedded in right-to-left (RTL) scripts, expanding our knowledge of visualisation design beyond the dominance of research based on left-to-right (LTR) scripts. Through this project, we identify common design practices regarding the chart structure, the text, and the source. We also identify ambiguity, particularly regarding axis position and direction, suggesting that the community may benefit from unified standards similar to those found in web design for RTL scripts. To achieve this goal, we curated a dataset covering 128 visualisations found in Arabic news media and coded these visualisations based on the chart composition (e.g., chart type, x-axis direction, y-axis position, legend position, interaction, embellishment type), text (e.g., availability of text, availability of caption, annotation type), and source (source position, attribution to designer, ownership of the visualisation design). Links are also provided to the articles and the visualisations. This dataset is limited to stand-alone visualisations, whether single-panelled or composed of small multiples. We did not consider infographics in this project, nor any visualisation without an identifiable chart type (e.g., bar chart, line chart). The attached documents also include some graphs from our analysis of the dataset provided, where we illustrate common design patterns and their popularity within our sample.
https://choosealicense.com/licenses/cdla-permissive-2.0/
SynthChartNet
SynthChartNet is a multimodal dataset designed for training the SmolDocling model on chart-based document understanding tasks. It consists of 1,981,157 synthetically generated samples, where each image depicts a chart (e.g., line chart, bar chart, pie chart, stacked bar chart), and the associated ground truth is given in OTSL format. Charts were rendered at 120 DPI using a diverse set of visualization libraries: Matplotlib, Seaborn, and Pyecharts, enabling… See the full description on the dataset page: https://huggingface.co/datasets/ds4sd/SynthChartNet.
Use the Chart Viewer template to display bar charts, line charts, pie charts, histograms, and scatterplots to complement a map. Include multiple charts to view with a map or side by side with other charts for comparison. Up to three charts can be viewed side by side or stacked, but you can access and view all the charts that are authored in the map.
Examples:
• Present a bar chart representing average property value by county for a given area.
• Compare charts based on multiple population statistics in your dataset.
• Display an interactive scatterplot based on two values in your dataset along with an essential set of map exploration tools.
Data requirements
The Chart Viewer template requires a map with at least one chart configured.
Key app capabilities
• Multiple layout options - Choose Stack to display charts stacked with the map, or choose Side by side to display charts side by side with the map.
• Manage chart - Reorder, rename, or turn charts on and off in the app.
• Multiselect chart - Compare two charts in the panel at the same time.
• Bookmarks - Allow users to zoom and pan to a collection of preset extents that are saved in the map.
• Home, Zoom controls, Legend, Layer List, Search
Supportability
This web app is designed responsively to be used in browsers on desktops, mobile phones, and tablets. We are committed to ongoing efforts towards making our apps as accessible as possible. Please feel free to leave a comment on how we can improve the accessibility of our apps for those who use assistive technologies.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains, up to isomorphism, all (15_4,20_3) and (15_5,25_3) configurations, all (16_6,32_3) configurations with nontrivial automorphisms, as well as all 4-regular graphs on 15 vertices, 6-regular graphs on 15 vertices, 3-regular graphs on 16 vertices, and 4-regular graphs on 17 vertices. The configurations uniquely give regular linear spaces with parameters (15|2^45,3^20), (15|2^30,3^25), and (16|2^24,3^32). All files are compressed with gzip.
The dataset supplements the publication "On the Regular Linear Spaces up to Order 16" by Anton Betten, Dieter Betten, Daniel Heinlein, and Patric R. J. Östergård.
In the files containing configurations, each line is a configuration with the syntax
<number of points> <number of blocks> <characteristic vectors of the blocks in hexadecimal> A<order of the automorphism group>
Example:
Assuming a total of 15 points labeled with {0,...,14}, the characteristic vector of a block {1,3,14} is
(0)100|0000|0000|1010
The first bit is padding as each hexadecimal number encodes four bits. Vertical bars designate groups of four bits. Consequently, the block is encoded as
400a
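The worked example can be checked with a short sketch that decodes a hexadecimal characteristic vector back into a point set; following the example above, the least-significant bit corresponds to point 0 and the topmost bit is padding.

```python
# Decode a block's hexadecimal characteristic vector into a set of points.
# Assumes (per the worked example) that bit p encodes point p, with the
# most-significant bit used as padding.
def decode_block(hex_vector, n_points):
    value = int(hex_vector, 16)
    return {p for p in range(n_points) if value >> p & 1}
```

With 15 points, decode_block("400a", 15) recovers the block {1, 3, 14}.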
The following example shows the first line of one of the files:
$ zcat conf_15_4_20_3.txt.gz | head -n1
15 20 1081 4101 2201 0c01 0026 004a 0092 4402 008c 0054 0a04 0038 2108 1110 0160 0620 08c0 5200 3400 6800 A1
For the files containing graphs, we apply the graph6 file format but we extend each line by the corresponding number of automorphisms as described for configurations above, without the letter A. Programs for manipulating graphs in the graph6 format can be found in the gtools package that comes with the graph isomorphism program nauty (https://pallini.di.uniroma1.it/). Details regarding the graph6 format can be found in the documentation of nauty (https://pallini.di.uniroma1.it/Guide.html).
For graphs with at most 62 vertices, which holds in all cases here, a line in graph6 format is the ASCII-converted equivalent of the number of vertices plus 63, followed by the bits of the upper triangle of the adjacency matrix read column-wise, padded with zeros to a multiple of six, split into six-bit groups, and encoded by adding 63 to each group.
Example:
Assume a graph with 5 vertices and edges: 02, 04, 13, 34 (the path 2-0-4-3-1), which has the adjacency matrix
00101
00010
10000
01001
10010
Hence, the upper triangle read column-wise is
0100101001
After padding we get
010010100100
and after grouping
010010|100100
Converting to decimal and adding 63 gives
63+16+2|63+32+4
that is
81|99
The number of vertices is 5, so we prepend 5+63=68:
68 81 99
The line in graph6 format is therefore
DQc
and our nonstandard appending of the order of the automorphism group gives
DQc 2
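The worked example above can be reproduced with a short sketch of the graph6 encoding for graphs with at most 62 vertices; this is an illustration, not nauty's implementation.

```python
# Encode a small graph (n <= 62 vertices) in graph6 format:
# chr(n + 63), then the upper triangle of the adjacency matrix read
# column-wise, zero-padded to a multiple of six bits, six bits per
# character, each character offset by 63.
def graph6_encode(n, edges):
    es = {frozenset(e) for e in edges}
    bits = []
    for j in range(1, n):           # columns of the upper triangle
        for i in range(j):          # rows above the diagonal
            bits.append(1 if frozenset((i, j)) in es else 0)
    while len(bits) % 6:            # pad to a multiple of six
        bits.append(0)
    out = chr(n + 63)
    for k in range(0, len(bits), 6):
        group = int("".join(map(str, bits[k:k + 6])), 2)
        out += chr(group + 63)
    return out
```

Encoding the 5-vertex path with edges 02, 04, 13, 34 yields "DQc", as in the example.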
The first line of one of the files is as follows:
$ zcat graph_15_4.txt.gz | head -n1
Ns_???BAwjDoTOY_M_? 2
The orders of the automorphism groups and the numbers of isomorphism classes are as follows. The (up to isomorphism) 114711393113 (16_6,32_3) regular linear spaces with no nontrivial automorphisms are not stored.
Order | (15_4,20_3) | (15_5,25_3) | (16_6,32_3) |
---|---|---|---|
1 | 251712191 | 1442354689 | 114711393113 |
2 | 94229 | 180367 | 1125379 |
3 | 1129 | 2178 | 17287 |
4 | 915 | 936 | 3054 |
5 | 29 | 33 | |
6 | 142 | 180 | 240 |
8 | 85 | 36 | 50 |
9 | 4 | ||
10 | 4 | 4 | |
12 | 10 | 13 | 30 |
15 | 1 | ||
16 | 7 | 3 | |
18 | 4 | 3 | 2 |
20 | 2 | 2 | |
24 | 10 | 5 | 2 |
30 | 1 | ||
32 | 1 | ||
36 | 4 | 2 | |
40 | 2 | 1 | |
48 | 4 | 1 | |
72 | 1 | ||
96 | 1 | ||
120 | 1 | ||
600 | 1 | ||
720 | 1 | ||
total | 251808770 | 1442538454 | 114712539165 |
Order | 4-regular graphs with 15 vertices | 6-regular graphs with 15 vertices | 3-regular graphs with 16 vertices | 4-regular graphs with 17 vertices |
---|---|---|---|---|
1 | 656794 | 1396131168 | 1547 | 76356249 |
2 | 119881 | 69928313 | 1261 | 8665624 |
3 | 17 | 630 | 2 | 127 |
4 | 21500 | 3848635 | 667 | 997704 |
5 | 14 | |||
6 | 409 | 55060 | 15 | 27213 |
8 | 4789 | 274294 | 330 | 131662 |
10 | 10 | 35 | ||
12 | 352 | 21334 | 11 | 12577 |
14 | 4 | |||
16 | 1020 | 23435 | 147 | 19786 |
18 | 1 | 10 | 2 | |
20 | 7 | 12 | ||
24 | 210 | 5596 | 11 | 4344 |
28 | 18 | |||
30 | 4 | 7 | ||
32 | 243 | 2463 | 51 | 3320 |
34 | 3 | |||
36 | 1 | 128 | 53 | |
48 | 106 | 1453 | 33 | 1500 |
56 | 1 | 15 | ||
60 | 2 | 2 | ||
64 | 54 | 285 | 16 | 639 |
68 | 1 | |||
72 | 6 | 165 | 2 | 96 |
96 | 41 | 309 | 24 | 504 |
112 | 7 | |||
120 | 5 | 692 | ||
128 | 10 | 48 | 4 | 132 |
140 | 1 | |||
144 | 10 | 74 | 3 | 82 |
168 | 1 | 1 | ||
192 | 14 | 77 | 20 | 193 |
216 | 2 | 3 | ||
224 | 2 | 6 | ||
240 | 18 | 1 | 2 | 497 |
256 | 1 | 6 | 1 | 24 |
280 | 1 | |||
288 | 5 | 36 | 9 | 53 |
320 | 4 | |||
384 | 6 | 26 | 11 | 58 |
432 | 9 | 3 | 2 | |
448 |
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Categorical scatterplots with R for biologists: a step-by-step guide
Benjamin Petre1, Aurore Coince2, Sophien Kamoun1
1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK
Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.
Protocol
• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown on the PowerPoint slide. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed are indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.
• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown on the PowerPoint slide and paste it into the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.
• Step 3: save the graph as a .pdf file. Shape the window at your convenience and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
Notes
• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, Packages & Data -> Package Installer -> enter ‘ggplot2’ in the Package Search space and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.
• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()
References
Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.
Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035.
Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of two parts: (1) Patterns of nutrient release from carbon sources of wetland plants. After the experiment began, water samples were collected at the same intervals; the original and average concentrations of TOC and TN in each sample were measured and recorded, and line charts were drawn. (2) Data on the influence of carbon source materials on the nitrogen removal performance of Argento, Canna, and corncob. From December 8 to April 27, 2019, water samples from each treatment were collected at the same times; the original concentrations, average concentrations, carbon source utilization rates, and nitrogen removal efficiencies of TOC, NO3--N, NH4+-N, and TN were measured and recorded, and line charts were drawn.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
OAGT is a paper topic dataset consisting of 6,942,930 records which comprise various scientific publication attributes like abstracts, titles, keywords, publication years, venues, etc. The last two fields of each record are the topic id from a taxonomy of 27 topics created from the entire collection and the 20 most significant topic words. Each dataset record (sample) is stored as a JSON line in the text file.
The data is derived from the OAG data collection (https://aminer.org/open-academic-graph), which was released under the ODC-BY license.
This data (OAGT Paper Topic Dataset) is released under CC-BY license (https://creativecommons.org/licenses/by/4.0/).
If using it, please cite the following paper:
Erion Çano, Benjamin Roth: Topic Segmentation of Research Article Collections. ArXiv 2022, CoRR abs/2205.11249, https://doi.org/10.48550/arXiv.2205.11249
A Snellen chart is an eye chart that can be used to measure visual acuity. The Snellen chart is printed with eleven lines of block letters. The first line consists of one very large letter, which may be one of several letters, for example E, H, or N. Subsequent rows have increasing numbers of letters that decrease in size. A person taking the test covers one eye from 6 metres/20 feet away and reads aloud the letters of each row, beginning at the top. The smallest row that can be read accurately indicates the visual acuity of that eye. In the NTR, the Snellen test was administered at the MRI scanner.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Transparency in data visualization is an essential ingredient for scientific communication. The traditional approach of visualizing continuous quantitative data solely in the form of summary statistics (i.e., measures of central tendency and dispersion) has repeatedly been criticized for not revealing the underlying raw data distribution. Remarkably, however, systematic and easy-to-use solutions for raw data visualization using the most commonly reported statistical software package for data analysis, IBM SPSS Statistics, are missing. Here, a comprehensive collection of more than 100 SPSS syntax files and an SPSS dataset template is presented and made freely available that allow the creation of transparent graphs for one-sample designs, for one- and two-factorial between-subject designs, for selected one- and two-factorial within-subject designs as well as for selected two-factorial mixed designs and, with some creativity, even beyond (e.g., three-factorial mixed-designs). Depending on graph type (e.g., pure dot plot, box plot, and line plot), raw data can be displayed along with standard measures of central tendency (arithmetic mean and median) and dispersion (95% CI and SD). The free-to-use syntax can also be modified to match with individual needs. A variety of example applications of syntax are illustrated in a tutorial-like fashion along with fictitious datasets accompanying this contribution. The syntax collection is hoped to provide researchers, students, teachers, and others working with SPSS a valuable tool to move towards more transparency in data visualization.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Business process event data modeled as labeled property graphs
Data Format
-----------
The dataset comprises one labeled property graph in two different file formats.
#1) Neo4j .dump format
A neo4j (https://neo4j.com) database dump that contains the entire graph and can be imported into a fresh neo4j database instance using the following command (see also the neo4j documentation: https://neo4j.com/docs/):
/bin/neo4j-admin.(bat|sh) load --database=graph.db --from=
The .dump was created with Neo4j v3.5.
#2) .graphml format
A .zip file containing a .graphml file of the entire graph
Data Schema
-----------
The graph is a labeled property graph over business process event data. Each graph uses the following concepts
:Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute "Activity" that occurred at the given "timestamp"
:Entity nodes - each entity node describes an entity (e.g., an object or a user), it has an EntityType and an identifier (attribute "ID")
:Log nodes - describes a collection of events that were recorded together, most graphs only contain one log node
:Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed, :Class nodes group events into sets of identical observations
:CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities
:DF relationships - "directly-followed by" between two :Event nodes describes which event is directly-followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.
:HAS relationship - from a :Log to an :Event node, describes which events had been recorded in which event log
:OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph
:REL relationship - placeholder for any structural relationship between two :Entity nodes
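The interplay of :CORR and :DF above can be illustrated in plain Python (this is an in-memory sketch, not the Cypher used to build the actual graph): for each entity, order the events correlated to it by timestamp, and link each event to its direct successor.

```python
from collections import defaultdict

# Sketch: derive directly-follows (:DF) pairs per entity from events
# correlated (:CORR) to that entity. Each event is represented here as
# a tuple (event_id, timestamp, set_of_entity_ids).
def directly_follows(events):
    per_entity = defaultdict(list)
    for eid, ts, entities in events:
        for ent in entities:
            per_entity[ent].append((ts, eid))
    df = set()
    for ent, evs in per_entity.items():
        evs.sort()  # order this entity's events by timestamp
        for (_, a), (_, b) in zip(evs, evs[1:]):
            df.add((a, b, ent))  # a is directly followed by b for ent
    return df
```

Note that, as in the schema above, both events of a pair are correlated to the same entity, and the resulting relation is acyclic when timestamps are strictly increasing.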
The concepts are further defined in Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020) https://arxiv.org/abs/2005.14552
Data Contents
-------------
neo4j-bpic19-2021-02-17 (.dump|.graphml.zip)
An integrated graph describing the raw event data of the entire BPI Challenge 2019 dataset.
van Dongen, B.F. (Boudewijn) (2019): BPI Challenge 2019. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:d06aff4b-79f0-45e6-8ec8-e19730c248f1
This data originated from a large multinational company operating from The Netherlands in the area of coatings and paints, and we ask participants to investigate the purchase order handling process for some of its 60 subsidiaries. In particular, the process owner has compliance questions. In the data, each purchase order (or purchase document) contains one or more line items. For each line item, there are roughly four types of flows in the data: (1) 3-way matching, invoice after goods receipt: For these items, the value of the goods receipt message should be matched against the value of an invoice receipt message and the value put during creation of the item (indicated by both the GR-based flag and the Goods Receipt flag set to true). (2) 3-way matching, invoice before goods receipt: Purchase items that do require a goods receipt message, while they do not require GR-based invoicing (indicated by the GR-based IV flag set to false and the Goods Receipt flag set to true). For such purchase items, invoices can be entered before the goods are received, but they are blocked until the goods are received. This unblocking can be done by a user, or by a batch process at regular intervals. Invoices should only be cleared if goods are received and the value matches the invoice and the value at creation of the item. (3) 2-way matching (no goods receipt needed): For these items, the value of the invoice should match the value at creation (in full or partially until the PO value is consumed), but there is no separate goods receipt message required (indicated by both the GR-based flag and the Goods Receipt flag set to false). (4) Consignment: For these items, there are no invoices on PO level as this is handled fully in a separate process. Here the GR flag is set to true but the GR IV flag is set to false, and we also know by the item type (consignment) that we do not expect an invoice against this item.
Unfortunately, the complexity of the data goes further than just this division into four categories. For each purchase item there can be many goods receipt messages and corresponding invoices, which are subsequently paid. Consider, for example, the process of paying rent: there is a purchase document with one item for paying rent, but a total of 12 goods receipt messages with (cleared) invoices, each with a value equal to 1/12 of the total amount. For logistical services there may even be hundreds of goods receipt messages for one line item. Overall, for each line item, the amounts of the line item, the goods receipt messages (if applicable) and the invoices have to match for the process to be compliant. Of course, the log is anonymized, but some semantics are left in the data. For example: the resources are split between batch users and normal users, indicated by their names. The batch users are automated processes executed by different systems; the normal users refer to human actors in the process. The monetary values of each event are anonymized from the original data using a linear translation respecting 0, i.e. the addition of multiple invoices for a single item should still lead to the original item value (although there may be small rounding errors for numerical reasons). Company, vendor, system and document names and IDs are anonymized in a consistent way throughout the log. The company has the key, so any result can be translated by them into business insights about real customers and real purchase documents.
The case ID is a combination of the purchase document and the purchase item. There is a total of 76,349 purchase documents containing in total 251,734 items, i.e. there are 251,734 cases. In these cases, there are 1,595,923 events relating to 42 activities performed by 627 users (607 human users and 20 batch users). Sometimes the user field is empty, or NONE, which indicates no user was recorded in the source system. For each purchase item (or case) the following attributes are recorded:
- concept:name: a combination of the purchase document ID and the item ID
- Purchasing Document: the purchasing document ID
- Item: the item ID
- Item Type: the type of the item
- GR-Based Inv. Verif.: flag indicating if GR-based invoicing is required (see above)
- Goods Receipt: flag indicating if 3-way matching is required (see above)
- Source: the source system of this item
- Doc. Category name: the name of the category of the purchasing document
- Company: the subsidiary of the company from where the purchase originated
- Spend classification text: a text explaining the class of purchase item
- Spend area text: a text explaining the area for the purchase item
- Sub spend area text: another text explaining the area for the purchase item
- Vendor: the vendor to which the purchase document was sent
- Name: the name of the vendor
- Document Type: the document type
- Item Category: the category as explained above (3-way with GR-based invoicing, 3-way without, 2-way, consignment)
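Since concept:name is described as a combination of the purchase document ID and the item ID, distinct cases can be counted by building that combined key. A minimal sketch; the underscore separator and the sample IDs are assumptions for illustration, not taken from the log:

```python
def case_id(purchasing_document: str, item: str) -> str:
    # concept:name combines the purchase document ID and the item ID;
    # the underscore separator here is an assumption for illustration.
    return f"{purchasing_document}_{item}"

# Counting distinct cases from a stream of (document, item) event pairs:
events = [("4507000001", "00010"), ("4507000001", "00010"), ("4507000001", "00020")]
cases = {case_id(doc, item) for doc, item in events}
print(len(cases))  # 2 distinct cases for 3 events
```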
The data contains the following entities and their events:
- PO - Purchase Order documents handled at a large multinational company operating from The Netherlands
- POItem - an item in a Purchase Order document describing a specific item to be purchased
- Resource - the user or worker handling the document or a specific item
- Vendor - the external organization from which an item is to be purchased
Data Size
---------
BPIC19, nodes: 1926651, relationships: 15082099
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years, up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, driven by increased demand during the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.
Storage capacity is also growing. Only a small percentage of this newly created data is kept, though: just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
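The compound annual growth rate referenced above follows the standard formula CAGR = (end/start)^(1/years) - 1. A small sketch with purely hypothetical values, since the actual zettabyte figures are redacted above:

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1

# Hypothetical example: capacity doubling over a 5-year forecast period.
print(round(cagr(1.0, 2.0, 5) * 100, 1))  # ~14.9 percent per year
```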
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/3.0/customlicense?persistentId=doi:10.7910/DVN/P0RROU
The Value Line Investment Survey is one of the oldest continuously running investment advisory publications. Since 1955, the Survey has been published in multiple formats including print, loose-leaf, microfilm and microfiche. Data from 1997 to present is now available online. The Survey tracks 1700 stocks across 92 industry groups. It provides reported and projected measures of firm performance, proprietary rankings and analysis for each stock on a quarterly basis.
DATA AVAILABLE FOR YEARS: 1980-1989
This dataset, a subset of the Survey covering the years 1980-1989, has been digitized from the microfiche collection available at the Dewey Library (FICHE HG 4501.V26). It is only available to MIT students and faculty for academic research. Published weekly, each edition of the Survey has the following three parts:
Summary & Index: includes an alphabetical listing of all industries with their relative ranking and the page number for detailed industry analysis. It also includes an alphabetical listing of all stocks in the publication with references to their location in Part 3, Ratings & Reports.
Selection & Opinion: contains the latest economic and stock market commentary and advice, along with one or more pages of research on interesting stocks or industries and a variety of pertinent economic and stock market statistics. It also includes three model stock portfolios.
Ratings & Reports: the core of the Value Line Investment Survey. Preceded by an industry report, each one-page stock report within that industry includes Timeliness, Safety and Technical rankings, 3- to 5-year analyst forecasts for stock prices, income and balance sheet items, up to 17 years of historical data, and Value Line analysts' commentaries. The report also contains stock price charts, quarterly sales, earnings, and dividend information.
Publication Schedule: Each edition of the Survey covers around 130 stocks in seven to eight industries on a preset sequential schedule, so that all 1700 stocks are analyzed once every 13 weeks, i.e. each quarter. All editions are numbered 1-13 within each quarter. For example, in 1980, reports for Chrysler appear in edition 1 of each quarter on the following dates: January 4, 1980 (page 132); April 4, 1980 (page 133); July 4, 1980 (page 133); October 1, 1980 (page 133). Reports for Coca-Cola were published in edition 10 of each quarter on: March 7, 1980 (page 1514); June 6, 1980 (page 1518); Sept. 5, 1980 (page 1517); Dec. 5, 1980 (page 1548). Any significant news affecting a stock between quarters is covered in the supplementary reports that appear at the end of Part 3, Ratings & Reports.
File format: Digitized files within this dataset are in PDF format and are arranged by publication date within each compressed annual folder.
How to Consult the Value Line Investment Survey: To find reports on a particular stock, consult the alphabetical listing of stocks in the Summary & Index part of the relevant weekly edition. Look for the page number just to the left of the company name, then use the table below to identify the edition where that page number appears. All editions within a given quarter are numbered 1-13 and follow equally sized page ranges for stock reports. The table provides page ranges for stock reports within editions 1-13 of 1980 Q1; it can be used to identify edition and page numbers for any quarter within a given year.
Ratings & Reports
Edition  Pub. Date  Pages
1        04-Jan-80  100-242
2        11-Jan-80  250-392
3        18-Jan-80  400-542
4        25-Jan-80  550-692
5        01-Feb-80  700-842
6        08-Feb-80  850-992
7        15-Feb-80  1000-1142
8        22-Feb-80  1150-1292
9        29-Feb-80  1300-1442
10       07-Mar-80  1450-1592
11       14-Mar-80  1600-1742
12       21-Mar-80  1750-1908
13       28-Mar-80  2000-2142
Another way to navigate to the Ratings & Reports part of an edition is to look around page 50 within the PDF document.
Note that the page numbers of the PDF will not match those within the publication.
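The lookup described above can be automated. A sketch that encodes the 1980 Q1 page-range table verbatim and returns the edition number for a given Ratings & Reports page:

```python
# Page ranges for editions 1-13 of 1980 Q1, copied from the table above.
EDITION_PAGES = [
    (1, 100, 242), (2, 250, 392), (3, 400, 542), (4, 550, 692),
    (5, 700, 842), (6, 850, 992), (7, 1000, 1142), (8, 1150, 1292),
    (9, 1300, 1442), (10, 1450, 1592), (11, 1600, 1742),
    (12, 1750, 1908), (13, 2000, 2142),
]

def edition_for_page(page: int) -> int:
    """Return the edition (1-13) whose stock-report page range contains `page`."""
    for edition, first, last in EDITION_PAGES:
        if first <= page <= last:
            return edition
    raise ValueError(f"page {page} is outside all edition ranges")

print(edition_for_page(132))   # 1  (Chrysler, January 4, 1980)
print(edition_for_page(1514))  # 10 (Coca-Cola, March 7, 1980)
```

An explicit table lookup is used rather than an arithmetic formula, because the ranges are not perfectly uniform (edition 12 ends at page 1908 and edition 13 starts at 2000).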
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data set is a collection of environmental records associated with individual events. It has been generated using the serdif-api wrapper (https://github.com/navarral/serdif-api) when sending a CSV file with example events for the Republic of Ireland. The serdif-api sends a semantic query that (i) selects the environmental data sets within the region of the event, (ii) filters by the specific period of interest from the event, and (iii) aggregates the data sets using the minimum, maximum, average or sum for each of the available variables for a specific time unit. The aggregation method and the time unit can be passed to the serdif-api through the Command Line Interface (CLI) (see example in https://github.com/navarral/serdif-api). The resulting data set format can also be specified as a data table (CSV) or as a graph (RDF) for analysis and publication as FAIR data. The research-ready data is retrieved as a zip file that contains:
- data as csv: environmental data associated to particular events as a data table
- data as rdf: environmental data associated to particular events as a graph
- metadata for publication as rdf: a metadata record with generalized information about the data that does not contain personal data, and is therefore publishable
- metadata for research as rdf: metadata records with detailed information about the data, such as individual dates, regions, data sets used and data lineage, which could lead to data privacy issues if published without approval from the Data Protection Officer (DPO) and data controller
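A minimal sketch of opening the returned zip and reading the CSV member with the Python standard library. The member filename used here is an assumption for illustration; the actual names inside the archive may differ:

```python
import csv
import io
import zipfile

def read_event_env_csv(zip_path: str, member: str = "dataAsCSV.csv"):
    """Yield rows (as dicts) from a CSV member inside the retrieved zip.
    The default member name is hypothetical, not from the serdif docs."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as fh:
            reader = csv.DictReader(io.TextIOWrapper(fh, encoding="utf-8"))
            yield from reader
```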
This dataset contains ether transfer as well as popular ERC20 token transfer transactions extracted from the Ethereum Mainnet blockchain.
Only send-ether, contract function call, and contract deployment transactions are present in the dataset. Miner reward transactions are not currently included.
Details of the datasets are given below:
FILENAME FORMAT:
The filenames have the following format:
eth-tx-<first-block>-<last-block>.txt.bz2
where <first-block> and <last-block> are the first and last block numbers covered by the file.
For example file eth-tx-1000000-1099999.txt.bz2 contains transactions from
block 1000000 to block 1099999 inclusive.
The files are compressed with bzip2. They can be uncompressed using command bunzip2.
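The compressed files can also be read directly, without first uncompressing them to disk. A sketch using Python's built-in bz2 module:

```python
import bz2

def read_transactions(path: str):
    """Yield one transaction line at a time from an eth-tx-*.txt.bz2 file,
    decompressing on the fly."""
    with bz2.open(path, mode="rt", encoding="utf-8") as fh:
        for line in fh:
            yield line.rstrip("\n")
```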
TRANSACTION FORMAT:
Each line in a file corresponds to a transaction. The transaction has the following format:
units. ERC20 token transfers (transfer and transferFrom function calls in ERC20
contracts) are indicated by the token symbol. For example, GUSD is the Gemini USD
stablecoin. The JSON file erc20tokens.json given below contains the details of these ERC20 tokens.
decoder-error.txt FILE:
This file contains, one per line, the transactions (block no, tx no, tx hash) that produced
an error while decoding calldata. These transactions are not present in the data files.
erc20tokens.json FILE:
This file contains the list of popular ERC20 token contracts whose transfer/transferFrom
transactions appear in the data files.
-------------------------------------------------------------------------------------------
[
{
"address": "0xdac17f958d2ee523a2206206994597c13d831ec7",
"decdigits": 6,
"symbol": "USDT",
"name": "Tether-USD"
},
{
"address": "0xB8c77482e45F1F44dE1745F52C74426C631bDD52",
"decdigits": 18,
"symbol": "BNB",
"name": "Binance"
},
{
"address": "0x2af5d2ad76741191d15dfe7bf6ac92d4bd912ca3",
"decdigits": 18,
"symbol": "LEO",
"name": "Bitfinex-LEO"
},
{
"address": "0x514910771af9ca656af840dff83e8264ecf986ca",
"decdigits": 18,
"symbol": "LNK",
"name": "Chainlink"
},
{
"address": "0x6f259637dcd74c767781e37bc6133cd6a68aa161",
"decdigits": 18,
"symbol": "HT",
"name": "HuobiToken"
},
{
"address": "0xf1290473e210b2108a85237fbcd7b6eb42cc654f",
"decdigits": 18,
"symbol": "HEDG",
"name": "HedgeTrade"
},
{
"address": "0x9f8f72aa9304c8b593d555f12ef6589cc3a579a2",
"decdigits": 18,
"symbol": "MKR",
"name": "Maker"
},
{
"address": "0xa0b73e1ff0b80914ab6fe0444e65848c4c34450b",
"decdigits": 8,
"symbol": "CRO",
"name": "Crypto.com"
},
{
"address": "0xd850942ef8811f2a866692a623011bde52a462c1",
"decdigits": 18,
"symbol": "VEN",
"name": "VeChain"
},
{
"address": "0x0d8775f648430679a709e98d2b0cb6250d2887ef",
"decdigits": 18,
"symbol": "BAT",
"name": "Basic-Attention"
},
{
"address": "0xc9859fccc876e6b4b3c749c5d29ea04f48acb74f",
"decdigits": 0,
"symbol": "INO",
"name": "INO-Coin"
},
{
"address": "0x8e870d67f660d95d5be530380d0ec0bd388289e1",
"decdigits": 18,
"symbol": "PAX",
"name": "Paxos-Standard"
},
{
"address": "0x17aa18a4b64a55abed7fa543f2ba4e91f2dce482",
"decdigits": 18,
"symbol": "INB",
"name": "Insight-Chain"
},
{
"address": "0xc011a72400e58ecd99ee497cf89e3775d4bd732f",
"decdigits": 18,
"symbol": "SNX",
"name": "Synthetix-Network"
},
{
"address": "0x1985365e9f78359a9B6AD760e32412f4a445E862",
"decdigits": 18,
"symbol": "REP",
"name": "Reputation"
},
{
"address": "0x653430560be843c4a3d143d0110e896c2ab8ac0d",
"decdigits": 16,
"symbol": "MOF",
"name": "Molecular-Future"
},
{
"address": "0x0000000000085d4780B73119b644AE5ecd22b376",
"decdigits": 18,
"symbol": "TUSD",
"name": "True-USD"
},
{
"address": "0xe41d2489571d322189246dafa5ebde1f4699f498",
"decdigits": 18,
"symbol": "ZRX",
"name": "ZRX"
},
{
"address": "0x8ce9137d39326ad0cd6491fb5cc0cba0e089b6a9",
"decdigits": 18,
"symbol": "SXP",
"name": "Swipe"
},
{
"address": "0x75231f58b43240c9718dd58b4967c5114342a86c",
"decdigits": 18,
"symbol": "OKB",
"name": "Okex"
},
{
"address": "0xa974c709cfb4566686553a20790685a47aceaa33",
"decdigits": 18,
"symbol": "XIN",
"name": "Mixin"
},
{
"address": "0xd26114cd6EE289AccF82350c8d8487fedB8A0C07",
"decdigits": 18,
"symbol": "OMG",
"name": "OmiseGO"
},
{
"address": "0x89d24a6b4ccb1b6faa2625fe562bdd9a23260359",
"decdigits": 18,
"symbol": "SAI",
"name": "Sai Stablecoin v1.0"
},
{
"address": "0x6c6ee5e31d828de241282b9606c8e98ea48526e2",
"decdigits": 18,
"symbol": "HOT",
"name": "HoloToken"
},
{
"address": "0x6b175474e89094c44da98b954eedeac495271d0f",
"decdigits": 18,
"symbol": "DAI",
"name": "Dai Stablecoin"
},
{
"address": "0xdb25f211ab05b1c97d595516f45794528a807ad8",
"decdigits": 2,
"symbol": "EURS",
"name": "Statis-EURS"
},
{
"address": "0xa66daa57432024023db65477ba87d4e7f5f95213",
"decdigits": 18,
"symbol": "HPT",
"name": "HuobiPoolToken"
},
{
"address": "0x4fabb145d64652a948d72533023f6e7a623c7c53",
"decdigits": 18,
"symbol": "BUSD",
"name": "Binance-USD"
},
{
"address": "0x056fd409e1d7a124bd7017459dfea2f387b6d5cd",
"decdigits": 2,
"symbol": "GUSD",
"name": "Gemini-USD"
},
{
"address": "0x2c537e5624e4af88a7ae4060c022609376c8d0eb",
"decdigits": 6,
"symbol": "TRYB",
"name": "BiLira"
},
{
"address": "0x4922a015c4407f87432b179bb209e125432e4a2a",
"decdigits": 6,
"symbol": "XAUT",
"name": "Tether-Gold"
},
{
"address": "0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48",
"decdigits": 6,
"symbol": "USDC",
"name": "USD-Coin"
},
{
"address": "0xa5b55e6448197db434b92a0595389562513336ff",
"decdigits": 16,
"symbol": "SUSD",
"name": "Santender"
},
{
"address": "0xffe8196bc259e8dedc544d935786aa4709ec3e64",
"decdigits": 18,
"symbol": "HDG",
"name": "HedgeTrade"
},
{
"address": "0x4a16baf414b8e637ed12019fad5dd705735db2e0",
"decdigits": 2,
"symbol": "QCAD",
"name": "QCAD"
}
]
-------------------------------------------------------------------------------------------
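The decdigits field above gives the number of decimal places by which raw on-chain integer amounts must be scaled to obtain human-readable token units. A sketch converting raw amounts with Python's Decimal; the two entries used are copied from the erc20tokens.json listing above:

```python
import json
from decimal import Decimal

# Two entries copied from erc20tokens.json above, for illustration.
TOKENS = json.loads("""[
  {"address": "0xdac17f958d2ee523a2206206994597c13d831ec7",
   "decdigits": 6, "symbol": "USDT", "name": "Tether-USD"},
  {"address": "0x056fd409e1d7a124bd7017459dfea2f387b6d5cd",
   "decdigits": 2, "symbol": "GUSD", "name": "Gemini-USD"}
]""")
DECDIGITS = {t["symbol"]: t["decdigits"] for t in TOKENS}

def to_token_units(raw_amount: int, symbol: str) -> Decimal:
    """Scale a raw integer transfer amount by 10**decdigits."""
    return Decimal(raw_amount) / (Decimal(10) ** DECDIGITS[symbol])

print(to_token_units(1_500_000, "USDT"))  # 1.5
print(to_token_units(250, "GUSD"))        # 2.5
```

Decimal is used instead of float so that monetary amounts are represented exactly.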