34 datasets found
  1. Data from: United States Geological Survey Digital Cartographic Data...

    • datasearch.gesis.org
    • icpsr.umich.edu
    v1
    Updated Aug 5, 2015
    Cite
    United States Department of the Interior. United States Geological Survey (2015). United States Geological Survey Digital Cartographic Data Standards: Digital Line Graphs from 1:2,000,000-Scale Maps [Dataset]. http://doi.org/10.3886/ICPSR08379.v1
    Dataset provided by
    da|ra (Registration agency for social science and economic data)
    Authors
    United States Department of the Interior. United States Geological Survey
    Description

    This dataset consists of cartographic data in digital line graph (DLG) form for the northeastern states (Connecticut, Maine, Massachusetts, New Hampshire, New York, Rhode Island, and Vermont). Information is presented on two planimetric base categories, political boundaries and administrative boundaries, each available in two formats: a topologically structured format and a simpler format optimized for graphic display. These DLG data can be used to plot base maps and for various kinds of spatial analysis. They may also be combined with other geographically referenced data, such as the Geographic Names Information System, to facilitate analysis.

  2. Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm

    • plos.figshare.com
    docx
    Updated May 31, 2023
    Cite
    Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic (2023). Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm [Dataset]. http://doi.org/10.1371/journal.pbio.1002128
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.

  3. Example of an interactive line graph.

    • plos.figshare.com
    xml
    Updated May 31, 2023
    Cite
    Tracey L. Weissgerber; Vesna D. Garovic; Marko Savic; Stacey J. Winham; Natasa M. Milic (2023). Example of an interactive line graph. [Dataset]. http://doi.org/10.1371/journal.pbio.1002545.s001
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Tracey L. Weissgerber; Vesna D. Garovic; Marko Savic; Stacey J. Winham; Natasa M. Milic
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/

    Description

    This example can be viewed by uploading S1 Data into the web-based tool (http://statistika.mfub.bg.ac.rs/interactive-graph/). (XML)

  4. Time-Series Matrix (TSMx): A visualization tool for plotting multiscale...

    • dataverse.harvard.edu
    Updated Jul 8, 2024
    Cite
    Georgios Boumis; Brad Peter (2024). Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends [Dataset]. http://doi.org/10.7910/DVN/ZZDYM9
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset provided by
    Harvard Dataverse
    Authors
    Georgios Boumis; Brad Peter
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends

    TSMx is an R script that was developed to facilitate multi-temporal-scale visualizations of time-series data. The script requires only a two-column CSV of years and values to plot the slope of the linear regression line for all possible year combinations from the supplied temporal range. The outputs include a time-series matrix showing slope direction based on the linear regression, slope values plotted with colors indicating magnitude, and results of a Mann-Kendall test. The start year is indicated on the y-axis and the end year is indicated on the x-axis. In the example below, the cell in the top-right corner is the direction of the slope for the temporal range 2001–2019. The red line corresponds with the temporal range 2010–2019 and an arrow is drawn from the cell that represents that range. One cell is highlighted with a black border to demonstrate how to read the chart: that cell represents the slope for the temporal range 2004–2014. This publication entry also includes an Excel template that produces the same visualizations without a need to interact with any code, though minor modifications will need to be made to accommodate year ranges other than what is provided.

    TSMx for R was developed by Georgios Boumis; TSMx was originally conceptualized and created by Brad G. Peter in Microsoft Excel. Please refer to the associated publication: Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624. https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624

    TSMx sample chart from the supplied Excel template. Data represent the productivity of rice agriculture in Vietnam as measured via EVI (enhanced vegetation index) from the NASA MODIS data product (MOD13Q1.V006).

    TSMx R script:

    # import packages
    library(dplyr)
    library(readr)
    library(ggplot2)
    library(tibble)
    library(tidyr)
    library(forcats)
    library(Kendall)
    options(warn = -1) # disable warnings

    # read data (.csv file with "Year" and "Value" columns)
    data <- read_csv("EVI.csv")

    # prepare row/column names for output matrices
    years <- data %>% pull("Year")
    r.names <- years[-length(years)]
    c.names <- years[-1]
    years <- years[-length(years)]

    # initialize output matrices
    sign.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
    pval.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
    slope.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))

    # function to return remaining years given a start year
    getRemain <- function(start.year) {
      years <- data %>% pull("Year")
      start.ind <- which(data[["Year"]] == start.year) + 1
      remain <- years[start.ind:length(years)]
      return(remain)
    }

    # function to subset data for a start/end year combination
    splitData <- function(end.year, start.year) {
      keep <- which(data[['Year']] >= start.year & data[['Year']] <= end.year)
      batch <- data[keep,]
      return(batch)
    }

    # function to fit linear regression and return slope direction
    fitReg <- function(batch) {
      trend <- lm(Value ~ Year, data = batch)
      slope <- coefficients(trend)[[2]]
      return(sign(slope))
    }

    # function to fit linear regression and return slope magnitude
    fitRegv2 <- function(batch) {
      trend <- lm(Value ~ Year, data = batch)
      slope <- coefficients(trend)[[2]]
      return(slope)
    }

    # function to implement Mann-Kendall (MK) trend test and return significance
    # the test is implemented only for n >= 8
    getMann <- function(batch) {
      if (nrow(batch) >= 8) {
        mk <- MannKendall(batch[['Value']])
        pval <- mk[['sl']]
      } else {
        pval <- NA
      }
      return(pval)
    }

    # function to return slope direction for all combinations given a start year
    getSign <- function(start.year) {
      remaining <- getRemain(start.year)
      combs <- lapply(remaining, splitData, start.year = start.year)
      signs <- lapply(combs, fitReg)
      return(signs)
    }

    # function to return MK significance for all combinations given a start year
    getPval <- function(start.year) {
      remaining <- getRemain(start.year)
      combs <- lapply(remaining, splitData, start.year = start.year)
      pvals <- lapply(combs, getMann)
      return(pvals)
    }

    # function to return slope magnitude for all combinations given a start year
    getMagn <- function(start.year) {
      remaining <- getRemain(start.year)
      combs <- lapply(remaining, splitData, start.year = start.year)
      magns <- lapply(combs, fitRegv2)
      return(magns)
    }

    # retrieve slope direction, MK significance, and slope magnitude
    signs <- lapply(years, getSign)
    pvals <- lapply(years, getPval)
    magns <- lapply(years, getMagn)

    # fill in output matrices
    dimension <- nrow(sign.matrix)
    for (i in 1:dimension) {
      sign.matrix[i, i:dimension] <- unlist(signs[i])
      pval.matrix[i, i:dimension] <- unlist(pvals[i])
      slope.matrix[i, i:dimension] <- unlist(magns[i])
    }
    sign.matrix <-...

  5. Transaction Graph Dataset for the Bitcoin Blockchain - Part 2 of 4

    • data.niaid.nih.gov
    Updated Dec 14, 2022
    + more versions
    Cite
    Baran Kılıç (2022). Transaction Graph Dataset for the Bitcoin Blockchain - Part 2 of 4 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7157853
    Dataset provided by
    Alper Şen
    Baran Kılıç
    Can Özturan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/

    Description

    This dataset contains bitcoin transfer transactions extracted from the Bitcoin Mainnet blockchain.

    Part 1 is available at https://zenodo.org/deposit/7157356
    Part 3 is available at https://zenodo.org/deposit/7158133
    Part 4 is available at https://zenodo.org/deposit/7158328

    Details of the datasets are given below:

    FILENAME FORMAT:

    The filenames have the following format:

    btc-tx-<start_block>-<end_block>-<part>.bz2

    where <start_block> is the starting block number, <end_block> is the final block number, and <part> is the split part of the file.

    For example file btc-tx-100000-149999-aa.bz2 and the rest of the parts if any contain transactions from

    block 100000 to block 149999 inclusive.

    The files are compressed with bzip2. They can be uncompressed using the bunzip2 command.

    TRANSACTION FORMAT:

    Each line in a file corresponds to a transaction. The transaction has the following format:

    Type of transaction (i.e. BTC-IN or BTC-OUT).

    Number of the block which contains the transaction.

    Position of the transaction in the block (i.e. transaction number in the block).

    Source bitcoin address/transaction of the transfer.

    Destination bitcoin address/transaction of the transfer.

    Amount of transfer.
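The six fields above can be read into a small record. A minimal Python sketch follows; the listing does not show a sample line, so the whitespace separator and the field names here are assumptions:

```python
# Sketch: parse one transaction line, assuming whitespace-separated fields
# in the order listed above. Field names are illustrative, not official.
def parse_tx(line):
    tx_type, block, position, source, destination, amount = line.split()
    return {
        "type": tx_type,            # BTC-IN or BTC-OUT
        "block": int(block),        # number of the containing block
        "position": int(position),  # transaction number in the block
        "source": source,           # source address/transaction
        "destination": destination, # destination address/transaction
        "amount": amount,           # amount of transfer
    }

tx = parse_tx("BTC-OUT 100000 0 addrA addrB 50")
```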

    BLOCK TIME FORMAT:

    The block time file has the following format:

    Number of the block.

    Unix timestamp at which the block is mined as a hexadecimal number.
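Because the timestamp is stored as a hexadecimal number, a conversion step is needed before it can be read as a date. A sketch under the same whitespace-separation assumption:

```python
# Sketch: convert a block-time line (block number, hex Unix timestamp)
# into a block number and a UTC datetime. Whitespace separation is assumed.
from datetime import datetime, timezone

def block_time(line):
    block, hex_ts = line.split()
    ts = int(hex_ts, 16)  # hexadecimal -> seconds since the Unix epoch
    return int(block), datetime.fromtimestamp(ts, tz=timezone.utc)

# 0x386d4380 = 946684800 -> 2000-01-01T00:00:00Z (values are illustrative)
block, mined_at = block_time("100000 386d4380")
```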

    IMPORTANT NOTE:

    Public Bitcoin Mainnet blockchain data is open and can be obtained by connecting as a node on the blockchain or by using block explorer websites such as https://btcscan.org . Downloaders and users of this dataset accept full responsibility for using the data in a GDPR-compliant manner or under any other applicable regulations. We provide the data as is and cannot be held responsible for anything.

    NOTE:

    If you use this dataset, please do not forget to add the DOI number to the citation.

    If you use our dataset in your research, please also cite our paper: https://link.springer.com/chapter/10.1007/978-3-030-94590-9_14

    @incollection{kilicc2022analyzing,
      title={Analyzing Large-Scale Blockchain Transaction Graphs for Fraudulent Activities},
      author={K{\i}l{\i}{\c{c}}, Baran and {\"O}zturan, Can and {\c{S}}en, Alper},
      booktitle={Big Data and Artificial Intelligence in Digital Finance},
      pages={253--267},
      year={2022},
      publisher={Springer, Cham}
    }

  6. Data from: PDD Graph: Bridging Electronic Medical Records and Biomedical...

    • springernature.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Meng Wang; Jiaheng Zhang; Jun Liu; Wei Hu; Sen Wang; Xue Li; Wenqiang Liu (2023). PDD Graph: Bridging Electronic Medical Records and Biomedical Knowledge Graphs via Entity Linking [Dataset]. http://doi.org/10.6084/m9.figshare.5242138
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Meng Wang; Jiaheng Zhang; Jun Liu; Wei Hu; Sen Wang; Xue Li; Wenqiang Liu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/

    Description

    Patient-drug-disease (PDD) Graph dataset, utilising Electronic Medical Records (EMRs) and biomedical knowledge graphs. The novel framework used to construct the PDD graph is described in the associated publication.

    PDD is an RDF graph consisting of PDD facts, where a PDD fact is represented by an RDF triple indicating that a patient takes a drug or that a patient is diagnosed with a disease. For instance, (pdd:274671, pdd:diagnosed, sepsis).

    Data files are in the .nt N-Triples format, a line-based syntax for an RDF graph, and can be accessed via openly available text-editing software.

    diagnose_icd_information.nt - contains RDF triples mapping patients to diagnoses. For example: (pdd:18740, pdd:diagnosed, icd99592), where pdd:18740 is a patient entity and icd99592 is the ICD-9 code of sepsis.

    drug_patients.nt - contains RDF triples mapping patients to drugs. For example: (pdd:18740, pdd:prescribed, aspirin), where pdd:18740 is a patient entity and aspirin is the drug's name.

    Background: Electronic medical records contain multi-format electronic medical data that comprise an abundance of medical knowledge. Faced with patients' symptoms, experienced caregivers make the right medical decisions based on professional knowledge that accurately grasps the relationships between symptoms, diagnoses, and corresponding treatments. In the associated paper, we aim to capture these relationships by constructing a large, high-quality heterogeneous graph linking patients, diseases, and drugs (PDD) in EMRs. Specifically, we propose a novel framework to extract important medical entities from MIMIC-III (Medical Information Mart for Intensive Care III) and automatically link them with existing biomedical knowledge graphs, including the ICD-9 ontology and DrugBank. The PDD graph presented in this paper is accessible on the Web via a SPARQL endpoint as well as in .nt format in this repository, and provides a pathway for medical discovery and applications, such as effective treatment recommendations.

    De-identification: MIMIC-III contains clinical information about patients. Although the protected health information was de-identified, researchers who seek to use more clinical data should complete an online training course and then apply for permission to download the complete MIMIC-III dataset: https://mimic.physionet.org/
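A line-based syntax like N-Triples can be inspected without a full RDF library. A minimal Python sketch, where the IRIs are illustrative placeholders rather than the dataset's actual namespaces:

```python
# Sketch: split one N-Triples line "S P O ." into its three terms.
# The example IRIs below are hypothetical, not the dataset's real namespaces.
def parse_nt_line(line):
    line = line.strip()
    if not line or line.startswith("#"):
        return None          # skip blanks and comments
    assert line.endswith(".")
    # maxsplit=2 keeps a quoted literal object (which may contain spaces) whole
    subject, predicate, obj = line[:-1].strip().split(None, 2)
    return subject, predicate, obj

triple = parse_nt_line(
    "<http://example.org/pdd/18740> <http://example.org/pdd/diagnosed> "
    "<http://example.org/icd9/99592> ."
)
```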

  7. Code book of RTL visualization in Arabic News media

    • rdr.ucl.ac.uk
    xlsx
    Updated Jul 3, 2024
    + more versions
    Cite
    Muna Alebri; Noëlle Rakotondravony; Lane Harrison (2024). Code book of RTL visualization in Arabic News media [Dataset]. http://doi.org/10.5522/04/26150749.v1
    Dataset provided by
    University College London
    Authors
    Muna Alebri; Noëlle Rakotondravony; Lane Harrison
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    In this project, we aimed to map the design space of visualisations embedded in right-to-left (RTL) scripts, expanding our knowledge of visualisation design beyond the dominance of research based on left-to-right (LTR) scripts. Through this project, we identify common design practices regarding the chart structure, the text, and the source. We also identify ambiguity, particularly regarding axis position and direction, suggesting that the community may benefit from unified standards similar to those found in web design for RTL scripts. To achieve this goal, we curated a dataset covering 128 visualisations found in Arabic news media and coded these visualisations based on chart composition (e.g., chart type, x-axis direction, y-axis position, legend position, interaction, embellishment type), text (e.g., availability of text, availability of caption, annotation type), and source (source position, attribution to designer, ownership of the visualisation design). Links are also provided to the articles and the visualisations. This dataset is limited to stand-alone visualisations, whether single-panelled or including small multiples. We did not consider infographics in this project, nor any visualisation without an identifiable chart type (e.g., bar chart, line chart). The attached documents also include some graphs from our analysis of the dataset, illustrating common design patterns and their popularity within our sample.

  8. SynthChartNet

    • huggingface.co
    Updated Jul 31, 2025
    Cite
    Docling (2025). SynthChartNet [Dataset]. https://huggingface.co/datasets/ds4sd/SynthChartNet
    Dataset authored and provided by
    Docling
    License

    CDLA-Permissive-2.0: https://choosealicense.com/licenses/cdla-permissive-2.0/

    Description

    SynthChartNet

    SynthChartNet is a multimodal dataset designed for training the SmolDocling model on chart-based document understanding tasks. It consists of 1,981,157 synthetically generated samples, where each image depicts a chart (e.g., line chart, bar chart, pie chart, stacked bar chart), and the associated ground truth is given in OTSL format. Charts were rendered at 120 DPI using a diverse set of visualization libraries: Matplotlib, Seaborn, and Pyecharts, enabling… See the full description on the dataset page: https://huggingface.co/datasets/ds4sd/SynthChartNet.

  9. Chart Viewer

    • anla-esp-esri-co.hub.arcgis.com
    • city-of-lawrenceville-arcgis-hub-lville.hub.arcgis.com
    Updated Sep 22, 2021
    Cite
    esri_en (2021). Chart Viewer [Dataset]. https://anla-esp-esri-co.hub.arcgis.com/items/be4582b38d764de0a970b986c824acde
    Dataset provided by
    Esri (http://esri.com/)
    Authors
    esri_en
    Description

    Use the Chart Viewer template to display bar charts, line charts, pie charts, histograms, and scatterplots to complement a map. Include multiple charts to view with a map or side by side with other charts for comparison. Up to three charts can be viewed side by side or stacked, but you can access and view all the charts that are authored in the map.

    Examples:
    • Present a bar chart representing average property value by county for a given area.
    • Compare charts based on multiple population statistics in your dataset.
    • Display an interactive scatterplot based on two values in your dataset along with an essential set of map exploration tools.

    Data requirements: The Chart Viewer template requires a map with at least one chart configured.

    Key app capabilities:
    • Multiple layout options - choose Stack to display charts stacked with the map, or choose Side by side to display charts side by side with the map.
    • Manage chart - reorder, rename, or turn charts on and off in the app.
    • Multiselect chart - compare two charts in the panel at the same time.
    • Bookmarks - allow users to zoom and pan to a collection of preset extents that are saved in the map.
    • Home, Zoom controls, Legend, Layer List, Search

    Supportability: This web app is designed responsively to be used in browsers on desktops, mobile phones, and tablets. We are committed to ongoing efforts towards making our apps as accessible as possible. Please feel free to leave a comment on how we can improve the accessibility of our apps for those who use assistive technologies.

  10. Dataset for On the regular linear spaces up to order 16

    • zenodo.org
    application/gzip
    Updated Sep 6, 2023
    Cite
    Anton Betten; Dieter Betten; Daniel Heinlein; Patric R. J. Östergård (2023). Dataset for On the regular linear spaces up to order 16 [Dataset]. http://doi.org/10.5281/zenodo.7890664
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anton Betten; Dieter Betten; Daniel Heinlein; Patric R. J. Östergård
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains, up to isomorphism, all (15_4,20_3) and (15_5,25_3) configurations, all (16_6,32_3) configurations with nontrivial automorphisms, as well as all 4-regular graphs on 15 vertices, 6-regular graphs on 15 vertices, 3-regular graphs on 16 vertices, and 4-regular graphs on 17 vertices. The configurations uniquely give regular linear spaces with parameters (15|2^45,3^20), (15|2^30,3^25), and (16|2^24,3^32). All files are compressed with gzip.

    The dataset supplements the publication "On the Regular Linear Spaces up to Order 16" by Anton Betten, Dieter Betten, Daniel Heinlein, and Patric R. J. Östergård.

    In the files containing configurations, each line is a configuration with the syntax

    Example:
    Assuming a total of 15 points labeled with {0,...,14}, the characteristic vector of a block {1,3,14} is
    (0)100|0000|0000|1010
    The first bit is padding as each hexadecimal number encodes four bits. Vertical bars designate groups of four bits. Consequently, the block is encoded as
    400a
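One consistent reading of this encoding (bit p of a 16-bit number set for each point p in the block, printed as four hexadecimal digits, with the unused top bit as padding) can be sketched in Python; `encode_block` is an illustrative name, not part of the dataset:

```python
# Sketch: encode a block as its characteristic bit vector in hexadecimal.
# Assumption: bit p is set for point p; one spare high bit is padding.
def encode_block(block, n_hex_digits=4):
    value = 0
    for point in block:
        value |= 1 << point
    return format(value, "0{}x".format(n_hex_digits))

print(encode_block({1, 3, 14}))  # 400a, matching the example above
```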

    The following example shows the first line of one of the files:
    $ zcat conf_15_4_20_3.txt.gz | head -n1
    15 20 1081 4101 2201 0c01 0026 004a 0092 4402 008c 0054 0a04 0038 2108 1110 0160 0620 08c0 5200 3400 6800 A1

    For the files containing graphs, we apply the graph6 file format but we extend each line by the corresponding number of automorphisms as described for configurations above, without the letter A. Programs for manipulating graphs in the graph6 format can be found in the gtools package that comes with the graph isomorphism program nauty (https://pallini.di.uniroma1.it/). Details regarding the graph6 format can be found in the documentation of nauty (https://pallini.di.uniroma1.it/Guide.html).

    For graphs with at most 62 vertices, which holds in all cases here, a line in graph6 format is the ASCII-converted equivalent of

    Example:
    Assume a graph with 5 vertices and edges: 02, 04, 13, 34 (the path 2-0-4-3-1), which has the adjacency matrix
    00101
    00010
    10000
    01001
    10010
    Hence, the upper triangle read column-wise is
    0100101001
    After padding we get
    010010100100
    and after grouping
    010010|100100
    Converting to decimal and adding 63 gives
    63+16+2|63+32+4
    that is
    81|99
    The number of vertices is 5, so we prepend 5+63=68:
    68 81 99
    The line in graph6 format is therefore
    DQc
    and our nonstandard appending of the order of the automorphism group gives
    DQc 2
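The decoding walked through above can be sketched in Python for graphs on at most 62 vertices; this illustrates the format only and is not the gtools implementation (`graph6_edges` is an illustrative name):

```python
# Sketch: decode a short graph6 line (<= 62 vertices) into an edge list,
# reversing the steps above: subtract 63, take six bits per character, and
# read the upper triangle of the adjacency matrix column by column.
def graph6_edges(line):
    n = ord(line[0]) - 63                    # number of vertices
    bits = "".join(format(ord(ch) - 63, "06b") for ch in line[1:])
    edges, k = [], 0
    for col in range(1, n):                  # upper triangle, column-wise
        for row in range(col):
            if bits[k] == "1":
                edges.append((row, col))
            k += 1
    return edges

print(graph6_edges("DQc"))  # edges of the path 2-0-4-3-1 from the example
```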

    The first line of one of the files is as follows:
    $ zcat graph_15_4.txt.gz | head -n1
    Ns_???BAwjDoTOY_M_? 2

    The orders of the automorphism groups and the numbers of isomorphism classes are as follows. The (up to isomorphism) 114711393113 (16_6,32_3) regular linear spaces with no nontrivial automorphisms are not stored.

    order | (15_4,20_3) | (15_5,25_3) | (16_6,32_3)
    1 | 251712191 | 1442354689 | 114711393113
    2942291803671125379
    31129217817287
    49159363054
    52933
    6142180240
    8853650
    9 4
    1044
    12101330
    151
    167 3
    18432
    2022
    241052
    301
    32 1
    364 2
    4021
    484 1
    72 1
    96 1
    120 1
    600 1
    7201
    total | 251808770 | 1442538454 | 114712539165

    order | 4-regular graphs with 15 vertices | 6-regular graphs with 15 vertices | 3-regular graphs with 16 vertices | 4-regular graphs with 17 vertices
    16567941396131168154776356249
    21198816992831312618665624
    3176302127
    4215003848635667997704
    5 14
    6409550601527213
    84789274294330131662
    101035
    12352213341112577
    14 4
    1610202343514719786
    18110 2
    20712
    242105596114344
    28 18
    3047
    322432463513320
    34 3
    361128 53
    481061453331500
    561 15
    6022
    645428516639
    68 1
    726165296
    964130924504
    112 7
    1205 692
    12810484132
    140 1
    1441074382
    1681 1
    192147720193
    216 2 3
    2242 6
    2401812497
    25616124
    280 1
    288536953
    320 4
    3846261158
    432 932
    448

  11. Petre_Slide_CategoricalScatterplotFigShare.pptx

    • figshare.com
    pptx
    Updated Sep 19, 2016
    Cite
    Benj Petre; Aurore Coince; Sophien Kamoun (2016). Petre_Slide_CategoricalScatterplotFigShare.pptx [Dataset]. http://doi.org/10.6084/m9.figshare.3840102.v1
    Dataset provided by
    figshare
    Authors
    Benj Petre; Aurore Coince; Sophien Kamoun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/

    Description

    Categorical scatterplots with R for biologists: a step-by-step guide

    Benjamin Petre1, Aurore Coince2, Sophien Kamoun1

    1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK

    Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.

    Protocol

    • Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed are indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.

    • Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide and paste it into the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.

    • Step 3: save the graph as a .pdf file. Shape the window at your convenience and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.

    Notes

    • Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, Packages & Data -> Package Installer -> enter ‘ggplot2’ in the Package Search space and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.

    • Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.

    7 Display the graph in a separate window. Dot colors indicate replicates
    graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()

    References

    Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.

    Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035

    Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128

    https://cran.r-project.org/

    http://ggplot2.org/

  12. A dataset on the carbon release capacity of wetland plants and its effect on...

    • scidb.cn
    Updated Jun 7, 2024
    Cite
    Tan Peiyang; Huang Xin; Hou Zhiyong; Xie Yonghong; Li Yang; Mei Jinhua (2024). A dataset on the carbon release capacity of wetland plants and its effect on nitrogen removal from artificial wetlands [Dataset]. http://doi.org/10.57760/sciencedb.j00001.00818
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset provided by
    Science Data Bank
    Authors
    Tan Peiyang; Huang Xin; Hou Zhiyong; Xie Yonghong; Li Yang; Mei Jinhua
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of two parts: (1) Patterns of nutrient release from wetland-plant carbon sources. After the experiment began, water samples were collected at regular intervals; the original and average concentrations of TOC and TN of each sample were measured and recorded, and line charts were drawn. (2) Data on the influence of carbon source materials (Argento, Canna and corncob) on nitrogen removal performance. From December 8 to April 27, 2019, water samples of each treatment were collected at the same times; the original concentration, average concentration, carbon source utilization rate and nitrogen removal efficiency of TOC, NO3--N, NH4+-N and TN of each sample were measured and recorded, and a line chart was drawn.

  13. OAGT Paper Topic Dataset

    • zenodo.org
    • explore.openaire.eu
    • +1more
    zip
    Updated May 24, 2022
    Cite
    Erion Çano; Erion Çano (2022). OAGT Paper Topic Dataset [Dataset]. http://doi.org/10.5281/zenodo.6560535
    Explore at:
    zip. Available download formats
    Dataset updated
    May 24, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Erion Çano; Erion Çano
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    OAGT is a paper topic dataset consisting of 6,942,930 records comprising various scientific publication attributes such as abstracts, titles, keywords, publication years, venues, etc. The last two fields of each record are the topic id, from a taxonomy of 27 topics created from the entire collection, and the 20 most significant topic words. Each dataset record (sample) is stored as a JSON line in the text file.
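    Since each record is one JSON object per line, the file can be streamed without loading all 6,942,930 records into memory at once. A minimal sketch in Python (the filename here is a placeholder, not taken from the dataset's documentation):

    ```python
    import json

    def iter_records(path="oagt.jsonl"):  # hypothetical filename
        """Yield one parsed record per JSON line, skipping blank lines."""
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:
                    yield json.loads(line)
    ```

    Streaming one line at a time keeps memory use constant regardless of collection size.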

    The data is derived from the OAG data collection (https://aminer.org/open-academic-graph), which was released under the ODC-BY license.

    This data (OAGT Paper Topic Dataset) is released under the CC-BY license (https://creativecommons.org/licenses/by/4.0/).

    If using it, please cite the following paper:

    Erion Çano, Benjamin Roth: Topic Segmentation of Research Article Collections. ArXiv 2022, CoRR abs/2205.11249, https://doi.org/10.48550/arXiv.2205.11249

  14. Netherlands Twin Register. (2024). Snellen Chart [Data set]. Vrije Universiteit Amsterdam

    • data.individualdevelopment.nl
    Updated Oct 17, 2024
    Cite
    Netherlands Twin Register (2024). Snellen Chart [Data set]. Vrije Universiteit Amsterdam. https://doi.org/10.60641/tf9p-gb90 [Dataset]. https://data.individualdevelopment.nl/dataset/2a3ab7ce544c67c3c8bbe719d7870ce4
    Explore at:
    Dataset updated
    Oct 17, 2024
    Area covered
    Netherlands, Amsterdam
    Description

    A Snellen chart is an eye chart that can be used to measure visual acuity. The Snellen chart is printed with eleven lines of block letters. The first line consists of one very large letter, which may be one of several letters, for example E, H, or N. Subsequent rows have increasing numbers of letters that decrease in size. A person taking the test covers one eye from 6 metres/20 feet away, and reads aloud the letters of each row, beginning at the top. The smallest row that can be read accurately indicates the visual acuity in that specific eye. In NTR, the Snellen chart was tested at the MRI scanner.

  15. Table_1_Raw Data Visualization for Common Factorial Designs Using SPSS: A Syntax Collection and Tutorial.XLSX

    • frontiersin.figshare.com
    xlsx
    Updated Jun 15, 2023
    + more versions
    Cite
    Florian Loffing (2023). Table_1_Raw Data Visualization for Common Factorial Designs Using SPSS: A Syntax Collection and Tutorial.XLSX [Dataset]. http://doi.org/10.3389/fpsyg.2022.808469.s002
    Explore at:
    xlsx. Available download formats
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    Frontiers
    Authors
    Florian Loffing
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Transparency in data visualization is an essential ingredient for scientific communication. The traditional approach of visualizing continuous quantitative data solely in the form of summary statistics (i.e., measures of central tendency and dispersion) has repeatedly been criticized for not revealing the underlying raw data distribution. Remarkably, however, systematic and easy-to-use solutions for raw data visualization using the most commonly reported statistical software package for data analysis, IBM SPSS Statistics, are missing. Here, a comprehensive collection of more than 100 SPSS syntax files and an SPSS dataset template is presented and made freely available that allow the creation of transparent graphs for one-sample designs, for one- and two-factorial between-subject designs, for selected one- and two-factorial within-subject designs as well as for selected two-factorial mixed designs and, with some creativity, even beyond (e.g., three-factorial mixed-designs). Depending on graph type (e.g., pure dot plot, box plot, and line plot), raw data can be displayed along with standard measures of central tendency (arithmetic mean and median) and dispersion (95% CI and SD). The free-to-use syntax can also be modified to match with individual needs. A variety of example applications of syntax are illustrated in a tutorial-like fashion along with fictitious datasets accompanying this contribution. The syntax collection is hoped to provide researchers, students, teachers, and others working with SPSS a valuable tool to move towards more transparency in data visualization.

  16. Event Graph of BPI Challenge 2019

    • data.4tu.nl
    zip
    Updated Apr 22, 2021
    + more versions
    Cite
    Dirk Fahland (2021). Event Graph of BPI Challenge 2019 [Dataset]. http://doi.org/10.4121/14169614.v1
    Explore at:
    zip. Available download formats
    Dataset updated
    Apr 22, 2021
    Dataset provided by
    4TU.ResearchData
    Authors
    Dirk Fahland
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Business process event data modeled as labeled property graphs

    Data Format
    -----------

    The dataset comprises one labeled property graph in two different file formats.

    #1) Neo4j .dump format

    A neo4j (https://neo4j.com) database dump that contains the entire graph. It can be imported into a fresh neo4j database instance using the following command (see also the neo4j documentation: https://neo4j.com/docs/):

    /bin/neo4j-admin.(bat|sh) load --database=graph.db --from=

    The .dump was created with Neo4j v3.5.

    #2) .graphml format

    A .zip file containing a .graphml file of the entire graph


    Data Schema
    -----------

    The graph is a labeled property graph over business process event data. Each graph uses the following concepts:

    :Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute "Activity" that occurred at the given "timestamp"

    :Entity nodes - each entity node describes an entity (e.g., an object or a user), it has an EntityType and an identifier (attribute "ID")

    :Log nodes - describes a collection of events that were recorded together, most graphs only contain one log node

    :Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed; :Class nodes group events into sets of identical observations

    :CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities

    :DF relationships - "directly-followed by" between two :Event nodes describes which event is directly-followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.

    :HAS relationship - from a :Log to an :Event node, describes which events had been recorded in which event log

    :OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph

    :REL relationship - placeholder for any structural relationship between two :Entity nodes

    The concepts are further defined in Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020), https://arxiv.org/abs/2005.14552


    Data Contents
    -------------

    neo4j-bpic19-2021-02-17 (.dump|.graphml.zip)

    An integrated graph describing the raw event data of the entire BPI Challenge 2019 dataset.
    van Dongen, B.F. (Boudewijn) (2019): BPI Challenge 2019. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:d06aff4b-79f0-45e6-8ec8-e19730c248f1

    This data originated from a large multinational company operating from The Netherlands in the area of coatings and paints, and participants were asked to investigate the purchase order handling process for some of its 60 subsidiaries. In particular, the process owner has compliance questions. In the data, each purchase order (or purchase document) contains one or more line items. For each line item, there are roughly four types of flows in the data:

    (1) 3-way matching, invoice after goods receipt: For these items, the value of the goods receipt message should be matched against the value of an invoice receipt message and the value put during creation of the item (indicated by both the GR-based flag and the Goods Receipt flags set to true).

    (2) 3-way matching, invoice before goods receipt: Purchase items that do require a goods receipt message, while they do not require GR-based invoicing (indicated by the GR-based IV flag set to false and the Goods Receipt flags set to true). For such purchase items, invoices can be entered before the goods are received, but they are blocked until goods are received. This unblocking can be done by a user, or by a batch process at regular intervals. Invoices should only be cleared if goods are received and the value matches with the invoice and the value at creation of the item.

    (3) 2-way matching (no goods receipt needed): For these items, the value of the invoice should match the value at creation (in full or partially until the PO value is consumed), but there is no separate goods receipt message required (indicated by both the GR-based flag and the Goods Receipt flags set to false).

    (4) Consignment: For these items, there are no invoices on PO level as this is handled fully in a separate process. Here we see the GR indicator set to true but the GR IV flag set to false, and we also know by item type (consignment) that we do not expect an invoice against this item.
Unfortunately, the complexity of the data goes further than just this division in four categories. For each purchase item, there can be many goods receipt messages and corresponding invoices which are subsequently paid. Consider for example the process of paying rent. There is a Purchase Document with one item for paying rent, but a total of 12 goods receipt messages with (cleared) invoices with a value equal to 1/12 of the total amount. For logistical services, there may even be hundreds of goods receipt messages for one line item. Overall, for each line item, the amounts of the line item, the goods receipt messages (if applicable) and the invoices have to match for the process to be compliant. Of course, the log is anonymized, but some semantics are left in the data, for example: The resources are split between batch users and normal users indicated by their name. The batch users are automated processes executed by different systems. The normal users refer to human actors in the process. The monetary values of each event are anonymized from the original data using a linear translation respecting 0, i.e. addition of multiple invoices for a single item should still lead to the original item worth (although there may be small rounding errors for numerical reasons). Company, vendor, system and document names and IDs are anonymized in a consistent way throughout the log. The company has the key, so any result can be translated by them to business insights about real customers and real purchase documents.

    The case ID is a combination of the purchase document and the purchase item. There is a total of 76,349 purchase documents containing in total 251,734 items, i.e. there are 251,734 cases. In these cases, there are 1,595,923 events relating to 42 activities performed by 627 users (607 human users and 20 batch users). Sometimes the user field is empty, or NONE, which indicates no user was recorded in the source system. For each purchase item (or case) the following attributes are recorded: concept:name: A combination of the purchase document id and the item id, Purchasing Document: The purchasing document ID, Item: The item ID, Item Type: The type of the item, GR-Based Inv. Verif.: Flag indicating if GR-based invoicing is required (see above), Goods Receipt: Flag indicating if 3-way matching is required (see above), Source: The source system of this item, Doc. Category name: The name of the category of the purchasing document, Company: The subsidiary of the company from where the purchase originated, Spend classification text: A text explaining the class of purchase item, Spend area text: A text explaining the area for the purchase item, Sub spend area text: Another text explaining the area for the purchase item, Vendor: The vendor to which the purchase document was sent, Name: The name of the vendor, Document Type: The document type, Item Category: The category as explained above (3-way with GR-based invoicing, 3-way without, 2-way, consignment).

    The data contains the following entities and their events

    - PO - Purchase Order documents handled at a large multinational company operating from The Netherlands
    - POItem - an item in a Purchase Order document describing a specific item to be purchased
    - Resource - the user or worker handling the document or a specific item
    - Vendor - the external organization from which an item is to be purchased

    Data Size
    ---------

    BPIC19, nodes: 1926651, relationships: 15082099

  17. Amount of data created, consumed, and stored 2010-2023, with forecasts to...

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Explore at:
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 2024
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

    Storage capacity also growing

    Only a small percentage of this newly created data is kept, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.

  18. Data from: Value Line Investment Survey

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jun 10, 2024
    Cite
    Value Line Publishing (2024). Value Line Investment Survey [Dataset]. http://doi.org/10.7910/DVN/P0RROU
    Explore at:
    Croissant. A format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Jun 10, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Value Line Publishing
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/3.0/customlicense?persistentId=doi:10.7910/DVN/P0RROU

    Time period covered
    Jan 4, 1980 - Dec 31, 1989
    Description

    The Value Line Investment Survey is one of the oldest continuously running investment advisory publications. Since 1955, the Survey has been published in multiple formats including print, loose-leaf, microfilm and microfiche. Data from 1997 to present is now available online. The Survey tracks 1700 stocks across 92 industry groups. It provides reported and projected measures of firm performance, proprietary rankings and analysis for each stock on a quarterly basis.

    DATA AVAILABLE FOR YEARS: 1980-1989

    This dataset, a subset of the Survey covering the years 1980-1989, has been digitized from the microfiche collection available at the Dewey Library (FICHE HG 4501.V26). It is only available to MIT students and faculty for academic research.

    Published weekly, each edition of the Survey has the following three parts:

    Summary & Index: includes an alphabetical listing of all industries with their relative ranking and the page number for detailed industry analysis. It also includes an alphabetical listing of all stocks in the publication with references to their location in Part 3, Ratings & Reports.

    Selection & Opinion: contains the latest economic and stock market commentary and advice along with one or more pages of research on interesting stocks or industries, and a variety of pertinent economic and stock market statistics. It also includes three model stock portfolios.

    Ratings & Reports: This is the core of the Value Line Investment Survey. Preceded by an industry report, each one-page stock report within that industry includes Timeliness, Safety and Technical rankings, 3- to 5-year analyst forecasts for stock prices, income and balance sheet items, up to 17 years of historical data, and Value Line analysts’ commentaries. The report also contains stock price charts, quarterly sales, earnings, and dividend information.
    Publication Schedule: Each edition of the Survey covers around 130 stocks in seven to eight industries on a preset sequential schedule so that all 1700 stocks are analyzed once every 13 weeks, or each quarter. All editions are numbered 1-13 within each quarter. For example, in 1980, reports for Chrysler appear in edition 1 of each quarter on the following dates: January 4, 1980 (page 132); April 4, 1980 (page 133); July 4, 1980 (page 133); October 1, 1980 (page 133). Reports for Coca-Cola were published in edition 10 of each quarter on: March 7, 1980 (page 1514); June 6, 1980 (page 1518); Sept. 5, 1980 (page 1517); Dec. 5, 1980 (page 1548). Any significant news affecting a stock between quarters is covered in the supplementary reports that appear at the end of Part 3, Ratings & Reports.

    File format: Digitized files within this dataset are in PDF format and are arranged by publication date within each compressed annual folder.

    How to Consult the Value Line Investment Survey: To find reports on a particular stock, consult the alphabetical listing of stocks in the Summary & Index part of the relevant weekly edition. Look for the page number just to the left of the company name and then use the table below to identify the edition where that page number appears. All editions within a given quarter are numbered 1-13 and follow equally sized page ranges for stock reports. The table provides page ranges for stock reports within editions 1-13 of 1980 Q1. It can be used to identify edition and page numbers for any quarter within a given year.

    Ratings & Reports
    Edition  Pub. Date   Pages
    1        04-Jan-80   100-242
    2        11-Jan-80   250-392
    3        18-Jan-80   400-542
    4        25-Jan-80   550-692
    5        01-Feb-80   700-842
    6        08-Feb-80   850-992
    7        15-Feb-80   1000-1142
    8        22-Feb-80   1150-1292
    9        29-Feb-80   1300-1442
    10       07-Mar-80   1450-1592
    11       14-Mar-80   1600-1742
    12       21-Mar-80   1750-1908
    13       28-Mar-80   2000-2142

    Another way to navigate to the Ratings & Reports part of an edition would be to look around page 50 within the PDF document.
Note that the page numbers of the PDF will not match those within the publication.
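    The page-to-edition lookup described above can be mechanized. A small sketch that hard-codes the 1980 Q1 page ranges from the listing (the function name is illustrative, not part of the dataset):

    ```python
    # Page ranges for stock reports in Ratings & Reports editions 1-13 of 1980 Q1,
    # as given in the listing above.
    EDITION_PAGES = {
        1: (100, 242), 2: (250, 392), 3: (400, 542), 4: (550, 692),
        5: (700, 842), 6: (850, 992), 7: (1000, 1142), 8: (1150, 1292),
        9: (1300, 1442), 10: (1450, 1592), 11: (1600, 1742),
        12: (1750, 1908), 13: (2000, 2142),
    }

    def edition_for_page(page):
        """Return the edition (1-13) whose page range contains `page`, or None."""
        for edition, (first, last) in EDITION_PAGES.items():
            if first <= page <= last:
                return edition
        return None
    ```

    For example, Chrysler's page 132 falls in edition 1 and Coca-Cola's page 1514 in edition 10, matching the publication dates given above.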

  19. Environmental data associated to particular health events example dataset

    • data.europa.eu
    unknown
    Updated Jul 3, 2025
    Cite
    Zenodo (2025). Environmental data associated to particular health events example dataset [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-5823426?locale=cs
    Explore at:
    unknown (6689542). Available download formats
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data set is a collection of environmental records associated with individual events. It was generated using the serdif-api wrapper (https://github.com/navarral/serdif-api) by sending a CSV file with example events for the Republic of Ireland. The serdif-api sends a semantic query that (i) selects the environmental data sets within the region of the event, (ii) filters by the specific period of interest from the event, and (iii) aggregates the data sets using the minimum, maximum, average or sum for each of the available variables for a specific time unit. The aggregation method and the time unit can be passed to the serdif-api through the Command Line Interface (CLI) (see example in https://github.com/navarral/serdif-api). The resulting data set format can also be specified as a data table (CSV) or as a graph (RDF) for analysis and publication as FAIR data. The open-ready data for research is retrieved as a zip file that contains:
    (i) data as csv: environmental data associated to particular events as a data table
    (ii) data as rdf: environmental data associated to particular events as a graph
    (iii) metadata for publication as rdf: a metadata record with generalized information about the data that does not contain personal data and is therefore publishable
    (iv) metadata for research as rdf: metadata records with detailed information about the data, such as individual dates, regions, data sets used and data lineage, which could lead to data privacy issues if published without approval from the Data Protection Officer (DPO) and data controller

  20. Transaction Graph Dataset for the Ethereum Blockchain

    • zenodo.org
    • data.europa.eu
    Updated Dec 19, 2022
    + more versions
    Cite
    Can Özturan; Can Özturan; Alper Şen; Alper Şen; Baran Kılıç; Baran Kılıç (2022). Transaction Graph Dataset for the Ethereum Blockchain [Dataset]. http://doi.org/10.5281/zenodo.3669937
    Explore at:
    Dataset updated
    Dec 19, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Can Özturan; Can Özturan; Alper Şen; Alper Şen; Baran Kılıç; Baran Kılıç
    Description

    This dataset contains ether as well as popular ERC20 token transfer transactions extracted from the Ethereum Mainnet blockchain.

    Only send-ether, contract function call, and contract deployment transactions are present in the dataset. Miner reward transactions are not currently included.

    Details of the datasets are given below:

    FILENAME FORMAT:

    The filenames have the following format:

    eth-tx-<first_block>-<last_block>.txt.bz2

    where <first_block> and <last_block> are the first and last block numbers covered by the file. For example, the file eth-tx-1000000-1099999.txt.bz2 contains transactions from block 1000000 to block 1099999 inclusive.

    The files are compressed with bzip2. They can be uncompressed using the command bunzip2.
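    Because the files are plain text compressed with bzip2, they can also be streamed directly in code without first unpacking them to disk. A minimal Python sketch (the default filename is the example above; the per-line transaction format is whatever the dataset defines):

    ```python
    import bz2

    def iter_transactions(path="eth-tx-1000000-1099999.txt.bz2"):
        """Yield one raw transaction line at a time from a bzip2-compressed file."""
        with bz2.open(path, mode="rt", encoding="utf-8") as fh:
            for line in fh:
                line = line.rstrip("\n")
                if line:
                    yield line
    ```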

    TRANSACTION FORMAT:

    Each line in a file corresponds to a transaction. The transaction has the following format:

    units. ERC20 token transfers (transfer and transferFrom function calls in an ERC20 contract) are indicated by the token symbol. For example, GUSD is the Gemini USD stablecoin. The JSON file erc20tokens.json, given below, contains the details of the ERC20 tokens.

    decoder-error.txt FILE:

    This file contains, one per line, the transactions (block no, tx no, tx hash) that produced an error while decoding calldata. These transactions are not present in the data files.

    erc20tokens.json FILE:

    This file contains the list of popular ERC20 token contracts whose transfer/transferFrom transactions appear in the data files.

    -------------------------------------------------------------------------------------------

    [
      { "address": "0xdac17f958d2ee523a2206206994597c13d831ec7", "decdigits": 6,  "symbol": "USDT", "name": "Tether-USD" },
      { "address": "0xB8c77482e45F1F44dE1745F52C74426C631bDD52", "decdigits": 18, "symbol": "BNB",  "name": "Binance" },
      { "address": "0x2af5d2ad76741191d15dfe7bf6ac92d4bd912ca3", "decdigits": 18, "symbol": "LEO",  "name": "Bitfinex-LEO" },
      { "address": "0x514910771af9ca656af840dff83e8264ecf986ca", "decdigits": 18, "symbol": "LNK",  "name": "Chainlink" },
      { "address": "0x6f259637dcd74c767781e37bc6133cd6a68aa161", "decdigits": 18, "symbol": "HT",   "name": "HuobiToken" },
      { "address": "0xf1290473e210b2108a85237fbcd7b6eb42cc654f", "decdigits": 18, "symbol": "HEDG", "name": "HedgeTrade" },
      { "address": "0x9f8f72aa9304c8b593d555f12ef6589cc3a579a2", "decdigits": 18, "symbol": "MKR",  "name": "Maker" },
      { "address": "0xa0b73e1ff0b80914ab6fe0444e65848c4c34450b", "decdigits": 8,  "symbol": "CRO",  "name": "Crypto.com" },
      { "address": "0xd850942ef8811f2a866692a623011bde52a462c1", "decdigits": 18, "symbol": "VEN",  "name": "VeChain" },
      { "address": "0x0d8775f648430679a709e98d2b0cb6250d2887ef", "decdigits": 18, "symbol": "BAT",  "name": "Basic-Attention" },
      { "address": "0xc9859fccc876e6b4b3c749c5d29ea04f48acb74f", "decdigits": 0,  "symbol": "INO",  "name": "INO-Coin" },
      { "address": "0x8e870d67f660d95d5be530380d0ec0bd388289e1", "decdigits": 18, "symbol": "PAX",  "name": "Paxos-Standard" },
      { "address": "0x17aa18a4b64a55abed7fa543f2ba4e91f2dce482", "decdigits": 18, "symbol": "INB",  "name": "Insight-Chain" },
      { "address": "0xc011a72400e58ecd99ee497cf89e3775d4bd732f", "decdigits": 18, "symbol": "SNX",  "name": "Synthetix-Network" },
      { "address": "0x1985365e9f78359a9B6AD760e32412f4a445E862", "decdigits": 18, "symbol": "REP",  "name": "Reputation" },
      { "address": "0x653430560be843c4a3d143d0110e896c2ab8ac0d", "decdigits": 16, "symbol": "MOF",  "name": "Molecular-Future" },
      { "address": "0x0000000000085d4780B73119b644AE5ecd22b376", "decdigits": 18, "symbol": "TUSD", "name": "True-USD" },
      { "address": "0xe41d2489571d322189246dafa5ebde1f4699f498", "decdigits": 18, "symbol": "ZRX",  "name": "ZRX" },
      { "address": "0x8ce9137d39326ad0cd6491fb5cc0cba0e089b6a9", "decdigits": 18, "symbol": "SXP",  "name": "Swipe" },
      { "address": "0x75231f58b43240c9718dd58b4967c5114342a86c", "decdigits": 18, "symbol": "OKB",  "name": "Okex" },
      { "address": "0xa974c709cfb4566686553a20790685a47aceaa33", "decdigits": 18, "symbol": "XIN",  "name": "Mixin" },
      { "address": "0xd26114cd6EE289AccF82350c8d8487fedB8A0C07", "decdigits": 18, "symbol": "OMG",  "name": "OmiseGO" },
      { "address": "0x89d24a6b4ccb1b6faa2625fe562bdd9a23260359", "decdigits": 18, "symbol": "SAI",  "name": "Sai Stablecoin v1.0" },
      { "address": "0x6c6ee5e31d828de241282b9606c8e98ea48526e2", "decdigits": 18, "symbol": "HOT",  "name": "HoloToken" },
      { "address": "0x6b175474e89094c44da98b954eedeac495271d0f", "decdigits": 18, "symbol": "DAI",  "name": "Dai Stablecoin" },
      { "address": "0xdb25f211ab05b1c97d595516f45794528a807ad8", "decdigits": 2,  "symbol": "EURS", "name": "Statis-EURS" },
      { "address": "0xa66daa57432024023db65477ba87d4e7f5f95213", "decdigits": 18, "symbol": "HPT",  "name": "HuobiPoolToken" },
      { "address": "0x4fabb145d64652a948d72533023f6e7a623c7c53", "decdigits": 18, "symbol": "BUSD", "name": "Binance-USD" },
      { "address": "0x056fd409e1d7a124bd7017459dfea2f387b6d5cd", "decdigits": 2,  "symbol": "GUSD", "name": "Gemini-USD" },
      { "address": "0x2c537e5624e4af88a7ae4060c022609376c8d0eb", "decdigits": 6,  "symbol": "TRYB", "name": "BiLira" },
      { "address": "0x4922a015c4407f87432b179bb209e125432e4a2a", "decdigits": 6,  "symbol": "XAUT", "name": "Tether-Gold" },
      { "address": "0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48", "decdigits": 6,  "symbol": "USDC", "name": "USD-Coin" },
      { "address": "0xa5b55e6448197db434b92a0595389562513336ff", "decdigits": 16, "symbol": "SUSD", "name": "Santender" },
      { "address": "0xffe8196bc259e8dedc544d935786aa4709ec3e64", "decdigits": 18, "symbol": "HDG",  "name": "HedgeTrade" },
      { "address": "0x4a16baf414b8e637ed12019fad5dd705735db2e0", "decdigits": 2,  "symbol": "QCAD", "name": "QCAD" }
    ]

    -------------------------------------------------------------------------------------------
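    One practical use of erc20tokens.json is converting the raw integer amounts found in on-chain transfers into human-readable units via the decdigits field. A minimal sketch (the file path and function names are illustrative, not part of the dataset):

    ```python
    import json

    def load_tokens(path="erc20tokens.json"):
        """Map lower-cased contract address -> token entry from erc20tokens.json."""
        with open(path, encoding="utf-8") as fh:
            return {tok["address"].lower(): tok for tok in json.load(fh)}

    def to_units(raw_amount, token):
        """Convert a raw integer token amount to units using the token's decdigits."""
        return raw_amount / 10 ** token["decdigits"]
    ```

    For instance, USDT has 6 decimal digits, so a raw transfer amount of 1500000 corresponds to 1.5 USDT.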
