Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: none of the data sets published here contain actual data, they are for testing purposes only.
This data repository contains graph datasets, where each graph is represented by two CSV files: one for node information and another for edge details. To link the files to the same graph, their names include a common identifier based on the number of nodes. For example:
dataset_30_nodes_interactions.csv: contains 30 rows (nodes).
dataset_30_edges_interactions.csv: contains 47 rows (edges).
dataset_30 refers to the same graph.
Each node file contains the following columns:
Name of the Column | Type | Description |
UniProt ID | string | protein identification |
label | string | protein label (type of node) |
properties | string | a dictionary containing properties related to the protein. |
Each edge file contains the following columns:
Name of the Column | Type | Description |
Relationship ID | string | relationship identification |
Source ID | string | identification of the source protein in the relationship |
Target ID | string | identification of the target protein in the relationship |
label | string | relationship label (type of relationship) |
properties | string | a dictionary containing properties related to the relationship. |
Graph | Number of Nodes | Number of Edges | Sparse graph |
dataset_30* | 30 | 47 | Y |
dataset_60* | 60 | 181 | Y |
dataset_120* | 120 | 689 | Y |
dataset_240* | 240 | 2819 | Y |
dataset_300* | 300 | 4658 | Y |
dataset_600* | 600 | 18004 | Y |
dataset_1200* | 1200 | 71785 | Y |
dataset_2400* | 2400 | 288600 | Y |
dataset_3000* | 3000 | 449727 | Y |
dataset_6000* | 6000 | 1799413 | Y |
dataset_12000* | 12000 | 7199863 | Y |
dataset_24000* | 24000 | 28792361 | Y |
dataset_30000* | 30000 | 44991744 | Y |
This repository includes two (2) additional tiny graph datasets to experiment with before dealing with larger datasets.
Each node file contains the following columns:
Name of the Column | Type | Description |
ID | string | node identification |
label | string | node label (type of node) |
properties | string | a dictionary containing properties related to the node. |
Each edge file contains the following columns:
Name of the Column | Type | Description |
ID | string | relationship identification |
source | string | identification of the source node in the relationship |
target | string | identification of the target node in the relationship |
label | string | relationship label (type of relationship) |
properties | string | a dictionary containing properties related to the relationship. |
Graph | Number of Nodes | Number of Edges | Sparse graph |
dataset_dummy* | 3 | 6 | N |
dataset_dummy2* | 3 | 6 | N |
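As a minimal loading sketch (assuming the CSV headers match the column names in the tables above), one of the protein graphs can be read into Python with pandas and networkx:

import pandas as pd
import networkx as nx

# file names follow the dataset_<n>_* pattern described above
nodes = pd.read_csv("dataset_30_nodes_interactions.csv")
edges = pd.read_csv("dataset_30_edges_interactions.csv")

g = nx.DiGraph()
for _, row in nodes.iterrows():
    g.add_node(row["UniProt ID"], label=row["label"], properties=row["properties"])
for _, row in edges.iterrows():
    g.add_edge(row["Source ID"], row["Target ID"],
               label=row["label"], properties=row["properties"])

print(g.number_of_nodes(), g.number_of_edges())  # expected: 30, 47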
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
The awesome datasets graph is a Neo4j graph database which catalogs and classifies datasets and data sources as scraped from the Awesome Public Datasets GitHub list.
We started with a simple list of links on the Awesome Public Datasets page. We now have a semantic graph database with ten labels, five relationship types, nine property keys, and more than 400 nodes, all within a 1 MB database footprint. All database operations are query driven using the powerful and flexible Cypher Graph Query Language.
The download includes CSV files which were created as an interim step after scraping and wrangling the source. The download also includes a working Neo4j Graph Database. Login: neo4j | Password: demo.
Data scraped from Awesome Public Datasets page. Prepared for the book Data Science Solutions.
While we have done basic data wrangling and preparation, how can this graph prove useful for your data science workflow? Can we record the decisions taken across the workflow stages of a data science project, and how the data catalog (data sources, datasets, tools) informs those decisions in pursuit of data science solution strategies?
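For illustration, a hedged sketch of querying the bundled Neo4j database from Python with the official neo4j driver, using the credentials above; the bolt URL assumes a locally running instance, and the query simply counts nodes per label:

from neo4j import GraphDatabase

# credentials from the description (Login: neo4j | Password: demo)
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "demo"))

with driver.session() as session:
    result = session.run(
        "MATCH (n) RETURN labels(n) AS labels, count(*) AS n ORDER BY n DESC"
    )
    for record in result:
        print(record["labels"], record["n"])

driver.close()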
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Categorical scatterplots with R for biologists: a step-by-step guide
Benjamin Petre1, Aurore Coince2, Sophien Kamoun1
1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK
Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.
Protocol
• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed are indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains the continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.
• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide and paste it into the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.
• Step 3: save the graph as a .pdf file. Resize the window as needed and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
Notes
• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, go to Packages & Data -> Package Installer, enter ‘ggplot2’ in the Package Search field and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.
• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()
References
Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.
Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035.
Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains a collection of data about 454 value chains from 23 rural European areas of 16 countries. This data is obtained through a semi-automatic workflow that transforms raw textual data from an unstructured MS Excel sheet into semantic knowledge graphs. In particular, the repository contains:
An MS Excel sheet containing details of the different value chains, provided by the MOuntain Valorisation through INterconnectedness and Green growth (MOVING) European project;
454 CSV files containing events, titles, entities and coordinates of narratives of each value chain, obtained by pre-processing the MS Excel sheet;
454 Web Ontology Language (OWL) files. This collection of files is the result of the semi-automatic workflow, and is organized as a semantic knowledge graph of narratives, where each narrative is a sub-graph explaining one of the 454 value chains and its territory aspects. The knowledge graph is based on the Narrative Ontology, an ontology developed by the Institute of Information Science and Technologies (ISTI-CNR) as an extension of CIDOC CRM, FRBRoo, and OWL Time;
Two CSV files that compile all the available information extracted from the 454 OWL files;
GeoPackage files with the geographic coordinates related to the narratives;
HTML files that show the different SPARQL and GeoSPARQL queries;
HTML files that show the story maps about the 454 value chains;
An image showing how the various components of the dataset interact with each other.
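As a rough sketch of how one of the OWL narrative files might be inspected locally with the Python library rdflib (the file name and the RDF/XML serialization are assumptions, not confirmed by the repository):

import rdflib

g = rdflib.Graph()
g.parse("value_chain_001.owl", format="xml")  # hypothetical file name

# count how often each class is instantiated in this narrative sub-graph
query = """
    SELECT ?class (COUNT(?s) AS ?n)
    WHERE { ?s a ?class }
    GROUP BY ?class
    ORDER BY DESC(?n)
"""
for cls, n in g.query(query):
    print(cls, n)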
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Graph datasets in CSV format, used in the article Learning Functional Causal Models with Generative Neural Networks.
1) Each file *_numdata.csv contains the data of around 20 variables connected in a graph without hidden variables. G2, G3, G4 and G5 refer to graphs with a maximum of 2, 3, 4 and 5 parents per node. Each file *_target.csv contains the ground truth of the graph as cause -> effect pairs. Files beginning with "Big" are larger graphs with 100 variables.
2) Each file *_confounders_numdata.csv contains the data of around 20 variables connected in a graph with 3 hidden variables. Each file *_confounders_skeleton.csv contains the skeleton of the graph (including spurious links due to common hidden causes). Each file *_confounders_target.csv contains the ground truth of the graph with the direct visible cause -> effect links. The task is to recover the direct visible cause -> effect links while removing the spurious links from the skeleton.
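A minimal loading sketch with pandas, assuming the G2 prefix for illustration (the exact column layout of the target file is an assumption):

import pandas as pd

data = pd.read_csv("G2_numdata.csv")    # observational samples, one column per variable
target = pd.read_csv("G2_target.csv")   # ground-truth cause -> effect pairs

print(data.shape)     # (n_samples, ~20 variables)
print(target.head())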
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the results of the interlinking process between selected CSV datasets harvested by the European Data Portal and the DBpedia knowledge graph.
We aim to answer the following questions:
What are the most popular column types? This will provide insight into what the datasets hold and how they can be joined. It will also indicate what specific linking schemes could be applied in future work.
What datasets have columns of the same type? This will suggest datasets that may be similar or related.
What entities appear in most datasets (co-referent entities)? This will suggest entities for which more data is published.
What datasets share a particular entity? This will suggest datasets that may be joined, or are related through that particular entity.
Results are provided as augmented tables, which contain the columns of the original CSV, plus a metadata file in JSON-LD format. The metadata files can be loaded into an RDF store and queried.
Refer to the accompanying report of activities for more details on the methodology and how to query the dataset.
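Since the metadata files are JSON-LD, they can be parsed into any RDF store; a minimal sketch with the Python library rdflib (rdflib 6+ parses JSON-LD natively; the file name is a placeholder):

import rdflib

g = rdflib.Graph()
g.parse("augmented_table_metadata.jsonld", format="json-ld")  # hypothetical name

# print all triples to inspect the column-type annotations
for s, p, o in g:
    print(s, p, o)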
biogas/biogas_0/supplydata197.csv (used in step 2, where supply data are specified). This dataset is associated with the following publication: Hu, Y., W. Zhang, P. Tominac, M. Shen, D. Göreke, E. Martín-Hernández, M. Martín, G.J. Ruiz-Mercado, and V.M. Zavala. ADAM: A web platform for graph-based modeling and optimization of supply chains. COMPUTERS AND CHEMICAL ENGINEERING. Elsevier Science Ltd, New York, NY, USA, 165: 107911, (2022).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database collates 3552 development indicators from different studies with data by country and year, including single-year and multiple-year time series. The data is presented as charts; the underlying data can be downloaded from the linked project pages/references for each set, and the data for each presented graph is available as a CSV file as well as a visual download of the graph (both available via the download link under each chart).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends. TSMx is an R script that was developed to facilitate multi-temporal-scale visualizations of time-series data. The script requires only a two-column CSV of years and values to plot the slope of the linear regression line for all possible year combinations from the supplied temporal range. The outputs include a time-series matrix showing slope direction based on the linear regression, slope values plotted with colors indicating magnitude, and results of a Mann-Kendall test.
The start year is indicated on the y-axis and the end year is indicated on the x-axis. In the example below, the cell in the top-right corner is the direction of the slope for the temporal range 2001–2019. The red line corresponds with the temporal range 2010–2019 and an arrow is drawn from the cell that represents that range. One cell is highlighted with a black border to demonstrate how to read the chart: that cell represents the slope for the temporal range 2004–2014.
This publication entry also includes an Excel template that produces the same visualizations without a need to interact with any code, though minor modifications will need to be made to accommodate year ranges other than what is provided.
TSMx for R was developed by Georgios Boumis; TSMx was originally conceptualized and created by Brad G. Peter in Microsoft Excel. Please refer to the associated publication: Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624. https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624
TSMx sample chart from the supplied Excel template. Data represent the productivity of rice agriculture in Vietnam as measured via EVI (enhanced vegetation index) from the NASA MODIS data product (MOD13Q1.V006).
TSMx R script:
# import packages
library(dplyr)
library(readr)
library(ggplot2)
library(tibble)
library(tidyr)
library(forcats)
library(Kendall)
options(warn = -1) # disable warnings
# read data (.csv file with "Year" and "Value" columns)
data <- read_csv("EVI.csv")
# prepare row/column names for output matrices
years <- data %>% pull("Year")
r.names <- years[-length(years)]
c.names <- years[-1]
years <- years[-length(years)]
# initialize output matrices
sign.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
pval.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
slope.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
# function to return remaining years given a start year
getRemain <- function(start.year) {
  years <- data %>% pull("Year")
  start.ind <- which(data[["Year"]] == start.year) + 1
  remain <- years[start.ind:length(years)]
  return(remain)
}
# function to subset data for a start/end year combination
splitData <- function(end.year, start.year) {
  keep <- which(data[['Year']] >= start.year & data[['Year']] <= end.year)
  batch <- data[keep,]
  return(batch)
}
# function to fit linear regression and return slope direction
fitReg <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(sign(slope))
}
# function to fit linear regression and return slope magnitude
fitRegv2 <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(slope)
}
# function to implement Mann-Kendall (MK) trend test and return significance
# the test is implemented only for n >= 8
getMann <- function(batch) {
  if (nrow(batch) >= 8) {
    mk <- MannKendall(batch[['Value']])
    pval <- mk[['sl']]
  } else {
    pval <- NA
  }
  return(pval)
}
# function to return slope direction for all combinations given a start year
getSign <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  signs <- lapply(combs, fitReg)
  return(signs)
}
# function to return MK significance for all combinations given a start year
getPval <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  pvals <- lapply(combs, getMann)
  return(pvals)
}
# function to return slope magnitude for all combinations given a start year
getMagn <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  magns <- lapply(combs, fitRegv2)
  return(magns)
}
# retrieve slope direction, MK significance, and slope magnitude
signs <- lapply(years, getSign)
pvals <- lapply(years, getPval)
magns <- lapply(years, getMagn)
# fill-in output matrices
dimension <- nrow(sign.matrix)
for (i in 1:dimension) {
  sign.matrix[i, i:dimension] <- unlist(signs[i])
  pval.matrix[i, i:dimension] <- unlist(pvals[i])
  slope.matrix[i, i:dimension] <- unlist(magns[i])
}
sign.matrix <-...
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The dataset contains the instance files for the paper X. Klimentova, A. Viana, J. P. Pedroso, N. Santos. Fairness models for multi-agent kidney exchange programmes. To appear in Omega: The International Journal of Management Science (2020). The same dataset was also used in Monteiro, T., Klimentova, X., Pedroso, J.P., Viana, A. A comparison of matching algorithms for kidney exchange programs addressing waiting time. Cent Eur J Oper Res (2020). https://doi.org/10.1007/s10100-020-00680-y
Each instance mimics the pools of kidney exchange programmes of several agents (e.g. countries) over time. Incompatible donor-recipient pairs appear and leave along the time horizon. Each pair belongs to the pool of one of the agents. The virtual compatibility among pairs is represented on a directed graph G = (V,A), called the compatibility graph, where the set of vertices V corresponds to the set of incompatible pairs and non-directed donors. An arc from a vertex i to a vertex j indicates compatibility of the donor in i with the patient in j. Positive crossmatch testing is also incorporated by saving the arcs that would fail in case they are chosen in a cycle in one of the matching runs. The generator creates random graphs based on probabilities of blood type and of donor-patient tissue compatibility; the arrival of pairs and non-directed donors is generated based on given arrival rates.
An instance of the dataset represents the pools of 4 agents, simulated for a period of 6 years. There are 100 instances compressed in 4 zip archives, each containing 25 instances. Each instance is described by 3 files, where the index s is the seed used for the random function when generating the instance.
a) characterisations_s.csv -- CSV file that contains information on each pair in the merged pool in the following columns:
0: Pair ID
1: Donor ID
2: Donor blood type
3: Donor age
4: Patient ID
5: Patient blood type
6: Patient PRA
7: Patient cross-match probability
8: Patient age
9: Pair arrival day
10: Pair departure day
11: Pair probability of failure
12: Pair from pool (e.g. country to which the pair belongs)
In the case of a non-directed donor, the information about the patient is filled with -1.
b) arcs_s.csv -- CSV file that contains the compatibility graph described above. The first line contains the values n (number of vertices in the graph) and m (number of arcs in the graph). In the following m lines, the existing arcs (i,j) are presented as: i j w_ij, where i and j are IDs of pairs and w_ij is the weight of the arc, which is always equal to 1.0 for all instances in this dataset.
c) fail_arcs_s.csv -- the list of arcs that would fail due to a positive crossmatch test in case they appear in a chosen cycle or chain in any matching run. The format is the same as for arcs_s.csv: the first line contains n (the number of vertices in the graph) and m_fail (the number of failed arcs), listed in the following m_fail lines in the same way as in arcs_s.csv.
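A small Python sketch of reading an arcs file into an edge list, assuming the whitespace-separated layout described above (the seed in the file name is illustrative):

# parse arcs_1.csv: first line "n m", then m lines "i j w_ij"
with open("arcs_1.csv") as f:
    n, m = map(int, f.readline().split())
    arcs = []
    for _ in range(m):
        i, j, w = f.readline().split()
        arcs.append((int(i), int(j), float(w)))

print(f"{n} vertices, {len(arcs)} arcs")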
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wikipedia temporal graph.
The dataset is based on two Wikipedia SQL dumps: (1) English language articles and (2) user visit counts per page per hour (aka pagecounts). The original datasets are publicly available on the Wikimedia website.
Static graph structure is extracted from English-language Wikipedia articles. Redirects are removed. Before building the Wikipedia graph we introduce thresholds on the minimum number of visits per hour and the maximum in-degree. We remove the pages that do not reach 500 visits per hour at least once during the specified period. We also remove the nodes (pages) with in-degree higher than 8 000 to build a more meaningful initial graph. After cleaning, the graph contains 116 016 nodes (out of 4 856 639 total pages) and 6 573 475 edges. The graph can be imported in two ways: (1) using edges.csv and vertices.csv or (2) using the enwiki-20150403-graph.gt file, which can be opened with the open-source Python library Graph-Tool.
Time-series data contains users' visit counts from 02:00, 23 September 2014 until 23:00, 30 April 2015. The total number of hours is 5278. The data is stored in two formats: CSV and H5. The CSV file contains data in the format [page_id :: count_views :: layer], where layer represents an hour. In the H5 file, each layer corresponds to an hour as well.
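A hedged sketch of import option (2) with Graph-Tool, plus a read of the time-series CSV (the CSV file name and its comma separator are assumptions; adjust sep if the file uses another delimiter):

import graph_tool as gt
import pandas as pd

g = gt.load_graph("enwiki-20150403-graph.gt")
print(g.num_vertices(), g.num_edges())  # expected: 116016 and 6573475

# hypothetical file name; rows follow [page_id :: count_views :: layer]
ts = pd.read_csv("pagecounts.csv", names=["page_id", "count_views", "layer"])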
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These are six real-world temporal graph datasets with text attributes, varying in size and originating from different domains. Specifically, they are collected from culinary recipe feedback, movie reviews, book reading records, beer rating data, and online shopping interactions. In these temporal graphs, users and items are represented as nodes, while the interactions between them (in the form of user reviews or comments) serve as edges. Each edge is associated with both a timestamp and raw textual content. Additionally, each item node is accompanied by a descriptive text attribute. The files under the $dataset directory are as follows:
1. raw_node.npy and raw_edges.csv store the raw text attributes of nodes and edges, respectively.
2. ml_$dataset.csv records the temporal edges of the dataset, where each row in the format (u, i, ts) represents a user u interacting with an item i at timestamp ts.
3. $dataset_unique_labels.json contains the complete set of human-readable labels for the dataset.
4. Both $dataset_labels_text.json and $dataset_labels.json correspond to the labels associated with each edge in ml_$dataset.csv, where the former provides the textual form of the item labels that users are interested in, and the latter provides their corresponding numeric labels.
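A short loading sketch under the file layout above, with $dataset replaced by a hypothetical directory name:

import json
import numpy as np
import pandas as pd

name = "recipes"  # hypothetical $dataset value

node_text = np.load(f"{name}/raw_node.npy", allow_pickle=True)  # raw node text
edge_text = pd.read_csv(f"{name}/raw_edges.csv")                # raw edge text
edges = pd.read_csv(f"{name}/ml_{name}.csv")                    # (u, i, ts) rows

with open(f"{name}/{name}_unique_labels.json") as f:
    labels = json.load(f)
print(len(edges), "temporal edges,", len(labels), "unique labels")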
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset is part of the bachelor thesis "Evaluating SQuAD-based Question Answering for the Open Research Knowledge Graph Completion". It was created for fine-tuning BERT-based models pre-trained on the SQuAD dataset. The dataset was created using a semi-automatic approach on the ORKG data. The dataset.csv file contains the entire data (all properties) in tabular form and is unsplit. The JSON files contain only the fields necessary for training and evaluation, with additional fields (index of start and end of the answers in the abstracts). The data in the JSON files is split into training and evaluation data. We create 4 variants of the training and evaluation sets, one for each of the question labels ("no label", "how", "what", "which"). For detailed information on each of the fields in the dataset, refer to section 4.2 (Corpus) of the thesis document, which can be found at https://www.repo.uni-hannover.de/handle/123456789/12958. The script used to generate the dataset can be found in the public repositories https://github.com/as18cia/thesis_work and https://gitlab.com/TIBHannover/orkg/nlp/experiments/orkg-fine-tuning-squad-based-models
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This CSV dataset (numbered 1–8) demonstrates the construction processes of the regression models using machine learning methods, which are used to plot Figs. 2–7. The CSV file 1.LSM_R^2 (plotting Fig. 2) shows the data of the relationship between estimated values and actual values when the least-squares method was used for model construction. In the CSV file 2.PCR_R^2 (plotting Fig. 3), the number of principal components was varied from 1 to 5 during the construction of a model using principal component regression. The data in the CSV file 3.SVR_R^2 (plotting Fig. 4) is the result of the construction using support vector regression. The hyperparameters were decided by the comprehensive combination from the listed candidates, exploring the hyperparameters with maximum R^2 values. When a deep neural network was applied to the construction of a regression model, N_Neur., N_H.L. and N_L.T. were varied. The CSV file 4.DNN_HL (plotting Fig. 5a) shows the changes in the relationship between estimated values and actual values at each N_H.L.. Similarly, changes in the relationships between estimated values and actual values when N_Neur. or N_L.T. were varied are given in the CSV files 5.DNN_Neur (plotting Fig. 5b) and 6.DNN_LT (plotting Fig. 5c). The data in the CSV file 7.DNN_R^2 (plotting Fig. 6) is the result using the optimal N_Neur., N_H.L. and N_L.T.. In the CSV file 8.R^2 (plotting Fig. 7), the validity of each machine learning method was compared by showing the optimal results for each method.
Experimental conditions:
Supply volume of the raw material: 25–125 mL
Addition rate of TiO2: 5.0–15.0 wt%
Operation time: 1–15 min
Rotation speed: 2,200–5,700 min^-1
Temperature: 295–319 K
Nomenclature:
N_Neur.: the number of neurons
N_H.L.: the number of hidden layers
N_L.T.: the number of learning times
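As a hedged illustration of the exhaustive hyperparameter search described for the support vector regression (the candidate grids and the placeholder data below are assumptions, not the study's actual values), scikit-learn's GridSearchCV can select the combination with the maximum R^2:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# placeholder data standing in for the five process variables and the target
rng = np.random.default_rng(0)
X = rng.random((50, 5))   # e.g. supply volume, TiO2 rate, time, speed, temperature
y = rng.random(50)

# comprehensive combination of candidate hyperparameters, scored by R^2
grid = GridSearchCV(
    SVR(),
    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1], "epsilon": [0.01, 0.1]},
    scoring="r2",
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)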
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Release of the experimental data from the paper Towards Linking Graph Topology to Model Performance for Biomedical Knowledge Graph Completion (accepted at the Machine Learning for Life and Material Sciences workshop @ ICML2024).
For the tail prediction task, each test query (h,r,?) is scored against all entities in the KG and we compute the rank of the score of the correct completion (h,r,t), after masking out scores of other (h,r,t') triples contained in the graph.
In experimental_data.zip, the following files are provided for each dataset:
{dataset}_preprocessing.ipynb: a Jupyter notebook for downloading and preprocessing the dataset. In particular, this generates the custom label->ID mapping for entities and relations, and the numerical tensor of (h_ID, r_ID, t_ID) triples for all edges in the graph, which can be used to compute graph topological metrics (e.g., using kg-topology-toolbox) and compare them with the edge prediction accuracy.
test_ranks.csv: CSV table with columns ["h", "r", "t"] specifying the head, relation, tail IDs of the test triples, and columns ["DistMult", "TransE", "RotatE", "TripleRE"] with the rank of the ground-truth tail in the ordered list of predictions made by the four models.
entity_dict.csv: the list of entity labels, ordered by entity ID (as generated in the preprocessing notebook).
relation_dict.csv: the list of relation labels, ordered by relation ID (as generated in the preprocessing notebook).
The separate top_100_tail_predictions.zip archive contains, for each of the test queries in the corresponding test_ranks.csv table, the IDs of the top-100 tail predictions made by each of the four KGE models, ordered by decreasing likelihood. The predictions are released in a .npz archive of numpy arrays (one array of shape (n_test_triples, 100) for each of the KGE models).
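A small sketch (assuming the column layout just described) for turning the released ranks into standard KGE metrics:

import pandas as pd

ranks = pd.read_csv("test_ranks.csv")
for model in ["DistMult", "TransE", "RotatE", "TripleRE"]:
    mrr = (1.0 / ranks[model]).mean()       # mean reciprocal rank
    hits10 = (ranks[model] <= 10).mean()    # Hits@10
    print(f"{model}: MRR={mrr:.3f}, Hits@10={hits10:.3f}")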
All experiments (training and inference) have been run on Graphcore IPU hardware using the BESS-KGE distribution framework.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RELEASE V2.1.0 KNOWLEDGE GRAPH: ORIGINAL DATA SOURCES
Release: v2.1.0
The goal of this build was to create a knowledge graph that represented human disease mechanisms and included the central dogma. The data sources utilized in this release include many of the sources used in the initial release, as well as some new data made available by the Comparative Toxicogenomics Database and experimental data from the Human Protein Atlas.
Data sources are listed by type (Ontology and Data not represented in an ontology [Database Sources]). Additional details are provided for each data source below. Please see documentation on the primary release (https://github.com/callahantiff/PheKnowLator/wiki/v2-Data-Sources) for additional details on each data source as well as citation information.
Data Access:
ONTOLOGIES
Cell Ontology
Cell Line Ontology
Chemical Entities of Biological Interest (ChEBI) Ontology
Gene Ontology
Human Phenotype Ontology
Mondo Disease Ontology
Pathway Ontology
Protein Ontology
Relations Ontology
Sequence Ontology
Uber-Anatomy Ontology
Vaccine Ontology
Cell Ontology (CL)
Homepage: GitHub Citation:
Bard J, Rhee SY, Ashburner M. An ontology for cell types. Genome Biology. 2005;6(2):R21
Usage: Utilized to connect transcripts and proteins to cells. Additionally, the edges between this ontology and its dependencies are utilized:
ChEBI
GO
PATO
PRO
RO
UBERON
Cell Line Ontology (CLO)
Homepage: http://www.clo-ontology.org/ Citation:
Sarntivijai S, Lin Y, Xiang Z, Meehan TF, Diehl AD, Vempati UD, Schürer SC, Pang C, Malone J, Parkinson H, Liu Y. CLO: the cell line ontology. Journal of Biomedical Semantics. 2014;5(1):37
Usage: Utilized this ontology to map cell lines to transcripts and proteins. Additionally, the edges between this ontology and its dependencies are utilized:
CL
DOID
NCBITaxon
UBERON
Chemical Entities of Biological Interest (ChEBI)
Homepage: https://www.ebi.ac.uk/chebi/ Citation:
Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V, Turner S, Swainston N, Mendes P, Steinbeck C. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Research. 2015;44(D1):D1214-9
Usage: Utilized to connect chemicals to complexes, diseases, genes, GO biological processes, GO cellular components, GO molecular functions, pathways, phenotypes, reactions, and transcripts.
Gene Ontology (GO)
Homepage: http://geneontology.org/ Citations:
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA. Gene ontology: tool for the unification of biology. Nature Genetics. 2000;25(1):25
The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Research. 2018;47(D1):D330-8
Usage: Utilized to connect biological processes, cellular components, and molecular functions to chemicals, pathways, and proteins. Additionally, the edges between this ontology and its dependencies are utilized:
CL
NCBITaxon
RO
UBERON
Other Gene Ontology Data Used: goa_human.gaf.gz
Human Phenotype Ontology (HPO)
Homepage: https://hpo.jax.org/ Citation:
Köhler S, Carmody L, Vasilevsky N, Jacobsen JO, Danis D, Gourdine JP, Gargano M, Harris NL, Matentzoglu N, McMurry JA, Osumi-Sutherland D. Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Nucleic Acids Research. 2018;47(D1):D1018-27
Usage: Utilized to connect phenotypes to chemicals, diseases, genes, and variants. Additionally, the edges between this ontology and its dependencies are utilized:
CL
ChEBI
GO
UBERON
Files
Other Human Phenotype Ontology Data Used: phenotype.hpoa
Mondo Disease Ontology (Mondo)
Homepage: https://mondo.monarchinitiative.org/ Citation:
Mungall CJ, McMurry JA, Köhler S, Balhoff JP, Borromeo C, Brush M, Carbon S, Conlin T, Dunn N, Engelstad M, Foster E. The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Research. 2017;45(D1):D712-22
Usage: Utilized to connect diseases to chemicals, phenotypes, genes, and variants. Additionally, the edges between this ontology and its dependencies are utilized:
CL
NCBITaxon
GO
HPO
UBERON
Pathway Ontology (PW)
Homepage: rgd.mcw.edu Citation:
Petri V, Jayaraman P, Tutaj M, Hayman GT, Smith JR, De Pons J, Laulederkind SJ, Lowry TF, Nigam R, Wang SJ, Shimoyama M. The pathway ontology–updates and applications. Journal of Biomedical Semantics. 2014;5(1):7.
Usage: Utilized to connect pathways to GO biological processes, GO cellular components, GO molecular functions, Reactome pathways. Several steps are taken in order to connect Pathway Ontology identifiers to Reactome pathways and GO biological processes. To connect Pathway Ontology identifiers to Reactome pathways, we use ComPath Pathway Database Mappings developed by Daniel Domingo-Fernández (PMID:30564458).
Files
Downloaded Mapping Data
curated_mappings.txt
kegg_reactome.csv
Generated Mapping Data
REACTOME_PW_GO_MAPPINGS.txt
Protein Ontology (PRO)
Homepage: https://proconsortium.org/ Citation:
Natale DA, Arighi CN, Barker WC, Blake JA, Bult CJ, Caudy M, Drabkin HJ, D’Eustachio P, Evsikov AV, Huang H, Nchoutmboube J. The Protein Ontology: a structured representation of protein forms and complexes. Nucleic Acids Research. 2010;39(suppl_1):D539-45
Usage: Utilized to connect proteins to chemicals, genes, anatomy, catalysts, cell lines, cofactors, complexes, GO biological processes, GO cellular components, GO molecular functions, pathways, proteins, reactions, and transcripts. Additionally, the edges between this ontology and its dependencies are utilized:
ChEBI
DOID
GO
Notes: A partial, human-only version of this ontology was used. Details on how this version of the ontology was generated can be found under the Protein Ontology section of the Data_Preparation.ipynb Jupyter Notebook.
Files
Generated Human Version Protein Ontology (PRO)
human_pro.owl (closed with the HermiT reasoner)
Other PRO Data Used: promapping.txt
Generated Mapping Data
Merged Gene, RNA, Protein Map: Merged_gene_rna_protein_identifiers.pkl
Ensembl Transcript-PRO Identifier Mapping: ENSEMBL_TRANSCRIPT_PROTEIN_ONTOLOGY_MAP.txt
Entrez Gene-PRO Identifier Mapping: ENTREZ_GENE_PRO_ONTOLOGY_MAP.txt
UniProt Accession-PRO Identifier Mapping: UNIPROT_ACCESSION_PRO_ONTOLOGY_MAP.txt
STRING-PRO Identifier Mapping: STRING_PRO_ONTOLOGY_MAP.txt
Relations Ontology (RO)
Homepage: GitHub Citation:
Smith B, Ceusters W, Klagges B, Köhler J, Kumar A, Lomax J, Mungall C, Neuhaus F, Rector AL, Rosse C. Relations in biomedical ontologies. Genome Biology. 2005;6(5):R46.
Usage: Utilized to connect all data sources in the knowledge graph. Additionally, the ontology is queried prior to building the knowledge graph to identify all relations, their inverse properties, and their labels.
Files
Generated RO Data
INVERSE_RELATIONS.txt
RELATIONS_LABELS.txt
Sequence Ontology (SO)
Homepage: GitHub Citation:
Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biology. 2005;6(5):R44
Usage: Utilized to connect transcripts and other genomic material like genes and variants.
Files
Generated Mapping Data
genomic_sequence_ontology_mappings.xlsx
SO_GENE_TRANSCRIPT_VARIANT_TYPE_MAPPING.txt
Uber-Anatomy Ontology (Uberon)
Homepage: GitHub Citation:
Mungall CJ, Torniai C, Gkoutos GV, Lewis SE, Haendel MA. Uberon, an integrative multi-species anatomy ontology. Genome Biology. 2012;13(1):R5
Usage: Utilized to connect tissues, fluids, and cells to proteins and transcripts. Additionally, the edges between this ontology and its dependencies are utilized:
ChEBI
CL
GO
PRO
Vaccine Ontology (VO)
Homepage: http://www.violinet.org/vaccineontology/ Citations:
He Y, Racz R, Sayers S, Lin Y, Todd T, Hur J, Li X, Patel M, Zhao B, Chung M, Ostrow J. Updates on the web-based VIOLIN vaccine database and analysis system. Nucleic Acids Research. 2013;42(D1):D1124-32
Xiang Z, Todd T, Ku KP, Kovacic BL, Larson CB, Chen F, Hodges AP, Tian Y, Olenzek EA, Zhao B, Colby LA. VIOLIN: vaccine investigation and online information network. Nucleic Acids Research. 2007;36(suppl_1):D923-8
Usage: Utilized the edges between this ontology and its dependencies:
ChEBI
DOID
GO
PRO
UBERON
DATABASE SOURCES
BioPortal
ClinVar
Comparative Toxicogenomics Database
DisGeNET
Ensembl
GeneMANIA
Genotype-Tissue Expression Project
Human Genome Organisation Gene Nomenclature Committee
Human Protein Atlas
National Center for Biotechnology Information Gene
Reactome Pathway Database
Search Tool for Recurring Instances of Neighbouring Genes Database
Universal Protein Resource Knowledgebase
BioPortal
Homepage: BioPortal Citation:
BioPortal. Lexical OWL Ontology Matcher (LOOM)
Ghazvinian A, Noy NF, Musen MA. Creating mappings for ontologies in biomedicine: simple methods work. In AMIA Annual Symposium Proceedings 2009 (Vol. 2009, p. 198). American Medical Informatics Association
Usage: BioPortal was utilized to obtain mappings between MeSH identifiers and ChEBI identifiers for chemicals-diseases, chemicals-genes, chemical-GO biological processes, chemicals-GO cellular components, chemicals-GO molecular functions, chemicals-phenotypes, chemicals-proteins, and chemicals-transcripts. Additional information on how this data was processed can be obtained
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains the output of the PALMS run with male active initial population.
To reproduce the physical activity trajectory graph, please follow the steps below:
1. Run PALMS (DOI: 10.15161/oar.it/23467) with input parameters M-Active (DOI: 10.15161/oar.it/23477).
2. To run the simulation use the PALMS OAR Reproducibility container (DOI: 10.15161/oar.it/23494).
3. The run generates five CSV files. For this graph, take the 'SimYear' (column A) and 'Avg PA status' (column H) records from the "AnnualPSA" output file.
4. The data above will reproduce the M-Active graph.
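A minimal pandas sketch of step 3; the output file name is abbreviated in this description, so the name below is a placeholder (the same extraction applies to the F-Inactive run described further down):

import pandas as pd

# placeholder name for the truncated "AnnualPSA..." output file
df = pd.read_csv("AnnualPSA.csv")
trajectory = df[["SimYear", "Avg PA status"]]  # columns A and H
print(trajectory.head())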
The Caselaw Access Project makes 40 million pages of U.S. caselaw freely available online from the collections of Harvard Law School Library.
The CAP citation graph shows the connections between cases in the Caselaw Access Project dataset. You can use the citation graph to answer questions like "what is the most influential case?" and "what jurisdictions cite most often to this jurisdiction?".
Learn More: https://case.law/download/citation_graph/
Access Limits: https://case.law/api/#limits
This dataset includes citations and metadata for the CAP citation graph in CSV format.
The Caselaw Access Project is by the Library Innovation Lab at Harvard Law School Library.
People are using CAP data to create research, applications, and more. We're sharing examples in our gallery.
Cite Grid is the first visualization we've created based on data from our citation graph.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains the output of the PALMS run with female inactive initial population.
To reproduce the physical activity trajectory graph, please follow the steps below:
1. Run PALMS (DOI: 10.15161/oar.it/23467) with input parameters F-Inactive (DOI: 10.15161/oar.it/23471).
2. To run the simulation use the PALMS OAR Reproducibility container (DOI: 10.15161/oar.it/23494).
3. The run generates five CSV files. For this graph, take the 'SimYear' (column A) and 'Avg PA status' (column H) records from the "AnnualPSA" output file.
4. The data above will reproduce the F-Inactive graph.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description. The NetVote dataset contains the outputs of the NetVote program when applied to voting data coming from VoteWatch (http://www.votewatch.eu/).
These results were used in the following conference papers:
Source code. The NetVote source code is available on GitHub: https://github.com/CompNet/NetVotes.
Citation. If you use our dataset or tool, please cite article [1] above.
@InProceedings{Mendonca2015,
author = {Mendonça, Israel and Figueiredo, Rosa and Labatut, Vincent and Michelon, Philippe},
title = {Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the {E}uropean {P}arliament},
booktitle = {2\textsuperscript{nd} European Network Intelligence Conference ({ENIC})},
year = {2015},
pages = {122-129},
address = {Karlskrona, SE},
publisher = {IEEE Publishing},
doi = {10.1109/ENIC.2015.25},
}
-------------------------
Details. This archive contains the following folders:
-------------------------
License. These data are shared under a Creative Commons 0 license.
Contact. Vincent Labatut <vincent.labatut@univ-avignon.fr> & Rosa Figueiredo <rosa.figueiredo@univ-avignon.fr>