Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains bitcoin transfer transactions extracted from the Bitcoin Mainnet blockchain. Details of the dataset are given below:

FILENAME FORMAT: The filenames have the following format: btc-tx- . For example, the file btc-tx-100000-149999-aa.bz2 (and the rest of the parts, if any) contains transactions from block 100000 to block 149999 inclusive. The files are compressed with bzip2. They can be uncompressed using the command bunzip2.

TRANSACTION FORMAT: Each line in a file corresponds to a transaction.

BLOCK TIME FORMAT: The block time file has the following format:

IMPORTANT NOTE: Public Bitcoin Mainnet blockchain data is open and can be obtained by connecting as a node on the blockchain or by using block explorer web sites such as https://btcscan.org . The downloaders and users of this dataset accept full responsibility for using the data in a GDPR-compliant manner or in accordance with any other regulations. We provide the data as is and cannot be held responsible for anything.

NOTE: If you use this dataset, please do not forget to add the DOI number to the citation. If you use our dataset in your research, please also cite our paper: https://link.springer.com/chapter/10.1007/978-3-030-94590-9_14

@incollection{kilicc2022analyzing, title={Analyzing Large-Scale Blockchain Transaction Graphs for Fraudulent Activities}, author={K{\i}l{\i}{\c{c}}, Baran and {\"O}zturan, Can and {\c{S}}en, Alper}, booktitle={Big Data and Artificial Intelligence in Digital Finance}, pages={253--267}, year={2022}, publisher={Springer, Cham} }
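The part files can be consumed directly from Python without decompressing them to disk first. A minimal sketch, assuming the example part file above has been downloaded (the per-line transaction layout is the one described under TRANSACTION FORMAT):

```python
# Stream a compressed part file line by line; each line is one transaction.
import bz2

count = 0
with bz2.open("btc-tx-100000-149999-aa.bz2", mode="rt") as fh:
    for line in fh:
        count += 1
print(f"{count} transactions in this part file")
```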
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Wikipedia is the largest and most widely read free online encyclopedia in existence. As such, Wikipedia offers a large amount of data on all of its contents and the interactions around them, as well as different types of open data sources. This makes Wikipedia a unique data source that can be analyzed with quantitative data science techniques. However, the enormous amount of data makes it difficult to get an overview, and many of the analytical possibilities that Wikipedia offers remain unknown. To reduce the complexity of identifying and collecting data on Wikipedia and to expand its analytical potential, we have collected data from various sources, processed them, and generated a dedicated Wikipedia Knowledge Graph aimed at facilitating the analysis and contextualization of the activity and relations of Wikipedia pages, in this case limited to the English edition. We share this Knowledge Graph dataset openly, aiming for it to be useful to a wide range of researchers, such as informetricians, sociologists, and data scientists.
There are a total of 9 files, all in TSV format, built under a relational structure. The main file, which acts as the core of the dataset, is the page file; it is accompanied by 4 files with different entities related to the Wikipedia pages (the category, url, pub and page_property files) and 4 further files that act as "intermediate tables", making it possible to connect the pages both with those entities and with other pages (the page_category, page_url, page_pub and page_link files).
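As an illustration of the relational structure, the sketch below loads the core page file and one intermediate table with pandas; it assumes the files are named after the entities listed above (e.g. page.tsv, page_category.tsv) and that each file carries a header row, so the actual join keys can be read from the headers (see Dataset_summary for the authoritative description):

```python
import pandas as pd

# Load the core entity file and one "intermediate table".
page = pd.read_csv("page.tsv", sep="\t")
page_category = pd.read_csv("page_category.tsv", sep="\t")

# Inspect the headers to learn the relational keys before joining.
print(page.columns.tolist())
print(page_category.columns.tolist())
```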
The document Dataset_summary includes a detailed description of the dataset.
Thanks to Nees Jan van Eck and the Centre for Science and Technology Studies (CWTS) for the valuable comments and suggestions.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset supports RoboFUSE-GNN, an uncertainty-aware Graph Neural Network designed for real-time collaborative perception in dynamic factory environments. The data was collected from a multi-robot radar setup in a Cyber-Physical Production System (CPPS). Each sample represents a spatial-semantic radar graph, capturing inter-robot spatial relationships and temporal dependencies through a sliding window graph formulation.
[Figures on the dataset page: physical setup; layout_01 setup; layout_02 setup; layout_03 setup.]
Each sample in the dataset represents a radar graph snapshot composed of:
Nodes: Radar detections over a temporal window
Node features: Position, radar-specific attributes, and robot ID
Edges: Constructed using spatial proximity and inter-robot collaboration
Edge attributes: Relative motion, SNR, and temporal difference
Labels:
Node semantic classes (e.g., Robot, Workstation, Obstacle)
Edge labels indicating semantic similarity and collaboration type
RoboFUSE_Graphs/split/
├── scene_000/
│   ├── 000.pkl
│   ├── 001.pkl
│   └── scene_metadata.json
├── scene_001/
│   ├── ...
├── ...
└── scene_split_mapping.json
Each scene_XXX/ folder corresponds to a complete scenario and contains:
NNN.pkl: A Pickle file for the N-th graph frame
scene_metadata.json: Metadata including:
scene_name: Scenario identifier
scenario: Scenario Description
layout_name: Layout name (e.g., layout_01, layout_02, layout_03)
num_frames: Number of frames in the scene
frame_files: List of graph frame files
Each .pkl file contains a dictionary with the following:
Key | Description |
---|---|
x | Node features [num_nodes, 10] |
edge_index | Connectivity matrix [2, num_edges] |
edge_attr | Edge features [num_edges, 5] |
y | Semantic node labels |
edge_class | 0 or 1 (edge label based on class similarity & distance) |
node_offsets | Ground-truth regression to object center (used in clustering) |
cluster_node_idx | List of node indices per object cluster |
cluster_labels | Semantic class per cluster |
timestamp | Frame timestamp (float) |
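A minimal sketch for inspecting one graph frame, assuming the frame files are plain pickle files as described above (the concrete path is illustrative):

```python
import pickle

with open("RoboFUSE_Graphs/split/scene_000/000.pkl", "rb") as fh:
    frame = pickle.load(fh)

print(frame["x"].shape)           # node features [num_nodes, 10]
print(frame["edge_index"].shape)  # connectivity matrix [2, num_edges]
print(frame["edge_attr"].shape)   # edge features [num_edges, 5]
print(frame["timestamp"])         # frame timestamp (float)
```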
The following steps were involved in creating the dataset:
Preprocessing:
- Points are filtered using SNR, Z height, and arena bounds
- Normalized radar features include SNR, range, angle, velocity
Sliding Window Accumulation:
- Temporal fusion over a window W improves robustness
- Used to simulate persistence and reduce sparsity
Nodes:
- Construct node features xi = [x, y, z, ŝ, r̂, sin(ϕ̂), cos(ϕ̂), sin(θ̂), cos(θ̂), robotID]
- Label nodes using MoCap-ground-truth footprints.
Edges:
- Built using KNN
- Edge attributes eij = [Δx, Δy, Δz, ΔSNR, Δt]
- Edge Labels:
- 1 if nodes are of the same class and within a distance threshold
- Includes **intra-robot** and **inter-robot** collaborative edges
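An illustrative sketch of the edge construction step: KNN over node positions, with edge labels set to 1 when two nodes share a class and lie within a distance threshold. The parameter values (k, the threshold) are assumptions, not the values used for the dataset:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_edges(pos, labels, k=8, dist_thresh=0.5):
    """pos: [num_nodes, 3] positions; labels: [num_nodes] semantic classes."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(pos)
    dist, idx = nn.kneighbors(pos)             # column 0 is the node itself
    src = np.repeat(np.arange(len(pos)), k)
    dst = idx[:, 1:].reshape(-1)
    d = dist[:, 1:].reshape(-1)
    edge_index = np.stack([src, dst])          # [2, num_edges]
    # 1 if same class and within the distance threshold, else 0.
    edge_class = ((labels[src] == labels[dst]) & (d < dist_thresh)).astype(int)
    return edge_index, edge_class
```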
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
NASA Knowledge Graph Dataset
Dataset Summary
The NASA Knowledge Graph Dataset is an expansive graph-based dataset designed to integrate and interconnect information about satellite datasets, scientific publications, instruments, platforms, projects, data centers, and science keywords. This knowledge graph is particularly focused on datasets managed by NASA's Distributed Active Archive Centers (DAACs), which are NASA's data repositories responsible for archiving and… See the full description on the dataset page: https://huggingface.co/datasets/nasa-gesdisc/nasa-eo-knowledge-graph.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Unique identifier: https://doi.org/10.5281/zenodo.4718440. Dataset updated: Dec 19, 2022. Dataset provided by: Zenodo. Authors: Can Özturan; Alper Şen; Baran Kılıç. License: Attribution 4.0 (CC BY 4.0). License information was derived automatically.

Description: This dataset contains ether as well as popular ERC20 token transfer transactions extracted from the Ethereum Mainnet blockchain. Only send-ether, contract function call, and contract deployment transactions are present in the dataset. Miner rewards (static block rewards) and uncle block inclusion rewards are added as transactions to the dataset. Transaction fee rewards and uncle rewards are not currently included in the dataset. Details of the dataset are given below:

FILENAME FORMAT: The filenames have the following format: eth-tx- . For example, the file eth-tx-1000000-1099999.txt.bz2 contains transactions from block 1000000 to block 1099999 inclusive. The files are compressed with bzip2. They can be uncompressed using the command bunzip2.

TRANSACTION FORMAT: Each line in a file corresponds to a transaction. ERC20 token transfers (transfer and transferFrom function calls in an ERC20 contract) are indicated by token symbol; for example, GUSD is the Gemini USD stable coin. The JSON file erc20tokens.json given below contains the details of the ERC20 tokens. Failed transactions are prefixed with "F-".

BLOCK TIME FORMAT: The block time file has the following format:

erc20tokens.json FILE: This file contains the list of popular ERC20 token contracts whose transfer/transferFrom transactions appear in the data files. ERC20 token list: USDT TRYb XAUt BNB LEO LINK HT HEDG MKR CRO VEN INO PAX INB SNX REP MOF ZRX SXP OKB XIN OMG SAI HOT DAI EURS HPT BUSD USDC SUSD HDG QCAD PLUS BTCB WBTC cWBTC renBTC sBTC imBTC pBTC

IMPORTANT NOTE: Public Ethereum Mainnet blockchain data is open and can be obtained by connecting as a node on the blockchain or by using block explorer web sites such as http://etherscan.io . The downloaders and users of this dataset accept full responsibility for using the data in a GDPR-compliant manner or in accordance with any other regulations. We provide the data as is and cannot be held responsible for anything.

NOTE: If you use this dataset, please do not forget to add the DOI number to the citation. If you use our dataset in your research, please also cite our paper: https://link.springer.com/article/10.1007/s10586-021-03511-0

@article{kilic2022parallel, title={Parallel Analysis of Ethereum Blockchain Transaction Data using Cluster Computing}, journal={Cluster Computing}, author={K{\i}l{\i}{\c{c}}, Baran and {\"O}zturan, Can and Sen, Alper}, year={2022}, month={Jan} }
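A minimal sketch for streaming one of the compressed transaction files, using only the facts stated above (one transaction per line, failed transactions prefixed with "F-"); the filename is the example given in the record:

```python
import bz2

total = failed = 0
with bz2.open("eth-tx-1000000-1099999.txt.bz2", mode="rt") as fh:
    for line in fh:                 # one transaction per line
        total += 1
        if line.startswith("F-"):   # failed transactions are prefixed "F-"
            failed += 1
print(f"{failed} failed out of {total} transactions")
```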
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset used for the paper "A Recommender System of Buggy App Checkers for App Store Moderators", published at the International Conference on Mobile Software Engineering and Systems (MOBILESoft) in 2015.
Dataset Collection We built a dataset that consists of a random sample of Android app metadata and user reviews available on the Google Play Store in January and March 2014. Since the Google Play Store is continuously evolving (adding, removing and/or updating apps), we collected the dataset twice. The dataset D1 contains the apps available in the Google Play Store in January 2014. We then created a new snapshot (D2) of the Google Play Store in March 2014.
The apps belong to the 27 different categories defined by Google (at the time of writing the paper) and the 4 predefined subcategories (free, paid, new_free, and new_paid). For each category-subcategory pair (e.g. tools-free, tools-paid, sports-new_free, etc.), we collected a maximum of 500 samples, resulting in a median of 1,978 apps per category.
For each app, we retrieved the following metadata: name, package, creator, version code, version name, number of downloads, size, upload date, star rating, star counting, and the set of permission requests.
In addition, for each app, we collected up to a maximum of the latest 500 reviews posted by users in the Google Play Store. For each review, we retrieved its metadata: title, description, device, and version of the app. None of these fields were mandatory, so several reviews lack some of these details. From all the reviews attached to an app, we only considered the reviews associated with the latest version of the app, i.e., we discarded unversioned and old-versioned reviews. This resulted in a corpus of 1,402,717 reviews (Jan. 2014).
Dataset Stats Some stats about the datasets:
D1 (Jan. 2014) contains 38,781 apps requesting 7,826 different permissions, and 1,402,717 user reviews.
D2 (Mar. 2014) contains 46,644 apps and 9,319 different permission requests, and 1,361,319 user reviews.
Additional stats about the datasets are available here.
Dataset Description To store the dataset, we created a graph database with Neo4j. This dataset therefore consists of a graph describing the apps as nodes and edges. We chose a graph database because the graph visualization helps to identify connections among data (e.g., clusters of apps sharing similar sets of permission requests).
In particular, our dataset graph contains six types of nodes:
- APP nodes containing metadata of each app,
- PERMISSION nodes describing permission types,
- CATEGORY nodes describing app categories,
- SUBCATEGORY nodes describing app subcategories,
- USER_REVIEW nodes storing user reviews,
- TOPIC nodes describing topics mined from user reviews (using LDA).
Furthermore, there are five types of relationships between APP nodes and each of the remaining nodes:
Dataset Files Info
Neo4j 2.0 Databases
googlePlayDB1-Jan2014_neo4j_2_0.rar
googlePlayDB2-Mar2014_neo4j_2_0.rar We provide two Neo4j databases containing the 2 snapshots of the Google Play Store (January and March 2014). These are the original databases created for the paper. The databases were created with Neo4j 2.0, in particular with the tool version 'Neo4j 2.0.0-M06 Community Edition' (the latest version available at the time of implementing the paper in 2014).
Neo4j 3.5 Databases
googlePlayDB1-Jan2014_neo4j_3_5_28.rar
googlePlayDB2-Mar2014_neo4j_3_5_28.rar The version Neo4j 2.0 is now deprecated and no longer available for download in the official Neo4j Download Center. We have migrated the original databases (Neo4j 2.0) to Neo4j 3.5.28. The databases can be opened with the tool version 'Neo4j Community Edition 3.5.28', which can be downloaded from the official Neo4j Download page.
In order to open the databases with more recent versions of Neo4j, the databases must be first migrated to the corresponding version. Instructions about the migration process can be found in the Neo4j Migration Guide.
The first time the Neo4j database is connected, it may request credentials. The username and password are: neo4j/neo4j
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Net-Income Time Series for Nihon M&A Center Inc. Nihon M&A Center Holdings Inc. provides mergers and acquisition (M&A) related services in Japan and internationally. The company offers M&A support services, such as reorganization, capital policies, and MBO for small and medium-sized enterprises. Nihon M&A Center Holdings Inc. was incorporated in 1991 and is headquartered in Tokyo, Japan.
This metadata is meant to describe the roads layer of DLGs, but may contain information on other DLG layers. Digital line graph (DLG) data are digital representations of cartographic information. DLGs of map features are converted to digital form from maps and related sources. Large-scale DLG data are available from the West Virginia University GIS Technical Center in six categories: (1) hypsography, (2) hydrography, (3) boundaries, (4) miscellaneous transportation (pipe and transmission lines), (5) roads, (6) railroads. All DLG data distributed by the USGS are DLG - Level 3 (DLG-3), which means the data contain a full range of attribute codes, have full topological structuring, and have passed certain quality-control checks. The files available from the West Virginia University GIS Technical Center are full DLG-3 (USGS approved). Files are currently available in DLG and E00 format.
https://choosealicense.com/licenses/llama3.1/
Polymed-QA
Synthetically generated QA pairs from the Polymed dataset. Used to train Aloe-Beta model.
Dataset Details
Dataset Description
PolyMed is a dataset developed to improve Automatic Diagnosis Systems (ADS). This dataset incorporates medical knowledge graph data and diagnosis case data to provide comprehensive evaluation, diverse disease information, effective… See the full description on the dataset page: https://huggingface.co/datasets/HPAI-BSC/Polymed-QA.
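A minimal sketch for loading the QA pairs with the Hugging Face datasets library, assuming the repository is stored in a format load_dataset can read directly:

```python
from datasets import load_dataset

ds = load_dataset("HPAI-BSC/Polymed-QA")
print(ds)  # splits and features as defined on the dataset page
```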
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Construction

This dataset captures the temporal network of Bitcoin (BTC) flows exchanged between entities at the finest time resolution (UNIX timestamps). Its construction is based on the blockchain covering the period from January 3rd, 2009 to January 25th, 2021. The blockchain extraction was made using the bitcoin-etl (https://github.com/blockchain-etl/bitcoin-etl) Python package. The entity-entity network is built by aggregating Bitcoin addresses using the common-input heuristic [1] as well as popular Bitcoin users' addresses provided by https://www.walletexplorer.com/

[1] M. Harrigan and C. Fretter, "The Unreasonable Effectiveness of Address Clustering," 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, France, 2016, pp. 368-373, doi: 10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0071.

Dataset Description

Temporal Coverage: From 03 January 2009 to 25 January 2021

Overview: This dataset provides a comprehensive representation of Bitcoin exchanges between entities over a significant temporal span, from the inception of Bitcoin to recent years. It encompasses various temporal resolutions and representations to facilitate Bitcoin transaction network analysis in the context of temporal graphs. All dates have been derived from block UNIX timestamps in the GMT timezone.

Contents: The dataset is distributed across several compressed archives. All data are stored in the Apache Parquet file format, a columnar storage format optimized for analytical queries; it can be used with the pyspark Python package.

orbitaal-stream_graph.tar.gz: The root directory is STREAM_GRAPH/. Contains a stream graph representation of Bitcoin exchanges at the finest temporal scale, corresponding to the validation time of each block (averaging approximately 10 minutes). The stream graph is divided into 13 parquet files, one for each year, named orbitaal-stream_graph-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] is the corresponding year and [ID] is an integer from 1 to N (the number of files) such that sorting by increasing [ID] is the same as sorting by increasing year. These files are in the subdirectory STREAM_GRAPH/EDGES/.

orbitaal-snapshot-all.tar.gz: The root directory is SNAPSHOT/. Contains the snapshot network representing all transactions aggregated over the whole dataset period (from Jan. 2009 to Jan. 2021), as the parquet file orbitaal-snapshot-all.snappy.parquet in the subdirectory SNAPSHOT/EDGES/ALL/.

orbitaal-snapshot-year.tar.gz: The root directory is SNAPSHOT/. Contains the yearly resolution snapshot networks as parquet files named orbitaal-snapshot-date-[YYYY]-file-id-[ID].snappy.parquet, in the subdirectory SNAPSHOT/EDGES/year/.

orbitaal-snapshot-month.tar.gz: The root directory is SNAPSHOT/. Contains the monthly resolution snapshot networks as parquet files named orbitaal-snapshot-date-[YYYY]-[MM]-file-id-[ID].snappy.parquet, in the subdirectory SNAPSHOT/EDGES/month/.

orbitaal-snapshot-day.tar.gz: The root directory is SNAPSHOT/. Contains the daily resolution snapshot networks as parquet files named orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-file-id-[ID].snappy.parquet, in the subdirectory SNAPSHOT/EDGES/day/.

orbitaal-snapshot-hour.tar.gz: The root directory is SNAPSHOT/. Contains the hourly resolution snapshot networks as parquet files named orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-[hh]-file-id-[ID].snappy.parquet, in the subdirectory SNAPSHOT/EDGES/hour/.

orbitaal-nodetable.tar.gz: The root directory is NODE_TABLE/. Contains two parquet files: the first gives information related to nodes present in the stream graphs and snapshots, such as period of activity and associated global Bitcoin balance; the other contains the list of all associated Bitcoin addresses.

Small samples in CSV format: orbitaal-stream_graph-2016_07_08.csv and orbitaal-stream_graph-2016_07_09.csv. These two CSV files are related to stream graph representations of a halvening event happening in 2016.
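A minimal sketch for reading one of the snapshot parquet files with pyspark, as suggested above (the concrete file name is illustrative; parquet files can also be read with pandas/pyarrow):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orbitaal").getOrCreate()
edges = spark.read.parquet(
    "SNAPSHOT/EDGES/year/orbitaal-snapshot-date-2016-file-id-1.snappy.parquet"
)
edges.printSchema()    # inspect the column layout
print(edges.count())   # number of entity-entity edges in this snapshot
```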
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the metadata records about research products (research literature, data, software, other types of research products) with funding information available in the OpenAIRE Graph produced in July 2024. Records are grouped by funder in dedicated archive files (.tar).
fundRef contains the following funders:
100007490 Bausch and Lomb Ireland
100007630 College of Engineering and Informatics, National University of Ireland, Galway
100007731 Endo International
100007819 Allergan
100008099 Food Safety Authority of Ireland
100008124 Department of Jobs, Enterprise and Innovation
100008303 Department for Economics, Northern Ireland
100009098 Department of Foreign Affairs and Trade, Ireland
100009099 Irish Aid
100009770 National University of Ireland
100010399 European Society of Cataract and Refractive Surgeons
100010546 Deparment of Children and Youth Affairs, Ireland
100010547 Irish Youth Justice Service
100010993 Irish Nephrology Society
100011096 Jazz Pharmaceuticals
100011396 Irish College of General Practitioners
100012733 National Parks and Wildlife Service
100012734 Department for Culture, Heritage and the Gaeltacht, Ireland
100012754 Horizon Pharma
100012891 Medical Research Charities Group
100012919 Epilepsy Ireland
100012920 GLEN
100012921 Royal College of Surgeons in Ireland
100013029 Iris O'Brien Foundation
100013206 Food Institutional Research Measure
100013381 Irish Phytochemical Food Network
100013433 Transport Infrastructure Ireland
100013917 Society for Musicology in Ireland
100014251 Humanities in the European Research Area
100014364 National Children's Research Centre
100014384 Amarin Corporation
100014902 Irish Association for Cancer Research
100015023 Ireland Funds
100015278 Pfizer Healthcare Ireland
100015319 Sport Ireland Institute
100015442 Global Brain Health Institute
100015992 St. Luke's Institute of Cancer Research
100017144 Shell E and P Ireland
100017897 Friedreich’s Ataxia Research Alliance Ireland
100018064 Department of Tourism, Culture, Arts, Gaeltacht, Sport and Media
100018172 Department of the Environment, Climate and Communications
100018175 Dairy Processing Technology Centre
100018270 Health Service Executive
100018529 Alkermes
100018542 Irish Endocrine Society
100018754 An Roinn Sláinte
100019428 Nabriva Therapeutics
100019637 Horizon Therapeutics
100020174 Health Research Charities Ireland
100020202 UCD Foundation
100020233 Ireland Canada University Foundation
100022895 Health Research Institute, University of Limerick
100022943 National Cancer Registry Ireland
501100001581 Arts Council of Ireland
501100001582 Centre for Ageing Research and Development in Ireland
501100001583 Cystinosis Foundation Ireland
501100001584 Department of Agriculture, Food and the Marine, Ireland
501100001586 Department of Education and Skills, Ireland
501100001587 Economic and Social Research Institute
501100001588 Enterprise Ireland
501100001591 Heritage Council
501100001592 Higher Education Authority
501100001593 Irish Cancer Society
501100001594 Irish Heart Foundation
501100001595 Irish Hospice Foundation
501100001596 Irish Research Council for Science, Engineering and Technology
501100001598 Mental Health Commission
501100001599 National Council for Forest Research and Development
501100001600 Research and Education Foundation, Sligo General Hospital
501100001601 Royal Irish Academy
501100001603 Sustainable Energy Authority of Ireland
501100001604 Teagasc
501100001627 Marine Institute
501100001628 Central Remedial Clinic
501100001629 Royal Dublin Society
501100001630 Dublin Institute for Advanced Studies
501100001631 University College Dublin
501100001633 National University of Ireland, Maynooth
501100001634 University of Galway
501100001635 University of Limerick
501100001636 University College Cork
501100001637 Trinity College Dublin
501100001638 Dublin City University
501100002736 Covidien
501100002755 Brennan and Company
501100002919 Cork Institute of Technology
501100002959 Dublin City Council
501100003036 Perrigo Company Charitable Foundation
501100003037 Elan
501100003496 HeyStaks Technologies
501100003553 Gaelic Athletic Association
501100003840 Irish Institute of Clinical Neuroscience
501100003956 Aspect Medical Systems
501100004162 Meath Foundation
501100004210 Our Lady's Children's Hospital, Crumlin
501100004321 Shire
501100004981 Athlone Institute of Technology
501100006518 Department of Communications, Energy and Natural Resources, Ireland
501100006553 Collaborative Centre for Applied Nanotechnology
501100006554 IDA Ireland
501100006759 CLARITY Centre for Sensor Web Technologies
501100009246 Technological University Dublin
501100009269 Programme of Competitive Forestry Research for Development
501100009315 Cystinosis Ireland
501100010808 Geological Survey of Ireland
501100011030 Alimentary Glycoscience Research Cluster
501100011031 Alimentary Health
501100011103 Rannís
501100011626 Energy Policy Research Centre, Economic and Social Research Institute
501100012354 Inland Fisheries Ireland
501100014384 X-Bolt Orthopaedics
501100014531 Physical Education and Sport Sciences Department, University of Limerick
501100014710 PrecisionBiotics Group
501100014745 APC Microbiome Institute
501100014826 ADAPT - Centre for Digital Content Technology
501100014827 Dormant Accounts Fund
501100017501 FotoNation
501100018641 Dairy Research Ireland
501100018839 Irish Centre for High-End Computing
501100019905 Galway University Foundation
501100020270 Advanced Materials and Bioengineering Research
501100020403 Irish Composites Centre
501100020425 Irish Thoracic Society
501100020570 College of Medicine, Nursing and Health Sciences, National University of Ireland, Galway
501100020871 Bernal Institute, University of Limerick
501100021102 Waterford Institute of Technology
501100021110 Irish MPS Society
501100021525 Insight SFI Research Centre for Data Analytics
501100021694 Elan Pharma International
501100021838 Royal College of Physicians of Ireland
501100022542 Breakthrough Cancer Research
501100022610 Breast Cancer Ireland
501100022728 Munster Technological University
501100023273 HRB Clinical Research Facility Galway
501100023551 Cystic Fibrosis Ireland
501100023970 Tyndall National Institute
501100024242 Synthesis and Solid State Pharmaceutical Centre
501100024313 Irish Rugby Football Union
501100024834 Tusla - Child and Family Agency
AKA Academy of Finland
ANR French National Research Agency (ANR)
ARC Australian Research Council (ARC)
ASAP Aligning Science Across Parkinson's
CHISTERA CHIST-ERA
CIHR Canadian Institutes of Health Research
EC_ERASMUS+ European Commission - Erasmus+ funding stream
EC_FP7 European Commission - FP7 funding stream
EC_H2020 European Commission - H2020 funding stream
EC_HE European Commission - HE funding stream
EEA European Environment Agency
EPA Environmental Protection Agency
FCT Fundação para a Ciência e a Tecnologia, I.P.
FWF Austrian Science Fund
HRB Health Research Board
HRZZ Croatian Science Foundation
INCA Institut National du Cancer
IRC Irish Research Council
IReL Irish Research eLibrary
MESTD Ministry of Education, Science and Technological Development of Republic of Serbia
MZOS TOADDNAME
NHMRC National Health and Medical Research Council (NHMRC)
NIH National Institutes of Health
NSERC Natural Sciences and Engineering Research Council of Canada
NSF National Science Foundation
NWO Netherlands Organisation for Scientific Research (NWO)
SFI Science Foundation Ireland
SNSF Swiss National Science Foundation
SSHRC Social Sciences and Humanities Research Council
TARA Tara Expeditions Foundation
TIBITAK Türkiye Bilimsel ve Teknolojik Araştırma Kurumu
UKRI UK Research and Innovation
WT Wellcome Trust
Each tar archive contains gzip files with one JSON record per line. JSON records are compliant with the schema available at https://doi.org/10.5281/zenodo.14608710.
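A minimal sketch for streaming the JSON records of one funder archive without fully extracting it; the archive name is hypothetical, and the "id" field is an assumption (see the schema linked above for the actual field names):

```python
import gzip
import json
import tarfile

with tarfile.open("SFI.tar") as tar:          # hypothetical funder archive
    for member in tar:
        if not member.isfile():
            continue
        with gzip.open(tar.extractfile(member), mode="rt", encoding="utf-8") as fh:
            for line in fh:
                record = json.loads(line)     # one JSON record per line
                print(record.get("id"))       # field name is an assumption
```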
You can also search and browse this dataset (and more) in the OpenAIRE EXPLORE portal and via the OpenAIRE API.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Business process event data modeled as labeled property graphs
Data Format
-----------
The dataset comprises one labeled property graph in two different file formats.
#1) Neo4j .dump format
A neo4j (https://neo4j.com) database dump that contains the entire graph and can be imported into a fresh neo4j database instance using the following command, see also the neo4j documentation: https://neo4j.com/docs/
/bin/neo4j-admin.(bat|sh) load --database=graph.db --from=
The .dump was created with Neo4j v3.5.
#2) .graphml format
A .zip file containing a .graphml file of the entire graph
Data Schema
-----------
The graph is a labeled property graph over business process event data. Each graph uses the following concepts
:Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute "Activity" that occurred at the given "timestamp"
:Entity nodes - each entity node describes an entity (e.g., an object or a user), it has an EntityType and an identifier (attribute "ID")
:Log nodes - describes a collection of events that were recorded together, most graphs only contain one log node
:Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed, :Class nodes group events into sets of identical observations
:CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities
:DF relationships - "directly-followed by" between two :Event nodes describes which event is directly-followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.
:HAS relationship - from a :Log to an :Event node, describes which events had been recorded in which event log
:OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph
:REL relationship - placeholder for any structural relationship between two :Entity nodes
The concepts are further defined in: Stefan Esser, Dirk Fahland. Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020). https://arxiv.org/abs/2005.14552
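As an illustration of how this schema can be queried once the graph is loaded into Neo4j, the sketch below counts events per entity type via the :CORR relationships; the connection URI and credentials are assumptions for a local instance:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "neo4j"))
with driver.session() as session:
    result = session.run(
        "MATCH (e:Event)-[:CORR]->(n:Entity) "
        "RETURN n.EntityType AS type, count(e) AS events "
        "ORDER BY events DESC"
    )
    for record in result:
        print(record["type"], record["events"])
driver.close()
```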
Data Contents
-------------
neo4j-bpic16-2021-02-17 (.dump|.graphml.zip)
An integrated graph describing the raw event data of the entire BPI Challenge 2016 dataset.
Dees, Marcus; van Dongen, B.F. (Boudewijn) (2016): BPI Challenge 2016. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:360795c8-1dd6-4a5b-a443-185001076eab
UWV (Employee Insurance Agency) is an autonomous administrative
authority (ZBO) and is commissioned by the Ministry of Social Affairs
and Employment (SZW) to implement employee insurances and provide labour
market and data services in the Netherlands. The Dutch employee
insurances are provided for via laws such as the WW (Unemployment
Insurance Act), the WIA (Work and Income according to Labour Capacity
Act, which contains the IVA (Full Invalidity Benefit Regulations), WGA
(Return to Work (Partially Disabled) Regulations), the Wajong
(Disablement Assistance Act for Handicapped Young Persons), the WAO
(Invalidity Insurance Act), the WAZ (Self-employed Persons Disablement
Benefits Act), the Wazo (Work and Care Act) and the Sickness Benefits
Act. The data in this collection pertains to customer contacts over a
period of 8 months and UWV is looking for insights into their customers'
journeys. Data has been collected from several different sources,
namely: 1) Clickdata from the site www.werk.nl collected from visitors
that were not logged in, 2) Clickdata from the customer specific part of
the site www.werk.nl (a link is made with the customer that logged in),
3) Werkmap Message data, showing when customers contacted the UWV
through a digital channel, 4) Call data from the callcenter, showing
when customers contacted the call center by phone, and 5) Complaint data
showing when customers complained. All data is accompanied by data
fields with anonymized information about the customer as well as data
about the site visited or the contents of the call and/or complaint. The
texts in the dataset are provided in both Dutch and English where
applicable. URL's are included based on the structure of the site during
the period the data has been collected. UWV is interested in insights
on how their channels are being used, when customers move from one
contact channel to the next and why and if there are clear customer
profiles to be identified in the behavioral data. Furthermore,
recommendations are sought on how to serve customers without the need to
change the contact channel.
The data contains the following entities and their events
- Customer - customer of a Dutch public agency for handling unemployment benefits
- Office_U - user or worker involved in an activity handling a customer interaction
- Office_W - user or worker involved in an activity handling a customer interaction
- Complaint - a complaint document handed in by a customer
- ComplaintDossier - a collection of complaints by the same customer
- Session - browser-session identifier of a user browsing the website of the agency
- IP - IP address of a user browsing the website of the agency
Data Size
---------
BPIC16, nodes: 8109680, relationships: 86833139
This dataset captures a statistical analysis of the HCHS cohort study using a knowledge graph and dashboard. Properties of 10,000 participants were analyzed for their association with cardiovascular disease as well as for their relationships with each other. The data is presented in the form of Neo4j database dumps and can be explored following the given user guide.
https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy
The Graph Database Market size was valued at USD 1.9 billion in 2023 and is projected to reach USD 7.91 billion by 2032, exhibiting a CAGR of 22.6% during the forecast period. A graph database is a form of NoSQL database that stores and represents relationships as graphs. Rather than modeling data as relations, as most contemporary relational databases do, graph databases apply nodes, edges, and properties. The primary types include property graphs, which permit attributes on nodes and edges, and RDF triplestores, which center on subject-predicate-object triples. Notable features include the ability to traverse relationships at high rates, easy schema change, and scalability. Familiar use cases are social media, recommendations, anomaly or fraud detection, and knowledge graphs, where the relationships are complex and require deeper comprehension. These databases are considered valuable where the connections between items of data are as significant as the data themselves. Key drivers for this market are: Increasing Adoption of Cloud-based Managed Services to Drive Market Growth.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Users can customize tables, graphs and maps on data related to children in a specific state or in the United States as a whole. Comparisons can be made between states. Background: KIDS COUNT Data Center is part of the Annie E. Casey Foundation and provides information on the status of children in America. The ten core indicators of interest under "Data by State" are: percent of low birth weight babies, infant mortality rate, child death rate, rate of teen deaths by accident, suicide and homicide, teen birth rate, percent of children living with parents who do not have full-time year-round employment, percent of teens who are high school dropouts, percent of teens not working and not in school, percent of children in poverty, and percent of families with children headed by a single parent. A number of other indicators, plus demographic and income information, are also included. "Data Across States" is grouped into the following broad categories: demographics, education, economic well-being, family and community, health, safety and risk behaviors, and other. User Functionality: Users can choose the view of the data (table, line graph, or map) and can print or email the results. Data is available by state and across states; "Data Across States" allows users to access the raw data. Data is often available over a number of years. For a number of indicators under "Data Across States," users can view results by age, gender/sex, or race/ethnicity. Data Notes: KIDS COUNT started in 1990. The most recent year of data is 2009 (or 2008 depending on the state, with some data available from 2010). Data is available at the national and state level and, for some states, at the county and city level.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A biodiversity dataset graph: UCSB-IZC
The intended use of this archive is to facilitate (meta-)analysis of the UC Santa Barbara Invertebrate Zoology Collection (UCSB-IZC). UCSB-IZC is a natural history collection of invertebrate zoology at the Cheadle Center for Biodiversity and Ecological Restoration, University of California Santa Barbara.
This dataset provides versioned snapshots of the UCSB-IZC network as tracked by Preston [2,3] between 2021-10-08 and 2021-11-04 using [preston track "https://api.gbif.org/v1/occurrence/search/?datasetKey=d6097f75-f99e-4c2a-b8a5-b0fc213ecbd0"].
This archive contains 14349 images related to 32533 occurrence/specimen records. See the included sample-image.jpg and its associated meta-data sample-image.json [4].
The images were counted using:
$ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c\
| grep -o -P ".*depict"\
| sort\
| uniq\
| wc -l
And the occurrences were counted using:
$ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c\
| grep -o -P "occurrence/([0-9])+"\
| sort\
| uniq\
| wc -l
The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance files and data files. Only two index and provenance files are included, and these have also been individually included in this dataset publication. Index files provide a way to link provenance files in time to establish a versioning mechanism.
To retrieve and verify the downloaded UCSB-IZC biodiversity dataset graph, first download preston-*.tar.gz. Then, extract the archives into a "data" folder. Alternatively, you can use the Preston [2,3] command-line tool to "clone" this dataset using:
$ java -jar preston.jar clone --remote https://archive.org/download/preston-ucsb-izc/data.zip/,https://zenodo.org/record/5557670/files,https://zenodo.org/record/5660088/files/
After that, verify the index of the archive by reproducing the following provenance log history:
$ java -jar preston.jar history
To check the integrity of the extracted archive, confirm that the command "preston verify" produces lines as shown below, with each line including "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.
$ java -jar preston.jar verify
hash://sha256/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c file:/home/jhpoelen/ucsb-izc/data/ce/1d/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c OK CONTENT_PRESENT_VALID_HASH 66438 hash://sha256/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c
hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 file:/home/jhpoelen/ucsb-izc/data/f6/8d/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 OK CONTENT_PRESENT_VALID_HASH 4093 hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844
hash://sha256/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef file:/home/jhpoelen/ucsb-izc/data/3e/70/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef OK CONTENT_PRESENT_VALID_HASH 5746 hash://sha256/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef
hash://sha256/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b file:/home/jhpoelen/ucsb-izc/data/99/58/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b OK CONTENT_PRESENT_VALID_HASH 6147 hash://sha256/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b
Note that a copy of the Java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or in short "preston".
Files in this data publication:
--- start of file descriptions ---
-- description of archive and its contents (this file) --
README
-- executable java jar containing preston [2,3] v0.3.1. --
preston.jar
-- preston archive containing UCSB-IZC (meta-)data/image files, associated provenance logs and a provenance index --
preston-[00-ff].tar.gz
-- individual provenance index files --
2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a
-- example image and meta-data --
sample-image.jpg (with hash://sha256/916ba5dc6ad37a3c16634e1a0e3d2a09969f2527bb207220e3dbdbcf4d6b810c)
sample-image.json (with hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844)
--- end of file descriptions ---
References
[1] Cheadle Center for Biodiversity and Ecological Restoration (2021). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset https://doi.org/10.15468/w6hvhv accessed via GBIF.org on 2021-11-04 as indexed by the Global Biodiversity Informatics Facility (GBIF) with provenance hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36 hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c.
[2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 .
[3] MJ Elliott, JH Poelen, JAB Fortes (2020). Toward Reliable Biodiversity Dataset References. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2020.101132
[4] Cheadle Center for Biodiversity and Ecological Restoration (2021). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset https://doi.org/10.15468/w6hvhv accessed via GBIF.org on 2021-10-08. https://www.gbif.org/occurrence/3323647301 . hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 hash://sha256/916ba5dc6ad37a3c16634e1a0e3d2a09969f2527bb207220e3dbdbcf4d6b810c
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Semi-structured ENSDF text data were retrieved from the National Nuclear Data Center using web crawlers, and the obtained ENSDF data were cleaned. The cleaned ENSDF data were then parsed using the Nuclei tool to generate Decay Scheme image data. A total of 18186 entities and 18983 entity-pair relationships were annotated using roLabelImg and self-made tools. The dataset is divided into a test set and a training set.
Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Dataset general description:
• This dataset reports 4195 recurrent neural network models, their settings, and their generated prediction CSV files, graphs, and metadata files, for predicting COVID-19's daily infections in Brazil by training on limited raw data (30 time-steps and 40 time-steps alternatives). The code used was developed by the author and is located in the following online data repository: http://dx.doi.org/10.17632/yp4d95pk7n.1
Dataset content:
• Models, Graphs, and CSV prediction files:
1. Deterministic mode (DM): includes 1194 generated model files (30 time-steps), and their generated 2835 graphs and 2835 prediction files. Similarly, this mode includes 1976 generated model files (40 time-steps), and their generated 7301 graphs and 7301 prediction files.
2. Non-deterministic mode (NDM): includes 20 generated model files (30 time-steps), and their generated 53 graphs and 53 prediction files.
3. Technical validation mode (TVM): includes 1001 generated model files (30 time-steps), and their generated 3619 graphs and 3619 prediction files for 358 models, which are a sample of the 1001 models. Also, 1 model in a control group for India.
4. 1 graph and 1 prediction file for each of DM and NDM, reporting evaluation till 2020-07-11.
• Settings and metadata for the above 3 categories:
1. Used settings in JSON files for reproducibility.
2. Metadata about training and prediction setup and accuracy in CSV files.
Raw data source that was used to train the models:
• The raw data used for training the models is from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University: https://github.com/CSSEGISandData/COVID-19
• The models were trained on these versions of the raw data:
1. Link till 2020-06-29 (accessed 2020-07-08): https://github.com/CSSEGISandData/COVID-19/raw/78d91b2dbc2a26eb2b2101fa499c6798aa22fca8/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
2. Link till 2020-06-13 (accessed 2020-07-08): https://github.com/CSSEGISandData/COVID-19/raw/02ea750a263f6d8b8945fdd3253b35d3fd9b1bee/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
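A minimal sketch for turning the referenced cumulative time series into daily infections with pandas (the column layout is the standard JHU CSSE time-series format: metadata columns followed by one column per date):

```python
import pandas as pd

url = ("https://github.com/CSSEGISandData/COVID-19/raw/"
       "78d91b2dbc2a26eb2b2101fa499c6798aa22fca8/csse_covid_19_data/"
       "csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")
df = pd.read_csv(url)
brazil = df[df["Country/Region"] == "Brazil"]
# First difference of the cumulative counts gives daily infections.
daily = brazil.iloc[:, 4:].sum(axis=0).diff().fillna(0)
print(daily.tail())
```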
License: This prediction Dataset is licensed under CC BY NC 3.0.
Notice and disclaimer:
1. This prediction Dataset is for scientific and research purposes only.
2. The generation of this Dataset complies with the terms of use of the publicly available raw data from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University: https://github.com/CSSEGISandData/COVID-19 and, therefore, the author of the prediction Dataset disclaims any and all responsibility and warranties regarding the contents of the used raw data, including but not limited to: the correctness, completeness, and any issues linked to third-party rights.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Knowledge Graph Construction Workshop 2023: challenge
Knowledge graph construction from heterogeneous data has seen a lot of uptake in the last decade, from compliance to performance optimizations with respect to execution time. However, beyond execution time, other metrics for comparing knowledge graph construction, e.g. CPU or memory usage, are usually not considered. This challenge aims at benchmarking systems to find which RDF graph construction system optimizes for metrics such as execution time, CPU, memory usage, or a combination of these metrics.
Task description
The task is to reduce and report the execution time and computing resources (CPU and memory usage) for the parameters listed in this challenge, compared to the state of the art of existing tools and the baseline results provided by this challenge. The challenge is not only about execution time, to create the fastest pipeline, but also about computing resources, to achieve the most efficient pipeline.
We provide a tool which can execute such pipelines end-to-end. This tool also collects and aggregates the metrics necessary for this challenge, such as execution time, CPU and memory usage, as CSV files. Moreover, information about the hardware used during the execution of the pipeline is available as well, to allow fairly comparing different pipelines. Your pipeline should consist of Docker images which can be executed on Linux by the tool. The tool has already been tested with existing systems, relational databases (e.g. MySQL and PostgreSQL), and triplestores (e.g. Apache Jena Fuseki and OpenLink Virtuoso), which can be combined in any configuration. It is strongly encouraged to use this tool for participating in this challenge. If you prefer to use a different tool, or our tool imposes technical requirements you cannot solve, please contact us directly.
Part 1: Knowledge Graph Construction Parameters
These parameters are evaluated using synthetic generated data to have more
insights of their influence on the pipeline.
Data
Mappings
Part 2: GTFS-Madrid-Bench
The GTFS-Madrid-Bench provides insights in the pipeline with real data from the
public transport domain in Madrid.
Scaling
Heterogeneity
Example pipeline
The ground truth dataset and baseline results are generated in different steps
for each parameter:
The pipeline is executed 5 times, and the median execution time of each step is calculated. Each step with the median execution time is then reported in the baseline results with all its measured metrics.
Query timeout is set to 1 hour and knowledge graph construction timeout
to 24 hours. The execution is performed with the following tool
Each parameter has its own directory in the ground truth dataset with the
following files:
metadata.json
Datasets
Knowledge Graph Construction Parameters
The dataset consists of:
Format
All input datasets are provided as CSV, depending on the parameter that is being
evaluated, the number of rows and columns may differ. The first row is always
the header of the CSV.
GTFS-Madrid-Bench
The dataset consists of:
Format
CSV datasets always have a header as their first row.
JSON and XML datasets have their own schema.
Evaluation criteria
Submissions must evaluate the following metrics:
Expected output
Duplicate values
Scale | Number of Triples |
---|---|
0 percent | 2000000 triples |
25 percent | 1500020 triples |
50 percent | 1000020 triples |
75 percent | 500020 triples |
100 percent | 20 triples |
Empty values
Scale | Number of Triples |
---|---|
0 percent | 2000000 triples |
25 percent | 1500000 triples |
50 percent | 1000000 triples |
75 percent | 500000 triples |
100 percent | 0 triples |
Mappings
Scale | Number of Triples |
---|---|
1TM + 15POM | 1500000 triples |
3TM + 5POM | 1500000 triples |
5TM + 3POM | 1500000 triples |
15TM + 1POM | 1500000 triples |
Properties
Scale | Number of Triples |
---|---|
1M rows 1 column | 1000000 triples |
1M rows 10 columns | 10000000 triples |
1M rows 20 columns | 20000000 triples |
1M rows 30 columns | 30000000 triples |
Records
Scale | Number of Triples |
---|---|
10K rows 20 columns | 200000 triples |
100K rows 20 columns | 2000000 triples |
1M rows 20 columns | 20000000 triples |
10M rows 20 columns | 200000000 triples |
Joins
1-1 joins
Scale | Number of Triples |
---|---|
0 percent | 0 |
(Link to Metadata) The WaterHydro_DLGSW layer represents surface waters (hydrography) at a scale of RF 100000. WaterHydro_DLGSW was derived from RF 100000 USGS Digital Line Graph (DLG) data. DLGs of map features are converted to digital form from maps and related sources. Refer to the USGS web site for more information on DLGs (http://www.usgs.gov).