100+ datasets found
  1. Transaction Graph Dataset for the Bitcoin Blockchain - Part 2 of 4 - Dataset...

    • cryptodata.center
    Updated Dec 4, 2024
    + more versions
    Cite
    cryptodata.center (2024). Transaction Graph Dataset for the Bitcoin Blockchain - Part 2 of 4 - Dataset - CryptoData Hub [Dataset]. https://cryptodata.center/dataset/transaction-graph-dataset-for-the-bitcoin-blockchain-part-2-of-4
    Explore at:
    Dataset updated
    Dec 4, 2024
    Dataset provided by
    CryptoDATA
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains bitcoin transfer transactions extracted from the Bitcoin Mainnet blockchain. Details of the dataset are given below:

    FILENAME FORMAT: The filenames have the following format: btc-tx- where For example, file btc-tx-100000-149999-aa.bz2 and the rest of the parts, if any, contain transactions from block 100000 to block 149999 inclusive. The files are compressed with bzip2. They can be uncompressed using the command bunzip2.

    TRANSACTION FORMAT: Each line in a file corresponds to a transaction. The transaction has the following format:

    BLOCK TIME FORMAT: The block time file has the following format:

    IMPORTANT NOTE: Public Bitcoin Mainnet blockchain data is open and can be obtained by connecting as a node on the blockchain or by using block explorer web sites such as https://btcscan.org . Downloaders and users of this dataset accept full responsibility for using the data in a GDPR-compliant manner and in accordance with any other regulations. We provide the data as is and we cannot be held responsible for anything.

    NOTE: If you use this dataset, please do not forget to add the DOI number to the citation. If you use our dataset in your research, please also cite our paper: https://link.springer.com/chapter/10.1007/978-3-030-94590-9_14

    @incollection{kilicc2022analyzing, title={Analyzing Large-Scale Blockchain Transaction Graphs for Fraudulent Activities}, author={K{\i}l{\i}{\c{c}}, Baran and {\"O}zturan, Can and {\c{S}}en, Alper}, booktitle={Big Data and Artificial Intelligence in Digital Finance}, pages={253--267}, year={2022}, publisher={Springer, Cham}}
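    The exact per-line transaction layout is elided above, but the bzip2-compressed dumps can be streamed without decompressing them to disk first. A minimal sketch using Python's standard-library bz2 module (iter_transactions is a hypothetical helper name; each line is yielded verbatim since the field layout is not reproduced here):

```python
import bz2

def iter_transactions(path):
    """Stream lines (one transaction per line) from a bzip2-compressed dump."""
    with bz2.open(path, mode="rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line:  # skip blank lines
                yield line
```

    The same approach applies to every btc-tx-*.bz2 part file; bz2.open decompresses transparently while reading.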

  2. Wikipedia Knowledge Graph dataset

    • zenodo.org
    • produccioncientifica.ugr.es
    • +1more
    pdf, tsv
    Updated Jul 17, 2024
    Cite
    Wenceslao Arroyo-Machado; Daniel Torres-Salinas; Rodrigo Costas (2024). Wikipedia Knowledge Graph dataset [Dataset]. http://doi.org/10.5281/zenodo.6346900
    Explore at:
    tsv, pdf (available download formats)
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Wenceslao Arroyo-Machado; Daniel Torres-Salinas; Rodrigo Costas
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Wikipedia is the largest and most widely read free online encyclopedia in existence. As such, Wikipedia offers a large amount of data on all of its contents and the interactions around them, as well as different types of open data sources. This makes Wikipedia a unique data source that can be analyzed with quantitative data science techniques. However, the enormous amount of data makes it difficult to get an overview, and many of the analytical possibilities that Wikipedia offers remain unknown. In order to reduce the complexity of identifying and collecting data on Wikipedia and to expand its analytical potential, after collecting data from various sources and processing it, we have generated a dedicated Wikipedia Knowledge Graph aimed at facilitating the analysis and contextualization of the activity and relations of Wikipedia pages, in this case limited to its English edition. We share this Knowledge Graph dataset openly, aiming for it to be useful to a wide range of researchers, such as informetricians, sociologists, or data scientists.

    There are a total of 9 files, all in tsv format, built under a relational structure. The main file, which acts as the core of the dataset, is the page file; alongside it there are 4 files with different entities related to the Wikipedia pages (the category, url, pub and page_property files) and 4 other files that act as "intermediate tables", making it possible to connect the pages both with the former and with each other (the page_category, page_url, page_pub and page_link files).

    The document Dataset_summary includes a detailed description of the dataset.

    Thanks to Nees Jan van Eck and the Centre for Science and Technology Studies (CWTS) for the valuable comments and suggestions.

  3. RoboFUSE-GNN-Dataset

    • kaggle.com
    Updated May 21, 2025
    Cite
    Muhammad Asfandyar Khan (2025). RoboFUSE-GNN-Dataset [Dataset]. https://www.kaggle.com/datasets/asfand59/robofuse-gnn-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Muhammad Asfandyar Khan
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    🚀 Project Summary

    This dataset supports RoboFUSE-GNN, an uncertainty-aware Graph Neural Network designed for real-time collaborative perception in dynamic factory environments. The data was collected from a multi-robot radar setup in a Cyber-Physical Production System (CPPS). Each sample represents a spatial-semantic radar graph, capturing inter-robot spatial relationships and temporal dependencies through a sliding window graph formulation.

    Scenario

    [Figures: physical setup photo, plus setup diagrams for layout_01, layout_02, and layout_03 (images hosted on Kaggle user content).]

    📚 Dataset Description

    Each sample in the dataset represents a radar graph snapshot composed of:

    Nodes: Radar detections over a temporal window

    Node features: Position, radar-specific attributes, and robot ID

    Edges: Constructed using spatial proximity and inter-robot collaboration

    Edge attributes: Relative motion, SNR, and temporal difference

    Labels:

    Node semantic classes (e.g., Robot, Workstation, Obstacle)

    Edge labels indicating semantic similarity and collaboration type

    📁 Folder Structure

    RoboFUSE_Graphs/split/
    ├── scene_000/
    │   ├── 000.pt
    │   ├── 001.pt
    │   └── scene_metadata.json
    ├── scene_001/
    │   ├── ...
    ├── ...
    └── scene_split_mapping.json

    Each scene_XXX/ folder corresponds to a complete scenario and contains:

    NNN.pkl: A Pickle file for the N-th graph frame

    scene_metadata.json: Metadata including:

    scene_name: Scenario identifier

    scenario: Scenario Description

    layout_name: Layout name (e.g., layout_01, layout_02, layout_03)

    num_frames: Number of frames in the scene

    frame_files: List of graph frame files

    🧠 Graph Details

    Each .pkl file contains a dictionary with the following:

    Key / Description
    x: Node features [num_nodes, 10]
    edge_index: Connectivity matrix [2, num_edges]
    edge_attr: Edge features [num_edges, 5]
    y: Semantic node labels
    edge_class: 0 or 1 (edge label based on class similarity & distance)
    node_offsets: Ground-truth regression to object center (used in clustering)
    cluster_node_idx: List of node indices per object cluster
    cluster_labels: Semantic class per cluster
    timestamp: Frame timestamp (float)
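    A minimal sketch for loading and sanity-checking one frame, assuming the .pkl files are plain pickles of the dictionary described above (load_graph_frame is a hypothetical helper name; frames that store torch tensors would additionally require torch to be importable at unpickling time):

```python
import pickle

# Keys documented for each graph frame (see the key/description list above).
EXPECTED_KEYS = {
    "x", "edge_index", "edge_attr", "y", "edge_class",
    "node_offsets", "cluster_node_idx", "cluster_labels", "timestamp",
}

def load_graph_frame(path):
    """Load one graph frame and verify it exposes the documented keys."""
    with open(path, "rb") as fh:
        graph = pickle.load(fh)
    missing = EXPECTED_KEYS - set(graph)
    if missing:
        raise KeyError(f"frame is missing keys: {sorted(missing)}")
    return graph
```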

    🔧 Graph Construction Pipeline

    The following steps were involved in creating the dataset:

    1. Preprocessing:

      - Points are filtered using SNR, Z height, and arena bounds
      - Normalized radar features include SNR, range, angle, velocity
      
    2. Sliding Window Accumulation:

      - Temporal fusion over a window W improves robustness
      - Used to simulate persistence and reduce sparsity
      
    3. Nodes:

      - Construct node features xi = [x, y, z, ŝ, r̂, sin(ϕ̂), cos(ϕ̂), sin(θ̂), cos(θ̂), robotID]
      - Label nodes using MoCap-ground-truth footprints.
      
    4. Edges:

      - Built using KNN 
      - Edge attributes eij = [Δx, Δy, Δz, ΔSNR, Δt]
      - Edge Labels: 
          - 1 if nodes are of the same class and within a distance threshold
          - Includes **intra-robot** and **inter-robot** collaborative edges
      

    🧪 Use Cases

    • Multi-robot perception and mapping
    • Semantic object detection
    • Graph-based reasoning in radar domains
    • Uncertainty-aware link prediction
  4. nasa-eo-knowledge-graph

    • huggingface.co
    Updated Sep 10, 2025
    Cite
    NASA Goddard Earth Sciences Data and Information Services Center (GES-DISC) (2025). nasa-eo-knowledge-graph [Dataset]. http://doi.org/10.57967/hf/3463
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 10, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Authors
    NASA Goddard Earth Sciences Data and Information Services Center (GES-DISC)
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    NASA Knowledge Graph Dataset

      Dataset Summary
    

    The NASA Knowledge Graph Dataset is an expansive graph-based dataset designed to integrate and interconnect information about satellite datasets, scientific publications, instruments, platforms, projects, data centers, and science keywords. This knowledge graph is particularly focused on datasets managed by NASA's Distributed Active Archive Centers (DAACs), which are NASA's data repositories responsible for archiving and… See the full description on the dataset page: https://huggingface.co/datasets/nasa-gesdisc/nasa-eo-knowledge-graph.

  5. Transaction Graph Dataset for the Ethereum Blockchain - Dataset - CryptoData...

    • cryptodata.center
    Updated Dec 4, 2024
    Cite
    cryptodata.center (2024). Transaction Graph Dataset for the Ethereum Blockchain - Dataset - CryptoData Hub [Dataset]. https://cryptodata.center/dataset/transaction-graph-dataset-for-the-ethereum-blockchain
    Explore at:
    Dataset updated
    Dec 4, 2024
    Dataset provided by
    CryptoDATA
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Unique identifier: https://doi.org/10.5281/zenodo.4718440
    Dataset updated: Dec 19, 2022
    Dataset provided by: Zenodo
    Authors: Can Özturan; Alper Şen; Baran Kılıç
    License: Attribution 4.0 (CC BY 4.0). License information was derived automatically.

    Description: This dataset contains ether as well as popular ERC20 token transfer transactions extracted from the Ethereum Mainnet blockchain. Only send-ether, contract function call, and contract deployment transactions are present in the dataset. Miner reward (static block reward) and uncle block inclusion reward are added as transactions to the dataset. Transaction fee rewards and uncle rewards are not currently included in the dataset. Details of the dataset are given below:

    FILENAME FORMAT: The filenames have the following format: eth-tx- where For example, file eth-tx-1000000-1099999.txt.bz2 contains transactions from block 1000000 to block 1099999 inclusive. The files are compressed with bzip2. They can be uncompressed using the command bunzip2.

    TRANSACTION FORMAT: Each line in a file corresponds to a transaction. The transaction has the following format: units. ERC20 token transfers (transfer and transferFrom function calls in an ERC20 contract) are indicated by token symbol; for example, GUSD is the Gemini USD stablecoin. The JSON file erc20tokens.json given below contains the details of the ERC20 tokens. Failed transactions are prefixed with "F-".

    BLOCK TIME FORMAT: The block time file has the following format:

    erc20tokens.json FILE: This file contains the list of popular ERC20 token contracts whose transfer/transferFrom transactions appear in the data files.
    ERC20 token list: USDT TRYb XAUt BNB LEO LINK HT HEDG MKR CRO VEN INO PAX INB SNX REP MOF ZRX SXP OKB XIN OMG SAI HOT DAI EURS HPT BUSD USDC SUSD HDG QCAD PLUS BTCB WBTC cWBTC renBTC sBTC imBTC pBTC

    IMPORTANT NOTE: Public Ethereum Mainnet blockchain data is open and can be obtained by connecting as a node on the blockchain or by using block explorer web sites such as http://etherscan.io . Downloaders and users of this dataset accept full responsibility for using the data in a GDPR-compliant manner and in accordance with any other regulations. We provide the data as is and we cannot be held responsible for anything.

    NOTE: If you use this dataset, please do not forget to add the DOI number to the citation. If you use our dataset in your research, please also cite our paper: https://link.springer.com/article/10.1007/s10586-021-03511-0

    @article{kilic2022parallel, title={Parallel Analysis of Ethereum Blockchain Transaction Data using Cluster Computing}, journal={Cluster Computing}, author={K{\ı}l{\ı}{\c{c}}, Baran and {\"O}zturan, Can and Sen, Alper}, year={2022}, month={Jan}}

  6. Dataset used for "A Recommender System of Buggy App Checkers for App Store...

    • data.niaid.nih.gov
    Updated Jun 28, 2021
    Cite
    Maria Gomez (2021). Dataset used for "A Recommender System of Buggy App Checkers for App Store Moderators" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5034291
    Explore at:
    Dataset updated
    Jun 28, 2021
    Dataset provided by
    Maria Gomez
    Romain Rouvoy
    Lionel Seinturier
    Martin Monperrus
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the dataset used for the paper "A Recommender System of Buggy App Checkers for App Store Moderators", published at the International Conference on Mobile Software Engineering and Systems (MOBILESoft) in 2015.

    Dataset Collection We built a dataset that consists of a random sample of Android app metadata and user reviews available on the Google Play Store in January and March 2014. Since the Google Play Store is continuously evolving (adding, removing, and/or updating apps), we updated the dataset twice. The dataset D1 contains the apps available in the Google Play Store in January 2014. We then created a new snapshot (D2) of the Google Play Store in March 2014.

    The apps belong to the 27 different categories defined by Google (at the time of writing the paper) and the 4 predefined subcategories (free, paid, new_free, and new_paid). For each category-subcategory pair (e.g. tools-free, tools-paid, sports-new_free, etc.), we collected a maximum of 500 samples, resulting in a median number of 1,978 apps per category.

    For each app, we retrieved the following metadata: name, package, creator, version code, version name, number of downloads, size, upload date, star rating, star counting, and the set of permission requests.

    In addition, for each app, we collected up to the latest 500 reviews posted by users in the Google Play Store. For each review, we retrieved its metadata: title, description, device, and version of the app. None of these fields were mandatory, so several reviews lack some of these details. From all the reviews attached to an app, we only considered those associated with the latest version of the app, i.e., we discarded unversioned and old-versioned reviews, resulting in a corpus of 1,402,717 reviews (Jan. 2014).

    Dataset Stats Some stats about the datasets:

    • D1 (Jan. 2014) contains 38,781 apps requesting 7,826 different permissions, and 1,402,717 user reviews.

    • D2 (Mar. 2014) contains 46,644 apps and 9,319 different permission requests, and 1,361,319 user reviews.

    Additional stats about the datasets are available here.

    Dataset Description To store the dataset, we created a graph database with Neo4j. This dataset therefore consists of a graph describing the apps as nodes and edges. We chose a graph database because the graph visualization helps to identify connections among data (e.g., clusters of apps sharing similar sets of permission requests).

    In particular, our dataset graph contains six types of nodes:

    • APP nodes containing metadata of each app
    • PERMISSION nodes describing permission types
    • CATEGORY nodes describing app categories
    • SUBCATEGORY nodes describing app subcategories
    • USER_REVIEW nodes storing user reviews
    • TOPIC nodes storing topics mined from user reviews (using LDA)

    Furthermore, there are five types of relationships between APP nodes and each of the remaining nodes:

    • USES_PERMISSION relationships between APP and PERMISSION nodes
    • HAS_REVIEW between APP and USER_REVIEW nodes
    • HAS_TOPIC between USER_REVIEW and TOPIC nodes
    • BELONGS_TO_CATEGORY between APP and CATEGORY nodes
    • BELONGS_TO_SUBCATEGORY between APP and SUBCATEGORY nodes

    Dataset Files Info

    Neo4j 2.0 Databases

    googlePlayDB1-Jan2014_neo4j_2_0.rar

    googlePlayDB2-Mar2014_neo4j_2_0.rar

    We provide two Neo4j databases containing the 2 snapshots of the Google Play Store (January and March 2014). These are the original databases created for the paper, built with Neo4j 2.0, specifically the tool version 'Neo4j 2.0.0-M06 Community Edition' (the latest version available at the time of implementing the paper in 2014).

    Neo4j 3.5 Databases

    googlePlayDB1-Jan2014_neo4j_3_5_28.rar

    googlePlayDB2-Mar2014_neo4j_3_5_28.rar

    Neo4j 2.0 is now deprecated and no longer available for download in the official Neo4j Download Center, so we have migrated the original (Neo4j 2.0) databases to Neo4j 3.5.28. The databases can be opened with the tool version 'Neo4j Community Edition 3.5.28', which can be downloaded from the official Neo4j Download page.

      In order to open the databases with more recent versions of Neo4j, the databases must first be migrated to the corresponding version. Instructions about the migration process can be found in the Neo4j Migration Guide.
    
      The first time the Neo4j database is connected, it may request credentials. The username and password are: neo4j/neo4j
    
  7. Nihon M&A Center Inc - Net-Income

    • macro-rankings.com
    csv, excel
    Updated Sep 1, 2025
    Cite
    macro-rankings (2025). Nihon M&A Center Inc - Net-Income [Dataset]. https://www.macro-rankings.com/markets/stocks/2127-tse/income-statement/net-income
    Explore at:
    csv, excel (available download formats)
    Dataset updated
    Sep 1, 2025
    Dataset authored and provided by
    macro-rankings
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Japan
    Description

    Net-Income Time Series for Nihon M&A Center Inc. Nihon M&A Center Holdings Inc. provides mergers and acquisitions (M&A) related services in Japan and internationally. The company offers M&A support services, such as reorganization, capital policies, and MBOs for small and medium-sized enterprises. Nihon M&A Center Holdings Inc. was incorporated in 1991 and is headquartered in Tokyo, Japan.

  8. Roads, Digital Line Graphs (1:24,000)

    • catalog.data.gov
    Updated Feb 1, 2001
    + more versions
    Cite
    West Virginia GIS Technical Center (Point of Contact) (2001). Roads, Digital Line Graphs (1:24,000) [Dataset]. https://catalog.data.gov/ru/dataset/roads-digital-line-graphs-1-24000
    Explore at:
    Dataset updated
    Feb 1, 2001
    Dataset provided by
    West Virginia GIS Technical Center (Point of Contact)
    Description

    This metadata is meant to describe the roads layer of DLGs, but may contain information on other DLG layers. Digital line graph (DLG) data are digital representations of cartographic information. DLGs of map features are converted to digital form from maps and related sources. Large-scale DLG data are available from the West Virginia University GIS Technical Center in six categories: (1) hypsography, (2) hydrography, (3) boundaries, (4) miscellaneous transportation (pipe and transmission lines), (5) roads, (6) railroads. All DLG data distributed by the USGS are DLG - Level 3 (DLG-3), which means the data contain a full range of attribute codes, have full topological structuring, and have passed certain quality-control checks. The files available from the West Virginia University GIS Technical Center are full DLG-3 (USGS approved). Files are currently available in dlg and E00 formats.

  9. Polymed-QA

    • huggingface.co
    Updated Nov 4, 2024
    Cite
    HPAI@BSC (High Performance Artificial Intelligence at Barcelona Supercomputing Center) (2024). Polymed-QA [Dataset]. https://huggingface.co/datasets/HPAI-BSC/Polymed-QA
    Explore at:
    Dataset updated
    Nov 4, 2024
    Dataset provided by
    Barcelona Supercomputing Centerhttps://www.bsc.es/
    Authors
    HPAI@BSC (High Performance Artificial Intelligence at Barcelona Supercomputing Center)
    License

    https://choosealicense.com/licenses/llama3.1/

    Description

    Polymed-QA

    Synthetically generated QA pairs from the PolyMed dataset, used to train the Aloe-Beta model.

      Dataset Details

      Dataset Description
    PolyMed is a dataset developed to improve Automatic Diagnosis Systems (ADS). This dataset incorporates medical knowledge graph data and diagnosis case data to provide comprehensive evaluation, diverse disease information, effective… See the full description on the dataset page: https://huggingface.co/datasets/HPAI-BSC/Polymed-QA.

  10. ORBITAAL: cOmpRehensive BItcoin daTaset for temorAl grAph anaLysis - Dataset...

    • cryptodata.center
    Updated Dec 4, 2024
    + more versions
    Cite
    (2024). ORBITAAL: cOmpRehensive BItcoin daTaset for temorAl grAph anaLysis - Dataset - CryptoData Hub [Dataset]. https://cryptodata.center/dataset/orbitaal-comprehensive-bitcoin-dataset-for-temoral-graph-analysis
    Explore at:
    Dataset updated
    Dec 4, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Construction

    This dataset captures the temporal network of Bitcoin (BTC) flows exchanged between entities at the finest time resolution, in UNIX timestamps. Its construction is based on the blockchain covering the period from January 3rd, 2009 to January 25th, 2021. The blockchain extraction was made using the bitcoin-etl (https://github.com/blockchain-etl/bitcoin-etl) Python package. The entity-entity network is built by aggregating Bitcoin addresses using the common-input heuristic [1], as well as popular Bitcoin users' addresses provided by https://www.walletexplorer.com/

    [1] M. Harrigan and C. Fretter, "The Unreasonable Effectiveness of Address Clustering," 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, France, 2016, pp. 368-373, doi: 10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0071.

    Dataset Description

    Bitcoin Activity Temporal Coverage: From 03 January 2009 to 25 January 2021

    Overview: This dataset provides a comprehensive representation of Bitcoin exchanges between entities over a significant temporal span, from the inception of Bitcoin to recent years. It encompasses various temporal resolutions and representations to facilitate Bitcoin transaction network analysis in the context of temporal graphs. All dates have been derived from block UNIX timestamps in the GMT timezone.

    Contents: The dataset is distributed across several compressed archives. All data are stored in the Apache Parquet file format, a columnar storage format optimized for analytical queries; it can be used with the pyspark Python package.

    orbitaal-stream_graph.tar.gz: The root directory is STREAM_GRAPH/. Contains a stream graph representation of Bitcoin exchanges at the finest temporal scale, corresponding to the validation time of each block (averaging approximately 10 minutes). The stream graph is divided into 13 files, one for each year, in parquet format. The name format is orbitaal-stream_graph-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] stands for the corresponding year and [ID] is an integer from 1 to N (the number of files) such that sorting by increasing [ID] is equivalent to sorting by increasing year. These files are in the subdirectory STREAM_GRAPH/EDGES/.

    orbitaal-snapshot-all.tar.gz: The root directory is SNAPSHOT/. Contains the snapshot network representing all transactions aggregated over the whole dataset period (from Jan. 2009 to Jan. 2021), in parquet format. The name format is orbitaal-snapshot-all.snappy.parquet. These files are in the subdirectory SNAPSHOT/EDGES/ALL/.

    orbitaal-snapshot-year.tar.gz: The root directory is SNAPSHOT/. Contains the yearly-resolution snapshot networks, in parquet format. The name format is orbitaal-snapshot-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] stands for the corresponding year and [ID] is an integer from 1 to N such that sorting by increasing [ID] is equivalent to sorting by increasing year. These files are in the subdirectory SNAPSHOT/EDGES/year/.

    orbitaal-snapshot-month.tar.gz: The root directory is SNAPSHOT/. Contains the monthly-resolution snapshot networks, in parquet format. The name format is orbitaal-snapshot-date-[YYYY]-[MM]-file-id-[ID].snappy.parquet, where [YYYY] and [MM] stand for the corresponding year and month, and [ID] is an integer from 1 to N such that sorting by increasing [ID] is equivalent to sorting by increasing year and month. These files are in the subdirectory SNAPSHOT/EDGES/month/.

    orbitaal-snapshot-day.tar.gz: The root directory is SNAPSHOT/. Contains the daily-resolution snapshot networks, in parquet format. The name format is orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-file-id-[ID].snappy.parquet, where [YYYY], [MM], and [DD] stand for the corresponding year, month, and day, and [ID] is an integer from 1 to N such that sorting by increasing [ID] is equivalent to sorting by increasing year, month, and day. These files are in the subdirectory SNAPSHOT/EDGES/day/.

    orbitaal-snapshot-hour.tar.gz: The root directory is SNAPSHOT/. Contains the hourly-resolution snapshot networks, in parquet format. The name format is orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-[hh]-file-id-[ID].snappy.parquet, where [YYYY], [MM], [DD], and [hh] stand for the corresponding year, month, day, and hour, and [ID] is an integer from 1 to N such that sorting by increasing [ID] is equivalent to sorting by increasing year, month, day, and hour. These files are in the subdirectory SNAPSHOT/EDGES/hour/.

    orbitaal-nodetable.tar.gz: The root directory is NODE_TABLE/. Contains two files in parquet format: the first gives information about the nodes present in the stream graphs and snapshots, such as period of activity and the associated global Bitcoin balance; the other contains the list of all associated Bitcoin addresses.

    Small samples in CSV format: orbitaal-stream_graph-2016_07_08.csv and orbitaal-stream_graph-2016_07_09.csv. These two CSV files are stream graph representations around a halving event that happened in 2016.
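    The common-input heuristic mentioned above clusters addresses that appear together as inputs of the same transaction, on the assumption that they are controlled by one entity. A minimal union-find sketch of the idea (illustrative only; AddressClusterer is a hypothetical name, not the ORBITAAL pipeline code):

```python
class AddressClusterer:
    """Union-find over addresses: all inputs of one transaction are
    merged into one cluster (common-input heuristic)."""

    def __init__(self):
        self.parent = {}

    def find(self, addr):
        """Return the cluster representative for addr, with path compression."""
        self.parent.setdefault(addr, addr)
        root = addr
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[addr] != root:
            self.parent[addr], addr = root, self.parent[addr]
        return root

    def add_transaction(self, input_addresses):
        """Merge all input addresses of one transaction into one cluster."""
        it = iter(input_addresses)
        try:
            first = self.find(next(it))
        except StopIteration:
            return
        for addr in it:
            self.parent[self.find(addr)] = first
```

    Repeated over every transaction's input set (plus the known-address lists from walletexplorer.com), transitively co-spending addresses end up in the same entity cluster.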
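    Since the date fields and [ID] in the snapshot filenames encode chronological order, the files of an archive can be sorted with a small parser. A sketch using Python's standard re module (snapshot_sort_key is a hypothetical helper for the dated snapshot files only, not part of the dataset tooling):

```python
import re

# Matches the dated snapshot names at yearly, monthly, daily, and hourly
# resolution; month/day/hour are optional and default to 0 in the sort key.
SNAPSHOT_RE = re.compile(
    r"orbitaal-snapshot-date-"
    r"(?P<year>\d{4})(?:-(?P<month>\d{2}))?(?:-(?P<day>\d{2}))?(?:-(?P<hour>\d{2}))?"
    r"-file-id-(?P<id>\d+)\.snappy\.parquet$"
)

def snapshot_sort_key(filename):
    """Chronological sort key (year, month, day, hour, id) for a snapshot file."""
    m = SNAPSHOT_RE.search(filename)
    if m is None:
        raise ValueError(f"not a dated snapshot file: {filename}")
    g = m.groupdict()
    return tuple(int(g[k]) if g[k] else 0 for k in ("year", "month", "day", "hour", "id"))
```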

  11. OpenAIRE Graph: Dataset of funded products

    • data.niaid.nih.gov
    Updated Jan 13, 2025
    + more versions
    Cite
    Lempesis, Antonis (2025). OpenAIRE Graph: Dataset of funded products [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4559725
    Explore at:
    Dataset updated
    Jan 13, 2025
    Dataset provided by
    De Bonis, Michele
    Foufoulas, Ioannis
    Dimitropoulos, Harry
    Horst, Marek
    Vergoulis, Thanasis
    Bardi, Alessia
    Mannocci, Andrea
    Atzori, Claudio
    Artini, Michele
    Baglioni, Miriam
    Ioannidis, Alexandros
    Manghi, Paolo
    Chatzopoulos, Serafeim
    Lempesis, Antonis
    La Bruzzo, Sandro
    Kokogiannaki, Argiro
    Kiatropoulou, Katerina
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the metadata records about research products (research literature, data, software, and other types of research products) with funding information available in the OpenAIRE Graph produced in July 2024. Records are grouped by funder in a dedicated archive file (.tar).

    fundRef contains the following funders:

    100007490 Bausch and Lomb Ireland

    100007630 College of Engineering and Informatics, National University of Ireland, Galway

    100007731 Endo International

    100007819 Allergan

    100008099 Food Safety Authority of Ireland

    100008124 Department of Jobs, Enterprise and Innovation

    100008303 Department for Economics, Northern Ireland

    100009098 Department of Foreign Affairs and Trade, Ireland

    100009099 Irish Aid

    100009770 National University of Ireland

    100010399 European Society of Cataract and Refractive Surgeons

    100010546 Deparment of Children and Youth Affairs, Ireland

    100010547 Irish Youth Justice Service

    100010993 Irish Nephrology Society

    100011096 Jazz Pharmaceuticals

    100011396 Irish College of General Practitioners

    100012733 National Parks and Wildlife Service

    100012734 Department for Culture, Heritage and the Gaeltacht, Ireland

    100012754 Horizon Pharma

    100012891 Medical Research Charities Group

    100012919 Epilepsy Ireland

    100012920 GLEN

    100012921 Royal College of Surgeons in Ireland

    100013029 Iris O'Brien Foundation

    100013206 Food Institutional Research Measure

    100013381 Irish Phytochemical Food Network

    100013433 Transport Infrastructure Ireland

    100013917 Society for Musicology in Ireland

    100014251 Humanities in the European Research Area

    100014364 National Children's Research Centre

    100014384 Amarin Corporation

    100014902 Irish Association for Cancer Research

    100015023 Ireland Funds

    100015278 Pfizer Healthcare Ireland

    100015319 Sport Ireland Institute

    100015442 Global Brain Health Institute

    100015992 St. Luke's Institute of Cancer Research

    100017144 Shell E and P Ireland

    100017897 Friedreich’s Ataxia Research Alliance Ireland

    100018064 Department of Tourism, Culture, Arts, Gaeltacht, Sport and Media

    100018172 Department of the Environment, Climate and Communications

    100018175 Dairy Processing Technology Centre

    100018270 Health Service Executive

    100018529 Alkermes

    100018542 Irish Endocrine Society

    100018754 An Roinn Sláinte

    100019428 Nabriva Therapeutics

    100019637 Horizon Therapeutics

    100020174 Health Research Charities Ireland

    100020202 UCD Foundation

    100020233 Ireland Canada University Foundation

    100022895 Health Research Institute, University of Limerick

    100022943 National Cancer Registry Ireland

    501100001581 Arts Council of Ireland

    501100001582 Centre for Ageing Research and Development in Ireland

    501100001583 Cystinosis Foundation Ireland

    501100001584 Department of Agriculture, Food and the Marine, Ireland

    501100001586 Department of Education and Skills, Ireland

    501100001587 Economic and Social Research Institute

    501100001588 Enterprise Ireland

    501100001591 Heritage Council

    501100001592 Higher Education Authority

    501100001593 Irish Cancer Society

    501100001594 Irish Heart Foundation

    501100001595 Irish Hospice Foundation

    501100001596 Irish Research Council for Science, Engineering and Technology

    501100001598 Mental Health Commission

    501100001599 National Council for Forest Research and Development

    501100001600 Research and Education Foundation, Sligo General Hospital

    501100001601 Royal Irish Academy

    501100001603 Sustainable Energy Authority of Ireland

    501100001604 Teagasc

    501100001627 Marine Institute

    501100001628 Central Remedial Clinic

    501100001629 Royal Dublin Society

    501100001630 Dublin Institute for Advanced Studies

    501100001631 University College Dublin

    501100001633 National University of Ireland, Maynooth

    501100001634 University of Galway

    501100001635 University of Limerick

    501100001636 University College Cork

    501100001637 Trinity College Dublin

    501100001638 Dublin City University

    501100002736 Covidien

    501100002755 Brennan and Company

    501100002919 Cork Institute of Technology

    501100002959 Dublin City Council

    501100003036 Perrigo Company Charitable Foundation

    501100003037 Elan

    501100003496 HeyStaks Technologies

    501100003553 Gaelic Athletic Association

    501100003840 Irish Institute of Clinical Neuroscience

    501100003956 Aspect Medical Systems

    501100004162 Meath Foundation

    501100004210 Our Lady's Children's Hospital, Crumlin

    501100004321 Shire

    501100004981 Athlone Institute of Technology

    501100006518 Department of Communications, Energy and Natural Resources, Ireland

    501100006553 Collaborative Centre for Applied Nanotechnology

    501100006554 IDA Ireland

    501100006759 CLARITY Centre for Sensor Web Technologies

    501100009246 Technological University Dublin

    501100009269 Programme of Competitive Forestry Research for Development

    501100009315 Cystinosis Ireland

    501100010808 Geological Survey of Ireland

    501100011030 Alimentary Glycoscience Research Cluster

    501100011031 Alimentary Health

    501100011103 Rannís

    501100011626 Energy Policy Research Centre, Economic and Social Research Institute

    501100012354 Inland Fisheries Ireland

    501100014384 X-Bolt Orthopaedics

    501100014531 Physical Education and Sport Sciences Department, University of Limerick

    501100014710 PrecisionBiotics Group

    501100014745 APC Microbiome Institute

    501100014826 ADAPT - Centre for Digital Content Technology

    501100014827 Dormant Accounts Fund

    501100017501 FotoNation

    501100018641 Dairy Research Ireland

    501100018839 Irish Centre for High-End Computing

    501100019905 Galway University Foundation

    501100020270 Advanced Materials and Bioengineering Research

    501100020403 Irish Composites Centre

    501100020425 Irish Thoracic Society

    501100020570 College of Medicine, Nursing and Health Sciences, National University of Ireland, Galway

    501100020871 Bernal Institute, University of Limerick

    501100021102 Waterford Institute of Technology

    501100021110 Irish MPS Society

    501100021525 Insight SFI Research Centre for Data Analytics

    501100021694 Elan Pharma International

    501100021838 Royal College of Physicians of Ireland

    501100022542 Breakthrough Cancer Research

    501100022610 Breast Cancer Ireland

    501100022728 Munster Technological University

    501100023273 HRB Clinical Research Facility Galway

    501100023551 Cystic Fibrosis Ireland

    501100023970 Tyndall National Institute

    501100024242 Synthesis and Solid State Pharmaceutical Centre

    501100024313 Irish Rugby Football Union

    501100024834 Tusla - Child and Family Agency

    AKA Academy of Finland

    ANR French National Research Agency (ANR)

    ARC Australian Research Council (ARC)

    ASAP Aligning Science Across Parkinson's

    CHISTERA CHIST-ERA

    CIHR Canadian Institutes of Health Research

    EC_ERASMUS+ European Commission - Erasmus+ funding stream

    EC_FP7 European Commission - FP7 funding stream

    EC_H2020 European Commission - H2020 funding stream

    EC_HE European Commission - HE funding stream

    EEA European Environment Agency

    EPA Environmental Protection Agency

    FCT Fundação para a Ciência e a Tecnologia, I.P.

    FWF Austrian Science Fund

    HRB Health Research Board

    HRZZ Croatian Science Foundation

    INCA Institut National du Cancer

    IRC Irish Research Council

    IReL Irish Research eLibrary

    MESTD Ministry of Education, Science and Technological Development of Republic of Serbia

    MZOS TOADDNAME

    NHMRC National Health and Medical Research Council (NHMRC)

    NIH National Institutes of Health

    NSERC Natural Sciences and Engineering Research Council of Canada

    NSF National Science Foundation

    NWO Netherlands Organisation for Scientific Research (NWO)

    SFI Science Foundation Ireland

    SNSF Swiss National Science Foundation

    SSHRC Social Sciences and Humanities Research Council

    TARA Tara Expeditions Foundation

    TIBITAK Türkiye Bilimsel ve Teknolojik Araştırma Kurumu

    UKRI UK Research and Innovation

    WT Wellcome Trust

    Each tar archive contains gzip files with one JSON record per line. The JSON records are compliant with the schema available at https://doi.org/10.5281/zenodo.14608710.
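The layout above (a tar archive of gzip files, one JSON record per line) can be consumed with Python's standard library alone; this is a sketch, and the archive path used in practice would be one of the funder .tar files from this dataset.

```python
import gzip
import json
import tarfile

def iter_records(tar_path):
    """Yield one JSON record per line from every gzip member of a funder archive.

    Assumes the layout described above: a .tar archive whose members are
    gzip-compressed files containing one JSON record per line.
    """
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            fh = tar.extractfile(member)
            if fh is None:
                continue
            # gzip.open accepts a file object and "rt" decodes to text lines.
            with gzip.open(fh, "rt", encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:
                        yield json.loads(line)
```

Streaming record by record this way keeps memory usage flat even for large funder archives.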

    You can also search and browse this dataset (and more) in the OpenAIRE EXPLORE portal and via the OpenAIRE API.

  12. Event Graph of BPI Challenge 2016

    • data.4tu.nl
    zip
    Updated Apr 22, 2021
    + more versions
    Cite
    Dirk Fahland; Stefan Esser (2021). Event Graph of BPI Challenge 2016 [Dataset]. http://doi.org/10.4121/14164220.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 22, 2021
    Dataset provided by
    4TU.ResearchData
    Authors
    Dirk Fahland; Stefan Esser
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Business process event data modeled as labeled property graphs

    Data Format
    -----------

    The dataset comprises one labeled property graph in two different file formats.

    #1) Neo4j .dump format

    A neo4j (https://neo4j.com) database dump that contains the entire graph and can be imported into a fresh neo4j database instance using the following command; see also the neo4j documentation: https://neo4j.com/docs/

    /bin/neo4j-admin.(bat|sh) load --database=graph.db --from=

    The .dump was created with Neo4j v3.5.

    #2) .graphml format

    A .zip file containing a .graphml file of the entire graph


    Data Schema
    -----------

    The graph is a labeled property graph over business process event data. Each graph uses the following concepts

    :Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute "Activity" that occurred at the given "timestamp"

    :Entity nodes - each entity node describes an entity (e.g., an object or a user), it has an EntityType and an identifier (attribute "ID")

    :Log nodes - describes a collection of events that were recorded together, most graphs only contain one log node

    :Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed, :Class nodes group events into sets of identical observations

    :CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities

    :DF relationships - "directly-followed by" between two :Event nodes describes which event is directly-followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.

    :HAS relationship - from a :Log to an :Event node, describes which events had been recorded in which event log

    :OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph

    :REL relationship - placeholder for any structural relationship between two :Entity nodes

    The concepts are further defined in Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020) https://arxiv.org/abs/2005.14552
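To make the schema concrete, here is a small illustrative sketch (not the authors' code) of how per-entity :DF edges can be derived from :Event nodes and :CORR relationships: each entity's correlated events are ordered by timestamp and consecutive pairs are linked, which by construction yields an acyclic directly-follows relation per entity. The event and entity identifiers are made up for illustration.

```python
from collections import defaultdict

def derive_df_edges(events, corr):
    """Derive "directly-followed by" (:DF) edges per entity.

    events: dict mapping event_id -> timestamp (the :Event nodes)
    corr:   iterable of (event_id, entity_id) pairs (the :CORR relationships)
    Returns a set of (source_event, target_event, entity_id) triples.
    """
    by_entity = defaultdict(list)
    for event_id, entity_id in corr:
        by_entity[entity_id].append(event_id)
    df_edges = set()
    for entity_id, evs in by_entity.items():
        evs.sort(key=lambda e: events[e])  # chronological order per entity
        for src, tgt in zip(evs, evs[1:]):  # link consecutive events
            df_edges.add((src, tgt, entity_id))
    return df_edges
```

Note how an event correlated to several entities (here "e3") ends up on the :DF path of each of them, matching the schema's rule that both events of a :DF edge are correlated to the same entity.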


    Data Contents
    -------------

    neo4j-bpic16-2021-02-17 (.dump|.graphml.zip)

    An integrated graph describing the raw event data of the entire BPI Challenge 2016 dataset.
    Dees, Marcus; van Dongen, B.F. (Boudewijn) (2016): BPI Challenge 2016. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:360795c8-1dd6-4a5b-a443-185001076eab
    UWV (Employee Insurance Agency) is an autonomous administrative authority (ZBO) commissioned by the Ministry of Social Affairs and Employment (SZW) to implement employee insurances and provide labour market and data services in the Netherlands. The Dutch employee insurances are provided for via laws such as the WW (Unemployment Insurance Act), the WIA (Work and Income according to Labour Capacity Act), which contains the IVA (Full Invalidity Benefit Regulations) and WGA (Return to Work (Partially Disabled) Regulations), the Wajong (Disablement Assistance Act for Handicapped Young Persons), the WAO (Invalidity Insurance Act), the WAZ (Self-employed Persons Disablement Benefits Act), the Wazo (Work and Care Act) and the Sickness Benefits Act.

    The data in this collection pertains to customer contacts over a period of 8 months, and UWV is looking for insights into their customers' journeys. Data has been collected from several different sources, namely:
    1) Clickdata from the site www.werk.nl collected from visitors that were not logged in,
    2) Clickdata from the customer-specific part of the site www.werk.nl (a link is made with the customer that logged in),
    3) Werkmap Message data, showing when customers contacted the UWV through a digital channel,
    4) Call data from the call center, showing when customers contacted the call center by phone, and
    5) Complaint data showing when customers complained.

    All data is accompanied by data fields with anonymized information about the customer as well as data about the site visited or the contents of the call and/or complaint. The texts in the dataset are provided in both Dutch and English where applicable. URLs are included based on the structure of the site during the period the data has been collected. UWV is interested in insights on how their channels are being used, when and why customers move from one contact channel to the next, and whether there are clear customer profiles to be identified in the behavioral data. Furthermore, recommendations are sought on how to serve customers without the need to change the contact channel.
    The data contains the following entities and their events

    - Customer - customer of a Dutch public agency for handling unemployment benefits
    - Office_U - user or worker involved in an activity handling a customer interaction
    - Office_W - user or worker involved in an activity handling a customer interaction
    - Complaint - a complaint document handed in by a customer
    - ComplaintDossier - a collection of complaints by the same customer
    - Session - browser-session identifier of a user browsing the website of the agency
    - IP - IP address of a user browsing the website of the agency


    Data Size
    ---------

    BPIC16, nodes: 8109680, relationships: 86833139

  13. HCHSGraphXplore: Visualizing Complex Medical Data with Knowledge Graphs

    • fdr.uni-hamburg.de
    Updated May 4, 2023
    + more versions
    Cite
    Louis Bellmann; Alexander Johannes Wiederhold; Leona Trübe; Raphael Twerenbold; Frank Ückert; Karl Gottfried; Alexander Johannes Wiederhold; Leona Trübe (2023). HCHSGraphXplore: Visualizing Complex Medical Data with Knowledge Graphs [Dataset]. http://doi.org/10.25592/uhhfdm.12136
    Explore at:
    Dataset updated
    May 4, 2023
    Dataset provided by
    University Medical Center Hamburg-Eppendorf, University Heart & Vascular Center Hamburg, Hamburg, Germany
    University Medical Center Hamburg-Eppendorf, Institute for Applied Medical Informatics, Christoph-Probst-Weg 1, 20251 Hamburg, Germany
    Authors
    Louis Bellmann; Alexander Johannes Wiederhold; Leona Trübe; Raphael Twerenbold; Frank Ückert; Karl Gottfried; Alexander Johannes Wiederhold; Leona Trübe
    Description

    This dataset captures statistical analyses of the HCHS cohort study using a knowledge graph and dashboard. Properties of 10,000 participants were analyzed for their association with cardiovascular disease as well as for their relationships to each other. The data is presented in the form of Neo4j database dumps and can be explored by following the given user guide.

  14. Graph Database Market Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Jun 17, 2025
    Cite
    Market Research Forecast (2025). Graph Database Market Report [Dataset]. https://www.marketresearchforecast.com/reports/graph-database-market-5306
    Explore at:
    Available download formats: pdf, ppt, doc
    Dataset updated
    Jun 17, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Graph Database Market was valued at USD 1.9 billion in 2023 and is projected to reach USD 7.91 billion by 2032, exhibiting a CAGR of 22.6% during the forecast period. A graph database is a form of NoSQL database that stores and represents relationships as graphs. Rather than modeling data as relations, as most contemporary relational databases do, graph databases use nodes, edges, and properties. The primary types are property graphs, which permit attributes on nodes and edges, and RDF triplestores, which center on subject-predicate-object triples. Key features include fast traversal of relationships, easy schema changes, and scalability. Familiar use cases are social media, recommendations, fraud or anomaly detection, and knowledge graphs, where relationships are complex and require deeper comprehension. These databases are considered valuable where the connections between items of data are as significant as the data themselves. Key drivers for this market are: Increasing Adoption of Cloud-based Managed Services to Drive Market Growth. Potential restraints include: Adverse Health Effect May Hamper Market Growth. Notable trends are: Growing Implementation of Touch-based and Voice-based Infotainment Systems to Increase Adoption of Intelligent Cars.

  15. Data from: Kids Count Data Center

    • dataverse.harvard.edu
    Updated Feb 23, 2011
    Cite
    Harvard Dataverse (2011). Kids Count Data Center [Dataset]. http://doi.org/10.7910/DVN/DLA2Q2
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 23, 2011
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Users can customize tables, graphs and maps on data related to children in a specific state or in the United States as a whole. Comparisons can be made between states.

    Background: KIDS COUNT Data Center is part of the Annie E. Casey Foundation and serves to provide information on the status of children in America. The ten core indicators of interest under "Data by State" are: percent of low birth weight babies, infant mortality rate, child death rate, rate of teen deaths by accident, suicide and homicide, teen birth rate, percent of children living with parents who do not have full-time year-round employment, percent of teens who are high school drop outs, percent of teens not working and not in school, percent of children in poverty, and percent of families with children headed by a single parent. A number of other indicators, plus demographic and income information, are also included. "Data across States" is grouped into the following broad categories: demographics, education, economic well-being, family and community, health, safety and risk behaviors, and other.

    User Functionality: Users can determine the view of the data (by table, line graph or map) and can print or email the results. Data is available by state and across states. "Data Across States" allows users to access the raw data. Data is often present over a number of years. For a number of indicators under "Data Across States," users can view results by age, gender/sex, or race/ethnicity.

    Data Notes: KIDS COUNT started in 1990. The most recent year of data is 2009 (or 2008 depending on the state, with some data available from 2010). Data is available on the national and state level, and for some states, at the county and city level.

  16. UC Santa Barbara Invertebrate Zoology Collection (UCSB-IZC) Data Archive and...

    • zenodo.org
    application/gzip, bin +2
    Updated Dec 3, 2024
    Cite
    University of California Santa Barbara Cheadle Center for Biodiversity and Ecological Restoration; University of California Santa Barbara Cheadle Center for Biodiversity and Ecological Restoration (2024). UC Santa Barbara Invertebrate Zoology Collection (UCSB-IZC) Data Archive and Biodiversity Dataset Graph hash://md5/10663911550bb52a0f5741993f82db9d hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c [Dataset]. http://doi.org/10.5281/zenodo.5660088
    Explore at:
    Available download formats: application/gzip, json, bin, jpeg
    Dataset updated
    Dec 3, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    University of California Santa Barbara Cheadle Center for Biodiversity and Ecological Restoration; University of California Santa Barbara Cheadle Center for Biodiversity and Ecological Restoration
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Santa Barbara
    Description

    A biodiversity dataset graph: UCSB-IZC

    The intended use of this archive is to facilitate (meta-)analysis of the UC Santa Barbara Invertebrate Zoology Collection (UCSB-IZC). UCSB-IZC is a natural history collection of invertebrate zoology at Cheadle Center of Biodiversity and Ecological Restoration, University of California Santa Barbara.

    This dataset provides versioned snapshots of the UCSB-IZC network as tracked by Preston [2,3] between 2021-10-08 and 2021-11-04 using [preston track "https://api.gbif.org/v1/occurrence/search/?datasetKey=d6097f75-f99e-4c2a-b8a5-b0fc213ecbd0"].

    This archive contains 14349 images related to 32533 occurrence/specimen records. See the included sample-image.jpg and its associated meta-data sample-image.json [4].

    The images were counted using:

    $ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c\
    | grep -o -P ".*depict"\
    | sort\
    | uniq\
    | wc -l

    And the occurrences were counted using:

    $ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c\
    | grep -o -P "occurrence/([0-9])+"\
    | sort\
    | uniq\
    | wc -l

    The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance files and data files. Two index and provenance files have additionally been included as individual files in this dataset publication. Index files link provenance files in time to establish a versioning mechanism.
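Since the parts are numbered 00 through ff in hexadecimal, their names can be enumerated programmatically for parallel fetching; this is a sketch, and the base URL in the usage example is a placeholder, not an official endpoint.

```python
def part_names():
    """Enumerate the 256 archive part filenames, preston-00.tar.gz
    through preston-ff.tar.gz (hexadecimal suffixes, as described above)."""
    return [f"preston-{i:02x}.tar.gz" for i in range(256)]

def part_urls(base_url):
    """Build download URLs for every part under a given base URL
    (the base URL is an assumption supplied by the caller)."""
    return [f"{base_url.rstrip('/')}/{name}" for name in part_names()]
```

The resulting URL list can then be handed to any parallel downloader (e.g., a thread pool issuing HTTP GETs), one worker per part.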

    To retrieve and verify the downloaded UCSB-IZC biodiversity dataset graph, first download preston-*.tar.gz. Then, extract the archives into a "data" folder. Alternatively, you can use the Preston [2,3] command-line tool to "clone" this dataset using:

    $ java -jar preston.jar clone --remote https://archive.org/download/preston-ucsb-izc/data.zip/,https://zenodo.org/record/5557670/files,https://zenodo.org/record/5660088/files/

    After that, verify the index of the archive by reproducing the following provenance log history:

    $ java -jar preston.jar history

    To check the integrity of the extracted archive, confirm that each line produced by the command "preston verify" matches the lines shown below, with each line including "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.

    $ java -jar preston.jar verify
    hash://sha256/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c file:/home/jhpoelen/ucsb-izc/data/ce/1d/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c OK CONTENT_PRESENT_VALID_HASH 66438 hash://sha256/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c
    hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 file:/home/jhpoelen/ucsb-izc/data/f6/8d/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 OK CONTENT_PRESENT_VALID_HASH 4093 hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844
    hash://sha256/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef file:/home/jhpoelen/ucsb-izc/data/3e/70/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef OK CONTENT_PRESENT_VALID_HASH 5746 hash://sha256/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef
    hash://sha256/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b file:/home/jhpoelen/ucsb-izc/data/99/58/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b OK CONTENT_PRESENT_VALID_HASH 6147 hash://sha256/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b

    Note that a copy of the java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or in short "preston".

    Files in this data publication:

    --- start of file descriptions ---

    -- description of archive and its contents (this file) --
    README

    -- executable java jar containing preston [2,3] v0.3.1. --
    preston.jar

    -- preston archive containing UCSB-IZC (meta-)data/image files, associated provenance logs and a provenance index --
    preston-[00-ff].tar.gz

    -- individual provenance index files --
    2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a

    -- example image and meta-data --
    sample-image.jpg (with hash://sha256/916ba5dc6ad37a3c16634e1a0e3d2a09969f2527bb207220e3dbdbcf4d6b810c)
    sample-image.json (with hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844)

    --- end of file descriptions ---


    References

    [1] Cheadle Center for Biodiversity and Ecological Restoration (2021). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset https://doi.org/10.15468/w6hvhv accessed via GBIF.org on 2021-11-04 as indexed by the Global Biodiversity Informatics Facility (GBIF) with provenance hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36 hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c.
    [2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 .
    [3] MJ Elliott, JH Poelen, JAB Fortes (2020). Toward Reliable Biodiversity Dataset References. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2020.101132
    [4] Cheadle Center for Biodiversity and Ecological Restoration (2021). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset https://doi.org/10.15468/w6hvhv accessed via GBIF.org on 2021-10-08. https://www.gbif.org/occurrence/3323647301 . hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 hash://sha256/916ba5dc6ad37a3c16634e1a0e3d2a09969f2527bb207220e3dbdbcf4d6b810c

  17. Nuclear Data Multimodal Knowledge Graph Construction Dataset

    • scidb.cn
    Updated May 15, 2025
    Cite
    Wei Yiqi; Shi Rui (2025). Nuclear Data Multimodal Knowledge Graph Construction Dataset [Dataset]. http://doi.org/10.57760/sciencedb.25111
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 15, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Wei Yiqi; Shi Rui
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Semi-structured ENSDF text data were retrieved from the National Nuclear Data Center via web crawlers, and the obtained ENSDF data were cleaned. The cleaned ENSDF data were then parsed using the Nuclei tool to generate the decay scheme image data. A total of 18186 entities and 18983 entity-pair relationships were annotated using roLabelImg and self-made tools. The dataset is divided into a test set and a training set.

  18. Generated Prediction Data of COVID-19's Daily Infections in Brazil

    • data.mendeley.com
    Updated Jul 12, 2020
    Cite
    Mohamed Hawas (2020). Generated Prediction Data of COVID-19's Daily Infections in Brazil [Dataset]. http://doi.org/10.17632/t2zk3xnt8y.1
    Explore at:
    Dataset updated
    Jul 12, 2020
    Authors
    Mohamed Hawas
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Area covered
    Brazil
    Description

    Dataset general description:

    • This dataset reports 4195 recurrent neural network models, their settings, and their generated prediction csv files, graphs, and metadata files, for predicting COVID-19's daily infections in Brazil by training on limited raw data (30 time-steps and 40 time-steps alternatives). The used code is developed by the author and located in the following online data repository link: http://dx.doi.org/10.17632/yp4d95pk7n.1

    Dataset content:

    • Models, graphs, and csv prediction files: 1. Deterministic mode (DM): includes 1194 generated model files (30 time-steps) with their 2835 graphs and 2835 prediction files, plus 1976 generated model files (40 time-steps) with their 7301 graphs and 7301 prediction files. 2. Non-deterministic mode (NDM): includes 20 generated model files (30 time-steps) with their 53 graphs and 53 prediction files. 3. Technical validation mode (TVM): includes 1001 generated model files (30 time-steps), with 3619 graphs and 3619 prediction files for 358 models, a sample of the 1001 models, plus 1 control-group model for India. 4. One graph file and one prediction file for each of DM and NDM, reporting evaluation till 2020-07-11.

    • Settings and metadata for the above 3 categories: 1. Used settings in json files for reproducibility. 2. Metadata about training and prediction setup and accuracy in csv files.

    Raw data source that was used to train the models:

    • The used raw data for training the models is from: COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University): https://github.com/CSSEGISandData/COVID-19

    • The models were trained on these versions of the raw data: 1. Link till 2020-06-29 (accessed 2020-07-08): https://github.com/CSSEGISandData/COVID-19/raw/78d91b2dbc2a26eb2b2101fa499c6798aa22fca8/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv 2. Link till 2020-06-13 (accessed 2020-07-08): https://github.com/CSSEGISandData/COVID-19/raw/02ea750a263f6d8b8945fdd3253b35d3fd9b1bee/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv

    License: This prediction Dataset is licensed under CC BY NC 3.0.

    Notice and disclaimer: 1- This prediction Dataset is for scientific and research purposes only. 2- The generation of this Dataset complies with the terms of use of the publicly available raw data from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University: https://github.com/CSSEGISandData/COVID-19 and therefore, the author of the prediction Dataset disclaims any and all responsibility and warranties regarding the contents of used raw data, including but not limited to: the correctness, completeness, and any issues linked to third-party rights.

  19. Data from: KGCW 2023 Challenge @ ESWC 2023

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated May 17, 2023
    + more versions
    Dylan Van Assche; David Chaves-Fraga; Anastasia Dimou; Umutcan Şimşek; Ana Iglesias (2023). KGCW 2023 Challenge @ ESWC 2023 [Dataset]. http://doi.org/10.5281/zenodo.7689310
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    May 17, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Dylan Van Assche; David Chaves-Fraga; Anastasia Dimou; Umutcan Şimşek; Ana Iglesias
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Knowledge Graph Construction Workshop 2023: challenge

    Knowledge graph construction from heterogeneous data has seen a lot of
    uptake in the last decade, from specification compliance to performance
    optimizations with respect to execution time. However, beyond execution
    time, other metrics for comparing knowledge graph construction, e.g. CPU
    or memory usage, are rarely considered. This challenge benchmarks systems
    to find which RDF graph construction system optimizes for execution time,
    CPU usage, memory usage, or a combination of these metrics.

    Task description

    The task is to reduce and report the execution time and computing resources
    (CPU and memory usage) for the parameters listed in this challenge,
    compared to the state of the art of existing tools and the baseline results
    provided by this challenge. The challenge is not limited to execution time
    (building the fastest pipeline); it also covers computing resources
    (building the most efficient pipeline).

    We provide a tool which can execute such pipelines end-to-end. This tool
    also collects and aggregates the metrics necessary for this challenge, such
    as execution time and CPU and memory usage, as CSV files. Moreover,
    information about the hardware used during the execution of the pipeline is
    available as well, to allow a fair comparison of different pipelines. Your
    pipeline should consist of Docker images which can be executed on Linux by
    the tool. The tool has already been tested with existing systems,
    relational databases (e.g. MySQL and PostgreSQL), and triplestores
    (e.g. Apache Jena Fuseki and OpenLink Virtuoso), which can be combined in
    any configuration. It is strongly encouraged to use this tool to
    participate in this challenge. If you prefer a different tool, or our tool
    imposes technical requirements you cannot meet, please contact us directly.

    Part 1: Knowledge Graph Construction Parameters

    These parameters are evaluated using synthetically generated data to gain
    more insight into their influence on the pipeline.

    Data

    • Number of data records: scaling the data size vertically by the number of records with a fixed number of data properties (10K, 100K, 1M, 10M records).
    • Number of data properties: scaling the data size horizontally by the number of data properties with a fixed number of data records (1, 10, 20, 30 columns).
    • Number of duplicate values: scaling the number of duplicate values in the dataset (0%, 25%, 50%, 75%, 100%).
    • Number of empty values: scaling the number of empty values in the dataset (0%, 25%, 50%, 75%, 100%).
    • Number of input files: scaling the number of datasets (1, 5, 10, 15).

    Mappings

    • Number of subjects: scaling the number of subjects with a fixed number of predicates and objects (1, 10, 20, 30 TMs).
    • Number of predicates and objects: scaling the number of predicates and objects with a fixed number of subjects (1, 10, 20, 30 POMs).
    • Number and type of joins: scaling the number and type of joins (1-1, N-1, 1-N, N-M).

    Part 2: GTFS-Madrid-Bench

    The GTFS-Madrid-Bench provides insights into the pipeline with real data
    from the public transport domain in Madrid.

    Scaling

    • GTFS-1 SQL
    • GTFS-10 SQL
    • GTFS-100 SQL
    • GTFS-1000 SQL

    Heterogeneity

    • GTFS-100 XML + JSON
    • GTFS-100 CSV + XML
    • GTFS-100 CSV + JSON
    • GTFS-100 SQL + XML + JSON + CSV

    Example pipeline

    The ground truth dataset and baseline results are generated in different steps
    for each parameter:

    1. The provided CSV files and SQL schema are loaded into a MySQL relational database.
    2. Mappings are executed by accessing the MySQL relational database to construct a knowledge graph in N-Triples as RDF format.
    3. The constructed knowledge graph is loaded into a Virtuoso triplestore, tuned according to the Virtuoso documentation.
    4. The provided SPARQL queries are executed on the SPARQL endpoint exposed by Virtuoso.

    The pipeline is executed 5 times, and the median execution time of each
    step is calculated. The run with the median execution time for each step
    is then reported in the baseline results with all its measured metrics.
    The query timeout is set to 1 hour and the knowledge graph construction
    timeout to 24 hours. The execution is performed with the following tool
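    The median-of-five selection described above can be sketched as follows; the step names and per-run structure here are hypothetical stand-ins, not the challenge tool's actual metrics schema:

```python
import statistics
from collections import defaultdict

# Execution time (seconds) per pipeline step for each of the 5 runs.
# Step names and values are illustrative only.
runs = [
    {"load": 12.1, "mapping": 240.5, "load_triplestore": 30.2, "query": 5.5},
    {"load": 11.8, "mapping": 251.0, "load_triplestore": 29.9, "query": 5.7},
    {"load": 12.4, "mapping": 244.2, "load_triplestore": 31.0, "query": 5.4},
    {"load": 12.0, "mapping": 248.8, "load_triplestore": 30.5, "query": 5.6},
    {"load": 12.2, "mapping": 246.1, "load_triplestore": 30.1, "query": 5.9},
]

# Collect the 5 measurements of each step, then take the per-step median
# (with 5 runs, this is the 3rd value of each sorted list).
per_step = defaultdict(list)
for run in runs:
    for step, seconds in run.items():
        per_step[step].append(seconds)

medians = {step: statistics.median(times) for step, times in per_step.items()}
```

    The run whose time equals the median for a given step is the one whose full set of metrics would be reported for that step.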

    Each parameter has its own directory in the ground truth dataset with the
    following files:

    • Input dataset as CSV.
    • Mapping file as RML.
    • Queries as SPARQL.
    • Execution plan for the pipeline in metadata.json.

    Datasets

    Knowledge Graph Construction Parameters

    The dataset consists of:

    • Input dataset as CSV for each parameter.
    • Mapping file as RML for each parameter.
    • SPARQL queries to retrieve the results for each parameter.
    • Baseline results for each parameter with the example pipeline.
    • Ground truth dataset for each parameter generated with the example pipeline.

    Format

    All input datasets are provided as CSV; depending on the parameter being
    evaluated, the number of rows and columns may differ. The first row is
    always the header of the CSV.

    GTFS-Madrid-Bench

    The dataset consists of:

    • Input dataset as CSV with SQL schema for the scaling, and a combination
      of XML, CSV, and JSON for the heterogeneity.
    • Mapping file as RML for both scaling and heterogeneity.
    • SPARQL queries to retrieve the results.
    • Baseline results with the example pipeline.
    • Ground truth dataset generated with the example pipeline.

    Format

    CSV datasets always have a header as their first row.
    JSON and XML datasets have their own schema.

    Evaluation criteria

    Submissions must evaluate the following metrics:

    • Execution time of all the steps in the pipeline. The execution time of a step is the difference between the begin and end time of a step.
    • CPU time as the time spent in the CPU for all steps of the pipeline. The CPU time of a step is the difference between the begin and end CPU time of a step.
    • Minimal and maximal memory consumption for each step of the pipeline, i.e. the minimum and maximum of the memory consumption measured during the execution of that step.
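    The three metrics can be measured per step roughly as below. This is a simplified in-process sketch: tracemalloc tracks only Python allocations and reports current/peak rather than a true minimum, whereas the challenge tool measures the whole pipeline process externally:

```python
import time
import tracemalloc

def measure_step(fn, *args):
    """Run one pipeline step and record execution time (end - begin wall
    clock), CPU time (end - begin CPU clock), and peak memory."""
    tracemalloc.start()
    wall_begin = time.monotonic()
    cpu_begin = time.process_time()

    result = fn(*args)

    metrics = {
        "exec_s": time.monotonic() - wall_begin,
        "cpu_s": time.process_time() - cpu_begin,
        "mem_peak_bytes": tracemalloc.get_traced_memory()[1],
    }
    tracemalloc.stop()
    return result, metrics

# Stand-in "step": summing a million integers.
result, metrics = measure_step(sum, range(1_000_000))
```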

    Expected output

    Duplicate values

    Scale          Number of Triples
    0 percent      2000000
    25 percent     1500020
    50 percent     1000020
    75 percent     500020
    100 percent    20

    Empty values

    Scale          Number of Triples
    0 percent      2000000
    25 percent     1500000
    50 percent     1000000
    75 percent     500000
    100 percent    0

    Mappings

    Scale           Number of Triples
    1 TM + 15 POM   1500000
    3 TM + 5 POM    1500000
    5 TM + 3 POM    1500000
    15 TM + 1 POM   1500000

    Properties

    Scale                  Number of Triples
    1M rows, 1 column      1000000
    1M rows, 10 columns    10000000
    1M rows, 20 columns    20000000
    1M rows, 30 columns    30000000

    Records

    Scale                   Number of Triples
    10K rows, 20 columns    200000
    100K rows, 20 columns   2000000
    1M rows, 20 columns     20000000
    10M rows, 20 columns    200000000
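
    The counts in the Properties and Records tables follow one rule, one triple per data cell (rows × columns), and the Empty values table scales its 2000000-triple baseline down linearly; a quick arithmetic check:

```python
def triples_for(rows, columns):
    # One generated triple per data cell.
    return rows * columns

def triples_with_empty(total, empty_fraction):
    # Empty cells yield no triples, so the count scales down linearly.
    return int(total * (1 - empty_fraction))

# Records table: 10K and 1M rows at 20 columns.
assert triples_for(10_000, 20) == 200_000
assert triples_for(1_000_000, 20) == 20_000_000
# Properties table: 1M rows at 30 columns.
assert triples_for(1_000_000, 30) == 30_000_000
# Empty values table: 75% empty and 100% empty.
assert triples_with_empty(2_000_000, 0.75) == 500_000
assert triples_with_empty(2_000_000, 1.0) == 0
```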

    Joins

    1-1 joins

    Scale        Number of Triples
    0 percent    0

  20. VT USGS Digital Line Graph Surface Waters - area polygons

    • data.wu.ac.at
    • geodata.vermont.gov
    • +1 more
    Updated Apr 26, 2018
    + more versions
    Vermont Center for Geographic Information (2018). VT USGS Digital Line Graph Surface Waters - area polygons [Dataset]. https://data.wu.ac.at/schema/data_gov/NTlkMTEyZGQtYjlkZC00ZjVjLTgzOTctZjIyNzRjNzM0MGQ0
    Explore at:
    Available download formats: zip, json, html, csv, application/vnd.ogc.wms_xml, kml, application/vnd.geo+json
    Dataset updated
    Apr 26, 2018
    Dataset provided by
    Vermont Center for Geographic Information
    Description

    (Link to Metadata) The WaterHydro_DLGSW layer represents surface waters (hydrography) at a scale of RF 100000. WaterHydro_DLGSW was derived from the RF 100000 USGS Digital Line Graph (DLG). DLGs of map features are converted to digital form from maps and related sources. Refer to the USGS web site for more information on DLGs (http://www.usgs.gov).
