Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains bitcoin transfer transactions extracted from the Bitcoin Mainnet blockchain. Details of the dataset are given below:

FILENAME FORMAT: The filenames have the following format: btc-tx- . For example, the file btc-tx-100000-149999-aa.bz2 (and the rest of the parts, if any) contains transactions from block 100000 to block 149999 inclusive. The files are compressed with bzip2. They can be uncompressed using the command bunzip2.

TRANSACTION FORMAT: Each line in a file corresponds to a transaction.

BLOCK TIME FORMAT: The block time file has the following format:

IMPORTANT NOTE: Public Bitcoin Mainnet blockchain data is open and can be obtained by connecting as a node on the blockchain or by using block explorer web sites such as https://btcscan.org . The downloaders and users of this dataset accept full responsibility for using the data in a GDPR-compliant manner or in accordance with any other regulations. We provide the data as is and cannot be held responsible for anything.

NOTE: If you use this dataset, please do not forget to add the DOI number to the citation. If you use our dataset in your research, please also cite our paper: https://link.springer.com/chapter/10.1007/978-3-030-94590-9_14

@incollection{kilicc2022analyzing, title={Analyzing Large-Scale Blockchain Transaction Graphs for Fraudulent Activities}, author={K{\i}l{\i}{\c{c}}, Baran and {\"O}zturan, Can and {\c{S}}en, Alper}, booktitle={Big Data and Artificial Intelligence in Digital Finance}, pages={253--267}, year={2022}, publisher={Springer, Cham} }
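The part files can be consumed directly from Python without decompressing them to disk first. A minimal sketch, assuming the example part file above has been downloaded (the per-line transaction layout is the one described under TRANSACTION FORMAT):

```python
# Stream a compressed part file line by line; each line is one transaction.
import bz2

count = 0
with bz2.open("btc-tx-100000-149999-aa.bz2", mode="rt") as fh:
    for line in fh:
        count += 1
print(f"{count} transactions in this part file")
```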
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Wikipedia is the largest and most widely read free online encyclopedia in existence. As such, Wikipedia offers a large amount of data on all of its contents and the interactions around them, as well as different types of open data sources. This makes Wikipedia a unique data source that can be analyzed with quantitative data science techniques. However, the enormous amount of data makes it difficult to get an overview, and many of the analytical possibilities that Wikipedia offers remain unknown. To reduce the complexity of identifying and collecting data on Wikipedia and to expand its analytical potential, we have collected data from various sources, processed them, and generated a dedicated Wikipedia Knowledge Graph aimed at facilitating the analysis and contextualization of the activity and relations of Wikipedia pages, in this case limited to the English edition. We share this Knowledge Graph dataset openly, aiming for it to be useful to a wide range of researchers, such as informetricians, sociologists, and data scientists.
There are a total of 9 files, all in TSV format, built under a relational structure. The main file, which acts as the core of the dataset, is the page file; it is accompanied by 4 files with different entities related to the Wikipedia pages (the category, url, pub and page_property files) and 4 further files that act as "intermediate tables", making it possible to connect the pages both with those entities and with other pages (the page_category, page_url, page_pub and page_link files).
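As an illustration of the relational structure, the sketch below loads the core page file and one intermediate table with pandas; it assumes the files are named after the entities listed above (e.g. page.tsv, page_category.tsv) and that each file carries a header row, so the actual join keys can be read from the headers (see Dataset_summary for the authoritative description):

```python
import pandas as pd

# Load the core entity file and one "intermediate table".
page = pd.read_csv("page.tsv", sep="\t")
page_category = pd.read_csv("page_category.tsv", sep="\t")

# Inspect the headers to learn the relational keys before joining.
print(page.columns.tolist())
print(page_category.columns.tolist())
```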
The document Dataset_summary includes a detailed description of the dataset.
Thanks to Nees Jan van Eck and the Centre for Science and Technology Studies (CWTS) for the valuable comments and suggestions.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset supports RoboFUSE-GNN, an uncertainty-aware Graph Neural Network designed for real-time collaborative perception in dynamic factory environments. The data was collected from a multi-robot radar setup in a Cyber-Physical Production System (CPPS). Each sample represents a spatial-semantic radar graph, capturing inter-robot spatial relationships and temporal dependencies through a sliding window graph formulation.
[Figures on the dataset page: physical setup; layout_01 setup; layout_02 setup; layout_03 setup.]
Each sample in the dataset represents a radar graph snapshot composed of:
Nodes: Radar detections over a temporal window
Node features: Position, radar-specific attributes, and robot ID
Edges: Constructed using spatial proximity and inter-robot collaboration
Edge attributes: Relative motion, SNR, and temporal difference
Labels:
Node semantic classes (e.g., Robot, Workstation, Obstacle)
Edge labels indicating semantic similarity and collaboration type
RoboFUSE_Graphs/split/
├── scene_000/
│   ├── 000.pkl
│   ├── 001.pkl
│   └── scene_metadata.json
├── scene_001/
│   ├── ...
├── ...
└── scene_split_mapping.json
Each scene_XXX/ folder corresponds to a complete scenario and contains:
NNN.pkl: A Pickle file for the N-th graph frame
scene_metadata.json: Metadata including:
scene_name: Scenario identifier
scenario: Scenario Description
layout_name: Layout name (e.g., layout_01, layout_02, layout_03)
num_frames: Number of frames in the scene
frame_files: List of graph frame files
Each .pkl file contains a dictionary with the following:
Key | Description |
---|---|
x | Node features [num_nodes, 10] |
edge_index | Connectivity matrix [2, num_edges] |
edge_attr | Edge features [num_edges, 5] |
y | Semantic node labels |
edge_class | 0 or 1 (edge label based on class similarity & distance) |
node_offsets | Ground-truth regression to object center (used in clustering) |
cluster_node_idx | List of node indices per object cluster |
cluster_labels | Semantic class per cluster |
timestamp | Frame timestamp (float) |
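A minimal sketch for inspecting one graph frame, assuming the frame files are plain pickle files as described above (the concrete path is illustrative):

```python
import pickle

with open("RoboFUSE_Graphs/split/scene_000/000.pkl", "rb") as fh:
    frame = pickle.load(fh)

print(frame["x"].shape)           # node features [num_nodes, 10]
print(frame["edge_index"].shape)  # connectivity matrix [2, num_edges]
print(frame["edge_attr"].shape)   # edge features [num_edges, 5]
print(frame["timestamp"])         # frame timestamp (float)
```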
The following steps were involved in creating the dataset:
Preprocessing:
- Points are filtered using SNR, Z height, and arena bounds
- Normalized radar features include SNR, range, angle, velocity
Sliding Window Accumulation:
- Temporal fusion over a window W improves robustness
- Used to simulate persistence and reduce sparsity
Nodes:
- Construct node features xi = [x, y, z, ŝ, r̂, sin(ϕ̂), cos(ϕ̂), sin(θ̂), cos(θ̂), robotID]
- Label nodes using MoCap-ground-truth footprints.
Edges:
- Built using KNN
- Edge attributes eij = [Δx, Δy, Δz, ΔSNR, Δt]
- Edge Labels:
- 1 if nodes are of the same class and within a distance threshold
- Includes **intra-robot** and **inter-robot** collaborative edges
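An illustrative sketch of the edge construction step: KNN over node positions, with edge labels set to 1 when two nodes share a class and lie within a distance threshold. The parameter values (k, the threshold) are assumptions, not the values used for the dataset:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_edges(pos, labels, k=8, dist_thresh=0.5):
    """pos: [num_nodes, 3] positions; labels: [num_nodes] semantic classes."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(pos)
    dist, idx = nn.kneighbors(pos)             # column 0 is the node itself
    src = np.repeat(np.arange(len(pos)), k)
    dst = idx[:, 1:].reshape(-1)
    d = dist[:, 1:].reshape(-1)
    edge_index = np.stack([src, dst])          # [2, num_edges]
    # 1 if same class and within the distance threshold, else 0.
    edge_class = ((labels[src] == labels[dst]) & (d < dist_thresh)).astype(int)
    return edge_index, edge_class
```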
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
NASA Knowledge Graph Dataset
Dataset Summary
The NASA Knowledge Graph Dataset is an expansive graph-based dataset designed to integrate and interconnect information about satellite datasets, scientific publications, instruments, platforms, projects, data centers, and science keywords. This knowledge graph is particularly focused on datasets managed by NASA's Distributed Active Archive Centers (DAACs), which are NASA's data repositories responsible for archiving and… See the full description on the dataset page: https://huggingface.co/datasets/nasa-gesdisc/nasa-eo-knowledge-graph.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Unique identifier: https://doi.org/10.5281/zenodo.4718440. Dataset updated: Dec 19, 2022. Dataset provided by: Zenodo. Authors: Can Özturan; Alper Şen; Baran Kılıç. License: Attribution 4.0 (CC BY 4.0). License information was derived automatically.

Description: This dataset contains ether as well as popular ERC20 token transfer transactions extracted from the Ethereum Mainnet blockchain. Only send-ether, contract function call, and contract deployment transactions are present in the dataset. Miner rewards (static block rewards) and uncle block inclusion rewards are added as transactions to the dataset. Transaction fee rewards and uncle rewards are not currently included in the dataset. Details of the dataset are given below:

FILENAME FORMAT: The filenames have the following format: eth-tx- . For example, the file eth-tx-1000000-1099999.txt.bz2 contains transactions from block 1000000 to block 1099999 inclusive. The files are compressed with bzip2. They can be uncompressed using the command bunzip2.

TRANSACTION FORMAT: Each line in a file corresponds to a transaction. ERC20 token transfers (transfer and transferFrom function calls in an ERC20 contract) are indicated by token symbol; for example, GUSD is the Gemini USD stable coin. The JSON file erc20tokens.json given below contains the details of the ERC20 tokens. Failed transactions are prefixed with "F-".

BLOCK TIME FORMAT: The block time file has the following format:

erc20tokens.json FILE: This file contains the list of popular ERC20 token contracts whose transfer/transferFrom transactions appear in the data files. ERC20 token list: USDT TRYb XAUt BNB LEO LINK HT HEDG MKR CRO VEN INO PAX INB SNX REP MOF ZRX SXP OKB XIN OMG SAI HOT DAI EURS HPT BUSD USDC SUSD HDG QCAD PLUS BTCB WBTC cWBTC renBTC sBTC imBTC pBTC

IMPORTANT NOTE: Public Ethereum Mainnet blockchain data is open and can be obtained by connecting as a node on the blockchain or by using block explorer web sites such as http://etherscan.io . The downloaders and users of this dataset accept full responsibility for using the data in a GDPR-compliant manner or in accordance with any other regulations. We provide the data as is and cannot be held responsible for anything.

NOTE: If you use this dataset, please do not forget to add the DOI number to the citation. If you use our dataset in your research, please also cite our paper: https://link.springer.com/article/10.1007/s10586-021-03511-0

@article{kilic2022parallel, title={Parallel Analysis of Ethereum Blockchain Transaction Data using Cluster Computing}, journal={Cluster Computing}, author={K{\i}l{\i}{\c{c}}, Baran and {\"O}zturan, Can and Sen, Alper}, year={2022}, month={Jan} }
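A minimal sketch for streaming one of the compressed transaction files, using only the facts stated above (one transaction per line, failed transactions prefixed with "F-"); the filename is the example given in the record:

```python
import bz2

total = failed = 0
with bz2.open("eth-tx-1000000-1099999.txt.bz2", mode="rt") as fh:
    for line in fh:                 # one transaction per line
        total += 1
        if line.startswith("F-"):   # failed transactions are prefixed "F-"
            failed += 1
print(f"{failed} failed out of {total} transactions")
```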
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset used for the paper "A Recommender System of Buggy App Checkers for App Store Moderators", published at the International Conference on Mobile Software Engineering and Systems (MOBILESoft) in 2015.
Dataset Collection We built a dataset that consists of a random sample of Android app metadata and user reviews available on the Google Play Store in January and March 2014. Since the Google Play Store is continuously evolving (adding, removing and/or updating apps), we collected the dataset twice. The dataset D1 contains the apps available in the Google Play Store in January 2014. We then created a new snapshot (D2) of the Google Play Store in March 2014.
The apps belong to the 27 different categories defined by Google (at the time of writing the paper) and the 4 predefined subcategories (free, paid, new_free, and new_paid). For each category-subcategory pair (e.g. tools-free, tools-paid, sports-new_free, etc.), we collected a maximum of 500 samples, resulting in a median of 1,978 apps per category.
For each app, we retrieved the following metadata: name, package, creator, version code, version name, number of downloads, size, upload date, star rating, star counting, and the set of permission requests.
In addition, for each app, we collected up to a maximum of the latest 500 reviews posted by users in the Google Play Store. For each review, we retrieved its metadata: title, description, device, and version of the app. None of these fields were mandatory, so several reviews lack some of these details. From all the reviews attached to an app, we only considered the reviews associated with the latest version of the app, i.e., we discarded unversioned and old-versioned reviews. This resulted in a corpus of 1,402,717 reviews (Jan. 2014).
Dataset Stats Some stats about the datasets:
D1 (Jan. 2014) contains 38,781 apps requesting 7,826 different permissions, and 1,402,717 user reviews.
D2 (Mar. 2014) contains 46,644 apps and 9,319 different permission requests, and 1,361,319 user reviews.
Additional stats about the datasets are available here.
Dataset Description To store the dataset, we created a graph database with Neo4j. This dataset therefore consists of a graph describing the apps as nodes and edges. We chose a graph database because the graph visualization helps to identify connections among data (e.g., clusters of apps sharing similar sets of permission requests).
In particular, our dataset graph contains six types of nodes:
- APP nodes containing metadata of each app,
- PERMISSION nodes describing permission types,
- CATEGORY nodes describing app categories,
- SUBCATEGORY nodes describing app subcategories,
- USER_REVIEW nodes storing user reviews,
- TOPIC nodes describing topics mined from user reviews (using LDA).
Furthermore, there are five types of relationships between APP nodes and each of the remaining nodes:
Dataset Files Info
Neo4j 2.0 Databases
googlePlayDB1-Jan2014_neo4j_2_0.rar
googlePlayDB2-Mar2014_neo4j_2_0.rar We provide two Neo4j databases containing the 2 snapshots of the Google Play Store (January and March 2014). These are the original databases created for the paper. The databases were created with Neo4j 2.0, in particular with the tool version 'Neo4j 2.0.0-M06 Community Edition' (the latest version available at the time of implementing the paper in 2014).
Neo4j 3.5 Databases
googlePlayDB1-Jan2014_neo4j_3_5_28.rar
googlePlayDB2-Mar2014_neo4j_3_5_28.rar The version Neo4j 2.0 is now deprecated and no longer available for download in the official Neo4j Download Center. We have migrated the original databases (Neo4j 2.0) to Neo4j 3.5.28. The databases can be opened with the tool version 'Neo4j Community Edition 3.5.28', which can be downloaded from the official Neo4j Download page.
In order to open the databases with more recent versions of Neo4j, the databases must be first migrated to the corresponding version. Instructions about the migration process can be found in the Neo4j Migration Guide.
The first time the Neo4j database is connected, it may request credentials. The username and password are: neo4j/neo4j
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Net-Income Time Series for Nihon M&A Center Inc. Nihon M&A Center Holdings Inc. provides mergers and acquisition (M&A) related services in Japan and internationally. The company offers M&A support services, such as reorganization, capital policies, and MBO for small and medium-sized enterprises. Nihon M&A Center Holdings Inc. was incorporated in 1991 and is headquartered in Tokyo, Japan.
This metadata is meant to describe the roads layer of DLGs, but may contain information on other DLG layers. Digital line graph (DLG) data are digital representations of cartographic information. DLGs of map features are converted to digital form from maps and related sources. Large-scale DLG data are available from the West Virginia University GIS Technical Center in six categories: (1) hypsography, (2) hydrography, (3) boundaries, (4) miscellaneous transportation (pipe and transmission lines), (5) roads, (6) railroads. All DLG data distributed by the USGS are DLG - Level 3 (DLG-3), which means the data contain a full range of attribute codes, have full topological structuring, and have passed certain quality-control checks. The files available from the West Virginia University GIS Technical Center are full DLG-3 (USGS approved). Files are currently available in DLG and E00 format.
https://choosealicense.com/licenses/llama3.1/
Polymed-QA
Synthetically generated QA pairs from the Polymed dataset. Used to train Aloe-Beta model.
Dataset Details
Dataset Description
PolyMed is a dataset developed to improve Automatic Diagnosis Systems (ADS). This dataset incorporates medical knowledge graph data and diagnosis case data to provide comprehensive evaluation, diverse disease information, effective… See the full description on the dataset page: https://huggingface.co/datasets/HPAI-BSC/Polymed-QA.
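A minimal sketch for loading the QA pairs with the Hugging Face datasets library, assuming the repository is stored in a format load_dataset can read directly:

```python
from datasets import load_dataset

ds = load_dataset("HPAI-BSC/Polymed-QA")
print(ds)  # splits and features as defined on the dataset page
```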
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Construction

This dataset captures the temporal network of Bitcoin (BTC) flows exchanged between entities at the finest time resolution (UNIX timestamps). Its construction is based on the blockchain covering the period from January 3rd, 2009 to January 25th, 2021. The blockchain extraction was made using the bitcoin-etl (https://github.com/blockchain-etl/bitcoin-etl) Python package. The entity-entity network is built by aggregating Bitcoin addresses using the common-input heuristic [1] as well as popular Bitcoin users' addresses provided by https://www.walletexplorer.com/

[1] M. Harrigan and C. Fretter, "The Unreasonable Effectiveness of Address Clustering," 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, France, 2016, pp. 368-373, doi: 10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0071.

Dataset Description

Temporal Coverage: From 03 January 2009 to 25 January 2021

Overview: This dataset provides a comprehensive representation of Bitcoin exchanges between entities over a significant temporal span, from the inception of Bitcoin to recent years. It encompasses various temporal resolutions and representations to facilitate Bitcoin transaction network analysis in the context of temporal graphs. All dates have been derived from block UNIX timestamps in the GMT timezone.

Contents: The dataset is distributed across several compressed archives. All data are stored in the Apache Parquet file format, a columnar storage format optimized for analytical queries; it can be used with the pyspark Python package.

orbitaal-stream_graph.tar.gz: The root directory is STREAM_GRAPH/. Contains a stream graph representation of Bitcoin exchanges at the finest temporal scale, corresponding to the validation time of each block (averaging approximately 10 minutes). The stream graph is divided into 13 parquet files, one for each year, named orbitaal-stream_graph-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] is the corresponding year and [ID] is an integer from 1 to N (the number of files) such that sorting by increasing [ID] is the same as sorting by increasing year. These files are in the subdirectory STREAM_GRAPH/EDGES/.

orbitaal-snapshot-all.tar.gz: The root directory is SNAPSHOT/. Contains the snapshot network representing all transactions aggregated over the whole dataset period (from Jan. 2009 to Jan. 2021), as the parquet file orbitaal-snapshot-all.snappy.parquet in the subdirectory SNAPSHOT/EDGES/ALL/.

orbitaal-snapshot-year.tar.gz: The root directory is SNAPSHOT/. Contains the yearly resolution snapshot networks as parquet files named orbitaal-snapshot-date-[YYYY]-file-id-[ID].snappy.parquet, in the subdirectory SNAPSHOT/EDGES/year/.

orbitaal-snapshot-month.tar.gz: The root directory is SNAPSHOT/. Contains the monthly resolution snapshot networks as parquet files named orbitaal-snapshot-date-[YYYY]-[MM]-file-id-[ID].snappy.parquet, in the subdirectory SNAPSHOT/EDGES/month/.

orbitaal-snapshot-day.tar.gz: The root directory is SNAPSHOT/. Contains the daily resolution snapshot networks as parquet files named orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-file-id-[ID].snappy.parquet, in the subdirectory SNAPSHOT/EDGES/day/.

orbitaal-snapshot-hour.tar.gz: The root directory is SNAPSHOT/. Contains the hourly resolution snapshot networks as parquet files named orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-[hh]-file-id-[ID].snappy.parquet, in the subdirectory SNAPSHOT/EDGES/hour/.

orbitaal-nodetable.tar.gz: The root directory is NODE_TABLE/. Contains two parquet files: the first gives information related to nodes present in the stream graphs and snapshots, such as period of activity and associated global Bitcoin balance; the other contains the list of all associated Bitcoin addresses.

Small samples in CSV format: orbitaal-stream_graph-2016_07_08.csv and orbitaal-stream_graph-2016_07_09.csv. These two CSV files are related to stream graph representations of a halvening event happening in 2016.
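A minimal sketch for reading one of the snapshot parquet files with pyspark, as suggested above (the concrete file name is illustrative; parquet files can also be read with pandas/pyarrow):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orbitaal").getOrCreate()
edges = spark.read.parquet(
    "SNAPSHOT/EDGES/year/orbitaal-snapshot-date-2016-file-id-1.snappy.parquet"
)
edges.printSchema()    # inspect the column layout
print(edges.count())   # number of entity-entity edges in this snapshot
```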
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the metadata records about research products (research literature, data, software, other types of research products) with funding information available in the OpenAIRE Graph produced in July 2024. Records are grouped by funder in dedicated archive files (.tar).
fundRef contains the following funders:
100007490 Bausch and Lomb Ireland
100007630 College of Engineering and Informatics, National University of Ireland, Galway
100007731 Endo International
100007819 Allergan
100008099 Food Safety Authority of Ireland
100008124 Department of Jobs, Enterprise and Innovation
100008303 Department for Economics, Northern Ireland
100009098 Department of Foreign Affairs and Trade, Ireland
100009099 Irish Aid
100009770 National University of Ireland
100010399 European Society of Cataract and Refractive Surgeons
100010546 Deparment of Children and Youth Affairs, Ireland
100010547 Irish Youth Justice Service
100010993 Irish Nephrology Society
100011096 Jazz Pharmaceuticals
100011396 Irish College of General Practitioners
100012733 National Parks and Wildlife Service
100012734 Department for Culture, Heritage and the Gaeltacht, Ireland
100012754 Horizon Pharma
100012891 Medical Research Charities Group
100012919 Epilepsy Ireland
100012920 GLEN
100012921 Royal College of Surgeons in Ireland
100013029 Iris O'Brien Foundation
100013206 Food Institutional Research Measure
100013381 Irish Phytochemical Food Network
100013433 Transport Infrastructure Ireland
100013917 Society for Musicology in Ireland
100014251 Humanities in the European Research Area
100014364 National Children's Research Centre
100014384 Amarin Corporation
100014902 Irish Association for Cancer Research
100015023 Ireland Funds
100015278 Pfizer Healthcare Ireland
100015319 Sport Ireland Institute
100015442 Global Brain Health Institute
100015992 St. Luke's Institute of Cancer Research
100017144 Shell E and P Ireland
100017897 Friedreich’s Ataxia Research Alliance Ireland
100018064 Department of Tourism, Culture, Arts, Gaeltacht, Sport and Media
100018172 Department of the Environment, Climate and Communications
100018175 Dairy Processing Technology Centre
100018270 Health Service Executive
100018529 Alkermes
100018542 Irish Endocrine Society
100018754 An Roinn Sláinte
100019428 Nabriva Therapeutics
100019637 Horizon Therapeutics
100020174 Health Research Charities Ireland
100020202 UCD Foundation
100020233 Ireland Canada University Foundation
100022895 Health Research Institute, University of Limerick
100022943 National Cancer Registry Ireland
501100001581 Arts Council of Ireland
501100001582 Centre for Ageing Research and Development in Ireland
501100001583 Cystinosis Foundation Ireland
501100001584 Department of Agriculture, Food and the Marine, Ireland
501100001586 Department of Education and Skills, Ireland
501100001587 Economic and Social Research Institute
501100001588 Enterprise Ireland
501100001591 Heritage Council
501100001592 Higher Education Authority
501100001593 Irish Cancer Society
501100001594 Irish Heart Foundation
501100001595 Irish Hospice Foundation
501100001596 Irish Research Council for Science, Engineering and Technology
501100001598 Mental Health Commission
501100001599 National Council for Forest Research and Development
501100001600 Research and Education Foundation, Sligo General Hospital
501100001601 Royal Irish Academy
501100001603 Sustainable Energy Authority of Ireland
501100001604 Teagasc
501100001627 Marine Institute
501100001628 Central Remedial Clinic
501100001629 Royal Dublin Society
501100001630 Dublin Institute for Advanced Studies
501100001631 University College Dublin
501100001633 National University of Ireland, Maynooth
501100001634 University of Galway
501100001635 University of Limerick
501100001636 University College Cork
501100001637 Trinity College Dublin
501100001638 Dublin City University
501100002736 Covidien
501100002755 Brennan and Company
501100002919 Cork Institute of Technology
501100002959 Dublin City Council
501100003036 Perrigo Company Charitable Foundation
501100003037 Elan
501100003496 HeyStaks Technologies
501100003553 Gaelic Athletic Association
501100003840 Irish Institute of Clinical Neuroscience
501100003956 Aspect Medical Systems
501100004162 Meath Foundation
501100004210 Our Lady's Children's Hospital, Crumlin
501100004321 Shire
501100004981 Athlone Institute of Technology
501100006518 Department of Communications, Energy and Natural Resources, Ireland
501100006553 Collaborative Centre for Applied Nanotechnology
501100006554 IDA Ireland
501100006759 CLARITY Centre for Sensor Web Technologies
501100009246 Technological University Dublin
501100009269 Programme of Competitive Forestry Research for Development
501100009315 Cystinosis Ireland
501100010808 Geological Survey of Ireland
501100011030 Alimentary Glycoscience Research Cluster
501100011031 Alimentary Health
501100011103 Rannís
501100011626 Energy Policy Research Centre, Economic and Social Research Institute
501100012354 Inland Fisheries Ireland
501100014384 X-Bolt Orthopaedics
501100014531 Physical Education and Sport Sciences Department, University of Limerick
501100014710 PrecisionBiotics Group
501100014745 APC Microbiome Institute
501100014826 ADAPT - Centre for Digital Content Technology
501100014827 Dormant Accounts Fund
501100017501 FotoNation
501100018641 Dairy Research Ireland
501100018839 Irish Centre for High-End Computing
501100019905 Galway University Foundation
501100020270 Advanced Materials and Bioengineering Research
501100020403 Irish Composites Centre
501100020425 Irish Thoracic Society
501100020570 College of Medicine, Nursing and Health Sciences, National University of Ireland, Galway
501100020871 Bernal Institute, University of Limerick
501100021102 Waterford Institute of Technology
501100021110 Irish MPS Society
501100021525 Insight SFI Research Centre for Data Analytics
501100021694 Elan Pharma International
501100021838 Royal College of Physicians of Ireland
501100022542 Breakthrough Cancer Research
501100022610 Breast Cancer Ireland
501100022728 Munster Technological University
501100023273 HRB Clinical Research Facility Galway
501100023551 Cystic Fibrosis Ireland
501100023970 Tyndall National Institute
501100024242 Synthesis and Solid State Pharmaceutical Centre
501100024313 Irish Rugby Football Union
501100024834 Tusla - Child and Family Agency
AKA Academy of Finland
ANR French National Research Agency (ANR)
ARC Australian Research Council (ARC)
ASAP Aligning Science Across Parkinson's
CHISTERA CHIST-ERA
CIHR Canadian Institutes of Health Research
EC_ERASMUS+ European Commission - Erasmus+ funding stream
EC_FP7 European Commission - FP7 funding stream
EC_H2020 European Commission - H2020 funding stream
EC_HE European Commission - HE funding stream
EEA European Environment Agency
EPA Environmental Protection Agency
FCT Fundação para a Ciência e a Tecnologia, I.P.
FWF Austrian Science Fund
HRB Health Research Board
HRZZ Croatian Science Foundation
INCA Institut National du Cancer
IRC Irish Research Council
IReL Irish Research eLibrary
MESTD Ministry of Education, Science and Technological Development of Republic of Serbia
MZOS TOADDNAME
NHMRC National Health and Medical Research Council (NHMRC)
NIH National Institutes of Health
NSERC Natural Sciences and Engineering Research Council of Canada
NSF National Science Foundation
NWO Netherlands Organisation for Scientific Research (NWO)
SFI Science Foundation Ireland
SNSF Swiss National Science Foundation
SSHRC Social Sciences and Humanities Research Council
TARA Tara Expeditions Foundation
TIBITAK Türkiye Bilimsel ve Teknolojik Araştırma Kurumu
UKRI UK Research and Innovation
WT Wellcome Trust
Each tar archive contains gzip files with one JSON record per line. JSON records are compliant with the schema available at https://doi.org/10.5281/zenodo.14608710.
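A minimal sketch for streaming the JSON records of one funder archive without fully extracting it; the archive name is hypothetical, and the "id" field is an assumption (see the schema linked above for the actual field names):

```python
import gzip
import json
import tarfile

with tarfile.open("SFI.tar") as tar:          # hypothetical funder archive
    for member in tar:
        if not member.isfile():
            continue
        with gzip.open(tar.extractfile(member), mode="rt", encoding="utf-8") as fh:
            for line in fh:
                record = json.loads(line)     # one JSON record per line
                print(record.get("id"))       # field name is an assumption
```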
You can also search and browse this dataset (and more) in the OpenAIRE EXPLORE portal and via the OpenAIRE API.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Business process event data modeled as labeled property graphs
Data Format
-----------
The dataset comprises one labeled property graph in two different file formats.
#1) Neo4j .dump format
A neo4j (https://neo4j.com) database dump that contains the entire graph and can be imported into a fresh neo4j database instance using the following command, see also the neo4j documentation: https://neo4j.com/docs/
/bin/neo4j-admin.(bat|sh) load --database=graph.db --from=
The .dump was created with Neo4j v3.5.
#2) .graphml format
A .zip file containing a .graphml file of the entire graph
Data Schema
-----------
The graph is a labeled property graph over business process event data. Each graph uses the following concepts
:Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute "Activity" that occurred at the given "timestamp"
:Entity nodes - each entity node describes an entity (e.g., an object or a user), it has an EntityType and an identifier (attribute "ID")
:Log nodes - describes a collection of events that were recorded together, most graphs only contain one log node
:Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed, :Class nodes group events into sets of identical observations
:CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities
:DF relationships - "directly-followed by" between two :Event nodes describes which event is directly-followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.
:HAS relationship - from a :Log to an :Event node, describes which events had been recorded in which event log
:OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph
:REL relationship - placeholder for any structural relationship between two :Entity nodes
The concepts are further defined in: Stefan Esser, Dirk Fahland. Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020). https://arxiv.org/abs/2005.14552
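As an illustration of how this schema can be queried once the graph is loaded into Neo4j, the sketch below counts events per entity type via the :CORR relationships; the connection URI and credentials are assumptions for a local instance:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "neo4j"))
with driver.session() as session:
    result = session.run(
        "MATCH (e:Event)-[:CORR]->(n:Entity) "
        "RETURN n.EntityType AS type, count(e) AS events "
        "ORDER BY events DESC"
    )
    for record in result:
        print(record["type"], record["events"])
driver.close()
```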
Data Contents
-------------
neo4j-bpic16-2021-02-17 (.dump|.graphml.zip)
An integrated graph describing the raw event data of the entire BPI Challenge 2016 dataset.
Dees, Marcus; van Dongen, B.F. (Boudewijn) (2016): BPI Challenge 2016. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:360795c8-1dd6-4a5b-a443-185001076eab
UWV (Employee Insurance Agency) is an autonomous administrative
authority (ZBO) and is commissioned by the Ministry of Social Affairs
and Employment (SZW) to implement employee insurances and provide labour
market and data services in the Netherlands. The Dutch employee
insurances are provided for via laws such as the WW (Unemployment
Insurance Act), the WIA (Work and Income according to Labour Capacity
Act, which contains the IVA (Full Invalidity Benefit Regulations), WGA
(Return to Work (Partially Disabled) Regulations), the Wajong
(Disablement Assistance Act for Handicapped Young Persons), the WAO
(Invalidity Insurance Act), the WAZ (Self-employed Persons Disablement
Benefits Act), the Wazo (Work and Care Act) and the Sickness Benefits
Act. The data in this collection pertains to customer contacts over a
period of 8 months and UWV is looking for insights into their customers'
journeys. Data has been collected from several different sources,
namely: 1) Clickdata from the site www.werk.nl collected from visitors
that were not logged in, 2) Clickdata from the customer specific part of
the site www.werk.nl (a link is made with the customer that logged in),
3) Werkmap Message data, showing when customers contacted the UWV
through a digital channel, 4) Call data from the callcenter, showing
when customers contacted the call center by phone, and 5) Complaint data
showing when customers complained. All data is accompanied by data
fields with anonymized information about the customer as well as data
about the site visited or the contents of the call and/or complaint. The
texts in the dataset are provided in both Dutch and English where
applicable. URL's are included based on the structure of the site during
the period the data has been collected. UWV is interested in insights
on how their channels are being used, when customers move from one
contact channel to the next and why and if there are clear customer
profiles to be identified in the behavioral data. Furthermore,
recommendations are sought on how to serve customers without the need to
change the contact channel.
The data contains the following entities and their events
- Customer - customer of a Dutch public agency for handling unemployment benefits
- Office_U - user or worker involved in an activity handling a customer interaction
- Office_W - user or worker involved in an activity handling a customer interaction
- Complaint - a complaint document handed in by a customer
- ComplaintDossier - a collection of complaints by the same customer
- Session - browser-session identifier of a user browsing the website of the agency
- IP - IP address of a user browsing the website of the agency
Data Size
---------
BPIC16, nodes: 8109680, relationships: 86833139
This dataset captures a statistical analysis of the HCHS cohort study using a knowledge graph and dashboard. Properties of 10,000 participants were analyzed for their association with cardiovascular disease as well as for their relationships with each other. The data is presented in the form of Neo4j database dumps and can be explored following the given user guide.
https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy
The Graph Database Market size was valued at USD 1.9 billion in 2023 and is projected to reach USD 7.91 billion by 2032, exhibiting a CAGR of 22.6% during the forecast period. A graph database is a form of NoSQL database that stores and represents relationships as graphs. Rather than modeling data as relations, as most contemporary relational databases do, graph databases apply nodes, edges, and properties. The primary types include property graphs, which permit attributes on nodes and edges, and RDF triplestores, which center on subject-predicate-object triples. Notable features include the ability to traverse relationships at high rates, easy schema change, and scalability. Familiar use cases are social media, recommendations, anomaly or fraud detection, and knowledge graphs, where the relationships are complex and require deeper comprehension. These databases are considered valuable where the connections between items of data are as significant as the data themselves. Key drivers for this market are: Increasing Adoption of Cloud-based Managed Services to Drive Market Growth.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Users can customize tables, graphs and maps on data related to children in a specific state or in the United States as a whole. Comparisons can be made between states. Background: KIDS COUNT Data Center is part of the Annie E. Casey Foundation and provides information on the status of children in America. The ten core indicators of interest under "Data by State" are: percent of low birth weight babies, infant mortality rate, child death rate, rate of teen deaths by accident, suicide and homicide, teen birth rate, percent of children living with parents who do not have full-time year-round employment, percent of teens who are high school dropouts, percent of teens not working and not in school, percent of children in poverty, and percent of families with children headed by a single parent. A number of other indicators, plus demographic and income information, are also included. "Data Across States" is grouped into the following broad categories: demographics, education, economic well-being, family and community, health, safety and risk behaviors, and other. User Functionality: Users can choose the view of the data (table, line graph, or map) and can print or email the results. Data is available by state and across states; "Data Across States" allows users to access the raw data. Data is often available over a number of years. For a number of indicators under "Data Across States," users can view results by age, gender/sex, or race/ethnicity. Data Notes: KIDS COUNT started in 1990. The most recent year of data is 2009 (or 2008 depending on the state, with some data available from 2010). Data is available at the national and state level and, for some states, at the county and city level.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A biodiversity dataset graph: UCSB-IZC
The intended use of this archive is to facilitate (meta-)analysis of the UC Santa Barbara Invertebrate Zoology Collection (UCSB-IZC). UCSB-IZC is a natural history collection of invertebrate zoology at the Cheadle Center for Biodiversity and Ecological Restoration, University of California Santa Barbara.
This dataset provides versioned snapshots of the UCSB-IZC network as tracked by Preston [2,3] between 2021-10-08 and 2021-11-04 using [preston track "https://api.gbif.org/v1/occurrence/search/?datasetKey=d6097f75-f99e-4c2a-b8a5-b0fc213ecbd0"].
This archive contains 14349 images related to 32533 occurrence/specimen records. See the included sample-image.jpg and its associated meta-data sample-image.json [4].
The images were counted using:
$ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c\
| grep -o -P ".*depict"\
| sort\
| uniq\
| wc -l
And the occurrences were counted using:
$ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c\
| grep -o -P "occurrence/([0-9])+"\
| sort\
| uniq\
| wc -l
The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance files and data files. Only two index and provenance files are included, and these have also been individually included in this dataset publication. Index files provide a way to link provenance files in time to establish a versioning mechanism.
To retrieve and verify the downloaded UCSB-IZC biodiversity dataset graph, first download preston-*.tar.gz. Then, extract the archives into a "data" folder. Alternatively, you can use the Preston [2,3] command-line tool to "clone" this dataset using:
$ java -jar preston.jar clone --remote https://archive.org/download/preston-ucsb-izc/data.zip/,https://zenodo.org/record/5557670/files,https://zenodo.org/record/5660088/files/
After that, verify the index of the archive by reproducing the following provenance log history:
$ java -jar preston.jar history
To check the integrity of the extracted archive, confirm that the command "preston verify" produces lines as shown below, with each line including "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.
$ java -jar preston.jar verify
hash://sha256/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c file:/home/jhpoelen/ucsb-izc/data/ce/1d/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c OK CONTENT_PRESENT_VALID_HASH 66438 hash://sha256/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c
hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 file:/home/jhpoelen/ucsb-izc/data/f6/8d/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 OK CONTENT_PRESENT_VALID_HASH 4093 hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844
hash://sha256/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef file:/home/jhpoelen/ucsb-izc/data/3e/70/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef OK CONTENT_PRESENT_VALID_HASH 5746 hash://sha256/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef
hash://sha256/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b file:/home/jhpoelen/ucsb-izc/data/99/58/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b OK CONTENT_PRESENT_VALID_HASH 6147 hash://sha256/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b
Note that a copy of the Java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or in short "preston".
Files in this data publication:
--- start of file descriptions ---
-- description of archive and its contents (this file) --
README
-- executable java jar containing preston [2,3] v0.3.1. --
preston.jar
-- preston archive containing UCSB-IZC (meta-)data/image files, associated provenance logs and a provenance index --
preston-[00-ff].tar.gz
-- individual provenance index files --
2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a
-- example image and meta-data --
sample-image.jpg (with hash://sha256/916ba5dc6ad37a3c16634e1a0e3d2a09969f2527bb207220e3dbdbcf4d6b810c)
sample-image.json (with hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844)
--- end of file descriptions ---
References
[1] Cheadle Center for Biodiversity and Ecological Restoration (2021). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset https://doi.org/10.15468/w6hvhv accessed via GBIF.org on 2021-11-04 as indexed by the Global Biodiversity Informatics Facility (GBIF) with provenance hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36 hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c.
[2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 .
[3] MJ Elliott, JH Poelen, JAB Fortes (2020). Toward Reliable Biodiversity Dataset References. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2020.101132
[4] Cheadle Center for Biodiversity and Ecological Restoration (2021). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset https://doi.org/10.15468/w6hvhv accessed via GBIF.org on 2021-10-08. https://www.gbif.org/occurrence/3323647301 . hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 hash://sha256/916ba5dc6ad37a3c16634e1a0e3d2a09969f2527bb207220e3dbdbcf4d6b810c
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Semi-structured ENSDF text data were retrieved from the National Nuclear Data Center using web crawlers, and the obtained ENSDF data were cleaned. The cleaned ENSDF data were then parsed using the Nuclei tool to generate Decay Scheme image data. A total of 18186 entities and 18983 entity-pair relationships were annotated using roLabelImg and self-made tools. The dataset is divided into a test set and a training set.
Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Dataset general description:
• This dataset reports 4195 recurrent neural network models, their settings, and their generated prediction CSV files, graphs, and metadata files, for predicting COVID-19's daily infections in Brazil by training on limited raw data (30 time-steps and 40 time-steps alternatives). The code used was developed by the author and is located in the following online data repository: http://dx.doi.org/10.17632/yp4d95pk7n.1
Dataset content:
• Models, Graphs, and CSV prediction files:
1. Deterministic mode (DM): includes 1194 generated model files (30 time-steps), and their generated 2835 graphs and 2835 prediction files. Similarly, this mode includes 1976 generated model files (40 time-steps), and their generated 7301 graphs and 7301 prediction files.
2. Non-deterministic mode (NDM): includes 20 generated model files (30 time-steps), and their generated 53 graphs and 53 prediction files.
3. Technical validation mode (TVM): includes 1001 generated model files (30 time-steps), and their generated 3619 graphs and 3619 prediction files for 358 models, which are a sample of the 1001 models. Also, 1 model in a control group for India.
4. 1 graph and 1 prediction file for each of DM and NDM, reporting evaluation till 2020-07-11.
• Settings and metadata for the above 3 categories:
1. Used settings in JSON files for reproducibility.
2. Metadata about training and prediction setup and accuracy in CSV files.
Raw data source that was used to train the models:
• The raw data used for training the models is from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University: https://github.com/CSSEGISandData/COVID-19
• The models were trained on these versions of the raw data:
1. Link till 2020-06-29 (accessed 2020-07-08): https://github.com/CSSEGISandData/COVID-19/raw/78d91b2dbc2a26eb2b2101fa499c6798aa22fca8/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
2. Link till 2020-06-13 (accessed 2020-07-08): https://github.com/CSSEGISandData/COVID-19/raw/02ea750a263f6d8b8945fdd3253b35d3fd9b1bee/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
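A minimal sketch for turning the referenced cumulative time series into daily infections with pandas (the column layout is the standard JHU CSSE time-series format: metadata columns followed by one column per date):

```python
import pandas as pd

url = ("https://github.com/CSSEGISandData/COVID-19/raw/"
       "78d91b2dbc2a26eb2b2101fa499c6798aa22fca8/csse_covid_19_data/"
       "csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")
df = pd.read_csv(url)
brazil = df[df["Country/Region"] == "Brazil"]
# First difference of the cumulative counts gives daily infections.
daily = brazil.iloc[:, 4:].sum(axis=0).diff().fillna(0)
print(daily.tail())
```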
License: This prediction Dataset is licensed under CC BY NC 3.0.
Notice and disclaimer:
1. This prediction Dataset is for scientific and research purposes only.
2. The generation of this Dataset complies with the terms of use of the publicly available raw data from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University: https://github.com/CSSEGISandData/COVID-19 and, therefore, the author of the prediction Dataset disclaims any and all responsibility and warranties regarding the contents of the used raw data, including but not limited to: the correctness, completeness, and any issues linked to third-party rights.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Knowledge Graph Construction Workshop 2023: challenge
Knowledge graph construction from heterogeneous data has seen a lot of uptake in the last decade, from compliance to performance optimizations with respect to execution time. However, beyond execution time, other metrics for comparing knowledge graph construction, e.g. CPU or memory usage, are usually not considered. This challenge aims at benchmarking systems to find which RDF graph construction system optimizes for metrics such as execution time, CPU, memory usage, or a combination of these metrics.
Task description
The task is to reduce and report the execution time and computing resources (CPU and memory usage) for the parameters listed in this challenge, compared to the state of the art of existing tools and the baseline results provided by this challenge. The challenge is not only about execution time, to create the fastest pipeline, but also about computing resources, to achieve the most efficient pipeline.
We provide a tool which can execute such pipelines end-to-end. This tool also collects and aggregates the metrics necessary for this challenge, such as execution time, CPU and memory usage, as CSV files. Moreover, information about the hardware used during the execution of the pipeline is available as well, to allow fairly comparing different pipelines. Your pipeline should consist of Docker images which can be executed on Linux by the tool. The tool has already been tested with existing systems, relational databases (e.g. MySQL and PostgreSQL), and triplestores (e.g. Apache Jena Fuseki and OpenLink Virtuoso), which can be combined in any configuration. It is strongly encouraged to use this tool for participating in this challenge. If you prefer to use a different tool, or our tool imposes technical requirements you cannot solve, please contact us directly.
Part 1: Knowledge Graph Construction Parameters
These parameters are evaluated using synthetic generated data to have more
insights of their influence on the pipeline.
Data
Mappings
Part 2: GTFS-Madrid-Bench
The GTFS-Madrid-Bench provides insights in the pipeline with real data from the
public transport domain in Madrid.
Scaling
Heterogeneity
Example pipeline
The ground truth dataset and baseline results are generated in different steps
for each parameter:
The pipeline is executed 5 times, and the median execution time of each step is calculated. Each step with the median execution time is then reported in the baseline results with all its measured metrics.
Query timeout is set to 1 hour and knowledge graph construction timeout
to 24 hours. The execution is performed with the following tool
Each parameter has its own directory in the ground truth dataset with the
following files:
metadata.json
Datasets
Knowledge Graph Construction Parameters
The dataset consists of:
Format
All input datasets are provided as CSV, depending on the parameter that is being
evaluated, the number of rows and columns may differ. The first row is always
the header of the CSV.
GTFS-Madrid-Bench
The dataset consists of:
Format
CSV datasets always have a header as their first row.
JSON and XML datasets have their own schema.
Evaluation criteria
Submissions must evaluate the following metrics:
Expected output
Duplicate values
Scale | Number of Triples |
---|---|
0 percent | 2000000 triples |
25 percent | 1500020 triples |
50 percent | 1000020 triples |
75 percent | 500020 triples |
100 percent | 20 triples |
Empty values
Scale | Number of Triples |
---|---|
0 percent | 2000000 triples |
25 percent | 1500000 triples |
50 percent | 1000000 triples |
75 percent | 500000 triples |
100 percent | 0 triples |
Mappings
Scale | Number of Triples |
---|---|
1TM + 15POM | 1500000 triples |
3TM + 5POM | 1500000 triples |
5TM + 3POM | 1500000 triples |
15TM + 1POM | 1500000 triples |
Properties
Scale | Number of Triples |
---|---|
1M rows 1 column | 1000000 triples |
1M rows 10 columns | 10000000 triples |
1M rows 20 columns | 20000000 triples |
1M rows 30 columns | 30000000 triples |
Records
Scale | Number of Triples |
---|---|
10K rows 20 columns | 200000 triples |
100K rows 20 columns | 2000000 triples |
1M rows 20 columns | 20000000 triples |
10M rows 20 columns | 200000000 triples |
Joins
1-1 joins
Scale | Number of Triples |
---|---|
0 percent | 0 |
(Link to Metadata) The WaterHydro_DLGSW layer represents surface waters (hydrography) at a scale of RF 100000. WaterHydro_DLGSW was derived from RF 100000 USGS Digital Line Graph (DLG) data. DLGs of map features are converted to digital form from maps and related sources. Refer to the USGS web site for more information on DLGs (http://www.usgs.gov).