This repository contains datasets for quickly testing graph classification algorithms, such as Graph Kernels and Graph Neural Networks. The datasets are constructed so that the node features and the adjacency matrix are completely uninformative when considered alone. Therefore, an algorithm that relies only on the node features or only on the graph structure will fail to achieve good classification results. A more detailed description of the dataset construction can be found on the GitHub page (https://github.com/FilippoMB/Benchmark_dataset_for_graph_classification), in the README.txt file, and in the original publication: Bianchi, Filippo Maria, Claudio Gallicchio, and Alessio Micheli. "Pyramidal Reservoir Graph Neural Network." Neurocomputing 470 (2022): 389-404.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Freebase is amongst the largest public cross-domain knowledge graphs. It possesses three main data modeling idiosyncrasies: it has a strong type system; its properties are purposefully represented in reverse pairs; and it uses mediator objects to represent multiary relationships. These design choices are important in modeling the real world, but they also pose nontrivial challenges in research on embedding models for knowledge graph completion, especially when models are developed and evaluated agnostically of these idiosyncrasies. We make available several variants of the Freebase dataset by inclusion and exclusion of these data modeling idiosyncrasies. This is the first-ever publicly available full-scale Freebase dataset that has gone through proper preparation.
Dataset Details
The dataset consists of four variants of the Freebase dataset as well as related mapping/support files. For each variant, we make three kinds of files available:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper presents a comprehensive, high-quality collection of functional human brain network data for potential research at the intersection of neuroscience, machine learning, and graph analytics. Anatomical and functional MRI images of the brain have been used to understand the functional connectivity of the human brain and are particularly important in identifying underlying neurodegenerative conditions such as Alzheimer's, Parkinson's, and Autism. Recently, the study of the brain in the form of brain networks using machine learning and graph analytics has become increasingly popular, especially to predict the early onset of these conditions. A brain network, represented as a graph, retains richer structural and positional information that traditional examination methods are unable to capture. However, the lack of brain network data transformed from functional MRI images prevents researchers from data-driven explorations. One of the main difficulties lies in the complicated domain-specific preprocessing steps and the exhaustive computation required to convert data from MRI images into brain networks. We bridge this gap by collecting a large amount of available MRI images from existing studies, working with domain experts to make sensible design choices, and preprocessing the MRI images to produce a collection of brain network datasets. The datasets originate from 6 different sources, cover 4 neurodegenerative conditions, and consist of a total of 2,688 subjects. Due to the data protocol, we are unable to release the ADNI dataset here; the data will be released via the ADNI external data submissions within their data system. We test our graph datasets on 5 machine learning models commonly used in neuroscience and on a recent graph-based analysis model to validate the data quality and to provide domain baselines.
To lower the barrier to entry and promote research in this interdisciplinary field, we release our complete preprocessing details, code, and brain network data: https://github.com/brainnetuoa/data_driven_network_neuroscience. To stay informed about new updates of the datasets, kindly provide us with your email address: https://forms.gle/KGAajR6LEysXWKvKA
Updated on 10/09/2024: Please note that we have identified 14 subjects in the PPMI (Parkinson's Progression Markers Initiative) dataset, prodromal group, whose time-series images include only 10 time slots. The invalid subjects are: sub-prodromal103857, sub-prodromal120622, sub-prodromal146573, sub-prodromal40737, sub-prodromal52874, sub-prodromal55560, sub-prodromal56680, sub-prodromal58027, sub-prodromal58680, sub-prodromal59390, sub-prodromal59483, sub-prodromal59503, sub-prodromal71658, sub-prodromal75422. We have removed the invalid images and updated the dataset by including both the parcellated images (ppmi_v2.zip) and the preprocessed images (Ppmi_Preprocessed_v2.z*).
This tutorial will teach you how to take time-series data from many field sites and create a shareable online map, where clicking on a field location brings you to a page with interactive graph(s).
The tutorial can be completed with a sample dataset (provided via a Google Drive link within the document) or with your own time-series data from multiple field sites.
Part 1 covers how to make interactive graphs in Google Data Studio and Part 2 covers how to link data pages to an interactive map with ArcGIS Online. The tutorial will take 1-2 hours to complete.
An example interactive map and data portal can be found at: https://temple.maps.arcgis.com/apps/View/index.html?appid=a259e4ec88c94ddfbf3528dc8a5d77e8
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We use the Enron email dataset to build a network of email addresses. It contains 614,586 emails sent over the period from 6 January 1998 until 4 February 2004. During pre-processing, we remove the periods of low activity and keep the emails from 1 January 1999 until 31 July 2002, which is 1,448 days of email records in total. We also remove email addresses that sent fewer than three emails over that period. In total, the Enron email network contains 6,600 nodes and 50,897 edges.
To build a graph G = (V, E), we use email addresses as nodes V. Every node vi has an attribute which is a time-varying signal that corresponds to the number of emails sent from this address during a day. We draw an edge eij between two nodes i and j if there is at least one email exchange between the corresponding addresses.
Column 'Count' in 'edges.csv' file is the number of 'From'->'To' email exchanges between the two addresses. This column can be used as an edge weight.
The file 'nodes.csv' contains a dictionary that is a compressed representation of the time series. The format of the dictionary is Day -> Number Of Emails Sent By The Address During That Day. The total number of days is 1448.
'id-email.csv' is a file containing the actual email addresses.
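Given the file layout above, the network can be assembled with the standard library; a minimal sketch, assuming the column names 'From', 'To', 'Count' from the description, with a small inline sample standing in for the real edges.csv:

```python
import csv
import io

# Illustrative sample shaped like edges.csv ('From'->'To' exchanges, 'Count'
# as edge weight); the real file uses NodeIDs from nodes.csv.
EDGES_CSV = """From,To,Count
0,1,12
1,0,3
0,2,5
"""

def load_edges(csv_text):
    """Return {(from_id, to_id): count} with 'Count' as the edge weight."""
    weights = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        weights[(row["From"], row["To"])] = int(row["Count"])
    return weights

edges = load_edges(EDGES_CSV)
print(edges[("0", "1")])  # 12
print(len(edges))         # 3
```

From this dictionary a weighted directed graph can be built in any graph library, e.g. with networkx's `add_weighted_edges_from`.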
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Charts, Histograms, and Time Series
• Create a histogram graph from band values of an image collection
• Create a time series graph from band values of an image collection
Explanation/Overview: Corresponding graph files of the extracted Zooniverse networks described in D3.3 (can be found here), which are the result of our research that culminated in the publication "Does Volunteer Engagement Pay Off? An Analysis of User Participation in Online Citizen Science Projects", a conference paper for CollabTech 2022: Collaboration Technologies and Social Computing, published as part of the Lecture Notes in Computer Science book series (LNCS, volume 13632) here. Usernames have been anonymised. The graph files are in .gexf (graph exchange XML format) and .gml (graph modeling language) formats, which can be used by common graph/network-analysis and visualisation tools such as Gephi.
Purpose: The purpose of this dataset is to provide the basis for possible further examinations of the network structure, involving additional (not yet analysed) features such as the content of the comments.
Relatedness: The data of the different projects was derived from the forums of 7 Zooniverse projects based on similar discussion board features. The projects are: 'Galaxy Zoo', 'Gravity Spy', 'Seabirdwatch', 'Snapshot Wisconsin', 'Wildwatch Kenya', 'Galaxy Nurseries', 'Penguin Watch'.
Content: The dataset contains distinct graph files for each of the analysed projects. Each graph file consists of nodes and edges and their associated attributes (i.e., each edge can have an attribute). For the edges, apart from source and target, the attributes are: weight, project_title, body (i.e., text), created_at, userRoles, discussion_title, discussion_id, user_id, board_title, relation, target_role.
For the nodes, the attributes are: user_id, userRoles, degree_reply (i.e., degree for the reply relation), in_degree_reply, out_degree_reply, degree_comment, in_degree_comment, out_degree_comment, degree_total, in_degree_total, out_degree_total, target_role.
Grouping: Each graph file represents all the comments for the respective project across its lifespan, irrespective of any time slices. Edges represent the comments and nodes represent the users. All boards are contained within the data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ebitda Time Series for Beijing Compass Technology Develop. Beijing Compass Technology Development Co., Ltd. develops and delivers securities analysis software and securities information solutions in China. The company was formerly known as Beijing Compass Securities Research Co., Ltd. and changed its name to Beijing Compass Technology Development Co., Ltd. in April 2001. Beijing Compass Technology Development Co., Ltd. was founded in 1997 and is headquartered in Beijing, China.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Total-Revenue Time Series for Beijing Compass Technology Develop. Beijing Compass Technology Development Co., Ltd. develops and delivers securities analysis software and securities information solutions in China. The company was formerly known as Beijing Compass Securities Research Co., Ltd. and changed its name to Beijing Compass Technology Development Co., Ltd. in April 2001. Beijing Compass Technology Development Co., Ltd. was founded in 1997 and is headquartered in Beijing, China.
Graph Database Market Size 2025-2029
The graph database market size is forecast to increase by USD 11.24 billion at a CAGR of 29% between 2024 and 2029.
The market is experiencing significant growth, driven by the increasing popularity of open knowledge networks and the rising demand for low-latency query processing. These trends reflect the growing importance of real-time data analytics and the need for more complex data relationships to be managed effectively. However, the market also faces challenges, including the lack of standardization and programming flexibility. These obstacles require innovative solutions from market participants to ensure interoperability and ease of use for businesses looking to adopt graph databases.
Companies seeking to capitalize on market opportunities must focus on addressing these challenges while also offering advanced features and strong performance to differentiate themselves. Effective navigation of these dynamics will be crucial for success in the evolving graph database landscape. Compliance requirements and data privacy regulations drive the need for security access control and data anonymization methods. Graph databases are deployed in both on-premises data centers and cloud regions, providing flexibility for businesses with varying IT infrastructures.
What will be the Size of the Graph Database Market during the forecast period?
Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
In the dynamic market, security and data management are increasingly prioritized. Authorization mechanisms and encryption techniques ensure data access control and confidentiality. Query optimization strategies and indexing enhance query performance, while data anonymization methods protect sensitive information. Fault tolerance mechanisms and data governance frameworks maintain data availability and compliance with regulations. Data quality assessment and consistency checks address data integrity issues, and authentication protocols secure concurrent graph updates. This model is particularly well-suited for applications in social networks, recommendation engines, and business processes that require real-time analytics and visualization.
Graph database tuning and monitoring optimize hardware resource usage and detect performance bottlenecks. Data recovery procedures and replication methods ensure data availability during disasters and maintain data consistency. Data version control and concurrent graph updates address versioning and conflict resolution challenges. Data anomaly detection and consistency checks maintain data accuracy and reliability. Distributed transactions and data recovery procedures ensure data consistency across nodes in a distributed graph database system.
How is this Graph Database Industry segmented?
The graph database industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
End-user
Large enterprises
SMEs
Type
RDF
LPG
Solution
Native graph database
Knowledge graph engines
Graph processing engines
Graph extension
Geography
North America
US
Canada
Europe
France
Germany
Italy
Spain
UK
APAC
China
India
Japan
Rest of World (ROW)
By End-user Insights
The Large enterprises segment is estimated to witness significant growth during the forecast period. In today's business landscape, large enterprises are turning to graph databases to manage intricate data relationships and improve decision-making processes. Graph databases offer unique advantages over traditional relational databases, enabling superior agility in modeling and querying interconnected data. These systems are particularly valuable for applications such as fraud detection, supply chain optimization, customer 360 views, and network analysis. Graph databases provide the scalability and performance required to handle large, dynamic datasets and uncover hidden patterns and insights in real time. Their support for advanced analytics and AI-driven applications further bolsters their role in enterprise digital transformation strategies. Additionally, their flexibility and integration capabilities make them well-suited for deployment in hybrid and multi-cloud environments.
Graph databases offer various features that cater to diverse business needs. Data lineage tracking ensures accountability and transparency, while graph analytics engines provide advanced insights. Graph database benchmarking helps organizations evaluate performance, and relationship property indexing streamlines data access. Node relationship management facilitates complex data modeling.
The global precipitation time series provides time series charts showing observations of daily precipitation as well as accumulated precipitation compared to normal accumulated amounts for various stations around the world. These charts are created for different scales of time (30, 90, 365 days). Each station has a graphic that contains two charts. The first chart in the graphic is a time series in the format of a line graph, representing accumulated precipitation for each day in the time series compared to the accumulated normal amount of precipitation. The second chart is a bar graph displaying actual daily precipitation. The total accumulation and surplus or deficit amounts are displayed as text on the charts representing the entire time scale, in both inches and millimeters. The graphics are updated daily and the graphics reflect the updated observations and accumulated precipitation amounts including the latest daily data available. The available graphics are rotated, meaning that only the most recently created graphics are available. Previously made graphics are not archived.
The COKI Open Access Dataset measures open access performance for 142 countries and 5117 institutions and is available in JSON Lines format. The data is visualised at the COKI Open Access Dashboard: https://open.coki.ac/. The COKI Open Access Dataset is created with the COKI Academic Observatory data collection pipeline, which fetches data about research publications from multiple sources, synthesises the datasets and creates the open access calculations for each country and institution. Each week a number of specialised research publication datasets are collected. The datasets that are used for the COKI Open Access Dataset release include Crossref Metadata, Microsoft Academic Graph, Unpaywall and the Research Organization Registry. After fetching the datasets, they are synthesised to produce aggregate time series statistics for each country and institution in the dataset. The aggregate timeseries statistics include publication count, open access status and citation count. See https://open.coki.ac/data/ for the dataset schema. A new version of the dataset is deposited every week. Code The COKI Academic Observatory data collection pipeline is used to create the dataset. The COKI OA Website Github project contains the code for the web app that visualises the dataset at open.coki.ac. It can be found on Zenodo here. License COKI Open Access Dataset © 2022 by Curtin University is licenced under CC BY 4.0. Attributions This work contains information from: Microsoft Academic Graph which is made available under the ODC Attribution Licence. Crossref Metadata via the Metadata Plus program. Bibliographic metadata is made available without copyright restriction and Crossref generated data under a CC0 licence. See metadata licence information for more details. Unpaywall. The Unpaywall Data Feed is used under license. Data is freely available from Unpaywall via the API, data dumps and as a data feed. Research Organization Registry which is made available under a CC0 licence. 
The Curtin Open Knowledge Initiative (COKI) is a strategic initiative of the Research Office at Curtin, the Faculty of Humanities, School of Media, Creative Arts and Social Inquiry and the Curtin Institute for Computation, with additional support from the Andrew W. Mellon Foundation and the Arcadia Fund, a charitable fund of Lisbet Rausing and Peter Baldwin.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accounts-Payable Time Series for Beijing Compass Technology Develop. Beijing Compass Technology Development Co., Ltd. develops and delivers securities analysis software and securities information solutions in China. The company was formerly known as Beijing Compass Securities Research Co., Ltd. and changed its name to Beijing Compass Technology Development Co., Ltd. in April 2001. Beijing Compass Technology Development Co., Ltd. was founded in 1997 and is headquartered in Beijing, China.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Total-Current-Assets Time Series for Beijing Compass Technology Develop. Beijing Compass Technology Development Co., Ltd. develops and delivers securities analysis software and securities information solutions in China. The company was formerly known as Beijing Compass Securities Research Co., Ltd. and changed its name to Beijing Compass Technology Development Co., Ltd. in April 2001. Beijing Compass Technology Development Co., Ltd. was founded in 1997 and is headquartered in Beijing, China.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains 250 million rows of information from the ~500 bike stations of the Barcelona public bicycle sharing service. The data consists of time series of the electric and mechanical bicycles available approximately every 4 minutes, from March 2019 to March 2024 (latest available csv file, with the idea of being updated with every new month's file). This data could inspire many different use cases, from geographical data analysis to hierarchical ML time series models or Graph Neural Networks, among others. Feel free to create a New Notebook from this page to use it and share your ideas with everyone!
(Image: map of the stations, 2024)
Every month's information is separated in a different file as {year}_{month}_STATIONS.csv. The metadata of every station has been simplified and compressed into the {year}_INFO.csv files, where there is a single entry for every station and day, separated in a different file for every year.
The original data has various errors; a few of them have already been corrected, but there are still some missing values, columns with wrong data types, and other minor artifacts or missing data. From time to time I may manually correct more of those.
The data is collected from the public BCN Open Data website, which is available to everyone (some resources require creating a free account and token): - Stations data: https://opendata-ajuntament.barcelona.cat/data/en/dataset/estat-estacions-bicing - Stations info: https://opendata-ajuntament.barcelona.cat/data/en/dataset/informacio-estacions-bicing
You can find more information there.
Please, consider upvoting this dataset if you find it interesting! 🤗
Some observations:
The historical data for June '19 does not have data for the 20th between 7:40 am and 2:00 pm.
The historical data for July '19 does not have data from the 26th at 1:30 pm until the 29th at 10:40 am.
The historical data for November '19 may not have some data from 10:00 pm on the 26th to 11:00 am on the 27th.
The historical data for August '20 does not have data from the 7th at 2:25 am until the 10th at 10:40 am.
The historical data for November '20 does not have data on the following days/times: the 4th from 1:45 am to 11:05 am; the 20th from 7:50 pm to the 21st at 10:50 am; the 27th from 2:50 am to the 30th at 9:50 am.
The historical data for August '23 does not have data from the 22nd to the 31st due to a technical incident.
The historical data for September '23 does not have data from the 1st to the 5th due to a technical incident.
The historical data for February '24 does not have data on the 5th between 12:50 pm and 1:05 pm.
Others: Due to COVID-19 measures, the Bicing service was temporarily stopped, which is reflected in the historical data.
Field Description:
Array of data for each station:
station_id: Identifier of the station
num_bikes_available: Number of available bikes
num_bikes_available_types: Array of types of available bikes
mechanical: Number of available mechanical bikes
ebike: Number of available electric bikes
num_docks_available: Number of available docks
is_installed: The station is properly installed (0-NO, 1-YES)
is_renting: The station is providing bikes correctly
is_returning: The station is docking bikes correctly
last_reported: Timestamp of the station information
is_charging_station: The station has electric bike charging capacity
status: Status of the station (IN_SERVICE=In service, CLOSED=Closed)
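As a starting point for such analyses, a minimal sketch below parses a monthly stations file with the standard library; the column subset and the inline sample are illustrative, assuming the fields listed above:

```python
import csv
import io

# Illustrative sample shaped like a {year}_{month}_STATIONS.csv file,
# using a subset of the fields described above.
SAMPLE = """station_id,num_bikes_available,num_docks_available,status
1,5,20,IN_SERVICE
1,7,18,IN_SERVICE
2,0,25,CLOSED
"""

def mean_bikes_per_station(csv_text):
    """Average number of available bikes per station over the file."""
    totals, counts = {}, {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        sid = row["station_id"]
        totals[sid] = totals.get(sid, 0) + int(row["num_bikes_available"])
        counts[sid] = counts.get(sid, 0) + 1
    return {sid: totals[sid] / counts[sid] for sid in totals}

print(mean_bikes_per_station(SAMPLE))  # {'1': 6.0, '2': 0.0}
```

The real monthly files are large, so for serious work a chunked reader (e.g. pandas with `chunksize`) is a better fit than loading everything at once.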
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For the complete graph K4, a coloring of type (K4,K4;n) is an edge coloring of the complete graph Kn that contains no K4 subgraph in the first color (represented by the absence of an edge in the graph) and no K4 subgraph in the second color (represented by the presence of an edge in the graph). The Ramsey number R(4,4) is the smallest natural number n such that every edge coloring of the complete graph Kn contains a subgraph isomorphic to K4 in the first color (no edge in the graph) or in the second color (edge present in the graph). Colorings of type (K4,K4;n) therefore exist only for n < R(4,4). The dataset consists of 14 files containing all non-isomorphic graphs that are colorings of type (K4,K4;n) for 1 < n < 16.
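The defining property can be checked directly by brute force over all 4-vertex subsets; a minimal sketch (not part of the dataset tooling, practical only for small n):

```python
from itertools import combinations

def is_k4_k4_coloring(n, edges):
    """True if the graph on n vertices (edge list) contains neither a K4 in
    its edges (second color) nor a K4 in its complement (first color)."""
    edge_set = {frozenset(e) for e in edges}
    for quad in combinations(range(n), 4):
        pairs = [frozenset(p) for p in combinations(quad, 2)]
        if all(p in edge_set for p in pairs):      # K4 in second color
            return False
        if all(p not in edge_set for p in pairs):  # K4 in first color
            return False
    return True

# The 5-cycle has neither a K4 nor an independent set of 4 vertices.
c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(is_k4_k4_coloring(5, c5))   # True
print(is_k4_k4_coloring(4, []))   # False: 4 vertices with no edges
```

A check like this can be used to validate any graph read from the dataset files.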
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset has been created for implementing a content-based recommender system in the context of the Open Research Knowledge Graph (ORKG). The recommender system accepts a research paper's title and abstract as input and recommends existing templates in the ORKG that are semantically relevant to the given paper.
Two approaches have been trained on this dataset in the context of this master's thesis, namely a Natural Language Inference (NLI) approach based on SciBERT embeddings and an unsupervised approach based on ElasticSearch.
This publication therefore consists of one general dataset, a training set for each approach, a validation set for the supervised approach, and a test set for both approaches.
dataset.json
The main JSON object consists of a list of templates and a list of neutral papers.
Each template object has an ID, label, list of research fields, list of properties and list of papers using that template, whereas each paper object has ID, label, DOI, research field and abstract.
Each neutral paper object has the same schema as a paper object that uses a template.
See an example instance below.
{
"templates": [
{
"id": "R138668",
"label": "Psychiatric Disorders AI Overview",
"research_fields": [
{
"id": "http://orkg.org/orkg/resource/R133",
"label": "Artificial Intelligence"
}
...
],
"properties": [
"Study cohort",
...
],
"papers": [
{
"id": "R138698",
"label": "Application of Autoencoder in Depression Diagnosis",
"doi": "10.12783/dtcse/csma2017/17335",
"research_field": {
"id": "R104",
"label": "Bioinformatics"
},
"abstract": "Major depressive disorder (MDD) is a mental disorder characterized by at least two weeks of low mood which is present across most situations. Diagnosis of MDD using rest-state functional magnetic resonance imaging (fMRI) data faces many challenges due to the high dimensionality, small samples, noisy and individual variability. No method can automatically extract discriminative features from the origin time series in fMRI images for MDD diagnosis. In this study, we proposed a new method for feature extraction and a workflow which can make an automatic feature extraction and classification without a prior knowledge. An autoencoder was used to learn pre-training parameters of a dimensionality reduction process using 3-D convolution network. Through comparison with the other three feature extraction methods, our method achieved the best classification performance. This method can be used not only in MDD diagnosis, but also other similar disorders."
},
...
]
},
...
],
"neutral_papers": [
{
"id": "R109377",
"label": "Structural basis of SARS-CoV-2 3CLpro and anti-COVID-19 drug discovery from medicinal plants",
"doi": "10.1016/j.jpha.2020.03.009",
"research_field": {
"id": "R104",
"label": "Bioinformatics"
},
"abstract": "Abstract The recent outbreak of coronavirus disease 2019 (COVID-19) caused by SARS-CoV-2 in December 2019 raised global health concerns. The viral 3-chymotrypsin-like cysteine protease (3CLpro) enzyme controls coronavirus replication and is essential for its life cycle. 3CLpro is a proven drug discovery target in the case of severe acute respiratory syndrome coronavirus (SARS-CoV) and middle east respiratory syndrome coronavirus (MERS-CoV). Recent studies revealed that the genome sequence of SARS-CoV-2 is very similar to that of SARS-CoV. Therefore, herein, we analysed the 3CLpro sequence, constructed its 3D homology model, and screened it against a medicinal plant library containing 32,297 potential anti-viral phytochemicals/traditional Chinese medicinal compounds. Our analyses revealed that the top nine hits might serve as potential anti- SARS-CoV-2 lead molecules for further optimisation and drug development process to combat COVID-19."
},
...
]
}
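Under this schema, dataset.json can be consumed with the standard json module; a minimal sketch, using a small inline document shaped like the example above instead of the real file:

```python
import json

# Inline sample mirroring the dataset.json schema shown above.
SAMPLE = json.loads("""{
  "templates": [
    {"id": "R138668",
     "label": "Psychiatric Disorders AI Overview",
     "research_fields": [],
     "properties": ["Study cohort"],
     "papers": [{"id": "R138698",
                 "label": "Application of Autoencoder in Depression Diagnosis",
                 "doi": "10.12783/dtcse/csma2017/17335",
                 "research_field": {"id": "R104", "label": "Bioinformatics"},
                 "abstract": "..."}]}
  ],
  "neutral_papers": []
}""")

def papers_by_template(dataset):
    """Map each template id to the list of paper ids that use it."""
    return {t["id"]: [p["id"] for p in t["papers"]]
            for t in dataset["templates"]}

print(papers_by_template(SAMPLE))  # {'R138668': ['R138698']}
```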
All other files
The main JSON object consists of a list of entailments, a list of contradictions, and a list of neutrals.
Each object in these lists has the same schema: an instance_id, created by concatenating the template_id (when it exists) with the paper_id; a template_id; a paper_id; a premise (the template's label and properties, as the example below shows); a hypothesis (the paper's title and abstract); their concatenation into a single sequence; and the target class.
See an example instance below.
{
"entailments": [
{
"instance_id": "R138668xR138698",
"template_id": "R138668",
"paper_id": "R138698",
"premise": "psychiatric disorders ai overview study cohort outcome assessment aims performance findings used models data",
"hypothesis": "application of autoencoder in depression diagnosis major depressive disorder (mdd) is a mental disorder characterized by at least two weeks of low mood which is present across most situations diagnosis of mdd using rest state functional magnetic resonance imaging (fmri) data faces many challenges due to the high dimensionality, small samples, noisy and individual variability no method can automatically extract discriminative features from the origin time series in fmri images for mdd diagnosis in this study, we proposed a new method for feature extraction and a workflow which can make an automatic feature extraction and classification without a prior knowledge an autoencoder was used to learn pre training parameters of a dimensionality reduction process using 3 d convolution network through comparison with the other three feature extraction methods, our method achieved the best classification performance this method can be used not only in mdd diagnosis, but also other similar disorders",
"sequence": "[CLS] psychiatric disorders ai overview study cohort outcome assessment aims performance findings used models data [SEP] application of autoencoder in depression diagnosis major depressive disorder (mdd) is a mental disorder characterized by at least two weeks of low mood which is present across most situations diagnosis of mdd using rest state functional magnetic resonance imaging (fmri) data faces many challenges due to the high dimensionality, small samples, noisy and individual variability no method can automatically extract discriminative features from the origin time series in fmri images for mdd diagnosis in this study, we proposed a new method for feature extraction and a workflow which can make an automatic feature extraction and classification without a prior knowledge an autoencoder was used to learn pre training parameters of a dimensionality reduction process using 3 d convolution network through comparison with the other three feature extraction methods, our method achieved the best classification performance this method can be used not only in mdd diagnosis, but also other similar disorders [SEP]",
"target": "entailment"
},
...
],
"contradictions": [ ... ],
"neutrals": [ ... ]
}
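The sequence field above is simply the premise and hypothesis joined with BERT-style special tokens; a minimal sketch of that construction (shortened strings for illustration):

```python
# Build the "[CLS] premise [SEP] hypothesis [SEP]" sequence field from a
# premise and a hypothesis, as in the instances above.
def build_sequence(premise, hypothesis):
    return f"[CLS] {premise} [SEP] {hypothesis} [SEP]"

seq = build_sequence("psychiatric disorders ai overview",
                     "application of autoencoder in depression diagnosis")
print(seq)
# [CLS] psychiatric disorders ai overview [SEP] application of autoencoder in depression diagnosis [SEP]
```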
Statistics
| - | Training (supervised) | Validation (supervised) | Training (unsupervised) | Test |
|---|---|---|---|---|
| Entailment | 180 | 20 | 200 | 52 |
| Neutral | 180 | 20 | 200 | 64 |
| Contradiction | 736 | 84 | 0 | 0 |
| Total | 1096 | 124 | 400 | 116 |
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Net-Income-Including-Non-Controlling-Interests Time Series for Beijing Compass Technology Develop. Beijing Compass Technology Development Co., Ltd. develops and delivers securities analysis software and securities information solutions in China. The company was formerly known as Beijing Compass Securities Research Co., Ltd. and changed its name to Beijing Compass Technology Development Co., Ltd. in April 2001. Beijing Compass Technology Development Co., Ltd. was founded in 1997 and is headquartered in Beijing, China.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Studying the graph characteristics of these networks is beneficial; moreover, understanding the vulnerabilities and attack possibilities unique to these networks allows us to develop proactive defense mechanisms and mitigate potential threats.
Data collection method: ask all reachable nodes continuously for their known peers. In Bitcoin's parlance, we send GETADDR messages and store all ADDR replies, drawing a connection from the sending node to all IP addresses contained in the ADDR message.
All IP addresses have been replaced by numbers (NodeID) for ethical reasons. NodeIDs are consistent across all files: the same NodeID corresponds to the same IP address in ALL files (if present). Filenames contain the timestamp and the corresponding network. The date-time format is YYYYMMDD-HHMISS.
File Contents: The edgelist files store information about the structure of the connectivity graph. Each file represents an edgelist of a graph at the specified timestamp. Each line in a file corresponds to the list of peers known to a node; the NodeID of that node is the first number on the line. Example: the following line
S N1 N2 N3 N4
means that node S knows of nodes N1..N4; their ip addresses were included in S's ADDR responses.
To process the files in SNAP or networkx, proper transformations have to be made. Please read the relevant documentation to find the appropriate input format.
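As one possible transformation, the sketch below parses lines in the "S N1 N2 ... Nk" format described above into an adjacency dictionary, from which a SNAP or networkx graph can then be built; the inline sample is illustrative:

```python
def parse_edgelist(lines):
    """Parse lines of the form 'S N1 N2 ... Nk' (node S knows peers N1..Nk)
    into a dict mapping each source NodeID to the set of its known peers."""
    adjacency = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        source, peers = parts[0], parts[1:]
        adjacency.setdefault(source, set()).update(peers)
    return adjacency

sample = ["S N1 N2 N3 N4", "N1 S N2"]
adj = parse_edgelist(sample)
print(sorted(adj["S"]))  # ['N1', 'N2', 'N3', 'N4']
```

Each (source, peer) pair is naturally a directed edge, e.g. `nx.DiGraph` with `add_edges_from((s, p) for s in adj for p in adj[s])`.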
This dataset has been used in the following works:
- @inproceedings{aris_ssec,
author = {Paphitis, Aristodemos and Kourtellis, Nicolas and Sirivianos, Michael},
title = {Graph Analysis of Blockchain {P2P} Overlays and their Security Implications},
booktitle = {Proceedings of the 9th International Symposium on Security and Privacy in Social Networks and Big Data (SocialSec 2023)},
series = {Lecture Notes in Computer Science},
volume = {13983},
publisher = {Springer Nature},
year = {2023},
}
Please cite as:
Aristodemos Paphitis, Nicolas Kourtellis, and Michael Sirivianos. A First Look into the Structural Properties of Blockchain P2P Overlays. DOI:https://doi.org/10.6084/m9.figshare.23522919
bibtex:
@misc{paphitis_first_nodate,
author = {Paphitis, Aristodemos and Kourtellis, Nicolas and Sirivianos, Michael},
title = {A First Look into the Structural Properties of Blockchain {P2P} Overlays},
howpublished = {Public dataset with figshare},
doi = {10.6084/m9.figshare.23522919},
}
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Index Time Series for BNP Paribas Easy MSCI Europe SRI S-Series PAB 5% Capped UCITS ETF Distribution. The frequency of the observation is daily. Moving average series are also typically included.