100+ datasets found

Data from: A Novel Curated Scholarly Graph Connecting Textual and Data...
data.europa.eu
zenodo.org
unknown
Updated May 31, 2024
Cite
Zenodo (2024). A Novel Curated Scholarly Graph Connecting Textual and Data Publications [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-7464120?locale=en
Available download formats: unknown (349944309)
Dataset updated
May 31, 2024
Dataset authored and provided by
Zenodo, http://zenodo.org/
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains an open, curated scholarly graph that we built as a training and test set for data discovery, data connection, author disambiguation, and link prediction tasks. The graph represents the European Marine Science community included in the OpenAIRE Graph. Its nodes represent publications, datasets, software, and authors; edges interconnecting research products always have the publication as source and the dataset or software as target. In addition, edges are labeled with semantics that indicate whether the publication is referencing, citing, documenting, or supplementing the related outcome. To curate and enrich node metadata and edge semantics, we relied on information extracted from the PDFs of the publications and from the dataset and software webpages, respectively. We curated the authors to remove duplicated nodes representing the same person. The resource we release contains 4,047 publications, 5,488 datasets, 22 software items, 21,561 authors, and 9,692 edges connecting publications to datasets or software. This graph is in the curated_MES folder.

We provide this resource as:

a property graph: a dump that can be imported into neo4j

5 jsonl files containing publications, datasets, software, authors, and relationships, respectively. Each line of a jsonl file contains a JSON object representing a node (or a relationship) together with its metadata.

We provide two additional scholarly graphs:

The curated MES graph with the removed edges. During curation we removed some edges because they were labeled with inconsistent or imprecise semantics. This graph includes the same nodes and edges as the previous one and, in addition, the edges removed during the curation pipeline; these edges are marked as Removed. This graph is in the curated_MES_with_removed_semantics folder.

The original MES community of OpenAIRE. It represents the MES community extracted from the OpenAIRE Research Graph. This graph has not been curated, and its metadata and semantics are those of the OpenAIRE Research Graph. This graph is in the original_MES_community folder.
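
Because each line of a jsonl file holds one JSON object, such a file can be read one line at a time. The following is a minimal sketch in Python under stated assumptions: the helper name `read_jsonl` and the sample field names (`id`, `title`) are hypothetical, since the actual field names of the released files are not listed in this description.

```python
import json
from typing import Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of JSON Lines input."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Hypothetical sample standing in for two lines of a publications file;
# the real field names may differ.
sample = [
    '{"id": "pub-1", "title": "Example publication"}',
    '{"id": "pub-2", "title": "Another publication"}',
]
records = list(read_jsonl(sample))
print(len(records), records[0]["id"])
```

An open file handle can be passed in place of the sample list, since iterating over a text file also yields one line at a time.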
Graphical representations of data from sediment cores collected in 2009...
catalog.data.gov
data.usgs.gov
+2 more
Updated Nov 20, 2025
Cite
U.S. Geological Survey (2025). Graphical representations of data from sediment cores collected in 2009 offshore from Palos Verdes, California [Dataset]. https://catalog.data.gov/dataset/graphical-representations-of-data-from-sediment-cores-collected-in-2009-offshore-from-palo
Dataset updated
Nov 20, 2025
Dataset provided by
United States Geological Survey, http://www.usgs.gov/
Area covered
Palos Verdes Peninsula, Rancho Palos Verdes, California
Description
This part of the data release includes graphical representation (figures) of data from sediment cores collected in 2009 offshore of Palos Verdes, California. This file graphically presents combined data for each core (one core per page). Data on each figure are continuous core photograph, CT scan (where available), graphic diagram core description (graphic legend included at right; visual grain size scale of clay, silt, very fine sand [vf], fine sand [f], medium sand [med], coarse sand [c], and very coarse sand [vc]), multi-sensor core logger (MSCL) p-wave velocity (meters per second) and gamma-ray density (grams per cc), radiocarbon age (calibrated years before present) with analytical error (years), and pie charts that present grain-size data as percent sand (white), silt (light gray), and clay (dark gray).

This is one of seven files included in this U.S. Geological Survey data release that include data from a set of sediment cores acquired from the continental slope, offshore Los Angeles and the Palos Verdes Peninsula, adjacent to the Palos Verdes Fault. Gravity cores were collected by the USGS in 2009 (cruise ID S-I2-09-SC; http://cmgds.marine.usgs.gov/fan_info.php?fan=SI209SC), and vibracores were collected with the Monterey Bay Aquarium Research Institute's remotely operated vehicle (ROV) Doc Ricketts in 2010 (cruise ID W-1-10-SC; http://cmgds.marine.usgs.gov/fan_info.php?fan=W110SC).

One spreadsheet (PalosVerdesCores_Info.xlsx) contains core name, location, and length.

One spreadsheet (PalosVerdesCores_MSCLdata.xlsx) contains Multi-Sensor Core Logger P-wave velocity, gamma-ray density, and magnetic susceptibility whole-core logs.

One zipped folder of .bmp files (PalosVerdesCores_Photos.zip) contains continuous core photographs of the archive half of each core.

One spreadsheet (PalosVerdesCores_GrainSize.xlsx) contains laser particle grain size sample information and analytical results.

One spreadsheet (PalosVerdesCores_Radiocarbon.xlsx) contains radiocarbon sample information, results, and calibrated ages.

One zipped folder of DICOM files (PalosVerdesCores_CT.zip) contains raw computed tomography (CT) image files.

One .pdf file (PalosVerdesCores_Figures.pdf) contains combined displays of data for each core, including graphic diagram descriptive logs.

This particular metadata file describes the information contained in the file PalosVerdesCores_Figures.pdf. All cores are archived by the U.S. Geological Survey Pacific Coastal and Marine Science Center.
Data from: Aspects of University Students' Graph Sense in a Virtual Learning...
scielo.figshare.com
jpeg
Updated Jun 3, 2023
Cite
Fabiana Chagas de Andrade; Carolina Vieira Schiller; Dione Aparecido Ferreira da Silva; Larissa Pereira Menezes; Alexandre Sousa da Silva (2023). Aspects of University Students' Graph Sense in a Virtual Learning Environment [Dataset]. http://doi.org/10.6084/m9.figshare.14304727.v1
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.14304727.v1
Dataset updated
Jun 3, 2023
Dataset provided by
SciELO journals
Authors
Fabiana Chagas de Andrade; Carolina Vieira Schiller; Dione Aparecido Ferreira da Silva; Larissa Pereira Menezes; Alexandre Sousa da Silva
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: To break with the traditional model of Basic Statistics classes in Higher Education, we drew on Statistical Literacy and Critical Education to develop an activity on graph interpretation, which took place in a Virtual Learning Environment (VLE) as a complement to classroom meetings. Twenty-three engineering students from a public higher education institution in Rio de Janeiro took part in the research. Our objective was to analyze elements of graphic comprehension in an activity that consisted of identifying incorrect statistical graphs conveyed by the media, followed by argumentation and interaction among students about these errors. The main results showed that elements of the Graphic Sense were present in the discussions and were the target of the students' critical analysis. The VLE facilitated communication and fostered student participation and linguistic writing, so the use of digital technologies and of activities that favor collaboration and interaction is important for statistical development, although such construction is a gradual process.
Classes Knowledge Graph
kaggle.com
zip
Updated Aug 31, 2024
Cite
Afroz (2024). Classes Knowledge Graph [Dataset]. https://www.kaggle.com/datasets/pythonafroz/dbpedia-classes-knowledge-graph
Available download formats: zip (174050111 bytes)
Dataset updated
Aug 31, 2024
Authors
Afroz
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
DBPedia Classes

DBpedia is a knowledge graph extracted from Wikipedia, providing structured data about real-world entities and their relationships. DBpedia Classes are the core building blocks of this knowledge graph, representing different categories or types of entities.

Key Concepts:

Entity: A real-world object, such as a person, place, thing, or concept.

Class: A group of entities that share common properties or characteristics.

Instance: A specific member of a class.

Examples of DBPedia Classes:

Person: Represents individuals, e.g., "Barack Obama," "Albert Einstein."

Place: Represents locations, e.g., "Paris," "Mount Everest."

Organization: Represents groups, institutions, or companies, e.g., "Google," "United Nations."

Event: Represents occurrences, e.g., "World Cup," "French Revolution."

Artwork: Represents creative works, e.g., "Mona Lisa," "Star Wars."

Hierarchy and Relationships:

DBpedia classes often have a hierarchical structure, where subclasses inherit properties from their parent classes. For example, the class "Person" might have subclasses like "Politician," "Scientist," and "Artist."

Relationships between classes are also important. For instance, a "Person" might have a "birthPlace" relationship with a "Place," or an "Artist" might have a "hasArtwork" relationship with an "Artwork."
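
The idea that subclasses inherit properties from their parent classes can be sketched in a few lines of Python. This is an illustration only, not DBpedia's actual ontology or API: the `PARENT` and `OWN_PROPERTIES` tables and the function `inherited_properties` are hypothetical and simply mirror the examples named in the text.

```python
# Hypothetical parent links mirroring the examples in the text.
PARENT = {"Politician": "Person", "Scientist": "Person", "Artist": "Person"}

# Hypothetical properties declared directly on each class.
OWN_PROPERTIES = {"Person": {"birthPlace"}, "Artist": {"hasArtwork"}}


def inherited_properties(cls: str) -> set:
    """Collect the properties of a class and of all its ancestors."""
    props = set()
    current = cls
    while current is not None:
        props |= OWN_PROPERTIES.get(current, set())
        current = PARENT.get(current)  # None once the root is reached
    return props


artist_props = inherited_properties("Artist")
print(sorted(artist_props))
```

Walking up the parent links is enough here because each class in the sketch has at most one parent; a hierarchy with multiple parents would need a graph traversal instead.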

Applications of DBPedia Classes:

Semantic Search: DBPedia classes can be used to enhance search results by understanding the context and meaning of queries.

Knowledge Graph Construction: DBPedia classes form the foundation of knowledge graphs, which can be used for various applications like question answering, recommendation systems, and data integration.

Data Analysis: DBPedia classes can be used to analyze and extract insights from large datasets.
Communication Graphs
kaggle.com
zip
Updated Nov 15, 2021
Cite
Subhajit Sahu (2021). Communication Graphs [Dataset]. https://www.kaggle.com/datasets/wolfram77/graphs-communication/discussion
Available download formats: zip (66715371 bytes)
Dataset updated
Nov 15, 2021
Authors
Subhajit Sahu
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
email-EuAll: EU email communication network

The network was generated using email data from a large European research institution. For a period from October 2003 to May 2005 (18 months) we have anonymized information about all incoming and outgoing email of the research institution. For each sent or received email message we know the time, the sender and the recipient of the email. Overall we have 3,038,531 emails between 287,755 different email addresses. Note that we have a complete email graph for only 1,258 email addresses that come from the research institution. Furthermore, there are 34,203 email addresses that both sent and received email within the span of our dataset. All other email addresses are either non-existing, mistyped or spam.

Given a set of email messages, each node corresponds to an email address. We create a directed edge between nodes i and j, if i sent at least one message to j.
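
The construction rule above (one node per email address, a directed edge from i to j if i sent at least one message to j) can be sketched as follows. This is a minimal illustration, not the code used to build the dataset; the function name `build_directed_edges` and the sample addresses are hypothetical.

```python
from typing import Iterable, Set, Tuple


def build_directed_edges(messages: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Return the directed edges (i, j) such that i sent at least one message to j."""
    # A set collapses repeated messages between the same pair into one edge.
    return {(sender, recipient) for sender, recipient in messages}


# Hypothetical (sender, recipient) pairs; "a" writes to "b" twice.
messages = [("a", "b"), ("a", "b"), ("b", "c"), ("c", "a")]
edges = build_directed_edges(messages)
nodes = {address for edge in edges for address in edge}
print(len(nodes), len(edges))
```

Direction matters in this sketch: ("a", "b") is an edge while ("b", "a") is not, because "b" never wrote to "a".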

email-Enron: Enron email network

The Enron email communication network covers all the email communication within a dataset of around half a million emails. This data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation. Nodes of the network are email addresses, and if an address i sent at least one email to address j, the graph contains an undirected edge from i to j. Note that non-Enron email addresses act as sinks and sources in the network, as we only observe their communication with the Enron email addresses.

The Enron email data was originally released by William Cohen at CMU.

wiki-Talk: Wikipedia Talk network

Wikipedia is a free encyclopedia written collaboratively by volunteers around the world. Each registered user has a talk page, that she and other users can edit in order to communicate and discuss updates to various articles on Wikipedia. Using the latest complete dump of Wikipedia page edit history (from January 3 2008) we extracted all user talk page changes and created a network.

The network contains all the users and discussion from the inception of Wikipedia till January 2008. Nodes in the network represent Wikipedia users and a directed edge from node i to node j represents that user i at least once edited a talk page of user j.

comm-f2f-Resistance: Dynamic Face-to-Face Interaction Networks

The dynamic face-to-face interaction networks represent the interactions that happen during discussions between a group of participants playing the Resistance game. This dataset contains networks extracted from 62 games. Each game is played by 5-8 participants and lasts between 45 and 60 minutes. We extract dynamically evolving networks from the free-form discussions using the ICAF algorithm. The extracted networks are used to characterize and detect group deceptive behavior using the DeceptionRank algorithm.

The networks are weighted, directed and temporal. Each node represents a participant. At each 1/3 second, a directed edge from node u to v is weighted by the probability of participant u looking at participant v or the laptop. Additionally, we also provide a binary version where an edge from u to v indicates participant u looks at participant v (or the laptop).

Stanford Network Analysis Platform (SNAP) is a general purpose, high performance system for analysis and manipulation of large networks. Graphs consist of nodes and directed/undirected/multiple edges between the graph nodes. Networks are graphs with data on nodes and/or edges of the network.

The core SNAP library is written in C++ and optimized for maximum performance and compact graph representation. It easily scales to massive networks with hundreds of millions of nodes, and billions of edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges. Besides scalability to large graphs, an additional strength of SNAP is that nodes, edges and attributes in a graph or a network can be changed dynamically during the computation.

SNAP was originally developed by Jure Leskovec in the course of his PhD studies. The first release was made available in Nov, 2009. SNAP uses a general purpose STL (Standard Template Library)-like library GLib developed at Jozef Stefan Institute. SNAP and GLib are being actively developed and used in numerous academic and industrial projects.

http://snap.stanford.edu/data/index.html#email
Graph-Based Social Media Data on Mental Health Topics
data.mendeley.com
Updated Nov 4, 2024
Cite
Samuel Ady Sanjaya (2024). Graph-Based Social Media Data on Mental Health Topics [Dataset]. http://doi.org/10.17632/z45txpdp7f.2
Unique identifier
https://doi.org/10.17632/z45txpdp7f.2
Dataset updated
Nov 4, 2024
Authors
Samuel Ady Sanjaya
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is structured as a graph, where nodes represent users and edges capture their interactions, including tweets, retweets, replies, and mentions. Each node provides detailed user attributes, such as unique ID, follower and following counts, and verification status, offering insights into each user's identity, role, and influence in the mental health discourse. The edges illustrate user interactions, highlighting engagement patterns and types of content that drive responses, such as tweet impressions. This interconnected structure enables sentiment analysis and public reaction studies, allowing researchers to explore engagement trends and identify the mental health topics that resonate most with users.

The dataset consists of three files:

1. Edges Data: Contains graph data essential for social network analysis, including fields for UserID (Source), UserID (Destination), Post/Tweet ID, and Date of Relationship. This file enables analysis of user connections without including tweet content, maintaining compliance with Twitter/X’s data-sharing policies.

2. Nodes Data: Offers user-specific details relevant to network analysis, including UserID, Account Creation Date, Follower and Following counts, Verified Status, and Date Joined Twitter. This file allows researchers to examine user behavior (e.g., identifying influential users or spam-like accounts) without direct reference to tweet content.

3. Twitter/X Content Data: This file contains only the raw tweet text as a single-column dataset, without associated user identifiers or metadata. By isolating the text, we ensure alignment with anonymization standards observed in similar published datasets, safeguarding user privacy in compliance with Twitter/X's data guidelines. This content is crucial for addressing the research focus on mental health discourse in social media.

(References to prior Data in Brief publications involving Twitter/X data informed the dataset's structure.)
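
An edge list with source and destination user IDs is enough to compute simple engagement measures such as in-degree and out-degree. The sketch below assumes a CSV layout; the column names (`source`, `destination`, `post_id`, `date`) and the sample rows are hypothetical stand-ins, as the actual headers of the edges file are not given here.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV standing in for the edges file; the column names are
# assumptions, not the dataset's real headers.
edges_csv = io.StringIO(
    "source,destination,post_id,date\n"
    "u1,u2,p1,2024-01-01\n"
    "u1,u3,p2,2024-01-02\n"
    "u2,u3,p3,2024-01-03\n"
)

out_degree = Counter()  # interactions initiated by each user
in_degree = Counter()   # interactions received by each user
for row in csv.DictReader(edges_csv):
    out_degree[row["source"]] += 1
    in_degree[row["destination"]] += 1

# The user who receives the most interactions in this toy sample.
most_targeted = in_degree.most_common(1)[0][0]
print(most_targeted, in_degree[most_targeted])
```

Counting degrees uses only the two user ID columns, so it does not touch any tweet content.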
Web Graphs
kaggle.com
zip
Updated Nov 11, 2021
Cite
Subhajit Sahu (2021). Web Graphs [Dataset]. https://www.kaggle.com/wolfram77/graphs-web
Available download formats: zip (52848952 bytes)
Dataset updated
Nov 11, 2021
Authors
Subhajit Sahu
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dynamic face-to-face interaction networks represent the interactions that happen during discussions between a group of participants playing the Resistance game. This dataset contains networks extracted from 62 games. Each game is played by 5-8 participants and lasts between 45 and 60 minutes. We extract dynamically evolving networks from the free-form discussions using the ICAF algorithm. The extracted networks are used to characterize and detect group deceptive behavior using the DeceptionRank algorithm.

The networks are weighted, directed and temporal. Each node represents a participant. At each 1/3 second, a directed edge from node u to v is weighted by the probability of participant u looking at participant v or the laptop. Additionally, we also provide a binary version where an edge from u to v indicates participant u looks at participant v (or the laptop).

Stanford Network Analysis Platform (SNAP) is a general purpose, high performance system for analysis and manipulation of large networks. Graphs consist of nodes and directed/undirected/multiple edges between the graph nodes. Networks are graphs with data on nodes and/or edges of the network.

The core SNAP library is written in C++ and optimized for maximum performance and compact graph representation. It easily scales to massive networks with hundreds of millions of nodes, and billions of edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges. Besides scalability to large graphs, an additional strength of SNAP is that nodes, edges and attributes in a graph or a network can be changed dynamically during the computation.

SNAP was originally developed by Jure Leskovec in the course of his PhD studies. The first release was made available in Nov, 2009. SNAP uses a general purpose STL (Standard Template Library)-like library GLib developed at Jozef Stefan Institute. SNAP and GLib are being actively developed and used in numerous academic and industrial projects.

http://snap.stanford.edu/data/index.html#face2face
RKD-Knowledge-Graph
rkd.triply.cc
application/n-quads +5
Updated Nov 16, 2025
Cite
RKD (2025). RKD-Knowledge-Graph [Dataset]. https://rkd.triply.cc/rkd/RKD-Knowledge-Graph
Available download formats: application/sparql-results+json, ttl, application/n-quads, application/n-triples, jsonld, application/trig
Dataset updated
Nov 16, 2025
Dataset authored and provided by
RKD
License
Open Data Commons Attribution License (ODC-By) v1.0, https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Description
We manage unique archives, documentation and photographic material and the largest art historical library on Western art from the Late Middle Ages to the present, with the focus on Netherlandish art. Our collections cover not only paintings, drawings and sculptures, but also monumental art, modern media and design. The collections are present in both digital and analogue form (the latter in our study rooms).

This knowledge graph represents our collection as Linked Data, primarily using the CIDOC-CRM and LinkedArt vocabularies.
Web Graphs (SNAP)
kaggle.com
zip
Updated Dec 16, 2021
Cite
Subhajit Sahu (2021). Web Graphs (SNAP) [Dataset]. https://www.kaggle.com/wolfram77/graphs-snap-web
Available download formats: zip (54678245 bytes)
Dataset updated
Dec 16, 2021
Authors
Subhajit Sahu
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Berkeley-Stanford web graph

NOTE: This is an earlier version (2002) of the data obtained from
Sep Kamvar, Stanford (2003) (the Kamvar/Stanford_Berkeley graph
in the UF collection, matrix ID 980).

Dataset information

Nodes represent pages from berkeley.edu and stanford.edu domains and directed
edges represent hyperlinks between them. The data was collected in 2002.

Dataset statistics
Nodes 685230
Edges 7600595
Nodes in largest WCC 654782 (0.956)
Edges in largest WCC 7499425 (0.987)
Nodes in largest SCC 334857 (0.489)
Edges in largest SCC 4523232 (0.595)
Average clustering coefficient 0.6149
Number of triangles 64690980
Fraction of closed triangles 0.08769
Diameter (longest shortest path) 669
90-percentile effective diameter 10
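
The parenthesized values in the statistics above are the component sizes expressed as fractions of the whole graph. A short Python check, using only the counts listed in this table (the variable and function names are illustrative), reproduces them:

```python
# Counts copied from the Berkeley-Stanford statistics table.
nodes, edges = 685230, 7600595
wcc_nodes, wcc_edges = 654782, 7499425
scc_nodes, scc_edges = 334857, 4523232


def fraction(part: int, whole: int) -> float:
    """Share of the whole graph, rounded to three decimals as in the table."""
    return round(part / whole, 3)


wcc_node_frac = fraction(wcc_nodes, nodes)  # 0.956
wcc_edge_frac = fraction(wcc_edges, edges)  # 0.987
scc_node_frac = fraction(scc_nodes, nodes)  # 0.489
scc_edge_frac = fraction(scc_edges, edges)  # 0.595
print(wcc_node_frac, wcc_edge_frac, scc_node_frac, scc_edge_frac)
```

Read this way, about 96% of pages belong to the largest weakly connected component, while only about 49% belong to the largest strongly connected component.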

Source (citation)

J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. Community Structure in Large
Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. arXiv.org:0810.1355, 2008.

Files
File Description
web-BerkStan.txt.gz Berkeley-Stanford web graph from 2002

NOTE: a near duplicate of this problem already appears in the UF Collection:

web-BerkStan Kamvar/Stanford_Berkeley
in SNAP/: n: 685,230 nz: 7,600,595
in Kamvar/ n: 683,446 nz: 7,583,376

I obtained the Kamvar/Stanford_Berkeley directly from Sep Kamvar. It is slightly smaller than the version in SNAP. It is thus likely that Sep created multiple versions of the graph.

Google web graph

Dataset information

Nodes represent web pages and directed edges represent hyperlinks between them. The data was released in 2002 by Google as part of the Google Programming Contest.

Dataset statistics
Nodes 875713
Edges 5105039
Nodes in largest WCC 855802 (0.977)
Edges in largest WCC 5066842 (0.993)
Nodes in largest SCC 434818 (0.497)
Edges in largest SCC 3419124 (0.670)
Average clustering coefficient 0.6047
Number of triangles 13391903
Fraction of closed triangles 0.05523...
Data_Sheet_1_Toward a Taxonomy for Adaptive Data Visualization in Analytics...
frontiersin.figshare.com
figshare.com
xlsx
Updated Jun 2, 2023
Cite
Tristan Poetzsch; Panagiotis Germanakos; Lynn Huestegge (2023). Data_Sheet_1_Toward a Taxonomy for Adaptive Data Visualization in Analytics Applications.xlsx [Dataset]. http://doi.org/10.3389/frai.2020.00009.s001
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/frai.2020.00009.s001
Dataset updated
Jun 2, 2023
Dataset provided by
Frontiers
Authors
Tristan Poetzsch; Panagiotis Germanakos; Lynn Huestegge
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data analytics as a field is currently at a crucial point in its development, as a commoditization takes place in the context of increasing amounts of data, more user diversity, and automated analysis solutions, the latter potentially eliminating the need for expert analysts. A central hypothesis of the present paper is that data visualizations should be adapted to both the user and the context. This idea was initially addressed in Study 1, which demonstrated substantial interindividual variability among a group of experts when freely choosing an option to visualize data sets. To lay the theoretical groundwork for a systematic, taxonomic approach, a user model combining user traits, states, strategies, and actions was proposed and further evaluated empirically in Studies 2 and 3. The results implied that for adapting to user traits, statistical expertise is a relevant dimension that should be considered. Additionally, for adapting to user states different user intentions such as monitoring and analysis should be accounted for. These results were used to develop a taxonomy which adapts visualization recommendations to these (and other) factors. A preliminary attempt to validate the taxonomy in Study 4 tested its visualization recommendations with a group of experts. While the corresponding results were somewhat ambiguous overall, some aspects nevertheless supported the claim that a user-adaptive data visualization approach based on the principles outlined in the taxonomy can indeed be useful. While the present approach to user adaptivity is still in its infancy and should be extended (e.g., by testing more participants), the general approach appears to be very promising.
Data from: Generalized Typed Attributed Graph Transformation Systems based...
resodate.org
Updated Jun 15, 2020
Cite
Hartmut Ehrig; Karsten Ehrig; Claudia Ermel; Ulrike Prange (2020). Generalized Typed Attributed Graph Transformation Systems based on Morphisms Changing Type Graphs and Data Signature [Dataset]. http://doi.org/10.14279/depositonce-10258
Unique identifier
https://doi.org/10.14279/depositonce-10258
Dataset updated
Jun 15, 2020
Dataset provided by
Technische Universität Berlin
DepositOnce
Authors
Hartmut Ehrig; Karsten Ehrig; Claudia Ermel; Ulrike Prange
Description
Our aim is to extend the framework of typed attributed graphs in [1] to generalized typed attributed graphs. They are based on generalized attributed graph morphisms (GAG-morphisms for short), which allow the type graph, data signature, and domain to be changed. This makes it possible to formulate type hierarchies and views of visual languages defined by GAG-morphisms between type graphs (GATG-morphisms for short). In order to study the interaction and integration of views, the restriction of views along type hierarchies, the restriction and integration of consistent view models, and the reflection of behaviour between different typed attributed graph transformation systems, we present suitable conditions for the construction of pushouts and pullbacks, and special van Kampen properties in the category GAGraphs of generalized attributed graphs. Moreover, we show that (GAGraphs,M) and (GAGraphsATG,M) are adhesive HLR categories for the class M of injective, persistent, and signature preserving morphisms.
graph
data.cityofnewyork.us
data.wu.ac.at
csv, xlsx, xml
Updated Nov 1, 2025
Cite
Department of Housing Preservation and Development (HPD) (2025). graph [Dataset]. https://data.cityofnewyork.us/Housing-Development/graph/g28n-t6w9
Available download formats: xml, csv, xlsx
Dataset updated
Nov 1, 2025
Authors
Department of Housing Preservation and Development (HPD)
Description
The Department of Housing Preservation and Development (HPD) Housing Litigation Division (HLD) initiates actions in the Housing Court against owners of privately-owned buildings to enforce compliance with the housing quality standards contained in the New York State Multiple Dwelling Law and the New York City Housing Maintenance Code. HLD attorneys also represent HPD when tenants initiate actions against private owners. HPD is automatically named as a party to such actions. The goal of these court proceedings is to obtain enforceable Orders to Correct, Civil Penalties (fines) and Contempt Sanctions, compelling owners to comply with the Housing Code.
Data Set Knowledge Graph (DSKG)
nde-dev.biothings.io
zenodo.org
Updated Feb 18, 2021
Cite
Michael Färber (2021). Data Set Knowledge Graph (DSKG) [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_4478920
Dataset updated
Feb 18, 2021
Dataset provided by
Michael Färber
David Lamprecht
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present the Data Set Knowledge Graph (DSKG.org), an RDF dataset about datasets that are linked to publications (modeled in the Microsoft Academic Knowledge Graph, MAKG) that mention the datasets. The metadata of the datasets is based on datasets that are registered in OpenAIRE and Wikidata.

What exactly do we provide?

Periodically updated RDF dump files of the Data Set Knowledge Graph.

URI resolution of the Data Set Knowledge Graph within the Linked Open Data.

A publicly accessible SPARQL endpoint containing the latest Dataset Knowledge Graph data.

How big is the Dataset Knowledge Graph?

The Dataset Knowledge Graph models, among others,

2,208 datasets from all scientific disciplines

813,551 links to 634,803 unique papers

1,169 authors of datasets

208 ORCID IDs.

Potential use cases:

Use the DSKG for the development of semantic search engines (e.g. use the metadata of the linked publications of the datasets for advanced search capabilities)

Easier data integration by using the RDF standard vocabulary DCAT and by linking resources to other data sources (e.g., combining the DSKG with other dataset collections in RDF).

Data analysis to measure and award the provisioning of datasets (e.g., determine the scientific influence of datasets and authors).
Data from: Multimodal Learning on Graphs: Methods and Applications
curate.nd.edu
Updated May 14, 2025
Cite
Yihong Ma (2025). Multimodal Learning on Graphs: Methods and Applications [Dataset]. http://doi.org/10.7274/28792454.v1
Unique identifier
https://doi.org/10.7274/28792454.v1
Dataset updated
May 14, 2025
Dataset provided by
University of Notre Dame
Authors
Yihong Ma
License
https://www.law.cornell.edu/uscode/text/17/106
Description
Graph data represents complex relationships across diverse domains, from social networks to healthcare and chemical sciences. However, real-world graph data often spans multiple modalities, including time-varying signals from sensors, semantic information from textual representations, and domain-specific encodings. This dissertation introduces innovative multimodal learning techniques for graph-based predictive modeling, addressing the intricate nature of these multidimensional data representations. The research systematically advances graph learning through innovative methodological approaches across three critical modalities. Initially, we establish robust graph-based methodological foundations through advanced techniques including prompt tuning for heterogeneous graphs and a comprehensive framework for imbalanced learning on graph data. We then extend these methods to time series analysis, demonstrating their practical utility through applications such as hierarchical spatio-temporal modeling for COVID-19 forecasting and graph-based density estimation for anomaly detection in unmanned aerial systems. Finally, we explore textual representations of graphs in the chemical domain, reformulating reaction yield prediction as an imbalanced regression problem to enhance performance in underrepresented high-yield regions critical to chemists.
Data from: Linear manifold modeling and graph estimation based on...
tandf.figshare.com
application/x-gzip
Updated May 30, 2023
Cite
Eugen Pircalabelu; Gerda Claeskens (2023). Linear manifold modeling and graph estimation based on multivariate functional data with different coarseness scales [Dataset]. http://doi.org/10.6084/m9.figshare.20426067.v1
Explore at:
Available download formats: application/x-gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.20426067.v1
Dataset updated
May 30, 2023
Dataset provided by
Taylor & Francis
Authors
Eugen Pircalabelu; Gerda Claeskens
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We develop a high-dimensional graphical modeling approach for functional data where the number of functions exceeds the available sample size. This is accomplished by proposing a sparse estimator for a concentration matrix when identifying linear manifolds. As such, the procedure extends the ideas of the manifold representation for functional data to high-dimensional settings where the number of functions is larger than the sample size. By working in a penalized setting it enriches the functional data framework by estimating sparse undirected graphs that show how functional nodes connect to other functional nodes. The procedure allows multiple coarseness scales to be present in the data and proposes a simultaneous estimation of several related graphs. Its performance is illustrated using a real-life fMRI dataset and with simulated data.
Data from: Grammar transformations of topographic feature type annotations...
catalog.data.gov
data.usgs.gov
Updated Oct 29, 2025
Cite
U.S. Geological Survey (2025). Grammar transformations of topographic feature type annotations of the U.S. to structured graph data. [Dataset]. https://catalog.data.gov/dataset/grammar-transformations-of-topographic-feature-type-annotations-of-the-u-s-to-structured-g
Explore at:
Dataset updated
Oct 29, 2025
Dataset provided by
United States Geological Survey http://www.usgs.gov/
Area covered
United States
Description
These data were used to examine grammatical structures and patterns within a set of geospatial glossary definitions. Objectives of our study were to analyze the semantic structure of input definitions, use this information to build triple structures of RDF graph data, upload our lexicon to a knowledge graph software, and perform SPARQL queries on the data. Upon completion of this study, SPARQL queries were proven to effectively convey graph triples which displayed semantic significance. These data represent and characterize the lexicon of our input text, which is used to form graph triples. These data were collected in 2024 by passing text through multiple Python programs utilizing spaCy (a natural language processing library) and its pre-trained English transformer pipeline. Before the data were processed by the Python programs, input definitions were first rewritten as natural language and formatted as tabular data. Passages were then tokenized and characterized by their part-of-speech, tag, dependency relation, dependency head, and lemma. Each word within the lexicon was tokenized. A stop-words list was utilized only to remove punctuation and symbols from the text, excluding hyphenated words (e.g., bowl-shaped), which remained as such. The tokens’ lemmas were then aggregated and totaled to find their recurrences within the lexicon. This procedure was repeated for tokenizing noun chunks using the same glossary definitions.
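As a rough illustration of the lemma aggregation step described above, the sketch below counts lemma recurrences with Python's standard library. The token records are hypothetical stand-ins for the output of the NLP pipeline, and filtering on the PUNCT and SYM part-of-speech tags is an assumption standing in for the stop-words list; it is not the study's actual code.

```python
from collections import Counter

# Hypothetical token records (text, part-of-speech, lemma); in the study these
# attributes would come from the NLP pipeline, not from a hand-written list.
tokens = [
    {"text": "A", "pos": "DET", "lemma": "a"},
    {"text": "bowl-shaped", "pos": "ADJ", "lemma": "bowl-shaped"},
    {"text": "depressions", "pos": "NOUN", "lemma": "depression"},
    {"text": ",", "pos": "PUNCT", "lemma": ","},
    {"text": "depression", "pos": "NOUN", "lemma": "depression"},
    {"text": ".", "pos": "PUNCT", "lemma": "."},
]

# Assumption: punctuation and symbols are identified by their POS tag.
EXCLUDED_POS = {"PUNCT", "SYM"}


def count_lemmas(records):
    """Total how often each lemma occurs, skipping punctuation and symbols."""
    return Counter(r["lemma"] for r in records if r["pos"] not in EXCLUDED_POS)


lemma_counts = count_lemmas(tokens)
for lemma, count in lemma_counts.most_common():
    print(lemma, count)
```

Hyphenated words such as bowl-shaped pass through as a single lemma here because they arrive as single token records.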
Data from: [Dataset] Does Volunteer Engagement Pay Off? An Analysis of User...
zenodo.org
data.niaid.nih.gov
zip
Updated Nov 24, 2022
Cite
Simon Krukowski; Ishari Amarasinghe; Nicolás Felipe Gutiérrez-Páez; H. Ulrich Hoppe (2022). [Dataset] Does Volunteer Engagement Pay Off? An Analysis of User Participation in Online Citizen Science Projects - Graph Files [Dataset]. http://doi.org/10.5281/zenodo.7356426
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7356426
Dataset updated
Nov 24, 2022
Dataset provided by
Zenodo http://zenodo.org/
Authors
Simon Krukowski; Ishari Amarasinghe; Nicolás Felipe Gutiérrez-Páez; H. Ulrich Hoppe
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Explanation/Overview:

Corresponding graph files of the extracted Zooniverse networks described in D3.3 (can be found here). They result from the research that culminated in the publication "Does Volunteer Engagement Pay Off? An Analysis of User Participation in Online Citizen Science Projects", a paper for the conference CollabTech 2022: Collaboration Technologies and Social Computing, published as part of the Lecture Notes in Computer Science book series (LNCS, volume 13632) here. Usernames have been anonymised.

The graph files are in .gexf (graph exchange XML format) and .gml (graph modeling language) formats which can be used by common graph/network-analysis and visualisation tools such as Gephi.

Purpose:

The purpose of this dataset is to provide the basis for possible further examinations of the network structure, involving additional (not yet analysed) features such as the content of the comments etc.

Relatedness:

The data of the different projects was derived from the forums of 7 Zooniverse projects based on similar discussion board features. The projects are: 'Galaxy Zoo', 'Gravity Spy', 'Seabirdwatch', 'Snapshot Wisconsin', 'Wildwatch Kenya', 'Galaxy Nurseries', 'Penguin Watch'.

Content:

The dataset contains distinct graph files for each of the analysed projects. For each graph file, there are nodes and edges and their associated attributes (i.e., each edge can have an attribute). For the edges, apart from source and target, we have as attributes:

weight

project_title

body (i.e., text)

created_at

userRoles

discussion_title

discussion_id

user_id

board_title

relation

target_role

For the nodes, the attributes are:

user_id

userRoles

degree_reply (i.e., degree for the reply relation)

in_degree_reply

out_degree_reply

degree_comment

in_degree_comment

out_degree_comment

degree_total

in_degree_total

out_degree_total

target_role

Grouping:

Each graph file represents all the comments for the respective project across its lifespan, irrespective of any time slices. Edges represent comments and nodes represent users. The graphs are not split by discussion board: all boards occur in the data, and the board information is still contained within it.
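The node attributes listed above (degree, in-degree and out-degree per relation, plus totals) can in principle be recomputed from the edges. The following is a minimal sketch under stated assumptions: the edge tuples are hypothetical, each edge counts once regardless of its weight attribute, and the dataset's own definition of these attributes may differ.

```python
from collections import defaultdict

# Hypothetical edges: (source user_id, target user_id, relation).
edges = [
    ("u1", "u2", "reply"),
    ("u3", "u2", "reply"),
    ("u2", "u1", "comment"),
    ("u1", "u3", "comment"),
]


def degree_table(edge_list):
    """Count in/out/total degree per user, per relation and overall."""
    table = defaultdict(lambda: defaultdict(int))
    for source, target, relation in edge_list:
        # Each edge contributes to its own relation and to the "total" columns.
        for suffix in (relation, "total"):
            table[source][f"out_degree_{suffix}"] += 1
            table[source][f"degree_{suffix}"] += 1
            table[target][f"in_degree_{suffix}"] += 1
            table[target][f"degree_{suffix}"] += 1
    return {user: dict(counts) for user, counts in table.items()}


degrees = degree_table(edges)
for user, counts in sorted(degrees.items()):
    print(user, counts)
```

Attributes that would be zero for a user are simply absent from that user's dictionary in this sketch; a caller can read them with `counts.get(name, 0)`.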
Data from: Data related to Panzer: A Machine Learning Based Approach to...
darus.uni-stuttgart.de
Updated Nov 27, 2024
Cite
Tim Panzer (2024). Data related to Panzer: A Machine Learning Based Approach to Analyze Supersecondary Structures of Proteins [Dataset]. http://doi.org/10.18419/DARUS-4576
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.18419/DARUS-4576
Dataset updated
Nov 27, 2024
Dataset provided by
DaRUS
Authors
Tim Panzer
License
https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4576
Time period covered
Nov 1, 1976 - Feb 29, 2024
Dataset funded by
DFG
Description
This entry contains the data used to implement the bachelor thesis. It was investigated how embeddings can be used to analyze supersecondary structures. Abstract of the thesis: This thesis analyzes the behavior of supersecondary structures in the context of embeddings. For this purpose, data from the Protein Topology Graph Library was provided with embeddings. This resulted in a structured graph database, which will be used for future work and analyses. In addition, different projections were made into the two-dimensional space to analyze how the embeddings behave there. In the Jupyter Notebook 1_data_retrival.ipynb the download process of the graph files from the Protein Topology Graph Library (https://ptgl.uni-frankfurt.de) can be found. The downloaded .gml files can also be found in graph_files.zip. These form graphs that represent the relationships of supersecondary structures in the proteins and are the data basis for further analyses. These graph files are then processed in the Jupyter Notebook 2_data_storage_and_embeddings.ipynb and entered into a graph database. The sequences of the supersecondary and secondary structures from the PTGL can be found in fastas.zip. The embeddings were also calculated using the ESM model of the Facebook Research Group (huggingface.co/facebook/esm2_t12_35M_UR50D), which can be found in three .h5 files. These are then added there subsequently. The whole process in this notebook serves to build up the database, which can then be searched using Cypher queries. In the Jupyter Notebook 3_data_science.ipynb different visualizations and analyses are then carried out, which were made with the help of UMAP. For the installation of all dependencies, it is recommended to create a Conda environment and then install all packages there. To use the project, PyEED should be installed using the snapshot of the original repository (source repository: https://github.com/PyEED/pyeed).
The best way to install PyEED is to execute the pip install -e . command in the pyeed_BT folder. The dependencies can also be installed by using poetry and the .toml file. In addition, seaborn, h5py and umap-learn are required. These can be installed using the following commands:
pip install h5py==3.12.1
pip install seaborn==0.13.2 umap-learn==0.5.7
The global Graph Analytics market size is USD 2522 million in 2024 and will...
cognitivemarketresearch.com
pdf,excel,csv,ppt
Updated Sep 15, 2025
Cite
Cognitive Market Research (2025). The global Graph Analytics market size is USD 2522 million in 2024 and will expand at a compound annual growth rate (CAGR) of 34.0% from 2024 to 2031. [Dataset]. https://www.cognitivemarketresearch.com/graph-analytics-market-report
Explore at:
Available download formats: pdf, excel, csv, ppt
Dataset updated
Sep 15, 2025
Dataset authored and provided by
Cognitive Market Research
License
https://www.cognitivemarketresearch.com/privacy-policy
Time period covered
2021 - 2033
Area covered
Global
Description
According to Cognitive Market Research, the global Graph Analytics market size was USD 2522 million in 2024 and will expand at a compound annual growth rate (CAGR) of 34.0% from 2024 to 2031.

Key Dynamics of Graph Analytics Market

Key Drivers of Graph Analytics Market

Increasing Demand for Immediate Big Data Insights: Organizations are progressively depending on graph analytics to handle extensive amounts of interconnected data for instantaneous insights. This is essential for applications such as fraud detection, recommendation systems, and customer behavior analysis, particularly within the finance, retail, and social media industries.

Rising Utilization in Fraud Detection and Cybersecurity: Graph analytics facilitates the discovery of intricate relationships within transactional data, aiding in the identification of anomalies, insider threats, and fraudulent patterns. Its capacity to analyze nodes and edges in real-time is leading to significant adoption in cybersecurity and banking sectors.

Progress in AI and Machine Learning Integration: Graph analytics platforms are progressively merging with AI and ML algorithms to improve predictive functionalities. This collaboration fosters enhanced pattern recognition, network analysis, and more precise forecasting across various sectors including healthcare, logistics, and telecommunications.

Key Restraints for Graph Analytics Market

High Implementation and Infrastructure Expenses: Establishing a graph analytics system necessitates sophisticated infrastructure, storage, and processing capabilities. These substantial expenses may discourage small and medium-sized enterprises from embracing graph-based solutions, particularly in the absence of a clear return on investment.

Challenges in Data Modeling and Querying: In contrast to conventional relational databases, graph databases demand specialized expertise for schema design, data modeling, and query languages such as Cypher or Gremlin. This significant learning curve hampers adoption in organizations lacking technical expertise.

Concerns Regarding Data Privacy and Security: Since graph analytics frequently involves the examination of sensitive personal and behavioral data, it presents regulatory and privacy challenges. Complying with data protection regulations like GDPR becomes increasingly difficult when handling large-scale, interconnected datasets.

Key Trends in Graph Analytics Market

Increased Utilization in Supply Chain and Logistics Optimization: Graph analytics is increasingly being adopted in logistics for the purpose of mapping routes, managing supplier relationships, and pinpointing bottlenecks. The implementation of real-time graph-based decision-making is enhancing both efficiency and resilience within global supply chains.

Growth of Cloud-Based Graph Analytics Platforms: Cloud service providers such as AWS, Azure, and Google Cloud are broadening their support for graph databases and analytics solutions. This shift minimizes initial infrastructure expenses and facilitates scalable deployments for enterprises of various sizes.

Advent of Explainable AI (XAI) in Graph Analytics: The need for explainability is becoming a significant priority in graph analytics. Organizations are pursuing transparency regarding how graph algorithms reach their conclusions, particularly in regulated sectors, which is increasing the demand for tools that offer inherent interpretability and traceability.

Introduction of the Graph Analytics Market

The Graph Analytics Market is rapidly expanding, driven by the growing need for advanced data analysis techniques in various sectors. Graph analytics leverages graph structures to represent and analyze relationships and dependencies, providing deeper insights than traditional data analysis methods. Key factors propelling this market include the rise of big data, the increasing adoption of artificial intelligence and machine learning, and the demand for real-time data processing. Industries such as finance, healthcare, telecommunications, and retail are major contributors, utilizing graph analytics for fraud detection, personalized recommendations, network optimization, and more. Leading vendors are continually innovating to offer scalable, efficient solutions, incorporating advanced features like graph databases and visualization tools.
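The headline figures in this listing (USD 2522 million in 2024 and a 34.0% CAGR through 2031) can be turned into year-by-year values with the standard compound-growth formula, size after t years = base size × (1 + rate)^t. The sketch below only applies that arithmetic; the function name is illustrative, and the printed values are implied by the two stated figures rather than quoted from the report.

```python
def project_market_size(base_size, cagr, years):
    """Compound a base value forward by `years` at annual growth rate `cagr`."""
    return base_size * (1.0 + cagr) ** years


base_2024 = 2522.0  # USD million, as stated in the listing
cagr = 0.34         # 34.0% per year, as stated in the listing

# Print the implied market size for each year of the stated 2024-2031 window.
for year in range(2024, 2032):
    size = project_market_size(base_2024, cagr, year - 2024)
    print(f"{year}: USD {size:,.0f} million")
```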
A Citation Graph from OpenAlex (Works)
databank.illinois.edu
Updated Jul 29, 2024
Cite
Lorran Caetano Machado Lopes; George Chacko (2024). A Citation Graph from OpenAlex (Works) [Dataset]. http://doi.org/10.13012/B2IDB-7362697_V1
Explore at:
Unique identifier
https://doi.org/10.13012/B2IDB-7362697_V1
Dataset updated
Jul 29, 2024
Authors
Lorran Caetano Machado Lopes; George Chacko
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset funded by
Illinois: Insper Collaboration
Description
This dataset consists of a citation graph. It was constructed by downloading and parsing the Works section of the Open Alex catalog of the global research system. Open Alex (see citation below) contains detailed information about scholarly research, including articles, authors, journals, institutions, and their relationships. The data were downloaded on 2024-07-15. The dataset comprises two compressed (.xz) files.
1) filename: openalexID_integer_id_hasDOI.parquet.xz. The tabular data within contains three columns: openalex_id, integer_id, and hasDOI. Each row represents a record with the following data types:
• openalex_id: A unique identifier from the Open Alex catalog.
• integer_id: An integer representing the new identifier (assigned by the authors)
• hasDOI: An integer (0 or 1) indicating whether the record has a DOI (0 for no, 1 for yes).
2) filename: citation_table.tsv.xz. This edgelist of citations has two columns (no header) of integer values that represent citing and cited integer_id, respectively.
Summary Features
• Total Nodes (Documents): 256,997,006
• Total Edges (citations): 2,148,871,058
• Documents with DOIs: 163,495,446
• Edges between documents with DOIs: 1,936,722,541 [corrected to 2,148,788,148 edges Nov 13, 2025]
• Count of unique nodes in edgelist 111,453,719 [updated Nov 13, 2025]
Note: Nov 13, 2025. An improved curation process will be applied to a future version of this dataset.
Note: Nov 13, 2025. The code used to generate these files can be found here: https://github.com/illinois-or-research-analytics/lorran_openalex/
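To make the edgelist layout concrete, here is a small sketch that streams an xz-compressed, headerless, two-column TSV and derives an edge count, a unique-node count, and per-document citation counts. It uses a tiny in-memory sample as a stand-in for citation_table.tsv.xz; the function name and sample values are illustrative assumptions, not the authors' code.

```python
import csv
import io
import lzma
from collections import Counter

# Tiny stand-in for citation_table.tsv.xz: citing<TAB>cited, no header row.
sample = b"1\t2\n1\t3\n4\t2\n"
compressed = lzma.compress(sample)


def summarize_edgelist(xz_bytes):
    """Stream the compressed edgelist and return (edges, unique nodes, in-degree)."""
    edge_count = 0
    nodes = set()
    cited_counts = Counter()
    with lzma.open(io.BytesIO(xz_bytes), mode="rt", newline="") as handle:
        for citing, cited in csv.reader(handle, delimiter="\t"):
            citing_id, cited_id = int(citing), int(cited)
            edge_count += 1
            nodes.update((citing_id, cited_id))
            cited_counts[cited_id] += 1
    return edge_count, len(nodes), cited_counts


edge_total, unique_nodes, in_degree = summarize_edgelist(compressed)
print(edge_total, unique_nodes, in_degree.most_common(1))
```

For the real file, `lzma.open` can be given the file path directly. Holding every node identifier in a Python set is only practical for samples; at the scale reported above, a disk-backed or array-based structure would be a more realistic choice.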

