9 datasets found
  1. Graphs for the paper Beyond Macrobenchmarks: Microbenchmark-based Graph Database Evaluation

    • zenodo.org
    application/gzip, bin +2
    Updated Jun 2, 2025
    Cite
    Martin Brugnara; Matteo Lissandrini; Yannis Velegrakis (2025). Graphs for the paper Beyond Macrobenchmarks: Microbenchmark-based Graph Database Evaluation. [Dataset]. http://doi.org/10.5281/zenodo.15571202
    Explore at:
    Available download formats: application/gzip, json, bin, txt
    Dataset updated
    Jun 2, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Martin Brugnara; Matteo Lissandrini; Yannis Velegrakis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Graph Data

    We distribute here the datasets used in the tests for the paper:

    «Beyond Macrobenchmarks: Microbenchmark-based Graph Database Evaluation.»
    by Lissandrini, Matteo; Brugnara, Martin; and Velegrakis, Yannis.
    In PVLDB, 12(4):390-403, 2018.

    From the official webpage: https://graphbenchmark.com/

    The original files were stored on Google Drive, which is going to be discontinued.

    The datasets used in the tests are stored in GraphSON format for the versions of the engines supporting TinkerPop 3. Systems using TinkerPop 2 instead support GraphSON 1.0. Our datasets can easily be converted to a newer or older version; for an example, see our Docker image.
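
    For illustration, here is a minimal Python sketch that streams one of the gzipped GraphSON files and tallies vertices and outgoing edges. It assumes the one-vertex-per-line adjacency layout that TinkerPop's GraphSONWriter produces; key names and type wrappers vary between GraphSON versions, so treat the details as assumptions.

```python
# Minimal sketch: inspect a GraphSON dump without loading a graph engine.
# Assumes the one-vertex-per-line adjacency layout produced by TinkerPop's
# GraphSONWriter; key names and type wrappers vary between GraphSON versions.
import gzip
import json

vertices, edges = 0, 0
with gzip.open("yeast.json.gz", "rt", encoding="utf-8") as f:
    for line in f:
        vertex = json.loads(line)
        vertex = vertex.get("@value", vertex)  # unwrap typed GraphSON, if present
        vertices += 1
        # In the adjacency layout, "outE" maps each edge label to a list of edges.
        for edge_list in vertex.get("outE", {}).values():
            edges += len(edge_list)

print(f"{vertices} vertices, {edges} outgoing edges")
```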

    The MiCo Dataset comes from the authors of GraMi.
    For more details, you can read:

    «GRAMI: Frequent Subgraph and Pattern Mining in a Single Large Graph»
    by Mohammed Elseidy, Ehab Abdelhamid, Spiros Skiadopoulos, and Panos Kalnis.
    In PVLDB, 7(7):517-528, 2014.

    The Yeast Dataset has been converted from the version transformed into Pajek format by V. Batagelj. The original dataset comes from:

    «Topological structure analysis of the protein-protein interaction network in budding yeast»
    by Shiwei Sun, Lunjiang Ling, Nan Zhang, Guojie Li and Runsheng Chen.
    In Nucleic Acids Research, 2003, Vol. 31, No. 9, 2443-2450.

    Moreover, you can read about the details of our Freebase ExQ datasets, or you can use our Docker image to generate the LDBC synthetic dataset.

    Details on file sizes

    Name  | Files                                           | Size (bytes) | Graph Size (Nodes/Edges)
    Yeast | yeast.json / yeast.json.gz                      | 1.5M / 180K  | 2.3K / 7.1K
    MiCo  | mico.json / mico.json.gz                        | 84M / 12M    | 0.1M / 1.1M
    Frb-O | freebase_org.json / freebase_org.json.gz        | 584M / 81M   | 1.9M / 4.3M
    Frb-S | freebase_small.json / freebase_small.json.gz    | 87M / 12M    | 0.5M / 0.3M
    Frb-M | freebase_medium.json / freebase_medium.json.gz  | 816M / 117M  | 4M / 3.1M
    Frb-L | freebase_large.json / freebase_large.json.gz    | 6.3G / 616M  | 28.4M / 31.2M
    LDBC  | ldbc.json / ldbc.json.gz                        | 144M / 13M   | 0.18M / 1.5M
  2. JSON export from a Neo4j Graph database experimental data for bird conservation planning

    • figshare.com
    json
    Updated Mar 11, 2021
    Cite
    Scott Anderson; Brian Wee (2021). JSON export from a Neo4j Graph database experimental data for bird conservation planning [Dataset]. http://doi.org/10.6084/m9.figshare.14200058.v1
    Explore at:
    Available download formats: json
    Dataset updated
    Mar 11, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Scott Anderson; Brian Wee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Structured data characterizing selected avian conservation aspects of North Carolina's Wildlife Action Plans were already encoded in a Semantic MediaWiki database (http://wiki.ncpif.org/). That database was created and is maintained by the North Carolina Partners in Flight (NC PIF) program, a program of the North Carolina Wildlife Resources Commission. The NC PIF wiki database was ported into a Neo4j labeled property graph database for an experiment in linking avian species, organizations, geographies, and management plans. This JSON file is an export from that Neo4j database.
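
    As a quick way to explore such an export, the sketch below partitions records by type. It assumes an APOC-style JSON Lines layout (one object per line with a "type" field of "node" or "relationship"); the file name and field names are assumptions, not confirmed by the dataset.

```python
# Minimal sketch: split a Neo4j JSON export into nodes and relationships.
# Assumes an APOC-style JSON Lines layout with a "type" field of "node" or
# "relationship"; the file name and field names are assumptions.
import json

nodes, relationships = [], []
with open("neo4j_export.json", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        if record.get("type") == "node":
            nodes.append(record)
        elif record.get("type") == "relationship":
            relationships.append(record)

print(f"{len(nodes)} nodes, {len(relationships)} relationships")
```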

  3. Hadith Project

    • kaggle.com
    zip
    Updated Nov 10, 2017
    + more versions
    Cite
    Zeeshan-ul-hassan Usmani (2017). Hadith Project [Dataset]. https://www.kaggle.com/zusmani/hadithsahibukhari
    Explore at:
    Available download formats: zip (1890240 bytes)
    Dataset updated
    Nov 10, 2017
    Authors
    Zeeshan-ul-hassan Usmani
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    A Hadith is a report describing the words, actions, intentions or habits of the last Prophet and Messenger Muhammed (Peace Be Upon Him). The term literally means report, account or narrative.

    Ṣaḥīḥ al-Bukhārī (صحيح البخاري) is one of the six major hadith collections. It was collected by the Muslim scholar Imam Muhammad al-Bukhari after being transmitted orally for generations. There are 7,658 full hadiths in this collection, narrated by 1,755 narrators/transmitters.

    Imam Bukhari finished his work around 846 A.D.

    Content

    The two main sources of data regarding hadith are works of hadith and works containing biographical information about hadith narrators. The dataset contains 7,658 hadiths in Arabic and the names of 1,755 transmitters. Imam Bukhari applied the following criteria to include a hadith in this book.

    1. Quality and soundness of the chain of narrators: the lifetime of a narrator should overlap with the lifetime of the authority from whom he narrates.

    2. Verifiability: it should be verifiable that narrators met their sources, and they should expressly state that they obtained the narrative from these authorities.

    3. Piousness: he accepted narratives only from those who, according to his knowledge, not only believed in Islam but also practiced its teachings.

    Acknowledgements

    More information on Hadith and Sahih Bukhari can be found at this link: Hadith Books

    Inspiration

    Here are some ideas worth exploring:

    1. The traditional criteria for determining whether a hadith is Sahih (authentic) require an uninterrupted chain of narrators, that all those narrators be highly reliable, and that there be no hidden defects. Can we make a social network graph of all the narrators and then timestamp it with their ages and eras to see whose lifetimes overlap?

    2. The name of a transmitter mentioned in a given hadith is not the full name, and many transmitters have similar names, so identifying the transmitter of a given hadith from the names mentioned in the text might be a good problem to tackle.

    3. Can we analyze the chains of transmitters for entire collections using Neo4j or some other graph database? (See the sketch after this list.)

    4. There exist different chains that report the same hadith with slight variations in wording; can you identify those?

    5. Can you link the text with other external data sources?

    6. Can we produce the word cloud for each chapter of the book?

    7. Can we train a neural network to authenticate if the hadith is real or not?

    8. Can we find out the specific style or vocabulary of each narrator?

    9. Can we develop a system for comparing variant wordings of the same hadith to identify how reliable a given transmitter is?
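
    As a starting sketch for ideas 1 and 3 above, the snippet below turns transmission chains into a directed networkx graph. The chains and narrator names are hypothetical placeholders; in practice they would be parsed out of the hadith texts in this dataset.

```python
# Minimal sketch: build a narrator network from transmission chains.
# The chains below are hypothetical placeholders; in practice they would be
# parsed out of the hadith texts in this dataset.
import networkx as nx

chains = [
    ["Narrator A", "Narrator B", "Narrator C"],
    ["Narrator A", "Narrator D", "Narrator C"],
]

G = nx.DiGraph()
for chain in chains:
    # Each consecutive pair in a chain is one "narrated to" link.
    for teacher, student in zip(chain, chain[1:]):
        w = G.get_edge_data(teacher, student, default={"weight": 0})["weight"]
        G.add_edge(teacher, student, weight=w + 1)

# Most prolific transmitters by number of distinct students.
print(sorted(G.out_degree(), key=lambda kv: -kv[1])[:5])
```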

    Please also help me extend this dataset. If you have any other hadith book in CSV or text format, please send me a message and I will add it.

  4. Yago 3

    • yago-knowledge.org
    Updated Jul 3, 2011
    Cite
    Farzaneh Mahdisoltani; Joanna Biega; Fabian Suchanek (2011). Yago 3 [Dataset]. https://yago-knowledge.org/
    Explore at:
    Dataset updated
    Jul 3, 2011
    Authors
    Farzaneh Mahdisoltani; Joanna Biega; Fabian Suchanek
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    YAGO 3 combines the information from the Wikipedias in multiple languages with WordNet, GeoNames, and other data sources. YAGO 3 taps into the multilingual resources of Wikipedia, getting to know more local entities and facts. This version has been extracted from 10 different Wikipedia versions (English, German, French, Dutch, Italian, Spanish, Polish, Romanian, Persian, and Arabic). YAGO 3 is special in several ways:

    • YAGO 3 combines the clean taxonomy of WordNet with the richness of the Wikipedia category system, assigning the entities to more than 350,000 classes.
    • YAGO 3 is anchored in time and space. YAGO attaches a temporal dimension and a spatial dimension to many of its facts and entities.
    • In addition to the taxonomy, YAGO has thematic domains such as “music” or “science” from WordNet Domains.
    • YAGO 3 extracts and combines entities and facts from 10 Wikipedias in different languages.
    • YAGO 3 contains canonical representations of entities appearing in different Wikipedia language editions.
    • YAGO 3 integrates all non-English entities into the rich type taxonomy of YAGO.
    • YAGO 3 provides a mapping between non-English infobox attributes and YAGO relations.

    YAGO 3 knows more than 17 million entities (like persons, organizations, cities, etc.) and contains more than 150 million facts about these entities. As with all major releases, the accuracy of YAGO 3 has been manually evaluated, showing a confirmed accuracy of 95%. Every relation is annotated with its confidence value.

  5. Yago 4

    • yago-knowledge.org
    Updated Jul 3, 2011
    + more versions
    Cite
    Thomas Pellissier Tanon; Gerhard Weikum; Fabian Suchanek (2011). Yago 4 [Dataset]. https://yago-knowledge.org/
    Explore at:
    Dataset updated
    Jul 3, 2011
    Authors
    Thomas Pellissier Tanon; Gerhard Weikum; Fabian Suchanek
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    YAGO 4 is a version of the YAGO knowledge base that is based on Wikidata, the largest public general-purpose knowledge base. YAGO refines the data as follows:

    • All entity identifiers and property identifiers are human-readable.
    • The top-level classes come from schema.org, a standard repertoire of classes and properties maintained by Google and others, combined with bioschemas.org. The lower-level classes are a selection of Wikidata classes.
    • The properties come from schema.org.
    • YAGO 4 contains semantic constraints in the form of SHACL. These constraints keep the data clean and allow for logical reasoning on YAGO.

    YAGO contains more than 50 million entities and 2 billion facts.
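
    For a feel of how such data can be queried, here is a hedged rdflib sketch that tallies instances per schema.org class in a small N-Triples excerpt. The file name is a placeholder, and a full YAGO dump is far too large to parse in memory this way.

```python
# Minimal sketch: count instances per schema.org class in a YAGO 4 excerpt.
# "yago-sample.nt" is a placeholder file name; full dumps need a triple store.
from collections import Counter
from rdflib import Graph
from rdflib.namespace import RDF

g = Graph()
g.parse("yago-sample.nt", format="nt")

type_counts = Counter(str(cls) for _, _, cls in g.triples((None, RDF.type, None)))
for cls, n in type_counts.most_common(10):
    print(n, cls)
```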

  6. Data from: Yago 2

    • yago-knowledge.org
    Updated Jul 3, 2011
    Cite
    Johannes Hoffart; Fabian Suchanek; Klaus Berberich; Gerhard Weikum (2011). Yago 2 [Dataset]. https://yago-knowledge.org/
    Explore at:
    Dataset updated
    Jul 3, 2011
    Authors
    Johannes Hoffart; Fabian Suchanek; Klaus Berberich; Gerhard Weikum
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    YAGO 2 is an improved version of the original YAGO knowledge base:

    • YAGO 2 is anchored in time and space. YAGO 2 attaches a temporal dimension and a spatial dimension to many of its facts and entities.
    • YAGO 2 is particularly suited for disambiguation purposes, as it contains a large number of names for entities. It also knows the gender of people.
    • As with all major releases, the accuracy of YAGO 2 has been manually evaluated, showing an accuracy of 95% with respect to Wikipedia. Every relation is annotated with its confidence value.

  7. All Subreddits and Relations between them

    • kaggle.com
    zip
    Updated Sep 20, 2022
    Cite
    The Devastator (2022). All Subreddits and Relations between them [Dataset]. https://www.kaggle.com/datasets/thedevastator/all-subreddits-and-relations-between-them
    Explore at:
    Available download formats: zip (17648915 bytes)
    Dataset updated
    Sep 20, 2022
    Authors
    The Devastator
    License

    https://www.reddit.com/wiki/api

    Description

    Reddit Graph Dataset

    This dataset aims to build a graph of subreddit links based on how they reference each other. The original database dump can be found here.

    Subreddits Columns

    • name (str): name of the subreddit.
      • between 2 and 21 characters (lowercase letters, digits and underscores).
    • type (str): type of the subreddit.
    • title (str): title of the subreddit.
    • description (str): short description of the subreddit.
    • subscribers (int?): number of subscribers at the moment.
    • nsfw (bool?): indicator of whether it is flagged as not safe for work 🔞.
    • quarantined (bool?): indicator of whether it has been quarantined 😷.
    • color (str): key color of the subreddit.
    • img_banner (str?): url of the image used as the banner.
    • img_icon (str?): url of the image used as the icon (snoo).
    • created_at (datetime): utc timestamp of when the subreddit was created.
    • updated_at (datetime): utc timestamp of when the information of the subreddit was last updated.

    note: the '?' indicates that the value can be null under certain conditions.

    Subreddits Stats

    Type                            | Amount
    TOTAL                           | 127800
    public                          | 59227
    banned                          | 31473
    restricted                      | 14601
    public [nsfw]                   | 14244
    private                         | 5139
    restricted [nsfw]               | 3014
    public [quarantined]            | 29
    restricted [quarantined]        | 21
    archived                        | 17
    premium                         | 12
    public [nsfw] [quarantined]     | 11
    user [nsfw]                     | 6
    user                            | 4
    restricted [nsfw] [quarantined] | 1
    employees                       | 1

    Links Columns

    • source (str): name of the subreddit where the link was found.
    • target (str): name of the linked subreddit.
    • type (str): place where the reference from source to target was found.
    • updated_at (datetime): utc timestamp of when the information about the link was last updated.

    Links Stats

    Type        | Amount
    TOTAL       | 349744
    wiki        | 214206
    sidebar     | 123650
    topbar      | 7291
    description | 4597
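
    To make the link table concrete, the sketch below loads it into a directed multigraph with networkx. The file name and CSV layout are assumptions based on the Links Columns description above.

```python
# Minimal sketch: load the subreddit link table into a directed multigraph.
# "links.csv" and its header names are assumptions based on the Links Columns
# description; a pair of subreddits may be linked from several places.
import csv
import networkx as nx

G = nx.MultiDiGraph()
with open("links.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        G.add_edge(row["source"], row["target"], type=row["type"])

print(G.number_of_nodes(), "subreddits,", G.number_of_edges(), "links")
```
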
  8. Yago 1

    • yago-knowledge.org
    Updated Jul 3, 2011
    Cite
    Fabian Suchanek; Gjergji Kasneci; Gerhard Weikum (2011). Yago 1 [Dataset]. https://yago-knowledge.org/
    Explore at:
    Dataset updated
    Jul 3, 2011
    Authors
    Fabian Suchanek; Gjergji Kasneci; Gerhard Weikum
    License

    GNU Free Documentation License: https://www.gnu.org/copyleft/fdl.html

    Description

    This is the 2008 version of YAGO. It knows more than 2 million entities (like persons, organizations, cities, etc.) and 20 million facts about these entities. This version of YAGO includes the data extracted from the categories and infoboxes of Wikipedia, combined with the taxonomy of WordNet. YAGO 1 was manually evaluated and found to have an accuracy of 95% with respect to the extraction source.

  9. SynthCypher

    • huggingface.co
    Updated Dec 17, 2024
    Cite
    ServiceNow-AI (2024). SynthCypher [Dataset]. https://huggingface.co/datasets/ServiceNow-AI/SynthCypher
    Explore at:
    Dataset updated
    Dec 17, 2024
    Dataset provided by
    ServiceNow (http://servicenow.com/)
    Authors
    ServiceNow-AI
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    SynthCypher Dataset Repository

    Overview

    This repository hosts SynthCypher, a novel synthetic dataset designed to bridge the gap in Text-to-Cypher (Text2Cypher) tasks. SynthCypher leverages state-of-the-art large language models (LLMs) to automatically generate and validate high-quality data for training and evaluating models that convert natural language questions into Cypher queries for graph databases like Neo4j. Our dataset and pipeline contribute significantly to… See the full description on the dataset page: https://huggingface.co/datasets/ServiceNow-AI/SynthCypher.
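
    For quick exploration, the dataset id from the URL above can be passed to the Hugging Face datasets library. The split name and record layout below are assumptions; consult the dataset card for the actual schema.

```python
# Minimal sketch: pull SynthCypher from the Hugging Face Hub.
# The "train" split name and the record layout are assumptions; consult the
# dataset card at the URL above for the actual schema.
from datasets import load_dataset

ds = load_dataset("ServiceNow-AI/SynthCypher", split="train")
print(ds[0])  # expected: a natural-language question paired with a Cypher query
```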
