5 datasets found

TrendyGenes, a computational pipeline for the detection of literature trends...
zenodo.org
application/gzip, txt
Updated Sep 20, 2023
Cite
David Narganes-Carlon (2023). TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery [Dataset]. http://doi.org/10.1038/s41598-021-94897-9
Unique identifier
https://doi.org/10.1038/s41598-021-94897-9
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
David Narganes-Carlon
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
TrendyGenes Literature Mining

This repository contains the files and code to build the TrendyGenes pipeline described in the paper "TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery" (Serrano Nájera et al. 2021).

Contents

The folder contains the following files:

PubMed_*.csv.gz: CSV files containing PubMed metadata (titles, abstracts etc.) split into multiple files

CoCitations*.csv.gz: CSV files containing co-citation networks computed from PubMed

MeSH2PMID.csv.gz: Map of MeSH terms to PMIDs

Authorship_Neo4J_complete.csv.gz: Authorship information for PubMed papers

Disease2PMID_Neo4J_complete.csv.gz: Map of disease terms to PMIDs after disambiguation

Genes_Neo4J_complete_CCPU.csv.gz: Map of genes to PMIDs after disambiguation

genes.csv.gz: List of human genes

diseases.csv.gz: List of MeSH disease terms

import_command*.txt: Commands to import data into Neo4j graph database

Building the Knowledge Graph

The various CSV files can be imported into a Neo4j graph database to build the knowledge graph containing publications, authors, genes, diseases etc. and their connections as described in the paper.

The import_command*.txt files contain the Neo4j bulk-import commands needed to load the data; see the Neo4j CSV import guide:
https://neo4j.com/developer/guide-import-csv/
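
Before running a bulk import, it can help to check which columns each compressed file actually carries. The following is a minimal sketch, not part of the TrendyGenes repository: it uses only the Python standard library, the helper name `peek_csv_gz` is hypothetical, and it assumes the files are comma-delimited and UTF-8 encoded. It makes no assumption about the column names.

```python
import csv
import gzip
from itertools import islice


def peek_csv_gz(path, n_rows=3):
    """Return the header and the first n_rows data rows of a gzipped CSV file."""
    # "rt" opens the gzip stream in text mode; newline="" leaves line-ending
    # handling to the csv module.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # empty list if the file is empty
        rows = list(islice(reader, n_rows))
    return header, rows
```

For example, `header, rows = peek_csv_gz("genes.csv.gz")` would read only the first few lines of that file without decompressing it to disk.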

Citation

Serrano Nájera G, Narganes Carlón D, Crowther DJ. TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery. Scientific Reports. 2021 Aug 3;11(1):15747.

License

[MIT]

This description summarizes the key files provided and briefly explains how to use them to build the knowledge graph database for the TrendyGenes pipeline. The citation above refers to the original paper.
Rediscovery Datasets: Connecting Duplicate Reports of Apache, Eclipse, and...
data.niaid.nih.gov
zenodo.org
Updated Aug 3, 2024
Cite
Bener, Ayse Basar (2024). Rediscovery Datasets: Connecting Duplicate Reports of Apache, Eclipse, and KDE [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_400614
Dataset provided by
Miranskyy, Andriy V.
Bener, Ayse Basar
Sadat, Mefta
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present three defect rediscovery datasets mined from Bugzilla. The datasets cover three groups of open source software projects: Apache, Eclipse, and KDE. They contain information about approximately 914 thousand defect reports filed over a period of 18 years (1999-2017) and capture the inter-relationships among duplicate defects.

File Descriptions

apache.csv - Apache Defect Rediscovery dataset

eclipse.csv - Eclipse Defect Rediscovery dataset

kde.csv - KDE Defect Rediscovery dataset

apache.relations.csv - Inter-relations of rediscovered defects of Apache

eclipse.relations.csv - Inter-relations of rediscovered defects of Eclipse

kde.relations.csv - Inter-relations of rediscovered defects of KDE

create_and_populate_neo4j_objects.cypher - Populates a Neo4j graph database by importing all the data from the CSV files. Note that you must set the dbms.import.csv.legacy_quote_escaping configuration setting to false before loading the CSV files; see https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/#config_dbms.import.csv.legacy_quote_escaping

create_and_populate_mysql_objects.sql - Populates MySQL RDBMS by importing all the data from the CSV files

rediscovery_db_mysql.zip - For your convenience, we also provide a full backup of the MySQL database

neo4j_examples.txt - Sample Neo4j queries

mysql_examples.txt - Sample MySQL queries

rediscovery_eclipse_6325.png - Output of Neo4j example #1

distinct_attrs.csv - Distinct values of bug_status, resolution, priority, severity for each project
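
The dbms.import.csv.legacy_quote_escaping note above matters because the two conventions disagree about backslashes inside quoted fields: with the setting at false, doubled quotes are the only in-quote escape and backslashes are literal, whereas the legacy convention lets a backslash escape a quote. The sketch below illustrates the difference with Python's csv module, used here only as a stand-in for Neo4j's own parser. The sample record is made up and is not taken from the datasets.

```python
import csv
import io

# One made-up record whose quoted second field ends in a literal backslash.
sample = '42,"ends with backslash\\",NEW\n'

# RFC 4180 style: "" is the only in-quote escape, backslashes are literal.
rfc_row = next(csv.reader(io.StringIO(sample)))

# Legacy style: a backslash escapes the following quote, so the quoted
# field does not close where the author of the file intended.
legacy_row = next(
    csv.reader(io.StringIO(sample), escapechar="\\", doublequote=False)
)

print("RFC 4180 fields:", rfc_row)
print("Legacy field count:", len(legacy_row))
```

Free-text fields in defect reports can plausibly contain such backslashes, which is one reason a loader needs the same convention the files were written with.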
Desarquivo - dataset 04 grafo de ligações entre entidades Neo4j
dados.gov.pt
data.europa.eu
zip
Updated Aug 31, 2021
Cite
Miguel Sozinho Ramalho (2021). Desarquivo - dataset 04 grafo de ligações entre entidades Neo4j [Dataset]. https://dados.gov.pt/en/datasets/612e5460078190eed7ba36d1/
Authors
Miguel Sozinho Ramalho
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Graph of connections between entities and news articles. No neo4j-import command was prepared for this dataset, although that tool is recommended over LOAD CSV for large datasets. The data are the same as in dataset 03 b but are reorganized on import, generating one graph node per news article.

Import instructions for Neo4j:

USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row MERGE (e:PER {_id: row._id, text: row.text});
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///orgs.csv' AS row MERGE (e:ORG {_id: row._id, text: row.text});
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///locations.csv' AS row MERGE (e:LOC {_id: row._id, text: row.text});
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///misc.csv' AS row MERGE (e:MISC {_id: row._id, text: row.text});
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///news.csv' AS row MERGE (n:NEWS {_id: row._id, title: row.title});
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///connections_1.csv' AS row MERGE (e1 {_id: row._id1}) MERGE (e2 {_id: row._id2}) WITH row, e1, e2 MERGE (e1)-[:rel{weight: toInteger(row.weight)}]-(e2);

For more information, see: https://github.com/msramalho/desarquivo/blob/master/DATASETS.md
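
The four entity statements in the import instructions differ only in file name and node label. As a minimal sketch (not part of the dataset), the Python snippet below generates them from a mapping, which makes the shared pattern explicit. The names `ENTITY_FILES`, `TEMPLATE`, and `entity_statements` are hypothetical; the file names, labels, and Cypher text are copied from the instructions.

```python
# File name -> node label, as listed in the import instructions.
ENTITY_FILES = {
    "people.csv": "PER",
    "orgs.csv": "ORG",
    "locations.csv": "LOC",
    "misc.csv": "MISC",
}

# Doubled braces are literal braces in str.format templates.
TEMPLATE = (
    "USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///{file}' AS row "
    "MERGE (e:{label} {{_id: row._id, text: row.text}});"
)


def entity_statements(files=ENTITY_FILES):
    """Build one LOAD CSV statement per entity file."""
    return [TEMPLATE.format(file=name, label=label) for name, label in files.items()]


for statement in entity_statements():
    print(statement)
```

The news and connections statements have a different shape (other properties, and a relationship MERGE), so they are left out of the template.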
Desarquivo - dataset 03 grafo de ligações entre entidades Neo4j
data.europa.eu
dados.gov.pt
csv
Updated Apr 30, 2025
Cite
(2025). Desarquivo - dataset 03 grafo de ligações entre entidades Neo4j [Dataset]. https://data.europa.eu/data/datasets/desarquivo-dataset-03-grafo-de-ligacoes-entre-entidades-neo4j?locale=fi
Description
Graph of connections between the entities used in the current version of desarquivo, available at https://msramalho.github.io/desarquivo/

To import the data, use neo4j-admin import --id-type=STRING --nodes=import/i_entities.csv --relationships=rel=import/i_connections.csv
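
The neo4j-admin bulk importer reads each column's role from the CSV header row: node files mark an identifier column with `:ID`, and relationship files mark their endpoints with `:START_ID` and `:END_ID`. Below is a minimal pre-flight sketch in Python, assuming the two files follow that convention (their actual headers are not shown here). The helper names are hypothetical and the example headers are made up.

```python
import csv


def header_roles(header):
    """Collect whatever follows ':' in each header field.

    This is a role such as ID or START_ID, or a data type such as int;
    fields without a colon contribute an empty string.
    """
    return {field.partition(":")[2].split("(")[0] for field in header}


def missing_roles(header, required):
    """Return the required roles that the header does not declare."""
    return sorted(set(required) - header_roles(header))


def read_header(path):
    """Read only the first row of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])


# Made-up headers for illustration only.
print(missing_roles(["_id:ID", "text", "type"], ["ID"]))
print(missing_roles([":START_ID", "weight:int"], ["START_ID", "END_ID"]))
```

Against the real files this could be called as `missing_roles(read_header("import/i_entities.csv"), ["ID"])` and `missing_roles(read_header("import/i_connections.csv"), ["START_ID", "END_ID"])`; an empty list means the expected role columns are present.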

For more information, see: https://github.com/msramalho/desarquivo/blob/master/DATASETS.md
MS174 - China Human Trafficking and Slaving Database Project Graph Dataset...
zenodo.org
bin, csv, svg
Updated Jun 12, 2025
Cite
Claude Chevaleyre (2025). MS174 - China Human Trafficking and Slaving Database Project Graph Dataset 01 [Dataset]. http://doi.org/10.5281/zenodo.15648432
Unique identifier
https://doi.org/10.5281/zenodo.15648432
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Claude Chevaleyre
Area covered
China
Description
This MS174 dataset is the first dataset made public by the China Human Trafficking and Slaving graph Database project (CHTSDB). CHTSDB is based on a versatile action-centric model and is implemented in a graph database structure. For an overview of the project, please have a look at the README.md file.

The project is also publicly available on GitHub.

It is the result of an exploration of the first 174 rolls of the official Annals of the Ming Dynasty (the Mingshi 明史). It is based on the edition of the History of the Ming published by Wikisource under CC BY-SA 4.0 license. A very state-centric source focusing on the higher social strata and with little interest in recording the lived experiences of the common people, the Annals of the Ming Dynasty are probably the worst source one could think of to start the CHTSDB project. This first exploration nonetheless yielded an interesting result, shedding light on the extended scope and enduring presence of war capture under the Ming. Providing very little numerical data, this first dataset still allows us to provide a first estimate of 150,000 captives, which in all likelihood are only the tip of the iceberg.

This dataset contains the following:

The six CSV files of the MS174 dataset.

The data model description: a detailed description of the graph data structure (labels, property keys, and in some cases lists of values). It is our implementation of the GRAM data model.

An import_instructions.md file explaining how to import the dataset into a Neo4j Desktop instance.

A README.md file providing an overview of the project, its history, conceptual underpinnings, and challenges.
5 scholarly articles cite the TrendyGenes dataset (per Google Scholar).