100+ datasets found

Supplementary material to manuscript: Analyzing data citation practices to...
figshare.com
xlsx
Updated Jan 19, 2016
Cite
Nicolas Robinson-garcia; Evaristo Jiménez Contreras; Daniel Torres-Salinas (2016). Supplementary material to manuscript: Analyzing data citation practices to the Data Citation Index [Dataset]. http://doi.org/10.6084/m9.figshare.1250031.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.1250031.v1
Dataset updated
Jan 19, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Nicolas Robinson-garcia; Evaristo Jiménez Contreras; Daniel Torres-Salinas
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Supplementary material to an analysis of data citation practices based on the Data Citation Index (DCI) from Thomson Reuters. This database, launched in 2012, aims to link data sets and data studies with the citations they receive from the rest of Thomson Reuters' citation indexes. Funding bodies and research organizations increasingly require researchers to make their scientific data available in a reusable and reproducible manner, aiming to maximize the allocation of funding while providing transparency on the scientific process. The DCI harvests citations to research data from papers indexed in the Web of Knowledge. It relies on the information provided by the data repository, as data citation practices are inconsistent or nonexistent in many cases. The findings of this study show that data citation practices are far from common in most research fields. Some differences have been reported in the way researchers cite data: while in the areas of Science and Engineering & Technology data sets were the most cited, in Social Sciences and Arts & Humanities data studies play a greater role. 88.1% of the records have received no citations, but some repositories show very low uncitedness rates. While data citation practices are rare in most fields, they have expanded in disciplines such as Crystallography or Genomics. We conclude by emphasizing the role the DCI may play in encouraging consistent and standardized citation of research data, which would allow it to be used to follow the research process from data collection to publication.
Bibliographic data on datasets (from 2020) affiliated to Most Wiedzy and...
mostwiedzy.pl
csv
Updated Dec 1, 2021
Cite
Olejnik Dorota (2021). Bibliographic data on datasets (from 2020) affiliated to Most Wiedzy and indexed in Data Citation Index (retrieved by Web of Science service in December 2021) [Dataset]. http://doi.org/10.34808/hmh1-n520
Explore at:
Available download formats: csv (18226)
Unique identifier
https://doi.org/10.34808/hmh1-n520
Dataset updated
Dec 1, 2021
Authors
Olejnik Dorota
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The file contains the number of datasets published by researchers affiliated to Most Wiedzy and indexed in the Data Citation Index by Web of Science. The search was performed using the name of the institution in the 'address' field or the 'group author' field. Data retrieved and published during the 5th Open Science Conference (1-3.12.2021).
August 2025 data-update for "Updated science-wide author databases of...
elsevier.digitalcommonsdata.com
Updated Sep 19, 2025
+ more versions
Cite
John P.A. Ioannidis (2025). August 2025 data-update for "Updated science-wide author databases of standardized citation indicators" [Dataset]. http://doi.org/10.17632/btchxktzyw.8
Explore at:
Unique identifier
https://doi.org/10.17632/btchxktzyw.8
Dataset updated
Sep 19, 2025
Authors
John P.A. Ioannidis
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship adjusted hm-index, citations to papers in different authorship positions and a composite indicator (c-score). Data are shown separately for career-long impact and for single recent year impact. Metrics with and without self-citations and ratio of citations to citing papers are given, and data on retracted papers (based on the Retraction Watch database) as well as citations to/from retracted papers have been added. Scientists are classified into 22 scientific fields and 174 sub-fields according to the standard Science-Metrix classification. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2024 and single recent year data pertain to citations received during calendar year 2024. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. This version (7) is based on the August 1, 2025 snapshot from Scopus, updated to end of citation year 2024. This work uses Scopus data. Calculations were performed using all Scopus author profiles as of August 1, 2025. If an author is not on the list, it is simply because the composite indicator value was not high enough to appear on the list. It does not mean that the author does not do good work. PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECTIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US.
They should be sent directly to Scopus, preferably by use of the Scopus to ORCID feedback wizard (https://orcid.scopusfeedback.com/) so that the correct data can be used in any future annual updates of the citation indicator databases. The c-score focuses on impact (citations) rather than productivity (number of publications) and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, see attached file on FREQUENTLY ASKED QUESTIONS. Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden manifesto: https://www.nature.com/articles/520429a
Datasets indexed in Data Citation Index in the Astronomy and Astrophysics...
data.niaid.nih.gov
Updated Sep 21, 2021
Cite
Cortés Rodríguez, Patricio; Depoortere, Denise; Opazo Calfin, Lucy (2021). Datasets indexed in Data Citation Index in the Astronomy and Astrophysics category, 2010-2019 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5518171
Explore at:
Dataset updated
Sep 21, 2021
Dataset provided by
Bibliotecas, Pontificia Universidad Católica de Chile, Santiago, Chile
Authors
Cortés Rodríguez, Patricio; Depoortere, Denise; Opazo Calfin, Lucy
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset comprises a single list of datasets exported from Data Citation Index (Web of Science, Clarivate Analytics) in the Astronomy and Astrophysics category for the period 2010-2019. It makes it possible to identify annual evolution, the countries and institutions with higher productivity, the main repositories and hosting platforms, and use in publications indexed in Web of Science.
Countries and universities rankings of their research output according to...
figshare.com
xlsx
Updated Jan 19, 2016
Cite
Nicolas Robinson-garcia; Daniel Torres-Salinas (2016). Countries and universities rankings of their research output according to Thomson Reuters' citation indexes. 2010-2014 [Dataset]. http://doi.org/10.6084/m9.figshare.1287652.v3
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.1287652.v3
Dataset updated
Jan 19, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Nicolas Robinson-garcia; Daniel Torres-Salinas
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Research output of countries and Spanish universities for the 2010-2014 period according to Thomson Reuters' citation indexes: SCI, SSCI, A&HCI, CPCI, BKCI, DCI
Papers Citations VS H-index
kaggle.com
zip
Updated Oct 27, 2022
Cite
The Devastator (2022). Papers Citations VS H-index [Dataset]. https://www.kaggle.com/datasets/thedevastator/cs-researchers-h-index-and-citation-analysis
Explore at:
Available download formats: zip (12787 bytes)
Dataset updated
Oct 27, 2022
Authors
The Devastator
Description
CS Researchers H-index and Citation Analysis

A New Method to Evaluate Impact

About this dataset

This dataset provides information on the H-index and citations of computer science researchers. The H-index is a measure of a researcher's productivity and impact. The higher the H-index, the more productive and influential the researcher is. Citations are another way of measuring a researcher's impact. The more citations a researcher has, the more other researchers have cited their work. This dataset can be used to compare the productivity and impact of computer science researchers
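As a reminder of what the H-index measures, here is a minimal sketch of the standard definition: the largest h such that h papers each have at least h citations. The function name and the per-paper citation counts are made up for illustration; they are not taken from this dataset.

```python
def h_index(citations_per_paper):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations_per_paper, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h


# Hypothetical researcher with five papers.
example = [10, 8, 5, 4, 3]
print(h_index(example))  # prints 4
```

Four papers have at least four citations each, but there are not five papers with at least five, so the value is 4.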

How to use the dataset

To use this dataset, simply download it and import it into your favorite statistical software. Then, you can begin to analyze the data in order to answer any questions that you may have about computer science researchers and their impact

Research Ideas

Evaluating the impact of computer science researchers

Identifying areas of research that are highly cited

Identifying computer science researchers with high h-index scores

Columns

File: data.csv

| Column name | Description |
|:------------------------|:-----------------------------------------------------------------|
| Name | The name of the researcher. (String) |
| Citations 2020 | The number of citations the researcher has in 2020. (Integer) |
| Total_citation | The total number of citations the researcher has. (Integer) |
| Citation_since_2016 | The number of citations the researcher has since 2016. (Integer) |
| HomePage | The researcher's home page. (String) |
| Area of Research | The researcher's area of research. (String) |
| Google_Scholar | The researcher's Google Scholar page. (String) |
OpenCitations Index N-Triples dataset of all the citation data
figshare.com
zip
Updated Jul 15, 2025
+ more versions
Cite
OpenCitations (2025). OpenCitations Index N-Triples dataset of all the citation data [Dataset]. http://doi.org/10.6084/m9.figshare.24369136.v6
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.24369136.v6
Dataset updated
Jul 15, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
OpenCitations
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains all the citation data (in N-Triples format) included in the OpenCitations Index, released on July 10, 2025. In particular, any citation in the dataset, defined as an individual of the class cito:Citation, includes the following information:

[citation IRI] the Open Citation Identifier (OCI) for the citation, defined in the final part of the URL identifying the citation (https://w3id.org/oc/index/ci/[OCI]);
[property "cito:hasCitingEntity"] the citing entity identified by its OMID URL (https://opencitations.net/meta/[OMID]);
[property "cito:hasCitedEntity"] the cited entity identified by its OMID URL (https://opencitations.net/meta/[OMID]);
[property "cito:hasCitationCreationDate"] the creation date of the citation (i.e. the publication date of the citing entity);
[property "cito:hasCitationTimeSpan"] the time span of the citation (i.e. the interval between the publication date of the cited entity and the publication date of the citing entity);
[type "cito:JournalSelfCitation"] it records whether the citation is a journal self-citation (i.e. the citing and the cited entities are published in the same journal);
[type "cito:AuthorSelfCitation"] it records whether the citation is an author self-citation (i.e. the citing and the cited entities have at least one author in common).

Note: the information for each citation is sourced from OpenCitations Meta (https://opencitations.net/meta), a database that stores and delivers bibliographic metadata for all bibliographic resources included in the OpenCitations Indexes. The data provided in this dump is therefore based on the state of OpenCitations Meta at the time this collection was generated.

This version of the dataset contains:

2,216,426,689 citations

The size of the zipped archive is 87.4 GB, while the size of the unzipped N-Triples files is 2.1 TB.
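To make the N-Triples layout more concrete, the following is a minimal sketch of reading such statements line by line with a regular expression. It is not an OpenCitations tool. The sample lines, the `EXAMPLE-OCI` and `EXAMPLE-OMID-*` placeholders, and the `parse_triple` helper are hypothetical; the pattern covers only IRI subjects and predicates with an IRI or plain/typed literal object, so a full N-Triples parser should be preferred for real work.

```python
import re

# Simplified pattern: <subject> <predicate> (<object IRI> | "literal"[^^<datatype>]) .
# Language tags and blank nodes are deliberately not handled in this sketch.
TRIPLE = re.compile(
    r'^<(?P<s>[^>]*)>\s+<(?P<p>[^>]*)>\s+'
    r'(?:<(?P<o_iri>[^>]*)>|"(?P<o_lit>(?:[^"\\]|\\.)*)"(?:\^\^<(?P<dtype>[^>]*)>)?)'
    r'\s*\.\s*$'
)


def parse_triple(line):
    """Return (subject, predicate, object) or None if the line does not match."""
    m = TRIPLE.match(line.strip())
    if m is None:
        return None
    obj = m.group("o_iri") if m.group("o_iri") is not None else m.group("o_lit")
    return m.group("s"), m.group("p"), obj


# Hypothetical statements about one citation (identifiers are placeholders).
sample = [
    '<https://w3id.org/oc/index/ci/EXAMPLE-OCI> '
    '<http://purl.org/spar/cito/hasCitingEntity> '
    '<https://opencitations.net/meta/EXAMPLE-OMID-1> .',
    '<https://w3id.org/oc/index/ci/EXAMPLE-OCI> '
    '<http://purl.org/spar/cito/hasCitedEntity> '
    '<https://opencitations.net/meta/EXAMPLE-OMID-2> .',
    '<https://w3id.org/oc/index/ci/EXAMPLE-OCI> '
    '<http://purl.org/spar/cito/hasCitationCreationDate> '
    '"2020-01-15"^^<http://www.w3.org/2001/XMLSchema#date> .',
]

record = {}
for line in sample:
    parsed = parse_triple(line)
    if parsed is not None:
        _subject, predicate, obj = parsed
        # Key each value by the local name of the property, e.g. "hasCitingEntity".
        record[predicate.rsplit("/", 1)[-1]] = obj

print(record)
```

Because the unzipped files are very large, a line-by-line approach like this avoids loading a whole file into memory.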
POCI CSV dataset of all the citation data
figshare.com
zip
Updated Dec 27, 2022
+ more versions
Cite
OpenCitations (2022). POCI CSV dataset of all the citation data [Dataset]. http://doi.org/10.6084/m9.figshare.21776351.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.21776351.v1
Dataset updated
Dec 27, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
OpenCitations
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains all the citation data (in CSV format) included in POCI, released on 27 December 2022. In particular, each line of the CSV file defines a citation, and includes the following information:

[field "oci"] the Open Citation Identifier (OCI) for the citation; [field "citing"] the PMID of the citing entity; [field "cited"] the PMID of the cited entity; [field "creation"] the creation date of the citation (i.e. the publication date of the citing entity); [field "timespan"] the time span of the citation (i.e. the interval between the publication date of the cited entity and the publication date of the citing entity); [field "journal_sc"] it records whether the citation is a journal self-citation (i.e. the citing and the cited entities are published in the same journal); [field "author_sc"] it records whether the citation is an author self-citation (i.e. the citing and the cited entities have at least one author in common).

This version of the dataset contains:

717,654,703 citations; 26,024,862 bibliographic resources.

The size of the zipped archive is 9.6 GB, while the size of the unzipped CSV file is 50 GB. Additional information about POCI is available at the official webpage.
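Given the size of the unzipped file, it may help to process it row by row. The sketch below streams rows with the seven field names listed above and tallies self-citations. The sample rows are invented, and two details are assumptions rather than facts from this description: that the self-citation fields hold the strings "yes" and "no", and that "timespan" uses an ISO 8601-style duration such as P1Y.

```python
import csv
import io
from collections import Counter


def summarize(rows):
    """Count citations and self-citations from an iterable of row dicts."""
    counts = Counter()
    for row in rows:
        counts["citations"] += 1
        if row["journal_sc"] == "yes":  # assumed encoding of the flag
            counts["journal_self"] += 1
        if row["author_sc"] == "yes":   # assumed encoding of the flag
            counts["author_self"] += 1
    return counts


# Invented sample standing in for the real CSV file.
sample_csv = io.StringIO(
    "oci,citing,cited,creation,timespan,journal_sc,author_sc\n"
    "oci-1,111,222,2020-05,P1Y,no,yes\n"
    "oci-2,333,222,2021-01-10,P2Y3M,yes,no\n"
    "oci-3,444,111,2019,P0Y,no,no\n"
)

summary = summarize(csv.DictReader(sample_csv))
print(dict(summary))  # {'citations': 3, 'author_self': 1, 'journal_self': 1}
```

For the real data, the same function can be fed `csv.DictReader(open(path, newline="", encoding="utf-8"))`, where `path` is wherever the archive was unzipped; `csv.DictReader` yields one row at a time, so memory use stays flat.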
Bibliographic data on datasets affiliated to Most Wiedzy and indexed in Data...
mostwiedzy.pl
csv
Updated Dec 1, 2021
Cite
Lidia Trzcinska (2021). Bibliographic data on datasets affiliated to Most Wiedzy and indexed in Data Citation Index (retrieved by Web of Science service in December 2021) [Dataset]. http://doi.org/10.34808/tkt8-m428
Explore at:
Available download formats: csv (453865)
Unique identifier
https://doi.org/10.34808/tkt8-m428
Dataset updated
Dec 1, 2021
Authors
Lidia Trzcinska
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The file contains the number of datasets published by researchers affiliated to Most Wiedzy and indexed in the Data Citation Index provided by Web of Science. The search was performed using the name of the institution in the 'address' field or the 'group author' field. Data retrieved and published during the 5th Open Science Conference (1-3.12.2021).
Citation network data sets for 'Oxytocin – a social peptide? Deconstructing...
data-staging.niaid.nih.gov
data.niaid.nih.gov
Updated Jun 5, 2022
Cite
Leng, Rhodri Ivor (2022). Citation network data sets for 'Oxytocin – a social peptide? Deconstructing the evidence' [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_5578956
Explore at:
Dataset updated
Jun 5, 2022
Dataset provided by
University of Edinburgh
Authors
Leng, Rhodri Ivor
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction

This note describes the data sets used for all analyses contained in the manuscript ‘Oxytocin - a social peptide?’ [1], which is currently under review.

Data Collection

The data sets described here were originally retrieved from Web of Science (WoS) Core Collection via the University of Edinburgh’s library subscription [2]. The aim of the original study for which these data were gathered was to survey peer-reviewed primary studies on oxytocin and social behaviour. To capture relevant papers, we used the following query:

TI = (“oxytocin” OR “pitocin” OR “syntocinon”) AND TS = (“social*” OR “pro$social” OR “anti$social”)

The final search was performed on 13 September 2021. This returned a total of 2,747 records, of which 2,049 were classified by WoS as ‘articles’. Given our interest in primary studies only (articles reporting original data), we excluded all other document types. We further excluded all articles sub-classified as ‘book chapters’ or as ‘proceeding papers’ in order to limit our analysis to primary studies published in peer-reviewed academic journals. This reduced the set to 1,977 articles. All of these were published in the English language, and no further language refinements were necessary.

All available metadata on these 1,977 articles was exported as plain text ‘flat’ format files in four batches, which we later merged together via Notepad++. Upon manual examination, we discovered examples of papers classified as ‘articles’ by WoS that were, in fact, reviews. To further filter our results, we searched all available PMIDs in PubMed (1,903 had associated PMIDs - ~96% of set). We then filtered results to identify all records classified as ‘review’, ‘systematic review’, or ‘meta-analysis’, identifying 75 records [3]. After examining a sample and agreeing with the PubMed classification, we removed these from our dataset, leaving a total of 1,902 articles.

From these data, we constructed two datasets by parsing out relevant reference data via the Sci2 Tool [4]. First, we constructed a ‘node-attribute-list’ by linking unique reference strings (‘Cite Me As’ column in WoS data files) to unique identifiers; we then parsed into this dataset information on the identity of a paper, including the title of the article, all authors, journal of publication, year of publication, total citations as recorded from WoS, and WoS accession number. Second, we constructed an ‘edge-list’ that records the citing paper in the ‘Source’ column and the cited paper in the ‘Target’ column, using the unique identifiers described previously to link these data to the node-attribute-list.

We then constructed a network in which papers are nodes, and citation links between nodes are directed edges between nodes. We used Gephi Version 0.9.2 [5] to manually clean these data by merging duplicate references that are caused by different reference formats or by referencing errors. To do this, we needed to retain both all retrieved records (1,902) as well as including all of their references to papers whether these were included in our original search or not. In total, this produced a network of 46,633 nodes (unique reference strings) and 112,520 edges (citation links). Thus, the average reference list size of these articles is ~59 references. The mean indegree (within network citations) is 2.4 (median is 1) for the entire network reflecting a great diversity in referencing choices among our 1,902 articles.

After merging duplicates, we then restricted the network to include only articles fully retrieved (1,902), and retained only those that were connected together by citation links in a large interconnected network (i.e. the largest component). In total, 1,892 (99.5%) of our initial set were connected together via citation links, meaning a total of ten papers were removed from the following analysis; these were neither connected to the largest component, nor did they form connections with one another (i.e. these were ‘isolates’).

This left us with a network of 1,892 nodes connected together by 26,019 edges. It is this network that is described by the ‘node-attribute-list’ and ‘edge-list’ provided here. This network has a mean in-degree of 13.76 (median in-degree of 4). By restricting our analysis in this way, we lose 44,741 unique references (96%) and 86,501 citations (77%) from the full network, but retain a set of articles tightly knitted together, all of which have been fully retrieved due to possessing certain terms related to oxytocin AND social behaviour in their title, abstract, or associated keywords.
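The restriction described above (keep only fully retrieved articles, then keep the largest connected component) can be sketched as follows. This is a minimal illustration, not the authors' Gephi workflow: the function name, the toy identifiers, and the choice to ignore edge direction when finding components are all assumptions.

```python
from collections import defaultdict, deque


def largest_component_subgraph(edges, retrieved):
    """Keep edges among retrieved nodes, then keep the largest component.

    Components are found on the undirected version of the graph
    (an assumption of this sketch).
    """
    kept = [(s, t) for s, t in edges if s in retrieved and t in retrieved]

    neighbours = defaultdict(set)
    for s, t in kept:
        neighbours[s].add(t)
        neighbours[t].add(s)

    best, seen = set(), set()
    for start in retrieved:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component

    final_edges = [(s, t) for s, t in kept if s in best and t in best]
    return best, final_edges


# Toy data: A-E are "retrieved" articles; X and Y are cited references only.
retrieved = {"A", "B", "C", "D", "E"}
edges = [("A", "B"), ("C", "B"), ("A", "C"), ("A", "X"), ("D", "Y")]

nodes, final_edges = largest_component_subgraph(edges, retrieved)
print(sorted(nodes), len(final_edges))  # ['A', 'B', 'C'] 3
```

In the toy data, D and E play the role of isolates: D cites only a reference outside the retrieved set, and E has no links at all, so both drop out.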

Before moving on, we calculated indegree for all nodes in this network – this counts the number of citations to a given paper from other papers within this network – and have included this in the node-attribute-list. We further clustered this network via modularity maximisation via the Leiden algorithm [6]. We set the algorithm to resolution 1, and allowed the algorithm to run over 100 iterations and 100 restarts. This gave Q=0.43 and identified seven clusters, which we describe in detail within the body of the paper. We have included cluster membership as an attribute in the node-attribute-list.
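For readers unfamiliar with the Q value reported above, the sketch below scores a given partition with the usual modularity formula, Q = sum over clusters c of [ L_c / m - resolution * (d_c / 2m)^2 ], where m is the number of edges, L_c the number of edges inside cluster c, and d_c the summed degree of its nodes. It does not implement the Leiden algorithm, which searches for a partition that maximises this score. Treating edges as undirected and unweighted is an assumption, since the note does not say how direction was handled; the function name and toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, membership, resolution=1.0):
    """Modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both ends in the same cluster
    degree_sum = defaultdict(int)  # total degree of each cluster's nodes
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - resolution * (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_clusters = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
one_cluster = {n: "a" for n in range(1, 7)}

print(round(modularity(edges, two_clusters), 4))  # 0.3571
print(modularity(edges, one_cluster))             # 0.0
```

Splitting the toy graph into its two triangles scores higher than lumping everything together, which is the kind of structure a modularity-maximising method rewards.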

Data description

We include here two datasets: (i) ‘OTSOC-node-attribute-list.csv’ consists of the attributes of 1,892 primary articles retrieved from WoS that include terms indicating a focus on oxytocin and social behaviour; (ii) ‘OTSOC-edge-list.csv’ records the citations between these papers. Together, these can be imported into a range of different software for network analysis; however, we have formatted these for ease of upload into Gephi 0.9.2. Below, we detail their contents:

‘OTSOC-node-attribute-list.csv’ is a comma-separated values file that contains all node attributes for the citation network (n=1,892) analysed in the paper. The columns refer to:

Id, the unique identifier

Label, the reference string of the paper to which the attributes in this row correspond. This is taken from the ‘Cite Me As’ column from the original WoS download. The reference string is in the following format: last name of first author, publication year, journal, volume, start page, and DOI (if available).

Wos_id, unique Web of Science (WoS) accession number. These can be used to query WoS to find further data on all papers via the ‘UT= ’ field tag.

Title, paper title.

Authors, all named authors.

Journal, journal of publication.

Pub_year, year of publication.

Wos_citations, total number of citations recorded by WoS Core Collection to a given paper as of 13 September 2021

Indegree, the number of within network citations to a given paper, calculated for the network shown in Figure 1 of the manuscript.

Cluster, provides the cluster membership number as discussed within the manuscript (Figure 1). This was established via modularity maximisation via the Leiden algorithm (Res 1; Q=0.43|7 clusters)

‘OTSOC-edge-list.csv’ is a comma-separated values file that contains all citation links between the 1,892 articles (n=26,019). The columns refer to:

Source, the unique identifier of the citing paper.

Target, the unique identifier of the cited paper.

Type, edges are ‘Directed’, and this column tells Gephi to regard all edges as such.

Syr_date, this contains the date of publication of the citing paper.

Tyr_date, this contains the date of publication of the cited paper.
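Outside Gephi, the two files can also be read with general-purpose tools. As a minimal sketch, the snippet below recomputes in-degree from the ‘Target’ column and compares it with the ‘Indegree’ column, using the column names listed above. The in-memory sample rows and the `check_indegree` helper are hypothetical, and only a subset of the node columns is mocked.

```python
import csv
import io
from collections import Counter


def check_indegree(node_file, edge_file):
    """Return the Ids whose Indegree column disagrees with the edge list."""
    recomputed = Counter(row["Target"] for row in csv.DictReader(edge_file))
    mismatches = []
    for row in csv.DictReader(node_file):
        if int(row["Indegree"]) != recomputed.get(row["Id"], 0):
            mismatches.append(row["Id"])
    return mismatches


# Invented stand-ins for the two CSV files.
node_csv = io.StringIO(
    "Id,Label,Pub_year,Indegree,Cluster\n"
    '1,"Author A, 2005",2005,2,0\n'
    '2,"Author B, 2010",2010,1,0\n'
    '3,"Author C, 2015",2015,0,1\n'
)
edge_csv = io.StringIO(
    "Source,Target,Type,Syr_date,Tyr_date\n"
    "2,1,Directed,2010,2005\n"
    "3,1,Directed,2015,2005\n"
    "3,2,Directed,2015,2010\n"
)

mismatches = check_indegree(node_csv, edge_csv)
print(mismatches)  # [] when every Indegree value matches the edge list
```

With the real files, the two `io.StringIO` objects would be replaced by `open('OTSOC-node-attribute-list.csv', newline='', encoding='utf-8')` and the equivalent call for the edge list; the UTF-8 encoding is an assumption.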

Software recommended for analysis

Gephi version 0.9.2 was used for the visualisations within the manuscript, and both files can be read into Gephi without modification.

Notes

[1] Leng, G., Leng, R. I., Ludwig, M. (Submitted). Oxytocin – a social peptide? Deconstructing the evidence.

[2] Edinburgh University’s subscription to Web of Science covers the following databases: (i) Science Citation Index Expanded, 1900-present; (ii) Social Sciences Citation Index, 1900-present; (iii) Arts & Humanities Citation Index, 1975-present; (iv) Conference Proceedings Citation Index- Science, 1990-present; (v) Conference Proceedings Citation Index- Social Science & Humanities, 1990-present; (vi) Book Citation Index– Science, 2005-present; (vii) Book Citation Index– Social Sciences & Humanities, 2005-present; (viii) Emerging Sources Citation Index, 2015-present.

[3] For those interested, the following PMIDs were identified as ‘articles’ by WoS, but as ‘reviews’ by PubMed: ‘34502097’ ‘33400920’ ‘32060678’ ‘31925983’ ‘31734142’ ‘30496762’ ‘30253045’ ‘29660735’ ‘29518698’ ‘29065361’ ‘29048602’ ‘28867943’ ‘28586471’ ‘28301323’ ‘27974283’ ‘27626613’ ‘27603523’ ‘27603327’ ‘27513442’ ‘27273834’ ‘27071789’ ‘26940141’ ‘26932552’ ‘26895254’ ‘26869847’ ‘26788924’ ‘26581735’ ‘26548910’ ‘26317636’ ‘26121678’ ‘26094200’ ‘25997760’ ‘25631363’ ‘25526824’ ‘25446893’ ‘25153535’ ‘25092245’ ‘25086828’ ‘24946432’ ‘24637261’ ‘24588761’ ‘24508579’ ‘24486356’ ‘24462936’ ‘24239932’ ‘24239931’ ‘24231551’ ‘24216134’ ‘23955310’ ‘23856187’ ‘23686025’ ‘23589638’ ‘23575742’ ‘23469841’ ‘23055480’ ‘22981649’ ‘22406388’ ‘22373652’ ‘22141469’ ‘21960250’ ‘21881219’ ‘21802859’ ‘21714746’ ‘21618004’ ‘21150165’ ‘20435805’ ‘20173685’ ‘19840865’ ‘19546570’ ‘19309413’ ‘15288368’ ‘12359512’ ‘9401603’ ‘9213136’ ‘7630585’

[4] Sci2 Team. (2009). Science of Science (Sci2) Tool. Indiana University and SciTech Strategies. Stable URL: https://sci2.cns.iu.edu

[5] Bastian, M., Heymann, S., & Jacomy, M. (2009).
OpenCitations Index N-Triples dataset storing data source information about...
figshare.com
datasetcatalog.nlm.nih.gov
zip
Updated Jul 16, 2025
Cite
OpenCitations (2025). OpenCitations Index N-Triples dataset storing data source information about all the citation data [Dataset]. http://doi.org/10.6084/m9.figshare.24427051.v5
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.24427051.v5
Dataset updated
Jul 16, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
OpenCitations
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains data source collection (e.g., COCI, DOCI, POCI, etc.) information about all the citation data (in N-Triples format) included in the OpenCitations Index, released on July 10, 2025. In particular, any citation in the dataset, defined as an individual of the class cito:Citation, includes the following information:

[property "prov:atLocation"] the data source entity identified by its URL (https://w3id.org/oc/index/[DATA-SOURCE]/).

This version of the dataset contains:

2,693,728,426 citations

The size of the zipped archive is 25.7 GB, while the size of the unzipped N-Triples files is 426 GB.
OpenCitations Index CSV dataset storing data source information about all...
figshare.com
zip
Updated Jul 16, 2025
Cite
OpenCitations (2025). OpenCitations Index CSV dataset storing data source information about all the citation data [Dataset]. http://doi.org/10.6084/m9.figshare.28677293.v2
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.28677293.v2
Dataset updated
Jul 16, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
OpenCitations
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains data source collection (e.g., COCI, DOCI, POCI, etc.) information about all the citation data (in CSV format) included in the OpenCitations Index, released on July 10, 2025. In particular, any citation in the dataset, defined with its corresponding OCI (first column), has a corresponding value that defines the source (second column), e.g. "coci", "doci", "poci", etc.

This version of the dataset contains:

2,693,728,426 citations

The size of the zipped archive is 23 GB, while the size of the unzipped CSV files is 104 GB.
OpenCitations Index CSV dataset of the provenance information of all the...
figshare.com
zip
Updated Jul 15, 2025
+ more versions
Cite
OpenCitations (2025). OpenCitations Index CSV dataset of the provenance information of all the citation data [Dataset]. http://doi.org/10.6084/m9.figshare.24417733.v5
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.24417733.v5
Dataset updated
Jul 15, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
OpenCitations
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains the provenance information (in CSV format) of all the citation data included in the OpenCitations Index, released on July 13, 2025. In particular, each line of the CSV file defines a citation, and includes the following information:

[field "oci"] the Open Citation Identifier (OCI) for the citation;
[field "snapshot"] the identifier of the snapshot;
[field "agent"] the name of the agent that has created the citation data;
[field "source"] the URL of the source dataset from which the citation data have been extracted;
[field "created"] the creation time of the citation data;
[field "invalidated"] the start of the destruction, cessation, or expiry of an existing entity by an activity;
[field "description"] a textual description of the activity made;
[field "update"] the UPDATE SPARQL query that keeps track of which metadata have been modified.

The size of the zipped archive is 20 GB, while the size of the unzipped CSV files is 454 GB.

Data Citation Corpus V4.1 EUPMC and DataCite

kaggle.com

zip

Updated Sep 16, 2025

Cite

Kea Kohv (2025). Data Citation Corpus V4.1 EUPMC and DataCite [Dataset]. https://www.kaggle.com/datasets/keakohv/data-citation-corpus-v4-1-eupmc-and-datacite

Explore at:

Available download formats: zip (256037500 bytes)

Dataset updated

Sep 16, 2025

Authors

Kea Kohv

Description

This dataset originates from the Data Citation Corpus V4.1: https://zenodo.org/records/16901115

To recreate this dataset, first download the csv format files from Corpus V4.1: https://zenodo.org/records/16901115

Then run this:

import glob
import pandas as pd

# Read all CSV files from the folder
# Make sure to have the correct folder where your csv files have been unzipped
csv_files = glob.glob('2025-08-15-data-citation-corpus-v4.1/*.csv')

# Read and combine all CSV files
dataframes = []
for file in csv_files:
  df = pd.read_csv(file)
  dataframes.append(df)

df_mdc_combined = pd.concat(dataframes, ignore_index=True).drop_duplicates()

# To save space, drop unnecessary columns
df_mdc_combined = df_mdc_combined.drop(columns=['id', 'subjects', 'affiliations', 'affiliationsROR', 'funders', 'fundersROR'])

# Keep only rows where source is 'datacite' or 'eupmc', we don't need others
df_mdc_combined = df_mdc_combined[df_mdc_combined.source.isin(['datacite','eupmc'])].copy()

# Remove https://doi.org/ from publication
# Replace / with _ in publication
df_mdc_combined['publication'] = df_mdc_combined['publication'].str.replace('https://doi.org/', '', regex=False)
df_mdc_combined['publication'] = df_mdc_combined['publication'].str.replace('/', '_', regex=False)

df_mdc_combined.to_parquet('data_citation_corpus_filtered_v4.1.parquet', index=False)
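
The two literal replacements applied to the publication column can be checked on a single value without pandas. The sketch below mirrors them with plain string methods; the function name and the DOI are hypothetical and serve only to show the transformation.

```python
def normalize_publication(value: str) -> str:
    """Mirror the two literal replacements applied to the 'publication' column."""
    value = value.replace("https://doi.org/", "")  # drop the resolver prefix
    return value.replace("/", "_")                 # replace every slash with an underscore

# Hypothetical DOI, used only to show the transformation.
print(normalize_publication("https://doi.org/10.1234/example.5678"))
# prints 10.1234_example.5678
```

Because regex=False is passed in the pandas version, both approaches treat the search strings literally and replace every occurrence.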

References: DataCite, & Make Data Count. (2025). Data Citation Corpus Data File (v4.1) [Data set]. DataCite. https://doi.org/10.5281/zenodo.16901115

Data from: U-Index, a dataset and an impact metric for informatics tools and...
search.dataone.org
datasetcatalog.nlm.nih.gov
Updated Apr 11, 2025
Alison Callahan; Rainer Winnenburg; Nigam H. Shah (2025). U-Index, a dataset and an impact metric for informatics tools and databases [Dataset]. http://doi.org/10.5061/dryad.gj651
Unique identifier
https://doi.org/10.5061/dryad.gj651
Dataset updated
Apr 11, 2025
Dataset provided by
Dryad Digital Repository
Authors
Alison Callahan; Rainer Winnenburg; Nigam H. Shah
Time period covered
Feb 22, 2019
Description
Measuring the usage of informatics resources such as software tools and databases is essential to quantifying their impact, value and return on investment. We have developed a publicly available dataset of informatics resource publications and their citation network, along with an associated metric (u-Index) to measure informatics resources' impact over time. Our dataset differentiates the context in which citations occur to distinguish between 'awareness' and 'usage', and uses a citing universe of open access publications to derive citation counts for quantifying impact. Resources with a high ratio of usage citations to awareness citations are likely to be widely used by others and have a high u-Index score. We have pre-calculated the u-Index for nearly 100,000 informatics resources. We demonstrate how the u-Index can be used to track informatics resource impact over time. The method of calculating the u-Index metric, the pre-computed u-Index values, and the dataset we compiled to calc...
Big Data and Society Abstract & Indexing - ResearchHelpDesk
researchhelpdesk.org
Updated Jun 23, 2022
Research Help Desk (2022). Big Data and Society Abstract & Indexing - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/abstract-and-indexing/477/big-data-and-society
Dataset updated
Jun 23, 2022
Dataset authored and provided by
Research Help Desk
Description
Big Data and Society Abstract & Indexing - ResearchHelpDesk - Big Data & Society (BD&S) is an open access, peer-reviewed scholarly journal that publishes interdisciplinary work principally in the social sciences, humanities and computing and their intersections with the arts and natural sciences about the implications of Big Data for societies. The Journal's key purpose is to provide a space for connecting debates about the emerging field of Big Data practices and how they are reconfiguring academic, social, industry, business, and government relations, expertise, methods, concepts, and knowledge. BD&S moves beyond usual notions of Big Data and treats it as an emerging field of practice that is not defined by but generative of (sometimes) novel data qualities such as high volume and granularity and complex analytics such as data linking and mining. It thus attends to digital content generated through online and offline practices in social, commercial, scientific, and government domains. This includes, for instance, the content generated on the Internet through social media and search engines but also that which is generated in closed networks (commercial or government transactions) and open networks such as digital archives, open government, and crowdsourced data. Critically, rather than settling on a definition, the Journal makes this an object of interdisciplinary inquiries and debates explored through studies of a variety of topics and themes. BD&S seeks contributions that analyze Big Data practices and/or involve empirical engagements and experiments with innovative methods while also reflecting on the consequences for how societies are represented (epistemologies), realized (ontologies) and governed (politics).
Article processing charge (APC): The article processing charge (APC) for this journal is currently 1500 USD. Authors who do not have funding for open access publishing can request a waiver from the publisher, SAGE, once their Original Research Article is accepted after peer review. For all other content (Commentaries, Editorials, Demos) and Original Research Articles commissioned by the Editor, the APC will be waived.
Abstract & Indexing: Clarivate Analytics: Social Sciences Citation Index (SSCI); Directory of Open Access Journals (DOAJ); Google Scholar; Scopus.
Data from: Data sharing in sociology journals - Dataset - B2FIND
demo-b2find.dkrz.de
Updated Sep 21, 2025
(2025). Data from: Data sharing in sociology journals - Dataset - B2FIND [Dataset]. http://demo-b2find.dkrz.de/dataset/5c911e2e-4e7d-5609-9f62-a2474a43bdd0
Dataset updated
Sep 21, 2025
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Data sharing is key for replication and re-use in empirical research. Scientific journals can play a central role by establishing data policies and providing technologies. In this study, factors that influence data sharing are analyzed by investigating journal data policies and author behavior in sociology. The websites of 140 sociology journals were consulted to check their data policies. The results are compared with similar studies from political science and economics. For five selected journals covering a broad variety, all articles from two years are examined to see whether authors really cite and share their data, and which factors are related to this.
Content analysis; web-based observation.
Journals of the 2013 Social Science Citation Index; articles from 5 selected journals in 2012 and 2013.
Full selection of the journals in the 2013 Social Science Citation Index in the category "sociology"; all articles from 5 selected journals in 2012 and 2013.
COUNTRIES Research & Science Dataset - SCImagoJR
kaggle.com
zip
Updated Apr 10, 2025
Ali Jalaali (2025). COUNTRIES Research & Science Dataset - SCImagoJR [Dataset]. https://www.kaggle.com/datasets/alijalali4ai/scimago-country-info-and-rank
Available download formats: zip (54895151 bytes)
Dataset updated
Apr 10, 2025
Authors
Ali Jalaali
Description
The SCImago Journal & Country Rank is a publicly available portal that includes the journals and country scientific indicators developed from the information contained in the Scopus® database (Elsevier B.V.). These indicators can be used to assess and analyze scientific domains. Country rankings may also be compared or analysed separately.

✅Collected by: SCImagoJR Country Data Collector Notebook

💬Also have a look at
💡 UNIVERSITIES & Research INSTITUTIONS Rank - SCImagoIR
💡 Scientific JOURNALS Indicators & Info - SCImagoJR

27 major thematic subject areas as well as 309 specific subject categories according to Scopus® Classification.

Citation data is drawn from over 34,100 titles from more than 5,000 international publishers

SCImago is a research group from the Consejo Superior de Investigaciones Científicas (CSIC), University of Granada, Extremadura, Carlos III (Madrid) and Alcalá de Henares, dedicated to information analysis, representation and retrieval by means of visualisation techniques.

☢️❓The entire dataset is obtained from public and open-access data of ScimagoJR (SCImago Journal & Country Rank)
ScimagoJR Country Rank
SCImagoJR About Us

Available indicators:

Documents: Number of documents published during the selected year. It is usually called the country's scientific output.

Citable Documents: Citable documents published in the selected year. Only articles, reviews and conference papers are considered.

Citations: Number of citations received by the documents published during the source year, i.e. citations in years X, X+1, X+2, X+3... to documents published during year X. When referring to the period 1996-2021, all documents published during this period are considered.

Citations per Document: Average citations per document published during the source year, i.e. citations in years X, X+1, X+2, X+3... to documents published during year X. When referring to the period 1996-2021, all documents published during this period are considered.

Self Citations: Country self-citations. Number of self-citations of all dates received by the documents published during the source year, i.e. self-citations in years X, X+1, X+2, X+3... to documents published during year X. When referring to the period 1996-2021, all documents published during this period are considered.

H index: The h index is a country's number of articles (h) that have received at least h citations. It quantifies both a country's scientific productivity and its scientific impact, and it is also applicable to scientists, journals, etc.
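
To make the H index definition concrete, here is a minimal sketch of the usual way to compute it from a list of per-article citation counts. The function name and the sample counts are illustrative and are not taken from the dataset.

```python
def h_index(citations):
    """Return the largest h such that h articles each have at least h citations."""
    ranked = sorted(citations, reverse=True)  # most-cited article first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the top `rank` articles all have at least `rank` citations
        else:
            break      # counts only decrease from here, so h cannot grow further
    return h

# Illustrative citation counts for five articles.
print(h_index([10, 8, 5, 4, 3]))  # prints 4
```

In the example, four articles have at least 4 citations each, but there are not five articles with at least 5 citations, so the result is 4.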
arXiv publications dataset with simulated citation relationships
figshare.com
txt
Updated Jun 5, 2023
Jacek Miecznikowski; Dominik Tomaszuk (2023). arXiv publications dataset with simulated citation relationships [Dataset]. http://doi.org/10.6084/m9.figshare.6449756.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.6449756.v1
Dataset updated
Jun 5, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Jacek Miecznikowski; Dominik Tomaszuk
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
arXiv publications dataset with simulated citation relationships
https://github.com/jacekmiecznikowski/neo4index
The app evaluates scientific research impact using author-level metrics (h-index and more).
This collection contains data acquired from arXiv.org via the OAI2 protocol. arXiv does not provide citation metadata, so this data was pseudo-randomly simulated.
We evaluated scientific research impact using six popular author-level metrics:
* h-index
* m quotient
* e-index
* m-index
* r-index
* ar-index
Source
https://arxiv.org/help/bulk_data (downloaded: 2018-03-23; over 1.3 million publications)
Files
* arxiv_bulk_metadata_2018-03-23.tar.gz - file downloaded using oai-harvester; contains metadata of all arXiv publications to date
* categories.csv - categories from arXiv with category-subcategory division
* publications.csv - information about articles: id, title, abstract, url, categories and date
* authors.csv - author data: first name, last name and id of published article
* citations.csv - simulated relationships between all publications, generated using arxivCite
* indices.csv - the 6 author-level metrics calculated on the database using neo4index
Statistics
Metric       Average              Median   Mode
h-index      3.5836524733724495   1.0      1.0
m quotient   0.5831426366846965   0.4167   1.0
e-index      7.9260187734579075   5.3852   0.0
m-index      29.436844659143155   17.0     0.0
r-index      8.931101630575293    5.831    0.0
ar-index     3.5439082808721025   2.7928   0.0
National Research Foundation of Korea_KCI Journal Information
data.go.kr
csv
Updated Sep 3, 2025
(2025). National Research Foundation of Korea_KCI Journal Information [Dataset]. https://www.data.go.kr/en/data/3049043/fileData.do
Available download formats: csv
Dataset updated
Sep 3, 2025
License
https://data.go.kr/ugs/selectPortalPolicyView.do
Area covered
South Korea
Description
The KCI Journal Information data provides key information on domestic academic journals registered in the Korea Citation Index (KCI) system. It includes detailed information such as electronic and paper International Standard Serial Numbers (ISSNs), journal titles, indexing categories, research areas, year of publication, publication cycle, language, issuing institutions, affiliated research institutes, affiliated universities, and institutional classification. This data can be used for a variety of research and practical purposes, including assessing the current status of academic journals, comparing journals by field of study, analyzing the academic activities of researchers and institutions, and selecting journals. Updated annually, the most recent information is available based on the revision date.
