100+ datasets found
  1. Citation Graph

    • kaggle.com
    zip
    Updated Jun 30, 2020
    Cite
    Caselaw Access Project (2020). Citation Graph [Dataset]. https://www.kaggle.com/datasets/harvardlil/citation-graph
    Explore at:
    zip (306,688,738 bytes)
    Dataset updated
    Jun 30, 2020
    Authors
    Caselaw Access Project
    Description

    Context

    The Caselaw Access Project makes 40 million pages of U.S. caselaw freely available online from the collections of Harvard Law School Library.

    The CAP citation graph shows the connections between cases in the Caselaw Access Project dataset. You can use the citation graph to answer questions like "what is the most influential case?" and "what jurisdictions cite most often to this jurisdiction?".

    Learn More: https://case.law/download/citation_graph/

    Access Limits: https://case.law/api/#limits

    Content

    This dataset includes citations and metadata for the CAP citation graph in CSV format.
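    The "most influential case" question above reduces to an in-degree count over the citation CSV. A minimal sketch with the standard library follows; the column names `citing_id` and `cited_id` are assumptions for illustration and the three rows are invented, so check the header of the actual download before adapting it.

```python
import csv
import io
from collections import Counter

# Hedged sketch: assumes each CSV row pairs a citing case ID with a cited
# case ID. The real CAP export's column names may differ -- inspect the header.
sample = io.StringIO(
    "citing_id,cited_id\n"
    "1,2\n"
    "3,2\n"
    "3,1\n"
)

in_degree = Counter()
for row in csv.DictReader(sample):
    in_degree[row["cited_id"]] += 1  # one incoming citation per row

# The case with the most incoming citations is the "most influential"
# under this simple metric.
most_influential, n_citations = in_degree.most_common(1)[0]
print(most_influential, n_citations)  # case "2" is cited twice
```

    For the full graph, the same loop runs over the downloaded CSV file instead of the in-memory sample.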

    Acknowledgements

    The Caselaw Access Project is by the Library Innovation Lab at Harvard Law School Library.

    Inspiration

    People are using CAP data to create research, applications, and more. We're sharing examples in our gallery.

    Cite Grid is the first visualization we've created based on data from our citation graph.

    Have something to share? We're excited to hear about it.

  2. Data for "Open Access impact on citations: a case study"

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Bordignon, Frédérique; Andro, Mathieu (2020). Data for "Open Access impact on citations: a case study" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_60293
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Direction de la Documentation, Ecole des Ponts ParisTech, Champs-sur-Marne, France
    DIST, INRA, Versailles, France
    Authors
    Bordignon, Frédérique; Andro, Mathieu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is a list of 347 papers published in 2010 and retrieved from the Web of Science, Scopus and Google Scholar. For each paper, the number of citations and the citation date(s) have been collected. If the full text is available online, the date of "liberation" and the URL of the file have been retrieved as well. The objective was to assess the impact of open access on citation rate, and more particularly its impact before and after full-text "liberation".

  3. Dataset for "Continued use of retracted papers: Temporal trends in citations and (lack of) awareness of retractions shown in citation contexts in biomedicine"

    • databank.illinois.edu
    Updated Jun 14, 2024
    + more versions
    Cite
    Tzu-Kun Hsiao; Jodi Schneider (2024). Dataset for "Continued use of retracted papers: Temporal trends in citations and (lack of) awareness of retractions shown in citation contexts in biomedicine" [Dataset]. http://doi.org/10.13012/B2IDB-8255619_V2
    Explore at:
    Dataset updated
    Jun 14, 2024
    Authors
    Tzu-Kun Hsiao; Jodi Schneider
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Dataset funded by
    Alfred P. Sloan Foundation
    U.S. National Institutes of Health (NIH)
    Description

    This dataset includes five files. Descriptions of the files are given as follows:

    FILENAME: PubMed_retracted_publication_full_v3.tsv
    - Bibliographic data of retracted papers indexed in PubMed (retrieved on August 20, 2020, searched with the query "retracted publication" [PT]).
    - Except for the information in the "cited_by" column, all the data is from PubMed.
    - PMIDs in the "cited_by" column that meet either of the two conditions below have been excluded from analyses: [1] PMIDs of the citing papers are from retraction notices (i.e., those in the "retraction_notice_PMID.csv" file). [2] The citing paper and the cited retracted paper have the same PMID.
    ROW EXPLANATIONS
    - Each row is a retracted paper. There are 7,813 retracted papers.
    COLUMN HEADER EXPLANATIONS
    1) PMID - PubMed ID
    2) Title - Paper title
    3) Authors - Author names
    4) Citation - Bibliographic information of the paper
    5) First Author - First author's name
    6) Journal/Book - Publication name
    7) Publication Year
    8) Create Date - The date the record was added to the PubMed database
    9) PMCID - PubMed Central ID (if applicable, otherwise blank)
    10) NIHMS ID - NIH Manuscript Submission ID (if applicable, otherwise blank)
    11) DOI - Digital object identifier (if applicable, otherwise blank)
    12) retracted_in - Information on the retraction notice (given by PubMed)
    13) retracted_yr - Retraction year identified from "retracted_in" (if applicable, otherwise blank)
    14) cited_by - PMIDs of the citing papers (if applicable, otherwise blank). Data collected from iCite.
    15) retraction_notice_pmid - PMID of the retraction notice (if applicable, otherwise blank)

    FILENAME: PubMed_retracted_publication_CitCntxt_withYR_v3.tsv
    - This file contains citation contexts (i.e., citing sentences) where the retracted papers were cited. The citation contexts were identified from the XML version of PubMed Central open access (PMCOA) articles.
    - This is part of the data from: Hsiao, T.-K., & Torvik, V. I. (manuscript in preparation). Citation contexts identified from PubMed Central open access articles: A resource for text mining and citation analysis.
    - Citation contexts that meet either of the two conditions below have been excluded from analyses: [1] PMIDs of the citing papers are from retraction notices (i.e., those in the "retraction_notice_PMID.csv" file). [2] The citing paper and the cited retracted paper have the same PMID.
    ROW EXPLANATIONS
    - Each row is a citation context associated with one retracted paper that's cited.
    - In the manuscript, we count each citation context once, even if it cites multiple retracted papers.
    COLUMN HEADER EXPLANATIONS
    1) pmcid - PubMed Central ID of the citing paper
    2) pmid - PubMed ID of the citing paper
    3) year - Publication year of the citing paper
    4) location - Location of the citation context (abstract = abstract, body = main text, back = supporting material, tbl_fig_caption = tables and table/figure captions)
    5) IMRaD - IMRaD section of the citation context (I = Introduction, M = Methods, R = Results, D = Discussions/Conclusion, NoIMRaD = not identified)
    6) sentence_id - The ID of the citation context in a given location (see column 4). The first sentence in the location gets the ID 1, and subsequent sentences are numbered consecutively.
    7) total_sentences - Total number of sentences in a given location
    8) intxt_id - Identifier of a cited paper. Here, a cited paper is the retracted paper.
    9) intxt_pmid - PubMed ID of a cited paper. Here, a cited paper is the retracted paper.
    10) citation - The citation context
    11) progression - Position of a citation context by centile within the citing paper
    12) retracted_yr - Retraction year of the retracted paper
    13) post_retraction - 0 = not post-retraction citation; 1 = post-retraction citation. A post-retraction citation is a citation made after the calendar year of retraction.

    FILENAME: 724_knowingly_post_retraction_cit.csv (updated)
    - The 724 post-retraction citation contexts that we determined knowingly cited the 7,813 retracted papers in "PubMed_retracted_publication_full_v3.tsv".
    - Two citation contexts from retraction notices have been excluded from analyses.
    ROW EXPLANATIONS
    - Each row is a citation context.
    COLUMN HEADER EXPLANATIONS
    1) pmcid - PubMed Central ID of the citing paper
    2) pmid - PubMed ID of the citing paper
    3) pub_type - Publication type collected from the metadata in the PMCOA XML files
    4) pub_type2 - Specific article types. Please see the manuscript for explanations.
    5) year - Publication year of the citing paper
    6) location - Location of the citation context (abstract = abstract, body = main text, back = supporting material, table_or_figure_caption = tables and table/figure captions)
    7) intxt_id - Identifier of a cited paper. Here, a cited paper is the retracted paper.
    8) intxt_pmid - PubMed ID of a cited paper. Here, a cited paper is the retracted paper.
    9) citation - The citation context
    10) retracted_yr - Retraction year of the retracted paper
    11) cit_purpose - Purpose of citing the retracted paper. This is from human annotations. Please see the manuscript for further information about annotation.
    12) longer_context - An extended version of the citation context (if applicable, otherwise blank). Manually pulled from the full texts in the process of annotation.

    FILENAME: Annotation manual.pdf
    - The manual for annotating the citation purposes in column 11) of the 724_knowingly_post_retraction_cit.tsv.

    FILENAME: retraction_notice_PMID.csv (new file added for this version)
    - A list of 8,346 PMIDs of retraction notices indexed in PubMed (retrieved on August 20, 2020, searched with the query "retraction of publication" [PT]).
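    A minimal sketch of working with the citation-context file using its documented post_retraction flag; the three sample rows below are invented, and only a subset of the documented columns is shown.

```python
import csv
import io

# Hedged sketch against the documented columns of
# PubMed_retracted_publication_CitCntxt_withYR_v3.tsv (tab-delimited).
# The rows are made up for illustration; the real file has 13 columns.
sample = io.StringIO(
    "pmid\tyear\tretracted_yr\tpost_retraction\n"
    "111\t2015\t2012\t1\n"
    "222\t2010\t2012\t0\n"
    "333\t2014\t2012\t1\n"
)

rows = list(csv.DictReader(sample, delimiter="\t"))

# post_retraction == "1" marks citations made after the retraction year.
post = [r for r in rows if r["post_retraction"] == "1"]
print(len(post), "of", len(rows), "contexts are post-retraction")
```

    Replacing the in-memory sample with `open(path, newline="")` over the real TSV gives the per-year post-retraction tallies analyzed in the paper.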

  4. Citation Trends for "Supporting Data and Services Access in Digital Government Environments"

    • shibatadb.com
    Updated Oct 4, 2025
    Cite
    Yubetsu (2025). Citation Trends for "Supporting Data and Services Access in Digital Government Environments" [Dataset]. https://www.shibatadb.com/article/JWopxJ3C
    Explore at:
    Dataset updated
    Oct 4, 2025
    Dataset authored and provided by
    Yubetsu
    License

    https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt

    Time period covered
    2004
    Variables measured
    New Citations per Year
    Description

    Yearly citation counts for the publication titled "Supporting Data and Services Access in Digital Government Environments".

  5. Data from: OpCitance: Citation contexts identified from the PubMed Central open access articles

    • databank.illinois.edu
    • aws-databank-alb.library.illinois.edu
    Updated Feb 15, 2023
    Cite
    Tzu-Kun Hsiao; Vetle Torvik (2023). OpCitance: Citation contexts identified from the PubMed Central open access articles [Dataset]. http://doi.org/10.13012/B2IDB-4353270_V1
    Explore at:
    Dataset updated
    Feb 15, 2023
    Authors
    Tzu-Kun Hsiao; Vetle Torvik
    Dataset funded by
    U.S. National Institutes of Health (NIH)
    Description

    Sentences and citation contexts identified from the PubMed Central open access articles

    The dataset is delivered as 24 tab-delimited text files. The files contain 720,649,608 sentences, 75,848,689 of which are citation contexts. The dataset is based on a snapshot of articles in the XML version of the PubMed Central open access subset (i.e., the PMCOA subset). The PMCOA subset was collected in May 2019. The dataset is created as described in: Hsiao, T. K., & Torvik, V. I. (manuscript) OpCitance: Citation contexts identified from the PubMed Central open access articles.

    Files (one per journal-title initial; each contains the sentences and citation contexts identified from articles published in journals whose titles start with the indicated letter or letters):
    • A_journal_IntxtCit.tsv
    • B_journal_IntxtCit.tsv
    • C_journal_IntxtCit.tsv
    • D_journal_IntxtCit.tsv
    • E_journal_IntxtCit.tsv
    • F_journal_IntxtCit.tsv
    • G_journal_IntxtCit.tsv
    • H_journal_IntxtCit.tsv
    • I_journal_IntxtCit.tsv
    • J_journal_IntxtCit.tsv
    • K_journal_IntxtCit.tsv
    • L_journal_IntxtCit.tsv
    • M_journal_IntxtCit.tsv
    • N_journal_IntxtCit.tsv
    • O_journal_IntxtCit.tsv
    • P_p1_journal_IntxtCit.tsv (P, part 1)
    • P_p2_journal_IntxtCit.tsv (P, part 2)
    • Q_journal_IntxtCit.tsv
    • R_journal_IntxtCit.tsv
    • S_journal_IntxtCit.tsv
    • T_journal_IntxtCit.tsv
    • UV_journal_IntxtCit.tsv (U or V)
    • W_journal_IntxtCit.tsv
    • XYZ_journal_IntxtCit.tsv (X, Y or Z)

    Each row in a file is a sentence/citation context and contains the following columns:
    • pmcid: PMCID of the article.
    • pmid: PMID of the article. If an article does not have a PMID, the value is NONE.
    • location: The article component (abstract, main text, table, figure, etc.) to which the citation context/sentence belongs.
    • IMRaD: The type of IMRaD section associated with the citation context/sentence. I, M, R, and D represent introduction/background, method, results, and conclusion/discussion, respectively; NoIMRaD indicates that the section type is not identifiable.
    • sentence_id: The ID of the citation context/sentence in the article component.
    • total_sentences: The number of sentences in the article component.
    • intxt_id: The ID of the citation.
    • intxt_pmid: PMID of the citation (as tagged in the XML file). If a citation does not have a PMID tagged in the XML file, the value is "-".
    • intxt_pmid_source: The sources where the intxt_pmid can be identified. "xml" means the PMID is only identified from the XML file; "xml,pmc" means the PMID is not only from the XML file but also in the citation data collected from the NCBI Entrez Programming Utilities. If a citation does not have an intxt_pmid, the value is "-".
    • intxt_mark: The citation marker associated with the inline citation.
    • best_id: The best source link ID (e.g., PMID) of the citation.
    • best_source: The sources that confirm the best ID.
    • best_id_diff: The comparison result between the best_id column and the intxt_pmid column.
    • citation: A citation context. If no citation is found in a sentence, the value is the sentence.
    • progression: Text progression of the citation context/sentence.

    Supplementary Files
    • PMC-OA-patci.tsv.gz – Contains the best source link IDs for the references (e.g., PMID). Patci [1] was used to identify the best source link IDs, which are mapped to the citation contexts and displayed in the *_journal_IntxtCit.tsv files as the best_id column. Each row in this file is a citation (i.e., a reference extracted from the XML file) and contains the following columns:
      • pmcid: PMCID of the citing article.
      • pos: The citation's position in the reference list.
      • fromPMID: PMID of the citing article.
      • toPMID: Source link ID (e.g., PMID) of the citation. This ID is identified by Patci.
      • SRC: The sources that confirm the toPMID.
      • MatchDB: The origin bibliographic database of the toPMID.
      • Probability: The match probability of the toPMID.
      • toPMID2: PMID of the citation (as tagged in the XML file).
      • SRC2: The sources that confirm the toPMID2.
      • intxt_id: The ID of the citation.
      • journal: The first letter of the journal title. This maps to the *_journal_IntxtCit.tsv files.
      • same_ref_string: Whether the citation string appears in the reference list more than once.
      • DIFF: The comparison result between the toPMID column and the toPMID2 column.
      • bestID: The best source link ID (e.g., PMID) of the citation.
      • bestSRC: The sources that confirm the best ID.
      • Match: Matching result produced by Patci.
    • Supplementary_File_1.zip – Contains the code for generating the dataset.

    [1] Agarwal, S., Lincoln, M., Cai, H., & Torvik, V. (2014). Patci – a tool for identifying scientific articles cited by patents. GSLIS Research Showcase 2014. http://hdl.handle.net/2142/54885
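    A sketch of separating citation contexts from plain sentences in one *_journal_IntxtCit.tsv shard. The sample rows are invented, and the assumption that plain sentences carry "-" in the intxt_id field follows the column notes above but should be verified against the actual files.

```python
import csv
import io

# Hedged sketch over a toy shard; only a few of the documented columns
# are shown. Assumption: sentences with no inline citation have "-" in
# intxt_id, per the column descriptions.
sample = io.StringIO(
    "pmcid\tIMRaD\tintxt_id\tintxt_pmid\tcitation\n"
    "PMC1\tI\tc1\t100\tPrior work showed X [1].\n"
    "PMC1\tM\t-\t-\tWe incubated samples overnight.\n"
    "PMC1\tD\tc2\t-\tThis agrees with [2].\n"
)

rows = list(csv.DictReader(sample, delimiter="\t"))

# Keep only rows that carry a citation ID, i.e., true citation contexts.
contexts = [r for r in rows if r["intxt_id"] != "-"]
print(len(contexts), "citation contexts out of", len(rows), "sentences")
```

    The same filter over all 24 shards would reproduce the dataset's headline split of 75,848,689 citation contexts among 720,649,608 sentences.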

  6. August 2024 data-update for "Updated science-wide author databases of standardized citation indicators"

    • elsevier.digitalcommonsdata.com
    Updated Sep 16, 2024
    + more versions
    Cite
    John P.A. Ioannidis (2024). August 2024 data-update for "Updated science-wide author databases of standardized citation indicators" [Dataset]. http://doi.org/10.17632/btchxktzyw.7
    Explore at:
    Dataset updated
    Sep 16, 2024
    Authors
    John P.A. Ioannidis
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship-adjusted hm-index, citations to papers in different authorship positions, and a composite indicator (c-score). Separate data are shown for career-long and, separately, for single recent year impact. Metrics with and without self-citations and the ratio of citations to citing papers are given, and data on retracted papers (based on the Retraction Watch database) as well as citations to/from retracted papers have been added in the most recent iteration.

    Scientists are classified into 22 scientific fields and 174 sub-fields according to the standard Science-Metrix classification. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2023 and single recent year data pertain to citations received during calendar year 2023. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. This version (7) is based on the August 1, 2024 snapshot from Scopus, updated to end of citation year 2023. This work uses Scopus data. Calculations were performed using all Scopus author profiles as of August 1, 2024. If an author is not on the list, it is simply because the composite indicator value was not high enough to appear on the list; it does not mean that the author does not do good work.

    PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECTIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US. They should be sent directly to Scopus, preferably via the Scopus to ORCID feedback wizard (https://orcid.scopusfeedback.com/), so that the correct data can be used in any future annual updates of the citation indicator databases.

    The c-score focuses on impact (citations) rather than productivity (number of publications), and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, see the attached file of FREQUENTLY ASKED QUESTIONS. Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden Manifesto: https://www.nature.com/articles/520429a

  7. Open Access In Africa: Scopus Citation Data

    • search.datacite.org
    • data.niaid.nih.gov
    Updated Jun 23, 2017
    Cite
    Dasapta Erwin Irawan; OB Ojemeni (2017). Open Access In Africa: Scopus Citation Data [Dataset]. http://doi.org/10.5281/zenodo.817600
    Explore at:
    Dataset updated
    Jun 23, 2017
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Zenodo (http://zenodo.org/)
    Authors
    Dasapta Erwin Irawan; OB Ojemeni
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The following citation dataset was retrieved from Scopus on June 24, 2017 (3am, Western Indonesian time).

    It consists of 3 sets of data based on our searches. Each search was saved both in 'csv' and 'bib':

    OA_Africa_inTitle.xxx: "Open Access" AND Africa IN TITLE
    OA_Africa_inTitle_inAbstract_inKeywords.xxx: "Open Access" AND Africa IN TITLE, IN ABSTRACT, IN KEYWORDS
    OAmovement_Africa_inTitle_inAbstract_inKeywords.xxx: "Open Access movement" AND Africa IN TITLE, IN ABSTRACT, IN KEYWORDS
    

    Access to Scopus was provided by the Central Library of Institut Teknologi Bandung (Indonesia).

  8. Impact of NIH Public Access Policy on Citation Rates - Data from Study

    • figshare.com
    • indigo.uic.edu
    txt
    Updated Nov 23, 2019
    Cite
    Sandra L. De Groote (2019). Impact of NIH Public Access Policy on Citation Rates - Data from Study [Dataset]. https://figshare.com/articles/dataset/Impact_of_NIH_Public_Access_Policy_on_Citation_Rates_-_Data_from_Study/10961135
    Explore at:
    txt
    Dataset updated
    Nov 23, 2019
    Dataset provided by
    University of Illinois Chicago
    Authors
    Sandra L. De Groote
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    A list of journals across several subject areas was developed from which to collect article citation data. Citation information and cited reference counts of all the articles published in 2006 and 2009 for these journals were obtained.

  9. PLOS ONE publication and citation data

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +2more
    zip
    Updated May 15, 2023
    Cite
    Alexander Petersen (2023). PLOS ONE publication and citation data [Dataset]. http://doi.org/10.6071/M39W8V
    Explore at:
    zip
    Dataset updated
    May 15, 2023
    Dataset provided by
    University of California, Merced
    Authors
    Alexander Petersen
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    Merged PLOS ONE and Web of Science data compiled in .dta files produced by STATA13. Included is a Do-file for reproducing the regression model estimates reported in the pre-print (Tables I and II) and published version (Table 1). Each observation (.dta line) corresponds to a given PLOS ONE article, with various article-level and editor-level characteristics used as explanatory and control variables. This summary provides a brief description of each variable and its source.

    If you use this data, please cite: A. M. Petersen. Megajournal mismanagement: Manuscript decision bias and anomalous editor activity at PLOS ONE. Journal of Informetrics 13, 100974 (2019). DOI: 10.1016/j.joi.2019.100974

    Methods: We gathered the citation information for all PLOS ONE articles, indexed by A, from the Web of Science (WOS) Core Collection. From this data we obtained a master list of the unique digital object identifier, DOI_A, and the number of citations, c_A, at the time of the data download (census) date:

    (a) For the pre-print this corresponds to December 3, 2016;

    (b) and for the final published article this corresponds to February 25, 2019.

    We then used each DOI_A to access the corresponding online XML version of each article at PLOS ONE by visiting the unique web address "http://journals.plos.org/plosone/article?id=" + "DOI_A". After parsing the full-text XML (primarily the author byline data and reference list), we merged the PLOS ONE publication information and WOS citation data by matching on DOI_A.
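    The URL construction step described above can be sketched as a one-line helper; the DOI in the usage line is a made-up placeholder, not a real PLOS ONE article.

```python
# Sketch of the per-article URL scheme quoted in the methods:
# base address + DOI, used to fetch each article's XML/HTML page.
BASE = "http://journals.plos.org/plosone/article?id="

def article_url(doi: str) -> str:
    """Build the per-article URL for a given PLOS ONE DOI."""
    return BASE + doi

# Placeholder DOI for illustration only.
url = article_url("10.1371/journal.pone.0000000")
print(url)
```

    In the study's pipeline, the page at each such URL was parsed for byline and reference-list data before merging with the WOS citation counts.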

    allofplos: PLOS has since made all full-text XML data freely available (https://www.plos.org/text-and-data-mining); this option was not available at the time of our data collection.

  10. Data from: unarXive: A Large Scholarly Data Set with Publications' Full-Text, Annotated In-Text Citations, and Links to Metadata

    • zenodo.org
    Updated Apr 17, 2024
    + more versions
    Cite
    Tarek Saier; Tarek Saier; Michael Färber; Michael Färber (2024). unarXive: A Large Scholarly Data Set with Publications' Full-Text, Annotated In-Text Citations, and Links to Metadata [Dataset]. http://doi.org/10.5281/zenodo.3385851
    Explore at:
    Dataset updated
    Apr 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Tarek Saier; Tarek Saier; Michael Färber; Michael Färber
    Description


    unarXive is a scholarly data set containing publications' full-text, annotated in-text citations, and a citation network.

    The data is generated from all LaTeX sources on arXiv and therefore of higher quality than data generated from PDF files.

    Typical use cases are

    • Citation recommendation
    • Citation context analysis
    • Bibliographic analyses
    • Reference string parsing

    Note: This Zenodo record is an old version of unarXive. You can find the most recent version at https://zenodo.org/record/7752754 and https://zenodo.org/record/7752615

    Access


    To download the whole data set send an access request and note the following:

    Note: this Zenodo record is a "full" version of unarXive, which was generated from all of arXiv.org including non-permissively licensed papers. Make sure that your use of the data is compliant with the paper's licensing terms.¹

    ¹ For information on papers' licenses use arXiv's bulk metadata access.

    The code used for generating the data set is publicly available.

    Usage examples for our data set are provided on GitHub.

    Citing

    This initial version of unarXive is described in the following journal article.

    Tarek Saier, Michael Färber: "unarXive: A Large Scholarly Data Set with Publications' Full-Text, Annotated In-Text Citations, and Links to Metadata", Scientometrics, 2020,
    [link to an author copy]

    The updated version is described in the following conference paper.

    Tarek Saier, Michael Färber. "unarXive 2022: All arXiv Publications Pre-Processed for NLP, Including Structured Full-Text and Citation Network", JCDL 2023.
    [link to an author copy]

  11. Data from: Bibliometric-Enhanced arXiv: A Data Set for Paper-Based and Citation-Based Tasks

    • zenodo.org
    • search.datacite.org
    Updated Apr 17, 2024
    Cite
    Tarek Saier; Michael Färber; Michael Färber; Tarek Saier (2024). Bibliometric-Enhanced arXiv: A Data Set for Paper-Based and Citation-Based Tasks [Dataset]. http://doi.org/10.5281/zenodo.2553523
    Explore at:
    Dataset updated
    Apr 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Tarek Saier; Michael Färber; Michael Färber; Tarek Saier
    Description


    unarXive is a scholarly data set containing publications' full-text, annotated in-text citations, and a citation network.

    The data is generated from all LaTeX sources on arXiv and therefore of higher quality than data generated from PDF files.

    Typical use cases are

    • Citation recommendation
    • Citation context analysis
    • Bibliographic analyses
    • Reference string parsing

    Note: This Zenodo record is an old version of unarXive. You can find the most recent version at https://zenodo.org/record/7752754 and https://zenodo.org/record/7752615

    Access


    To download the whole data set send an access request and note the following:

    Note: this Zenodo record is a "full" version of unarXive, which was generated from all of arXiv.org including non-permissively licensed papers. Make sure that your use of the data is compliant with the paper's licensing terms.¹

    ¹ For information on papers' licenses use arXiv's bulk metadata access.

    The code used for generating the data set is publicly available.

    Usage examples for our data set are provided on GitHub.

    Citing

    This initial version of unarXive is described in the following journal article.

    Tarek Saier, Michael Färber: "unarXive: A Large Scholarly Data Set with Publications' Full-Text, Annotated In-Text Citations, and Links to Metadata", Scientometrics, 2020,
    [link to an author copy]

    The updated version is described in the following conference paper.

    Tarek Saier, Michael Färber. "unarXive 2022: All arXiv Publications Pre-Processed for NLP, Including Structured Full-Text and Citation Network", JCDL 2023.
    [link to an author copy]

  12. Citation and access data, and journal impact factors for co-published EQUATOR reporting guidelines

    • figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    Daniel Shanahan (2023). Citation and access data, and journal impact factors for co-published EQUATOR reporting guidelines [Dataset]. http://doi.org/10.6084/m9.figshare.3156211.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Daniel Shanahan
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the full citation details and DOIs for 85 co-published reporting guidelines, together with the citation counts, number of article accesses and journal impact factor for each article and journal. This represents a total of nine research reporting statements, published across 58 journals in biomedicine.

  13. Data from: Data reuse and the open data citation advantage

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1more
    zip
    Updated Oct 1, 2013
    Heather A. Piwowar; Todd J. Vision (2013). Data reuse and the open data citation advantage [Dataset]. http://doi.org/10.5061/dryad.781pv
    Explore at:
    zipAvailable download formats
    Dataset updated
    Oct 1, 2013
    Dataset provided by
    National Evolutionary Synthesis Center
    Authors
    Heather A. Piwowar; Todd J. Vision
    License

    https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html

    Description

    Background: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets.

    Method and Results: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third-party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties.

    Conclusion: After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
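
    The study design described above, regressing citation counts on an open-data flag while controlling for covariates, can be illustrated on synthetic data. This is a minimal sketch assuming a simple log-linear model with one made-up covariate; it is not the authors' data or code.

```python
import numpy as np

# Synthetic illustration of the study design (not the authors' data or code):
# regress log citations on an open-data flag while controlling for a covariate.
rng = np.random.default_rng(42)
n = 5000
impact_factor = rng.normal(3.0, 1.0, n)          # stand-in covariate
open_data = rng.integers(0, 2, n).astype(float)  # 1 = data publicly available
true_benefit = 0.09                              # the ~9% effect reported above
log_cites = (0.5 * impact_factor
             + np.log1p(true_benefit) * open_data
             + rng.normal(0.0, 0.3, n))

# Ordinary least squares with an intercept, the flag, and the covariate
X = np.column_stack([np.ones(n), open_data, impact_factor])
beta, *_ = np.linalg.lstsq(X, log_cites, rcond=None)
estimated_benefit = np.exp(beta[1]) - 1.0        # back to a multiplicative boost
print(f"estimated open-data citation benefit: {estimated_benefit:.1%}")
```

    The actual analysis used many more covariates (journal impact factor, author history, country, topic), but the shape of the estimate is the same: the coefficient on the open-data flag, exponentiated, gives the percentage citation benefit.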

  14. Access to Grey Content: An Analysis of Grey Literature based on Citation and...

    • ssh.datastations.nl
    mdb, pdf, tsv, txt +1
    Updated Jan 1, 2006
    Dr. J. Farace; J. Frantzen; Dr. J. (INIST-CNRS) Schöpfel; C. (INIST-CNRS) Stock; Dr. A.K. (UvA) Boekhorst; Dr. J. Farace; J. Frantzen; Dr. J. (INIST-CNRS) Schöpfel; C. (INIST-CNRS) Stock; Dr. A.K. (UvA) Boekhorst (2006). Access to Grey Content: An Analysis of Grey Literature based on Citation and Survey Data, A Follow-up Study [Dataset]. http://doi.org/10.17026/DANS-XFQ-MDFG
    Explore at:
    mdb(18948096), zip(24444), pdf(125470), txt(468), tsv(70473), tsv(354353), tsv(276247), tsv(41)Available download formats
    Dataset updated
    Jan 1, 2006
    Dataset provided by
    DANS Data Station Social Sciences and Humanities
    Authors
    Dr. J. Farace; J. Frantzen; Dr. J. (INIST-CNRS) Schöpfel; C. (INIST-CNRS) Stock; Dr. A.K. (UvA) Boekhorst; Dr. J. Farace; J. Frantzen; Dr. J. (INIST-CNRS) Schöpfel; C. (INIST-CNRS) Stock; Dr. A.K. (UvA) Boekhorst
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Grey literature, an area of interest to special librarians and information professionals, can be traced back a half-century. However, grey literature as a specialized field in information studies is less than a decade old. At GL'97 in Luxembourg, grey literature was redefined "as information produced on all levels of government, academics, business and industry in electronic and print formats not controlled by commercial publishers (i.e. where publishing is not the primary activity of the producing body)". The subject area was broadened and the need for continuing research and instruction pursued. The results of an online survey carried out in 2004, compared with survey results a decade prior, indicate two changes: (1) a move to more specialization in the field of grey literature and (2) a move to more balance in activities related to research and teaching as compared with the processing and distribution of grey literature. It is not that the activities of processing and distribution are today of less concern, but technological advances and the Internet may have made them less labour intensive. The burden that grey literature posed to human resources and budgets appears to have been reduced enough that the benefits of its content are being discovered. This discovery of a wealth of knowledge and information is the onset of further research and instruction in the field of grey literature.

    This research project is a follow-up, or second part, of a citation study. The first part was carried out last year and the results were presented in a conference paper at GL6 in New York. Citation analysis is a relatively objective quantitative method and must be carefully implemented (Moed, 2002). Thus, in an effort to expand the results of our initial analysis beyond the realm of the GL Conference Series, an author survey will also be implemented in this follow-up study. The empirical data gathered from the online questionnaire will be compared with the updated data from the Citation Database, to which the citations in the GL6 Conference Proceedings will have been added. Comparative data from the comprehensive citation database (estimated 1650 records) and the data from the online author survey would then allow for a clearer demonstration of the impact of this research, since only part of the impact of research is covered by citation analysis alone (Thelwall, 2002). This research will allow for tracking the life of a conference paper as well as the application and use of its content within and outside the grey circuit. A further gain would be a better profile of the GL authors, who are the source of GreyNet's knowledge and information base. This in turn could lead to the subsequent development of services that are in line with the needs of authors and researchers in the field of grey literature, for example a citation style for grey literature, where special analysis of hyperlinked citations would provide an opportunity to address the problem of the disparity of web-based grey literature in the context of open archives.

  15. Global News Index and Extracted Features Repository

    • databank.illinois.edu
    Updated Jun 15, 2021
    (2021). Global News Index and Extracted Features Repository [Dataset]. http://doi.org/10.13012/B2IDB-5649852_V1
    Explore at:
    Dataset updated
    Jun 15, 2021
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Cline Center Global News Index is a searchable database of textual features extracted from millions of news stories, specifically designed to provide comprehensive coverage of events around the world. In addition to searching documents for keywords, users can query metadata and features such as named entities extracted using Natural Language Processing (NLP) methods and variables that measure sentiment and emotional valence. Archer is a web application purpose-built by the Cline Center to enable researchers to access data from the Global News Index. Archer provides a user-friendly interface for querying the Global News Index (with the back-end indexing still handled by Solr). By default, queries are built using icons and drop-down menus. More technically-savvy users can use Lucene/Solr query syntax via a ‘raw query’ option. Archer allows users to save and iterate on their queries, and to visualize faceted query results, which can be helpful as they refine their queries.

    Additional Resources:
    • Access to Archer and the Global News Index is limited to account-holders. If you are interested in signing up for an account, you can fill out the Archer User Information Form.
    • Current users who would like to provide feedback, such as reporting a bug or requesting a feature, can fill out the Archer User Feedback Form.
    • The Cline Center sends out periodic email newsletters to the Archer Users Group. Please fill out this form to subscribe to the Archer Users Group.

    Citation Guidelines:
    1) To cite the GNI codebook (or any other documentation associated with the Global News Index and Archer) please use the following citation: Cline Center for Advanced Social Research. 2020. Global News Index and Extracted Features Repository [codebook]. Champaign, IL: University of Illinois. doi:10.13012/B2IDB-5649852_V1
    2) To cite data from the Global News Index (accessed via Archer or otherwise) please use the following citation (filling in the correct date of access): Cline Center for Advanced Social Research. 2020. Global News Index and Extracted Features Repository [database]. Champaign, IL: University of Illinois. Accessed Month, DD, YYYY. doi:10.13012/B2IDB-5649852_V1
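
    A raw Solr query of the kind Archer exposes can be sketched as below. The endpoint, core name, and field names (text, sentiment, publication_date) are hypothetical, since Archer's actual schema is not documented here; only the Lucene/Solr query syntax itself is standard.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and field names; Archer's real Solr schema
# is account-holder-only, so this only illustrates the query syntax.
params = {
    "q": 'text:"supply chain" AND sentiment:[0.5 TO 1.0]',   # keyword + range clause
    "fq": "publication_date:[2020-01-01T00:00:00Z TO 2020-12-31T23:59:59Z]",
    "fl": "id,title,publication_date",                       # fields to return
    "rows": 10,
    "wt": "json",
}
query_url = "https://solr.example.org/solr/gni/select?" + urlencode(params)
print(query_url)
```

    The `fq` filter query is cached independently of the main query, which is the idiomatic way to constrain a date window in Solr.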

  16. Replication Data for: Mapping the landscape of geospatial data citations

    • search.dataone.org
    • borealisdata.ca
    Updated Dec 18, 2024
    Leahey, Amber; Genzinger, Peter (2024). Replication Data for: Mapping the landscape of geospatial data citations [Dataset]. http://doi.org/10.5683/SP2/JDLRJP
    Explore at:
    Dataset updated
    Dec 18, 2024
    Dataset provided by
    Borealis
    Authors
    Leahey, Amber; Genzinger, Peter
    Time period covered
    Jan 1, 2015 - Jan 1, 2018
    Description

    This data supports the paper entitled "Mapping the landscape of geospatial data citations". The dataset covers geospatial data-intensive research papers published between 2015-2018 retrieved using Scopus. The articles' citations were assessed for data citation occurrences and coded using a data citation classification. Data were enhanced and linked to subject coverage and journal policy status information using Excel & SPSS. For more information about how the data were created and coded please review the 'Methodology' section of the paper. More information is provided below, including supplemental documentation and related publications.

    Abstract (paper): Data citations, similar to article and other research citations, are important references to research data that underlie published research results. In support of open science directives, these citations must adhere to specific conventions in terms of consistency of both placement within an article, and the actual availability or access to research data. To better understand the level to which geospatial research data are currently cited, we undertook a study to analyse the rate of data citation within a set of data-intensive geospatial research articles. After analysing 1717 scholarly articles published between 2015 and 2018, we found that very few, or 78 (5%), meaningfully cited primary or secondary geospatial data sources in the cited references section of the article. Even fewer researchers, only 25 or 1.5%, were found to have cited data using a DOI. Given the relatively low data citation rate, a focus on contributing factors, including barriers to citing geospatial data, is needed. And while open sharing requirements for geospatial data may change over time, driving data citation as a result, understanding benchmarks for data citation for monitoring purposes is useful.

  17. Career promotions, research publications, Open Access dataset

    • ordo.open.ac.uk
    zip
    Updated Feb 28, 2022
    Matteo Cancellieri; Nancy Pontika; David Pride; Petr Knoth; Hannah Metzler; Antonia Correia; Helene Brinken; Bikash Gyawali (2022). Career promotions, research publications, Open Access dataset [Dataset]. http://doi.org/10.21954/ou.rd.19228785.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Feb 28, 2022
    Dataset provided by
    The Open University
    Authors
    Matteo Cancellieri; Nancy Pontika; David Pride; Petr Knoth; Hannah Metzler; Antonia Correia; Helene Brinken; Bikash Gyawali
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is a compilation of processed data on citations and references for research papers, including author, institution and open access information for a selected sample of academics, analysed using Microsoft Academic Graph (MAG) data and CORE. The data for this dataset was collected during December 2019 to January 2020. Six countries (Austria, Brazil, Germany, India, Portugal, United Kingdom and United States) were the focus of the six questions which make up this dataset. There is one CSV file per country and per question (36 files in total). More details about the creation of this dataset are available in the public ON-MERRIT D3.1 deliverable report.

    The dataset combines two data sources: one part is a dataset created by analysing promotion policies across the target countries, while the second part is a set of data points used to understand publishing behaviour. To facilitate the analysis the dataset is organised in the following seven folders:

    PRT: The file "PRT_policies.csv" contains the information extracted from promotion, review and tenure (PRT) policies.

    Q1: What % of papers coming from a university are Open Access?
    • Dataset name format: oa_status_countryname_papers.csv
    • Contents: Open Access (OA) status of all papers of all the universities listed in the Times Higher Education World University Rankings (THEWUR) for the given country. A paper is marked OA if at least one OA link is available. OA links are collected using the CORE Discovery API.
    • Considerations: Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors belong to. CORE Discovery does not contain entries for all paper IDs in MAG, so some records will have neither a true nor a false value for the is_OA field. Only records marked true for is_OA can be said to be OA; the others are of unknown status (i.e. not necessarily closed access).

    Q2: How are papers published by the selected universities distributed across the three scientific disciplines of our choice?
    • Dataset name format: fsid_countryname_papers.csv
    • Contents: For the given country, all papers for all the universities listed in THEWUR, with the field of study they belong to.
    • Considerations: MAG can associate a paper with multiple fieldofstudyid values; if a paper belongs to more than one, a separate record was created for each. MAG assigns each fieldofstudyid a score; only records with a score above 0.5 are preserved. Papers with multiple authorship are counted once towards each distinct institution, and papers with authorship from multiple universities are counted once towards each university concerned.

    Q3: What is the gender distribution in authorship of papers published by the universities?
    • Dataset name format: author_gender_countryname_papers.csv
    • Contents: All papers with their author names for all the universities listed in THEWUR.
    • Considerations: When a paper has multiple collaborators, only the records for collaborators from within the selected universities are preserved. An external script was executed to determine the gender of the authors; the script is available here.

    Q4: Distribution of staff seniority (the number of years from first publication to last publication) in the given university.
    • Dataset name format: author_ids_countryname_papers.csv
    • Contents: For a given country, all papers for authors with their publication year for all the universities listed in THEWUR.
    • Considerations: When a paper has multiple collaborators, only the records for collaborators from within the selected universities are preserved. Staff seniority can be calculated in various ways; the most straightforward is academic_age = MAX(year) - MIN(year) for each authorid.

    Q5: Citation counts (incoming) for OA vs non-OA papers published by the university.
    • Dataset name format: cc_oa_countryname_papers.csv
    • Contents: OA status and OA links for all papers of all the universities listed in THEWUR and, for each paper, the count of incoming citations available in MAG.
    • Considerations: CORE Discovery was used to establish the OA status of papers. Papers with multiple authorship are preserved only once towards each distinct institution. Only records marked true for is_OA can be said to be OA; the others are of unknown status (i.e. not necessarily closed access).

    Q6: Count of OA vs non-OA references (outgoing) for all papers published by universities.
    • Dataset name format: rc_oa_countryname_papers.csv
    • Contents: Counts of all OA and unknown-status papers referenced by all papers published by all the universities listed in THEWUR.
    • Considerations: CORE Discovery was used to establish the OA status of the referenced papers. Papers with multiple authorship are counted once towards each distinct institution, and papers with authorship from multiple universities are counted once towards each university concerned.

    Additional files:
    • fieldsofstudy_mag.csv: a dump of the fieldsofstudy table of MAG, mapping each id to its actual field of study name.
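
    The academic_age definition from Q4 reduces to a small group-by, sketched below on made-up rows (the author IDs and years are invented; only the column layout follows the description above).

```python
from collections import defaultdict

# Invented (author_id, publication_year) rows, shaped like the
# author_ids_<country>_papers.csv files described above.
rows = [
    ("a1", 2005), ("a1", 2012), ("a1", 2019),
    ("a2", 2018), ("a2", 2018),
    ("a3", 2010),
]

years = defaultdict(list)
for author_id, year in rows:
    years[author_id].append(year)

# academic_age = MAX(year) - MIN(year) per author, as defined in Q4
academic_age = {a: max(ys) - min(ys) for a, ys in years.items()}
print(academic_age)  # {'a1': 14, 'a2': 0, 'a3': 0}
```

    An author with a single paper, or several papers in one year, gets an academic age of zero under this definition.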

  18. Data from: U.S. Geological Survey Data Citation Analysis, 2016-2022

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 20, 2025
    U.S. Geological Survey (2025). U.S. Geological Survey Data Citation Analysis, 2016-2022 [Dataset]. https://catalog.data.gov/dataset/u-s-geological-survey-data-citation-analysis-2016-2022
    Explore at:
    Dataset updated
    Nov 20, 2025
    Dataset provided by
    United States Geological Surveyhttp://www.usgs.gov/
    Description

    In 2022, publication and data linkages were evaluated using two methods in an effort to understand how a data citation workflow has been implemented by the U.S. Geological Survey (USGS) since the 2016 USGS instructional memorandum, Public Access to Results of Federally Funded Research at the U.S. Geological Survey: Scholarly Publications and Digital Data (USGS OSQI, 2016), went into effect, requiring USGS data be assigned a DOI, be accompanied by a citation, and be referenced from the associated publication (USGS OSQI, 2017). This data release includes data and publication structural metadata results retrieved from the USGS DOI Tool and Crossref APIs and Jupyter notebooks used to process and analyze the results.

  19. Author survey data about bibliometrics and altmetrics for open access...

    • researchdata.se
    • demo.researchdata.se
    • +1more
    Updated Jun 5, 2019
    Sofie Wennström; Gabor Schubert; Jeroen Sondervan; Graham Stone (2019). Author survey data about bibliometrics and altmetrics for open access monographs – including data about online usage and citations of academic books from Stockholm University Press [Dataset]. http://doi.org/10.17045/STHLMUNI.8051717
    Explore at:
    Dataset updated
    Jun 5, 2019
    Dataset provided by
    Stockholm University
    Authors
    Sofie Wennström; Gabor Schubert; Jeroen Sondervan; Graham Stone
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset includes a file with results from a survey sent to authors of open access monographs. The survey was available during March–April 2019 and the results are analysed in a paper presented at the 2019 Elpub conference on Jun 2–4 in Marseille, France entitled 'The significant difference in impact – an exploratory study about the meaning and value of metrics for open access monographs'. Version 2 of the dataset has been updated with the slides presented at the conference and the link to the full paper published in the French open archive HAL.

    The respondents of the survey were asked to comment on assumptions about bibliometrics and altmetrics currently in practice, and to think about the meaning of such data in relation to their experiences as authors of books published in a digital format and with an open license (i.e. a Creative Commons license). The survey questionnaire is included as a separate text document. The dataset also includes measures of the usage of open access books published by Stockholm University Press, including information about online usage, mentions in social media and citations. This data is collected from the publisher's platform and the Altmetric.com database, and citation data was collected from Dimensions, Google Scholar, Web of Science and CrossRef. The data was collected in February 2019, except for the figures from the OAPEN Library database, which were collected in November 2018. The paper, including the analysis of these data, is to be published in the Elpub Digital Library. The tables included in the dataset may vary slightly from those in the published paper, due to space constraints in the published version.

  20. OA vs NOA Citation Ratios

    • zenodo.org
    zip
    Updated Jul 6, 2025
    Stefan Fröse; Stefan Fröse (2025). OA vs NOA Citation Ratios [Dataset]. http://doi.org/10.5281/zenodo.15820075
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jul 6, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Stefan Fröse; Stefan Fröse
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Automated monthly plot of citation ratios using OpenAlex data. This includes a PDF visualization and supporting CSV generated from OpenAlex (CC0) data.
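
    An OA vs non-OA citation ratio of this kind can be computed from OpenAlex works, which expose a cited_by_count field and an open_access.is_oa filter. The sketch below shows one plausible reduction; the query parameters and the count values are illustrative assumptions, not the record's actual pipeline or real API responses.

```python
from urllib.parse import urlencode

# open_access.is_oa and cited_by_count are real OpenAlex concepts;
# the exact aggregation used by this record is an assumption here.
base = "https://api.openalex.org/works"
oa_url = base + "?" + urlencode(
    {"filter": "open_access.is_oa:true,publication_year:2020", "per-page": 200})

# Illustrative per-work cited_by_count values (not real API responses):
oa_counts = [12, 0, 7, 31, 5]    # citation counts of sampled OA works
noa_counts = [4, 0, 2, 9, 1]     # citation counts of sampled non-OA works

oa_mean = sum(oa_counts) / len(oa_counts)
noa_mean = sum(noa_counts) / len(noa_counts)
ratio = oa_mean / noa_mean       # >1 means OA works are cited more on average
print(f"OA / non-OA mean-citation ratio: {ratio:.2f}")
```

    Repeating this computation each month over fresh OpenAlex snapshots yields a time series of ratios like the one plotted in the record's PDF.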
