100+ datasets found
  1. d

    Data from: Data reuse and the open data citation advantage

    • datadryad.org
    • search.dataone.org
    • +2more
    zip
    Updated Oct 1, 2013
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Heather A. Piwowar; Todd J. Vision (2013). Data reuse and the open data citation advantage [Dataset]. http://doi.org/10.5061/dryad.781pv
    Explore at:
    zipAvailable download formats
    Dataset updated
    Oct 1, 2013
    Dataset provided by
    Dryad
    Authors
    Heather A. Piwowar; Todd J. Vision
    Time period covered
    2013
    Description

    Background: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations th...

  2. COCI CSV dataset of all the citation data

    • figshare.com
    bin
    Updated May 30, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    OpenCitations ​ (2023). COCI CSV dataset of all the citation data [Dataset]. http://doi.org/10.6084/m9.figshare.6741422.v19
    Explore at:
    binAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    OpenCitations ​
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains all the citation data (in CSV format) included in COCI, released on 23 January 2023. In particular, each line of the CSV file defines a citation, and includes the following information:

    [field "oci"] the Open Citation Identifier (OCI) for the citation; [field "citing"] the DOI of the citing entity; [field "cited"] the DOI of the cited entity; [field "creation"] the creation date of the citation (i.e. the publication date of the citing entity); [field "timespan"] the time span of the citation (i.e. the interval between the publication date of the cited entity and the publication date of the citing entity); [field "journal_sc"] it records whether the citation is a journal self-citations (i.e. the citing and the cited entities are published in the same journal); [field "author_sc"] it records whether the citation is an author self-citation (i.e. the citing and the cited entities have at least one author in common).

    This version of the dataset contains:

    1,463,920,523 citations; 77,045,952 bibliographic resources.

    The size of the zipped archive is 37.5 GB, while the size of the unzipped CSV file is 238.5 GB.

    Additional information about COCI can be found at the official webpage.

  3. Citations to software and data in Zenodo via open sources

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Stephanie van de Sandt; Stephanie van de Sandt; Alex Ioannidis; Alex Ioannidis; Lars Holm Nielsen; Lars Holm Nielsen (2020). Citations to software and data in Zenodo via open sources [Dataset]. http://doi.org/10.5281/zenodo.3482927
    Explore at:
    csvAvailable download formats
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Stephanie van de Sandt; Stephanie van de Sandt; Alex Ioannidis; Alex Ioannidis; Lars Holm Nielsen; Lars Holm Nielsen
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    In January 2019, the Asclepias Broker harvested citation links to Zenodo objects from three discovery systems: the NASA Astrophysics Datasystem (ADS), Crossref Event Data and Europe PMC. Each row of our dataset represents one unique link between a citing publication and a Zenodo DOI. Both endpoints are described by basic metadata. The second dataset contains usage metrics for every cited Zenodo DOI of our data sample.

  4. d

    Police Data: Traffic Citations

    • catalog.data.gov
    • data.somervillema.gov
    • +1more
    Updated Jul 12, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    data.somervillema.gov (2025). Police Data: Traffic Citations [Dataset]. https://catalog.data.gov/dataset/police-data-traffic-citations
    Explore at:
    Dataset updated
    Jul 12, 2025
    Dataset provided by
    data.somervillema.gov
    Description

    This complete version of the dataset contains traffic citations issued in Somerville by Somerville police officers since 2017. Citations include both written warnings and those with a monetary fine. Every citation is composed of one or more violations. Each row in the dataset represents a violation. This data set should be refreshed daily with data appearing with a one-month delay (e.g. citations issued on 1/1 will appear on 2/1). If a daily update does not refresh, please email data@somervillema.gov.

  5. o

    Citation Knowledge with Section and Context

    • ordo.open.ac.uk
    zip
    Updated May 5, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anita Khadka (2020). Citation Knowledge with Section and Context [Dataset]. http://doi.org/10.21954/ou.rd.11346848.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 5, 2020
    Dataset provided by
    The Open University
    Authors
    Anita Khadka
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset contains information from scientific publications written by authors who have published papers in the RecSys conference. It contains four files which have information extracted from scientific publications. The details of each file are explained below:i) all_authors.tsv: This file contains the details of authors who published research papers in the RecSys conference. The details include authors' identifier in various forms, such as number, orcid id, dblp url, dblp key and google scholar url, authors' first name, last name and their affiliation (where they work)ii) all_publications.tsv: This file contains the details of publications authored by the authors mentioned in the all_authors.tsv file (Please note the list of publications does not contain all the authored publications of the authors, refer to the publication for further details).The details include publications' identifier in different forms (such as number, dblp key, dblp url, dblp key, google scholar url), title, filtered title, published date, published conference and paper abstract.iii) selected_author_publications-information.tsv: This file consists of identifiers of authors and their publications. Here, we provide the information of selected authors and their publications used for our experiment.iv) selected_publication_citations-information.tsv: This file contains the information of the selected publications which consists of both citing and cited papers’ information used in our experiment. It consists of identifier of citing paper, identifier of cited paper, citation title, citation filtered title, the sentence before the citation is mentioned, citing sentence, the sentence after the citation is mentioned, citation position (section).Please note, it does not contain information of all the citations cited in the publications. For more detail, please refer to the paper.This dataset is for the use of research purposes only and if you use this dataset, please cite our paper "Capturing and exploiting citation knowledge for recommending recently published papers" due to be published in Web2Touch track 2020 (not yet published).

  6. P

    Data from: Citeseer Dataset

    • paperswithcode.com
    • huggingface.co
    Updated Mar 4, 2007
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    C. Lee Giles; Kurt D. Bollacker; Steve Lawrence (2007). Citeseer Dataset [Dataset]. https://paperswithcode.com/dataset/citeseer
    Explore at:
    Dataset updated
    Mar 4, 2007
    Authors
    C. Lee Giles; Kurt D. Bollacker; Steve Lawrence
    Description

    The CiteSeer dataset consists of 3312 scientific publications classified into one of six classes. The citation network consists of 4732 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 3703 unique words.

  7. OpenCitations Index N-Triples dataset of all the citation data

    • figshare.com
    zip
    Updated Mar 27, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    OpenCitations ​ (2025). OpenCitations Index N-Triples dataset of all the citation data [Dataset]. http://doi.org/10.6084/m9.figshare.24369136.v4
    Explore at:
    zipAvailable download formats
    Dataset updated
    Mar 27, 2025
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    OpenCitations ​
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains all the citation data (in N-Triples format) included in the OpenCitations Index, released on March 24, 2025. In particular, any citation in the dataset, defined as an individual of the class cito:Citation, includes the following information:[citation IRI] the Open Citation Identifier (OCI) for the citation, defined in the final part of the URL identifying the citation (https://w3id.org/oc/index/ci/[OCI]);[property "cito:hasCitingEntity"] the citing entity identified by its OMID URL (https://https://opencitations.net/meta/[OMID]);[property "cito:hasCitedEntity"] the cited entity identified by its OMID URL (https://https://opencitations.net/meta/[OMID]);[property "cito:hasCitationCreationDate"] the creation date of the citation (i.e. the publication date of the citing entity);[property "cito:hasCitationTimeSpan"] the time span of the citation (i.e. the interval between the publication date of the cited entity and the publication date of the citing entity);[type "cito:JournalSelfCitation"] it records whether the citation is a journal self-citations (i.e. the citing and the cited entities are published in the same journal);[type "cito:AuthorSelfCitation"] it records whether the citation is an author self-citation (i.e. the citing and the cited entities have at least one author in common).Note: the information for each citation is sourced from OpenCitations Meta (https://opencitations.net/meta), a database that stores and delivers bibliographic metadata for all bibliographic resources included in the OpenCitations Indexes. The data provided in this dump is therefore based on the state of OpenCitations Meta at the time this collection was generated.This version of the dataset contains:2,155,497,918 citationsThe size of the zipped archive is 80.6 GB, while the size of the unzipped N-Triples files is 1.9 TB.

  8. Data Citation Corpus Data File

    • redivis.com
    application/jsonl +7
    Updated Nov 8, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Redivis Demo Organization (2024). Data Citation Corpus Data File [Dataset]. https://redivis.com/datasets/am5t-e9jvcn6s5
    Explore at:
    sas, avro, parquet, spss, csv, arrow, application/jsonl, stataAvailable download formats
    Dataset updated
    Nov 8, 2024
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    Jan 1, 1839 - Oct 1, 2115
    Description

    Abstract

    Data file for the first release of the Data Citation Corpus, produced by DataCite and Make Data Count as part of an ongoing grant project funded by the Wellcome Trust.

    Methodology

    The original data files includes 10,006,058 data citation records in JSON and CSV formats. The JSON file is the version of record.

    All CSVs from version 1.1 of the corpus have been uploaded to Redivis and combined into a single table.

    The data citations in the file originate from DataCite Event Data and a project by Chan Zuckerberg Initiative (CZI) to identify mentions to datasets in the full text of articles.

    Each data citation record is comprised of:

    • A pair of identifiers: An identifier for the dataset (a DOI or an accession number) and the DOI of the publication object (journal article or preprint) in which the dataset is cited

    • Metadata for the cited dataset and for the citing publication object

    %3C!-- --%3E

  9. A dataset from a survey investigating disciplinary differences in data...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, pdf, txt
    Updated Jul 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anton Boudreau Ninkov; Anton Boudreau Ninkov; Chantal Ripp; Chantal Ripp; Kathleen Gregory; Kathleen Gregory; Isabella Peters; Isabella Peters; Stefanie Haustein; Stefanie Haustein (2024). A dataset from a survey investigating disciplinary differences in data citation [Dataset]. http://doi.org/10.5281/zenodo.7853477
    Explore at:
    txt, pdf, bin, csvAvailable download formats
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Anton Boudreau Ninkov; Anton Boudreau Ninkov; Chantal Ripp; Chantal Ripp; Kathleen Gregory; Kathleen Gregory; Isabella Peters; Isabella Peters; Stefanie Haustein; Stefanie Haustein
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    GENERAL INFORMATION

    Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation

    Date of data collection: January to March 2022

    Collection instrument: SurveyMonkey

    Funding: Alfred P. Sloan Foundation


    SHARING/ACCESS INFORMATION

    Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license

    Links to publications that cite or use the data:

    Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437

    Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data:
    A survey investigating disciplinary differences in data citation.
    Zenodo. https://doi.org/10.5281/zenodo.7555266


    DATA & FILE OVERVIEW

    File List

    • Filename: MDCDatacitationReuse2021Codebookv2.pdf
      Codebook
    • Filename: MDCDataCitationReuse2021surveydatav2.csv
      Dataset format in csv
    • Filename: MDCDataCitationReuse2021surveydatav2.sav
      Dataset format in SPSS
    • Filename: MDCDataCitationReuseSurvey2021QNR.pdf
      Questionnaire

    Additional related data collected that was not included in the current data package: Open ended questions asked to respondents


    METHODOLOGICAL INFORMATION

    Description of methods used for collection/generation of data:

    The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.

    Received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final total contains 2,492 complete responses and an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails and opt-outs (n=5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).

    Methods for processing the data:

    Results were downloaded from SurveyMonkey in CSV format and were prepared for analysis using Excel and SPSS by recoding ordinal and multiple choice questions and by removing missing values.

    Instrument- or software-specific information needed to interpret the data:

    The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available in a coded format in CSV. The Codebook is required to interpret to values.


    DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata

    Number of variables: 95

    Number of cases/rows: 2,492

    Missing data codes: 999 Not asked

    Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.

  10. Methodology data of "A qualitative and quantitative citation analysis toward...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jul 8, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ivan Heibi; Ivan Heibi; Silvio Peroni; Silvio Peroni (2022). Methodology data of "A qualitative and quantitative citation analysis toward retracted articles: a case of study" [Dataset]. http://doi.org/10.5281/zenodo.4323221
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jul 8, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Ivan Heibi; Ivan Heibi; Silvio Peroni; Silvio Peroni
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This document contains the datasets and visualizations generated after the application of the methodology defined in our work: "A qualitative and quantitative citation analysis toward retracted articles: a case of study". The methodology defines a citation analysis of the Wakefield et al. [1] retracted article from a quantitative and qualitative point of view. The data contained in this repository are based on the first two steps of the methodology. The first step of the methodology (i.e. “Data gathering”) builds an annotated dataset of the citing entities, this step is largely discussed also in [2]. The second step (i.e. "Topic Modelling") runs a topic modeling analysis on the textual features contained in the dataset generated by the first step.

    Note: the data are all contained inside the "method_data.zip" file. You need to unzip the file to get access to all the files and directories listed below.

    Data gathering

    The data generated by this step are stored in "data/":

    1. "cits_features.csv": a dataset containing all the entities (rows in the CSV) which have cited the Wakefield et al. retracted article, and a set of features characterizing each citing entity (columns in the CSV). The features included are: DOI ("doi"), year of publication ("year"), the title ("title"), the venue identifier ("source_id"), the title of the venue ("source_title"), yes/no value in case the entity is retracted as well ("retracted"), the subject area ("area"), the subject category ("category"), the sections of the in-text citations ("intext_citation.section"), the value of the reference pointer ("intext_citation.pointer"), the in-text citation function ("intext_citation.intent"), the in-text citation perceived sentiment ("intext_citation.sentiment"), and a yes/no value to denote whether the in-text citation context mentions the retraction of the cited entity ("intext_citation.section.ret_mention").
      Note: this dataset is licensed under a Creative Commons public domain dedication (CC0).
    2. "cits_text.csv": this dataset stores the abstract ("abstract") and the in-text citations context ("intext_citation.context") for each citing entity identified using the DOI value ("doi").
      Note: the data keep their original license (the one provided by their publisher). This dataset is provided in order to favor the reproducibility of the results obtained in our work.

    Topic modeling
    We run a topic modeling analysis on the textual features gathered (i.e. abstracts and citation contexts). The results are stored inside the "topic_modeling/" directory. The topic modeling has been done using MITAO, a tool for mashing up automatic text analysis tools, and creating a completely customizable visual workflow [3]. The topic modeling results for each textual feature are separated into two different folders, "abstracts/" for the abstracts, and "intext_cit/" for the in-text citation contexts. Both the directories contain the following directories/files:

    1. "mitao_workflows/": the workflows of MITAO. These are JSON files that could be reloaded in MITAO to reproduce the results following the same workflows.

    2. "corpus_and_dictionary/": it contains the dictionary and the vectorized corpus given as inputs for the LDA topic modeling.

    3. "coherence/coherence.csv": the coherence score of several topic models trained on a number of topics from 1 - 40.

    4. "datasets_and_views/": the datasets and visualizations generated using MITAO.

    References

    1. Wakefield, A., Murch, S., Anthony, A., Linnell, J., Casson, D., Malik, M., Berelowitz, M., Dhillon, A., Thomson, M., Harvey, P., Valentine, A., Davies, S., & Walker-Smith, J. (1998). RETRACTED: Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children. The Lancet, 351(9103), 637–641. https://doi.org/10.1016/S0140-6736(97)11096-0
    2. Heibi, I., & Peroni, S. (2020). A methodology for gathering and annotating the raw-data/characteristics of the documents citing a retracted article v1 (protocols.io.bdc4i2yw) [Data set]. In protocols.io. ZappyLab, Inc. https://doi.org/10.17504/protocols.io.bdc4i2yw

    3. Ferri, P., Heibi, I., Pareschi, L., & Peroni, S. (2020). MITAO: A User Friendly and Modular Software for Topic Modelling [JD]. PuntOorg International Journal, 5(2), 135–149. https://doi.org/10.19245/25.05.pij.5.2.3

  11. d

    Environmental Citations

    • catalog.data.gov
    • data.baltimorecity.gov
    • +3more
    Updated Nov 22, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Baltimore City (2024). Environmental Citations [Dataset]. https://catalog.data.gov/dataset/environmental-citations-01d16
    Explore at:
    Dataset updated
    Nov 22, 2024
    Dataset provided by
    Baltimore City
    Description

    This dataset represent Environmental citations that are often issued for violations of Baltimore City codes related to sanitation, health, safety, and other quality-of-life issues.

  12. Bibliographic data from the GBIF literature tracking service

    • zenodo.org
    zip
    Updated Mar 3, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    GBIF; GBIF (2020). Bibliographic data from the GBIF literature tracking service [Dataset]. http://doi.org/10.35000/mp8t-d323
    Explore at:
    zipAvailable download formats
    Dataset updated
    Mar 3, 2020
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    GBIF; GBIF
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is the result of a call to the GBIF resource API to gather bibliographic items citing GBIF or GBIF mobilised datasets, monitored through the GBIF literature tracking service.

    The zip contains a single JSON format datafile, which contains 9018 records.

    The URL used is: https://www.gbif.org/api/resource/search?contentType=literature&limit=10000

    It was accessed on: 2020-03-03.

  13. Citation Graph

    • kaggle.com
    Updated Jun 30, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Caselaw Access Project (2020). Citation Graph [Dataset]. https://www.kaggle.com/harvardlil/citation-graph/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 30, 2020
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Caselaw Access Project
    Description

    Context

    The Caselaw Access Project makes 40 million pages of U.S. caselaw freely available online from the collections of Harvard Law School Library.

    The CAP citation graph shows the connections between cases in the Caselaw Access Project dataset. You can use the citation graph to answer questions like "what is the most influential case?" and "what jurisdictions cite most often to this jurisdiction?".

    Learn More: https://case.law/download/citation_graph/

    Access Limits: https://case.law/api/#limits

    Content

    This dataset includes citations and metadata for the CAP citation graph in CSV format.

    Acknowledgements

    The Caselaw Access Project is by the Library Innovation Lab at Harvard Law School Library.

    Inspiration

    People are using CAP data to create research, applications, and more. We're sharing examples in our gallery.

    Cite Grid is the first visualization we've created based on data from our citation graph.

    Have something to share? We're excited to hear about it.

  14. S

    ESSD Data Citation Statistics (2014-2023)

    • scidb.cn
    Updated Mar 6, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lili ZHANG; Jiayi Hui; Ruilin LIU; Yike HU (2025). ESSD Data Citation Statistics (2014-2023) [Dataset]. http://doi.org/10.57760/sciencedb.12736
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 6, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Lili ZHANG; Jiayi Hui; Ruilin LIU; Yike HU
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is the supporting materials to a paper entitled "Characterizing Data Reusability through A Data Citation Framework- A Case Study on Earth System Science Data". The dataset is mainly used to support different analysis within this manuscript, particularly on citation demographics, citation intensity, citation aging, and citation neworks.

  15. Z

    Coronavirus Open Citations Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 16, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Peroni, SIlvio (2020). Coronavirus Open Citations Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3756802
    Explore at:
    Dataset updated
    May 16, 2020
    Dataset authored and provided by
    Peroni, SIlvio
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Coronavirus Open Citations Dataset curated by OpenCitations currently contains (as of 16 May 2020) information about 189,697 citations and about the 49,719 citing or cited articles involved in these citations. A subset of these data (stored in the "_partial.json" files introduced below) is used for creating the visualization available at https://opencitations.github.io/coronavirus/.

    Each item in both the JSON files storing citations ('citations_full.json' and 'citations_partial.json') contains the following fields:

    "id", a numeric identifier of the citation;

    "source", the citing entity;

    "target", the cited entity.

    Each item in both the JSON files storing article metadata ('metadata_full.json' and 'metadata_partial.json') contains the following fields:

    "id", the DOI of the article;

    "author", the surname of all the authors of the article;

    "year", the year of publication;

    "title", the title of the article;

    "source_title", the title of the venue where the article has been published.

    In addition, any item in the file 'metadata_partial.json' contains another field:

    "count": the overall number of citations that the article received.

  16. Data Citation Corpus

    • redivis.com
    Updated May 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Redivis Demo Organization (2025). Data Citation Corpus [Dataset]. https://redivis.com/datasets/am5t-e9jvcn6s5
    Explore at:
    Dataset updated
    May 13, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    Jan 1, 1839 - Oct 1, 2115
    Description

    The table Data Citation Corpus is part of the dataset Data Citation Corpus Data File, available at https://redivis.com/datasets/am5t-e9jvcn6s5. It contains 5256114 rows across 14 variables.

  17. Z

    Data from: BIP! NDR (NoDoiRefs): a dataset of citations from papers without...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 22, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Vergoulis, Thanasis (2024). BIP! NDR (NoDoiRefs): a dataset of citations from papers without DOIs in computer science conferences and workshops [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_7962019
    Explore at:
    Dataset updated
    Nov 22, 2024
    Dataset provided by
    Vergoulis, Thanasis
    Tryfonopoulos, Christos
    Chatzopoulos, Serafeim
    Koloveas, Paris
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    In the field of Computer Science, conference and workshop papers serve as important contributions, carrying substantial weight in research assessment processes, compared to other disciplines. However, a considerable number of these papers are not assigned a Digital Object Identifier (DOI), hence their citations are not reported in widely used citation datasets like OpenCitations and Crossref, raising limitations to citation analysis. While the Microsoft Academic Graph (MAG) previously addressed this issue by providing substantial coverage, its discontinuation has created a void in available data.

    BIP! NDR aims to alleviate this issue and enhance the research assessment processes within the field of Computer Science. To accomplish this, it leverages a workflow that identifies and retrieves Open Science papers lacking DOIs from the DBLP Corpus, and by performing text analysis, it extracts citation information directly from their full text. The current version of the dataset contains ~3.6M citations made by approximately 192.6K open access Computer Science conference or workshop papers that, according to DBLP, do not have a DOI. The DBLP snapshot used for this version was the one released on November 2024.

    File Structure:

    The dataset is formatted as a JSON Lines (JSONL) file (one JSON Object per line) to facilitate file splitting and streaming.

    Each JSON object has three main fields:

    “_id”: a unique identifier,

    “citing_paper”, the “dblp_id” of the citing paper,

    “cited_papers”: array containing the objects that correspond to each reference found in the text of the “citing_paper”; each object may contain the following fields:

    “dblp_id”: the “dblp_id” of the cited paper. Optional - this field is required if a “doi” is not present.

    “doi”: the doi of the cited paper. Optional - this field is required if a “dblp_id” is not present.

    “bibliographic_reference”: the raw citation string as it appears in the citing paper.

    Changes from previous version:

    Added more papers from DBLP.

  18. Number of NIH Funded papers with a link to a dataset - based on Data...

    • figshare.com
    txt
    Updated Aug 14, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mark Hahnel (2024). Number of NIH Funded papers with a link to a dataset - based on Data Citation Corpus [Dataset]. http://doi.org/10.6084/m9.figshare.26649703.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Aug 14, 2024
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Mark Hahnel
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    We at Digital Science have been looking at the Data Citation Corpus, to dig deeper into data citation counts.The first release is based on a seed file that includes data citations from the following sources:Data citations from DataCite and Crossref DOI metadata, via Event Data.Data citations from the CZI Science Knowledge Graph, identified via a Named Entity Recognition model algorithm that searches for mentions to datasets in the full text of journal articles and preprints in Europe PMC.So we are basically looking at papers that have a link to a DataCite DOI or accession number.By combining this dataset with Dimensions.ai data in Google Big Query, we we're able to add more dimensions to the dataset (pardon the pun), such as funder or institution. The Data Citation Corpus only gave us about 70% of the paper links that were resolvable DOIs. This should improve over time.This allows us to track how well things like the NIH open data policy is encouraging linking to datasets from papers.

  19. P

    DBLP Dataset

    • paperswithcode.com
    Updated Apr 13, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jie Tang; Jing Zhang; Limin Yao; Juanzi Li; Li Zhang; Zhong Su (2021). DBLP Dataset [Dataset]. https://paperswithcode.com/dataset/dblp
    Explore at:
    Dataset updated
    Apr 13, 2021
    Authors
    Jie Tang; Jing Zhang; Limin Yao; Juanzi Li; Li Zhang; Zhong Su
    Description

    The DBLP is a citation network dataset. The citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other sources. The first version contains 629,814 papers and 632,752 citations. Each paper is associated with abstract, authors, year, venue, and title. The data set can be used for clustering with network and side information, studying influence in the citation network, finding the most influential papers, topic modeling analysis, etc.

  20. I

    Cline Center Coup d’État Project Dataset

    • databank.illinois.edu
    Updated May 11, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Buddy Peyton; Joseph Bajjalieh; Dan Shalmon; Michael Martin; Emilio Soto (2025). Cline Center Coup d’État Project Dataset [Dataset]. http://doi.org/10.13012/B2IDB-9651987_V7
    Explore at:
    Dataset updated
    May 11, 2025
    Authors
    Buddy Peyton; Joseph Bajjalieh; Dan Shalmon; Michael Martin; Emilio Soto
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Coups d'Ètat are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy) the type of actor(s) who initiated the coup (i.e., military, rebels, etc.), as well as the fate of the deposed leader. Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 to a conspiracy. Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022. Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup. Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event. Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include: • Reconciling missing event data • Removing events with irreconcilable event dates • Removing events with insufficient sourcing (each event needs at least two sources) • Removing events that were inaccurately coded as coup events • Removing variables that fell below the threshold of inter-coder reliability required by the project • Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries • Extending the period covered from 1945-2005 to 1945-2019 • Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011)
    Items in this Dataset 1. Cline Center Coup d'État Codebook v.2.1.3 Codebook.pdf - This 15-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised February 2024 2. Coup Data v2.1.3.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1000 observations. Revised February 2024 3. Source Document v2.1.3.pdf - This 325-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. Revised February 2024 4. README.md - This file contains useful information for the user about the dataset. It is a text file written in markdown language. Revised February 2024
    Citation Guidelines 1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2024. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7 2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2024. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Heather A. Piwowar; Todd J. Vision (2013). Data reuse and the open data citation advantage [Dataset]. http://doi.org/10.5061/dryad.781pv

Data from: Data reuse and the open data citation advantage

Related Article
Explore at:
3 scholarly articles cite this dataset (View in Google Scholar)
zipAvailable download formats
Dataset updated
Oct 1, 2013
Dataset provided by
Dryad
Authors
Heather A. Piwowar; Todd J. Vision
Time period covered
2013
Description

Background: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations th...

Search
Clear search
Close search
Google apps
Main menu