100+ datasets found
  1. Citations to software and data in Zenodo via open sources

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jan 24, 2020
    Cite
    Stephanie van de Sandt; Alex Ioannidis; Lars Holm Nielsen (2020). Citations to software and data in Zenodo via open sources [Dataset]. http://doi.org/10.5281/zenodo.3482927
    Available download formats: csv
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Stephanie van de Sandt; Alex Ioannidis; Lars Holm Nielsen
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    In January 2019, the Asclepias Broker harvested citation links to Zenodo objects from three discovery systems: the NASA Astrophysics Data System (ADS), Crossref Event Data, and Europe PMC. Each row of our dataset represents one unique link between a citing publication and a Zenodo DOI. Both endpoints are described by basic metadata. The second dataset contains usage metrics for every cited Zenodo DOI of our data sample.

  2. POCI CSV dataset of all the citation data

    • figshare.com
    zip
    Updated Dec 27, 2022
    + more versions
    Cite
    OpenCitations (2022). POCI CSV dataset of all the citation data [Dataset]. http://doi.org/10.6084/m9.figshare.21776351.v1
    Available download formats: zip
    Dataset updated
    Dec 27, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    OpenCitations
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset contains all the citation data (in CSV format) included in POCI, released on 27 December 2022. In particular, each line of the CSV file defines a citation, and includes the following information:

    • [field "oci"] the Open Citation Identifier (OCI) for the citation;
    • [field "citing"] the PMID of the citing entity;
    • [field "cited"] the PMID of the cited entity;
    • [field "creation"] the creation date of the citation (i.e. the publication date of the citing entity);
    • [field "timespan"] the time span of the citation (i.e. the interval between the publication date of the cited entity and the publication date of the citing entity);
    • [field "journal_sc"] whether the citation is a journal self-citation (i.e. the citing and the cited entities are published in the same journal);
    • [field "author_sc"] whether the citation is an author self-citation (i.e. the citing and the cited entities have at least one author in common).

    This version of the dataset contains:

    717,654,703 citations; 26,024,862 bibliographic resources.

    The size of the zipped archive is 9.6 GB, while the size of the unzipped CSV file is 50 GB. Additional information about POCI is available on its official webpage.
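    Because the unzipped CSV is about 50 GB, reading it row by row is likely more practical than loading it whole. The sketch below streams rows and tallies self-citations. It assumes the header uses the field names listed above and that the two self-citation flags are encoded as "yes"/"no"; the sample rows are invented for illustration and are not taken from POCI.

```python
import csv
import io


def count_self_citations(lines):
    """Stream citation rows and tally journal and author self-citations."""
    reader = csv.DictReader(lines)
    totals = {"rows": 0, "journal_sc": 0, "author_sc": 0}
    for row in reader:
        totals["rows"] += 1
        # Assumption: the flags are stored as the strings "yes" / "no".
        for flag in ("journal_sc", "author_sc"):
            if row[flag].strip().lower() == "yes":
                totals[flag] += 1
    return totals


# Hypothetical sample rows (identifiers and dates are made up).
SAMPLE = """oci,citing,cited,creation,timespan,journal_sc,author_sc
0001-0002,1,2,2020-01-15,P1Y,no,yes
0003-0002,3,2,2021-03,P2Y2M,yes,no
0003-0001,3,1,2021-03,P1Y2M,no,no
"""

totals = count_self_citations(io.StringIO(SAMPLE))
print(totals)
```

    For the real file, the same function could be given an open file handle (for example one opened with newline="" as the csv module recommends) instead of the in-memory sample.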

  3. PLOS ONE publication and citation data

    • datadryad.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Jul 23, 2018
    Cite
    Alexander Petersen (2018). PLOS ONE publication and citation data [Dataset]. http://doi.org/10.6071/M39W8V
    Available download formats: zip
    Dataset updated
    Jul 23, 2018
    Dataset provided by
    Dryad
    Authors
    Alexander Petersen
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    2018
    Description

    Data enclosed in a single zipped folder:

    A) DASH-V2 : Data files for final published analysis (J. Informetrics, 2019)

    File A1: PubData_DOI_141986_Nc_0_2019.dta

    File A2: PubData_DOI_141986_Nc_0_2019_DOFILE

    B) DASH-V1 : Data files for preprint version (https://ssrn.com/abstract=2901272)

    File B1: PubData_Obs_102741_Nc_10_No2015_CitationsAnalysis.dta

    File B2: PubData_Obs_128734_Nc_10_AcceptanceTimeAnalysis.dta

    File B3: STATA13_DOFILE

    C) Data description common to all .dta files, which contain parsed and merged PLOS ONE and Web of Science metadata:

    File A3: UC-DASH_DataDescription_Petersen_V2.pdf

    File B4: UC-DASH_DataDescription_Petersen_V1.pdf

  4. Louisville Metro KY - Uniform Citation Data 2020

    • catalog.data.gov
    • data.lojic.org
    • +3 more
    Updated Apr 13, 2023
    Cite
    Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Uniform Citation Data 2020 [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-uniform-citation-data-2020
    Dataset updated
    Apr 13, 2023
    Dataset provided by
    Louisville/Jefferson County Information Consortium
    Area covered
    Kentucky, Louisville
    Description

    A list of all uniform citations from the Louisville Metro Police Department. The CSV file is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes.

    INCIDENT_NUMBER or CASE_NUMBER links these data sets together:

    • Crime Data
    • Uniform Citation Data
    • Firearm intake
    • LMPD hate crimes
    • Assaulted Officers

    CITATION_CONTROL_NUMBER links these data sets together:

    • Uniform Citation Data
    • LMPD Stops Data

    Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

    • AGENCY_DESC - the name of the department that issued the citation
    • CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms. It can be used to connect the dataset to INCIDENT_NUMBER in the following other datasets: 1. Crime Data, 2. Firearms intake, 3. LMPD hate crimes, 4. Assaulted Officers. NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example, in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes), which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
    • CITATION_YEAR - the year the citation was issued
    • CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
    • CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
    • CITATION_DATE - the date the citation was issued
    • CITATION_LOCATION - the location the citation was issued
    • DIVISION - the LMPD division in which the citation was issued
    • BEAT - the LMPD beat in which the citation was issued
    • PERSONS_SEX - the gender of the person who received the citation
    • PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
    • PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
    • PERSONS_AGE - the age of the person who received the citation
    • PERSONS_HOME_CITY - the city in which the person who received the citation lives
    • PERSONS_HOME_STATE - the state in which the person who received the citation lives
    • PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
    • VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
    • ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
    • STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
    • CHARGE_DESC - the description of the type of charge for the citation
    • UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
    • UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
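    The CASE_NUMBER / INCIDENT_NUMBER mismatch described above has to be normalized before the datasets can be joined. The sketch below is one possible way to do that. The 2-2-6 digit grouping is inferred from the single example in the description (8018013155 and 80-18-013155) and is an assumption, as are the function names.

```python
def case_to_incident_number(case_number: str) -> str:
    """Format a 10-digit CASE_NUMBER as a dashed INCIDENT_NUMBER.

    Assumption: the grouping is 2-2-6 digits, inferred from the one
    example given in the dataset description.
    """
    digits = case_number.strip()
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError(f"expected 10 digits, got {case_number!r}")
    return f"{digits[:2]}-{digits[2:4]}-{digits[4:]}"


def incident_to_case_number(incident_number: str) -> str:
    """Strip the dashes so the value can be compared with CASE_NUMBER."""
    return incident_number.strip().replace("-", "")


print(case_to_incident_number("8018013155"))
print(incident_to_case_number("80-18-013155"))
```

    Stripping dashes from INCIDENT_NUMBER is the safer direction for a join, since it does not depend on the assumed digit grouping.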

  5. Louisville Metro KY - Uniform Citation Data 2023

    • catalog.data.gov
    • data.louisvilleky.gov
    • +4 more
    Updated Apr 13, 2023
    Cite
    Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Uniform Citation Data 2023 [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-uniform-citation-data-2023
    Dataset updated
    Apr 13, 2023
    Dataset provided by
    Louisville/Jefferson County Information Consortium
    Area covered
    Kentucky, Louisville
    Description

    Note: Due to a system migration, this data will cease to update on March 14th, 2023. The current projection is to restart the updates within 30 days of the system migration, on or around April 13th, 2023.

    A list of all uniform citations from the Louisville Metro Police Department. The CSV file is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes.

    INCIDENT_NUMBER or CASE_NUMBER links these data sets together:

    • Crime Data
    • Uniform Citation Data
    • Firearm intake
    • LMPD hate crimes
    • Assaulted Officers

    CITATION_CONTROL_NUMBER links these data sets together:

    • Uniform Citation Data
    • LMPD Stops Data

    Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

    • AGENCY_DESC - the name of the department that issued the citation
    • CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms. It can be used to connect the dataset to INCIDENT_NUMBER in the following other datasets: 1. Crime Data, 2. Firearms intake, 3. LMPD hate crimes, 4. Assaulted Officers. NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example, in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes), which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
    • CITATION_YEAR - the year the citation was issued
    • CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
    • CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
    • CITATION_DATE - the date the citation was issued
    • CITATION_LOCATION - the location the citation was issued
    • DIVISION - the LMPD division in which the citation was issued
    • BEAT - the LMPD beat in which the citation was issued
    • PERSONS_SEX - the gender of the person who received the citation
    • PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
    • PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
    • PERSONS_AGE - the age of the person who received the citation
    • PERSONS_HOME_CITY - the city in which the person who received the citation lives
    • PERSONS_HOME_STATE - the state in which the person who received the citation lives
    • PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
    • VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
    • ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
    • STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
    • CHARGE_DESC - the description of the type of charge for the citation
    • UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
    • UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/

  6. iCite Database Snapshot 2021-11

    • nih.figshare.com
    bin
    Updated May 30, 2023
    + more versions
    Cite
    iCite; B. Ian Hutchins; George Santangelo; Ehsanul Haque (2023). iCite Database Snapshot 2021-11 [Dataset]. http://doi.org/10.35092/yhjc.17108351.v2
    Available download formats: bin
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    iCite; B. Ian Hutchins; George Santangelo; Ehsanul Haque
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This is a database snapshot of the iCite web service (provided here as a single zipped CSV file, or compressed, tarred JSON files). In addition, citation links in the NIH Open Citation Collection are provided as a two-column CSV table in open_citation_collection.zip. iCite provides bibliometrics and metadata on publications indexed in PubMed, organized into three modules:

    • Influence: Delivers metrics of scientific influence, field-adjusted and benchmarked to NIH publications as the baseline.
    • Translation: Measures how Human, Animal, or Molecular/Cellular Biology-oriented each paper is; tracks and predicts citation by clinical articles.
    • Open Cites: Disseminates link-level, public-domain citation data from the NIH Open Citation Collection.

    Definitions for individual data fields:

    • pmid: PubMed Identifier, an article ID as assigned in PubMed by the National Library of Medicine
    • doi: Digital Object Identifier, if available
    • year: Year the article was published
    • title: Title of the article
    • authors: List of author names
    • journal: Journal name (ISO abbreviation)
    • is_research_article: Flag indicating whether the Publication Type tags for this article are consistent with that of a primary research article
    • relative_citation_ratio: Relative Citation Ratio (RCR), OPA's metric of scientific influence. Field-adjusted, time-adjusted and benchmarked against NIH-funded papers. The median RCR for NIH-funded papers in any field is 1.0. An RCR of 2.0 means a paper is receiving twice as many citations per year as the median NIH-funded paper in its field and year, while an RCR of 0.5 means that it is receiving half as many citations per year. Calculation details are documented in Hutchins et al., PLoS Biol. 2016;14(9):e1002541.
    • provisional: RCRs for papers published in the previous two years are flagged as "provisional", to reflect that citation metrics for newer articles are not necessarily as stable as they are for older articles. Provisional RCRs are provided for papers published in the previous year if they have received 5 citations or more, despite being, in many cases, less than a year old. All papers published the year before the previous year receive provisional RCRs. The current year is considered to be the NIH Fiscal Year, which starts in October. For example, in July 2019 (NIH Fiscal Year 2019), papers from 2018 receive provisional RCRs if they have 5 citations or more, and all papers from 2017 receive provisional RCRs. In October 2019, at the start of NIH Fiscal Year 2020, papers from 2019 receive provisional RCRs if they have 5 citations or more and all papers from 2018 receive provisional RCRs.
    • citation_count: Number of unique articles that have cited this one
    • citations_per_year: Citations per year that this article has received since its publication. If this appeared as a preprint and a published article, the year from the published version is used as the primary publication date. This is the numerator for the Relative Citation Ratio.
    • field_citation_rate: Measure of the intrinsic citation rate of this paper's field, estimated using its co-citation network.
    • expected_citations_per_year: Citations per year that NIH-funded articles, with the same Field Citation Rate and published in the same year as this paper, receive. This is the denominator for the Relative Citation Ratio.
    • nih_percentile: Percentile rank of this paper's RCR compared to all NIH publications. For example, 95% indicates that this paper's RCR is higher than 95% of all NIH-funded publications.
    • human: Fraction of MeSH terms that are in the Human category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)
    • animal: Fraction of MeSH terms that are in the Animal category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)
    • molecular_cellular: Fraction of MeSH terms that are in the Molecular/Cellular Biology category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)
    • x_coord: X coordinate of the article on the Triangle of Biomedicine
    • y_coord: Y coordinate of the article on the Triangle of Biomedicine
    • is_clinical: Flag indicating that this paper meets the definition of a clinical article.
    • cited_by_clin: PMIDs of clinical articles that this article has been cited by.
    • apt: Approximate Potential to Translate is a machine learning-based estimate of the likelihood that this publication will be cited in later clinical trials or guidelines. Calculation details are documented in Hutchins et al., PLoS Biol. 2019;17(10):e3000416.
    • cited_by: PMIDs of articles that have cited this one.
    • references: PMIDs of articles in this article's reference list.

    Large CSV files are zipped using zip version 4.5, which is more recent than the default unzip command line utility in some common Linux distributions. These files can be unzipped with tools that support version 4.5 or later such as 7zip.

    Comments and questions can be addressed to iCite@mail.nih.gov
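    The rule for the "provisional" flag described above can be written out as a small function. This is only a sketch of the rule as worded in the description, not iCite's implementation. The function names are hypothetical, and the handling of papers published in the current fiscal year (returned as not provisional here) is an assumption, since the description does not cover that case.

```python
from datetime import date


def nih_fiscal_year(today: date) -> int:
    """NIH fiscal year starts in October, so October 2019 falls in FY 2020."""
    return today.year + 1 if today.month >= 10 else today.year


def has_provisional_rcr(pub_year: int, citation_count: int, today: date) -> bool:
    """Apply the provisional-RCR rule from the field description."""
    fiscal_year = nih_fiscal_year(today)
    if pub_year == fiscal_year - 2:
        # All papers from the year before the previous year are provisional.
        return True
    if pub_year == fiscal_year - 1:
        # Papers from the previous year need 5 citations or more.
        return citation_count >= 5
    # Older papers are not provisional. Newer papers are not covered by
    # the description; returning False for them is an assumption.
    return False


# The two worked examples from the description.
print(has_provisional_rcr(2018, 5, date(2019, 7, 1)))
print(has_provisional_rcr(2018, 0, date(2019, 10, 1)))
```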

  7. Louisville Metro KY - Uniform Citation Data 2022

    • catalog.data.gov
    • data.lojic.org
    • +3 more
    Updated Apr 13, 2023
    + more versions
    Cite
    Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Uniform Citation Data 2022 [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-uniform-citation-data-2022
    Dataset updated
    Apr 13, 2023
    Dataset provided by
    Louisville/Jefferson County Information Consortium
    Area covered
    Kentucky, Louisville
    Description

    Note: Due to a system migration, this data will cease to update on March 14th, 2023. The current projection is to restart the updates within 30 days of the system migration, on or around April 13th, 2023.

    A list of all uniform citations from the Louisville Metro Police Department. The CSV file is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes.

    INCIDENT_NUMBER or CASE_NUMBER links these data sets together:

    • Crime Data
    • Uniform Citation Data
    • Firearm intake
    • LMPD hate crimes
    • Assaulted Officers

    CITATION_CONTROL_NUMBER links these data sets together:

    • Uniform Citation Data
    • LMPD Stops Data

    Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

    • AGENCY_DESC - the name of the department that issued the citation
    • CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms. It can be used to connect the dataset to INCIDENT_NUMBER in the following other datasets: 1. Crime Data, 2. Firearms intake, 3. LMPD hate crimes, 4. Assaulted Officers. NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example, in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes), which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
    • CITATION_YEAR - the year the citation was issued
    • CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
    • CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
    • CITATION_DATE - the date the citation was issued
    • CITATION_LOCATION - the location the citation was issued
    • DIVISION - the LMPD division in which the citation was issued
    • BEAT - the LMPD beat in which the citation was issued
    • PERSONS_SEX - the gender of the person who received the citation
    • PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
    • PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
    • PERSONS_AGE - the age of the person who received the citation
    • PERSONS_HOME_CITY - the city in which the person who received the citation lives
    • PERSONS_HOME_STATE - the state in which the person who received the citation lives
    • PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
    • VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
    • ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
    • STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
    • CHARGE_DESC - the description of the type of charge for the citation
    • UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
    • UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/

  8. Data from: BIP! NDR (NoDoiRefs): a dataset of citations from papers without...

    • zenodo.org
    application/gzip
    Updated Feb 12, 2024
    + more versions
    Cite
    Paris Koloveas; Serafeim Chatzopoulos; Christos Tryfonopoulos; Thanasis Vergoulis (2024). BIP! NDR (NoDoiRefs): a dataset of citations from papers without DOIs in computer science conferences and workshops [Dataset]. http://doi.org/10.5281/zenodo.10651965
    Available download formats: application/gzip
    Dataset updated
    Feb 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Paris Koloveas; Serafeim Chatzopoulos; Christos Tryfonopoulos; Thanasis Vergoulis
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    In the field of Computer Science, conference and workshop papers serve as important contributions, carrying substantial weight in research assessment processes, compared to other disciplines. However, a considerable number of these papers are not assigned a Digital Object Identifier (DOI), hence their citations are not reported in widely used citation datasets like OpenCitations and Crossref, raising limitations to citation analysis. While the Microsoft Academic Graph (MAG) previously addressed this issue by providing substantial coverage, its discontinuation has created a void in available data.

    BIP! NDR aims to alleviate this issue and enhance the research assessment processes within the field of Computer Science. To accomplish this, it leverages a workflow that identifies and retrieves Open Science papers lacking DOIs from the DBLP Corpus, and by performing text analysis, it extracts citation information directly from their full text. The current version of the dataset contains ~2.9M citations made by approximately 171K open access Computer Science conference or workshop papers that, according to DBLP, do not have a DOI.

    File Structure:

    The dataset is formatted as a JSON Lines (JSONL) file (one JSON Object per line) to facilitate file splitting and streaming.

    Each JSON object has three main fields:

    • “_id”: a unique identifier,

    • “citing_paper”: the “dblp_id” of the citing paper,

    • “cited_papers”: array containing the objects that correspond to each reference found in the text of the “citing_paper”; each object may contain the following fields:

      • “dblp_id”: the “dblp_id” of the cited paper. Optional - this field is required if a “doi” is not present.

      • “doi”: the doi of the cited paper. Optional - this field is required if a “dblp_id” is not present.

      • “bibliographic_reference”: the raw citation string as it appears in the citing paper.
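    The record layout described above can be checked while streaming the file. The sketch below parses one JSON Lines record and enforces the rule that each cited paper carries a “dblp_id” or a “doi”. The function name and the sample record are hypothetical, and the identifier values are invented for illustration.

```python
import json


def parse_record(line: str) -> dict:
    """Parse one JSONL line and check the fields named in the description."""
    record = json.loads(line)
    for key in ("_id", "citing_paper", "cited_papers"):
        if key not in record:
            raise ValueError(f"missing field: {key}")
    for ref in record["cited_papers"]:
        # Each reference needs at least one of the two identifiers.
        if "dblp_id" not in ref and "doi" not in ref:
            raise ValueError("cited paper needs a dblp_id or a doi")
    return record


# Hypothetical record; none of these values come from the dataset.
SAMPLE_LINE = json.dumps({
    "_id": "example-1",
    "citing_paper": "conf/example/Citing20",
    "cited_papers": [
        {"doi": "10.0000/example", "bibliographic_reference": "A. Author. Title. 2019."},
        {"dblp_id": "conf/example/Cited18", "bibliographic_reference": "B. Author. Title. 2018."},
    ],
})

record = parse_record(SAMPLE_LINE)
identifiers = [ref.get("doi") or ref.get("dblp_id") for ref in record["cited_papers"]]
print(identifiers)
```

    With the real file, calling parse_record on each line of an open file handle would keep memory use flat, which is the streaming use case the JSONL format is meant to support.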

    Changes from previous version:

    • Added more papers from DBLP.
  9. GIANT: The 1-Billion Annotated Synthetic Bibliographic-Reference-String...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Dec 9, 2019
    Cite
    Mark Grennan; Martin Schibel; Andrew Collins; Joeran Beel (2019). GIANT: The 1-Billion Annotated Synthetic Bibliographic-Reference-String Dataset for Deep Citation Parsing [Data] [Dataset]. http://doi.org/10.7910/DVN/LXQXAO
    Dataset updated
    Dec 9, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    Mark Grennan; Martin Schibel; Andrew Collins; Joeran Beel
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/LXQXAO

    Description

    Extracting and parsing reference strings from research articles is a challenging task. State-of-the-art tools like GROBID apply rather simple machine learning models such as conditional random fields (CRF). Recent research has shown a high potential of deep learning for reference string parsing. The challenge with deep learning is, however, that the training step requires enormous amounts of labeled data, which do not exist for reference string parsing. Creating such a large dataset manually, through human labor, seems hardly feasible. Therefore, we created GIANT. GIANT is a large dataset with 991,411,100 XML-labeled reference strings. The strings were automatically created based on 677,000 entries from CrossRef, 1,500 citation styles in the citation-style language, and the citation processor citeproc-js. GIANT can be used to train machine learning models, particularly deep learning models, for citation parsing. While we have not yet tested GIANT for training such models, we hypothesise that the dataset will be able to significantly improve the accuracy of citation parsing. The dataset and the code to create it are freely available at https://github.com/BeelGroup/.

  10. Source Reference File

    • catalog.data.gov
    • data.wu.ac.at
    Updated Mar 8, 2025
    Cite
    Social Security Administration (2025). Source Reference File [Dataset]. https://catalog.data.gov/dataset/source-reference-file
    Dataset updated
    Mar 8, 2025
    Dataset provided by
    Social Security Administration (http://www.ssa.gov/)
    Description

    This file contains a national set of names and contact information for doctors, hospitals, clinics, and other facilities (known collectively as sources) from which medical evidence of record (MER) may be requested to support a claimant's disability application.

  11. Data from: On the Use of Context for Predicting Citation Worthiness of...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 9, 2021
    Cite
    Zhang, Haimin (2021). On the Use of Context for Predicting Citation Worthiness of Sentences in Scholarly Articles [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4651553
    Dataset updated
    Apr 9, 2021
    Dataset provided by
    Mahata, Debanjan
    Gosangi, Rakesh
    Gheisarieha, Mohsen
    Zhang, Haimin
    Arora, Ravneet
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The ACL-cite dataset was created for the paper: “On the Use of Context for Predicting Citation Worthiness of Sentences in Scholarly Articles” published in NAACL 2021. This dataset contains over 2.7 million sentences extracted from scholarly articles (from ACL Anthology [Bird et al.]) and their corresponding citation worthiness labels. The goal of the citation worthiness task is to determine whether a given sentence requires a citation.

    There are three CSV files in the dataset:

    train.csv: 1,625,268 rows

    dev.csv: 539,085 rows

    test.csv: 542,081 rows

    Each CSV file contains the following columns:

    document_id: identifier of the paper the sentence was extracted from

    section: name of the section the sentence was extracted from (e.g. Abstract, Introduction)

    section_id: sequential identifier of the section in the paper

    paragraph_id: sequential identifier of the paragraph the sentence was extracted from

    sentence: the sentence with the citations removed

    raw_sentence: the raw sentence including the citations

    sentence_id: sequential identifier of the sentence in the paper

    label: citation worthiness label

    Note: The train/dev/test splits are done at the document_id level.
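    Since the splits are made at the document_id level, no paper should contribute sentences to more than one file. The sketch below shows one way to check that. The column order in the header, the label encoding, and the sample rows are assumptions made for illustration and are not taken from the dataset.

```python
import csv
import io


def document_ids(lines) -> set:
    """Collect the distinct document_id values from one split."""
    return {row["document_id"] for row in csv.DictReader(lines)}


def splits_are_disjoint(*splits) -> bool:
    """Return True when no document_id appears in more than one split."""
    seen = set()
    for ids in splits:
        if seen & ids:
            return False
        seen |= ids
    return True


HEADER = "document_id,section,section_id,paragraph_id,sentence,raw_sentence,sentence_id,label\n"

# Hypothetical rows; the 0/1 label encoding is an assumption.
TRAIN = HEADER + (
    "P1,Abstract,0,0,We study parsing.,We study parsing.,0,0\n"
    "P1,Introduction,1,1,Prior work exists.,Prior work exists (Doe 2020).,1,1\n"
)
DEV = HEADER + "P2,Abstract,0,0,We study tagging.,We study tagging.,0,0\n"

train_ids = document_ids(io.StringIO(TRAIN))
dev_ids = document_ids(io.StringIO(DEV))
print(splits_are_disjoint(train_ids, dev_ids))
```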

  12. Louisville Metro KY - Uniform Citation Data (2016-2019)

    • data.lojic.org
    • s.cnmilf.com
    • +5 more
    Updated Jun 2, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Louisville/Jefferson County Information Consortium (2022). Louisville Metro KY - Uniform Citation Data (2016-2019) [Dataset]. https://data.lojic.org/datasets/louisville-metro-ky-uniform-citation-data-2016-2019/about
    Dataset updated
    Jun 2, 2022
    Dataset authored and provided by
    Louisville/Jefferson County Information Consortium
    License

    https://louisville-metro-opendata-lojic.hub.arcgis.com/pages/terms-of-use-and-license

    Area covered
    Kentucky, Louisville
    Description

    A list of all uniform citations from the Louisville Metro Police Department can be found in this link. The CSV file is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes.

    INCIDENT_NUMBER or CASE_NUMBER links these data sets together:
    • Crime Data
    • Uniform Citation Data
    • Firearm intake
    • LMPD hate crimes
    • Assaulted Officers

    CITATION_CONTROL_NUMBER links these data sets together:
    • Uniform Citation Data
    • LMPD Stops Data

    Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

    AGENCY_DESC - the name of the department that issued the citation
    CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms; it can be used to connect this dataset to INCIDENT_NUMBER in the following other datasets:
    1. Crime Data
    2. Firearms intake
    3. LMPD hate crimes
    4. Assaulted Officers
    NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes), which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
    CITATION_YEAR - the year the citation was issued
    CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
    CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
    CITATION_DATE - the date the citation was issued
    CITATION_LOCATION - the location the citation was issued
    DIVISION - the LMPD division in which the citation was issued
    BEAT - the LMPD beat in which the citation was issued
    PERSONS_SEX - the gender of the person who received the citation
    PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
    PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
    PERSONS_AGE - the age of the person who received the citation
    PERSONS_HOME_CITY - the city in which the person who received the citation lives
    PERSONS_HOME_STATE - the state in which the person who received the citation lives
    PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
    VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
    ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
    STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
    CHARGE_DESC - the description of the type of charge for the citation
    UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
    UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/

    The Louisville Metro Police Department (LMPD) began operations on January 6, 2003, as part of the creation of the consolidated city-county government in Louisville, Kentucky. It was formed by the merger of the Jefferson County Police Department and the Louisville Division of Police. The Louisville Metro Police Department is headed by Chief Jacquelyn Gwinn-Villaroel. LMPD divides Jefferson County into eight patrol divisions and operates a number of special investigative and support units.
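
    The description notes that CASE_NUMBER (no dashes) has to be reformatted before it can be matched against INCIDENT_NUMBER in the other datasets. A minimal sketch of that conversion follows. The 2-2-6 digit grouping and the fixed length of 10 digits are assumptions inferred from the single example given (8018013155 and 80-18-013155), and both function names are hypothetical.

```python
def case_to_incident_number(case_number: str) -> str:
    """Convert a dash-free CASE_NUMBER into the dashed INCIDENT_NUMBER form."""
    digits = str(case_number).strip()
    # Assumption: every CASE_NUMBER is exactly 10 digits, as in the one
    # example from the dataset description. Anything else is rejected.
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError(f"unexpected CASE_NUMBER format: {case_number!r}")
    # Assumed grouping: 2 digits, 2 digits, 6 digits.
    return f"{digits[:2]}-{digits[2:4]}-{digits[4:]}"


def incident_to_case_number(incident_number: str) -> str:
    """Strip the dashes from an INCIDENT_NUMBER to get the CASE_NUMBER form."""
    return str(incident_number).strip().replace("-", "")


print(case_to_incident_number("8018013155"))    # 80-18-013155
print(incident_to_case_number("80-18-013155"))  # 8018013155
```

    Normalizing one side before a join avoids silent mismatches. Rejecting values that do not fit the assumed pattern makes it easier to spot rows where the inferred format does not hold.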

  13. iCite Database Snapshot 2024-11

    • nih.figshare.com
    bin
    Updated Dec 16, 2024
    Cite
    iCite; B. Ian Hutchins; George Santangelo (2024). iCite Database Snapshot 2024-11 [Dataset]. http://doi.org/10.35092/yhjc28027859.v1
    Available download formats: bin
    Dataset updated
    Dec 16, 2024
    Dataset provided by
    The NIH Figshare Archive
    Authors
    iCite; B. Ian Hutchins; George Santangelo
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This is a database snapshot of the iCite web service (provided here as a single zipped CSV file, or compressed, tarred JSON files). In addition, citation links in the NIH Open Citation Collection are provided as a two-column CSV table in open_citation_collection.zip. iCite provides bibliometrics and metadata on publications indexed in PubMed, organized into three modules:
    • Influence: Delivers metrics of scientific influence, field-adjusted and benchmarked to NIH publications as the baseline.
    • Translation: Measures how Human, Animal, or Molecular/Cellular Biology-oriented each paper is; tracks and predicts citation by clinical articles.
    • Open Cites: Disseminates link-level, public-domain citation data from the NIH Open Citation Collection.

    Definitions for individual data fields:
    pmid: PubMed Identifier, an article ID as assigned in PubMed by the National Library of Medicine
    doi: Digital Object Identifier, if available
    year: Year the article was published
    title: Title of the article
    authors: List of author names
    journal: Journal name (ISO abbreviation)
    is_research_article: Flag indicating whether the Publication Type tags for this article are consistent with that of a primary research article
    relative_citation_ratio: Relative Citation Ratio (RCR), OPA's metric of scientific influence. Field-adjusted, time-adjusted and benchmarked against NIH-funded papers. The median RCR for NIH-funded papers in any field is 1.0. An RCR of 2.0 means a paper is receiving twice as many citations per year as the median NIH-funded paper in its field and year, while an RCR of 0.5 means that it is receiving half as many citations per year. Calculation details are documented in Hutchins et al., PLoS Biol. 2016;14(9):e1002541.
    provisional: RCRs for papers published in the previous two years are flagged as "provisional", to reflect that citation metrics for newer articles are not necessarily as stable as they are for older articles. Provisional RCRs are provided for papers published in the previous year if they have received 5 citations or more, despite being, in many cases, less than a year old. All papers published the year before the previous year receive provisional RCRs. The current year is considered to be the NIH Fiscal Year, which starts in October. For example, in July 2019 (NIH Fiscal Year 2019), papers from 2018 receive provisional RCRs if they have 5 citations or more, and all papers from 2017 receive provisional RCRs. In October 2019, at the start of NIH Fiscal Year 2020, papers from 2019 receive provisional RCRs if they have 5 citations or more, and all papers from 2018 receive provisional RCRs.
    citation_count: Number of unique articles that have cited this one
    citations_per_year: Citations per year that this article has received since its publication. If this appeared as a preprint and a published article, the year from the published version is used as the primary publication date. This is the numerator for the Relative Citation Ratio.
    field_citation_rate: Measure of the intrinsic citation rate of this paper's field, estimated using its co-citation network.
    expected_citations_per_year: Citations per year that NIH-funded articles, with the same Field Citation Rate and published in the same year as this paper, receive. This is the denominator for the Relative Citation Ratio.
    nih_percentile: Percentile rank of this paper's RCR compared to all NIH publications. For example, 95% indicates that this paper's RCR is higher than 95% of all NIH-funded publications.
    human: Fraction of MeSH terms that are in the Human category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)
    animal: Fraction of MeSH terms that are in the Animal category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)
    molecular_cellular: Fraction of MeSH terms that are in the Molecular/Cellular Biology category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)
    x_coord: X coordinate of the article on the Triangle of Biomedicine
    y_coord: Y coordinate of the article on the Triangle of Biomedicine
    is_clinical: Flag indicating that this paper meets the definition of a clinical article.
    cited_by_clin: PMIDs of clinical articles that this article has been cited by.
    apt: Approximate Potential to Translate is a machine learning-based estimate of the likelihood that this publication will be cited in later clinical trials or guidelines. Calculation details are documented in Hutchins et al., PLoS Biol. 2019;17(10):e3000416.
    cited_by: PMIDs of articles that have cited this one.
    references: PMIDs of articles in this article's reference list.

    Large CSV files are zipped using zip version 4.5, which is more recent than the default unzip command line utility in some common Linux distributions. These files can be unzipped with tools that support version 4.5 or later, such as 7zip.

    Comments and questions can be addressed to iCite@mail.nih.gov
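
    The field definitions above state that citations_per_year is the numerator of the RCR and expected_citations_per_year is the denominator, and they describe when an RCR is flagged as provisional. The sketch below only restates those two rules in code; it is not iCite's implementation. The function names and the returned status strings are hypothetical. Returning "none" for papers from the current fiscal year, and "final" for papers older than two years, are assumptions, since the description does not spell out those cases.

```python
from datetime import date


def relative_citation_ratio(citations_per_year: float,
                            expected_citations_per_year: float) -> float:
    """RCR as numerator / denominator, per the field definitions."""
    return citations_per_year / expected_citations_per_year


def nih_fiscal_year(today: date) -> int:
    """The NIH Fiscal Year starts in October, so Oct-Dec belong to the next year."""
    return today.year + 1 if today.month >= 10 else today.year


def rcr_status(pub_year: int, citation_count: int, today: date) -> str:
    """Classify a paper's RCR as 'provisional', 'final', or 'none'."""
    fiscal_year = nih_fiscal_year(today)
    if pub_year == fiscal_year - 1:
        # Previous year: provisional RCR only with 5 or more citations.
        return "provisional" if citation_count >= 5 else "none"
    if pub_year == fiscal_year - 2:
        # Year before the previous year: always provisional.
        return "provisional"
    if pub_year < fiscal_year - 2:
        # Assumption: older papers carry a non-provisional RCR.
        return "final"
    # Assumption: papers from the current fiscal year have no RCR yet.
    return "none"


print(relative_citation_ratio(4.0, 2.0))        # 2.0
print(rcr_status(2018, 5, date(2019, 7, 1)))    # provisional
print(rcr_status(2018, 0, date(2019, 10, 1)))   # provisional
```

    The two printed status calls mirror the July 2019 and October 2019 examples in the description.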

  14. Reference Manager Data Citation Analysis

    • zenodo.org
    txt, zip
    Updated Mar 14, 2025
    Cite
    Kristina Vrouwenvelder; Natalie Raia (2025). Reference Manager Data Citation Analysis [Dataset]. http://doi.org/10.5281/zenodo.14058399
    Available download formats: txt, zip
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kristina Vrouwenvelder; Natalie Raia
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    DESCRIPTION:

    This package contains data used to analyze citation metadata completeness and correctness for several common reference managers used in scholarly research and several common repositories in the Earth, space, and environmental sciences.

    METHODS:

    Data for 8 metadata fields (authors/creators, publisher, DOI, dataset title, version, access date, publication date, and resource type) were collected for both import and export from reference managers via all available import methods (app or wizard, and plugin) during summer 2024, using the most recent version of each software package. To encode the data, the citation information for each dataset as imported by the reference manager was compared to that registered for the DOI with DataCite. For each of the 8 fields, for both import and export, correct metadata was encoded as 0, incorrect as 1, and missing as '' or nan. See publication and software package for more information.
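
    The encoding described above (0 for correct, 1 for incorrect, '' or nan for missing) can be tallied with a few lines of code. This is only an illustrative sketch: the function names and the sample values are hypothetical, and the real column layout is documented in the package's README rather than here.

```python
import math
from collections import Counter


def classify(code) -> str:
    """Map one encoded cell to 'correct', 'incorrect', or 'missing'."""
    if code is None or code == "":
        return "missing"
    value = float(code)  # accepts 0, 1, "0", "1", "nan", float("nan")
    if math.isnan(value):
        return "missing"
    if value == 0:
        return "correct"
    if value == 1:
        return "incorrect"
    raise ValueError(f"unexpected code: {code!r}")


def summarize(codes) -> Counter:
    """Count how many cells fall into each category for one metadata field."""
    return Counter(classify(code) for code in codes)


# Hypothetical values for a single field across five datasets.
sample = ["0", "1", "", float("nan"), 0]
print(summarize(sample))
```

    Treating '' and nan as the same category follows the description, which lists both as the encoding for missing metadata.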

    FILES:

    FOLDER 'coded-data' contains files that include information (DOIs) about the data examined in this study, preserved copies of exported data citations used in the data interpretation and processing, and the processed data itself encoded in columns.

    FOLDER 'datacite-metadata-profiles' includes the raw metadata from each dataset DOI at the time of analysis, included for reproducibility purposes.

    See README file for more information.

  15. Data from: VIS - Fishes in estuarine waters in Flanders, Belgium

    • gbif.org
    • gimi9.com
    • +3 more
    Updated Apr 13, 2021
    Cite
    Jan Breine; Hugo Verreycken; Tom De Boeck; Dimitri Brosens; Peter Desmet (2021). VIS - Fishes in estuarine waters in Flanders, Belgium [Dataset]. http://doi.org/10.15468/estwpt
    Dataset updated
    Apr 13, 2021
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    Research Institute for Nature and Forest (INBO)
    Authors
    Jan Breine; Hugo Verreycken; Tom De Boeck; Dimitri Brosens; Peter Desmet
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    Apr 1, 1995 - Nov 27, 2012
    Description

    VIS - Fishes in estuarine waters in Flanders, Belgium is a species occurrence dataset published by the Research Institute for Nature and Forest (INBO) and described in Brosens et al. 2015 (https://doi.org/10.3897/zookeys.475.8556). The dataset contains over 70,000 fish occurrences sampled between 1992 and 2012 from almost 50 locations in the estuaries of the river Yser and the river Scheldt, in Flanders, Belgium. The dataset includes 69 fish species, as well as a number of non-target crustacean species. The data are retrieved from the Fish Information System (VIS), a database set up to monitor the status of fishes and their habitats in Flanders and are collected in support of the Water Framework Directive, the Habitat Directive, certain red lists, and biodiversity research. Additional information, such as measurements, absence information and abiotic data are available upon request. Issues with the dataset can be reported at https://github.com/LifeWatchINBO/data-publication/tree/master/datasets/vis-estuarine-occurrences

    Length and weight measurement data of the individual fishes, absence information, occurrence data since 2013, as well as abiotic data of the sampling points (pH, temperature, etc.) are not included in the Darwin Core Archive and are available upon request.

    To allow anyone to use this dataset, we have released the data to the public domain under a Creative Commons Zero waiver (http://creativecommons.org/publicdomain/zero/1.0/). We would appreciate it, however, if you read and follow these norms for data use (http://www.inbo.be/en/norms-for-data-use) and provide a link to the original dataset (https://doi.org/10.15468/estwpt) whenever possible. If you use these data for a scientific paper, please cite the dataset following the applicable citation norms and/or consider us for co-authorship. We are always interested to know how you have used or visualized the data, or to provide more information, so please contact us via the contact information provided in the metadata, opendata@inbo.be or https://twitter.com/LifeWatchINBO.

  16. Global Reference Tables Services

    • catalog.data.gov
    • data.amerigeoss.org
    • +1 more
    Updated Feb 5, 2025
    Cite
    Social Security Administration (2025). Global Reference Tables Services [Dataset]. https://catalog.data.gov/dataset/global-reference-tables-services
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Social Security Administration (http://www.ssa.gov/)
    Description

    This database is a collection of reference tables that store common information used throughout SSA but require an application or data service to access the data because of the complexity of the data or business logic required to utilize the data appropriately.

  17. Data from: Loopkevers Grensmaas - Ground beetles near the river Meuse in...

    • gbif.org
    • metadata.vlaanderen.be
    • +2 more
    Updated Apr 1, 2021
    Cite
    Stijn Vanacker; Dimitri Brosens; Peter Desmet (2021). Loopkevers Grensmaas - Ground beetles near the river Meuse in Flanders, Belgium [Dataset]. http://doi.org/10.15468/hy3pzl
    Dataset updated
    Apr 1, 2021
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    Research Institute for Nature and Forest (INBO)
    Authors
    Stijn Vanacker; Dimitri Brosens; Peter Desmet
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    Aug 25, 1998 - Oct 4, 1999
    Description

    Loopkevers Grensmaas - Ground beetles near the river Meuse in Flanders, Belgium is a species occurrence dataset published by the Research Institute for Nature and Forest (INBO). The dataset contains over 5,800 beetle occurrences sampled between 1998 and 1999 from 28 locations on the left bank (Belgium) of the river Meuse on the border between Belgium and the Netherlands. The dataset includes over 100 ground beetle species (Carabidae) and some non-target species. The data were used to assess the dynamics of the Grensmaas area and to help river management. Issues with the dataset can be reported at https://github.com/LifeWatchINBO/data-publication/tree/master/datasets/kevers-grensmaas-occurrences

    To allow anyone to use this dataset, we have released the data to the public domain under a Creative Commons Zero waiver (http://creativecommons.org/publicdomain/zero/1.0/). We would appreciate it, however, if you read and follow these norms for data use (http://www.inbo.be/en/norms-for-data-use) and provide a link to the original dataset (https://doi.org/10.15468/hy3pzl) whenever possible. If you use these data for a scientific paper, please cite the dataset following the applicable citation norms and/or consider us for co-authorship. We are always interested to know how you have used or visualized the data, or to provide more information, so please contact us via the contact information provided in the metadata, opendata@inbo.be or https://twitter.com/LifeWatchINBO.

  18. Core Reference Data

    • six-group.com
    Updated Apr 19, 2020
    Cite
    SIX Group (2020). Core Reference Data [Dataset]. https://www.six-group.com/en/products-services/financial-information/display-and-delivery-capabilities/sixflex/core-reference-data.html
    Dataset updated
    Apr 19, 2020
    Dataset provided by
    SIX Group
    Area covered
    Global
    Description

    Accurate, timely and complete Reference Data is vital for the efficient functioning of the financial ecosystem. The Core Reference Data service provides extensive information on a predefined set of data points across a broad range of asset classes to support the maintenance of a securities master database. The information on key data attributes enables compliance with regulatory and risk management requirements while maximizing operational efficiency.

  19. Automated Reference Toolset (ART)—Data

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Automated Reference Toolset (ART)—Data [Dataset]. https://catalog.data.gov/dataset/automated-reference-toolset-artdata
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    These environmental raster covariates, geospatial vector data, and tabular data were compiled as input data for the Automated Reference Toolset (ART) algorithm.

  20. DrugBank Database Data Package

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). DrugBank Database Data Package [Dataset]. https://www.johnsnowlabs.com/marketplace/drugbank-database-data-package/
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Description

    DrugBank Vocabulary contains information on DrugBank identifiers, names, and synonyms to permit easy linking and integration into any type of project. DrugBank is a richly annotated resource that combines detailed drug data with comprehensive drug target and drug action information. DrugBank is widely used to facilitate in silico drug target discovery, drug design, drug docking or screening, drug metabolism prediction, drug interaction prediction and general pharmaceutical education.
