100+ datasets found

Citations to software and data in Zenodo via open sources
zenodo.org
data.niaid.nih.gov
csv
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stephanie van de Sandt; Stephanie van de Sandt; Alex Ioannidis; Alex Ioannidis; Lars Holm Nielsen; Lars Holm Nielsen (2020). Citations to software and data in Zenodo via open sources [Dataset]. http://doi.org/10.5281/zenodo.3482927
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.3482927
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Stephanie van de Sandt; Stephanie van de Sandt; Alex Ioannidis; Alex Ioannidis; Lars Holm Nielsen; Lars Holm Nielsen
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
In January 2019, the Asclepias Broker harvested citation links to Zenodo objects from three discovery systems: the NASA Astrophysics Datasystem (ADS), Crossref Event Data and Europe PMC. Each row of our dataset represents one unique link between a citing publication and a Zenodo DOI. Both endpoints are described by basic metadata. The second dataset contains usage metrics for every cited Zenodo DOI of our data sample.
POCI CSV dataset of all the citation data
figshare.com
zip
Updated Dec 27, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
OpenCitations (2022). POCI CSV dataset of all the citation data [Dataset]. http://doi.org/10.6084/m9.figshare.21776351.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.21776351.v1
Dataset updated
Dec 27, 2022
Dataset provided by
Figsharehttp://figshare.com/
figshare
Authors
OpenCitations
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains all the citation data (in CSV format) included in POCI, released on 27 December 2022. In particular, each line of the CSV file defines a citation, and includes the following information:

[field "oci"] the Open Citation Identifier (OCI) for the citation; [field "citing"] the PMID of the citing entity; [field "cited"] the PMID of the cited entity; [field "creation"] the creation date of the citation (i.e. the publication date of the citing entity); [field "timespan"] the time span of the citation (i.e. the interval between the publication date of the cited entity and the publication date of the citing entity); [field "journal_sc"] it records whether the citation is a journal self-citations (i.e. the citing and the cited entities are published in the same journal); [field "author_sc"] it records whether the citation is an author self-citation (i.e. the citing and the cited entities have at least one author in common).

This version of the dataset contains:

717,654,703 citations; 26,024,862 bibliographic resources.

The size of the zipped archive is 9.6 GB, while the size of the unzipped CSV file is 50 GB. Additional information about POCI at official webpage.
d
PLOS ONE publication and citation data
datadryad.org
data.niaid.nih.gov
+1more
zip
Updated Jul 23, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alexander Petersen (2018). PLOS ONE publication and citation data [Dataset]. http://doi.org/10.6071/M39W8V
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6071/M39W8V
Dataset updated
Jul 23, 2018
Dataset provided by
Dryad
Authors
Alexander Petersen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2018
Description
Data enclosed in a single zipped folder:

A) DASH-V2 : Data files for final published analysis (J. Informetrics, 2019)

File A1: PubData_DOI_141986_Nc_0_2019.dta

File A2: PubData_DOI_141986_Nc_0_2019_DOFILE

B) DASH-V1 : Data files for preprint version (https://ssrn.com/abstract=2901272)

File B1: PubData_Obs_102741_Nc_10_No2015_CitationsAnalysis.dta

File B2: PubData_Obs_128734_Nc_10_AcceptanceTimeAnalysis.dta

File B3: STATA13_DOFILE

C) Data description common to all .dta files, which contain parsed and merged PLOS ONE and Web of Science metadata:

File A3: UC-DASH_DataDescription_Petersen_V2.pdf

File B4: UC-DASH_DataDescription_Petersen_V1.pdf
d
Louisville Metro KY - Uniform Citation Data 2020
catalog.data.gov
data.lojic.org
+3more
Updated Apr 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Uniform Citation Data 2020 [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-uniform-citation-data-2020
Explore at:
Dataset updated
Apr 13, 2023
Dataset provided by
Louisville/Jefferson County Information Consortium
Area covered
Kentucky, Louisville
Description
A list of all uniform citations from the Louisville Metro Police Department, the CSV file is updated daily, including case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes can be found in this Link.INCIDENT_NUMBER or CASE_NUMBER links these data sets together:Crime DataUniform Citation DataFirearm intakeLMPD hate crimesAssaulted OfficersCITATION_CONTROL_NUMBER links these data sets together:Uniform Citation DataLMPD Stops DataNote: When examining this data, make sure to read the LMPDCrime Data section in our Terms of Use.AGENCY_DESC - the name of the department that issued the citationCASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms and can be used to connect the dataset to the following other datasets INCIDENT_NUMBER:1. Crime Data2. Firearms intake3. LMPD hate crimes4. Assaulted OfficersNOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes) which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.CITATION_YEAR - the year the citation was issuedCITATION_CONTROL_NUMBER - links this LMPD stops dataCITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)CITATION_DATE - the date the citation was issuedCITATION_LOCATION - the location the citation was issuedDIVISION - the LMPD division in which the citation was issuedBEAT - the LMPD beat in which the citation was issuedPERSONS_SEX - the gender of the person who received the citationPERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H=Hispanic, U=Undeclared)PERSONS_AGE - the age of the person who received the citationPERSONS_HOME_CITY - the city in which the person who received the citation livesPERSONS_HOME_STATE - the state in which the person who received the citation livesPERSONS_HOME_ZIP - the zip code in which the person who received the citation livesVIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/CHARGE_DESC - the description of the type of charge for the citationUCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
d
Louisville Metro KY - Uniform Citation Data 2023
catalog.data.gov
data.louisvilleky.gov
+4more
Updated Apr 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Uniform Citation Data 2023 [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-uniform-citation-data-2023
Explore at:
Dataset updated
Apr 13, 2023
Dataset provided by
Louisville/Jefferson County Information Consortium
Area covered
Kentucky, Louisville
Description
Note: Due to a system migration, this data will cease to update on March 14th, 2023. The current projection is to restart the updates within 30 days of the system migration, on or around April 13th, 2023A list of all uniform citations from the Louisville Metro Police Department, the CSV file is updated daily, including case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes can be found in this Link.INCIDENT_NUMBER or CASE_NUMBER links these data sets together:Crime DataUniform Citation DataFirearm intakeLMPD hate crimesAssaulted OfficersCITATION_CONTROL_NUMBER links these data sets together:Uniform Citation DataLMPD Stops DataNote: When examining this data, make sure to read the LMPDCrime Data section in our Terms of Use.AGENCY_DESC - the name of the department that issued the citationCASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms and can be used to connect the dataset to the following other datasets INCIDENT_NUMBER:1. Crime Data2. Firearms intake3. LMPD hate crimes4. Assaulted OfficersNOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes) which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.CITATION_YEAR - the year the citation was issuedCITATION_CONTROL_NUMBER - links this LMPD stops dataCITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)CITATION_DATE - the date the citation was issuedCITATION_LOCATION - the location the citation was issuedDIVISION - the LMPD division in which the citation was issuedBEAT - the LMPD beat in which the citation was issuedPERSONS_SEX - the gender of the person who received the citationPERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H=Hispanic, U=Undeclared)PERSONS_AGE - the age of the person who received the citationPERSONS_HOME_CITY - the city in which the person who received the citation livesPERSONS_HOME_STATE - the state in which the person who received the citation livesPERSONS_HOME_ZIP - the zip code in which the person who received the citation livesVIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/CHARGE_DESC - the description of the type of charge for the citationUCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
iCite Database Snapshot 2021-11
nih.figshare.com
bin
Updated May 30, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
iCite; B. Ian Hutchins; George Santangelo; Ehsanul Haque (2023). iCite Database Snapshot 2021-11 [Dataset]. http://doi.org/10.35092/yhjc.17108351.v2
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.35092/yhjc.17108351.v2
Dataset updated
May 30, 2023
Dataset provided by
figshare
Authors
iCite; B. Ian Hutchins; George Santangelo; Ehsanul Haque
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This is a database snapshot of the iCite web service (provided here as a single zipped CSV file, or compressed, tarred JSON files). In addition, citation links in the NIH Open Citation Collection are provided as a two-column CSV table in open_citation_collection.zip. iCite provides bibliometrics and metadata on publications indexed in PubMed, organized into three modules:Influence: Delivers metrics of scientific influence, field-adjusted and benchmarked to NIH publications as the baseline.Translation: Measures how Human, Animal, or Molecular/Cellular Biology-oriented each paper is; tracks and predicts citation by clinical articlesOpen Cites: Disseminates link-level, public-domain citation data from the NIH Open Citation CollectionDefinitions for individual data fields:pmid: PubMed Identifier, an article ID as assigned in PubMed by the National Library of Medicinedoi: Digital Object Identifier, if availableyear: Year the article was publishedtitle: Title of the articleauthors: List of author namesjournal: Journal name (ISO abbreviation)is_research_article: Flag indicating whether the Publication Type tags for this article are consistent with that of a primary research articlerelative_citation_ratio: Relative Citation Ratio (RCR)--OPA's metric of scientific influence. Field-adjusted, time-adjusted and benchmarked against NIH-funded papers. The median RCR for NIH funded papers in any field is 1.0. An RCR of 2.0 means a paper is receiving twice as many citations per year than the median NIH funded paper in its field and year, while an RCR of 0.5 means that it is receiving half as many citations per year. Calculation details are documented in Hutchins et al., PLoS Biol. 2016;14(9):e1002541.provisional: RCRs for papers published in the previous two years are flagged as "provisional", to reflect that citation metrics for newer articles are not necessarily as stable as they are for older articles. Provisional RCRs are provided for papers published previous year, if they have received with 5 citations or more, despite being, in many cases, less than a year old. All papers published the year before the previous year receive provisional RCRs. The current year is considered to be the NIH Fiscal Year which starts in October. For example, in July 2019 (NIH Fiscal Year 2019), papers from 2018 receive provisional RCRs if they have 5 citations or more, and all papers from 2017 receive provisional RCRs. In October 2019, at the start of NIH Fiscal Year 2020, papers from 2019 receive provisional RCRs if they have 5 citations or more and all papers from 2018 receive provisional RCRs.citation_count: Number of unique articles that have cited this onecitations_per_year: Citations per year that this article has received since its publication. If this appeared as a preprint and a published article, the year from the published version is used as the primary publication date. This is the numerator for the Relative Citation Ratio.field_citation_rate: Measure of the intrinsic citation rate of this paper's field, estimated using its co-citation network.expected_citations_per_year: Citations per year that NIH-funded articles, with the same Field Citation Rate and published in the same year as this paper, receive. This is the denominator for the Relative Citation Ratio.nih_percentile: Percentile rank of this paper's RCR compared to all NIH publications. For example, 95% indicates that this paper's RCR is higher than 95% of all NIH funded publications.human: Fraction of MeSH terms that are in the Human category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)animal: Fraction of MeSH terms that are in the Animal category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)molecular_cellular: Fraction of MeSH terms that are in the Molecular/Cellular Biology category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)x_coord: X coordinate of the article on the Triangle of Biomediciney_coord: Y Coordinate of the article on the Triangle of Biomedicineis_clinical: Flag indicating that this paper meets the definition of a clinical article.cited_by_clin: PMIDs of clinical articles that this article has been cited by.apt: Approximate Potential to Translate is a machine learning-based estimate of the likelihood that this publication will be cited in later clinical trials or guidelines. Calculation details are documented in Hutchins et al., PLoS Biol. 2019;17(10):e3000416.cited_by: PMIDs of articles that have cited this one.references: PMIDs of articles in this article's reference list.Large CSV files are zipped using zip version 4.5, which is more recent than the default unzip command line utility in some common Linux distributions. These files can be unzipped with tools that support version 4.5 or later such as 7zip.Comments and questions can be addressed to iCite@mail.nih.gov
d
Louisville Metro KY - Uniform Citation Data 2022
catalog.data.gov
data.lojic.org
+3more
Updated Apr 13, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Uniform Citation Data 2022 [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-uniform-citation-data-2022
Explore at:
Dataset updated
Apr 13, 2023
Dataset provided by
Louisville/Jefferson County Information Consortium
Area covered
Kentucky, Louisville
Description
Note: Due to a system migration, this data will cease to update on March 14th, 2023. The current projection is to restart the updates within 30 days of the system migration, on or around April 13th, 2023A list of all uniform citations from the Louisville Metro Police Department, the CSV file is updated daily, including case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes can be found in this Link.INCIDENT_NUMBER or CASE_NUMBER links these data sets together:Crime DataUniform Citation DataFirearm intakeLMPD hate crimesAssaulted OfficersCITATION_CONTROL_NUMBER links these data sets together:Uniform Citation DataLMPD Stops DataNote: When examining this data, make sure to read the LMPDCrime Data section in our Terms of Use.AGENCY_DESC - the name of the department that issued the citationCASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms and can be used to connect the dataset to the following other datasets INCIDENT_NUMBER:1. Crime Data2. Firearms intake3. LMPD hate crimes4. Assaulted OfficersNOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes) which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.CITATION_YEAR - the year the citation was issuedCITATION_CONTROL_NUMBER - links this LMPD stops dataCITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)CITATION_DATE - the date the citation was issuedCITATION_LOCATION - the location the citation was issuedDIVISION - the LMPD division in which the citation was issuedBEAT - the LMPD beat in which the citation was issuedPERSONS_SEX - the gender of the person who received the citationPERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H=Hispanic, U=Undeclared)PERSONS_AGE - the age of the person who received the citationPERSONS_HOME_CITY - the city in which the person who received the citation livesPERSONS_HOME_STATE - the state in which the person who received the citation livesPERSONS_HOME_ZIP - the zip code in which the person who received the citation livesVIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/CHARGE_DESC - the description of the type of charge for the citationUCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
Data from: BIP! NDR (NoDoiRefs): a dataset of citations from papers without...
zenodo.org
application/gzip
Updated Feb 12, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Paris Koloveas; Paris Koloveas; Serafeim Chatzopoulos; Serafeim Chatzopoulos; Christos Tryfonopoulos; Christos Tryfonopoulos; Thanasis Vergoulis; Thanasis Vergoulis (2024). BIP! NDR (NoDoiRefs): a dataset of citations from papers without DOIs in computer science conferences and workshops [Dataset]. http://doi.org/10.5281/zenodo.10651965
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.10651965
Dataset updated
Feb 12, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Paris Koloveas; Paris Koloveas; Serafeim Chatzopoulos; Serafeim Chatzopoulos; Christos Tryfonopoulos; Christos Tryfonopoulos; Thanasis Vergoulis; Thanasis Vergoulis
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
In the field of Computer Science, conference and workshop papers serve as important contributions, carrying substantial weight in research assessment processes, compared to other disciplines. However, a considerable number of these papers are not assigned a Digital Object Identifier (DOI), hence their citations are not reported in widely used citation datasets like OpenCitations and Crossref, raising limitations to citation analysis. While the Microsoft Academic Graph (MAG) previously addressed this issue by providing substantial coverage, its discontinuation has created a void in available data.

BIP! NDR aims to alleviate this issue and enhance the research assessment processes within the field of Computer Science. To accomplish this, it leverages a workflow that identifies and retrieves Open Science papers lacking DOIs from the DBLP Corpus, and by performing text analysis, it extracts citation information directly from their full text. The current version of the dataset contains ~2.9M citations made by approximately 171K open access Computer Science conference or workshop papers that, according to DBLP, do not have a DOI.

File Structure:

The dataset is formatted as a JSON Lines (JSONL) file (one JSON Object per line) to facilitate file splitting and streaming.

Each JSON object has three main fields:

“_id”: a unique identifier,

“citing_paper”, the “dblp_id” of the citing paper,

“cited_papers”: array containing the objects that correspond to each reference found in the text of the “citing_paper”; each object may contain the following fields:

“dblp_id”: the “dblp_id” of the cited paper. Optional - this field is required if a “doi” is not present.

“doi”: the doi of the cited paper. Optional - this field is required if a “dblp_id” is not present.

“bibliographic_reference”: the raw citation string as it appears in the citing paper.

Changes from previous version:

Added more papers from DBLP.
H
GIANT: The 1-Billion Annotated Synthetic Bibliographic-Reference-String...
dataverse.harvard.edu
search.dataone.org
Updated Dec 9, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mark Grennan; Martin Schibel; Andrew Collins; Joeran Beel (2019). GIANT: The 1-Billion Annotated Synthetic Bibliographic-Reference-String Dataset for Deep Citation Parsing [Data] [Dataset]. http://doi.org/10.7910/DVN/LXQXAO
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/LXQXAO
Dataset updated
Dec 9, 2019
Dataset provided by
Harvard Dataverse
Authors
Mark Grennan; Martin Schibel; Andrew Collins; Joeran Beel
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/LXQXAOhttps://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/LXQXAO
Description
Extracting and parsing reference strings from research articles is a challenging task. State-of-the-art tools like GROBID apply rather simple machine learning models such as conditional random fields (CRF). Recent research has shown a high potential of deep-learning for reference string parsing. The challenge with deep learning is, however, that the training step requires enormous amounts of labeled data – which does not exist for reference string parsing. Creating such a large dataset manually, through human labor, seems hardly feasible. Therefore, we created GIANT. GIANT is a large dataset with 991,411,100 XML labeled reference strings. The strings were automatically created based on 677,000 entries from CrossRef, 1,500 citation styles in the citation-style language, and the citation processor citeproc-js. GIANT can be used to train machine learning models, particularly deep learning models, for citation parsing. While we have not yet tested GIANT for training such models, we hypothesise that the dataset will be able to significantly improve the accuracy of citation parsing. The dataset and code to create it, are freely available at https://github.com/BeelGroup/.
Source Reference File
catalog.data.gov
data.wu.ac.at
Updated Mar 8, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Social Security Administration (2025). Source Reference File [Dataset]. https://catalog.data.gov/dataset/source-reference-file
Explore at:
Dataset updated
Mar 8, 2025
Dataset provided by
Social Security Administrationhttp://www.ssa.gov/
Description
This file contains a national set of names and contact information for doctors, hospitals, clinics, and other facilities (known collectively as sources) from which medical evidence of record (MER) may be requested to support a claimant's disability application.
Z
Data from: On the Use of Context for Predicting Citation Worthiness of...
data.niaid.nih.gov
zenodo.org
Updated Apr 9, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zhang, Haimin (2021). On the Use of Context for Predicting Citation Worthiness of Sentences in Scholarly Articles [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4651553
Explore at:
Dataset updated
Apr 9, 2021
Dataset provided by
Mahata, Debanjan
Gosangi, Rakesh
Gheisarieha, Mohsen
Zhang, Haimin
Arora, Ravneet
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The ACL-cite dataset was created for the paper: “On the Use of Context for Predicting Citation Worthiness of Sentences in Scholarly Articles” published in NAACL 2021. This dataset contains over 2.7 million sentences extracted from scholarly articles (from ACL Anthology [Bird et al.]) and their corresponding citation worthiness labels. The goal of the citation worthiness task is to determine whether a given sentence requires a citation.

There are three CSV files in the dataset:

train.csv: 1,625,268 rows

dev.csv: 539,085 rows

test.csv: 542,081 rows

Each CSV file contains the following columns:

document_id: identifier of the paper the sentence was extracted from

section: name of the section the sentence was extracted from, (e.g. Abstract, Introduction, etc.)

section_id: sequential identifier of the section in the paper

paragraph_id: sequential identifier of the paragraph the sentence was extracted from

sentence: the sentence with the citations removed

raw_sentence: the raw sentence including the citations

sentence_id: sequential identifier of the sentence in the paper

label: citation worthiness label

Note: The train/dev/test splits are done at the document_id level.
l
Louisville Metro KY - Uniform Citation Data (2016-2019)
data.lojic.org
s.cnmilf.com
+5more
Updated Jun 2, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Louisville/Jefferson County Information Consortium (2022). Louisville Metro KY - Uniform Citation Data (2016-2019) [Dataset]. https://data.lojic.org/datasets/louisville-metro-ky-uniform-citation-data-2016-2019/about
Explore at:
Dataset updated
Jun 2, 2022
Dataset authored and provided by
Louisville/Jefferson County Information Consortium
License
https://louisville-metro-opendata-lojic.hub.arcgis.com/pages/terms-of-use-and-licensehttps://louisville-metro-opendata-lojic.hub.arcgis.com/pages/terms-of-use-and-license
Area covered
Kentucky, Louisville
Description
A list of all uniform citations from the Louisville Metro Police Department, the CSV file is updated daily, including case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes can be found in this Link.INCIDENT_NUMBER or CASE_NUMBER links these data sets together:Crime DataUniform Citation DataFirearm intakeLMPD hate crimesAssaulted OfficersCITATION_CONTROL_NUMBER links these data sets together:Uniform Citation DataLMPD Stops DataNote: When examining this data, make sure to read the LMPDCrime Data section in our Terms of Use.AGENCY_DESC - the name of the department that issued the citationCASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms and can be used to connect the dataset to the following other datasets INCIDENT_NUMBER:1. Crime Data2. Firearms intake3. LMPD hate crimes4. Assaulted OfficersNOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes) which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.CITATION_YEAR - the year the citation was issuedCITATION_CONTROL_NUMBER - links this LMPD stops dataCITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)CITATION_DATE - the date the citation was issuedCITATION_LOCATION - the location the citation was issuedDIVISION - the LMPD division in which the citation was issuedBEAT - the LMPD beat in which the citation was issuedPERSONS_SEX - the gender of the person who received the citationPERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H=Hispanic, U=Undeclared)PERSONS_AGE - the age of the person who received the citationPERSONS_HOME_CITY - the city in which the person who received the citation livesPERSONS_HOME_STATE - the state in which the person who received the citation livesPERSONS_HOME_ZIP - the zip code in which the person who received the citation livesVIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/CHARGE_DESC - the description of the type of charge for the citationUCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/The Louisville Metro Police Department (LMPD) began operations on January 6, 2003, as part of the creation of the consolidated city-county government in Louisville, Kentucky. It was formed by the merger of the Jefferson County Police Department and the Louisville Division of Police. The Louisville Metro Police Department is headed by Chief Jacquelyn Gwinn-Villaroel. LMPD divides Jefferson County into eight patrol divisions and operates a number of special investigative and support units.
f
iCite Database Snapshot 2024-11
nih.figshare.com
bin
Updated Dec 16, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
iCite; B. Ian Hutchins; George Santangelo (2024). iCite Database Snapshot 2024-11 [Dataset]. http://doi.org/10.35092/yhjc28027859.v1
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.35092/yhjc28027859.v1
Dataset updated
Dec 16, 2024
Dataset provided by
The NIH Figshare Archive
Authors
iCite; B. Ian Hutchins; George Santangelo
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This is a database snapshot of the iCite web service (provided here as a single zipped CSV file, or compressed, tarred JSON files). In addition, citation links in the NIH Open Citation Collection are provided as a two-column CSV table in open_citation_collection.zip. iCite provides bibliometrics and metadata on publications indexed in PubMed, organized into three modules:Influence: Delivers metrics of scientific influence, field-adjusted and benchmarked to NIH publications as the baseline.Translation: Measures how Human, Animal, or Molecular/Cellular Biology-oriented each paper is; tracks and predicts citation by clinical articlesOpen Cites: Disseminates link-level, public-domain citation data from the NIH Open Citation CollectionDefinitions for individual data fields:pmid: PubMed Identifier, an article ID as assigned in PubMed by the National Library of Medicinedoi: Digital Object Identifier, if availableyear: Year the article was publishedtitle: Title of the articleauthors: List of author namesjournal: Journal name (ISO abbreviation)is_research_article: Flag indicating whether the Publication Type tags for this article are consistent with that of a primary research articlerelative_citation_ratio: Relative Citation Ratio (RCR)--OPA's metric of scientific influence. Field-adjusted, time-adjusted and benchmarked against NIH-funded papers. The median RCR for NIH funded papers in any field is 1.0. An RCR of 2.0 means a paper is receiving twice as many citations per year than the median NIH funded paper in its field and year, while an RCR of 0.5 means that it is receiving half as many citations per year. Calculation details are documented in Hutchins et al., PLoS Biol. 2016;14(9):e1002541.provisional: RCRs for papers published in the previous two years are flagged as "provisional", to reflect that citation metrics for newer articles are not necessarily as stable as they are for older articles. Provisional RCRs are provided for papers published previous year, if they have received with 5 citations or more, despite being, in many cases, less than a year old. All papers published the year before the previous year receive provisional RCRs. The current year is considered to be the NIH Fiscal Year which starts in October. For example, in July 2019 (NIH Fiscal Year 2019), papers from 2018 receive provisional RCRs if they have 5 citations or more, and all papers from 2017 receive provisional RCRs. In October 2019, at the start of NIH Fiscal Year 2020, papers from 2019 receive provisional RCRs if they have 5 citations or more and all papers from 2018 receive provisional RCRs.citation_count: Number of unique articles that have cited this onecitations_per_year: Citations per year that this article has received since its publication. If this appeared as a preprint and a published article, the year from the published version is used as the primary publication date. This is the numerator for the Relative Citation Ratio.field_citation_rate: Measure of the intrinsic citation rate of this paper's field, estimated using its co-citation network.expected_citations_per_year: Citations per year that NIH-funded articles, with the same Field Citation Rate and published in the same year as this paper, receive. This is the denominator for the Relative Citation Ratio.nih_percentile: Percentile rank of this paper's RCR compared to all NIH publications. For example, 95% indicates that this paper's RCR is higher than 95% of all NIH funded publications.human: Fraction of MeSH terms that are in the Human category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)animal: Fraction of MeSH terms that are in the Animal category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)molecular_cellular: Fraction of MeSH terms that are in the Molecular/Cellular Biology category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)x_coord: X coordinate of the article on the Triangle of Biomediciney_coord: Y Coordinate of the article on the Triangle of Biomedicineis_clinical: Flag indicating that this paper meets the definition of a clinical article.cited_by_clin: PMIDs of clinical articles that this article has been cited by.apt: Approximate Potential to Translate is a machine learning-based estimate of the likelihood that this publication will be cited in later clinical trials or guidelines. Calculation details are documented in Hutchins et al., PLoS Biol. 2019;17(10):e3000416.cited_by: PMIDs of articles that have cited this one.references: PMIDs of articles in this article's reference list.Large CSV files are zipped using zip version 4.5, which is more recent than the default unzip command line utility in some common Linux distributions. These files can be unzipped with tools that support version 4.5 or later such as 7zip.Comments and questions can be addressed to iCite@mail.nih.gov
Reference Manager Data Citation Analysis
zenodo.org
txt, zip
Updated Mar 14, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kristina Vrouwenvelder; Kristina Vrouwenvelder; Natalie Raia; Natalie Raia (2025). Reference Manager Data Citation Analysis [Dataset]. http://doi.org/10.5281/zenodo.14058399
Explore at:
txt, zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.14058399
Dataset updated
Mar 14, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Kristina Vrouwenvelder; Kristina Vrouwenvelder; Natalie Raia; Natalie Raia
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
DESCRIPTION:

This package contains data used to analyze citation metadata completeness and correctness for several common reference managers used in scholarly research and several common repositories in the Earth, space, and environmental sciences.

METHODS:

Metadata fields for import and export methods and for 8 metadata fields (authors/creators, publisher, DOI, dataset title, version, access date, publication date, and resource type) were collected from reference managers via all import methods available (app or wizard and plugin) during summer 2024 from most recent software versions of all. To encode data, citation information for each dataset as imported by Reference Manager was compared to that registered for the DOI with DataCite. Correct metadata for each of 8 fields for both import and export was encoded as 0, incorrect as 1, and missing as '' or nan. See publication and software package for more information.

FILES:

FOLDER 'coded-data' contains files that include information (DOIs) about the data examined in this study, preserved copies of exported data citations used in the data interpretation and processing, and the processed data itself encoded in columns.

FOLDER 'datacite-metadata-profiles' includes the raw metadata from each dataset DOI at the time of analysis, included for reproducibility purposes.

See README file for more information.
Data from: VIS - Fishes in estuarine waters in Flanders, Belgium
gbif.org
gimi9.com
+3more
Updated Apr 13, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jan Breine; Hugo Verreycken; Tom De Boeck; Dimitri Brosens; Peter Desmet; Jan Breine; Hugo Verreycken; Tom De Boeck; Dimitri Brosens; Peter Desmet (2021). VIS - Fishes in estuarine waters in Flanders, Belgium [Dataset]. http://doi.org/10.15468/estwpt
Explore at:
Unique identifier
https://doi.org/10.15468/estwpt
Dataset updated
Apr 13, 2021
Dataset provided by
Global Biodiversity Information Facilityhttps://www.gbif.org/
Research Institute for Nature and Forest (INBO)
Authors
Jan Breine; Hugo Verreycken; Tom De Boeck; Dimitri Brosens; Peter Desmet; Jan Breine; Hugo Verreycken; Tom De Boeck; Dimitri Brosens; Peter Desmet
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Apr 1, 1995 - Nov 27, 2012
Area covered

Description
VIS - Fishes in estuarine waters in Flanders, Belgium is a species occurrence dataset published by the Research Institute for Nature and Forest (INBO) and described in Brosens et al. 2015 (https://doi.org/10.3897/zookeys.475.8556). The dataset contains over 70,000 fish occurrences sampled between 1992 and 2012 from almost 50 locations in the estuaries of the river Yser and the river Scheldt, in Flanders, Belgium. The dataset includes 69 fish species, as well as a number of non-target crustacean species. The data are retrieved from the Fish Information System (VIS), a database set up to monitor the status of fishes and their habitats in Flanders and are collected in support of the Water Framework Directive, the Habitat Directive, certain red lists, and biodiversity research. Additional information, such as measurements, absence information and abiotic data are available upon request. Issues with the dataset can be reported at https://github.com/LifeWatchINBO/data-publication/tree/master/datasets/vis-estuarine-occurrences

Length and weight measurement data of the individual fishes, absence information, occurrence data since 2013, as well as abiotic data of the sampling points (pH, temperature, etc.) are not included in the Darwin Core Archive and are available upon request.

To allow anyone to use this dataset, we have released the data to the public domain under a Creative Commons Zero waiver (http://creativecommons.org/publicdomain/zero/1.0/). We would appreciate however, if you read and follow these norms for data use (http://www.inbo.be/en/norms-for-data-use) and provide a link to the original dataset (https://doi.org/10.15468/estwpt) whenever possible. If you use these data for a scientific paper, please cite the dataset following the applicable citation norms and/or consider us for co-authorship. We are always interested to know how you have used or visualized the data, or to provide more information, so please contact us via the contact information provided in the metadata, opendata@inbo.be or https://twitter.com/LifeWatchINBO.
Global Reference Tables Services
catalog.data.gov
data.amerigeoss.org
+1more
Updated Feb 5, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Social Security Administration (2025). Global Reference Tables Services [Dataset]. https://catalog.data.gov/dataset/global-reference-tables-services
Explore at:
Dataset updated
Feb 5, 2025
Dataset provided by
Social Security Administrationhttp://www.ssa.gov/
Description
This database is a collection of reference tables that store common information used throughout SSA but require an application or data service to access the data because of the complexity of the data or business logic required to utilize the data appropriately.
Data from: Loopkevers Grensmaas - Ground beetles near the river Meuse in...
gbif.org
metadata.vlaanderen.be
+2more
Updated Apr 1, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stijn Vanacker; Dimitri Brosens; Peter Desmet; Stijn Vanacker; Dimitri Brosens; Peter Desmet (2021). Loopkevers Grensmaas - Ground beetles near the river Meuse in Flanders, Belgium [Dataset]. http://doi.org/10.15468/hy3pzl
Explore at:
Unique identifier
https://doi.org/10.15468/hy3pzl
Dataset updated
Apr 1, 2021
Dataset provided by
Global Biodiversity Information Facilityhttps://www.gbif.org/
Research Institute for Nature and Forest (INBO)
Authors
Stijn Vanacker; Dimitri Brosens; Peter Desmet; Stijn Vanacker; Dimitri Brosens; Peter Desmet
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Aug 25, 1998 - Oct 4, 1999
Area covered

Description
Loopkevers Grensmaas - Ground beetles near the river Meuse in Flanders, Belgium is a species occurrence dataset published by the Research Institute for Nature and Forest (INBO). The dataset contains over 5,800 beetle occurrences sampled between 1998 and 1999 from 28 locations on the left bank (Belgium) of the river Meuse on the border between Belgium and the Netherlands. The dataset includes over 100 ground beetles species (Carabidae) and some non-target species. The data were used to assess the dynamics of the Grensmaas area and to help river management. Issues with the dataset can be reported at https://github.com/LifeWatchINBO/data-publication/tree/master/datasets/kevers-grensmaas-occurrences

To allow anyone to use this dataset, we have released the data to the public domain under a Creative Commons Zero waiver (http://creativecommons.org/publicdomain/zero/1.0/). We would appreciate however, if you read and follow these norms for data use (http://www.inbo.be/en/norms-for-data-use) and provide a link to the original dataset (https://doi.org/10.15468/hy3pzl) whenever possible. If you use these data for a scientific paper, please cite the dataset following the applicable citation norms and/or consider us for co-authorship. We are always interested to know how you have used or visualized the data, or to provide more information, so please contact us via the contact information provided in the metadata, opendata@inbo.be or https://twitter.com/LifeWatchINBO.
S
Core Reference Data
six-group.com
Updated Apr 19, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
SIX Group (2020). Core Reference Data [Dataset]. https://www.six-group.com/en/products-services/financial-information/display-and-delivery-capabilities/sixflex/core-reference-data.html
Explore at:
Dataset updated
Apr 19, 2020
Dataset provided by
SIX Group
Area covered
Global
Description
The need for accurate, timely and complete Reference Data is vital for the efficient functioning of the financial ecosystem. With our Core Reference Data service, receive extensive information on a predefined set of data points across a broad range of asset classes to support the maintenance of your securities master database. The information on key data attributes enables compliance with regulatory & risk management requirements while maximizing operation efficiency.
d
Automated Reference Toolset (ART)—Data
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2024). Automated Reference Toolset (ART)—Data [Dataset]. https://catalog.data.gov/dataset/automated-reference-toolset-artdata
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Description
These environmental raster covariate, geospatial vector data, and tabular data were compiled as input data for the Automated Reference Toolset (ART) algorithm.
DrugBank Database Data Package
johnsnowlabs.com
csv
Updated Jan 20, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
John Snow Labs (2021). DrugBank Database Data Package [Dataset]. https://www.johnsnowlabs.com/marketplace/drugbank-database-data-package/
Explore at:
csvAvailable download formats
Dataset updated
Jan 20, 2021
Dataset authored and provided by
John Snow Labs
Description
DrugBank Vocabulary contains information on DrugBank identifiers, names, and synonyms to permit easy linking and integration into any type of project. DrugBank is a richly annotated resource that combines detailed drug data with comprehensive drug target and drug action information. DrugBank is widely used to facilitate in silico drug target discovery, drug design, drug docking or screening, drug metabolism prediction, drug interaction prediction and general pharmaceutical education.

Facebook

Twitter

Click to copy link

Link copied

Cite

Stephanie van de Sandt; Stephanie van de Sandt; Alex Ioannidis; Alex Ioannidis; Lars Holm Nielsen; Lars Holm Nielsen (2020). Citations to software and data in Zenodo via open sources [Dataset]. http://doi.org/10.5281/zenodo.3482927

Citations to software and data in Zenodo via open sources

Explore at:

csvAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.3482927

Dataset updated

Jan 24, 2020

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Stephanie van de Sandt; Stephanie van de Sandt; Alex Ioannidis; Alex Ioannidis; Lars Holm Nielsen; Lars Holm Nielsen

License

CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Description

In January 2019, the Asclepias Broker harvested citation links to Zenodo objects from three discovery systems: the NASA Astrophysics Datasystem (ADS), Crossref Event Data and Europe PMC. Each row of our dataset represents one unique link between a citing publication and a Zenodo DOI. Both endpoints are described by basic metadata. The second dataset contains usage metrics for every cited Zenodo DOI of our data sample.

Clear search

Close search

Google apps

Main menu

Citations to software and data in Zenodo via open sources

POCI CSV dataset of all the citation data

PLOS ONE publication and citation data

Louisville Metro KY - Uniform Citation Data 2020

Louisville Metro KY - Uniform Citation Data 2023

iCite Database Snapshot 2021-11

Louisville Metro KY - Uniform Citation Data 2022

Data from: BIP! NDR (NoDoiRefs): a dataset of citations from papers without...

GIANT: The 1-Billion Annotated Synthetic Bibliographic-Reference-String...

Source Reference File

Data from: On the Use of Context for Predicting Citation Worthiness of...

Louisville Metro KY - Uniform Citation Data (2016-2019)

iCite Database Snapshot 2024-11

Reference Manager Data Citation Analysis

Data from: VIS - Fishes in estuarine waters in Flanders, Belgium

Global Reference Tables Services

Data from: Loopkevers Grensmaas - Ground beetles near the river Meuse in...

Core Reference Data

Automated Reference Toolset (ART)—Data

DrugBank Database Data Package

Citations to software and data in Zenodo via open sources