100+ datasets found

Using Open Citation Databases for Snowballing in Software Engineering...
zenodo.org
data.niaid.nih.gov
bin, csv, pdf, zip
Updated Jul 12, 2024
Cite
Leif Bonorden (2024). Using Open Citation Databases for Snowballing in Software Engineering Research [Dataset]. http://doi.org/10.5281/zenodo.7938497
Available download formats: csv, bin, zip, pdf
Unique identifier
https://doi.org/10.5281/zenodo.7938497
Dataset updated
Jul 12, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Leif Bonorden
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset for our study on the coverage of software engineering articles in open citation databases:

a list of the 23 sampled venues with their respective CORE ranks and publishers:
  01-venues.csv

a list of the 204 sampled articles with their respective number of references/citations per citation database:
  02-articles.csv (articles with publication information)
  03-references-absolute.csv (number of references in published PDF & absolute numbers for reference coverage in databases)
  04-references-relative.csv (relative numbers for reference coverage in databases)
  05-citations-absolute.csv (absolute numbers for citation coverage in databases)
  06-citations relative.csv (relative numbers for citation coverage in databases)

a list of the 8 articles analyzed in more detail with complete references data from the citation databases:
  07-selected-articles.csv (articles with publication information)
  08A–08H (comparison of references found in databases for each article)

and additional statistical measures and plots:
  09-Statistics.{pdf,xlsx} (statistical measures – i.e., minimum, maximum, median, average, variance – for the whole dataset and for subsets by publisher, CORE rank, or year of publication)
  10-Figures.zip (figures for references as shown in the study and additional figures for citations – each in EPS and PNG format)
Citation impact of linking to data
figshare.com
pdf
Updated Jun 3, 2023
Cite
Bertil Fabricius Dorch (2023). Citation impact of linking to data [Dataset]. http://doi.org/10.6084/m9.figshare.105151.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.105151.v1
Dataset updated
Jun 3, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Bertil Fabricius Dorch
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Graph from 2012 preprint covering only the years 2000-2010 (use the newer version covering 2000-2015): The Citation Advantage of papers that link to data as a function of the year of publication as registered in ADS (defined as the ratio of the average number of citations per year to papers with links to data to the average number of citations per year to papers without such links).
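The ratio defined in this description can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `citation_advantage` and the sample values are hypothetical and are not taken from the dataset or the preprint.

```python
from statistics import mean


def citation_advantage(with_links, without_links):
    """Return mean citations/year of linked papers divided by that of unlinked papers."""
    if not with_links or not without_links:
        raise ValueError("both groups need at least one paper")
    baseline = mean(without_links)
    if baseline == 0:
        raise ValueError("papers without links average zero citations per year")
    return mean(with_links) / baseline


# Hypothetical citations-per-year values for one publication year (not real data).
linked = [4.0, 6.0, 5.0]    # papers with links to data
unlinked = [2.0, 3.0, 2.5]  # papers without such links
advantage = citation_advantage(linked, unlinked)
print(f"Citation advantage: {advantage:.2f}")
```

A value above 1 would mean that papers with data links receive more citations per year on average than papers without them; computing the ratio separately per publication year gives the curve the description refers to.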
Toxicity Reference Database
catalog.data.gov
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
Updated Dec 3, 2020
Cite
U.S. EPA Office of Research and Development (ORD) - National Center for Computational Toxicology (NCCT) (2020). Toxicity Reference Database [Dataset]. https://catalog.data.gov/dataset/toxicity-reference-database
Dataset updated
Dec 3, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
National Center for Computational Toxicology
Description
The Toxicity Reference Database (ToxRefDB) contains approximately 30 years and $2 billion worth of animal studies. ToxRefDB allows scientists and the interested public to search and download thousands of animal toxicity testing results for hundreds of chemicals that were previously found only in paper documents. Currently, there are 474 chemicals in ToxRefDB, primarily the data-rich pesticide active ingredients, but the number will continue to expand.
iCite Database Snapshot 2025-01
nih.figshare.com
bin
Updated Feb 7, 2025
Cite
iCite; B. Ian Hutchins; George Santangelo; Ehsanul Haque (2025). iCite Database Snapshot 2025-01 [Dataset]. http://doi.org/10.35092/yhjc28360103.v1
Available download formats: bin
Unique identifier
https://doi.org/10.35092/yhjc28360103.v1
Dataset updated
Feb 7, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
iCite; B. Ian Hutchins; George Santangelo; Ehsanul Haque
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This is a database snapshot of the iCite web service (provided here as a single zipped CSV file, or compressed, tarred JSON files). In addition, citation links in the NIH Open Citation Collection are provided as a two-column CSV table in open_citation_collection.zip. iCite provides bibliometrics and metadata on publications indexed in PubMed, organized into three modules:

Influence: Delivers metrics of scientific influence, field-adjusted and benchmarked to NIH publications as the baseline.
Translation: Measures how Human, Animal, or Molecular/Cellular Biology-oriented each paper is; tracks and predicts citation by clinical articles.
Open Cites: Disseminates link-level, public-domain citation data from the NIH Open Citation Collection.

Definitions for individual data fields:

pmid: PubMed Identifier, an article ID as assigned in PubMed by the National Library of Medicine
doi: Digital Object Identifier, if available
year: Year the article was published
title: Title of the article
authors: List of author names
journal: Journal name (ISO abbreviation)
is_research_article: Flag indicating whether the Publication Type tags for this article are consistent with that of a primary research article
relative_citation_ratio: Relative Citation Ratio (RCR), OPA's metric of scientific influence. Field-adjusted, time-adjusted and benchmarked against NIH-funded papers. The median RCR for NIH-funded papers in any field is 1.0. An RCR of 2.0 means a paper is receiving twice as many citations per year as the median NIH-funded paper in its field and year, while an RCR of 0.5 means that it is receiving half as many citations per year. Calculation details are documented in Hutchins et al., PLoS Biol. 2016;14(9):e1002541.
provisional: RCRs for papers published in the previous two years are flagged as "provisional", to reflect that citation metrics for newer articles are not necessarily as stable as they are for older articles. Provisional RCRs are provided for papers published in the previous year if they have received 5 citations or more, despite being, in many cases, less than a year old. All papers published the year before the previous year receive provisional RCRs. The current year is considered to be the NIH Fiscal Year, which starts in October. For example, in July 2019 (NIH Fiscal Year 2019), papers from 2018 receive provisional RCRs if they have 5 citations or more, and all papers from 2017 receive provisional RCRs. In October 2019, at the start of NIH Fiscal Year 2020, papers from 2019 receive provisional RCRs if they have 5 citations or more, and all papers from 2018 receive provisional RCRs.
citation_count: Number of unique articles that have cited this one
citations_per_year: Citations per year that this article has received since its publication. If this appeared as a preprint and a published article, the year from the published version is used as the primary publication date. This is the numerator for the Relative Citation Ratio.
field_citation_rate: Measure of the intrinsic citation rate of this paper's field, estimated using its co-citation network.
expected_citations_per_year: Citations per year that NIH-funded articles, with the same Field Citation Rate and published in the same year as this paper, receive. This is the denominator for the Relative Citation Ratio.
nih_percentile: Percentile rank of this paper's RCR compared to all NIH publications. For example, 95% indicates that this paper's RCR is higher than 95% of all NIH-funded publications.
human: Fraction of MeSH terms that are in the Human category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)
animal: Fraction of MeSH terms that are in the Animal category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)
molecular_cellular: Fraction of MeSH terms that are in the Molecular/Cellular Biology category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)
x_coord: X coordinate of the article on the Triangle of Biomedicine
y_coord: Y coordinate of the article on the Triangle of Biomedicine
is_clinical: Flag indicating that this paper meets the definition of a clinical article.
cited_by_clin: PMIDs of clinical articles that this article has been cited by.
apt: Approximate Potential to Translate is a machine learning-based estimate of the likelihood that this publication will be cited in later clinical trials or guidelines. Calculation details are documented in Hutchins et al., PLoS Biol. 2019;17(10):e3000416.
cited_by: PMIDs of articles that have cited this one.
references: PMIDs of articles in this article's reference list.

Large CSV files are zipped using zip version 4.5, which is more recent than the default unzip command line utility in some common Linux distributions. These files can be unzipped with tools that support version 4.5 or later, such as 7zip.

Comments and questions can be addressed to iCite@mail.nih.gov
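The provisional-RCR rule and the numerator/denominator relationship described in the field definitions can be sketched as follows. This is a reading of the description and not iCite's own code: the function names are hypothetical, and returning `None` for a non-positive denominator and `False` for years the description does not mention are assumptions of this sketch.

```python
from datetime import date


def nih_fiscal_year(day):
    """NIH fiscal year containing `day`; the fiscal year starts in October."""
    return day.year + 1 if day.month >= 10 else day.year


def receives_provisional_rcr(publication_year, citation_count, as_of):
    """Apply the provisional-RCR rule from the field description.

    Previous-year papers qualify only with 5 or more citations; all papers
    from the year before that qualify. Other years are outside the rule,
    so this sketch returns False for them.
    """
    age = nih_fiscal_year(as_of) - publication_year
    if age == 1:
        return citation_count >= 5
    return age == 2


def relative_citation_ratio(citations_per_year, expected_citations_per_year):
    """RCR as citations_per_year divided by expected_citations_per_year."""
    if expected_citations_per_year <= 0:
        return None  # assumption: treat a non-positive denominator as undefined
    return citations_per_year / expected_citations_per_year


print(receives_provisional_rcr(2018, 5, date(2019, 7, 1)))
print(receives_provisional_rcr(2018, 5, date(2019, 10, 1)))
print(relative_citation_ratio(4.0, 2.0))
```

The two dates mirror the July 2019 and October 2019 examples in the description: the same 2018 paper falls under the "previous year" rule in the first case and under the "year before the previous year" rule in the second.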
World’s Top 2% of Scientists list by Stanford University: An Analysis of its...
data.mendeley.com
Updated Nov 17, 2023
Cite
JOHN Philip (2023). World’s Top 2% of Scientists list by Stanford University: An Analysis of its Robustness [Dataset]. http://doi.org/10.17632/td6tdp4m6t.1
Unique identifier
https://doi.org/10.17632/td6tdp4m6t.1
Dataset updated
Nov 17, 2023
Authors
JOHN Philip
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
John Ioannidis and co-authors [1] created a publicly available database of the top-cited scientists in the world. This database, intended to address the misuse of citation metrics, has generated a lot of interest among the scientific community, institutions, and media. Many institutions have used it as a yardstick to assess the quality of researchers. At the same time, some people view the list with skepticism, citing problems with the methodology used. Two separate databases are created, one based on career-long impact and one on single recent-year impact. The database is built from Elsevier's Scopus data [1-3]. The scientists included are classified into 22 scientific fields and 174 sub-fields. The parameters considered for this analysis are total citations from 1996 to 2022 (nc9622), h-index in 2022 (h22), c-score, and world rank based on c-score (Rank ns). Citations without self-cites are considered in all cases (indicated as ns). In the single-year case, citations during 2022 (nc2222) are considered instead of nc9622.

To evaluate the robustness of the c-score-based ranking, I have done a detailed analysis of the metric parameters of the Nobel laureates in physics, chemistry, and medicine of the last 25 years (1998-2022) and compared them with the top 100 rank holders in the list. The latest career-long and single-year-based databases (2022) were used for this analysis. The details of the analysis are presented below.

Though the article says the selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field, the actual career-based ranking list has 204644 names [1]. The single-year database contains 210199 names. So, the list published contains ~ the top 4% of scientists. In the career-based rank list, for the person with the lowest rank of 4809825, the nc9622, h22, and c-score were 41, 3, and 1.3632, respectively, whereas for the person with the No. 1 rank in the list, the nc9622, h22, and c-score were 345061, 264, and 5.5927, respectively. Three people on the list had fewer than 100 citations during 1996-2022, 1155 people had an h22 less than 10, and 6 people had a c-score less than 2.

In the single-year-based rank list, for the person with the lowest rank (6547764), the nc2222, h22, and c-score were 1, 1, and 0.6, respectively, whereas for the person with the No. 1 rank, the nc9622, h22, and c-score were 34582, 68, and 5.3368, respectively. 4463 people on the list had fewer than 100 citations in 2022, 71512 people had an h22 less than 10, and 313 people had a c-score less than 2. The entry of many authors with a single-digit h-index and a very meager total number of citations indicates serious shortcomings of the c-score-based ranking methodology.
Data from: Standards Incorporated by Reference (SIBR) Database
catalog.data.gov
data.nist.gov
Updated Sep 30, 2023
Cite
National Institute of Standards and Technology (2023). Standards Incorporated by Reference (SIBR) Database [Dataset]. https://catalog.data.gov/dataset/standards-incorporated-by-reference-sibr-database
Dataset updated
Sep 30, 2023
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
This is a searchable historical collection of standards referenced in regulations - Voluntary consensus standards, government-unique standards, industry standards, and international standards referenced in the Code of Federal Regulations (CFR).
Louisville Metro KY - Uniform Citation Data 2022
catalog.data.gov
data.lojic.org
Updated Jul 30, 2025
Cite
Louisville/Jefferson County Information Consortium (2025). Louisville Metro KY - Uniform Citation Data 2022 [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-uniform-citation-data-2022
Dataset updated
Jul 30, 2025
Dataset provided by
Louisville/Jefferson County Information Consortium
Area covered
Kentucky, Louisville
Description
Note: Due to a system migration, this data will cease to update on March 14th, 2023. The current projection is to restart the updates on or around July 17th, 2024.

A list of all uniform citations from the Louisville Metro Police Department. The CSV file, which is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes, can be found in this Link.

INCIDENT_NUMBER or CASE_NUMBER links these data sets together:
Crime Data
Uniform Citation Data
Firearm intake
LMPD hate crimes
Assaulted Officers

CITATION_CONTROL_NUMBER links these data sets together:
Uniform Citation Data
LMPD Stops Data

Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

AGENCY_DESC - the name of the department that issued the citation
CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms; it can be used to connect this dataset to INCIDENT_NUMBER in the following other datasets:
1. Crime Data
2. Firearms intake
3. LMPD hate crimes
4. Assaulted Officers
NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes), which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
CITATION_YEAR - the year the citation was issued
CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
CITATION_DATE - the date the citation was issued
CITATION_LOCATION - the location the citation was issued
DIVISION - the LMPD division in which the citation was issued
BEAT - the LMPD beat in which the citation was issued
PERSONS_SEX - the gender of the person who received the citation
PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
PERSONS_AGE - the age of the person who received the citation
PERSONS_HOME_CITY - the city in which the person who received the citation lives
PERSONS_HOME_STATE - the state in which the person who received the citation lives
PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
CHARGE_DESC - the description of the type of charge for the citation
UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
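Because CASE_NUMBER and INCIDENT_NUMBER use different formats, joining the datasets needs a small conversion step. The sketch below is an assumption-laden illustration: the helper name `case_to_incident_number` is hypothetical, and the 2-2-6 digit grouping is inferred only from the single example in the description (8018013155 and 80-18-013155), so other number lengths are rejected.

```python
import re


def case_to_incident_number(case_number):
    """Convert a dash-free CASE_NUMBER into the dashed INCIDENT_NUMBER form."""
    digits = str(case_number).strip()
    if not re.fullmatch(r"\d{10}", digits):
        raise ValueError(f"expected 10 digits, got {case_number!r}")
    # Assumed grouping 2-2-6, taken from the one example in the description.
    return f"{digits[:2]}-{digits[2:4]}-{digits[4:]}"


print(case_to_incident_number("8018013155"))  # prints 80-18-013155
```

Reading the column as text is safer than reading it as an integer, since a numeric type would silently drop any leading zeros.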
iCite Database Snapshot 2024-04
nih.figshare.com
bin
Updated May 9, 2024
Cite
iCite; B. Ian Hutchins; George Santangelo (2024). iCite Database Snapshot 2024-04 [Dataset]. http://doi.org/10.35092/yhjc25765794.v1
Available download formats: bin
Unique identifier
https://doi.org/10.35092/yhjc25765794.v1
Dataset updated
May 9, 2024
Dataset provided by
The NIH Figshare Archive
Authors
iCite; B. Ian Hutchins; George Santangelo
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This is a database snapshot of the iCite web service (provided here as a single zipped CSV file, or compressed, tarred JSON files). In addition, citation links in the NIH Open Citation Collection are provided as a two-column CSV table in open_citation_collection.zip. iCite provides bibliometrics and metadata on publications indexed in PubMed, organized into three modules:

Influence: Delivers metrics of scientific influence, field-adjusted and benchmarked to NIH publications as the baseline.
Translation: Measures how Human, Animal, or Molecular/Cellular Biology-oriented each paper is; tracks and predicts citation by clinical articles.
Open Cites: Disseminates link-level, public-domain citation data from the NIH Open Citation Collection.

Definitions for individual data fields:

pmid: PubMed Identifier, an article ID as assigned in PubMed by the National Library of Medicine
doi: Digital Object Identifier, if available
year: Year the article was published
title: Title of the article
authors: List of author names
journal: Journal name (ISO abbreviation)
is_research_article: Flag indicating whether the Publication Type tags for this article are consistent with that of a primary research article
relative_citation_ratio: Relative Citation Ratio (RCR), OPA's metric of scientific influence. Field-adjusted, time-adjusted and benchmarked against NIH-funded papers. The median RCR for NIH-funded papers in any field is 1.0. An RCR of 2.0 means a paper is receiving twice as many citations per year as the median NIH-funded paper in its field and year, while an RCR of 0.5 means that it is receiving half as many citations per year. Calculation details are documented in Hutchins et al., PLoS Biol. 2016;14(9):e1002541.
provisional: RCRs for papers published in the previous two years are flagged as "provisional", to reflect that citation metrics for newer articles are not necessarily as stable as they are for older articles. Provisional RCRs are provided for papers published in the previous year if they have received 5 citations or more, despite being, in many cases, less than a year old. All papers published the year before the previous year receive provisional RCRs. The current year is considered to be the NIH Fiscal Year, which starts in October. For example, in July 2019 (NIH Fiscal Year 2019), papers from 2018 receive provisional RCRs if they have 5 citations or more, and all papers from 2017 receive provisional RCRs. In October 2019, at the start of NIH Fiscal Year 2020, papers from 2019 receive provisional RCRs if they have 5 citations or more, and all papers from 2018 receive provisional RCRs.
citation_count: Number of unique articles that have cited this one
citations_per_year: Citations per year that this article has received since its publication. If this appeared as a preprint and a published article, the year from the published version is used as the primary publication date. This is the numerator for the Relative Citation Ratio.
field_citation_rate: Measure of the intrinsic citation rate of this paper's field, estimated using its co-citation network.
expected_citations_per_year: Citations per year that NIH-funded articles, with the same Field Citation Rate and published in the same year as this paper, receive. This is the denominator for the Relative Citation Ratio.
nih_percentile: Percentile rank of this paper's RCR compared to all NIH publications. For example, 95% indicates that this paper's RCR is higher than 95% of all NIH-funded publications.
human: Fraction of MeSH terms that are in the Human category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)
animal: Fraction of MeSH terms that are in the Animal category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)
molecular_cellular: Fraction of MeSH terms that are in the Molecular/Cellular Biology category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)
x_coord: X coordinate of the article on the Triangle of Biomedicine
y_coord: Y coordinate of the article on the Triangle of Biomedicine
is_clinical: Flag indicating that this paper meets the definition of a clinical article.
cited_by_clin: PMIDs of clinical articles that this article has been cited by.
apt: Approximate Potential to Translate is a machine learning-based estimate of the likelihood that this publication will be cited in later clinical trials or guidelines. Calculation details are documented in Hutchins et al., PLoS Biol. 2019;17(10):e3000416.
cited_by: PMIDs of articles that have cited this one.
references: PMIDs of articles in this article's reference list.

Large CSV files are zipped using zip version 4.5, which is more recent than the default unzip command line utility in some common Linux distributions. These files can be unzipped with tools that support version 4.5 or later, such as 7zip.

Comments and questions can be addressed to iCite@mail.nih.gov
Patent citation data for USPTO utility patents granted between 1976-2015 and...
zenodo.org
csv
Updated Jun 21, 2020
Cite
Giorgio Triulzi (2020). Patent citation data for USPTO utility patents granted between 1976-2015 and for patents belonging to 30 technology domains [Dataset]. http://doi.org/10.5281/zenodo.3902550
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.3902550
Dataset updated
Jun 21, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Giorgio Triulzi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These two data files contain information on patent citations for USPTO utility patents granted between 1976 and 2015 and for patents that have been classified in 30 specific technology domains.

The file 'CITATION_INFO_no_neg_citlag.csv' is generated by combining raw data freely downloadable from patentsview.org, from which citations where the filing year of the citing patent is earlier than the filing year of the cited one (a negative citation lag) have been removed.
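The filtering step behind 'CITATION_INFO_no_neg_citlag.csv' might look roughly like the following sketch. It assumes that "no_neg_citlag" means dropping citations whose citing patent was filed in an earlier year than the cited one; the column names `citing_filing_year` and `cited_filing_year`, the function name, and the sample rows are all hypothetical and may not match the actual file.

```python
def drop_negative_citation_lags(citations):
    """Keep citations whose citing patent was not filed before the cited one."""
    return [
        row for row in citations
        if row["citing_filing_year"] >= row["cited_filing_year"]
    ]


# Hypothetical rows; patent labels and column names are illustrative only.
sample = [
    {"citing": "A", "cited": "B", "citing_filing_year": 2005, "cited_filing_year": 1998},
    {"citing": "C", "cited": "D", "citing_filing_year": 1990, "cited_filing_year": 1992},
    {"citing": "E", "cited": "F", "citing_filing_year": 2001, "cited_filing_year": 2001},
]
kept = drop_negative_citation_lags(sample)
print([(row["citing"], row["cited"]) for row in kept])
```

In this sketch a citation with equal filing years (a lag of zero) is kept, since only negative lags are described as removed.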

The file 'CITATIONS_DOMAINS.csv' is a sample of the previous file that only includes citations made by patents belonging to one of 30 domains defined in the paper 'Estimating technology performance improvement rates by mining patent data' by Giorgio Triulzi, Jeff Alstott and Chris Magee.

These two files complement another dataset published on Mendeley Data. The two datasets can be used, together with the code published on GitHub, to replicate the main results from the paper.
Citation Trends for "Extending the SAND Spatial Database System for the...
shibatadb.com
Updated Dec 20, 2005
Cite
The citation is currently not available for this dataset.
Dataset updated
Dec 20, 2005
Dataset authored and provided by
Yubetsu
License
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Time period covered
2007 - 2020
Variables measured
New Citations per Year
Description
Yearly citation counts for the publication titled "Extending the SAND Spatial Database System for the Visualization of Three‐Dimensional Scientific Data".
Types, open citations, closed citations, publishers, and participation...
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Cite
Hiebi, Ivan (2020). Types, open citations, closed citations, publishers, and participation reports of Crossref entities [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2558257
Dataset updated
Jan 24, 2020
Dataset provided by
Peroni, Silvio
Hiebi, Ivan
Shotton, David
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This publication contains several datasets that have been used in the paper "Crowdsourcing open citations with CROCI – An analysis of the current status of open citations, and a proposal" submitted to the 17th International Conference on Scientometrics and Bibliometrics (ISSI 2019), available at https://opencitations.wordpress.com/2019/02/07/crowdsourcing-open-citations-with-croci/.

Additional information about the analyses described in the paper, including the code and the data we have used to compute all the figures, is available as a Jupyter notebook at https://github.com/sosgang/pushing-open-citations-issi2019/blob/master/script/croci_nb.ipynb. The datasets contain the following information.

non_open.zip: a zipped CSV file (~5 GB unzipped) containing the numbers of open citations and closed citations received by the entities in the Crossref dump used in our computation, dated October 2018. All the entity types retrieved from Crossref were aligned to one of the following five categories: journal, book, proceedings, dataset, other. The open CC0 citation data we used came from the CSV dump of the most recent release of COCI, dated 12 November 2018. The number of closed citations was calculated by subtracting the number of open citations to each entity available within COCI from the value “is-referenced-by-count” available in the Crossref metadata for that particular cited entity, which reports all the DOI-to-DOI citation links that point to the cited entity from within the whole Crossref database (including those present in the Crossref ‘closed’ dataset).

The columns of the CSV file are the following ones:

doi: the DOI of the publication in Crossref;

type: the type of the publication as indicated in Crossref;

cited_by: the number of open citations received by the publication according to COCI;

non_open: the number of closed citations received by the publication according to Crossref + COCI.
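The subtraction used to derive the non_open column can be illustrated with a short sketch. The function names, the DOI, and the counts below are hypothetical; only the four column names and the rule itself (Crossref's "is-referenced-by-count" minus the open citations in COCI) come from the description.

```python
def closed_citation_count(is_referenced_by_count, open_citations):
    """Closed citations = Crossref total minus open citations found in COCI."""
    return is_referenced_by_count - open_citations


def build_row(doi, entity_type, is_referenced_by_count, open_citations):
    """Assemble one record with the four columns listed above."""
    return {
        "doi": doi,
        "type": entity_type,
        "cited_by": open_citations,
        "non_open": closed_citation_count(is_referenced_by_count, open_citations),
    }


# Hypothetical entity: 10 citations counted by Crossref, 7 of them open in COCI.
row = build_row("10.1234/example", "journal-article", 10, 7)
print(row)
```

Adding cited_by and non_open back together recovers the Crossref total for the entity, which is a quick consistency check when working with the file.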

croci_types.csv: a CSV file that contains the numbers of open citations and closed citations received by the entities in the Crossref dump used in our computation, as collected in the previous CSV file, aligned in five classes depending on the entity types retrieved from Crossref: journal (Crossref types: journal-article, journal-issue, journal-volume, journal), book (Crossref types: book, book-chapter, book-section, monograph, book track, book-part, book-set, reference-book, dissertation, book series, edited book), proceedings (Crossref types: proceedings-article, proceedings, proceedings-series), dataset (Crossref types: dataset), other (Crossref types: other, report, peer review, reference-entry, component, report-series, standard, posted-content, standard-series).

The columns of the CSV file are the following ones:

type: the publication type, one of "journal", "book", "proceedings", "dataset", "other";

label: the label assigned to the type for visualisation purposes;

coci_open_cit: the number of open citations received by the publication type according to COCI;

crossref_close_cit: the number of closed citations received by the publication according to Crossref + COCI.

publishers_cits.csv: it is a CSV file that contains the top twenty publishers that received the greatest number of open citations. The columns of the CSV file are the following ones:

publisher: the name of the publisher;

doi_prefix: the list of DOI prefixes used by the publisher;

coci_open_cit: the number of open citations received by the publications of the publisher according to COCI;

crossref_close_cit: the number of closed citations received by the publications of the publishers according to Crossref + COCI;

total_cit: the total number of citations received by the publications of the publisher (= coci_open_cit + crossref_close_cit).

20publishers_cr.csv: a CSV file that contains the numbers of the contributions to open citations made by the twenty publishers introduced in the previous CSV file as of 24 January 2018, according to the data available through the Crossref API. The counts listed in this file refer to the number of publications for which each publisher has submitted metadata to Crossref that include the publication’s reference list. The categories 'closed', 'limited' and 'open' refer to publications for which the reference lists are not visible to anyone outside the Crossref Cited-by membership, are visible only to them and to Crossref Metadata Plus members, or are visible to all, respectively. In addition, the file also records the total number of publications for which the publisher has submitted metadata to Crossref, whether or not those metadata include the reference lists of those publications.

The columns of the CSV file are the following ones:

publisher: the name of the publisher;

open: the number of publications in Crossref with an 'open' visibility for their reference lists;

limited: the number of publications in Crossref with a 'limited' visibility for their reference lists;

closed: the number of publications in Crossref with a 'closed' visibility for their reference lists;

overall_deposited: the overall number of publications for which the publisher has submitted metadata to Crossref.
Full Text Database Report
datainsightsmarket.com
doc, pdf, ppt
Updated Feb 12, 2025
Cite
Data Insights Market (2025). Full Text Database Report [Dataset]. https://www.datainsightsmarket.com/reports/full-text-database-1964932
Available download formats: doc, ppt, pdf
Dataset updated
Feb 12, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global full-text database market is projected to grow from XXX million in 2025 to XXX million by 2033, at a CAGR of XX% during the forecast period. The growth is attributed to increasing demand for information retrieval, advancements in technology, and a rising need for efficient research and development. Key drivers of the market include growing adoption of digital libraries, rising demand for personalized content, and increasing focus on research and development. Key trends in the full-text database market include the emergence of artificial intelligence (AI) and machine learning (ML) technologies, the growth of open access publishing, and the increasing adoption of cloud-based solutions. The market is segmented by application (academic research, corporate research, legal research, and others) and by type (bibliographic, full-text, and abstract). Major players in the market include John Wiley & Sons, ICPSR, IEEE, EBSCO, UMI, Blackwell, Springer Link, Elsevier Science, Apache Solr, Elastic N.V., CNKI, China Science and Technology Journal Database, Wanfang Data Knowledge Service Platform, China Science Citation Database, and Chinese, Western, Japanese and Russian Journals Joint Directory Database. The market is expected to witness significant growth in emerging economies, such as China and India, due to rising literacy rates and increasing demand for information access.
Data from: The Brill Knowledge Graph: A Database of Bibliographic References...
zenodo.org
bin, txt
Updated Mar 13, 2024
Cite
Natallia Kokash; Matteo Romanello; Ernest Suyver; Giovanni Colavizza (2024). The Brill Knowledge Graph: A Database of Bibliographic References and Index Terms extracted from Books in Humanities and Social Sciences [Dataset]. http://doi.org/10.5281/zenodo.7691771
Available download formats: txt, bin
Unique identifier
https://doi.org/10.5281/zenodo.7691771
Dataset updated
Mar 13, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Natallia Kokash; Matteo Romanello; Ernest Suyver; Giovanni Colavizza
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present a complete dataset of linked bibliography and index data, partially disambiguated and augmented with references to external resources, extracted from Brill’s archive in the field of Classics. Processed book identifiers are listed in a separate text file. Text fragments extracted from different books via this process are then parsed and compared using a string-based similarity metric to form clusters of bibliographic references to the same published work or (variants of) the same subjects discussed in these books. The entire set of references was then disambiguated using the Google Books and Crossref APIs.

Paper about extraction pipeline

Paper about extracted KG
Citing online references
borealisdata.ca
dataone.org
Updated May 7, 2019
Cite
David Topps; Corey Wirun; Nishan Sharma (2019). Citing online references [Dataset]. http://doi.org/10.5683/SP2/80VX7U
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.5683/SP2/80VX7U
Dataset updated
May 7, 2019
Dataset provided by
Borealis
Authors
David Topps; Corey Wirun; Nishan Sharma
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Citation of reference material is well established for most traditional sources but remains inconsistent in its application for online resources such as web pages, blog posts and materials generated from underlying database queries. We present some tips on how authors can more effectively cite and archive such resources so they are persistent and sustainable.
Source Data for Manuscript: Identifying genomic data use with the Data...
data.niaid.nih.gov
zenodo.org
Updated Sep 23, 2024
Cite
Fagnan, Kjiersten (2024). Source Data for Manuscript: Identifying genomic data use with the Data Citation Explorer [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12802876
Dataset updated
Sep 23, 2024
Dataset provided by
Reddy, TBK
Garrity, George
Salamon, Hugh
Beecroft, Chris
Parker, Charles
Fagnan, Kjiersten
Byers, Neil
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
This page contains the source data for the manuscript describing the Data Citation Explorer, currently in review for publication. The preprint version can be found on this page.

Files:

DCE_manual_eval_sample.xlsx:

This file was used to manually evaluate hits generated by the Data Citation Explorer. There are two separate sheets: one with publications returned by searches in PubMed and PubMed Central and another with publications returned by searches in Dimensions. Column descriptions can be found in the file itself. Each row in each evaluation sheet refers to a pair between a JAMO record and a linked publication.

DCE_citation_report.csv:

Contains JAMO record IDs and PubMed IDs from the initial 2020 DCE trial run. There are 238,994 unique JAMO IDs and 30,641 unique PubMed IDs. 78,104 JAMO records are linked with publications.

Columns:

jamo_id - unique JAMO record ID

sample_group - Sample strata from which manually evaluated records were pulled

citation_count - Number of citations associated with each record

citations - comma-delimited PubMed IDs for linked publications

sampled - True/False, denoting which records were included in the initial evaluation sample

notes - descriptions for why certain sampled records were excluded from manual evaluation

unprocessed - True/False. These 7,890 records contained anomalous fields that caused them to be rejected for processing. They are represented as zero-length files in the archive.
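A minimal sketch of how the citation report could be read in Python, assuming the file is a standard CSV with the column names listed above and that the comma-delimited `citations` field is quoted. The sample rows and IDs are invented, and the helper names `parse_citations` and `load_links` are hypothetical.

```python
import csv
import io

# Invented rows using the documented column names; real IDs will differ.
SAMPLE = '''jamo_id,sample_group,citation_count,citations,sampled,notes,unprocessed
id-001,A,2,"11111111,22222222",True,,False
id-002,B,0,,False,,False
'''


def parse_citations(field):
    """Split the comma-delimited PubMed ID field into a list of ID strings."""
    return [pmid.strip() for pmid in field.split(",") if pmid.strip()]


def load_links(text):
    """Map each JAMO record ID to its list of linked PubMed IDs."""
    links = {}
    for row in csv.DictReader(io.StringIO(text)):
        links[row["jamo_id"]] = parse_citations(row["citations"])
    return links


links = load_links(SAMPLE)
# Records that are linked with at least one publication.
linked = {jamo_id: pmids for jamo_id, pmids in links.items() if pmids}
# Distinct PubMed IDs across all records.
unique_pmids = {pmid for pmids in links.values() for pmid in pmids}
print(len(links), len(linked), len(unique_pmids))
```

On the real file, the same three counts would correspond to the unique JAMO IDs, the linked JAMO records, and the unique PubMed IDs reported above, provided the assumptions about quoting hold.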

DCE_source_files.zip:

This folder contains three files for each JAMO record listed in the citation report (DCE_citation_report.tsv):

JAMO_ID_source.yaml - The fields extracted from the JAMO record that were relevant to the citation search, including any previously known PMIDs (manually curated).

JAMO_ID_expand.yaml - The source record augmented with additional metadata discovered in other resources, including the citations that were discovered based on querying PubMed Central for the values in those metadata fields.

JAMO_ID_audit.json - The audit path as a directed acyclic graph, in JSON.
Data from: Google Scholar as a source for citation and impact analysis for a...
explore.openaire.eu
Updated Jan 1, 2010
Cite
S. A. Sanni; A. N. Zainab (2010). Google Scholar as a source for citation and impact analysis for a non-ISI indexed medical journal [Dataset]. https://explore.openaire.eu/search/other?orpId=od_124::c09a7638b07750d68773c1f5f9f7b686
Dataset updated
Jan 1, 2010
Authors
S. A. Sanni; A. N. Zainab
Description
It is difficult to determine the influence and impact of journals that are not covered by the ISI databases and the Journal Citation Report. However, databases such as MyAIS (Malaysian Abstracting and Indexing System) offer sufficient information to support bibliometric analysis and are indexed by Google Scholar, which provides citation information, so it has become possible to obtain productivity, citation and impact information for non-ISI indexed journals. The bibliometric tool Harzing's Publish or Perish was used to collate citation information from Google Scholar. The study examines article productivity and the citations obtained by articles, and calculates the impact factor of the Medical Journal of Malaysia (MJM) for articles published between 2004 and 2008. MJM is the oldest medical journal in Malaysia and the unit of analysis is 580 articles. The results indicate that once a journal is covered by MyAIS it becomes visible and accessible on the Web because Google Scholar indexes MyAIS. The results show that contributors to MJM were mainly Malaysian (91%) and the number of papers with Malaysian-foreign collaboration was very small (28 articles, 4.8%). However, citation information from Google Scholar indicates that out of the 580 articles, 76.8% (446) have been cited over the 5-year period. The citations were received from both mainstream foreign and Malaysian journals, and the top three citers were from China, Malaysia and the United States. In general, more citations were received from East Asian countries, Europe, and Southeast Asia. The 2-year impact factor calculated for MJM is 0.378 in 2009, 0.367 in 2008, 0.616 in 2007 and 0.456 in 2006. The 5-year impact factor is calculated as 0.577. The results show that although MJM is a Malaysian journal and not ISI indexed, its contents have some international significance based on the citations and impact score it receives, indicating the importance of being visible, especially in Google Scholar.
Citation Trends for "The Immune Epitope Database and Analysis Resource...
shibatadb.com
Updated Nov 25, 2019
Cite
Yubetsu (2019). Citation Trends for "The Immune Epitope Database and Analysis Resource Program 2003–2018: reflections and outlook" [Dataset]. https://www.shibatadb.com/article/xjTa2NAW
Dataset updated
Nov 25, 2019
Dataset authored and provided by
Yubetsu
License
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Time period covered
2019 - 2025
Variables measured
New Citations per Year
Description
Yearly citation counts for the publication titled "The Immune Epitope Database and Analysis Resource Program 2003–2018: reflections and outlook".
Louisville Metro KY - Uniform Citation Data 2023
data.louisvilleky.gov
s.cnmilf.com
+5 more
Updated Jan 4, 2023
Cite
Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Uniform Citation Data 2023 [Dataset]. https://data.louisvilleky.gov/datasets/louisville-metro-ky-uniform-citation-data-2023-1/about
Dataset updated
Jan 4, 2023
Dataset authored and provided by
Louisville/Jefferson County Information Consortium
License
https://louisville-metro-opendata-lojic.hub.arcgis.com/pages/terms-of-use-and-license
Area covered
Kentucky, Louisville
Description
Note: Due to a system migration, this data will cease to update on March 14th, 2023. At this time we are updating this dataset manually once per month as resources allow. For real time crime data please utilize communitycrimemap.com

A list of all uniform citations from the Louisville Metro Police Department, the CSV file is updated daily, including case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes can be found in this Link.

INCIDENT_NUMBER or CASE_NUMBER links these data sets together:

Crime Data

Uniform Citation Data

Firearm intake

LMPD hate crimes

Assaulted Officers

CITATION_CONTROL_NUMBER links these data sets together:

Uniform Citation Data

LMPD Stops Data

Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

AGENCY_DESC - the name of the department that issued the citation

CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms and can be used to connect the dataset to the following other datasets INCIDENT_NUMBER:

1. Crime Data

2. Firearms intake

3. LMPD hate crimes

4. Assaulted Officers

NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes) which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.

CITATION_YEAR - the year the citation was issued

CITATION_CONTROL_NUMBER - links this LMPD stops data

CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)

CITATION_DATE - the date the citation was issued

CITATION_LOCATION - the location the citation was issued

DIVISION - the LMPD division in which the citation was issued

BEAT - the LMPD beat in which the citation was issued

PERSONS_SEX - the gender of the person who received the citation

PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)

PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H=Hispanic, U=Undeclared)

PERSONS_AGE - the age of the person who received the citation

PERSONS_HOME_CITY - the city in which the person who received the citation lives

PERSONS_HOME_STATE - the state in which the person who received the citation lives

PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives

VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/

ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/

STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/

CHARGE_DESC - the description of the type of charge for the citation

UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/

UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
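The CASE_NUMBER note above implies a small formatting step before joining this dataset to the others. The Python sketch below is one way to do that conversion, assuming every case number follows the same 2-2-6 digit grouping as the single example given (8018013155 and 80-18-013155). That grouping is inferred from one example, not from a documented rule, and the function names are hypothetical.

```python
def case_to_incident(case_number: str) -> str:
    """Convert a dashless CASE_NUMBER to the dashed INCIDENT_NUMBER form.

    Assumes a 10-digit value grouped as 2-2-6 digits, as in the
    documented example; other lengths are rejected rather than guessed.
    """
    digits = str(case_number).strip()
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError(f"unexpected CASE_NUMBER format: {case_number!r}")
    return f"{digits[:2]}-{digits[2:4]}-{digits[4:]}"


def incident_to_case(incident_number: str) -> str:
    """Strip the dashes from an INCIDENT_NUMBER to get the CASE_NUMBER form."""
    return str(incident_number).strip().replace("-", "")


print(case_to_incident("8018013155"))
print(incident_to_case("80-18-013155"))
```

Rejecting values that do not match the assumed pattern keeps malformed keys from silently producing wrong joins across the linked datasets.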
Citation Trends for "PDEStrIAn: A Phosphodiesterase Structure and Ligand...
shibatadb.com
Updated Mar 18, 2016
Cite
Yubetsu (2016). Citation Trends for "PDEStrIAn: A Phosphodiesterase Structure and Ligand Interaction Annotated Database As a Tool for Structure-Based Drug Design" [Dataset]. https://www.shibatadb.com/article/AfySCbLz
Dataset updated
Mar 18, 2016
Dataset authored and provided by
Yubetsu
License
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Time period covered
2016 - 2025
Variables measured
New Citations per Year
Description
Yearly citation counts for the publication titled "PDEStrIAn: A Phosphodiesterase Structure and Ligand Interaction Annotated Database As a Tool for Structure-Based Drug Design".
Louisville Metro KY - Uniform Citation Data 2020
catalog.data.gov
data.louisvilleky.gov
+3 more
Updated Apr 13, 2023
Cite
Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Uniform Citation Data 2020 [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-uniform-citation-data-2020
Dataset updated
Apr 13, 2023
Dataset provided by
Louisville/Jefferson County Information Consortium
Area covered
Kentucky, Louisville
Description
A list of all uniform citations from the Louisville Metro Police Department, the CSV file is updated daily, including case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes can be found in this Link.

INCIDENT_NUMBER or CASE_NUMBER links these data sets together:

Crime Data

Uniform Citation Data

Firearm intake

LMPD hate crimes

Assaulted Officers

CITATION_CONTROL_NUMBER links these data sets together:

Uniform Citation Data

LMPD Stops Data

Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

AGENCY_DESC - the name of the department that issued the citation

CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms and can be used to connect the dataset to the following other datasets INCIDENT_NUMBER:

1. Crime Data

2. Firearms intake

3. LMPD hate crimes

4. Assaulted Officers

NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes) which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.

CITATION_YEAR - the year the citation was issued

CITATION_CONTROL_NUMBER - links this LMPD stops data

CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)

CITATION_DATE - the date the citation was issued

CITATION_LOCATION - the location the citation was issued

DIVISION - the LMPD division in which the citation was issued

BEAT - the LMPD beat in which the citation was issued

PERSONS_SEX - the gender of the person who received the citation

PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)

PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H=Hispanic, U=Undeclared)

PERSONS_AGE - the age of the person who received the citation

PERSONS_HOME_CITY - the city in which the person who received the citation lives

PERSONS_HOME_STATE - the state in which the person who received the citation lives

PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives

VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/

ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/

STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/

CHARGE_DESC - the description of the type of charge for the citation

UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/

UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
