100+ datasets found
  1. Using Open Citation Databases for Snowballing in Software Engineering...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, pdf, zip
    Updated Jul 12, 2024
    Cite
    Leif Bonorden (2024). Using Open Citation Databases for Snowballing in Software Engineering Research [Dataset]. http://doi.org/10.5281/zenodo.7938497
    Available download formats: csv, bin, zip, pdf
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Leif Bonorden
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset for our study on the coverage of software engineering articles in open citation databases:

    • a list of the 23 sampled venues with their respective CORE ranks and publishers,
      • 01-venues.csv,
    • a list of the 204 sampled articles with their respective number of references/citations per citation database,
      • 02-articles.csv (articles with publication information),
      • 03-references-absolute.csv (number of references in published PDF & absolute numbers for reference coverage in databases),
      • 04-references-relative.csv (relative numbers for reference coverage in databases),
      • 05-citations-absolute.csv (absolute numbers for citation coverage in databases),
      • 06-citations relative.csv (relative numbers for citation coverage in databases),
    • a list of the 8 articles analyzed in more detail with complete references data from the citation databases,
      • 07-selected-articles.csv (articles with publication information),
      • 08A–08H (comparison of references found in databases for each article),
    • and additional statistical measures and plots
      • 09-Statistics.{pdf,xlsx} (statistical measures – i.e., minimum, maximum, median, average, variance – for the whole dataset and for subsets by publisher, CORE rank, or year of publication),
      • 10-Figures.zip (figures for references as shown in the study and additional figures for citations – each in EPS and PNG format).
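
    The relative files presumably express each absolute count as a share of the references in the published PDF. A minimal sketch of that relationship follows; the parameter names `found_in_db` and `pdf_references` are hypothetical and are not column names taken from the dataset.

```python
def relative_coverage(found_in_db: int, pdf_references: int) -> float:
    """Share of an article's references that a citation database lists.

    Both parameter names are illustrative assumptions, not dataset columns.
    """
    if pdf_references <= 0:
        raise ValueError("pdf_references must be positive")
    return found_in_db / pdf_references


# Hypothetical article: 40 references in the PDF, 30 of them found in a database
print(relative_coverage(30, 40))
```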
  2. Citation impact of linking to data

    • figshare.com
    pdf
    Updated Jun 3, 2023
    Cite
    Bertil Fabricius Dorch (2023). Citation impact of linking to data [Dataset]. http://doi.org/10.6084/m9.figshare.105151.v1
    Available download formats: pdf
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Bertil Fabricius Dorch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Graph from the 2012 preprint, covering only the years 2000-2010 (use the newer version covering 2000-2015): the Citation Advantage of papers that link to data, as a function of the year of publication as registered in ADS. The Citation Advantage is defined as the ratio of the average number of citations per year to papers with links to data to the average number of citations per year to papers without such links.
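
    The ratio defined above can be sketched as follows. This is only an illustration of the definition; the function name and the sample values are made up and do not come from the dataset.

```python
from statistics import mean


def citation_advantage(linked_cites_per_year, unlinked_cites_per_year):
    """Ratio of average citations per year: papers with data links vs. without.

    Each argument is a sequence with one citations-per-year value per paper,
    for papers published in the same year.
    """
    return mean(linked_cites_per_year) / mean(unlinked_cites_per_year)


# Invented values for one publication year
print(citation_advantage([2.0, 4.0], [1.0, 2.0]))
```

    A value above 1 would mean that papers linking to data receive more citations per year on average than papers without links.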

  3. Toxicity Reference Database

    • catalog.data.gov
    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • +2 more
    Updated Dec 3, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) - National Center for Computational Toxicology (NCCT) (2020). Toxicity Reference Database [Dataset]. https://catalog.data.gov/dataset/toxicity-reference-database
    Dataset updated
    Dec 3, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    National Center for Computational Toxicology
    Description

    The Toxicity Reference Database (ToxRefDB) contains approximately 30 years and $2 billion worth of animal studies. ToxRefDB allows scientists and the interested public to search and download thousands of animal toxicity testing results for hundreds of chemicals that were previously found only in paper documents. Currently, there are 474 chemicals in ToxRefDB, primarily the data-rich pesticide active ingredients, but the number will continue to expand.

  4. iCite Database Snapshot 2025-01

    • nih.figshare.com
    bin
    Updated Feb 7, 2025
    Cite
    iCite; B. Ian Hutchins; George Santangelo; Ehsanul Haque (2025). iCite Database Snapshot 2025-01 [Dataset]. http://doi.org/10.35092/yhjc28360103.v1
    Available download formats: bin
    Dataset updated
    Feb 7, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    iCite; B. Ian Hutchins; George Santangelo; Ehsanul Haque
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is a database snapshot of the iCite web service (provided here as a single zipped CSV file, or compressed, tarred JSON files). In addition, citation links in the NIH Open Citation Collection are provided as a two-column CSV table in open_citation_collection.zip.

    iCite provides bibliometrics and metadata on publications indexed in PubMed, organized into three modules:

    • Influence: Delivers metrics of scientific influence, field-adjusted and benchmarked to NIH publications as the baseline.
    • Translation: Measures how Human, Animal, or Molecular/Cellular Biology-oriented each paper is; tracks and predicts citation by clinical articles.
    • Open Cites: Disseminates link-level, public-domain citation data from the NIH Open Citation Collection.

    Definitions for individual data fields:

    • pmid: PubMed Identifier, an article ID as assigned in PubMed by the National Library of Medicine
    • doi: Digital Object Identifier, if available
    • year: Year the article was published
    • title: Title of the article
    • authors: List of author names
    • journal: Journal name (ISO abbreviation)
    • is_research_article: Flag indicating whether the Publication Type tags for this article are consistent with that of a primary research article
    • relative_citation_ratio: Relative Citation Ratio (RCR), OPA's metric of scientific influence. Field-adjusted, time-adjusted and benchmarked against NIH-funded papers. The median RCR for NIH funded papers in any field is 1.0. An RCR of 2.0 means a paper is receiving twice as many citations per year as the median NIH funded paper in its field and year, while an RCR of 0.5 means that it is receiving half as many citations per year. Calculation details are documented in Hutchins et al., PLoS Biol. 2016;14(9):e1002541.
    • provisional: RCRs for papers published in the previous two years are flagged as "provisional", to reflect that citation metrics for newer articles are not necessarily as stable as they are for older articles. Provisional RCRs are provided for papers published in the previous year if they have received 5 citations or more, despite being, in many cases, less than a year old. All papers published the year before the previous year receive provisional RCRs. The current year is considered to be the NIH Fiscal Year, which starts in October. For example, in July 2019 (NIH Fiscal Year 2019), papers from 2018 receive provisional RCRs if they have 5 citations or more, and all papers from 2017 receive provisional RCRs. In October 2019, at the start of NIH Fiscal Year 2020, papers from 2019 receive provisional RCRs if they have 5 citations or more, and all papers from 2018 receive provisional RCRs.
    • citation_count: Number of unique articles that have cited this one
    • citations_per_year: Citations per year that this article has received since its publication. If this appeared as a preprint and a published article, the year from the published version is used as the primary publication date. This is the numerator for the Relative Citation Ratio.
    • field_citation_rate: Measure of the intrinsic citation rate of this paper's field, estimated using its co-citation network.
    • expected_citations_per_year: Citations per year that NIH-funded articles, with the same Field Citation Rate and published in the same year as this paper, receive. This is the denominator for the Relative Citation Ratio.
    • nih_percentile: Percentile rank of this paper's RCR compared to all NIH publications. For example, 95% indicates that this paper's RCR is higher than 95% of all NIH funded publications.
    • human: Fraction of MeSH terms that are in the Human category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)
    • animal: Fraction of MeSH terms that are in the Animal category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)
    • molecular_cellular: Fraction of MeSH terms that are in the Molecular/Cellular Biology category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)
    • x_coord: X coordinate of the article on the Triangle of Biomedicine
    • y_coord: Y coordinate of the article on the Triangle of Biomedicine
    • is_clinical: Flag indicating that this paper meets the definition of a clinical article.
    • cited_by_clin: PMIDs of clinical articles that this article has been cited by.
    • apt: Approximate Potential to Translate is a machine learning-based estimate of the likelihood that this publication will be cited in later clinical trials or guidelines. Calculation details are documented in Hutchins et al., PLoS Biol. 2019;17(10):e3000416.
    • cited_by: PMIDs of articles that have cited this one.
    • references: PMIDs of articles in this article's reference list.

    Large CSV files are zipped using zip version 4.5, which is more recent than the default unzip command line utility in some common Linux distributions. These files can be unzipped with tools that support version 4.5 or later, such as 7zip.

    Comments and questions can be addressed to iCite@mail.nih.gov
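
    The field definitions describe the RCR as citations_per_year divided by expected_citations_per_year, and give a fiscal-year rule for provisional RCRs. The sketch below restates both under that reading; the function names are hypothetical and this is not iCite's own code.

```python
from datetime import date


def relative_citation_ratio(citations_per_year: float,
                            expected_citations_per_year: float) -> float:
    """RCR as numerator / denominator, per the field definitions."""
    return citations_per_year / expected_citations_per_year


def nih_fiscal_year(today: date) -> int:
    """The NIH fiscal year starts in October, so October 2019 is FY 2020."""
    return today.year + 1 if today.month >= 10 else today.year


def receives_provisional_rcr(pub_year: int, citation_count: int, today: date) -> bool:
    """Apply the provisional-RCR rule as described in the text."""
    fiscal_year = nih_fiscal_year(today)
    if pub_year == fiscal_year - 1:
        # Previous year: provisional RCR only with 5 or more citations
        return citation_count >= 5
    # Year before the previous year: always provisional
    return pub_year == fiscal_year - 2


# The July 2019 example from the description
print(receives_provisional_rcr(2018, 5, date(2019, 7, 1)))
print(receives_provisional_rcr(2017, 0, date(2019, 7, 1)))
```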

  5. World’s Top 2% of Scientists list by Stanford University: An Analysis of its...

    • data.mendeley.com
    Updated Nov 17, 2023
    Cite
    JOHN Philip (2023). World’s Top 2% of Scientists list by Stanford University: An Analysis of its Robustness [Dataset]. http://doi.org/10.17632/td6tdp4m6t.1
    Dataset updated
    Nov 17, 2023
    Authors
    JOHN Philip
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    John Ioannidis and co-authors [1] created a publicly available database of top-cited scientists in the world. This database, intended to address the misuse of citation metrics, has generated a lot of interest among the scientific community, institutions, and media. Many institutions have used it as a yardstick to assess the quality of researchers. At the same time, some people look at this list with skepticism, citing problems with the methodology used. Two separate databases are created, one based on career-long impact and one on single recent year impact. The database is built using Scopus data from Elsevier [1-3]. The scientists included in this database are classified into 22 scientific fields and 174 sub-fields. The parameters considered for this analysis are total citations from 1996 to 2022 (nc9622), h index in 2022 (h22), c-score, and world rank based on c-score (Rank ns). Citations without self-cites are considered in all cases (indicated as ns). For the single-year list, citations during 2022 (nc2222) are considered instead of nc9622.

    To evaluate the robustness of c-score-based ranking, I have done a detailed analysis of the metric parameters of the Nobel laureates in Physics, Chemistry, and Medicine of the last 25 years (1998-2022), and compared them with the top 100 rank holders in the list. The latest career-long and single-year-based databases (2022) were used for this analysis. The details of the analysis are presented below. Though the article says the selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field, the actual career-based ranking list has 204644 names [1]. The single-year database contains 210199 names. So, the published list contains approximately the top 4% of scientists. In the career-based rank list, for the person with the lowest rank of 4809825, the nc9622, h22, and c-score were 41, 3, and 1.3632, respectively, whereas for the person with the No. 1 rank in the list, the nc9622, h22, and c-score were 345061, 264, and 5.5927, respectively. Three people on the list had fewer than 100 citations during 1996-2022, 1155 people had an h22 less than 10, and 6 people had a c-score less than 2.
    In the single-year-based rank list, for the person with the lowest rank (6547764), the nc2222, h22, and c-score were 1, 1, and 0.6, respectively, whereas for the person with the No. 1 rank, the nc9622, h22, and c-score were 34582, 68, and 5.3368, respectively. 4463 people on the list had fewer than 100 citations in 2022, 71512 people had an h22 less than 10, and 313 people had a c-score less than 2. The entry of many authors with a single-digit h index and a very meager total number of citations indicates serious shortcomings of the c-score-based ranking methodology.

  6. Data from: Standards Incorporated by Reference (SIBR) Database

    • catalog.data.gov
    • data.nist.gov
    • +2 more
    Updated Sep 30, 2023
    Cite
    National Institute of Standards and Technology (2023). Standards Incorporated by Reference (SIBR) Database [Dataset]. https://catalog.data.gov/dataset/standards-incorporated-by-reference-sibr-database
    Dataset updated
    Sep 30, 2023
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    This is a searchable historical collection of standards referenced in regulations: voluntary consensus standards, government-unique standards, industry standards, and international standards referenced in the Code of Federal Regulations (CFR).

  7. Louisville Metro KY - Uniform Citation Data 2022

    • catalog.data.gov
    • data.lojic.org
    • +5 more
    Updated Jul 30, 2025
    Cite
    Louisville/Jefferson County Information Consortium (2025). Louisville Metro KY - Uniform Citation Data 2022 [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-uniform-citation-data-2022
    Dataset updated
    Jul 30, 2025
    Dataset provided by
    Louisville/Jefferson County Information Consortium
    Area covered
    Kentucky, Louisville
    Description

    Note: Due to a system migration, this data will cease to update on March 14th, 2023. The current projection is to restart the updates on or around July 17th, 2024.

    A list of all uniform citations from the Louisville Metro Police Department. The CSV file is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes.

    INCIDENT_NUMBER or CASE_NUMBER links these data sets together:
    • Crime Data
    • Uniform Citation Data
    • Firearm intake
    • LMPD hate crimes
    • Assaulted Officers

    CITATION_CONTROL_NUMBER links these data sets together:
    • Uniform Citation Data
    • LMPD Stops Data

    Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

    • AGENCY_DESC - the name of the department that issued the citation
    • CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms; it can be used to connect the dataset to the INCIDENT_NUMBER in the following other datasets: 1. Crime Data, 2. Firearms intake, 3. LMPD hate crimes, 4. Assaulted Officers. NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example, in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes), which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
    • CITATION_YEAR - the year the citation was issued
    • CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
    • CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
    • CITATION_DATE - the date the citation was issued
    • CITATION_LOCATION - the location the citation was issued
    • DIVISION - the LMPD division in which the citation was issued
    • BEAT - the LMPD beat in which the citation was issued
    • PERSONS_SEX - the gender of the person who received the citation
    • PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
    • PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
    • PERSONS_AGE - the age of the person who received the citation
    • PERSONS_HOME_CITY - the city in which the person who received the citation lives
    • PERSONS_HOME_STATE - the state in which the person who received the citation lives
    • PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
    • VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
    • ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
    • STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
    • CHARGE_DESC - the description of the type of charge for the citation
    • UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
    • UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
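
    Joining on CASE_NUMBER requires converting it to the dashed INCIDENT_NUMBER form first. The sketch below assumes every case number is 10 digits grouped 2-2-6, which is inferred only from the single example in the description (8018013155 and 80-18-013155); the function name is hypothetical.

```python
def case_to_incident_number(case_number: str) -> str:
    """Convert a CASE_NUMBER such as '8018013155' to '80-18-013155'.

    Assumes a 10-digit value with a 2-2-6 grouping (inferred from one example).
    """
    digits = case_number.strip()
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError(f"unexpected CASE_NUMBER format: {case_number!r}")
    return f"{digits[:2]}-{digits[2:4]}-{digits[4:]}"


print(case_to_incident_number("8018013155"))  # 80-18-013155
```

    Values that do not fit the assumed pattern raise an error rather than being silently reformatted, so mismatches surface before the join.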

  8. iCite Database Snapshot 2024-04

    • nih.figshare.com
    bin
    Updated May 9, 2024
    Cite
    iCite; B. Ian Hutchins; George Santangelo (2024). iCite Database Snapshot 2024-04 [Dataset]. http://doi.org/10.35092/yhjc25765794.v1
    Available download formats: bin
    Dataset updated
    May 9, 2024
    Dataset provided by
    The NIH Figshare Archive
    Authors
    iCite; B. Ian Hutchins; George Santangelo
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is a database snapshot of the iCite web service (provided here as a single zipped CSV file, or compressed, tarred JSON files). In addition, citation links in the NIH Open Citation Collection are provided as a two-column CSV table in open_citation_collection.zip. The module overview, the data field definitions, the note on unzipping files created with zip version 4.5, and the contact address are identical to those listed for item 4, iCite Database Snapshot 2025-01.
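
    Python's standard `zipfile` module supports the ZIP64 extensions, which is presumably what the zip version 4.5 note refers to, so it can serve as an alternative to 7zip. The sketch below streams the two-column citation table from a zip archive. It assumes the archive holds a single CSV member with a header row; the column meanings (citing, referenced) are also an assumption.

```python
import csv
import io
import zipfile


def iter_citation_links(zip_path, limit=None):
    """Yield (first_column, second_column) pairs from the first CSV in a zip.

    Assumes one CSV member with a header row; both are unverified assumptions.
    """
    with zipfile.ZipFile(zip_path) as archive:
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text)
            next(reader, None)  # skip the assumed header row
            for index, row in enumerate(reader):
                if limit is not None and index >= limit:
                    break
                yield row[0], row[1]
```

    Streaming row by row avoids extracting a multi-gigabyte file to disk, for example `for citing, referenced in iter_citation_links("open_citation_collection.zip", limit=5): ...`.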

  9. Patent citation data for USPTO utility patents granted between 1976-2015 and...

    • zenodo.org
    csv
    Updated Jun 21, 2020
    Cite
    Giorgio Triulzi (2020). Patent citation data for USPTO utility patents granted between 1976-2015 and for patents belonging to 30 technology domains [Dataset]. http://doi.org/10.5281/zenodo.3902550
    Available download formats: csv
    Dataset updated
    Jun 21, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Giorgio Triulzi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These two data files contain information on patent citations for USPTO utility patents granted between 1976 and 2015 and for patents that have been classified in 30 specific technology domains.

    The file 'CITATION_INFO_no_neg_citlag.csv' was generated by combining raw data freely downloadable from patentsview.org, after removing citations in which the filing year of the citing patent is earlier than the filing year of the cited one (that is, citations with a negative citation lag, as the file name suggests).

    The file 'CITATIONS_DOMAINS.csv' is a sample of the previous file that only includes citations made by patents belonging to one of 30 domains defined in the paper 'Estimating technology performance improvement rates by mining patent data' by Giorgio Triulzi, Jeff Alstott and Chris Magee.

    These two files complement another dataset published on Mendeley Data. The two datasets can be used, together with the code published on GitHub, to replicate the main results from the paper.
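
    The filtering step described for 'CITATION_INFO_no_neg_citlag.csv' can be sketched as below. The field names `citing_filing_year` and `cited_filing_year` are hypothetical, and the rule reads the description as removing citations with a negative citation lag; this is not the author's published code.

```python
def drop_negative_citation_lag(citations):
    """Keep citations whose citing patent was filed no earlier than the cited one.

    Each citation is a dict; the two year fields are assumed names.
    """
    return [
        c for c in citations
        if c["citing_filing_year"] >= c["cited_filing_year"]
    ]


sample = [
    {"citing": "A", "cited": "B", "citing_filing_year": 2001, "cited_filing_year": 1995},
    {"citing": "C", "cited": "D", "citing_filing_year": 1990, "cited_filing_year": 1992},
]
print(drop_negative_citation_lag(sample))  # only the A -> B citation remains
```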

  10. Citation Trends for "Extending the SAND Spatial Database System for the...

    • shibatadb.com
    Updated Dec 20, 2005
    Cite
    The citation is currently not available for this dataset.
    Dataset updated
    Dec 20, 2005
    Dataset authored and provided by
    Yubetsu
    License

    https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt

    Time period covered
    2007 - 2020
    Variables measured
    New Citations per Year
    Description

    Yearly citation counts for the publication titled "Extending the SAND Spatial Database System for the Visualization of Three‐Dimensional Scientific Data".

  11. Types, open citations, closed citations, publishers, and participation...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Hiebi, Ivan (2020). Types, open citations, closed citations, publishers, and participation reports of Crossref entities [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2558257
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Peroni, Silvio
    Hiebi, Ivan
    Shotton, David
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This publication contains several datasets that have been used in the paper "Crowdsourcing open citations with CROCI – An analysis of the current status of open citations, and a proposal" submitted to the 17th International Conference on Scientometrics and Bibliometrics (ISSI 2019), available at https://opencitations.wordpress.com/2019/02/07/crowdsourcing-open-citations-with-croci/.

    Additional information about the analyses described in the paper, including the code and the data we have used to compute all the figures, is available as a Jupyter notebook at https://github.com/sosgang/pushing-open-citations-issi2019/blob/master/script/croci_nb.ipynb. The datasets contain the following information.

    non_open.zip: a zipped CSV file (~5 GB unzipped) containing the numbers of open citations and closed citations received by the entities in the Crossref dump used in our computation, dated October 2018. All the entity types retrieved from Crossref were aligned to one of the following five categories: journal, book, proceedings, dataset, other. The open CC0 citation data we used came from the CSV dump of the most recent release of COCI, dated 12 November 2018. The number of closed citations was calculated by subtracting the number of open citations to each entity available within COCI from the value “is-referenced-by-count” available in the Crossref metadata for that particular cited entity, which reports all the DOI-to-DOI citation links that point to the cited entity from within the whole Crossref database (including those present in the Crossref ‘closed’ dataset).

    The columns of the CSV file are the following ones:

    doi: the DOI of the publication in Crossref;

    type: the type of the publication as indicated in Crossref;

    cited_by: the number of open citations received by the publication according to COCI;

    non_open: the number of closed citations received by the publication according to Crossref + COCI.
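
    The subtraction that produces the non_open column can be sketched as follows. The function and parameter names are hypothetical; only the arithmetic comes from the description.

```python
def closed_citations(is_referenced_by_count: int, coci_open_citations: int) -> int:
    """Closed citations = Crossref 'is-referenced-by-count' minus COCI open citations."""
    return is_referenced_by_count - coci_open_citations


# Invented example: Crossref reports 120 incoming citations, COCI holds 85 of them
print(closed_citations(120, 85))  # 35
```

    Because the Crossref dump and the COCI release carry different dates, the difference could in principle be negative for some entities; how such cases were handled is not stated in the description.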

    croci_types.csv: a CSV file that contains the numbers of open citations and closed citations received by the entities in the Crossref dump used in our computation, as collected in the previous CSV file, aligned in five classes depending on the entity types retrieved from Crossref: journal (Crossref types: journal-article, journal-issue, journal-volume, journal), book (Crossref types: book, book-chapter, book-section, monograph, book track, book-part, book-set, reference-book, dissertation, book series, edited book), proceedings (Crossref types: proceedings-article, proceedings, proceedings-series), dataset (Crossref types: dataset), other (Crossref types: other, report, peer review, reference-entry, component, report-series, standard, posted-content, standard-series).

    The columns of the CSV file are the following ones:

    type: the type of publication, one of "journal", "book", "proceedings", "dataset", "other";

    label: the label assigned to the type for visualisation purposes;

    coci_open_cit: the number of open citations received by the publication type according to COCI;

    crossref_close_cit: the number of closed citations received by the publication according to Crossref + COCI.

    publishers_cits.csv: it is a CSV file that contains the top twenty publishers that received the greatest number of open citations. The columns of the CSV file are the following ones:

    publisher: the name of the publisher;

    doi_prefix: the list of DOI prefixes used by the publisher;

    coci_open_cit: the number of open citations received by the publications of the publisher according to COCI;

    crossref_close_cit: the number of closed citations received by the publications of the publishers according to Crossref + COCI;

    total_cit: the total number of citations received by the publications of the publisher (= coci_open_cit + crossref_close_cit).
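
    The relationship among the three count columns can be checked row by row. The sketch below uses the column names listed above with invented values; the open-share helper is an illustrative addition that is not part of the dataset.

```python
def is_consistent(row: dict) -> bool:
    """Check that total_cit equals coci_open_cit + crossref_close_cit."""
    return row["total_cit"] == row["coci_open_cit"] + row["crossref_close_cit"]


def open_share(row: dict) -> float:
    """Fraction of a publisher's received citations that are open (illustrative)."""
    return row["coci_open_cit"] / row["total_cit"]


# Invented row, not taken from publishers_cits.csv
row = {"publisher": "Example Press", "coci_open_cit": 300,
       "crossref_close_cit": 100, "total_cit": 400}
print(is_consistent(row), open_share(row))  # True 0.75
```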

    20publishers_cr.csv: a CSV file that contains the numbers of the contributions to open citations made by the twenty publishers introduced in the previous CSV file as of 24 January 2018, according to the data available through the Crossref API. The counts listed in this file refer to the number of publications for which each publisher has submitted metadata to Crossref that include the publication’s reference list. The categories 'closed', 'limited' and 'open' refer to publications for which the reference lists are not visible to anyone outside the Crossref Cited-by membership, are visible only to them and to Crossref Metadata Plus members, or are visible to all, respectively. In addition, the file also records the total number of publications for which the publisher has submitted metadata to Crossref, whether or not those metadata include the reference lists of those publications.

    The columns of the CSV file are the following ones:

    publisher: the name of the publisher;

    open: the number of publications in Crossref with an 'open' visibility for their reference lists;

    limited: the number of publications in Crossref with a 'limited' visibility for their reference lists;

    closed: the number of publications in Crossref with a 'closed' visibility for their reference lists;

    overall_deposited: the overall number of publications for which the publisher has submitted metadata to Crossref.
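
    As an illustration of how these columns relate, the sketch below computes, for each publisher, the share of its deposited reference lists that are 'open'. It assumes a comma-separated file with a header row using the column names above, and it assumes that open + limited + closed equals the number of publications deposited with a reference list. The function name and sample rows are hypothetical.

```python
import csv
import io


def open_share(csv_text):
    """Map each publisher to open / (open + limited + closed)."""
    shares = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        n_open = int(row["open"])
        with_refs = n_open + int(row["limited"]) + int(row["closed"])
        # Guard against publishers that deposited no reference lists at all.
        shares[row["publisher"]] = n_open / with_refs if with_refs else 0.0
    return shares


# Hypothetical sample rows (NOT real data from 20publishers_cr.csv).
SAMPLE_CR = """publisher,open,limited,closed,overall_deposited
Example Press,300,100,100,1000
Sample House,0,0,0,40
"""

for name, share in open_share(SAMPLE_CR).items():
    print(f"{name}: {share:.0%} of deposited reference lists are open")
```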

  12. Full Text Database Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Feb 12, 2025
    + more versions
    Cite
    Data Insights Market (2025). Full Text Database Report [Dataset]. https://www.datainsightsmarket.com/reports/full-text-database-1964932
    Explore at:
    doc, ppt, pdf (available download formats)
    Dataset updated
    Feb 12, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global full-text database market is projected to grow from XXX million in 2025 to XXX million by 2033, at a CAGR of XX% during the forecast period. The growth is attributed to increasing demand for information retrieval, advancements in technology, and a rising need for efficient research and development. Key drivers of the market include growing adoption of digital libraries, rising demand for personalized content, and increasing focus on research and development. Key trends in the full-text database market include the emergence of artificial intelligence (AI) and machine learning (ML) technologies, the growth of open access publishing, and the increasing adoption of cloud-based solutions. The market is segmented by application (academic research, corporate research, legal research, and others) and by type (bibliographic, full-text, and abstract). Major players in the market include John Wiley & Sons, ICPSR, IEEE, EBSCO, UMI, Blackwell, Springer Link, Elsevier Science, Apache Solr, Elastic N.V., CNKI, China Science and Technology Journal Database, Wanfang Data Knowledge Service Platform, China Science Citation Database, and Chinese, Western, Japanese and Russian Journals Joint Directory Database. The market is expected to witness significant growth in emerging economies, such as China and India, due to rising literacy rates and increasing demand for information access.

  13. Data from: The Brill Knowledge Graph: A Database of Bibliographic References...

    • zenodo.org
    bin, txt
    Updated Mar 13, 2024
    Cite
    Natallia Kokash; Matteo Romanello; Ernest Suyver; Giovanni Colavizza (2024). The Brill Knowledge Graph: A Database of Bibliographic References and Index Terms extracted from Books in Humanities and Social Sciences [Dataset]. http://doi.org/10.5281/zenodo.7691771
    Explore at:
    txt, bin (available download formats)
    Dataset updated
    Mar 13, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Natallia Kokash; Matteo Romanello; Ernest Suyver; Giovanni Colavizza
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present a complete dataset of linked bibliography and index data, partially disambiguated and augmented with references to external resources, extracted from Brill’s archive in the field of Classics. Processed book identifiers are listed in a separate text file. Text fragments extracted from different books via this process are then parsed and compared using a string-based similarity metric to form clusters of bibliographic references to the same published work or (variants of) the same subjects discussed in these books. The entire set of references was then disambiguated using the Google Books and Crossref APIs.

    Paper about extraction pipeline

    Paper about extracted KG

  14. Citing online references

    • borealisdata.ca
    • dataone.org
    Updated May 7, 2019
    Cite
    David Topps; Corey Wirun; Nishan Sharma (2019). Citing online references [Dataset]. http://doi.org/10.5683/SP2/80VX7U
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 7, 2019
    Dataset provided by
    Borealis
    Authors
    David Topps; Corey Wirun; Nishan Sharma
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Citation of reference material is well established for most traditional sources but remains inconsistent in its application for online resources such as web pages, blog posts and materials generated from underlying database queries. We present some tips on how authors can more effectively cite and archive such resources so they are persistent and sustainable.

  15. Source Data for Manuscript: Identifying genomic data use with the Data...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 23, 2024
    + more versions
    Cite
    Fagnan, Kjiersten (2024). Source Data for Manuscript: Identifying genomic data use with the Data Citation Explorer [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12802876
    Dataset updated
    Sep 23, 2024
    Dataset provided by
    Reddy, TBK
    Garrity, George
    Salamon, Hugh
    Beecroft, Chris
    Parker, Charles
    Fagnan, Kjiersten
    Byers, Neil
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This page contains the source data for the manuscript describing the Data Citation Explorer, currently in review for publication. The preprint version can be found on this page.

    Files:

    DCE_manual_eval_sample.xlsx:

    This file was used to manually evaluate hits generated by the Data Citation Explorer. There are two separate sheets: one with publications returned by searches in PubMed and PubMed Central and another with publications returned by searches in Dimensions. Column descriptions can be found in the file itself. Each row in each evaluation sheet refers to a pair between a JAMO record and a linked publication.

    DCE_citation_report.csv:

    Contains JAMO record IDs and PubMed IDs from the initial 2020 DCE trial run. There are 238,994 unique JAMO IDs and 30,641 unique PubMed IDs. 78,104 JAMO records are linked with publications.

    Columns:

    jamo_id - unique JAMO record ID

    sample_group - Sample strata from which manually evaluated records were pulled

    citation_count - Number of citations associated with each record

    citations - comma-delimited PubMed IDs for linked publications

    sampled - True/False, denoting which records were included in the initial evaluation sample

    notes - descriptions for why certain sampled records were excluded from manual evaluation

    unprocessed - True/False. These 7,890 records contained anomalous fields that caused them to be rejected for processing. They are represented as zero-length files in the archive.
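
    The column layout above lends itself to a short parsing sketch. The Python below assumes the report is comma-separated with a header row and that the comma-delimited citations field is quoted, which the description does not confirm. The function name, record IDs, and PubMed IDs are invented for illustration.

```python
import csv
import io


def parse_citation_rows(csv_text):
    """Yield (jamo_id, [pmid, ...]) pairs, splitting the
    comma-delimited citations field into individual PubMed IDs."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row["citations"] or ""
        pmids = [p.strip() for p in raw.split(",") if p.strip()]
        yield row["jamo_id"], pmids


# Hypothetical sample rows (NOT real JAMO or PubMed identifiers).
SAMPLE_REPORT = '''jamo_id,citation_count,citations
J0001,2,"11111111,22222222"
J0002,0,
J0003,1,"22222222"
'''

linked = dict(parse_citation_rows(SAMPLE_REPORT))
unique_pmids = {p for pmids in linked.values() for p in pmids}
records_with_pubs = sum(1 for pmids in linked.values() if pmids)

print(len(linked), "records,", records_with_pubs, "linked,", len(unique_pmids), "unique PubMed IDs")
```

    The same three counts (records, linked records, unique PubMed IDs) correspond to the figures quoted for the full report.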

    DCE_source_files.zip:

    This archive contains three files for each JAMO record listed in DCE_citation_report.tsv:

    JAMO_ID_source.yaml - The fields extracted from the JAMO record that were relevant to the citation search, including any previously known PMIDs (manually curated).

    JAMO_ID_expand.yaml - The source record augmented with additional metadata discovered in other resources, including the citations that were discovered based on querying PubMed Central for the values in those metadata fields.

    JAMO_ID_audit.json - The audit path as a directed acyclic graph, in JSON.

  16. Data from: Google Scholar as a source for citation and impact analysis for a...

    • explore.openaire.eu
    Updated Jan 1, 2010
    Cite
    S. A. Sanni; A. N. Zainab (2010). Google Scholar as a source for citation and impact analysis for a non-ISI indexed medical journal [Dataset]. https://explore.openaire.eu/search/other?orpId=od_124::c09a7638b07750d68773c1f5f9f7b686
    Dataset updated
    Jan 1, 2010
    Authors
    S. A. Sanni; A. N. Zainab
    Description

    It is difficult to determine the influence and impact of journals which are not covered by the ISI databases and Journal Citation Report. However, with the availability of databases such as MyAIS (Malaysian Abstracting and Indexing System), which offers sufficient information to support bibliometric analysis and is indexed by Google Scholar, which provides citation information, it has become possible to obtain productivity, citation and impact information for non-ISI indexed journals. The bibliometric tool Harzing's Publish or Perish was used to collate citation information from Google Scholar. The study examines article productivity and the citations obtained by articles, and calculates the impact factor of the Medical Journal of Malaysia (MJM) for articles published between 2004 and 2008. MJM is the oldest medical journal in Malaysia and the unit of analysis is 580 articles. The results indicate that once a journal is covered by MyAIS it becomes visible and accessible on the Web because Google Scholar indexes MyAIS. The results show that contributors to MJM were mainly Malaysian (91%) and that the number of Malaysian-foreign collaborative papers was very small (28 articles, 4.8%). However, citation information from Google Scholar indicates that out of the 580 articles, 76.8% (446) have been cited over the 5-year period. The citations were received from both mainstream foreign and Malaysian journals, and the top three citers were from China, Malaysia and the United States. In general more citations were received from East Asian countries, Europe, and Southeast Asia. The 2-yearly impact factor calculated for MJM is 0.378 in 2009, 0.367 in 2008, 0.616 in 2007 and 0.456 in 2006. The 5-year impact factor is calculated as 0.577. The results show that although MJM is a Malaysian journal and not ISI indexed, its contents have some international significance based on the citations and impact score it receives, indicating the importance of being visible, especially in Google Scholar.
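
    The arithmetic behind figures of this kind can be sketched in a few lines of Python. The function below uses the conventional two-year impact factor definition (citations received in a year to items published in the two preceding years, divided by the number of those items). Whether the study computed its values exactly this way is an assumption, and the counts passed to the function are hypothetical, not taken from the study.

```python
def two_year_impact_factor(citations_in_year, items_prev_year, items_two_years_back):
    """Citations received in year Y to items from Y-1 and Y-2,
    divided by the number of items published in Y-1 and Y-2."""
    citable_items = items_prev_year + items_two_years_back
    if citable_items == 0:
        raise ValueError("no citable items in the two preceding years")
    return citations_in_year / citable_items


# Hypothetical counts, chosen only to illustrate the calculation.
example_if = round(two_year_impact_factor(45, 70, 50), 3)
print(example_if)

# Share of collaborative papers, using the counts stated in the description:
# 28 of 580 articles.
collab_share = round(100 * 28 / 580, 1)
print(collab_share)
```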

  17. Citation Trends for "The Immune Epitope Database and Analysis Resource...

    • shibatadb.com
    Updated Nov 25, 2019
    Cite
    Yubetsu (2019). Citation Trends for "The Immune Epitope Database and Analysis Resource Program 2003–2018: reflections and outlook" [Dataset]. https://www.shibatadb.com/article/xjTa2NAW
    Dataset updated
    Nov 25, 2019
    Dataset authored and provided by
    Yubetsu
    License

    https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt

    Time period covered
    2019 - 2025
    Variables measured
    New Citations per Year
    Description

    Yearly citation counts for the publication titled "The Immune Epitope Database and Analysis Resource Program 2003–2018: reflections and outlook".

  18. Louisville Metro KY - Uniform Citation Data 2023

    • data.louisvilleky.gov
    • s.cnmilf.com
    • +5more
    Updated Jan 4, 2023
    Cite
    Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Uniform Citation Data 2023 [Dataset]. https://data.louisvilleky.gov/datasets/louisville-metro-ky-uniform-citation-data-2023-1/about
    Dataset updated
    Jan 4, 2023
    Dataset authored and provided by
    Louisville/Jefferson County Information Consortium
    License

    https://louisville-metro-opendata-lojic.hub.arcgis.com/pages/terms-of-use-and-license

    Area covered
    Kentucky, Louisville
    Description

    Note: Due to a system migration, this data will cease to update on March 14th, 2023. At this time we are updating this dataset manually once per month as resources allow. For real time crime data please utilize communitycrimemap.com

    A list of all uniform citations from the Louisville Metro Police Department. The CSV file is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes; it can be found in this Link.

    INCIDENT_NUMBER or CASE_NUMBER links these data sets together:

      • Crime Data
      • Uniform Citation Data
      • Firearm intake
      • LMPD hate crimes
      • Assaulted Officers

    CITATION_CONTROL_NUMBER links these data sets together:

      • Uniform Citation Data
      • LMPD Stops Data

    Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

    AGENCY_DESC - the name of the department that issued the citation

    CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms; it can be used to connect the dataset to the INCIDENT_NUMBER of the following other datasets:

      1. Crime Data
      2. Firearms intake
      3. LMPD hate crimes
      4. Assaulted Officers

    NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes), which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.

    CITATION_YEAR - the year the citation was issued

    CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data

    CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)

    CITATION_DATE - the date the citation was issued

    CITATION_LOCATION - the location the citation was issued

    DIVISION - the LMPD division in which the citation was issued

    BEAT - the LMPD beat in which the citation was issued

    PERSONS_SEX - the gender of the person who received the citation

    PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)

    PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)

    PERSONS_AGE - the age of the person who received the citation

    PERSONS_HOME_CITY - the city in which the person who received the citation lives

    PERSONS_HOME_STATE - the state in which the person who received the citation lives

    PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives

    VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/

    ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/

    STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/

    CHARGE_DESC - the description of the type of charge for the citation

    UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/

    UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
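
    Joining the Uniform Citation Data to the other datasets requires normalizing the two identifier formats. The Python sketch below generalizes from the single example given in the description (8018013155 and 80-18-013155) and assumes every CASE_NUMBER is ten digits grouped as 2-2-6; that grouping and the function names are assumptions, not documented rules.

```python
import re


def case_to_incident_number(case_number):
    """Convert a 10-digit CASE_NUMBER such as '8018013155'
    to the dashed INCIDENT_NUMBER form '80-18-013155'."""
    digits = str(case_number).strip()
    if not re.fullmatch(r"\d{10}", digits):
        raise ValueError(f"expected 10 digits, got {case_number!r}")
    # Assumed grouping: 2 digits, 2 digits, 6 digits.
    return f"{digits[:2]}-{digits[2:4]}-{digits[4:]}"


def incident_to_case_number(incident_number):
    """Strip the dashes from an INCIDENT_NUMBER so it can be
    compared directly with CASE_NUMBER."""
    return incident_number.replace("-", "")


print(case_to_incident_number("8018013155"))
print(incident_to_case_number("80-18-013155"))
```

    Removing the dashes is the safer direction for a join, since it does not depend on the assumed grouping.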

  19. Citation Trends for "PDEStrIAn: A Phosphodiesterase Structure and Ligand...

    • shibatadb.com
    Updated Mar 18, 2016
    Cite
    Yubetsu (2016). Citation Trends for "PDEStrIAn: A Phosphodiesterase Structure and Ligand Interaction Annotated Database As a Tool for Structure-Based Drug Design" [Dataset]. https://www.shibatadb.com/article/AfySCbLz
    Dataset updated
    Mar 18, 2016
    Dataset authored and provided by
    Yubetsu
    License

    https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt

    Time period covered
    2016 - 2025
    Variables measured
    New Citations per Year
    Description

    Yearly citation counts for the publication titled "PDEStrIAn: A Phosphodiesterase Structure and Ligand Interaction Annotated Database As a Tool for Structure-Based Drug Design".

  20. Louisville Metro KY - Uniform Citation Data 2020

    • catalog.data.gov
    • data.louisvilleky.gov
    • +3more
    Updated Apr 13, 2023
    Cite
    Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Uniform Citation Data 2020 [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-uniform-citation-data-2020
    Dataset updated
    Apr 13, 2023
    Dataset provided by
    Louisville/Jefferson County Information Consortium
    Area covered
    Kentucky, Louisville
    Description

    A list of all uniform citations from the Louisville Metro Police Department. The CSV file is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes; it can be found in this Link.

    INCIDENT_NUMBER or CASE_NUMBER links these data sets together:

      • Crime Data
      • Uniform Citation Data
      • Firearm intake
      • LMPD hate crimes
      • Assaulted Officers

    CITATION_CONTROL_NUMBER links these data sets together:

      • Uniform Citation Data
      • LMPD Stops Data

    Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

    AGENCY_DESC - the name of the department that issued the citation

    CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms; it can be used to connect the dataset to the INCIDENT_NUMBER of the following other datasets:

      1. Crime Data
      2. Firearms intake
      3. LMPD hate crimes
      4. Assaulted Officers

    NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes), which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.

    CITATION_YEAR - the year the citation was issued

    CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data

    CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)

    CITATION_DATE - the date the citation was issued

    CITATION_LOCATION - the location the citation was issued

    DIVISION - the LMPD division in which the citation was issued

    BEAT - the LMPD beat in which the citation was issued

    PERSONS_SEX - the gender of the person who received the citation

    PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)

    PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)

    PERSONS_AGE - the age of the person who received the citation

    PERSONS_HOME_CITY - the city in which the person who received the citation lives

    PERSONS_HOME_STATE - the state in which the person who received the citation lives

    PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives

    VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/

    ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/

    STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/

    CHARGE_DESC - the description of the type of charge for the citation

    UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/

    UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
