The Caselaw Access Project makes 40 million pages of U.S. caselaw freely available online from the collections of Harvard Law School Library.
The CAP citation graph shows the connections between cases in the Caselaw Access Project dataset. You can use the citation graph to answer questions like "what is the most influential case?" and "what jurisdictions cite most often to this jurisdiction?".
Learn More: https://case.law/download/citation_graph/
Access Limits: https://case.law/api/#limits
This dataset includes citations and metadata for the CAP citation graph in CSV format.
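For orientation, a minimal pandas sketch of answering the "most influential case" question from the citation graph CSV; the filename and column names below are assumptions, so check the actual CSV header first.

```python
import pandas as pd

# Assumed filename and schema; verify against the downloaded CSV.
edges = pd.read_csv("citations.csv")

# One row per citation edge: citing case -> cited case (assumed columns).
top_cited = edges["cited_case_id"].value_counts().head(10)
print("Most-cited cases by in-degree:")
print(top_cited)
```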
The Caselaw Access Project is by the Library Innovation Lab at Harvard Law School Library.
People are using CAP data to create research, applications, and more. We're sharing examples in our gallery.
Cite Grid is the first visualization we've created based on data from our citation graph.
Have something to share? We're excited to hear about it.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a list of 347 papers published in 2010 and retrieved from the Web of Science, Scopus, and Google Scholar. For each paper, the number of citations and the citation date(s) were collected. Where the full text is available online, the date of "liberation" and the URL of the file were retrieved as well. The objective was to assess the impact of open access on citation rate, in particular before and after full-text "liberation".
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset includes five files. Descriptions of the files are given as follows:

FILENAME: PubMed_retracted_publication_full_v3.tsv
- Bibliographic data of retracted papers indexed in PubMed (retrieved on August 20, 2020, searched with the query "retracted publication" [PT]).
- Except for the information in the "cited_by" column, all the data is from PubMed.
- PMIDs in the "cited_by" column that meet either of the two conditions below have been excluded from analyses: [1] PMIDs of the citing papers are from retraction notices (i.e., those in the "retraction_notice_PMID.csv" file). [2] The citing paper and the cited retracted paper have the same PMID.
ROW EXPLANATIONS
- Each row is a retracted paper. There are 7,813 retracted papers.
COLUMN HEADER EXPLANATIONS
1) PMID - PubMed ID
2) Title - Paper title
3) Authors - Author names
4) Citation - Bibliographic information of the paper
5) First Author - First author's name
6) Journal/Book - Publication name
7) Publication Year
8) Create Date - The date the record was added to the PubMed database
9) PMCID - PubMed Central ID (if applicable, otherwise blank)
10) NIHMS ID - NIH Manuscript Submission ID (if applicable, otherwise blank)
11) DOI - Digital object identifier (if applicable, otherwise blank)
12) retracted_in - Information of the retraction notice (given by PubMed)
13) retracted_yr - Retraction year identified from "retracted_in" (if applicable, otherwise blank)
14) cited_by - PMIDs of the citing papers (if applicable, otherwise blank). Data collected from iCite.
15) retraction_notice_pmid - PMID of the retraction notice (if applicable, otherwise blank)

FILENAME: PubMed_retracted_publication_CitCntxt_withYR_v3.tsv
- This file contains citation contexts (i.e., citing sentences) where the retracted papers were cited. The citation contexts were identified from the XML version of PubMed Central open access (PMCOA) articles.
- This is part of the data from: Hsiao, T.-K., & Torvik, V. I. (manuscript in preparation). Citation contexts identified from PubMed Central open access articles: A resource for text mining and citation analysis.
- Citation contexts that meet either of the two conditions below have been excluded from analyses: [1] PMIDs of the citing papers are from retraction notices (i.e., those in the "retraction_notice_PMID.csv" file). [2] The citing paper and the cited retracted paper have the same PMID.
ROW EXPLANATIONS
- Each row is a citation context associated with one retracted paper that is cited.
- In the manuscript, we count each citation context once, even if it cites multiple retracted papers.
COLUMN HEADER EXPLANATIONS
1) pmcid - PubMed Central ID of the citing paper
2) pmid - PubMed ID of the citing paper
3) year - Publication year of the citing paper
4) location - Location of the citation context (abstract = abstract, body = main text, back = supporting material, tbl_fig_caption = tables and table/figure captions)
5) IMRaD - IMRaD section of the citation context (I = Introduction, M = Methods, R = Results, D = Discussions/Conclusion, NoIMRaD = not identified)
6) sentence_id - The ID of the citation context in a given location. For location information, please see column 4. The first sentence in the location gets the ID 1, and subsequent sentences are numbered consecutively.
7) total_sentences - Total number of sentences in a given location
8) intxt_id - Identifier of a cited paper. Here, a cited paper is the retracted paper.
9) intxt_pmid - PubMed ID of a cited paper. Here, a cited paper is the retracted paper.
10) citation - The citation context
11) progression - Position of a citation context by centile within the citing paper
12) retracted_yr - Retraction year of the retracted paper
13) post_retraction - 0 = not a post-retraction citation; 1 = post-retraction citation. A post-retraction citation is a citation made after the calendar year of retraction.

FILENAME: 724_knowingly_post_retraction_cit.csv (updated)
- The 724 post-retraction citation contexts that we determined knowingly cited the 7,813 retracted papers in "PubMed_retracted_publication_full_v3.tsv".
- Two citation contexts from retraction notices have been excluded from analyses.
ROW EXPLANATIONS
- Each row is a citation context.
COLUMN HEADER EXPLANATIONS
1) pmcid - PubMed Central ID of the citing paper
2) pmid - PubMed ID of the citing paper
3) pub_type - Publication type collected from the metadata in the PMCOA XML files
4) pub_type2 - Specific article types. Please see the manuscript for explanations.
5) year - Publication year of the citing paper
6) location - Location of the citation context (abstract = abstract, body = main text, back = supporting material, table_or_figure_caption = tables and table/figure captions)
7) intxt_id - Identifier of a cited paper. Here, a cited paper is the retracted paper.
8) intxt_pmid - PubMed ID of a cited paper. Here, a cited paper is the retracted paper.
9) citation - The citation context
10) retracted_yr - Retraction year of the retracted paper
11) cit_purpose - Purpose of citing the retracted paper. This is from human annotations. Please see the manuscript for further information about annotation.
12) longer_context - An extended version of the citation context (if applicable, otherwise blank). Manually pulled from the full texts in the process of annotation.

FILENAME: Annotation manual.pdf
- The manual for annotating the citation purposes in column 11) of the 724_knowingly_post_retraction_cit.tsv.

FILENAME: retraction_notice_PMID.csv (new file added for this version)
- A list of 8,346 PMIDs of retraction notices indexed in PubMed (retrieved on August 20, 2020, searched with the query "retraction of publication" [PT]).
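As a starting point, a minimal pandas sketch for the citation-context file described above; it assumes the TSV ships with a header row matching the documented column names.

```python
import pandas as pd

# Columns follow the COLUMN HEADER EXPLANATIONS above (header row assumed).
ctx = pd.read_csv("PubMed_retracted_publication_CitCntxt_withYR_v3.tsv", sep="\t")

# Share of citation contexts made after the calendar year of retraction.
share_post = ctx["post_retraction"].eq(1).mean()
print(f"Post-retraction citation contexts: {share_post:.1%}")

# Where in the paper retracted work tends to be cited (IMRaD sections).
print(ctx["IMRaD"].value_counts())
```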
License: https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Yearly citation counts for the publication titled "Supporting Data and Services Access in Digital Government Environments".
Sentences and citation contexts identified from the PubMed Central open access articles

The dataset is delivered as 24 tab-delimited text files. The files contain 720,649,608 sentences, 75,848,689 of which are citation contexts. The dataset is based on a snapshot of articles in the XML version of the PubMed Central open access subset (i.e., the PMCOA subset). The PMCOA subset was collected in May 2019. The dataset is created as described in: Hsiao, T.-K., & Torvik, V. I. (manuscript). OpCitance: Citation contexts identified from the PubMed Central open access articles.

Files:
• A_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with A.
• B_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with B.
• C_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with C.
• D_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with D.
• E_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with E.
• F_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with F.
• G_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with G.
• H_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with H.
• I_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with I.
• J_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with J.
• K_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with K.
• L_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with L.
• M_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with M.
• N_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with N.
• O_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with O.
• P_p1_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with P (part 1).
• P_p2_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with P (part 2).
• Q_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with Q.
• R_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with R.
• S_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with S.
• T_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with T.
• UV_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with U or V.
• W_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with W.
• XYZ_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with X, Y or Z.

Each row in a file is a sentence/citation context and contains the following columns:
• pmcid: PMCID of the article.
• pmid: PMID of the article. If an article does not have a PMID, the value is NONE.
• location: The article component (abstract, main text, table, figure, etc.) to which the citation context/sentence belongs.
• IMRaD: The type of IMRaD section associated with the citation context/sentence. I, M, R, and D represent introduction/background, method, results, and conclusion/discussion, respectively; NoIMRaD indicates that the section type is not identifiable.
• sentence_id: The ID of the citation context/sentence in the article component.
• total_sentences: The number of sentences in the article component.
• intxt_id: The ID of the citation.
• intxt_pmid: PMID of the citation (as tagged in the XML file). If a citation does not have a PMID tagged in the XML file, the value is "-".
• intxt_pmid_source: The sources where the intxt_pmid can be identified. "xml" means that the PMID is only identified from the XML file; "xml,pmc" means that the PMID is not only from the XML file, but also in the citation data collected from the NCBI Entrez Programming Utilities. If a citation does not have an intxt_pmid, the value is "-".
• intxt_mark: The citation marker associated with the inline citation.
• best_id: The best source link ID (e.g., PMID) of the citation.
• best_source: The sources that confirm the best ID.
• best_id_diff: The comparison result between the best_id column and the intxt_pmid column.
• citation: A citation context. If no citation is found in a sentence, the value is the sentence.
• progression: Text progression of the citation context/sentence.

Supplementary Files
• PMC-OA-patci.tsv.gz – This file contains the best source link IDs for the references (e.g., PMID). Patci [1] was used to identify the best source link IDs. The best source link IDs are mapped to the citation contexts and displayed in the *_journal_IntxtCit.tsv files as the best_id column. Each row in the PMC-OA-patci.tsv.gz file is a citation (i.e., a reference extracted from the XML file) and contains the following columns:
  • pmcid: PMCID of the citing article.
  • pos: The citation's position in the reference list.
  • fromPMID: PMID of the citing article.
  • toPMID: Source link ID (e.g., PMID) of the citation. This ID is identified by Patci.
  • SRC: The sources that confirm the toPMID.
  • MatchDB: The origin bibliographic database of the toPMID.
  • Probability: The match probability of the toPMID.
  • toPMID2: PMID of the citation (as tagged in the XML file).
  • SRC2: The sources that confirm the toPMID2.
  • intxt_id: The ID of the citation.
  • journal: The first letter of the journal title. This maps to the *_journal_IntxtCit.tsv files.
  • same_ref_string: Whether the citation string appears in the reference list more than once.
  • DIFF: The comparison result between the toPMID column and the toPMID2 column.
  • bestID: The best source link ID (e.g., PMID) of the citation.
  • bestSRC: The sources that confirm the best ID.
  • Match: Matching result produced by Patci.
[1] Agarwal, S., Lincoln, M., Cai, H., & Torvik, V. (2014). Patci – a tool for identifying scientific articles cited by patents. GSLIS Research Showcase 2014. http://hdl.handle.net/2142/54885
• Supplementary_File_1.zip – This file contains the code for generating the dataset.
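A minimal sketch of streaming one alphabet shard with pandas; it assumes each file has a header row and that sentences without an inline citation carry the "-" placeholder in intxt_id, both of which should be verified against the data.

```python
import pandas as pd

n_rows = n_citing = 0
# These shards are large, so read in chunks rather than all at once.
for chunk in pd.read_csv("A_journal_IntxtCit.tsv", sep="\t",
                         dtype=str, chunksize=1_000_000):
    n_rows += len(chunk)
    # Assumption: non-citation sentences have "-" in intxt_id.
    n_citing += (chunk["intxt_id"] != "-").sum()

print(f"{n_citing:,} citation contexts out of {n_rows:,} sentences")
```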
License: Attribution-NonCommercial 3.0 (CC BY-NC 3.0), https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship-adjusted hm-index, citations to papers in different authorship positions, and a composite indicator (c-score). Data are shown separately for career-long impact and for single recent year impact. Metrics with and without self-citations and the ratio of citations to citing papers are given, and data on retracted papers (based on the Retraction Watch database) as well as citations to/from retracted papers have been added in the most recent iteration. Scientists are classified into 22 scientific fields and 174 sub-fields according to the standard Science-Metrix classification. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2023 and single recent year data pertain to citations received during calendar year 2023. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. This version (7) is based on the August 1, 2024 snapshot from Scopus, updated to the end of citation year 2023.

This work uses Scopus data. Calculations were performed using all Scopus author profiles as of August 1, 2024. If an author is not on the list, it is simply because the composite indicator value was not high enough to appear on the list; it does not mean that the author does not do good work. PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECTIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US. They should be sent directly to Scopus, preferably by use of the Scopus to ORCID feedback wizard (https://orcid.scopusfeedback.com/) so that the correct data can be used in any future annual updates of the citation indicator databases.

The c-score focuses on impact (citations) rather than productivity (number of publications), and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, see the attached file on FREQUENTLY ASKED QUESTIONS. Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden Manifesto: https://www.nature.com/articles/520429a
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The following citation dataset was retrieved from Scopus on June 24, 2017 (3 am, Western Indonesian time).
It consists of 3 sets of data based on our searches. Each search was saved in both 'csv' and 'bib' formats:
OA_Africa_inTitle.xxx: "Open Access" AND Africa IN TITLE
OA_Africa_inTitle_inAbstract_inKeywords.xxx: "Open Access" AND Africa IN TITLE, IN ABSTRACT, IN KEYWORDS
OAmovement_Africa_inTitle_inAbstract_inKeywords.xxx: "Open Access movement" AND Africa IN TITLE, IN ABSTRACT, IN KEYWORDS
Access to Scopus was provided by the Central Library of Institut Teknologi Bandung (Indonesia).
License: Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
A list of journals across several subject areas was developed from which to collect article citation data. Citation information and cited reference counts were obtained for all articles published in these journals in 2006 and 2009.
License: CC0 1.0, https://spdx.org/licenses/CC0-1.0.html
Merged PLOS ONE and Web of Science data compiled in .dta files produced by STATA13. Included is a Do-file for reproducing the regression model estimates reported in the pre-print (Tables I and II) and published version (Table 1). Each observation (.dta line) corresponds to a given PLOS ONE article, with various article-level and editor-level characteristics used as explanatory and control variables. This summary provides a brief description of each variable and its source.
If you use this data, please cite: A. M. Petersen. Megajournal mismanagement: Manuscript decision bias and anomalous editor activity at PLOS ONE. Journal of Informetrics 13, 100974 (2019). DOI: 10.1016/j.joi.2019.100974
Methods: We gathered the citation information for all PLOS ONE articles, indexed by A, from the Web of Science (WOS) Core Collection. From this data we obtained a master list of the unique digital object identifiers, DOI_A, and the number of citations, c_A, at the time of the data download (census) date:
(a) For the pre-print this corresponds to December 3, 2016;
(b) and for the final published article this corresponds to February 25, 2019.
We then used each DOI_A to access the corresponding online XML version of each article at PLOS ONE by visiting the unique web address "http://journals.plos.org/plosone/article?id=" + DOI_A. After parsing the full-text XML (primarily the author byline data and reference list), we merged the PLOS ONE publication information and WOS citation data by matching on DOI_A.
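A sketch of that retrieval step in Python, under the assumptions that the endpoint returns the article XML (JATS-like markup with contrib/surname elements) and using a hypothetical DOI:

```python
import requests
from xml.etree import ElementTree as ET

doi_a = "10.1371/journal.pone.0012345"  # hypothetical DOI_A
url = "http://journals.plos.org/plosone/article?id=" + doi_a

resp = requests.get(url, timeout=30)
resp.raise_for_status()

# Assumption: the response body is the article XML; if the server returns
# the HTML page instead, the article's XML download link must be used.
root = ET.fromstring(resp.content)
surnames = [c.findtext(".//surname") for c in root.iter("contrib")]
print(doi_a, "authors:", surnames)
```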
unarXive is a scholarly data set containing publications' full-text, annotated in-text citations, and a citation network.
The data is generated from all LaTeX sources on arXiv and is therefore of higher quality than data generated from PDF files.
Typical use cases are
Note: This Zenodo record is an old version of unarXive. You can find the most recent version at https://zenodo.org/record/7752754 and https://zenodo.org/record/7752615
[ DOWNLOAD SAMPLE ]
To download the whole data set, send an access request and note the following:
Note: this Zenodo record is a "full" version of unarXive, which was generated from all of arXiv.org including non-permissively licensed papers. Make sure that your use of the data is compliant with the paper's licensing terms.¹
¹ For information on papers' licenses use arXiv's bulk metadata access.
The code used for generating the data set is publicly available.
Usage examples for our data set are provided on GitHub.
This initial version of unarXive is described in the following journal article:
Tarek Saier, Michael Färber: "unarXive: A Large Scholarly Data Set with Publications' Full-Text, Annotated In-Text Citations, and Links to Metadata", Scientometrics, 2020.
[link to an author copy]
The updated version is described in the following conference paper:
Tarek Saier, Michael Färber: "unarXive 2022: All arXiv Publications Pre-Processed for NLP, Including Structured Full-Text and Citation Network", JCDL 2023.
[link to an author copy]
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the full citation details and DOIs for 85 co-published reporting guidelines, together with the citation counts, number of article accesses, and journal impact factor for each article and journal. This represents a total of nine research reporting statements, published across 58 journals in biomedicine.
License: CC0 1.0, https://spdx.org/licenses/CC0-1.0.html
Background: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets.

Method and Results: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third-party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties.

Conclusion: After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
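To make the modelling approach concrete, here is an illustrative sketch (not the authors' exact specification) of a citation regression with a data-availability indicator plus covariates; the file and column names are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical input: one row per study, with a citation count, a
# data-availability flag, and covariates like those named above.
df = pd.read_csv("studies.csv")

# Log-linear model of citations on data availability plus an
# illustrative subset of covariates.
model = smf.ols(
    "np.log1p(citations) ~ data_available + pub_year"
    " + impact_factor + open_access + n_authors",
    data=df,
).fit()
print(model.summary())
```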
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Grey literature, an area of interest to special librarians and information professionals, can be traced back a half-century. However, grey literature as a specialized field in information studies is less than a decade old. At GL'97 in Luxembourg, grey literature was redefined "as information produced on all levels of government, academics, business and industry in electronic and print formats not controlled by commercial publishers (i.e. where publishing is not the primary activity of the producing body)". The subject area was broadened and the need for continuing research and instruction pursued. The results of an online survey carried out in 2004, compared with survey results a decade prior, indicate two changes: (1) a move to more specialization in the field of grey literature and (2) a move to more balance in activities related to research and teaching as compared with the processing and distribution of grey literature. It is not that the activities of processing and distribution are today of less concern, but technological advances and the Internet may have made them less labour intensive. The burden that grey literature posed to human resources and budgets appears to have been reduced enough that the benefits of the content of grey literature are being discovered. And this discovery of a wealth of knowledge and information is the impetus for further research and instruction in the field of grey literature.

This research project is a follow-up, or second part, of a citation study. The first part was carried out last year and the results were presented in a conference paper at GL6 in New York. Citation analysis is a relatively objective quantitative method and must be carefully implemented (Moed, 2002). Thus, in an effort to expand the results of our initial analysis beyond the realm of the GL Conference Series, an author survey will also be implemented in this follow-up study. The empirical data gathered from the online questionnaire will be compared with the updated data from the Citation Database, to which the citations in the GL6 Conference Proceedings will have been added. Comparative data from the comprehensive citation database (estimated 1,650 records) and the data from the online author survey would then allow for a clearer demonstration of the impact of this research, since only part of the impact of research is covered by citation analysis alone (Thelwall, 2002). This research will allow for tracking the life of a conference paper as well as the application and use of its content within and outside the grey circuit. A further gain would be a better profile of the GL authors, who are the source of GreyNet's knowledge and information base. This in turn could lead to the subsequent development of services that are in line with the needs of authors and researchers in the field of grey literature, for example, a citation style for grey literature, where special analysis of hyperlinked citations would provide an opportunity to address the problem of the disparity of web-based grey literature in the context of open archives.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Cline Center Global News Index is a searchable database of textual features extracted from millions of news stories, specifically designed to provide comprehensive coverage of events around the world. In addition to searching documents for keywords, users can query metadata and features such as named entities extracted using Natural Language Processing (NLP) methods and variables that measure sentiment and emotional valence. Archer is a web application purpose-built by the Cline Center to enable researchers to access data from the Global News Index. Archer provides a user-friendly interface for querying the Global News Index (with the back-end indexing still handled by Solr). By default, queries are built using icons and drop-down menus. More technically savvy users can use Lucene/Solr query syntax via a 'raw query' option. Archer allows users to save and iterate on their queries, and to visualize faceted query results, which can be helpful for users as they refine their queries.

Additional Resources:
- Access to Archer and the Global News Index is limited to account-holders. If you are interested in signing up for an account, you can fill out the Archer User Information Form.
- Current users who would like to provide feedback, such as reporting a bug or requesting a feature, can fill out the Archer User Feedback Form.
- The Cline Center sends out periodic email newsletters to the Archer Users Group. Please fill out this form to subscribe to the Archer Users Group.

Citation Guidelines:
1) To cite the GNI codebook (or any other documentation associated with the Global News Index and Archer), please use the following citation: Cline Center for Advanced Social Research. 2020. Global News Index and Extracted Features Repository [codebook]. Champaign, IL: University of Illinois. doi:10.13012/B2IDB-5649852_V1
2) To cite data from the Global News Index (accessed via Archer or otherwise), please use the following citation (filling in the correct date of access): Cline Center for Advanced Social Research. 2020. Global News Index and Extracted Features Repository [database]. Champaign, IL: University of Illinois. Accessed Month, DD, YYYY. doi:10.13012/B2IDB-5649852_V1
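For illustration, a raw query in standard Lucene/Solr syntax might look like the following; the field names are hypothetical, not Archer's actual schema.

```python
# Hypothetical Lucene/Solr raw query combining a phrase match, an entity
# filter, and a date-range clause (field names are illustrative only).
raw_query = (
    'text:"election protest" AND organizations:"United Nations" '
    'AND pub_date:[2019-01-01T00:00:00Z TO 2019-12-31T23:59:59Z]'
)
```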
This data supports the paper entitled "Mapping the landscape of geospatial data citations". The dataset covers geospatial data-intensive research papers published between 2015 and 2018, retrieved using Scopus. The articles' citations were assessed for data citation occurrences and coded using a data citation classification. Data were enhanced and linked to subject coverage and journal policy status information using Excel and SPSS. For more information about how the data were created and coded, please review the 'Methodology' section of the paper. More information is provided below, including supplemental documentation and related publications.

Abstract (paper): Data citations, similar to article and other research citations, are important references to research data that underlie published research results. In support of open science directives, these citations must adhere to specific conventions in terms of consistency of both placement within an article and the actual availability of or access to research data. To better understand the level to which geospatial research data are currently cited, we undertook a study to analyse the rate of data citation within a set of data-intensive geospatial research articles. After analysing 1717 scholarly articles published between 2015 and 2018, we found that very few, 78 (5%), meaningfully cited primary or secondary geospatial data sources in the cited references section of the article. Even fewer researchers, only 25 or 1.5%, were found to have cited data using a DOI. Given the relatively low data citation rate, a focus on contributing factors, including barriers to citing geospatial data, is needed. And while open sharing requirements for geospatial data may change over time, driving data citation as a result, understanding benchmarks for data citation for monitoring purposes is useful.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a compilation of processed data on citations and references for research papers, including their author, institution and open access info, for a selected sample of academics analysed using Microsoft Academic Graph (MAG) data and CORE. The data for this dataset was collected during December 2019 to January 2020. Six countries (Austria, Brazil, Germany, India, Portugal, United Kingdom and United States) were the focus of the six questions which make up this dataset. There is one csv file per country and per question (36 files in total). More details about the creation of this dataset are available in the public ON-MERRIT D3.1 deliverable report.

The dataset is a combination of two different data sources: one part is a dataset created by analysing promotion policies across the target countries, while the second part is a set of data points available to understand publishing behaviour. To facilitate the analysis, the dataset is organised in the following seven folders:

PRT
The dataset with the file name "PRT_policies.csv" contains the related information as extracted from promotion, review and tenure (PRT) policies.

Q1: What % of papers coming from a university are Open Access?
- Dataset Name format: oa_status_countryname_papers.csv
- Dataset Contents: Open Access (OA) status of all papers of all the universities listed in the Times Higher Education World University Rankings (THEWUR) for the given country. A paper is marked OA if there is at least one OA link available. OA links are collected using the CORE Discovery API.
- Important considerations about this dataset:
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to.
  - The service we used to recognise if a paper is OA, CORE Discovery, does not contain entries for all paper IDs in MAG. This implies that some of the records in the dataset will not have either a true or false value for the is_OA field.
  - Only those records marked as true for the is_OA field can be said to be OA. Others with false or no value for the is_OA field are of unknown status (i.e., not necessarily closed access).

Q2: How are papers, published by the selected universities, distributed across the three scientific disciplines of our choice?
- Dataset Name format: fsid_countryname_papers.csv
- Dataset Contents: For the given country, all papers for all the universities listed in THEWUR, with the information on the fieldofstudy they belong to.
- Important considerations about this dataset:
  - MAG can associate a paper with multiple fieldofstudyid values. If a paper belongs to more than one of our fieldofstudyid values, separate records were created for the paper with each of those fieldofstudyid values.
  - MAG assigns a fieldofstudyid to every paper with a score. We preserve only those records whose score is more than 0.5 for any fieldofstudyid the paper belongs to.
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. Papers with authorship from multiple universities are counted once towards each of the universities concerned.

Q3: What is the gender distribution in authorship of papers published by the universities?
- Dataset Name format: author_gender_countryname_papers.csv
- Dataset Contents: All papers with their author names for all the universities listed in THEWUR.
- Important considerations about this dataset:
  - When there are multiple collaborators (authors) for the same paper, this dataset makes sure that only the records for collaborators from within the selected universities are preserved.
  - An external script was executed to determine the gender of the authors. The script is available here.

Q4: Distribution of staff seniority (= number of years from their first publication until the last publication) in the given university.
- Dataset Name format: author_ids_countryname_papers.csv
- Dataset Contents: For a given country, all papers for authors, with their publication year, for all the universities listed in THEWUR.
- Important considerations about this work:
  - When there are multiple collaborators (authors) for the same paper, this dataset makes sure that only the records for collaborators from within the selected universities are preserved.
  - Calculating staff seniority can be achieved in various ways. The most straightforward option is to calculate it as academic_age = MAX(year) - MIN(year) for each authorid; see the sketch after this list.

Q5: Citation counts (incoming) for OA vs non-OA papers published by the university.
- Dataset Name format: cc_oa_countryname_papers.csv
- Dataset Contents: OA status and OA links for all papers of all the universities listed in THEWUR and, for each of those papers, the count of incoming citations available in MAG.
- Important considerations about this dataset:
  - CORE Discovery was used to establish the OA status of papers.
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to.
  - Only those records marked as true for the is_OA field can be said to be OA. Others with false or no value for the is_OA field are of unknown status (i.e., not necessarily closed access).

Q6: Count of OA vs non-OA references (outgoing) for all papers published by universities.
- Dataset Name format: rc_oa_countryname_papers.csv
- Dataset Contents: Counts of all OA and unknown papers referenced by all papers published by all the universities listed in THEWUR.
- Important considerations about this dataset:
  - CORE Discovery was used to establish the OA status of papers being referenced.
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. Papers with authorship from multiple universities are counted once towards each of the universities concerned.

Additional files:
- fieldsofstudy_mag.csv: this file contains a dump of the fieldsofstudy table of MAG, mapping each of the ids to their actual field of study name.
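A minimal sketch of that seniority calculation with pandas, assuming the Q4 files carry authorid and year columns (the country filename is a hypothetical example):

```python
import pandas as pd

papers = pd.read_csv("author_ids_austria_papers.csv")  # hypothetical file

# academic_age = MAX(year) - MIN(year) per author, as described for Q4.
span = papers.groupby("authorid")["year"].agg(["min", "max"])
span["academic_age"] = span["max"] - span["min"]
print(span.head())
```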
In 2022, publication and data linkages were evaluated using two methods in an effort to understand how a data citation workflow has been implemented by the U.S. Geological Survey (USGS) since the 2016 USGS instructional memorandum, Public Access to Results of Federally Funded Research at the U.S. Geological Survey: Scholarly Publications and Digital Data (USGS OSQI, 2016), went into effect. The memorandum requires that USGS data be assigned a DOI, be accompanied by a citation, and be referenced from the associated publication (USGS OSQI, 2017). This data release includes data and publication structural metadata results retrieved from the USGS DOI Tool and Crossref APIs, and Jupyter notebooks used to process and analyze the results.
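As an illustration of the Crossref side of such a check, a minimal sketch that pulls a publication's reference list from the public Crossref REST API and looks for data DOIs; the publication DOI and the 10.5066 USGS data prefix are assumptions for the example.

```python
import requests

pub_doi = "10.3133/sir20205000"  # hypothetical publication DOI
r = requests.get(f"https://api.crossref.org/works/{pub_doi}", timeout=30)
r.raise_for_status()

# Crossref returns structural metadata, including the reference list.
refs = r.json()["message"].get("reference", [])
data_dois = [ref["DOI"] for ref in refs
             if ref.get("DOI", "").startswith("10.5066/")]
print(f"{len(data_dois)} referenced data DOIs:", data_dois)
```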
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes a file with results from a survey sent to authors of open access monographs. The survey was available during March–April 2019, and the results are analysed in a paper presented at the 2019 Elpub conference, held June 2–4 in Marseille, France, entitled 'The significant difference in impact – an exploratory study about the meaning and value of metrics for open access monographs'. Version 2 of the dataset has been updated with the slides presented at the conference and the link to the full paper published in the French open archive HAL.
The respondents of the survey were asked to comment on assumptions about bibliometrics and altmetrics currently in practice, and to think about the meaning of such data in relation to their experiences as authors of books published in a digital format and with an open license (i.e., a Creative Commons license). The survey questionnaire is included as a separate text document. The dataset also includes measures of the usage of open access books published by Stockholm University Press, including information about online usage, mentions in social media, and citations. This data was collected from the publisher's platform and the Altmetric.com database, while citation data was collected from Dimensions, Google Scholar, Web of Science and CrossRef. The data was collected in February 2019, except for the figures from the OAPEN Library database, which were collected in November 2018. The paper, including the analysis of these data, is to be published in the Elpub Digital Library. The tables included in the dataset may vary slightly from those in the published paper, due to space constraints in the published version.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Automated monthly plot of citation ratios using OpenAlex data. This includes a PDF visualization and supporting CSV generated from OpenAlex (CC0) data.
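A minimal sketch of the kind of OpenAlex API call such a plot can be built from; the work ID below is a sample ID, and the fields used (counts_by_year, cited_by_count) follow the public OpenAlex works schema.

```python
import requests

work_id = "W2741809807"  # sample OpenAlex work ID
r = requests.get(f"https://api.openalex.org/works/{work_id}", timeout=30)
r.raise_for_status()

# counts_by_year holds the yearly citation counts used for ratio plots.
for row in r.json().get("counts_by_year", []):
    print(row["year"], row["cited_by_count"])
```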