100+ datasets found

Citation Knowledge with Section and Context
ordo.open.ac.uk
zip
Updated May 5, 2020
Cite
Anita Khadka (2020). Citation Knowledge with Section and Context [Dataset]. http://doi.org/10.21954/ou.rd.11346848.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.21954/ou.rd.11346848.v1
Dataset updated
May 5, 2020
Dataset provided by
The Open University
Authors
Anita Khadka
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This dataset contains information extracted from scientific publications written by authors who have published papers in the RecSys conference. It contains four files, described below:

i) all_authors.tsv: Details of authors who published research papers in the RecSys conference. The details include the authors' identifiers in various forms (number, ORCID iD, DBLP URL, DBLP key and Google Scholar URL), first name, last name and affiliation (where they work).

ii) all_publications.tsv: Details of publications authored by the authors listed in all_authors.tsv. (Please note that the list does not contain every publication by these authors; refer to the publication for further details.) The details include the publications' identifiers in different forms (number, DBLP key, DBLP URL, Google Scholar URL), title, filtered title, published date, published conference and paper abstract.

iii) selected_author_publications-information.tsv: Identifiers of authors and their publications. It covers the selected authors and publications used for our experiment.

iv) selected_publication_citations-information.tsv: Information on the selected publications, covering both the citing and cited papers used in our experiment. It consists of the identifier of the citing paper, the identifier of the cited paper, citation title, citation filtered title, the sentence before the citation is mentioned, the citing sentence, the sentence after the citation is mentioned, and the citation position (section). Please note that it does not contain information on all the citations in the publications.

For more detail, please refer to the paper. This dataset is for research purposes only. If you use this dataset, please cite our paper "Capturing and exploiting citation knowledge for recommending recently published papers", due to be published in the Web2Touch track 2020 (not yet published).
COCI CSV dataset of all the citation data
figshare.com
bin
Updated May 30, 2023
Cite
OpenCitations (2023). COCI CSV dataset of all the citation data [Dataset]. http://doi.org/10.6084/m9.figshare.6741422.v19
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.6741422.v19
Dataset updated
May 30, 2023
Dataset provided by
Figshare http://figshare.com/
Authors
OpenCitations
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains all the citation data (in CSV format) included in COCI, released on 23 January 2023. In particular, each line of the CSV file defines a citation, and includes the following information:

[field "oci"] the Open Citation Identifier (OCI) for the citation;
[field "citing"] the DOI of the citing entity;
[field "cited"] the DOI of the cited entity;
[field "creation"] the creation date of the citation (i.e. the publication date of the citing entity);
[field "timespan"] the time span of the citation (i.e. the interval between the publication date of the cited entity and the publication date of the citing entity);
[field "journal_sc"] whether the citation is a journal self-citation (i.e. the citing and the cited entities are published in the same journal);
[field "author_sc"] whether the citation is an author self-citation (i.e. the citing and the cited entities have at least one author in common).

This version of the dataset contains:

1,463,920,523 citations; 77,045,952 bibliographic resources.

The size of the zipped archive is 37.5 GB, while the size of the unzipped CSV file is 238.5 GB.

Additional information about COCI can be found at the official webpage.
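Because the unzipped CSV is far larger than typical memory, one way to work with it is to stream it row by row. The following is a minimal sketch, not an official reader: the field names come from the description above, while the "yes"/"no" encoding of the self-citation flags and the sample rows are assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def summarize_citations(lines):
    """Stream citation rows and count self-citations without loading the file."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts["citations"] += 1
        # Assumption: the self-citation flags are stored as the strings "yes"/"no".
        if row["journal_sc"].strip().lower() == "yes":
            counts["journal_self_citations"] += 1
        if row["author_sc"].strip().lower() == "yes":
            counts["author_self_citations"] += 1
    return counts


# Made-up sample rows for illustration only; real rows come from the CSV dump.
SAMPLE = io.StringIO(
    "oci,citing,cited,creation,timespan,journal_sc,author_sc\n"
    "oci-1,10.1000/a,10.1000/b,2020-01-01,P1Y,no,yes\n"
    "oci-2,10.1000/c,10.1000/b,2021-05,P2Y4M,yes,no\n"
    "oci-3,10.1000/c,10.1000/a,2021-05,P1Y4M,no,no\n"
)

summary = summarize_citations(SAMPLE)
print(dict(summary))
```

For a real file, the same function can be given an open file handle (opened with `newline=""`), so only one row is held in memory at a time.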
citing-dataset-elements
figshare.com
zip
Updated May 31, 2023
Cite
Ian Mulvany (2023). citing-dataset-elements [Dataset]. http://doi.org/10.6084/m9.figshare.1088363.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.1088363.v1
Dataset updated
May 31, 2023
Dataset provided by
Figshare http://figshare.com/
Authors
Ian Mulvany
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repo contains a CSV file that summarises guidance provided by a number of organisations on what features or information should be provided when citing data. By inferring across these sources we hope to find the most commonly suggested features that can be useful for citing data. We hope to use this resource to form the basis of a recommendation on how to use JATS to cite data, and there is a piece of work remaining. This work was produced as a part of the NISO data-citation workshop that took place in London in June 2014.
Data Citation Corpus Data File
redivis.com
application/jsonl +7
Updated Nov 8, 2024
Cite
Redivis Demo Organization (2024). Data Citation Corpus Data File [Dataset]. https://redivis.com/datasets/am5t-e9jvcn6s5
Explore at:
Available download formats: sas, avro, parquet, spss, csv, arrow, application/jsonl, stata
Dataset updated
Nov 8, 2024
Dataset provided by
Redivis Inc.
Authors
Redivis Demo Organization
Time period covered
Jan 1, 1839 - Oct 1, 2115
Description
Abstract

Data file for the first release of the Data Citation Corpus, produced by DataCite and Make Data Count as part of an ongoing grant project funded by the Wellcome Trust.

Methodology

The original data files include 10,006,058 data citation records in JSON and CSV formats. The JSON file is the version of record.

All CSVs from version 1.1 of the corpus have been uploaded to Redivis and combined into a single table.

The data citations in the file originate from DataCite Event Data and a project by Chan Zuckerberg Initiative (CZI) to identify mentions to datasets in the full text of articles.

Each data citation record is comprised of:

A pair of identifiers: An identifier for the dataset (a DOI or an accession number) and the DOI of the publication object (journal article or preprint) in which the dataset is cited

Metadata for the cited dataset and for the citing publication object

Curated Open Citations Dataset
databank.illinois.edu
Updated Jan 22, 2025
Cite
Dmitriy Korobskiy; George Chacko (2025). Curated Open Citations Dataset [Dataset]. http://doi.org/10.13012/B2IDB-6389862_V1
Explore at:
Unique identifier
https://doi.org/10.13012/B2IDB-6389862_V1
Dataset updated
Jan 22, 2025
Authors
Dmitriy Korobskiy; George Chacko
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is derived from COCI, the OpenCitations Index of Crossref open DOI-to-DOI references (opencitations.net): Silvio Peroni, David Shotton (2020). OpenCitations, an infrastructure organization for open scholarship. Quantitative Science Studies, 1(1): 428-444. https://doi.org/10.1162/qss_a_00023

We have curated it to remove duplicates, self-loops, and parallel edges. These data were copied from the OpenCitations website on May 6, 2023 and subsequently processed to produce a node list and an edge list. Integer IDs have been assigned to the DOIs to reduce memory and storage needs when working with these data. As noted on the OpenCitations website, each record is a citing-cited pair that uses DOIs as persistent identifiers.
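The kind of curation described here can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the authors' pipeline: the function name, the case-insensitive DOI comparison, and the sample pairs are all hypothetical.

```python
def curate_citations(pairs):
    """Return (node_ids, edges) from an iterable of (citing_doi, cited_doi) pairs."""
    node_ids = {}  # DOI -> integer id, assigned in order of first appearance
    edges = set()  # set membership drops duplicate and parallel edges
    for citing, cited in pairs:
        # Assumption: DOIs are compared case-insensitively after trimming whitespace.
        citing, cited = citing.strip().lower(), cited.strip().lower()
        if citing == cited:
            continue  # skip self-loops
        src = node_ids.setdefault(citing, len(node_ids))
        dst = node_ids.setdefault(cited, len(node_ids))
        edges.add((src, dst))
    return node_ids, sorted(edges)


# Hypothetical DOI pairs: one repeated edge (differing only in case) and one self-loop.
pairs = [
    ("10.1/a", "10.1/b"),
    ("10.1/A", "10.1/b"),
    ("10.1/b", "10.1/b"),
    ("10.1/c", "10.1/a"),
]
node_ids, edges = curate_citations(pairs)
print(node_ids)
print(edges)
```

Storing each edge as a pair of small integers rather than two DOI strings is what yields the memory savings the description mentions.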
Data from: PatCit: A Comprehensive Dataset of Patent Citations
search.datacite.org
Updated Dec 23, 2020
Cite
Cyril Verluise; Gabriele Cristelli; Kyle Higham; Lucas Violon; Gaétan De Rassenfosse (2020). PatCit: A Comprehensive Dataset of Patent Citations [Dataset]. http://doi.org/10.5281/zenodo.4391095
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.4391095
Dataset updated
Dec 23, 2020
Dataset provided by
DataCite https://www.datacite.org/
Zenodo http://zenodo.org/
Authors
Cyril Verluise; Gabriele Cristelli; Kyle Higham; Lucas Violon; Gaétan De Rassenfosse
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
patCit: A Comprehensive Dataset of Patent Citations [Newsletter, GitHub]

Patents are at the crossroads of many innovation nodes: science, industry, products, competition, etc. Such interactions can be identified through citations in a broad sense. It is now common to use front-page patent citations to study some aspects of the innovation system. However, there is much more buried in the Non Patent Literature (NPL) citations and in the patent text itself. patCit extracts and structures these citations. Want to know more? Read the patCit academic presentation or dive into the usage and technical guides on the patCit documentation website.

IN PRACTICE

At patCit, we are building a comprehensive dataset of patent citations to help the community explore this terra incognita. patCit has the following features:

global coverage
front-page and in-text citations
all categories of NPL documents

Front-page

patCit builds on DOCDB, the largest database of Non Patent Literature (NPL) citations. First, we deduplicate this corpus and organize it into 10 categories (bibliographical reference, database, norm & standard, etc). Then, we design and apply category-specific information extraction models using spaCy. Eventually, when possible, we enrich the data using external domain-specific high-quality databases (e.g. Crossref for bibliographical references).

In-text

patCit builds on the Google Patents corpus of USPTO full-text patents. First, we extract patent and bibliographical reference citations. Then, we parse detected in-text citations into a series of category-dependent attributes using grobid. Patent citations are matched with a standard publication number using the Google Patents matching API, and bibliographical references are matched with a DOI using biblio-glutton. Eventually, when possible, we enrich the data using external domain-specific high-quality databases (e.g. Crossref for bibliographical references).

FAIR

Find - The patCit dataset is available on BigQuery in an interactive environment. For those who have a smattering of SQL, this is the perfect place to explore the data. It can also be downloaded on Zenodo.
Interoperate - Interoperability is at the core of patCit's ambition. We take care to extract unique identifiers whenever possible to enable data enrichment from domain-specific high-quality databases. This includes the DOI, PMID and PMCID for bibliographical references, the Technical Doc Number for standards, the Accession Number for genetic databases, the publication number for PATSTAT and Claims, etc. See the specific table for more details.
Reproduce - Our GitHub repository is the project factory. You can learn more about data recipes and models on the patCit documentation website.
Dataset Citation and Re-use Data
data.niaid.nih.gov
Updated Aug 9, 2023
Cite
Krause, Geoff (2023). Dataset Citation and Re-use Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7853026
Explore at:
Dataset updated
Aug 9, 2023
Dataset provided by
Mongeon, Philippe
Krause, Geoff
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset includes processed citation data for datasets recorded in OpenAlex as of May 2022. It identifies self-citations to these datasets at the individual, institutional, and country level, and includes domain classifications of the citing works using the Science-Metrix classifications.
OpenCitations Index CSV dataset of all the citation data
figshare.com
zip
Updated Jul 15, 2025
Cite
OpenCitations (2025). OpenCitations Index CSV dataset of all the citation data [Dataset]. http://doi.org/10.6084/m9.figshare.24356626.v6
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.24356626.v6
Dataset updated
Jul 15, 2025
Dataset provided by
Figshare http://figshare.com/
Authors
OpenCitations
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains all the citation data (in CSV format) included in the OpenCitations Index (https://opencitations.net/index), released on July 10, 2025. In particular, each line of the CSV file defines a citation, and includes the following information:

[field "oci"] the Open Citation Identifier (OCI) for the citation;
[field "citing"] the OMID of the citing entity;
[field "cited"] the OMID of the cited entity;
[field "creation"] the creation date of the citation (i.e. the publication date of the citing entity);
[field "timespan"] the time span of the citation (i.e. the interval between the publication date of the cited entity and the publication date of the citing entity);
[field "journal_sc"] whether the citation is a journal self-citation (i.e. the citing and the cited entities are published in the same journal);
[field "author_sc"] whether the citation is an author self-citation (i.e. the citing and the cited entities have at least one author in common).

Note: the information for each citation is sourced from OpenCitations Meta (https://opencitations.net/meta), a database that stores and delivers bibliographic metadata for all bibliographic resources included in the OpenCitations Index. The data provided in this dump is therefore based on the state of OpenCitations Meta at the time this collection was generated.

This version of the dataset contains:

2,216,426,689 citations

The size of the zipped archive is 38.8 GB, while the size of the unzipped CSV file is 242 GB.
Citations to software and data in Zenodo via open sources
zenodo.org
explore.openaire.eu
csv
Updated Jan 24, 2020
Cite
Stephanie van de Sandt; Alex Ioannidis; Lars Holm Nielsen (2020). Citations to software and data in Zenodo via open sources [Dataset]. http://doi.org/10.5281/zenodo.3482927
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.3482927
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo http://zenodo.org/
Authors
Stephanie van de Sandt; Alex Ioannidis; Lars Holm Nielsen
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
In January 2019, the Asclepias Broker harvested citation links to Zenodo objects from three discovery systems: the NASA Astrophysics Datasystem (ADS), Crossref Event Data and Europe PMC. Each row of our dataset represents one unique link between a citing publication and a Zenodo DOI. Both endpoints are described by basic metadata. The second dataset contains usage metrics for every cited Zenodo DOI of our data sample.
Data Citation Corpus Data File
data.niaid.nih.gov
Updated Feb 20, 2025
Cite
Make Data Count (2025). Data Citation Corpus Data File [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11196858
Explore at:
Dataset updated
Feb 20, 2025
Dataset provided by
DataCite https://www.datacite.org/
Make Data Count
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Data file for the third release of the Data Citation Corpus, produced by DataCite and Make Data Count as part of an ongoing grant project funded by the Wellcome Trust. Read more about the project.

The data file includes 5,322,388 data citation records in JSON and CSV formats. The JSON file is the version of record.

For convenience, the data is provided in batches of approximately 1 million records each. The publication date and batch number are included in the file name, ex: 2025-02-01-data-citation-corpus-01-v3.0.json.

The data citations in the file originate from the following sources:

DataCite Event Data

A project by Chan Zuckerberg Initiative (CZI) to identify mentions to datasets in the full text of articles

Data citations identified by Aligning Science Across Parkinson’s (ASAP)

Each data citation record is comprised of:

A pair of identifiers: An identifier for the dataset (a DOI or an accession number) and the DOI of the publication (journal article or preprint) in which the dataset is cited

Metadata for the cited dataset and for the citing publication

The data file includes the following fields:

Field	Description	Required?
id	Internal identifier for the citation	Yes
created	Date of item's incorporation into the corpus	Yes
updated	Date of item's most recent update in corpus	Yes
repository	Repository where cited data is stored	No
publisher	Publisher for the article citing the data	No
journal	Journal for the article citing the data	No
title	Title of cited data	No
publication	DOI of article where data is cited	Yes
dataset	DOI or accession number of cited data	Yes
publishedDate	Date when citing article was published	No
source	Source where citation was harvested	Yes
subjects	Subject information for cited data	No
affiliations	Affiliation information for creator of cited data	No
funders	Funding information for cited data	No

Additional documentation about the citations and metadata in the file is available on the Make Data Count website.
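The required/optional split in the field list lends itself to a quick sanity check when reading records. The sketch below is an assumption-laden illustration: the set of required fields is taken from the list above, but the helper name and the sample record (including its identifiers) are hypothetical and not drawn from the corpus.

```python
# Fields marked "Yes" under "Required?" in the field list above.
REQUIRED_FIELDS = ("id", "created", "updated", "publication", "dataset", "source")


def missing_required(record):
    """Return the required fields that are absent or empty in a citation record."""
    return [field for field in REQUIRED_FIELDS if record.get(field) in (None, "", [])]


# Hypothetical record for illustration; the values are made up.
record = {
    "id": "example-id-0001",
    "created": "2025-02-01",
    "updated": "2025-02-01",
    "publication": "https://doi.org/10.1000/example-article",
    "dataset": "",
    "title": "Example dataset title",
}
missing = missing_required(record)
print(missing)
```

Optional fields such as title or funders are ignored by this check, so a record lacking them still passes.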

Notes on v3.0:

The third release of the Data Citation Corpus data file reflects a few changes made to add new citations, including those from a new data source (ASAP), update and enhance citation metadata, and improve the overall usability of the file. These changes are as follows:

Add and update Event Data citations:

Add 65,524 new data citations created in DataCite Event Data between August 2024 and December 2024

Add ASAP citations:

Add 750 new data citations provided by Aligning Science Across Parkinson’s (ASAP), identified through processes to evaluate compliance with ASAP’s for open science practices, which involve a partnership with DataSeer and internal curation (described here).

Citations with provenance from ASAP are identified as “asap” in the source field

Metadata enhancements:

Reconcile and normalize organization names for affiliations and funders in a subset of records with the Research Organization Registry (ROR)

Add ror_name and ror_id subfields for affiliations and funders in JSON files. Unreconciled affiliation and funder strings are identified with values of null

Add new columns affiliationsROR and fundersROR in CSV files. Unreconciled affiliation and funder strings are identified with values of NONE NONE (this is to ensure consistency in number and order of values in cases where some strings have been reconciled and others have not)

Normalize DOI formats for articles and papers as full URLs

Additional details about the above changes, including scripts used to perform the above tasks, are available in GitHub.

Additional enhancements to the corpus are ongoing and will be addressed in the course of subsequent releases. Users are invited to submit feedback via GitHub. For general questions, email info@makedatacount.org.
Louisville Metro KY - Uniform Citation Data (2016-2019)
catalog.data.gov
data.lojic.org
Updated Jul 30, 2025
Cite
Louisville/Jefferson County Information Consortium (2025). Louisville Metro KY - Uniform Citation Data (2016-2019) [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-uniform-citation-data-2016-2019
Explore at:
Dataset updated
Jul 30, 2025
Dataset provided by
Louisville/Jefferson County Information Consortium
Area covered
Louisville, Kentucky
Description
A list of all uniform citations from the Louisville Metro Police Department. The CSV file is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes. It can be found in this Link.

INCIDENT_NUMBER or CASE_NUMBER links these data sets together:
Crime Data
Uniform Citation Data
Firearm intake
LMPD hate crimes
Assaulted Officers

CITATION_CONTROL_NUMBER links these data sets together:
Uniform Citation Data
LMPD Stops Data

Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

AGENCY_DESC - the name of the department that issued the citation
CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms. It can be used to connect this dataset to the INCIDENT_NUMBER in the following other datasets:
1. Crime Data
2. Firearms intake
3. LMPD hate crimes
4. Assaulted Officers
NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes), which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
CITATION_YEAR - the year the citation was issued
CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
CITATION_DATE - the date the citation was issued
CITATION_LOCATION - the location the citation was issued
DIVISION - the LMPD division in which the citation was issued
BEAT - the LMPD beat in which the citation was issued
PERSONS_SEX - the gender of the person who received the citation
PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
PERSONS_AGE - the age of the person who received the citation
PERSONS_HOME_CITY - the city in which the person who received the citation lives
PERSONS_HOME_STATE - the state in which the person who received the citation lives
PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
CHARGE_DESC - the description of the type of charge for the citation
UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/

The Louisville Metro Police Department (LMPD) began operations on January 6, 2003, as part of the creation of the consolidated city-county government in Louisville, Kentucky. It was formed by the merger of the Jefferson County Police Department and the Louisville Division of Police. The Louisville Metro Police Department is headed by Chief Jacquelyn Gwinn-Villaroel. LMPD divides Jefferson County into eight patrol divisions and operates a number of special investigative and support units.
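Joining on CASE_NUMBER and INCIDENT_NUMBER needs the two formats reconciled first. The sketch below shows one possible conversion in each direction. It assumes every number follows the 2-2-6 digit grouping of the single example given in the description, which may not hold for all rows; the function names are hypothetical.

```python
import re


def case_to_incident(case_number):
    """Format a dash-free CASE_NUMBER like the INCIDENT_NUMBER of the other datasets."""
    digits = re.sub(r"\D", "", str(case_number))
    # Assumption: all numbers share the 2-2-6 grouping of the documented example.
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {case_number!r}")
    return f"{digits[:2]}-{digits[2:4]}-{digits[4:]}"


def incident_to_case(incident_number):
    """Strip the dashes from an INCIDENT_NUMBER to obtain the CASE_NUMBER form."""
    return str(incident_number).replace("-", "")


print(case_to_incident("8018013155"))
print(incident_to_case("80-18-013155"))
```

Stripping dashes is the safer direction for a join, since it does not depend on the grouping assumption.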

Citations of datasets published by Barcode of Life Data Systems (BOLD)

zenodo.org

csv

Updated Jul 6, 2025

Cite

Roderic Page (2025). Citations of datasets published by Barcode of Life Data Systems (BOLD) [Dataset]. http://doi.org/10.5281/zenodo.15824274

Explore at:

Available download formats: csv

Unique identifier

https://doi.org/10.5281/zenodo.15824274

Dataset updated

Jul 6, 2025

Dataset provided by

Zenodo http://zenodo.org/

Authors

Roderic Page

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This is a list of datasets published by Barcode of Life Data Systems (BOLD) that have DataCite DOIs and have also been cited in the scientific literature. Many of these citations represent the publication of the corresponding dataset, but in other cases an existing dataset has been reused.

This dataset was created by searching Google Scholar for the dataset identifier ("DS-*") followed by manual cleaning of the results, and adding citations that were missed.

The data is formatted following the requirements of the Data Citation Corpus.

Field	Description
repository	Data repository name. Title case.
publisher	Name of the publisher of the journal the article appeared in. Title case.
journal	Title of the journal the article appeared in. Title case.
title	Dataset title (NOT journal article title). Title case.
dataset	Dataset identifier, from the repository listed in repository column. If the dataset identifier is a DOI, full URL string with protocol and domain preferred, ex https://doi.org/10.1093/toxsci/kfq395
publication	Article DOI. Full URL string with protocol and domain preferred, ex https://doi.org/10.1093/toxsci/kfq395 . Identifiers that can be mapped to DOIs (ex, PubMed IDs) can be accepted, but DOIs are strongly preferred.
publishedDate	Article publication date. ISO 8601 YYYY-MM-DDThh:mm:ssTZD
subjects	Dataset subject terms. Lowercase. Separate multiple items with ; char.
affiliations	Dataset creator/contributor affiliations. Title case. Separate multiple items with ; char. If organization ID is available, include it after the name, with a space between the name and ID, ex Oregon State University https://ror.org/00ysfqy60 . If organization ID is a ROR ID, full URL string with protocol and domain preferred, ex https://ror.org/00ysfqy60.
funders	Dataset funders. Title case. Separate multiple items with ; char. If organization ID is available, include it after the name, with a space between the name and ID, ex National Science Foundation https://doi.org/10.13039/100000001 . If organization ID is a ROR ID or Funder Registry ID, full URL string with protocol and domain preferred, ex https://ror.org/00ysfqy60 or https://doi.org/10.13039/100000001.
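The affiliations and funders columns pack several organizations into one string, so a consumer has to split them again. The sketch below is one hedged way to do that: it assumes organization IDs appear in the preferred full-URL form, leaves entries without a URL as name-only, and uses a hypothetical function name and a made-up third entry.

```python
import re

# Assumption: the ID, when present, is a full URL at the end of the entry.
ID_PATTERN = re.compile(r"^(?P<name>.*?)\s+(?P<id>https?://\S+)$")


def parse_organizations(value):
    """Split a ';'-separated affiliations/funders string into (name, id) pairs."""
    items = []
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # ignore empty segments such as a trailing ';'
        match = ID_PATTERN.match(part)
        if match:
            items.append((match.group("name"), match.group("id")))
        else:
            items.append((part, None))  # no recognizable ID
    return items


# The first two entries reuse the examples from the field list; the third is made up.
orgs = parse_organizations(
    "Oregon State University https://ror.org/00ysfqy60; "
    "National Science Foundation https://doi.org/10.13039/100000001; "
    "Example Lab"
)
print(orgs)
```

IDs given in a non-URL form would stay attached to the name under this pattern and would need separate handling.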

citations
huggingface.co
Updated Jul 30, 2024
Cite
Common Crawl Foundation (2024). citations [Dataset]. https://huggingface.co/datasets/commoncrawl/citations
Explore at:
Dataset updated
Jul 30, 2024
Dataset provided by
Common Crawl http://commoncrawl.org/
Authors
Common Crawl Foundation
Description
Common Crawl Citations Overview

This dataset contains citations referencing Common Crawl Foundation and its datasets, pulled from Google Scholar. Please note that these citations are not curated, so they will include some false positives. For an annotated subset of these citations with additional fields, please see citations-annotated.
Louisville Metro KY - Uniform Citation Data 2020
catalog.data.gov
data.louisvilleky.gov
Updated Apr 13, 2023
Cite
Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Uniform Citation Data 2020 [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-uniform-citation-data-2020
Explore at:
Dataset updated
Apr 13, 2023
Dataset provided by
Louisville/Jefferson County Information Consortium
Area covered
Louisville, Kentucky
Description
A list of all uniform citations from the Louisville Metro Police Department. The CSV file is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes. It can be found in this Link.

INCIDENT_NUMBER or CASE_NUMBER links these data sets together:
Crime Data
Uniform Citation Data
Firearm intake
LMPD hate crimes
Assaulted Officers

CITATION_CONTROL_NUMBER links these data sets together:
Uniform Citation Data
LMPD Stops Data

Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

AGENCY_DESC - the name of the department that issued the citation
CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms. It can be used to connect this dataset to the INCIDENT_NUMBER in the following other datasets:
1. Crime Data
2. Firearms intake
3. LMPD hate crimes
4. Assaulted Officers
NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes), which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
CITATION_YEAR - the year the citation was issued
CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
CITATION_DATE - the date the citation was issued
CITATION_LOCATION - the location the citation was issued
DIVISION - the LMPD division in which the citation was issued
BEAT - the LMPD beat in which the citation was issued
PERSONS_SEX - the gender of the person who received the citation
PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
PERSONS_AGE - the age of the person who received the citation
PERSONS_HOME_CITY - the city in which the person who received the citation lives
PERSONS_HOME_STATE - the state in which the person who received the citation lives
PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
CHARGE_DESC - the description of the type of charge for the citation
UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
Louisville Metro KY - Uniform Citation Data 2021
catalog.data.gov
data.lojic.org
+3more
Updated Apr 13, 2023
Cite
Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Uniform Citation Data 2021 [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-uniform-citation-data-2021
Dataset updated
Apr 13, 2023
Dataset provided by
Louisville/Jefferson County Information Consortium
Area covered
Louisville, Kentucky
Description
Note: Due to a system migration, this data will cease to update on March 14th, 2023. The current projection is to restart the updates within 30 days of the system migration, on or around April 13th, 2023.
A list of all uniform citations from the Louisville Metro Police Department. The CSV file, which is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes, can be found in this Link.
INCIDENT_NUMBER or CASE_NUMBER links these data sets together:
  Crime Data
  Uniform Citation Data
  Firearm intake
  LMPD hate crimes
  Assaulted Officers
CITATION_CONTROL_NUMBER links these data sets together:
  Uniform Citation Data
  LMPD Stops Data
Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.
AGENCY_DESC - the name of the department that issued the citation
CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms; it can be used to connect the dataset to the following other datasets via INCIDENT_NUMBER:
  1. Crime Data
  2. Firearms intake
  3. LMPD hate crimes
  4. Assaulted Officers
NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes), which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
CITATION_YEAR - the year the citation was issued
CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
CITATION_DATE - the date the citation was issued
CITATION_LOCATION - the location where the citation was issued
DIVISION - the LMPD division in which the citation was issued
BEAT - the LMPD beat in which the citation was issued
PERSONS_SEX - the gender of the person who received the citation
PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
PERSONS_AGE - the age of the person who received the citation
PERSONS_HOME_CITY - the city in which the person who received the citation lives
PERSONS_HOME_STATE - the state in which the person who received the citation lives
PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
CHARGE_DESC - the description of the type of charge for the citation
UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
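The CASE_NUMBER / INCIDENT_NUMBER mismatch noted in the description can be handled with a small normalisation step before joining. The following is a minimal sketch, not the publisher's method: the 2-2-6 digit grouping is an assumption inferred from the single example given (8018013155 and 80-18-013155), and the helper name and the toy records are hypothetical.

```python
import re


def case_to_incident(case_number: str) -> str:
    """Format an undashed CASE_NUMBER as a dashed INCIDENT_NUMBER.

    Assumption: a 10-digit value grouped 2-2-6, inferred only from the
    example 8018013155 -> 80-18-013155 in the dataset description.
    """
    digits = re.sub(r"\D", "", str(case_number))
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {case_number!r}")
    return f"{digits[:2]}-{digits[2:4]}-{digits[4:]}"


# Hypothetical toy rows standing in for the two CSV files.
citations = [{"CASE_NUMBER": "8018013155", "CHARGE_DESC": "EXAMPLE CHARGE"}]
crimes_by_incident = {"80-18-013155": {"INCIDENT_NUMBER": "80-18-013155"}}

# Join each citation to the crime record with the matching incident number.
joined = []
for row in citations:
    key = case_to_incident(row["CASE_NUMBER"])
    if key in crimes_by_incident:
        joined.append({**row, **crimes_by_incident[key]})
```

Rows whose CASE_NUMBER does not have ten digits raise an error rather than being silently reformatted, so unexpected formats in the real files would surface early.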
Louisville Metro KY - Uniform Citation Data 2022
gimi9.com
s.cnmilf.com
+5more
Updated Feb 1, 2022
Cite
(2022). Louisville Metro KY - Uniform Citation Data 2022 [Dataset]. https://gimi9.com/dataset/data-gov_louisville-metro-ky-uniform-citation-data-2022-1e968/
Dataset updated
Feb 1, 2022
Area covered
Louisville, Kentucky
Description
A list of all uniform citations from the Louisville Metro Police Department. The CSV file, which is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes, can be found in this Link.
INCIDENT_NUMBER or CASE_NUMBER links these data sets together:
  Crime Data
  Uniform Citation Data
  Firearm intake
  LMPD hate crimes
  Assaulted Officers
CITATION_CONTROL_NUMBER links these data sets together:
  Uniform Citation Data
  LMPD Stops Data
Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.
AGENCY_DESC - the name of the department that issued the citation
CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms; it can be used to connect the dataset to the following other datasets via INCIDENT_NUMBER:
  1. Crime Data
  2. Firearms intake
  3. LMPD hate crimes
  4. Assaulted Officers
NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes), which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
CITATION_YEAR - the year the citation was issued
CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
CITATION_DATE - the date the citation was issued
CITATION_LOCATION - the location where the citation was issued
DIVISION - the LMPD division in which the citation was issued
BEAT - the LMPD beat in which the citation was issued
PERSONS_SEX - the gender of the person who received the citation
PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
PERSONS_AGE - the age of the person who received the citation
PERSONS_HOME_CITY - the city in which the person who received the citation lives
PERSONS_HOME_STATE - the state in which the person who received the citation lives
PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
CHARGE_DESC - the description of the type of charge for the citation
UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
Louisville Metro KY - Uniform Citation Data 2023
s.cnmilf.com
data.lojic.org
+4more
Updated Apr 13, 2023
Cite
Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Uniform Citation Data 2023 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/louisville-metro-ky-uniform-citation-data-2023
Dataset updated
Apr 13, 2023
Dataset provided by
Louisville/Jefferson County Information Consortium
Area covered
Louisville, Kentucky
Description
Note: Due to a system migration, this data will cease to update on March 14th, 2023. The current projection is to restart the updates within 30 days of the system migration, on or around April 13th, 2023.
A list of all uniform citations from the Louisville Metro Police Department. The CSV file, which is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes, can be found in this Link.
INCIDENT_NUMBER or CASE_NUMBER links these data sets together:
  Crime Data
  Uniform Citation Data
  Firearm intake
  LMPD hate crimes
  Assaulted Officers
CITATION_CONTROL_NUMBER links these data sets together:
  Uniform Citation Data
  LMPD Stops Data
Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.
AGENCY_DESC - the name of the department that issued the citation
CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms; it can be used to connect the dataset to the following other datasets via INCIDENT_NUMBER:
  1. Crime Data
  2. Firearms intake
  3. LMPD hate crimes
  4. Assaulted Officers
NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes), which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
CITATION_YEAR - the year the citation was issued
CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
CITATION_DATE - the date the citation was issued
CITATION_LOCATION - the location where the citation was issued
DIVISION - the LMPD division in which the citation was issued
BEAT - the LMPD beat in which the citation was issued
PERSONS_SEX - the gender of the person who received the citation
PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
PERSONS_AGE - the age of the person who received the citation
PERSONS_HOME_CITY - the city in which the person who received the citation lives
PERSONS_HOME_STATE - the state in which the person who received the citation lives
PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
CHARGE_DESC - the description of the type of charge for the citation
UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
Citation data of arXiv eprints and the associated...
zenodo.org
data.niaid.nih.gov
bin, csv
Updated Jan 7, 2024
Cite
Keisuke Okamura; Hitoshi Koshiba (2024). Citation data of arXiv eprints and the associated quantitatively-and-temporally normalised impact metrics [Dataset]. http://doi.org/10.5281/zenodo.5803962
Available download formats: csv, bin
Unique identifier
https://doi.org/10.5281/zenodo.5803962
Dataset updated
Jan 7, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Keisuke Okamura; Hitoshi Koshiba
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 25, 2021
Description
Data collection

This dataset contains information on the eprints posted on arXiv from its launch in 1991 until the end of 2019 (1,589,006 unique eprints), plus the data on their citations and the associated impact metrics. Here, eprints include preprints, conference proceedings, book chapters, data sets and commentary, i.e. every electronic material that has been posted on arXiv.

The content and metadata of the arXiv eprints were retrieved from the arXiv API (https://arxiv.org/help/api/) as of 21st January 2020, where the metadata included the eprint’s title, author, abstract, subject category and the arXiv ID (the arXiv’s original eprint identifier). In addition, the associated citation data were derived from the Semantic Scholar API (https://api.semanticscholar.org/) from 24th January 2020 to 7th February 2020, containing the citation information in and out of the arXiv eprints and their published versions (if applicable). Here, whether an eprint has been published in a journal or by other means is assumed to be inferrable, albeit indirectly, from the status of the digital object identifier (DOI) assignment. It is also assumed that if an arXiv eprint received c_pre citations before and c_pub citations after it was assigned a DOI, counted up to the data retrieval date (7th February 2020), then the citation count of this eprint is recorded in the Semantic Scholar dataset as c_pre + c_pub. Both the arXiv API and the Semantic Scholar datasets contained the arXiv ID as metadata, which served as a key variable to merge the two datasets.

The classification of research disciplines is based on that described in the arXiv.org website (https://arxiv.org/help/stats/2020_by_area/). There, the arXiv subject categories are aggregated into several disciplines, of which we restrict our attention to the following six disciplines: Astrophysics (‘astro-ph’), Computer Science (‘comp-sci’), Condensed Matter Physics (‘cond-mat’), High Energy Physics (‘hep’), Mathematics (‘math’) and Other Physics (‘oth-phys’), which collectively accounted for 98% of all the eprints. Those eprints tagged to multiple arXiv disciplines were counted independently for each discipline. Due to this overlapping feature, the current dataset contains a cumulative total of 2,011,216 eprints.
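The effect of counting multi-discipline eprints once per discipline can be illustrated with a toy example. This is only a sketch under stated assumptions: the three records and their IDs are invented for illustration and are not taken from the dataset; only the discipline labels come from the description above.

```python
from collections import Counter

# Hypothetical toy records: eprint ID -> disciplines it is tagged to.
eprints = {
    "eprint-1": ["astro-ph"],
    "eprint-2": ["math", "comp-sci"],
    "eprint-3": ["hep", "math", "oth-phys"],
}

# Each eprint is counted independently in every discipline it belongs to.
per_discipline = Counter(d for tags in eprints.values() for d in tags)

unique_total = len(eprints)                      # one count per eprint
cumulative_total = sum(per_discipline.values())  # one count per (eprint, discipline) pair
```

Here the cumulative total exceeds the number of unique eprints for the same reason the dataset's cumulative total (2,011,216) exceeds its count of unique eprints.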

Some general statistics and visualisations per research discipline are provided in the original article (Okamura, to appear), where the validity and limitations associated with the dataset are also discussed.

Description of columns (variables)

arxiv_id : arXiv ID

category : Research discipline

pre_year : Year of posting v1 on arXiv

pub_year : Year of DOI acquisition

c_tot : No. of citations acquired during 1991–2019

c_pre : No. of citations acquired before and including the year of DOI acquisition

c_pub : No. of citations acquired after the year of DOI acquisition

c_yyyy (yyyy = 1991, …, 2019) : No. of citations acquired in the year yyyy (with ‘yyyy’ running from 1991 to 2019)

gamma : The quantitatively-and-temporally normalised citation index

gamma_star : The quantitatively-and-temporally standardised citation index

Note: The definition of the quantitatively-and-temporally normalised citation index (γ; ‘gamma’) and that of the standardised citation index (γ*; ‘gamma_star’) are provided in the original article (Okamura, to appear). Both indices can be used to compare the citational impact of papers/eprints published in different research disciplines at different times.
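The relationship between the yearly columns and the aggregate columns can be sketched as follows. This is an illustration of the column definitions as worded above, not the authors' code: the function name and the sample counts are hypothetical, and how the dataset treats eprints that never acquired a DOI is not stated here, so that case is left out.

```python
def split_citations(yearly: dict[int, int], pub_year: int) -> tuple[int, int]:
    """Split yearly citation counts into (c_pre, c_pub).

    c_pre: citations acquired up to and including the year of DOI acquisition.
    c_pub: citations acquired after the year of DOI acquisition.
    """
    c_pre = sum(n for year, n in yearly.items() if year <= pub_year)
    c_pub = sum(n for year, n in yearly.items() if year > pub_year)
    return c_pre, c_pub


# Hypothetical c_yyyy values for a single eprint with pub_year = 2016.
yearly = {2015: 2, 2016: 5, 2017: 7, 2018: 3}
c_pre, c_pub = split_citations(yearly, pub_year=2016)
c_tot = sum(yearly.values())
```

Under these definitions c_tot equals c_pre + c_pub, which gives a simple consistency check when loading the CSV file.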

Data files

A comma-separated values file (‘arXiv_impact.csv’) and a Stata file (‘arXiv_impact.dta’) are provided, both containing the same information.
CIFAR-10 (Canadian Institute for Advanced Research)
academictorrents.com
bittorrent
Updated Oct 11, 2015
Cite
Alex Krizhevsky and Vinod Nair and Geoffrey Hinton (2015). CIFAR-10 (Canadian Institute for Advanced Research) [Dataset]. https://academictorrents.com/details/463ba7ec7f37ed414c12fbb71ebf6431eada2d7a
Available download formats: bittorrent (170052171)
Dataset updated
Oct 11, 2015
Dataset authored and provided by
Alex Krizhevsky and Vinod Nair and Geoffrey Hinton
License
https://academictorrents.com/nolicensespecified
Description
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
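The batch structure described above can be checked with a small simulation. This is a sketch only: it uses (class, index) pairs as stand-ins for images and does not read the actual CIFAR-10 files; the seed and variable names are arbitrary choices for illustration.

```python
import random
from collections import Counter

NUM_CLASSES = 10
IMAGES_PER_CLASS = 6000
TEST_PER_CLASS = 1000
BATCH_SIZE = 10000

rng = random.Random(0)

test_batch, train_pool = [], []
for cls in range(NUM_CLASSES):
    items = [(cls, i) for i in range(IMAGES_PER_CLASS)]  # stand-ins for images
    rng.shuffle(items)
    test_batch.extend(items[:TEST_PER_CLASS])   # exactly 1000 per class
    train_pool.extend(items[TEST_PER_CLASS:])   # remaining 5000 per class

# Training images are placed in random order, so individual batches
# need not be class-balanced even though the pool as a whole is.
rng.shuffle(train_pool)
train_batches = [
    train_pool[i:i + BATCH_SIZE] for i in range(0, len(train_pool), BATCH_SIZE)
]

test_counts = Counter(cls for cls, _ in test_batch)
train_counts = Counter(cls for batch in train_batches for cls, _ in batch)
```

The simulation yields five training batches of 10000 items, a test batch with 1000 items per class, and 5000 training items per class in total, matching the figures in the description.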
Data from: The location of the citation: changing practices in how...
datadryad.org
search.dataone.org
+1more
zip
Updated Oct 31, 2016
Cite
Christine Mayo; Todd J. Vision; Elizabeth A. Hull (2016). The location of the citation: changing practices in how publications cite original data in the Dryad Digital Repository [Dataset]. http://doi.org/10.5061/dryad.8q931
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.8q931
Dataset updated
Oct 31, 2016
Dataset provided by
Dryad
Authors
Christine Mayo; Todd J. Vision; Elizabeth A. Hull
Time period covered
Jan 19, 2016
Description
Citation location results: Details of the articles analyzed to determine the location of original data identifiers within the article. Only includes results for articles that were Open Access and full-text indexed in EPMC. File: citation_locations.txt
Citation location script: This Python script imports lists of Dryad metadata and uses them to search Europe PMC and to locate and classify data references appearing in the literature. File: citation_locations.py


