100+ datasets found

Citation Knowledge with Section and Context
ordo.open.ac.uk
zip
Updated May 5, 2020
Cite
Anita Khadka (2020). Citation Knowledge with Section and Context [Dataset]. http://doi.org/10.21954/ou.rd.11346848.v1
Available download formats: zip
Unique identifier
https://doi.org/10.21954/ou.rd.11346848.v1
Dataset updated
May 5, 2020
Dataset provided by
The Open University
Authors
Anita Khadka
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This dataset contains information from scientific publications written by authors who have published papers in the RecSys conference. It contains four files with information extracted from scientific publications. The details of each file are explained below:

i) all_authors.tsv: This file contains the details of authors who published research papers in the RecSys conference. The details include the authors' identifiers in various forms (number, ORCID iD, DBLP URL, DBLP key and Google Scholar URL), and the authors' first name, last name and affiliation (where they work).

ii) all_publications.tsv: This file contains the details of publications authored by the authors listed in the all_authors.tsv file. (Please note that the list does not contain all the publications of each author; refer to the publication for further details.) The details include the publications' identifiers in different forms (number, DBLP key, DBLP URL, Google Scholar URL), title, filtered title, published date, published conference and paper abstract.

iii) selected_author_publications-information.tsv: This file consists of identifiers of authors and their publications. Here, we provide the information on the selected authors and their publications used for our experiment.

iv) selected_publication_citations-information.tsv: This file contains the information on the selected publications, covering both the citing and cited papers used in our experiment. It consists of the identifier of the citing paper, the identifier of the cited paper, citation title, citation filtered title, the sentence before the citation is mentioned, the citing sentence, the sentence after the citation is mentioned, and the citation position (section). Please note that it does not contain information on all the citations cited in the publications.

For more detail, please refer to the paper. This dataset is for research purposes only. If you use this dataset, please cite our paper "Capturing and exploiting citation knowledge for recommending recently published papers", due to be published in Web2Touch track 2020 (not yet published).
POCI CSV dataset of all the citation data
figshare.com
zip
Updated Dec 27, 2022
Cite
OpenCitations (2022). POCI CSV dataset of all the citation data [Dataset]. http://doi.org/10.6084/m9.figshare.21776351.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.21776351.v1
Dataset updated
Dec 27, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
OpenCitations
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains all the citation data (in CSV format) included in POCI, released on 27 December 2022. In particular, each line of the CSV file defines a citation, and includes the following information:

[field "oci"] the Open Citation Identifier (OCI) for the citation;
[field "citing"] the PMID of the citing entity;
[field "cited"] the PMID of the cited entity;
[field "creation"] the creation date of the citation (i.e. the publication date of the citing entity);
[field "timespan"] the time span of the citation (i.e. the interval between the publication date of the cited entity and the publication date of the citing entity);
[field "journal_sc"] whether the citation is a journal self-citation (i.e. the citing and the cited entities are published in the same journal);
[field "author_sc"] whether the citation is an author self-citation (i.e. the citing and the cited entities have at least one author in common).

This version of the dataset contains:

717,654,703 citations; 26,024,862 bibliographic resources.

The size of the zipped archive is 9.6 GB, while the size of the unzipped CSV file is 50 GB. Additional information about POCI is available on the official webpage.
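As a rough illustration of the record layout described above, the following Python sketch streams a POCI-style CSV and tallies self-citations. The helper name `count_self_citations` is hypothetical. The sketch assumes that the file has a header row with the field names listed above and that the `journal_sc` and `author_sc` flags are encoded as the strings "yes" and "no"; both points should be checked against the actual file.

```python
import csv
import io
from collections import Counter


def count_self_citations(lines):
    """Tally citations and self-citations from an iterable of CSV lines.

    Assumes a header row naming the fields described above and that the
    self-citation flags are the strings "yes"/"no" (an assumption).
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        counts["citations"] += 1
        if row["journal_sc"] == "yes":
            counts["journal_self"] += 1
        if row["author_sc"] == "yes":
            counts["author_self"] += 1
    return counts


# Tiny made-up sample in the same layout (not real POCI data).
sample = io.StringIO(
    "oci,citing,cited,creation,timespan,journal_sc,author_sc\n"
    "x-1,111,222,2020-01-01,P1Y,no,yes\n"
    "x-2,333,444,2021-05,P2Y,yes,no\n"
    "x-3,555,666,2019,P3Y,no,no\n"
)
print(count_self_citations(sample))
```

Because the unzipped file is about 50 GB, reading it row by row from an open file handle (for example `open(path, newline="")`) is likely more practical than loading it into memory at once.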
CITE Dataset
paperswithcode.com
opendatalab.com
Updated Feb 7, 2021
Cite
Malihe Alikhani; Sreyasi Nag Chowdhury; Gerard de Melo; Matthew Stone (2021). CITE Dataset [Dataset]. https://paperswithcode.com/dataset/cite
Dataset updated
Feb 7, 2021
Authors
Malihe Alikhani; Sreyasi Nag Chowdhury; Gerard de Melo; Matthew Stone
Description
CITE is a crowd-sourced resource for multimodal discourse: this resource characterises inferences in image-text contexts in the domain of cooking recipes in the form of coherence relations.
Citations to software and data in Zenodo via open sources
zenodo.org
explore.openaire.eu
csv
Updated Jan 24, 2020
Cite
Stephanie van de Sandt; Alex Ioannidis; Lars Holm Nielsen (2020). Citations to software and data in Zenodo via open sources [Dataset]. http://doi.org/10.5281/zenodo.3482927
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.3482927
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Stephanie van de Sandt; Alex Ioannidis; Lars Holm Nielsen
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
In January 2019, the Asclepias Broker harvested citation links to Zenodo objects from three discovery systems: the NASA Astrophysics Data System (ADS), Crossref Event Data and Europe PMC. Each row of our dataset represents one unique link between a citing publication and a Zenodo DOI. Both endpoints are described by basic metadata. The second dataset contains usage metrics for every cited Zenodo DOI in our data sample.
DBLP Dataset
paperswithcode.com
Updated Apr 13, 2021
Cite
Jie Tang; Jing Zhang; Limin Yao; Juanzi Li; Li Zhang; Zhong Su (2021). DBLP Dataset [Dataset]. https://paperswithcode.com/dataset/dblp
Dataset updated
Apr 13, 2021
Authors
Jie Tang; Jing Zhang; Limin Yao; Juanzi Li; Li Zhang; Zhong Su
Description
The DBLP is a citation network dataset. The citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other sources. The first version contains 629,814 papers and 632,752 citations. Each paper is associated with abstract, authors, year, venue, and title. The data set can be used for clustering with network and side information, studying influence in the citation network, finding the most influential papers, topic modeling analysis, etc.
PMOA-CITE Dataset
paperswithcode.com
figshare.com
Updated May 19, 2024
Cite
Tong Zeng; Daniel E. Acuna (2024). PMOA-CITE Dataset [Dataset]. https://paperswithcode.com/dataset/pmoa-cite
Dataset updated
May 19, 2024
Authors
Tong Zeng; Daniel E. Acuna
Description
The dataset used in the experiments on the paper "Modeling citation worthiness by using attention‑based bidirectional long short‑term memory networks and interpretable models"

There are one million sentences in total, split into training, validation and testing sets in proportions of 60%, 20% and 20%, respectively.

For the pre-processing of the dataset, please refer to the paper.

The data are stored in JSONL format (each row is a JSON object). A few rows are listed as examples:

{"sec_name":"introduction","cur_sent_id":"12213838@0#3$0","next_sent_id":"12213838@0#3$1","cur_sent":"All three spectrin subunits are essential for normal development.","next_sent":"βH, encoded by the karst locus, is an essential protein that is required for epithelial morphogenesis .","cur_scaled_len_features":{"type":1,"values":[0.17716535433070865,0.13513513513513514]},"next_scaled_len_features":{"type":1,"values":[0.32677165354330706,0.35135135135135137]},"cur_has_citation":0,"next_has_citation":1}

{"sec_name":"results","prev_sent_id":"12230634@1@1#0$2","cur_sent_id":"12230634@1@1#0$3","next_sent_id":"12230634@1@1#0$4","prev_sent":"μIU/ml at the 2.0-h postprandial time point.","cur_sent":"Statistically significant differences between the mean plasma insulin levels of dogs treated with 50 mg/kg of GSNO, and those treated with 50 mg/kg GSNO and vitamin C (50 mg/kg) were observed at the 1.0-h and 1.5-h time points (P < 0.05).","next_sent":"The mean plasma insulin concentrations in the dogs treated with 50 mg/kg of vitamin C and 50 mg/kg of GSNO, or 50 mg/kg of GSNO was significantly altered compared to those of controls or captopril-treated dogs (P < 0.05).","prev_scaled_len_features":{"type":1,"values":[0.09448818897637795,0.08108108108108109]},"cur_scaled_len_features":{"type":1,"values":[0.8582677165354331,1.0]},"next_scaled_len_features":{"type":1,"values":[0.7913385826771654,0.9459459459459459]},"prev_has_citation":0,"cur_has_citation":0,"next_has_citation":0}

{"sec_name":"results","prev_sent_id":"12213837@1@0#3$3","cur_sent_id":"12213837@1@0#3$4","next_sent_id":"12213837@1@0#3$5","prev_sent":"Cleavage of VAMP2 by BoNT/D releases the NH2-terminal 59 amino acids from the protein and eliminates exocytosis.","cur_sent":"However, in this case, exocytosis cannot be recovered by addition of the cleaved fragment .","next_sent":"Peptides that exactly correspond to the BoNT/D cleavage site (VAMP2 aa 25–59 and 60–94-cys) were equally efficient at mediating liposome fusion (unpublished data).","prev_scaled_len_features":{"type":1,"values":[0.36220472440944884,0.35135135135135137]},"cur_scaled_len_features":{"type":1,"values":[0.2795275590551181,0.2972972972972973]},"next_scaled_len_features":{"type":1,"values":[0.562992125984252,0.5135135135135135]},"prev_has_citation":0,"cur_has_citation":1,"next_has_citation":0}

For the code that uses this dataset to model citation worthiness, please refer to https://github.com/sciosci/cite-worthiness
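To make the record format above more concrete, here is a minimal Python sketch that reads JSONL lines and reports the share of sentences labelled as carrying a citation. The function name `citation_rate` is hypothetical, and the sketch relies only on the `cur_has_citation` field shown in the example rows; it is not taken from the authors' code.

```python
import json


def citation_rate(jsonl_lines):
    """Return the fraction of records whose current sentence has a citation."""
    total = 0
    cited = 0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        total += 1
        cited += record["cur_has_citation"]
    return cited / total if total else 0.0


# Two abbreviated, made-up records using field names from the examples above.
sample = [
    '{"sec_name": "introduction", "cur_sent": "A.", "cur_has_citation": 0}',
    '{"sec_name": "results", "cur_sent": "B.", "cur_has_citation": 1}',
]
print(citation_rate(sample))
```

The same function accepts an open file handle, since iterating over a text file yields one line (one JSON object) at a time.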
Citing online references
borealisdata.ca
dataone.org
Updated May 7, 2019
Cite
David Topps; Corey Wirun; Nishan Sharma (2019). Citing online references [Dataset]. http://doi.org/10.5683/SP2/80VX7U
Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.5683/SP2/80VX7U
Dataset updated
May 7, 2019
Dataset provided by
Borealis
Authors
David Topps; Corey Wirun; Nishan Sharma
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Citation of reference material is well established for most traditional sources but remains inconsistent in its application for online resources such as web pages, blog posts and materials generated from underlying database queries. We present some tips on how authors can more effectively cite and archive such resources so they are persistent and sustainable.
MultiCite Dataset
paperswithcode.com
opendatalab.com
Updated Jun 30, 2021
Cite
Anne Lauscher; Brandon Ko; Bailey Kuehl; Sophie Johnson; David Jurgens; Arman Cohan; Kyle Lo (2021). MultiCite Dataset [Dataset]. https://paperswithcode.com/dataset/multicite
Dataset updated
Jun 30, 2021
Authors
Anne Lauscher; Brandon Ko; Bailey Kuehl; Sophie Johnson; David Jurgens; Arman Cohan; Kyle Lo
Description
MultiCite is a dataset of 12,653 citation contexts from over 1,200 computational linguistics papers used for Citation context analysis (CCA). MultiCite contains multi-sentence, multi-label citation contexts within full paper texts.
Data from: CRAWDAD wireless network data citation bibliography
figshare.com
txt
Updated Jan 19, 2016
Cite
Tristan Henderson; David Kotz (2016). CRAWDAD wireless network data citation bibliography [Dataset]. http://doi.org/10.6084/m9.figshare.1203646.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.1203646.v1
Dataset updated
Jan 19, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Tristan Henderson; David Kotz
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This BibTeX file contains the corpus of papers that cite CRAWDAD wireless network datasets, as used in the paper: Tristan Henderson and David Kotz. Data citation practices in the CRAWDAD wireless network data archive. Proceedings of the Second Workshop on Linking and Contextualizing Publications and Datasets, London, UK, September 2014.

Most of the fields are standard BibTeX fields. There are two that require further explanation.

"citations" - this field contains the citations for a paper as counted by Google Scholar as of 24 September 2014.

"keywords" - this field contains a set of tags indicating data citation practice. These are as follows:
- "uses_crawdad_data" - this paper uses a CRAWDAD dataset
- "cites_insufficiently" - this paper does not meet our sufficiency criteria
- "cites_by_description" - this paper cites a dataset by description rather than dataset identifier
- "cites_canonical_paper" - this paper cites the original ("canonical") paper that collected a dataset, rather than pointing to the dataset
- "cites_by_name" - this paper cites a dataset by a colloquial name rather than dataset identifier
- "cites_crawdad_url" - this paper cites the main CRAWDAD URL rather than a particular dataset
- "cites_without_url" - this paper does not provide a URL for dataset access
- "cites_wrong_attribution" - this paper attributes a dataset to CRAWDAD, Dartmouth etc. rather than the dataset authors
- "cites_vaguely" - this paper cites the used datasets (if any) too vaguely to be sufficient

If you have any questions about the data, please contact us at crawdad@crawdad.org
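As an illustration of how the "keywords" tags described above might be tallied, the following Python sketch counts tag occurrences across BibTeX entries using only the standard library. The function name `count_keyword_tags` is hypothetical, and the sketch assumes that each entry stores its tags in a single `keywords = {...}` (or quoted) field with tags separated by commas, semicolons or whitespace; the actual file may be laid out differently.

```python
import re
from collections import Counter

# Matches a keywords field written as keywords = {...} or keywords = "...".
# This layout is an assumption about the file, not something the source states.
KEYWORDS_FIELD = re.compile(r'keywords\s*=\s*[{"]([^}"]*)[}"]', re.IGNORECASE)


def count_keyword_tags(bibtex_text):
    """Count how often each keyword tag appears across all entries."""
    counts = Counter()
    for field in KEYWORDS_FIELD.findall(bibtex_text):
        for tag in re.split(r"[,;\s]+", field):
            if tag:
                counts[tag] += 1
    return counts


# Made-up entries for demonstration only.
sample = """
@inproceedings{example1,
  title = {An example},
  keywords = {uses_crawdad_data, cites_by_name},
}
@article{example2,
  title = {Another example},
  keywords = {uses_crawdad_data, cites_without_url},
}
"""
print(count_keyword_tags(sample))
```

A dedicated BibTeX parser would be more robust than a regular expression for entries with nested braces; the regex is used here only to keep the sketch dependency-free.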
Methodology data of "A qualitative and quantitative citation analysis toward...
data.niaid.nih.gov
zenodo.org
Updated Aug 2, 2024
Cite
Peroni, Silvio (2024). Methodology data of "A qualitative and quantitative citation analysis toward retracted articles: a case of study" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4024337
Dataset updated
Aug 2, 2024
Dataset provided by
Heibi, Ivan
Peroni, Silvio
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This document contains the datasets and visualizations generated after the application of the methodology defined in our work: "A qualitative and quantitative citation analysis toward retracted articles: a case of study". The methodology defines a citation analysis of the Wakefield et al. [1] retracted article from a quantitative and qualitative point of view. The data contained in this repository are based on the first two steps of the methodology. The first step of the methodology (i.e. "Data gathering") builds an annotated dataset of the citing entities; this step is also discussed at length in [2]. The second step (i.e. "Topic Modelling") runs a topic modeling analysis on the textual features contained in the dataset generated by the first step.

Note: the data are all contained inside the "method_data.zip" file. You need to unzip the file to get access to all the files and directories listed below.

Data gathering

The data generated by this step are stored in "data/":

"cits_features.csv": a dataset containing all the entities (rows in the CSV) which have cited the Wakefield et al. retracted article, and a set of features characterizing each citing entity (columns in the CSV). The features included are: DOI ("doi"), year of publication ("year"), the title ("title"), the venue identifier ("source_id"), the title of the venue ("source_title"), yes/no value in case the entity is retracted as well ("retracted"), the subject area ("area"), the subject category ("category"), the sections of the in-text citations ("intext_citation.section"), the value of the reference pointer ("intext_citation.pointer"), the in-text citation function ("intext_citation.intent"), the in-text citation perceived sentiment ("intext_citation.sentiment"), and a yes/no value to denote whether the in-text citation context mentions the retraction of the cited entity ("intext_citation.section.ret_mention"). Note: this dataset is licensed under a Creative Commons public domain dedication (CC0).

"cits_text.csv": this dataset stores the abstract ("abstract") and the in-text citations context ("intext_citation.context") for each citing entity identified using the DOI value ("doi"). Note: the data keep their original license (the one provided by their publisher). This dataset is provided in order to favor the reproducibility of the results obtained in our work.

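Since both CSV files described above identify each citing entity by its "doi" column, they can be combined on that key. The following Python sketch shows one way to do this with the standard library. The function name `join_on_doi` is hypothetical, and the sketch assumes one row per DOI in "cits_text.csv", which the description does not guarantee.

```python
import csv
import io


def join_on_doi(features_lines, text_lines):
    """Attach the citation context from the text table to each feature row.

    Assumes at most one text row per DOI; later rows would overwrite
    earlier ones otherwise.
    """
    contexts = {
        row["doi"]: row["intext_citation.context"]
        for row in csv.DictReader(text_lines)
    }
    joined = []
    for row in csv.DictReader(features_lines):
        merged = dict(row)
        merged["intext_citation.context"] = contexts.get(row["doi"], "")
        joined.append(merged)
    return joined


# Made-up rows using a subset of the column names listed above.
features = io.StringIO(
    "doi,year,retracted\n"
    "10.0000/a,2001,no\n"
    "10.0000/b,2005,yes\n"
)
texts = io.StringIO(
    "doi,abstract,intext_citation.context\n"
    "10.0000/a,Some abstract,Some context\n"
)
for record in join_on_doi(features, texts):
    print(record["doi"], repr(record["intext_citation.context"]))
```

Rows without a matching text entry receive an empty context rather than being dropped, so the feature table keeps its original length.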
Topic modeling

We run a topic modeling analysis on the textual features gathered (i.e. abstracts and citation contexts). The results are stored inside the "topic_modeling/" directory. The topic modeling has been done using MITAO, a tool for mashing up automatic text analysis tools, and creating a completely customizable visual workflow [3]. The topic modeling results for each textual feature are separated into two different folders, "abstracts/" for the abstracts, and "intext_cit/" for the in-text citation contexts. Both the directories contain the following directories/files:

"mitao_workflows/": the workflows of MITAO. These are JSON files that could be reloaded in MITAO to reproduce the results following the same workflows.

"corpus_and_dictionary/": it contains the dictionary and the vectorized corpus given as inputs for the LDA topic modeling.

"coherence/coherence.csv": the coherence score of several topic models trained on a number of topics from 1 - 40.

"datasets_and_views/": the datasets and visualizations generated using MITAO.

References

[1] Wakefield, A., Murch, S., Anthony, A., Linnell, J., Casson, D., Malik, M., Berelowitz, M., Dhillon, A., Thomson, M., Harvey, P., Valentine, A., Davies, S., & Walker-Smith, J. (1998). RETRACTED: Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children. The Lancet, 351(9103), 637–641. https://doi.org/10.1016/S0140-6736(97)11096-0

[2] Heibi, I., & Peroni, S. (2020). A methodology for gathering and annotating the raw-data/characteristics of the documents citing a retracted article v1 (protocols.io.bdc4i2yw) [Data set]. In protocols.io. ZappyLab, Inc. https://doi.org/10.17504/protocols.io.bdc4i2yw

[3] Ferri, P., Heibi, I., Pareschi, L., & Peroni, S. (2020). MITAO: A User Friendly and Modular Software for Topic Modelling [JD]. PuntOorg International Journal, 5(2), 135–149. https://doi.org/10.19245/25.05.pij.5.2.3
Uncovering the Citation Landscape: Exploring OpenCitations COCI,...
data.niaid.nih.gov
Updated Sep 7, 2023
Cite
Lorenzo Paolini (2023). Uncovering the Citation Landscape: Exploring OpenCitations COCI, OpenCitations Meta, and ERIH-PLUS in Social Sciences and Humanities Journals - DATA PRODUCED [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7974815
Dataset updated
Sep 7, 2023
Dataset provided by
Sara Vellone
Marta Soricetti
Olga Pagnotta
Lorenzo Paolini
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These zipped folders contain all the data produced for the research "Uncovering the Citation Landscape: Exploring OpenCitations COCI, OpenCitations Meta, and ERIH-PLUS in Social Sciences and Humanities Journals": the results datasets (dataset_map_disciplines, dataset_no_SSH, dataset_SSH, erih_meta_with_disciplines and erih_meta_without_disciplines).

dataset_map_disciplines.zip contains CSV files with four columns ("id", "citing", "cited", "disciplines") giving information about publications stored in OpenCitations META (version 3, released in February 2023) and part of SSH journals, according to ERIH PLUS (version downloaded on 2023-04-27), specifying the disciplines associated with them and a boolean value stating whether they cite or are cited, according to the OpenCitations COCI dataset (version 19, released in January 2023).

dataset_no_SSH.zip and dataset_SSH.zip contain CSV files with the same structure. Each dataset has four columns: "citing", "is_citing_SSH", "cited", and "is_cited_SSH". The "citing" and "cited" columns are filled with DOIs of publications stored in OpenCitations META that, according to OpenCitations COCI, are involved in a citation. The "is_citing_SSH" and "is_cited_SSH" columns contain boolean values: "True" if the corresponding publication is associated with an SSH (Social Sciences and Humanities) discipline, according to ERIH PLUS, and "False" otherwise. The two datasets are built starting from the two different subsets obtained as a result of the union between OpenCitations META and ERIH PLUS: dataset_SSH comes from erih_meta_with_disciplines and dataset_no_SSH from erih_meta_without_disciplines.

erih_meta_with_disciplines.zip and erih_meta_without_disciplines.zip, as explained before, contain CSV files originating from ERIH PLUS and META. erih_meta_without_disciplines has just one column, "id", and contains the DOIs of all the publications in META that do not have any discipline associated, that is, have not been published in an SSH journal. erih_meta_with_disciplines derives from all the publications in META that have at least one linked discipline and has two columns, "id" and "erih_disciplines", the latter containing a string with all the disciplines linked to that publication, like "History, Interdisciplinary research in the Humanities, Interdisciplinary research in the Social Sciences, Sociology".
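To illustrate the four-column layout of the dataset_SSH and dataset_no_SSH files, the following Python sketch counts citations in which both endpoints are flagged as SSH. The function name `count_ssh_to_ssh` is hypothetical. The sketch assumes that each CSV has a header row with the column names given above and that the flags are stored as the literal strings "True" and "False", as the description suggests.

```python
import csv
import io


def count_ssh_to_ssh(lines):
    """Count citations where both citing and cited publications are SSH."""
    return sum(
        1
        for row in csv.DictReader(lines)
        if row["is_citing_SSH"] == "True" and row["is_cited_SSH"] == "True"
    )


# Made-up rows in the described layout (placeholder DOIs, not real data).
sample = io.StringIO(
    "citing,is_citing_SSH,cited,is_cited_SSH\n"
    "10.0000/a,True,10.0000/b,True\n"
    "10.0000/c,True,10.0000/d,False\n"
    "10.0000/e,False,10.0000/f,True\n"
)
print(count_ssh_to_ssh(sample))
```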

Software: https://doi.org/10.5281/zenodo.8326023

Data preprocessed: https://doi.org/10.5281/zenodo.7973159

Article: https://zenodo.org/record/8326044

DMP: https://zenodo.org/record/8324973

Protocol: https://doi.org/10.17504/protocols.io.n92ldpeenl5b/v5
Data from: OpCitance: Citation contexts identified from the PubMed Central...
databank.illinois.edu
Updated Feb 15, 2023
Cite
Tzu-Kun Hsiao; Vetle Torvik (2023). OpCitance: Citation contexts identified from the PubMed Central open access articles [Dataset]. http://doi.org/10.13012/B2IDB-4353270_V1
Unique identifier
https://doi.org/10.13012/B2IDB-4353270_V1
Dataset updated
Feb 15, 2023
Authors
Tzu-Kun Hsiao; Vetle Torvik
Dataset funded by
U.S. National Institutes of Health (NIH)
Description
Sentences and citation contexts identified from the PubMed Central open access articles
----------------------------------------------------------------------

The dataset is delivered as 24 tab-delimited text files. The files contain 720,649,608 sentences, 75,848,689 of which are citation contexts. The dataset is based on a snapshot of articles in the XML version of the PubMed Central open access subset (i.e., the PMCOA subset). The PMCOA subset was collected in May 2019. The dataset is created as described in: Hsiao TK., & Torvik V. I. (manuscript) OpCitance: Citation contexts identified from the PubMed Central open access articles.

Files:
• A_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with A.
• B_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with B.
• C_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with C.
• D_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with D.
• E_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with E.
• F_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with F.
• G_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with G.
• H_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with H.
• I_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with I.
• J_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with J.
• K_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with K.
• L_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with L.
• M_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with M.
• N_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with N.
• O_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with O.
• P_p1_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with P (part 1).
• P_p2_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with P (part 2).
• Q_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with Q.
• R_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with R.
• S_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with S.
• T_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with T.
• UV_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with U or V.
• W_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with W.
• XYZ_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with X, Y or Z.

Each row in the file is a sentence/citation context and contains the following columns:
• pmcid: PMCID of the article
• pmid: PMID of the article. If an article does not have a PMID, the value is NONE.
• location: The article component (abstract, main text, table, figure, etc.) to which the citation context/sentence belongs.
• IMRaD: The type of IMRaD section associated with the citation context/sentence. I, M, R, and D represent introduction/background, method, results, and conclusion/discussion, respectively; NoIMRaD indicates that the section type is not identifiable.
• sentence_id: The ID of the citation context/sentence in the article component
• total_sentences: The number of sentences in the article component.
• intxt_id: The ID of the citation.
• intxt_pmid: PMID of the citation (as tagged in the XML file). If a citation does not have a PMID tagged in the XML file, the value is "-".
• intxt_pmid_source: The sources where the intxt_pmid can be identified. Xml represents that the PMID is only identified from the XML file; xml,pmc represents that the PMID is not only from the XML file, but also in the citation data collected from the NCBI Entrez Programming Utilities. If a citation does not have an intxt_pmid, the value is "-".
• intxt_mark: The citation marker associated with the inline citation.
• best_id: The best source link ID (e.g., PMID) of the citation.
• best_source: The sources that confirm the best ID.
• best_id_diff: The comparison result between the best_id column and the intxt_pmid column.
• citation: A citation context. If no citation is found in a sentence, the value is the sentence.
• progression: Text progression of the citation context/sentence.

Supplementary Files
• PMC-OA-patci.tsv.gz – This file contains the best source link IDs for the references (e.g., PMID). Patci [1] was used to identify the best source link IDs. The best source link IDs are mapped to the citation contexts and displayed in the *_journal IntxtCit.tsv files as the best_id column.

Each row in the PMC-OA-patci.tsv.gz file is a citation (i.e., a reference extracted from the XML file) and contains the following columns:
• pmcid: PMCID of the citing article.
• pos: The citation's position in the reference list.
• fromPMID: PMID of the citing article.
• toPMID: Source link ID (e.g., PMID) of the citation. This ID is identified by Patci.
• SRC: The sources that confirm the toPMID.
• MatchDB: The origin bibliographic database of the toPMID.
• Probability: The match probability of the toPMID.
• toPMID2: PMID of the citation (as tagged in the XML file).
• SRC2: The sources that confirm the toPMID2.
• intxt_id: The ID of the citation.
• journal: The first letter of the journal title. This maps to the *_journal_IntxtCit.tsv files.
• same_ref_string: Whether the citation string appears in the reference list more than once.
• DIFF: The comparison result between the toPMID column and the toPMID2 column.
• bestID: The best source link ID (e.g., PMID) of the citation.
• bestSRC: The sources that confirm the best ID.
• Match: Matching result produced by Patci.

[1] Agarwal, S., Lincoln, M., Cai, H., & Torvik, V. (2014). Patci – a tool for identifying scientific articles cited by patents. GSLIS Research Showcase 2014. http://hdl.handle.net/2142/54885

• Supplementary_File_1.zip – This file contains the code for generating the dataset.
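As a sketch of how the per-sentence columns described above could be used, the following Python snippet counts, per IMRaD section type, the rows whose citation has a PMID tagged in the XML (that is, `intxt_pmid` is not "-"). The function name `pmid_citations_by_section` is hypothetical, and the sketch assumes that each *_journal_IntxtCit.tsv file starts with a header row containing the column names listed above, which the description does not state explicitly.

```python
import csv
import io
from collections import Counter


def pmid_citations_by_section(tsv_lines):
    """Count rows with a tagged citation PMID, grouped by IMRaD type."""
    counts = Counter()
    # QUOTE_NONE keeps quotation marks inside sentences from being
    # interpreted as CSV quoting.
    reader = csv.DictReader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["intxt_pmid"] != "-":
            counts[row["IMRaD"]] += 1
    return counts


# Made-up rows with a subset of the documented columns.
sample = io.StringIO(
    "pmcid\tIMRaD\tintxt_pmid\tcitation\n"
    "PMC1\tI\t12345\tA sentence with a citation.\n"
    "PMC1\tI\t-\tA sentence without a tagged PMID.\n"
    "PMC1\tM\t67890\tA \"quoted\" method sentence.\n"
)
print(pmid_citations_by_section(sample))
```

With 720,649,608 sentences across the files, streaming each file row by row as above is likely preferable to loading a whole file into memory.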
Louisville Metro KY - Uniform Citation Data (2016-2019)
catalog.data.gov
data.louisvilleky.gov
Updated Apr 13, 2023
Cite
Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Uniform Citation Data (2016-2019) [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-uniform-citation-data-2016-2019
Dataset updated
Apr 13, 2023
Dataset provided by
Louisville/Jefferson County Information Consortium
Area covered
Kentucky, Louisville
Description
A list of all uniform citations from the Louisville Metro Police Department. The CSV file is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes.

INCIDENT_NUMBER or CASE_NUMBER links these data sets together:
Crime Data
Uniform Citation Data
Firearm intake
LMPD hate crimes
Assaulted Officers

CITATION_CONTROL_NUMBER links these data sets together:
Uniform Citation Data
LMPD Stops Data

Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

AGENCY_DESC - the name of the department that issued the citation
CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms; it can be used to connect this dataset to the following other datasets via INCIDENT_NUMBER:
1. Crime Data
2. Firearms intake
3. LMPD hate crimes
4. Assaulted Officers
NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes) which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
CITATION_YEAR - the year the citation was issued
CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
CITATION_DATE - the date the citation was issued
CITATION_LOCATION - the location the citation was issued
DIVISION - the LMPD division in which the citation was issued
BEAT - the LMPD beat in which the citation was issued
PERSONS_SEX - the gender of the person who received the citation
PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
PERSONS_AGE - the age of the person who received the citation
PERSONS_HOME_CITY - the city in which the person who received the citation lives
PERSONS_HOME_STATE - the state in which the person who received the citation lives
PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
CHARGE_DESC - the description of the type of charge for the citation
UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
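The CASE_NUMBER / INCIDENT_NUMBER formatting difference described above can be bridged before joining the datasets. The Python sketch below assumes every identifier follows the 2-2-6 digit grouping seen in the single example from the description (8018013155 and 80-18-013155); that pattern is an inference, not something the description guarantees, and the function names are hypothetical.

```python
import re


def case_to_incident_number(case_number: str) -> str:
    """Insert dashes into a 10-digit CASE_NUMBER using an assumed 2-2-6 grouping."""
    digits = re.sub(r"\D", "", case_number)
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {case_number!r}")
    return f"{digits[:2]}-{digits[2:4]}-{digits[4:]}"


def incident_to_case_number(incident_number: str) -> str:
    """Strip the dashes from an INCIDENT_NUMBER to get the CASE_NUMBER form."""
    return incident_number.replace("-", "")


# The example pair given in the dataset description.
converted = case_to_incident_number("8018013155")
```

Normalizing one side to the other's format in this way gives a common key for the join; values that do not have ten digits raise an error rather than being silently reshaped.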
Data from: Invasive species - American bullfrog (Lithobates catesbeianus) in...
gbif.org
data.biodiversity.be
Updated May 15, 2025
Sander Devisscher; Tim Adriaens; Gerald Louette; Dimitri Brosens; Peter Desmet (2025). Invasive species - American bullfrog (Lithobates catesbeianus) in Flanders, Belgium [Dataset]. http://doi.org/10.15468/2hqkqn
Unique identifier
https://doi.org/10.15468/2hqkqn
Dataset updated
May 15, 2025
Dataset provided by
Global Biodiversity Information Facility (https://www.gbif.org/)
Research Institute for Nature and Forest (INBO)
Authors
Sander Devisscher; Tim Adriaens; Gerald Louette; Dimitri Brosens; Peter Desmet
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Time period covered
Apr 27, 2010 - Dec 31, 2018
Description
Invasive species - American bullfrog (Lithobates catesbeianus) in Flanders, Belgium is a species occurrence dataset published by the Research Institute for Nature and Forest (INBO). The dataset contains over 7,500 occurrences (25% of which are American bullfrogs) sampled from 2010 onward, in the months April to October. The data are compiled from different sources at the INBO, but most of the occurrences were collected through fieldwork for the EU co-funded Interreg project INVEXO (http://www.invexo.eu). In this project, research was conducted on different methods for the management of American bullfrog populations, an alien invasive species in Belgium. Captured bullfrogs were almost always removed from the environment and humanely killed, while the other occurrences are recorded bycatch, which were released upon catch (see bibliography for detailed descriptions of the methods). Therefore, caution is advised when using these data for trend analysis, distribution range calculation, or other purposes. Issues with the dataset can be reported at https://github.com/inbo/data-publication/tree/master/datasets/invasive-bullfrog-occurrences
We strongly believe an open attitude is essential for tackling the IAS problem (Groom et al. 2015). To allow anyone to use this dataset, we have released the data to the public domain under a Creative Commons Zero waiver (http://creativecommons.org/publicdomain/zero/1.0/). We would appreciate it however if you read and follow these norms for data use (http://www.inbo.be/en/norms-for-data-use) and provide a link to the original dataset (https://doi.org/10.15468/2hqkqn) whenever possible. If you use these data for a scientific paper, please cite the dataset following the applicable citation norms and/or consider us for co-authorship. We are always interested to know how you have used or visualized the data, or to provide more information, so please contact us via the contact information provided in the metadata, opendata@inbo.be or https://twitter.com/LifeWatchINBO.
Louisville Metro KY - Uniform Citation Data 2020
catalog.data.gov
data.lojic.org
Updated Apr 13, 2023
Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Uniform Citation Data 2020 [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-uniform-citation-data-2020
Dataset updated
Apr 13, 2023
Dataset provided by
Louisville/Jefferson County Information Consortium
Area covered
Kentucky, Louisville
Description
A list of all uniform citations from the Louisville Metro Police Department. The CSV file is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes.

INCIDENT_NUMBER or CASE_NUMBER links these data sets together:
Crime Data
Uniform Citation Data
Firearm intake
LMPD hate crimes
Assaulted Officers

CITATION_CONTROL_NUMBER links these data sets together:
Uniform Citation Data
LMPD Stops Data

Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

AGENCY_DESC - the name of the department that issued the citation
CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms; it can be used to connect this dataset to the following other datasets via INCIDENT_NUMBER:
1. Crime Data
2. Firearms intake
3. LMPD hate crimes
4. Assaulted Officers
NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes) which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
CITATION_YEAR - the year the citation was issued
CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
CITATION_DATE - the date the citation was issued
CITATION_LOCATION - the location the citation was issued
DIVISION - the LMPD division in which the citation was issued
BEAT - the LMPD beat in which the citation was issued
PERSONS_SEX - the gender of the person who received the citation
PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
PERSONS_AGE - the age of the person who received the citation
PERSONS_HOME_CITY - the city in which the person who received the citation lives
PERSONS_HOME_STATE - the state in which the person who received the citation lives
PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
CHARGE_DESC - the description of the type of charge for the citation
UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
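The single-letter codes in PERSONS_RACE and PERSONS_ETHNICITY can be expanded with a small lookup table. The Python sketch below copies the code lists from the description; the `decode` helper, its name, and its handling of whitespace and lowercase input are assumptions made for illustration.

```python
# Code tables copied from the field descriptions above.
PERSONS_RACE = {
    "W": "White",
    "B": "Black",
    "H": "Hispanic",
    "A": "Asian/Pacific Islander",
    "I": "American Indian",
    "U": "Undeclared",
    "IB": "Indian/India/Burmese",
    "M": "Middle Eastern Descent",
    "AN": "Alaskan Native",
}

PERSONS_ETHNICITY = {
    "N": "Not Hispanic",
    "H": "Hispanic",
    "U": "Undeclared",
}


def decode(code, table):
    """Return the label for a code, or None when the code is not in the table."""
    return table.get(code.strip().upper())


race_label = decode("IB", PERSONS_RACE)
ethnicity_label = decode(" n ", PERSONS_ETHNICITY)
```

Returning None for unknown codes keeps unexpected values visible to the caller instead of mapping them to a guessed label.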
Data from: InboVeg - NICHE-Vlaanderen groundwater related vegetation relevés...
gbif.org
data.europa.eu
Updated May 4, 2021
Els De Bie; Dimitri Brosens (2021). InboVeg - NICHE-Vlaanderen groundwater related vegetation relevés for Flanders, Belgium [Dataset]. http://doi.org/10.15468/gouexm
Unique identifier
https://doi.org/10.15468/gouexm
Dataset updated
May 4, 2021
Dataset provided by
Global Biodiversity Information Facility (https://www.gbif.org/)
Research Institute for Nature and Forest (INBO)
Authors
Els De Bie; Dimitri Brosens
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Time period covered
May 21, 2002 - Jul 7, 2005
Description
The NICHE-Vlaanderen project aimed to develop a hydro-ecological prediction model for use in ecological impact assessment studies. The data in this dataset are part of the vegetation-plot data used to feed the model and contain groundwater-dependent terrestrial vegetation relevés in relation to groundwater levels. Vegetation plot relevés were performed near selected piezometers (WATINA database, groundwater network Flanders) between May and August in 2002, 2004 and 2005. Initially the vegetation surveys were recorded in Turboveg (Hennekens, 1998) and later moved to INBOVEG, the INBO vegetation plot database. The dataset contains 569 vegetation relevés, recorded during the fieldwork of the NICHE-Vlaanderen project. Relevés contain species coverage data, coverage data for layers, vegetation height and the date of recording. All the vegetation relevés were classified as vegetation types. Issues related to the dataset can be submitted here: https://github.com/inbo/data-publication/tree/master/datasets/inboveg-niche-vlaanderen-events

To allow anyone to use this dataset, we have released the data to the public domain under a Creative Commons Zero waiver (http://creativecommons.org/publicdomain/zero/1.0/). We would appreciate it, however, if you read and follow these norms for data use (http://www.inbo.be/en/norms-for-data-use) and provide a link to the original dataset (https://doi.org/10.15468/gouexm) whenever possible. If you use these data for a scientific paper, please cite the dataset following the applicable citation norms and/or consider us for co-authorship. We are always interested to know how you have used or visualized the data, or to provide more information, so please contact us via the contact information provided in the metadata, opendata@inbo.be or https://twitter.com/LifeWatchINBO.
Bibliometric dataset: list of highly cited papers in bibliometric
zenodo.org
data.niaid.nih.gov
bin, png, txt
Updated Jul 25, 2024
Dasapta Erwin Irawan; Dini Sofiani Permatasari; Lusia Marliana Nurani (2024). Bibliometric dataset: list of highly cited papers in bibliometric [Dataset]. http://doi.org/10.5281/zenodo.2544533
Available download formats
png, bin, txt
Unique identifier
https://doi.org/10.5281/zenodo.2544533
Dataset updated
Jul 25, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Dasapta Erwin Irawan; Dini Sofiani Permatasari; Lusia Marliana Nurani
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Motivation

My motivation in providing this dataset is to invite more interest from Indonesia's librarians in understanding their diverse field of study.

Method

This dataset was harvested on 19 January 2019 from the Scopus database provided by The University of Sydney. I used the keyword "bibliometric" in the title, sorted the search results by total citations, then downloaded the first 2000 papers as an RIS file. This file can be converted to other formats such as BibTeX or CSV using an available reference manager, such as Zotero.
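For readers who prefer a script to a reference manager, the RIS-to-CSV conversion mentioned above can be approximated in a few lines of Python. This is a minimal sketch, not the method used to build the dataset: it assumes the common RIS line layout (a two-character tag, two spaces, a hyphen, then the value, with `ER` closing each record), ignores continuation lines, and the default tag selection (TI, AU, PY) is an illustrative choice.

```python
import csv
import io
import re

# "TY  - JOUR": two-character tag, two spaces, hyphen, optional space, value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Parse RIS text into a list of records, each mapping a tag to its values."""
    records, current = [], {}
    for line in text.splitlines():
        match = TAG_LINE.match(line)
        if not match:
            continue  # blank or continuation lines are skipped in this sketch
        tag, value = match.groups()
        if tag == "ER":  # end of record
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value.strip())
    return records


def ris_to_csv(text, tags=("TI", "AU", "PY")):
    """Render the selected tags as CSV; repeated tags are joined with '; '."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(list(tags))
    for record in parse_ris(text):
        writer.writerow(["; ".join(record.get(tag, [])) for tag in tags])
    return out.getvalue()
```

Calling `ris_to_csv(open("export.ris", encoding="utf-8").read())` on a downloaded file (the file name here is hypothetical) would return the CSV text, which can then be written to disk.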

Visualisations

I did two small visualisations using the following options:

"create a map based on bibliographic data"

"create a map based on text data"

Both mappings were done using VOSviewer, an open source app from CWTS, Leiden University.
Data from: Loopkevers Grensmaas - Ground beetles near the river Meuse in...
gbif.org
metadata.vlaanderen.be
Updated Apr 1, 2021
Stijn Vanacker; Dimitri Brosens; Peter Desmet (2021). Loopkevers Grensmaas - Ground beetles near the river Meuse in Flanders, Belgium [Dataset]. http://doi.org/10.15468/hy3pzl
Unique identifier
https://doi.org/10.15468/hy3pzl
Dataset updated
Apr 1, 2021
Dataset provided by
Global Biodiversity Information Facility (https://www.gbif.org/)
Research Institute for Nature and Forest (INBO)
Authors
Stijn Vanacker; Dimitri Brosens; Peter Desmet
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Time period covered
Aug 25, 1998 - Oct 4, 1999
Description
Loopkevers Grensmaas - Ground beetles near the river Meuse in Flanders, Belgium is a species occurrence dataset published by the Research Institute for Nature and Forest (INBO). The dataset contains over 5,800 beetle occurrences sampled between 1998 and 1999 from 28 locations on the left bank (Belgium) of the river Meuse on the border between Belgium and the Netherlands. The dataset includes over 100 ground beetles species (Carabidae) and some non-target species. The data were used to assess the dynamics of the Grensmaas area and to help river management. Issues with the dataset can be reported at https://github.com/LifeWatchINBO/data-publication/tree/master/datasets/kevers-grensmaas-occurrences

To allow anyone to use this dataset, we have released the data to the public domain under a Creative Commons Zero waiver (http://creativecommons.org/publicdomain/zero/1.0/). We would appreciate it, however, if you read and follow these norms for data use (http://www.inbo.be/en/norms-for-data-use) and provide a link to the original dataset (https://doi.org/10.15468/hy3pzl) whenever possible. If you use these data for a scientific paper, please cite the dataset following the applicable citation norms and/or consider us for co-authorship. We are always interested to know how you have used or visualized the data, or to provide more information, so please contact us via the contact information provided in the metadata, opendata@inbo.be or https://twitter.com/LifeWatchINBO.
NIST Statistical Reference Datasets - SRD 140
datasets.ai
data.nist.gov
Updated Aug 27, 2024
National Institute of Standards and Technology (2024). NIST Statistical Reference Datasets - SRD 140 [Dataset]. https://datasets.ai/datasets/nist-statistical-reference-datasets-srd-140-df30c
Available download formats
21
Dataset updated
Aug 27, 2024
Dataset authored and provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software. Currently datasets and certified values are provided for assessing the accuracy of software for univariate statistics, linear regression, nonlinear regression, and analysis of variance. The collection includes both generated and 'real-world' data of varying levels of difficulty. Generated datasets are designed to challenge specific computations. These include the classic Wampler datasets for testing linear regression algorithms and the Simon & Lesage datasets for testing analysis of variance algorithms. Real-world data include challenging datasets such as the Longley data for linear regression, and more benign datasets such as the Daniel & Wood data for nonlinear regression. Certified values are 'best-available' solutions. The certification procedure is described in the web pages for each statistical method. Datasets are ordered by level of difficulty (lower, average, and higher). Strictly speaking the level of difficulty of a dataset depends on the algorithm. These levels are merely provided as rough guidance for the user. Producing correct results on all datasets of higher difficulty does not imply that your software will pass all datasets of average or even lower difficulty. Similarly, producing correct results for all datasets in this collection does not imply that your software will do the same for your particular dataset. It will, however, provide some degree of assurance, in the sense that your package provides correct results for datasets known to yield incorrect results for some software. The Statistical Reference Datasets is also supported by the Standard Reference Data Program.
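One way to compare a package's output against a certified value is the log relative error (LRE), which roughly counts the number of correct significant digits. The description above does not prescribe this measure; it is a commonly used convention, shown here as a minimal Python sketch. The function name is hypothetical and any numbers passed to it in examples are made up, not taken from the reference datasets.

```python
import math


def log_relative_error(computed: float, certified: float) -> float:
    """Approximate number of correct significant digits in `computed`.

    LRE = -log10(|computed - certified| / |certified|).
    When the certified value is zero, the absolute error is used instead.
    """
    if computed == certified:
        return math.inf  # exact agreement at floating-point precision
    if certified == 0:
        return -math.log10(abs(computed))
    return -math.log10(abs(computed - certified) / abs(certified))


# Made-up values for illustration only: an error of 0.1 on a value of 100
# corresponds to about three correct significant digits.
digits = log_relative_error(100.1, 100.0)
```

A higher LRE means closer agreement with the certified value; comparing LREs across the lower, average, and higher difficulty datasets gives a rough picture of where a package starts to lose accuracy.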
Louisville Metro KY - Uniform Citation Data 2022
gimi9.com
s.cnmilf.com
Updated Feb 1, 2022
(2022). Louisville Metro KY - Uniform Citation Data 2022 [Dataset]. https://gimi9.com/dataset/data-gov_louisville-metro-ky-uniform-citation-data-2022-1e968/
Dataset updated
Feb 1, 2022
Area covered
Kentucky, Louisville
Description
A list of all uniform citations from the Louisville Metro Police Department. The CSV file is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes.

INCIDENT_NUMBER or CASE_NUMBER links these data sets together:
Crime Data
Uniform Citation Data
Firearm intake
LMPD hate crimes
Assaulted Officers

CITATION_CONTROL_NUMBER links these data sets together:
Uniform Citation Data
LMPD Stops Data

Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

AGENCY_DESC - the name of the department that issued the citation
CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms; it can be used to connect this dataset to the following other datasets via INCIDENT_NUMBER:
1. Crime Data
2. Firearms intake
3. LMPD hate crimes
4. Assaulted Officers
NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes) which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
CITATION_YEAR - the year the citation was issued
CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
CITATION_DATE - the date the citation was issued
CITATION_LOCATION - the location the citation was issued
DIVISION - the LMPD division in which the citation was issued
BEAT - the LMPD beat in which the citation was issued
PERSONS_SEX - the gender of the person who received the citation
PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
PERSONS_AGE - the age of the person who received the citation
PERSONS_HOME_CITY - the city in which the person who received the citation lives
PERSONS_HOME_STATE - the state in which the person who received the citation lives
PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
CHARGE_DESC - the description of the type of charge for the citation
UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/

Anita Khadka (2020). Citation Knowledge with Section and Context [Dataset]. http://doi.org/10.21954/ou.rd.11346848.v1

Citation Knowledge with Section and Context

Available download formats

zip

Unique identifier

https://doi.org/10.21954/ou.rd.11346848.v1

Dataset updated

May 5, 2020

Dataset provided by

The Open University

Authors

Anita Khadka

License

Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically

Description

This dataset contains information from scientific publications written by authors who have published papers in the RecSys conference. It contains four files which have information extracted from scientific publications. The details of each file are explained below:
i) all_authors.tsv: This file contains the details of authors who published research papers in the RecSys conference. The details include authors' identifier in various forms, such as number, orcid id, dblp url, dblp key and google scholar url, authors' first name, last name and their affiliation (where they work).
ii) all_publications.tsv: This file contains the details of publications authored by the authors mentioned in the all_authors.tsv file (please note the list of publications does not contain all the authored publications of the authors; refer to the publication for further details). The details include publications' identifier in different forms (such as number, dblp key, dblp url, google scholar url), title, filtered title, published date, published conference and paper abstract.
iii) selected_author_publications-information.tsv: This file consists of identifiers of authors and their publications. Here, we provide the information of selected authors and their publications used for our experiment.
iv) selected_publication_citations-information.tsv: This file contains the information of the selected publications, which consists of both citing and cited papers' information used in our experiment. It consists of identifier of citing paper, identifier of cited paper, citation title, citation filtered title, the sentence before the citation is mentioned, citing sentence, the sentence after the citation is mentioned, and citation position (section). Please note, it does not contain information of all the citations cited in the publications. For more detail, please refer to the paper.
This dataset is for research purposes only. If you use this dataset, please cite our paper "Capturing and exploiting citation knowledge for recommending recently published papers" due to be published in Web2Touch track 2020 (not yet published).
