100+ datasets found
  1. Citation Knowledge with Section and Context

    • ordo.open.ac.uk
    zip
    Updated May 5, 2020
    Cite
    Anita Khadka (2020). Citation Knowledge with Section and Context [Dataset]. http://doi.org/10.21954/ou.rd.11346848.v1
    Available download formats: zip
    Dataset updated
    May 5, 2020
    Dataset provided by
    The Open University
    Authors
    Anita Khadka
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset contains information from scientific publications written by authors who have published papers in the RecSys conference. It comprises four files of information extracted from those publications:

    i) all_authors.tsv: details of authors who published research papers in the RecSys conference. The details include each author's identifier in various forms (number, ORCID iD, DBLP URL, DBLP key and Google Scholar URL), first name, last name and affiliation (where they work).

    ii) all_publications.tsv: details of publications authored by the authors listed in all_authors.tsv. (Please note that the list does not contain all of the authors' publications; refer to the publication for further details.) The details include each publication's identifier in different forms (number, DBLP key, DBLP URL, Google Scholar URL), title, filtered title, published date, published conference and paper abstract.

    iii) selected_author_publications-information.tsv: identifiers of authors and their publications. It covers the selected authors and publications used for our experiment.

    iv) selected_publication_citations-information.tsv: information on the selected publications, covering both the citing and cited papers used in our experiment. It consists of the identifier of the citing paper, the identifier of the cited paper, citation title, citation filtered title, the sentence before the citation is mentioned, the citing sentence, the sentence after the citation is mentioned, and the citation position (section). Please note that it does not contain information on all the citations cited in the publications.

    For more detail, please refer to the paper. This dataset is for research purposes only. If you use this dataset, please cite our paper "Capturing and exploiting citation knowledge for recommending recently published papers", due to be published in the Web2Touch track 2020 (not yet published).
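
    As a rough illustration of how the citation file described above might be used, the sketch below joins the sentence before, the citing sentence, and the sentence after into one context string and groups the contexts by section. The column names and sample rows are hypothetical; the real TSV header may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical header names and rows; the real TSV may label its columns differently.
SAMPLE_TSV = (
    "citing_id\tcited_id\tsentence_before\tciting_sentence\tsentence_after\tsection\n"
    "1\t7\tPrior work is broad.\tWe follow [7].\tResults improve.\tintroduction\n"
    "1\t9\t\tSee [9] for details.\t\tmethod\n"
    "2\t7\tA baseline exists.\tIt was proposed in [7].\t\tintroduction\n"
)


def contexts_by_section(tsv_file):
    """Group three-sentence citation contexts by the section they appear in."""
    grouped = defaultdict(list)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        parts = [row["sentence_before"], row["citing_sentence"], row["sentence_after"]]
        # Skip empty neighbours (e.g. a citation in the first sentence of a section).
        context = " ".join(p for p in parts if p)
        grouped[row["section"]].append((row["citing_id"], row["cited_id"], context))
    return dict(grouped)


grouped = contexts_by_section(io.StringIO(SAMPLE_TSV))
```

    To run it on the real file, pass an open file handle instead of the in-memory sample and adjust the column names to the actual header.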

  2. POCI CSV dataset of all the citation data

    • figshare.com
    zip
    Updated Dec 27, 2022
    + more versions
    Cite
    OpenCitations (2022). POCI CSV dataset of all the citation data [Dataset]. http://doi.org/10.6084/m9.figshare.21776351.v1
    Available download formats: zip
    Dataset updated
    Dec 27, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    OpenCitations
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains all the citation data (in CSV format) included in POCI, released on 27 December 2022. In particular, each line of the CSV file defines a citation, and includes the following information:

    [field "oci"] the Open Citation Identifier (OCI) for the citation;
    [field "citing"] the PMID of the citing entity;
    [field "cited"] the PMID of the cited entity;
    [field "creation"] the creation date of the citation (i.e. the publication date of the citing entity);
    [field "timespan"] the time span of the citation (i.e. the interval between the publication date of the cited entity and the publication date of the citing entity);
    [field "journal_sc"] whether the citation is a journal self-citation (i.e. the citing and the cited entities are published in the same journal);
    [field "author_sc"] whether the citation is an author self-citation (i.e. the citing and the cited entities have at least one author in common).

    This version of the dataset contains:

    717,654,703 citations;
    26,024,862 bibliographic resources.

    The size of the zipped archive is 9.6 GB, while the size of the unzipped CSV file is 50 GB. Additional information about POCI is available on its official webpage.
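
    Because the unzipped CSV is around 50 GB, it is probably easier to stream it row by row than to load it whole. The sketch below does that with the field names listed above. Two things are assumptions not stated in the description: that "journal_sc" and "author_sc" hold the strings "yes"/"no", and that "timespan" is an ISO 8601-style duration such as "P2Y3M". The sample rows are made up.

```python
import csv
import io
import re

# Two made-up rows using the field names listed above; identifiers are illustrative only.
SAMPLE_CSV = (
    "oci,citing,cited,creation,timespan,journal_sc,author_sc\n"
    "oci-a,111,222,2020-05-01,P2Y3M,no,yes\n"
    "oci-b,333,444,2019,P10Y,yes,no\n"
)

# Assumed duration shape: optional minus sign, then years, months, days.
_DURATION = re.compile(r"^(-)?P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$")


def timespan_in_months(value):
    """Convert a duration such as 'P2Y3M' to whole months (days are ignored)."""
    match = _DURATION.match(value)
    if match is None:
        return None
    sign, years, months, _days = match.groups()
    total = int(years or 0) * 12 + int(months or 0)
    return -total if sign else total


def summarize(csv_file):
    """Stream the rows once, keeping only running totals in memory."""
    stats = {"citations": 0, "journal_sc": 0, "author_sc": 0, "months_sum": 0, "months_n": 0}
    for row in csv.DictReader(csv_file):
        stats["citations"] += 1
        stats["journal_sc"] += row["journal_sc"] == "yes"  # assumed "yes"/"no" encoding
        stats["author_sc"] += row["author_sc"] == "yes"
        months = timespan_in_months(row["timespan"])
        if months is not None:
            stats["months_sum"] += months
            stats["months_n"] += 1
    return stats


stats = summarize(io.StringIO(SAMPLE_CSV))
```

    Keeping only counters, rather than a list of rows, means memory use stays flat regardless of how many citations the file holds.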

  3. CITE Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Feb 7, 2021
    Cite
    Malihe Alikhani; Sreyasi Nag Chowdhury; Gerard de Melo; Matthew Stone (2021). CITE Dataset [Dataset]. https://paperswithcode.com/dataset/cite
    Dataset updated
    Feb 7, 2021
    Authors
    Malihe Alikhani; Sreyasi Nag Chowdhury; Gerard de Melo; Matthew Stone
    Description

    CITE is a crowd-sourced resource for multimodal discourse: this resource characterises inferences in image-text contexts in the domain of cooking recipes in the form of coherence relations.

  4. Citations to software and data in Zenodo via open sources

    • zenodo.org
    • explore.openaire.eu
    • +1more
    csv
    Updated Jan 24, 2020
    Cite
    Stephanie van de Sandt; Alex Ioannidis; Lars Holm Nielsen (2020). Citations to software and data in Zenodo via open sources [Dataset]. http://doi.org/10.5281/zenodo.3482927
    Available download formats: csv
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Stephanie van de Sandt; Alex Ioannidis; Lars Holm Nielsen
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    In January 2019, the Asclepias Broker harvested citation links to Zenodo objects from three discovery systems: the NASA Astrophysics Data System (ADS), Crossref Event Data and Europe PMC. Each row of our dataset represents one unique link between a citing publication and a Zenodo DOI. Both endpoints are described by basic metadata. The second dataset contains usage metrics for every cited Zenodo DOI of our data sample.

  5. DBLP Dataset

    • paperswithcode.com
    Updated Apr 13, 2021
    Cite
    Jie Tang; Jing Zhang; Limin Yao; Juanzi Li; Li Zhang; Zhong Su (2021). DBLP Dataset [Dataset]. https://paperswithcode.com/dataset/dblp
    Dataset updated
    Apr 13, 2021
    Authors
    Jie Tang; Jing Zhang; Limin Yao; Juanzi Li; Li Zhang; Zhong Su
    Description

    The DBLP is a citation network dataset. The citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other sources. The first version contains 629,814 papers and 632,752 citations. Each paper is associated with abstract, authors, year, venue, and title. The data set can be used for clustering with network and side information, studying influence in the citation network, finding the most influential papers, topic modeling analysis, etc.
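
    One of the uses mentioned above, finding the most influential papers, can be approximated very simply by counting incoming citations. The sketch below uses made-up (citing, cited) pairs; how the real dataset stores its edges is not described here, so loading is left out.

```python
from collections import Counter

# Hypothetical (citing, cited) pairs; the real dataset ships its own file format.
EDGES = [("p2", "p1"), ("p3", "p1"), ("p3", "p2"), ("p4", "p1")]


def most_cited(edges, top_n=2):
    """Rank papers by in-degree, i.e. the number of papers citing them."""
    in_degree = Counter(cited for _citing, cited in edges)
    return in_degree.most_common(top_n)


ranking = most_cited(EDGES)
```

    In-degree is only the crudest influence measure; it is used here because it needs nothing beyond the citation pairs themselves.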

  6. PMOA-CITE Dataset

    • paperswithcode.com
    • figshare.com
    Updated May 19, 2024
    Cite
    Tong Zeng; Daniel E. Acuna (2024). PMOA-CITE Dataset [Dataset]. https://paperswithcode.com/dataset/pmoa-cite
    Dataset updated
    May 19, 2024
    Authors
    Tong Zeng; Daniel E. Acuna
    Description

    This is the dataset used in the experiments of the paper "Modeling citation worthiness by using attention‑based bidirectional long short‑term memory networks and interpretable models".

    There are one million sentences in total, further split into training, validation and testing sets of 60%, 20% and 20%, respectively.

    For the pre-processing of the dataset, please refer to the paper.

    The data are stored in jsonl format (each row is a JSON object). A couple of rows are listed as examples:

    {"sec_name":"introduction","cur_sent_id":"12213838@0#3$0","next_sent_id":"12213838@0#3$1","cur_sent":"All three spectrin subunits are essential for normal development.","next_sent":"βH, encoded by the karst locus, is an essential protein that is required for epithelial morphogenesis .","cur_scaled_len_features":{"type":1,"values":[0.17716535433070865,0.13513513513513514]},"next_scaled_len_features":{"type":1,"values":[0.32677165354330706,0.35135135135135137]},"cur_has_citation":0,"next_has_citation":1}

    {"sec_name":"results","prev_sent_id":"12230634@1@1#0$2","cur_sent_id":"12230634@1@1#0$3","next_sent_id":"12230634@1@1#0$4","prev_sent":"μIU/ml at the 2.0-h postprandial time point.","cur_sent":"Statistically significant differences between the mean plasma insulin levels of dogs treated with 50 mg/kg of GSNO, and those treated with 50 mg/kg GSNO and vitamin C (50 mg/kg) were observed at the 1.0-h and 1.5-h time points (P < 0.05).","next_sent":"The mean plasma insulin concentrations in the dogs treated with 50 mg/kg of vitamin C and 50 mg/kg of GSNO, or 50 mg/kg of GSNO was significantly altered compared to those of controls or captopril-treated dogs (P < 0.05).","prev_scaled_len_features":{"type":1,"values":[0.09448818897637795,0.08108108108108109]},"cur_scaled_len_features":{"type":1,"values":[0.8582677165354331,1.0]},"next_scaled_len_features":{"type":1,"values":[0.7913385826771654,0.9459459459459459]},"prev_has_citation":0,"cur_has_citation":0,"next_has_citation":0}

    {"sec_name":"results","prev_sent_id":"12213837@1@0#3$3","cur_sent_id":"12213837@1@0#3$4","next_sent_id":"12213837@1@0#3$5","prev_sent":"Cleavage of VAMP2 by BoNT/D releases the NH2-terminal 59 amino acids from the protein and eliminates exocytosis.","cur_sent":"However, in this case, exocytosis cannot be recovered by addition of the cleaved fragment .","next_sent":"Peptides that exactly correspond to the BoNT/D cleavage site (VAMP2 aa 25–59 and 60–94-cys) were equally efficient at mediating liposome fusion (unpublished data).","prev_scaled_len_features":{"type":1,"values":[0.36220472440944884,0.35135135135135137]},"cur_scaled_len_features":{"type":1,"values":[0.2795275590551181,0.2972972972972973]},"next_scaled_len_features":{"type":1,"values":[0.562992125984252,0.5135135135135135]},"prev_has_citation":0,"cur_has_citation":1,"next_has_citation":0}

    For code that uses this dataset to model citation worthiness, please refer to https://github.com/sciosci/cite-worthiness
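
    A minimal sketch of reading such a jsonl file and cutting it 60/20/20 follows. It relies only on the "sec_name", "cur_sent" and "cur_has_citation" keys, since the first example row above shows that the "prev_*" keys can be absent. The shuffle-then-slice split is an assumption for illustration; the description does not say how the original split was drawn.

```python
import io
import json
import random

# Shortened stand-ins for rows in the jsonl format shown above.
SAMPLE_JSONL = "\n".join(
    json.dumps({"sec_name": "results", "cur_sent": f"Sentence {i}.", "cur_has_citation": i % 2})
    for i in range(10)
)


def load_examples(jsonl_file):
    """Yield (section, sentence, label) triples, one per non-empty line."""
    for line in jsonl_file:
        line = line.strip()
        if line:
            row = json.loads(line)
            yield row["sec_name"], row["cur_sent"], row["cur_has_citation"]


def split_60_20_20(examples, seed=0):
    """Shuffle deterministically, then cut into train/validation/test parts."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 6 // 10  # integer arithmetic avoids float rounding
    n_val = len(items) * 2 // 10
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


train, val, test = split_60_20_20(load_examples(io.StringIO(SAMPLE_JSONL)))
```

    For the full one million sentences, the same generator can be pointed at an open file handle; only the shuffle step needs all rows in memory.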

  7. Citing online references

    • borealisdata.ca
    • dataone.org
    Updated May 7, 2019
    Cite
    David Topps; Corey Wirun; Nishan Sharma (2019). Citing online references [Dataset]. http://doi.org/10.5683/SP2/80VX7U
    Dataset updated
    May 7, 2019
    Dataset provided by
    Borealis
    Authors
    David Topps; Corey Wirun; Nishan Sharma
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Citation of reference material is well established for most traditional sources but remains inconsistent in its application for online resources such as web pages, blog posts and materials generated from underlying database queries. We present some tips on how authors can more effectively cite and archive such resources so they are persistent and sustainable.

  8. MultiCite Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Jun 30, 2021
    Cite
    Anne Lauscher; Brandon Ko; Bailey Kuehl; Sophie Johnson; David Jurgens; Arman Cohan; Kyle Lo (2021). MultiCite Dataset [Dataset]. https://paperswithcode.com/dataset/multicite
    Dataset updated
    Jun 30, 2021
    Authors
    Anne Lauscher; Brandon Ko; Bailey Kuehl; Sophie Johnson; David Jurgens; Arman Cohan; Kyle Lo
    Description

    MultiCite is a dataset of 12,653 citation contexts from over 1,200 computational linguistics papers, intended for citation context analysis (CCA). MultiCite contains multi-sentence, multi-label citation contexts within full paper texts.

  9. Data from: CRAWDAD wireless network data citation bibliography

    • figshare.com
    txt
    Updated Jan 19, 2016
    Cite
    Tristan Henderson; David Kotz (2016). CRAWDAD wireless network data citation bibliography [Dataset]. http://doi.org/10.6084/m9.figshare.1203646.v1
    Available download formats: txt
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Tristan Henderson; David Kotz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This BibTeX file contains the corpus of papers that cite CRAWDAD wireless network datasets, as used in the paper: Tristan Henderson and David Kotz. Data citation practices in the CRAWDAD wireless network data archive. Proceedings of the Second Workshop on Linking and Contextualizing Publications and Datasets, London, UK, September 2014.

    Most of the fields are standard BibTeX fields. Two require further explanation:

    "citations" - this field contains the citations for a paper as counted by Google Scholar as of 24 September 2014.

    "keywords" - this field contains a set of tags indicating data citation practice. These are as follows:

    - "uses_crawdad_data" - this paper uses a CRAWDAD dataset
    - "cites_insufficiently" - this paper does not meet our sufficiency criteria
    - "cites_by_description" - this paper cites a dataset by description rather than dataset identifier
    - "cites_canonical_paper" - this paper cites the original ("canonical") paper that collected a dataset, rather than pointing to the dataset
    - "cites_by_name" - this paper cites a dataset by a colloquial name rather than dataset identifier
    - "cites_crawdad_url" - this paper cites the main CRAWDAD URL rather than a particular dataset
    - "cites_without_url" - this paper does not provide a URL for dataset access
    - "cites_wrong_attribution" - this paper attributes a dataset to CRAWDAD, Dartmouth etc. rather than the dataset authors
    - "cites_vaguely" - this paper cites the used datasets (if any) too vaguely to be sufficient

    If you have any questions about the data, please contact us at crawdad@crawdad.org
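
    To tally how often each citation-practice tag occurs, one option is to pull the "keywords" field out of every entry with a regular expression. The entries below are invented, and the comma separator inside "keywords" as well as the brace-delimited field syntax are assumptions about how the file is written.

```python
import re
from collections import Counter

# Invented entries; keys, titles, and the comma separator inside "keywords" are assumptions.
SAMPLE_BIB = """
@inproceedings{example2013a,
  title = {An example paper},
  citations = {12},
  keywords = {uses_crawdad_data, cites_by_name, cites_without_url}
}
@article{example2014b,
  title = {Another example},
  citations = {3},
  keywords = {uses_crawdad_data, cites_canonical_paper}
}
"""

# Matches keywords = {...} as long as the value contains no nested braces.
KEYWORDS_FIELD = re.compile(r"keywords\s*=\s*\{([^}]*)\}", re.IGNORECASE)


def tag_counts(bibtex_text):
    """Count how often each data-citation tag appears across all entries."""
    counts = Counter()
    for field in KEYWORDS_FIELD.findall(bibtex_text):
        counts.update(tag.strip() for tag in field.split(",") if tag.strip())
    return counts


counts = tag_counts(SAMPLE_BIB)
```

    A regular expression is enough for a flat field like this one; a full BibTeX parser would be the safer choice if the file uses quoted values or nested braces.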

  10. Methodology data of "A qualitative and quantitative citation analysis toward...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 2, 2024
    Cite
    Peroni, Silvio (2024). Methodology data of "A qualitative and quantitative citation analysis toward retracted articles: a case of study" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4024337
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Heibi, Ivan
    Peroni, Silvio
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This document contains the datasets and visualizations generated after the application of the methodology defined in our work: "A qualitative and quantitative citation analysis toward retracted articles: a case of study". The methodology defines a citation analysis of the Wakefield et al. [1] retracted article from a quantitative and qualitative point of view. The data contained in this repository are based on the first two steps of the methodology. The first step (i.e. "Data gathering") builds an annotated dataset of the citing entities; this step is also discussed at length in [2]. The second step (i.e. "Topic Modelling") runs a topic modeling analysis on the textual features contained in the dataset generated by the first step.

    Note: the data are all contained inside the "method_data.zip" file. You need to unzip the file to get access to all the files and directories listed below.

    Data gathering

    The data generated by this step are stored in "data/":

    "cits_features.csv": a dataset containing all the entities (rows in the CSV) which have cited the Wakefield et al. retracted article, and a set of features characterizing each citing entity (columns in the CSV). The features included are: DOI ("doi"), year of publication ("year"), the title ("title"), the venue identifier ("source_id"), the title of the venue ("source_title"), yes/no value in case the entity is retracted as well ("retracted"), the subject area ("area"), the subject category ("category"), the sections of the in-text citations ("intext_citation.section"), the value of the reference pointer ("intext_citation.pointer"), the in-text citation function ("intext_citation.intent"), the in-text citation perceived sentiment ("intext_citation.sentiment"), and a yes/no value to denote whether the in-text citation context mentions the retraction of the cited entity ("intext_citation.section.ret_mention"). Note: this dataset is licensed under a Creative Commons public domain dedication (CC0).

    "cits_text.csv": this dataset stores the abstract ("abstract") and the in-text citations context ("intext_citation.context") for each citing entity identified using the DOI value ("doi"). Note: the data keep their original license (the one provided by their publisher). This dataset is provided in order to favor the reproducibility of the results obtained in our work.

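    As a small example of working with "cits_features.csv", the sketch below counts citing entities per publication year and perceived sentiment. Only the column names are taken from the description above; the rows, and the particular values used for "retracted" and "intext_citation.sentiment", are invented for illustration.

```python
import csv
import io
from collections import Counter

# Invented rows; only the column names come from the description above.
SAMPLE_CSV = (
    "doi,year,retracted,intext_citation.section,intext_citation.sentiment\n"
    "10.1000/a,2005,no,introduction,negative\n"
    "10.1000/b,2011,no,discussion,neutral\n"
    "10.1000/c,2011,yes,introduction,negative\n"
)


def sentiment_by_year(csv_file):
    """Count rows for each (year, sentiment) combination."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[(row["year"], row["intext_citation.sentiment"])] += 1
    return counts


by_year = sentiment_by_year(io.StringIO(SAMPLE_CSV))
```

    The same loop can be keyed on "intext_citation.intent" or "intext_citation.section" to look at citation function or placement instead.
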
    Topic modeling

    We run a topic modeling analysis on the textual features gathered (i.e. abstracts and citation contexts). The results are stored inside the "topic_modeling/" directory. The topic modeling has been done using MITAO, a tool for mashing up automatic text analysis tools and creating a completely customizable visual workflow [3]. The topic modeling results for each textual feature are separated into two different folders, "abstracts/" for the abstracts and "intext_cit/" for the in-text citation contexts. Both directories contain the following directories/files:

    "mitao_workflows/": the workflows of MITAO. These are JSON files that can be reloaded in MITAO to reproduce the results following the same workflows.

    "corpus_and_dictionary/": the dictionary and the vectorized corpus given as inputs for the LDA topic modeling.

    "coherence/coherence.csv": the coherence scores of several topic models, trained with the number of topics ranging from 1 to 40.

    "datasets_and_views/": the datasets and visualizations generated using MITAO.

    References

    [1] Wakefield, A., Murch, S., Anthony, A., Linnell, J., Casson, D., Malik, M., Berelowitz, M., Dhillon, A., Thomson, M., Harvey, P., Valentine, A., Davies, S., & Walker-Smith, J. (1998). RETRACTED: Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children. The Lancet, 351(9103), 637–641. https://doi.org/10.1016/S0140-6736(97)11096-0

    [2] Heibi, I., & Peroni, S. (2020). A methodology for gathering and annotating the raw-data/characteristics of the documents citing a retracted article v1 (protocols.io.bdc4i2yw) [Data set]. In protocols.io. ZappyLab, Inc. https://doi.org/10.17504/protocols.io.bdc4i2yw

    [3] Ferri, P., Heibi, I., Pareschi, L., & Peroni, S. (2020). MITAO: A User Friendly and Modular Software for Topic Modelling [JD]. PuntOorg International Journal, 5(2), 135–149. https://doi.org/10.19245/25.05.pij.5.2.3
    
  11. Uncovering the Citation Landscape: Exploring OpenCitations COCI,...

    • data.niaid.nih.gov
    Updated Sep 7, 2023
    + more versions
    Cite
    Lorenzo Paolini (2023). Uncovering the Citation Landscape: Exploring OpenCitations COCI, OpenCitations Meta, and ERIH-PLUS in Social Sciences and Humanities Journals - DATA PRODUCED [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7974815
    Dataset updated
    Sep 7, 2023
    Dataset provided by
    Sara Vellone
    Marta Soricetti
    Olga Pagnotta
    Lorenzo Paolini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These zipped folders contain all the data produced for the research "Uncovering the Citation Landscape: Exploring OpenCitations COCI, OpenCitations Meta, and ERIH-PLUS in Social Sciences and Humanities Journals": the results datasets (dataset_map_disciplines, dataset_no_SSH, dataset_SSH, erih_meta_with_disciplines and erih_meta_without_disciplines).

    dataset_map_disciplines.zip contains CSV files with four columns ("id", "citing", "cited", "disciplines") giving information about publications stored in OpenCitations META (version 3, released in February 2023) that are part of SSH journals according to ERIH PLUS (version downloaded on 2023-04-27). The files specify the disciplines associated with each publication and boolean values stating whether it cites or is cited, according to the OpenCitations COCI dataset (version 19, released in January 2023).

    dataset_no_SSH.zip and dataset_SSH.zip contain CSV files with the same structure. Each dataset has four columns: "citing", "is_citing_SSH", "cited", and "is_cited_SSH". The "citing" and "cited" columns are filled with DOIs of publications stored in OpenCitations META that, according to OpenCitations COCI, are involved in a citation. The "is_citing_SSH" and "is_cited_SSH" columns contain boolean values: "True" if the corresponding publication is associated with an SSH (Social Sciences and Humanities) discipline according to ERIH PLUS, and "False" otherwise. The two datasets are built from the two different subsets obtained by combining OpenCitations META and ERIH PLUS: dataset_SSH comes from erih_meta_with_disciplines and dataset_no_SSH from erih_meta_without_disciplines.

    erih_meta_with_disciplines.zip and erih_meta_without_disciplines.zip, as explained before, contain CSV files originating from ERIH PLUS and META. erih_meta_without_disciplines has just one column, "id", and contains the DOIs of all the publications in META that do not have any discipline associated, that is, that have not been published in an SSH journal. erih_meta_with_disciplines derives from all the publications in META that have at least one linked discipline and has two columns: "id" and "erih_disciplines", the latter containing a string with all the disciplines linked to that publication, such as "History, Interdisciplinary research in the Humanities, Interdisciplinary research in the Social Sciences, Sociology".
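
    The four-column structure of dataset_SSH and dataset_no_SSH lends itself to a quick tally of citation direction between SSH and non-SSH publications. The sketch below assumes the files carry a header row with the column names given above and that the booleans are written as the strings "True" and "False"; the DOIs are made up.

```python
import csv
import io
from collections import Counter

# Made-up DOIs; column names and "True"/"False" values follow the description above.
SAMPLE_CSV = (
    "citing,is_citing_SSH,cited,is_cited_SSH\n"
    "10.1000/a,True,10.1000/b,True\n"
    "10.1000/a,True,10.1000/c,False\n"
    "10.1000/d,False,10.1000/b,True\n"
)


def direction_counts(csv_file):
    """Count citations by whether each endpoint belongs to an SSH discipline."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        citing = "SSH" if row["is_citing_SSH"] == "True" else "non-SSH"
        cited = "SSH" if row["is_cited_SSH"] == "True" else "non-SSH"
        counts[f"{citing} -> {cited}"] += 1
    return counts


directions = direction_counts(io.StringIO(SAMPLE_CSV))
```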

    Software: https://doi.org/10.5281/zenodo.8326023

    Data preprocessed: https://doi.org/10.5281/zenodo.7973159

    Article: https://zenodo.org/record/8326044

    DMP: https://zenodo.org/record/8324973

    Protocol: https://doi.org/10.17504/protocols.io.n92ldpeenl5b/v5

  12. Data from: OpCitance: Citation contexts identified from the PubMed Central...

    • databank.illinois.edu
    Updated Feb 15, 2023
    Cite
    Tzu-Kun Hsiao; Vetle Torvik (2023). OpCitance: Citation contexts identified from the PubMed Central open access articles [Dataset]. http://doi.org/10.13012/B2IDB-4353270_V1
    Dataset updated
    Feb 15, 2023
    Authors
    Tzu-Kun Hsiao; Vetle Torvik
    Dataset funded by
    U.S. National Institutes of Health (NIH)
    Description

    Sentences and citation contexts identified from the PubMed Central open access articles

    The dataset is delivered as 24 tab-delimited text files. The files contain 720,649,608 sentences, 75,848,689 of which are citation contexts. The dataset is based on a snapshot of articles in the XML version of the PubMed Central open access subset (i.e., the PMCOA subset). The PMCOA subset was collected in May 2019. The dataset is created as described in: Hsiao TK., & Torvik V. I. (manuscript) OpCitance: Citation contexts identified from the PubMed Central open access articles.

    Files:
    • A_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with A.
    • B_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with B.
    • C_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with C.
    • D_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with D.
    • E_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with E.
    • F_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with F.
    • G_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with G.
    • H_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with H.
    • I_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with I.
    • J_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with J.
    • K_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with K.
    • L_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with L.
    • M_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with M.
    • N_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with N.
    • O_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with O.
    • P_p1_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with P (part 1).
    • P_p2_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with P (part 2).
    • Q_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with Q.
    • R_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with R.
    • S_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with S.
    • T_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with T.
    • UV_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with U or V.
    • W_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with W.
    • XYZ_journal_IntxtCit.tsv – Sentences and citation contexts identified from articles published in journals with journal titles starting with X, Y or Z.

    Each row in the file is a sentence/citation context and contains the following columns:
    • pmcid: PMCID of the article.
    • pmid: PMID of the article. If an article does not have a PMID, the value is NONE.
    • location: The article component (abstract, main text, table, figure, etc.) to which the citation context/sentence belongs.
    • IMRaD: The type of IMRaD section associated with the citation context/sentence. I, M, R, and D represent introduction/background, method, results, and conclusion/discussion, respectively; NoIMRaD indicates that the section type is not identifiable.
    • sentence_id: The ID of the citation context/sentence in the article component.
    • total_sentences: The number of sentences in the article component.
    • intxt_id: The ID of the citation.
    • intxt_pmid: PMID of the citation (as tagged in the XML file). If a citation does not have a PMID tagged in the XML file, the value is "-".
    • intxt_pmid_source: The sources where the intxt_pmid can be identified. "xml" means the PMID is identified only from the XML file; "xml,pmc" means the PMID appears not only in the XML file but also in the citation data collected from the NCBI Entrez Programming Utilities. If a citation does not have an intxt_pmid, the value is "-".
    • intxt_mark: The citation marker associated with the inline citation.
    • best_id: The best source link ID (e.g., PMID) of the citation.
    • best_source: The sources that confirm the best ID.
    • best_id_diff: The comparison result between the best_id column and the intxt_pmid column.
    • citation: A citation context. If no citation is found in a sentence, the value is the sentence.
    • progression: Text progression of the citation context/sentence.

    Supplementary Files
    • PMC-OA-patci.tsv.gz – This file contains the best source link IDs for the references (e.g., PMID). Patci [1] was used to identify the best source link IDs. The best source link IDs are mapped to the citation contexts and displayed in the *_journal_IntxtCit.tsv files as the best_id column. Each row in the PMC-OA-patci.tsv.gz file is a citation (i.e., a reference extracted from the XML file) and contains the following columns:
      • pmcid: PMCID of the citing article.
      • pos: The citation's position in the reference list.
      • fromPMID: PMID of the citing article.
      • toPMID: Source link ID (e.g., PMID) of the citation. This ID is identified by Patci.
      • SRC: The sources that confirm the toPMID.
      • MatchDB: The origin bibliographic database of the toPMID.
      • Probability: The match probability of the toPMID.
      • toPMID2: PMID of the citation (as tagged in the XML file).
      • SRC2: The sources that confirm the toPMID2.
      • intxt_id: The ID of the citation.
      • journal: The first letter of the journal title. This maps to the *_journal_IntxtCit.tsv files.
      • same_ref_string: Whether the citation string appears in the reference list more than once.
      • DIFF: The comparison result between the toPMID column and the toPMID2 column.
      • bestID: The best source link ID (e.g., PMID) of the citation.
      • bestSRC: The sources that confirm the best ID.
      • Match: Matching result produced by Patci.
    • Supplementary_File_1.zip – This file contains the code for generating the dataset.

    [1] Agarwal, S., Lincoln, M., Cai, H., & Torvik, V. (2014). Patci – a tool for identifying scientific articles cited by patents. GSLIS Research Showcase 2014. http://hdl.handle.net/2142/54885
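
    A sketch of streaming one of the *_journal_IntxtCit.tsv files is given below. It counts rows per IMRaD section and rows whose citation has a PMID tagged in the XML (i.e. "intxt_pmid" is not "-"). It assumes the files start with a header row using the column names above, which the description does not confirm, and the sample rows (with only a subset of columns) are invented.

```python
import csv
import io
from collections import Counter

# Invented rows with a subset of the documented columns; a header row is assumed.
SAMPLE_TSV = (
    "pmcid\tpmid\tlocation\tIMRaD\tintxt_pmid\tcitation\n"
    "PMC1\t100\tmain text\tI\t200\tEarlier work showed this [1].\n"
    "PMC1\t100\tmain text\tM\t-\tWe used the \"standard\" protocol.\n"
    "PMC2\tNONE\tmain text\tI\t-\tThis has been reported [3].\n"
)


def rows_per_section(tsv_file):
    """Return (rows per IMRaD value, number of rows with a tagged citation PMID)."""
    # QUOTE_NONE keeps quotation marks inside sentences from being read as CSV quoting.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    sections = Counter()
    with_pmid = 0
    for row in reader:
        sections[row["IMRaD"]] += 1
        with_pmid += row["intxt_pmid"] != "-"
    return sections, with_pmid


sections, with_pmid = rows_per_section(io.StringIO(SAMPLE_TSV))
```

    Disabling quote handling matters for free-text columns: a sentence containing an unbalanced quotation mark would otherwise swallow the following tabs and rows.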

  13. Louisville Metro KY - Uniform Citation Data (2016-2019)

    • catalog.data.gov
    • data.louisvilleky.gov
    • +4more
    Updated Apr 13, 2023
    + more versions
    Cite
    Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Uniform Citation Data (2016-2019) [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-uniform-citation-data-2016-2019
    Dataset updated
    Apr 13, 2023
    Dataset provided by
    Louisville/Jefferson County Information Consortium
    Area covered
    Kentucky, Louisville
    Description

    A list of all uniform citations from the Louisville Metro Police Department (LMPD). The CSV file is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes.

    INCIDENT_NUMBER or CASE_NUMBER links these data sets together:
    • Crime Data
    • Uniform Citation Data
    • Firearm intake
    • LMPD hate crimes
    • Assaulted Officers

    CITATION_CONTROL_NUMBER links these data sets together:
    • Uniform Citation Data
    • LMPD Stops Data

    Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

    • AGENCY_DESC - the name of the department that issued the citation
    • CASE_NUMBER - the number associated with the incident, or used as a reference to store items in our evidence rooms. It can be used to connect this dataset to INCIDENT_NUMBER in the following datasets: 1. Crime Data, 2. Firearms intake, 3. LMPD hate crimes, 4. Assaulted Officers. NOTE: CASE_NUMBER is not formatted the same as INCIDENT_NUMBER in the other datasets. For example, CASE_NUMBER 8018013155 (no dashes) in the Uniform Citation Data matches INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
    • CITATION_YEAR - the year the citation was issued
    • CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
    • CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
    • CITATION_DATE - the date the citation was issued
    • CITATION_LOCATION - the location the citation was issued
    • DIVISION - the LMPD division in which the citation was issued
    • BEAT - the LMPD beat in which the citation was issued
    • PERSONS_SEX - the gender of the person who received the citation
    • PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
    • PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
    • PERSONS_AGE - the age of the person who received the citation
    • PERSONS_HOME_CITY - the city in which the person who received the citation lives
    • PERSONS_HOME_STATE - the state in which the person who received the citation lives
    • PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
    • VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
    • ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
    • STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
    • CHARGE_DESC - the description of the type of charge for the citation
    • UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
    • UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
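
    The note on CASE_NUMBER and INCIDENT_NUMBER gives one example of the two formats (8018013155 versus 80-18-013155). The sketch below shows one way the conversion might be done before joining the datasets. It assumes every case number has ten digits and follows the same 2-2-6 grouping as that single example, which the description does not confirm; the function names are hypothetical.

```python
import re


def case_to_incident_number(case_number: str) -> str:
    """Insert dashes so a CASE_NUMBER looks like an INCIDENT_NUMBER.

    Assumes a ten-digit value grouped 2-2-6, generalised from the one
    example in the dataset description.
    """
    digits = case_number.strip()
    if not re.fullmatch(r"\d{10}", digits):
        raise ValueError(f"unexpected CASE_NUMBER format: {case_number!r}")
    return f"{digits[:2]}-{digits[2:4]}-{digits[4:]}"


def incident_to_case_number(incident_number: str) -> str:
    """Strip dashes so an INCIDENT_NUMBER looks like a CASE_NUMBER."""
    return incident_number.strip().replace("-", "")


print(case_to_incident_number("8018013155"))     # 80-18-013155
print(incident_to_case_number("80-18-013155"))   # 8018013155
```

    When loading the CSV, it is probably safest to read CASE_NUMBER as text rather than as a number, so that any leading zeros survive.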

  14. Data from: Invasive species - American bullfrog (Lithobates catesbeianus) in...

    • gbif.org
    • data.biodiversity.be
    • +4more
    Updated May 15, 2025
    + more versions
    Cite
    Sander Devisscher; Tim Adriaens; Gerald Louette; Dimitri Brosens; Peter Desmet (2025). Invasive species - American bullfrog (Lithobates catesbeianus) in Flanders, Belgium [Dataset]. http://doi.org/10.15468/2hqkqn
    Dataset updated
    May 15, 2025
    Dataset provided by
    Global Biodiversity Information Facility https://www.gbif.org/
    Research Institute for Nature and Forest (INBO)
    Authors
    Sander Devisscher; Tim Adriaens; Gerald Louette; Dimitri Brosens; Peter Desmet
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Apr 27, 2010 - Dec 31, 2018
    Description

    Invasive species - American bullfrog (Lithobates catesbeianus) in Flanders, Belgium is a species occurrence dataset published by the Research Institute for Nature and Forest (INBO). The dataset contains over 7,500 occurrences (25% of which are American bullfrogs) sampled from 2010 until now, in the months April to October. The data are compiled from different sources at the INBO, but most of the occurrences were collected through fieldwork for the EU co-funded Interreg project INVEXO (http://www.invexo.eu). In this project, research was conducted on different methods for the management of American bullfrog populations, an alien invasive species in Belgium. Captured bullfrogs were almost always removed from the environment and humanely killed, while the other occurrences are recorded bycatch, which was released upon catch (see bibliography for detailed descriptions of the methods). Caution is therefore advised when using these data for trend analysis, distribution range calculation, or other purposes. Issues with the dataset can be reported at https://github.com/inbo/data-publication/tree/master/datasets/invasive-bullfrog-occurrences

    We strongly believe an open attitude is essential for tackling the IAS problem (Groom et al. 2015). To allow anyone to use this dataset, we have released the data to the public domain under a Creative Commons Zero waiver (http://creativecommons.org/publicdomain/zero/1.0/). We would appreciate it, however, if you read and follow these norms for data use (http://www.inbo.be/en/norms-for-data-use) and provide a link to the original dataset (https://doi.org/10.15468/2hqkqn) whenever possible. If you use these data for a scientific paper, please cite the dataset following the applicable citation norms and/or consider us for co-authorship. We are always interested to know how you have used or visualized the data, or to provide more information, so please contact us via the contact information provided in the metadata, opendata@inbo.be or https://twitter.com/LifeWatchINBO.

  15. Louisville Metro KY - Uniform Citation Data 2020

    • catalog.data.gov
    • data.lojic.org
    • +2more
    Updated Apr 13, 2023
    Cite
    Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Uniform Citation Data 2020 [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-uniform-citation-data-2020
    Dataset updated
    Apr 13, 2023
    Dataset provided by
    Louisville/Jefferson County Information Consortium
    Area covered
    Kentucky, Louisville
    Description

    A list of all uniform citations from the Louisville Metro Police Department (LMPD). The CSV file is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes.

    INCIDENT_NUMBER or CASE_NUMBER links these data sets together:
    • Crime Data
    • Uniform Citation Data
    • Firearm intake
    • LMPD hate crimes
    • Assaulted Officers

    CITATION_CONTROL_NUMBER links these data sets together:
    • Uniform Citation Data
    • LMPD Stops Data

    Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

    • AGENCY_DESC - the name of the department that issued the citation
    • CASE_NUMBER - the number associated with the incident, or used as a reference to store items in our evidence rooms. It can be used to connect this dataset to INCIDENT_NUMBER in the following datasets: 1. Crime Data, 2. Firearms intake, 3. LMPD hate crimes, 4. Assaulted Officers. NOTE: CASE_NUMBER is not formatted the same as INCIDENT_NUMBER in the other datasets. For example, CASE_NUMBER 8018013155 (no dashes) in the Uniform Citation Data matches INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
    • CITATION_YEAR - the year the citation was issued
    • CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
    • CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
    • CITATION_DATE - the date the citation was issued
    • CITATION_LOCATION - the location the citation was issued
    • DIVISION - the LMPD division in which the citation was issued
    • BEAT - the LMPD beat in which the citation was issued
    • PERSONS_SEX - the gender of the person who received the citation
    • PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
    • PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
    • PERSONS_AGE - the age of the person who received the citation
    • PERSONS_HOME_CITY - the city in which the person who received the citation lives
    • PERSONS_HOME_STATE - the state in which the person who received the citation lives
    • PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
    • VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
    • ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
    • STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
    • CHARGE_DESC - the description of the type of charge for the citation
    • UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
    • UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/

  16. Data from: InboVeg - NICHE-Vlaanderen groundwater related vegetation relevés...

    • gbif.org
    • data.europa.eu
    Updated May 4, 2021
    + more versions
    Cite
    Els De Bie; Dimitri Brosens (2021). InboVeg - NICHE-Vlaanderen groundwater related vegetation relevés for Flanders, Belgium [Dataset]. http://doi.org/10.15468/gouexm
    Dataset updated
    May 4, 2021
    Dataset provided by
    Global Biodiversity Information Facility https://www.gbif.org/
    Research Institute for Nature and Forest (INBO)
    Authors
    Els De Bie; Dimitri Brosens
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    May 21, 2002 - Jul 7, 2005
    Description

    The NICHE-Vlaanderen project aimed to develop a hydro-ecological prediction model for use in ecological impact assessment studies. The data in this dataset are part of the vegetation-plot data used to feed the model and contain groundwater-dependent terrestrial vegetation relevés in relation to groundwater levels. Vegetation plot relevés were performed near selected piezometers (WATINA database, groundwater network Flanders) between May and August in 2002, 2004 and 2005. Initially the vegetation surveys were recorded in Turboveg (Hennekens, 1998) and later moved to INBOVEG, the INBO vegetation plot database. The dataset contains 569 vegetation relevés, recorded during the fieldwork of the NICHE-Vlaanderen project. Relevés contain species coverage data, coverage data for layers, vegetation height and the date of recording. All the vegetation relevés were classified as vegetation types. Issues related to the dataset can be submitted here: https://github.com/inbo/data-publication/tree/master/datasets/inboveg-niche-vlaanderen-events

    To allow anyone to use this dataset, we have released the data to the public domain under a Creative Commons Zero waiver (http://creativecommons.org/publicdomain/zero/1.0/). We would appreciate it, however, if you read and follow these norms for data use (http://www.inbo.be/en/norms-for-data-use) and provide a link to the original dataset (https://doi.org/10.15468/gouexm) whenever possible. If you use these data for a scientific paper, please cite the dataset following the applicable citation norms and/or consider us for co-authorship. We are always interested to know how you have used or visualized the data, or to provide more information, so please contact us via the contact information provided in the metadata, opendata@inbo.be or https://twitter.com/LifeWatchINBO.

  17. Bibliometric dataset: list of highly cited papers in bibliometric

    • zenodo.org
    • data.niaid.nih.gov
    bin, png, txt
    Updated Jul 25, 2024
    + more versions
    Cite
    Dasapta Erwin Irawan; Dini Sofiani Permatasari; Lusia Marliana Nurani (2024). Bibliometric dataset: list of highly cited papers in bibliometric [Dataset]. http://doi.org/10.5281/zenodo.2544533
    Available download formats: png, bin, txt
    Dataset updated
    Jul 25, 2024
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Dasapta Erwin Irawan; Dini Sofiani Permatasari; Lusia Marliana Nurani
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Motivation

    My motivation in providing this dataset is to invite more interest from Indonesia's librarians in understanding their diverse field of study.

    Method

    This dataset was harvested on 19 January 2019 from the Scopus database provided by The University of Sydney. I searched for the keyword "bibliometric" in the title, sorted the search results by total citations, then downloaded the first 2000 papers as a RIS file. This file can be converted to other formats such as BibTeX or CSV using an available reference manager, such as Zotero.
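
    As an alternative to a reference manager, a RIS file can also be converted with a short script. The sketch below is not the author's method; it is a minimal illustration that assumes the common RIS layout (a two-character tag, two spaces, a hyphen, then the value, with `ER` closing each record) and the usual `TI`, `PY` and `AU` tags for title, year and author. Tags can differ between exporters, continuation lines are ignored, and the sample record is made up.

```python
import csv
import io
import re

# One RIS field per line: two-character tag, two spaces, hyphen, value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Return a list of records, each mapping a tag to its list of values."""
    records, current = [], {}
    for line in text.splitlines():
        match = RIS_LINE.match(line)
        if not match:
            continue  # skip blank and continuation lines in this sketch
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":  # end of record
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records


def ris_to_csv(text, tags=("TI", "PY", "AU")):
    """Write the chosen tags as CSV columns; repeated tags are joined by '; '."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(tags)
    for record in parse_ris(text):
        writer.writerow(["; ".join(record.get(tag, [])) for tag in tags])
    return out.getvalue()


# Made-up record for illustration only.
sample = (
    "TY  - JOUR\n"
    "TI  - A made-up title\n"
    "AU  - Doe, J.\n"
    "AU  - Roe, R.\n"
    "PY  - 2019\n"
    "ER  - \n"
)

csv_text = ris_to_csv(sample)
```

    Repeated tags such as `AU` are kept as lists during parsing so that multi-author papers are not truncated to a single name.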

    Visualisations

    I did two small visualisations using the following options:

    1. "create a map based on bibliographic data"
    2. "create a map based on text data"

    Both mappings were done using VOSviewer, an open source app from CWTS, Leiden University.

  18. Data from: Loopkevers Grensmaas - Ground beetles near the river Meuse in...

    • gbif.org
    • metadata.vlaanderen.be
    • +2more
    Updated Apr 1, 2021
    + more versions
    Cite
    Stijn Vanacker; Dimitri Brosens; Peter Desmet (2021). Loopkevers Grensmaas - Ground beetles near the river Meuse in Flanders, Belgium [Dataset]. http://doi.org/10.15468/hy3pzl
    Dataset updated
    Apr 1, 2021
    Dataset provided by
    Global Biodiversity Information Facility https://www.gbif.org/
    Research Institute for Nature and Forest (INBO)
    Authors
    Stijn Vanacker; Dimitri Brosens; Peter Desmet
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Aug 25, 1998 - Oct 4, 1999
    Description

    Loopkevers Grensmaas - Ground beetles near the river Meuse in Flanders, Belgium is a species occurrence dataset published by the Research Institute for Nature and Forest (INBO). The dataset contains over 5,800 beetle occurrences sampled between 1998 and 1999 from 28 locations on the left bank (Belgium) of the river Meuse on the border between Belgium and the Netherlands. The dataset includes over 100 ground beetle species (Carabidae) and some non-target species. The data were used to assess the dynamics of the Grensmaas area and to help river management. Issues with the dataset can be reported at https://github.com/LifeWatchINBO/data-publication/tree/master/datasets/kevers-grensmaas-occurrences

    To allow anyone to use this dataset, we have released the data to the public domain under a Creative Commons Zero waiver (http://creativecommons.org/publicdomain/zero/1.0/). We would appreciate it, however, if you read and follow these norms for data use (http://www.inbo.be/en/norms-for-data-use) and provide a link to the original dataset (https://doi.org/10.15468/hy3pzl) whenever possible. If you use these data for a scientific paper, please cite the dataset following the applicable citation norms and/or consider us for co-authorship. We are always interested to know how you have used or visualized the data, or to provide more information, so please contact us via the contact information provided in the metadata, opendata@inbo.be or https://twitter.com/LifeWatchINBO.

  19. NIST Statistical Reference Datasets - SRD 140

    • datasets.ai
    • data.nist.gov
    • +2more
    21
    Updated Aug 27, 2024
    + more versions
    Cite
    National Institute of Standards and Technology (2024). NIST Statistical Reference Datasets - SRD 140 [Dataset]. https://datasets.ai/datasets/nist-statistical-reference-datasets-srd-140-df30c
    Available download formats: 21
    Dataset updated
    Aug 27, 2024
    Dataset authored and provided by
    National Institute of Standards and Technology http://www.nist.gov/
    Description

    The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software. Currently datasets and certified values are provided for assessing the accuracy of software for univariate statistics, linear regression, nonlinear regression, and analysis of variance. The collection includes both generated and 'real-world' data of varying levels of difficulty. Generated datasets are designed to challenge specific computations. These include the classic Wampler datasets for testing linear regression algorithms and the Simon & Lesage datasets for testing analysis of variance algorithms. Real-world data include challenging datasets such as the Longley data for linear regression, and more benign datasets such as the Daniel & Wood data for nonlinear regression. Certified values are 'best-available' solutions. The certification procedure is described in the web pages for each statistical method. Datasets are ordered by level of difficulty (lower, average, and higher). Strictly speaking the level of difficulty of a dataset depends on the algorithm. These levels are merely provided as rough guidance for the user. Producing correct results on all datasets of higher difficulty does not imply that your software will pass all datasets of average or even lower difficulty. Similarly, producing correct results for all datasets in this collection does not imply that your software will do the same for your particular dataset. It will, however, provide some degree of assurance, in the sense that your package provides correct results for datasets known to yield incorrect results for some software. The Statistical Reference Datasets is also supported by the Standard Reference Data Program.

  20. Louisville Metro KY - Uniform Citation Data 2022

    • gimi9.com
    • s.cnmilf.com
    • +5more
    Updated Feb 1, 2022
    Cite
    (2022). Louisville Metro KY - Uniform Citation Data 2022 [Dataset]. https://gimi9.com/dataset/data-gov_louisville-metro-ky-uniform-citation-data-2022-1e968/
    Dataset updated
    Feb 1, 2022
    Area covered
    Kentucky, Louisville
    Description

    A list of all uniform citations from the Louisville Metro Police Department (LMPD). The CSV file is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes.

    INCIDENT_NUMBER or CASE_NUMBER links these data sets together:
    • Crime Data
    • Uniform Citation Data
    • Firearm intake
    • LMPD hate crimes
    • Assaulted Officers

    CITATION_CONTROL_NUMBER links these data sets together:
    • Uniform Citation Data
    • LMPD Stops Data

    Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

    • AGENCY_DESC - the name of the department that issued the citation
    • CASE_NUMBER - the number associated with the incident, or used as a reference to store items in our evidence rooms. It can be used to connect this dataset to INCIDENT_NUMBER in the following datasets: 1. Crime Data, 2. Firearms intake, 3. LMPD hate crimes, 4. Assaulted Officers. NOTE: CASE_NUMBER is not formatted the same as INCIDENT_NUMBER in the other datasets. For example, CASE_NUMBER 8018013155 (no dashes) in the Uniform Citation Data matches INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
    • CITATION_YEAR - the year the citation was issued
    • CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
    • CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
    • CITATION_DATE - the date the citation was issued
    • CITATION_LOCATION - the location the citation was issued
    • DIVISION - the LMPD division in which the citation was issued
    • BEAT - the LMPD beat in which the citation was issued
    • PERSONS_SEX - the gender of the person who received the citation
    • PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
    • PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
    • PERSONS_AGE - the age of the person who received the citation
    • PERSONS_HOME_CITY - the city in which the person who received the citation lives
    • PERSONS_HOME_STATE - the state in which the person who received the citation lives
    • PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
    • VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
    • ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
    • STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
    • CHARGE_DESC - the description of the type of charge for the citation
    • UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
    • UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/

Cite
Anita Khadka (2020). Citation Knowledge with Section and Context [Dataset]. http://doi.org/10.21954/ou.rd.11346848.v1

Citation Knowledge with Section and Context

Available download formats: zip
Dataset updated
May 5, 2020
Dataset provided by
The Open University
Authors
Anita Khadka
License

Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically

Description

This dataset contains information from scientific publications written by authors who have published papers in the RecSys conference. It contains four files with information extracted from scientific publications. The details of each file are explained below:

i) all_authors.tsv: This file contains the details of authors who published research papers in the RecSys conference. The details include the authors' identifiers in various forms (such as number, orcid id, dblp url, dblp key and google scholar url), the authors' first name, last name and their affiliation (where they work).

ii) all_publications.tsv: This file contains the details of publications authored by the authors mentioned in the all_authors.tsv file. (Please note that the list does not contain all the publications of these authors; refer to the publication for further details.) The details include the publications' identifiers in different forms (such as number, dblp key, dblp url, google scholar url), title, filtered title, published date, published conference and paper abstract.

iii) selected_author_publications-information.tsv: This file consists of identifiers of authors and their publications. Here, we provide the information on the selected authors and their publications used for our experiment.

iv) selected_publication_citations-information.tsv: This file contains the information on the selected publications, covering both the citing and cited papers used in our experiment. It consists of the identifier of the citing paper, the identifier of the cited paper, citation title, citation filtered title, the sentence before the citation is mentioned, the citing sentence, the sentence after the citation is mentioned, and the citation position (section). Please note that it does not contain information on all the citations in the publications. For more detail, please refer to the paper.

This dataset is for research purposes only. If you use this dataset, please cite our paper "Capturing and exploiting citation knowledge for recommending recently published papers", due to be published in Web2Touch track 2020 (not yet published).
