This file contains a national set of names and contact information for doctors, hospitals, clinics, and other facilities (known collectively as sources) from which medical evidence of record (MER) may be requested to support a claimant's disability application.
If a model utilized data from multiple categories, it was placed in each.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Self-citation analysis data based on PubMed Central subset (2002-2005)
----------------------------------------------------------------------
Created by Shubhanshu Mishra, Brent D. Fegley, Jana Diesner, and Vetle Torvik on April 5th, 2018

## Introduction
This is a dataset created as part of the publication titled: Mishra S, Fegley BD, Diesner J, Torvik VI (2018) Self-Citation is the Hallmark of Productive Authors, of Any Gender. PLOS ONE. It contains files for running the self-citation analysis on articles published in PubMed Central between 2002 and 2005, collected in 2015.

The dataset is distributed in the form of the following tab-separated text files:
* Training_data_2002_2005_pmc_pair_First.txt (1.2G) - Data for first authors
* Training_data_2002_2005_pmc_pair_Last.txt (1.2G) - Data for last authors
* Training_data_2002_2005_pmc_pair_Middle_2nd.txt (964M) - Data for middle 2nd authors
* Training_data_2002_2005_pmc_pair_txt.header.txt - Header for the data
* COLUMNS_DESC.txt file - Descriptions of all columns
* model_text_files.tar.gz - Text files containing model coefficients and scores for model selection.
* results_all_model.tar.gz - Model coefficient and result files in numpy format used for plotting purposes. v4.reviewer contains models for analysis done after reviewer comments.
* README.txt file

## Dataset creation
Our experiments relied on data from multiple sources, including proprietary data from Thomson Reuters' (now Clarivate Analytics) Web of Science collection of MEDLINE citations. Authors interested in reproducing our experiments should personally request this data from Clarivate Analytics. However, we do make available a similar but open dataset based on citations from PubMed Central, which can be used to get results similar to those reported in our analysis. Furthermore, we have also freely shared our datasets, which can be used along with the citation datasets from Clarivate Analytics to re-create the dataset used in our experiments. These datasets are listed below. If you wish to use any of those datasets, please make sure you cite both the dataset and the paper introducing the dataset.

* MEDLINE 2015 baseline: https://www.nlm.nih.gov/bsd/licensee/2015_stats/baseline_doc.html
* Citation data from PubMed Central (original paper includes additional citations from Web of Science)
* Author-ity 2009 dataset:
  - Dataset citation: Torvik, Vetle I.; Smalheiser, Neil R. (2018): Author-ity 2009 - PubMed author name disambiguated dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4222651_V1
  - Paper citation: Torvik, V. I., & Smalheiser, N. R. (2009). Author name disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3), 1–29. https://doi.org/10.1145/1552303.1552304
  - Paper citation: Torvik, V. I., Weeber, M., Swanson, D. R., & Smalheiser, N. R. (2004). A probabilistic similarity metric for Medline records: A model for author name disambiguation. Journal of the American Society for Information Science and Technology, 56(2), 140–158. https://doi.org/10.1002/asi.20105
* Genni 2.0 + Ethnea for identifying author gender and ethnicity:
  - Dataset citation: Torvik, Vetle (2018): Genni + Ethnea for the Author-ity 2009 dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9087546_V1
  - Paper citation: Smith, B. N., Singh, M., & Torvik, V. I. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. In Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries - JCDL ’13. ACM Press. https://doi.org/10.1145/2467696.2467720
  - Paper citation: Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geo-coded author names in a large-scale bibliographic database. International Symposium on Science of Science March 22-23, 2016 - Library of Congress, Washington DC, USA. http://hdl.handle.net/2142/88927
* MapAffil for identifying article country of affiliation:
  - Dataset citation: Torvik, Vetle I. (2018): MapAffil 2016 dataset -- PubMed author affiliations mapped to cities and their geocodes worldwide. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4354331_V1
  - Paper citation: Torvik VI. MapAffil: A Bibliographic Tool for Mapping Author Affiliation Strings to Cities and Their Geocodes Worldwide. D-Lib magazine : the magazine of the Digital Library Forum. 2015;21(11-12):10.1045/november2015-torvik
* IMPLICIT journal similarity:
  - Dataset citation: Torvik, Vetle (2018): Author-implicit journal, MeSH, title-word, and affiliation-word pairs based on Author-ity 2009. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4742014_V1
* Novelty dataset for identifying article-level novelty:
  - Dataset citation: Mishra, Shubhanshu; Torvik, Vetle I. (2018): Conceptual novelty scores for PubMed articles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5060298_V1
  - Paper citation: Mishra S, Torvik VI. Quantifying Conceptual Novelty in the Biomedical Literature. D-Lib magazine : The Magazine of the Digital Library Forum. 2016;22(9-10):10.1045/september2016-mishra
  - Code: https://github.com/napsternxg/Novelty
* Expertise dataset for identifying author expertise on articles:
* Source code provided at: https://github.com/napsternxg/PubMed_SelfCitationAnalysis

Note: The dataset is based on a snapshot of PubMed (which includes Medline and PubMed-not-Medline records) taken in the first week of October, 2016. Check here for information to get PubMed/MEDLINE, and NLM's data Terms and Conditions. Additional data-related updates can be found at Torvik Research Group.

## Acknowledgments
This work was made possible in part with funding to VIT from NIH grant P01AG039347 and NSF grant 1348742. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

## License
Self-citation analysis data based on PubMed Central subset (2002-2005) by Shubhanshu Mishra, Brent D. Fegley, Jana Diesner, and Vetle Torvik is licensed under a Creative Commons Attribution 4.0 International License. Permissions beyond the scope of this license may be available at https://github.com/napsternxg/PubMed_SelfCitationAnalysis.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains data source collection (e.g., COCI, DOCI, POCI, etc.) information about all the citation data (in N-Triples format) included in the OpenCitations Index, released on July 10, 2025. In particular, any citation in the dataset, defined as an individual of the class cito:Citation, includes the following information: [property "prov:atLocation"] the data source entity identified by its URL (https://w3id.org/oc/index/[DATA-SOURCE]/). This version of the dataset contains 2,693,728,426 citations. The size of the zipped archive is 25.7 GB, while the size of the unzipped N-Triples files is 426 GB.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains data source collection (e.g., COCI, DOCI, POCI, etc.) information about all the citation data (in CSV format) included in the OpenCitations Index, released on July 10, 2025. In particular, any citation in the dataset, defined with its corresponding OCI (first column), has a corresponding value that defines the source (second column), e.g. "coci", "doci", "poci", etc. This version of the dataset contains 2,693,728,426 citations. The size of the zipped archive is 23 GB, while the size of the unzipped CSV files is 104 GB.
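As a rough illustration of the two-column layout described above, the following Python sketch tallies citations per source. It assumes comma-separated rows with the OCI in the first column and the source in the second, and no header row (if the real files carry a header, it would need to be skipped). The sample OCI values are invented, and `count_by_source` is a hypothetical helper, not part of any OpenCitations tooling.

```python
import csv
import io
from collections import Counter


def count_by_source(lines):
    """Count citations per data source from rows of (OCI, source)."""
    counts = Counter()
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        counts[row[1].strip().lower()] += 1
    return counts


# Hypothetical sample rows; the OCI values are made up for illustration.
sample = io.StringIO(
    "0200101-0200102,coci\n"
    "0200103-0200104,doci\n"
    "0200105-0200106,coci\n"
)
counts = count_by_source(sample)
print(counts.most_common())
```

Because the function consumes any iterable of lines, the same code could be pointed at an open file handle and would stream through it row by row, which matters for files of this size.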
The text file "Air temperature.txt" contains hourly data and associated data-source flags from January 1, 1948, to September 30, 2015. The primary source of the data is the Argonne National Laboratory, Illinois. The first four columns give the year, month, day, and hour of the observation. Column 5 is the air temperature in degrees Fahrenheit. Column 6 is the three-digit data-source flag, a sequence in the form "xyz". The flags indicate whether the air temperature data are original or missing, the method that was used to fill the missing periods, and any other transformations of the data. The user of the data should consult Over and others (2010) for the detailed documentation of this hourly data-source flag series. Reference Cited: Over, T.M., Price, T.H., and Ishii, A.L., 2010, Development and analysis of a meteorological database, Argonne National Laboratory, Illinois: U.S. Geological Survey Open File Report 2010-1220, 67 p., http://pubs.usgs.gov/of/2010/1220/.
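The six-column layout described above can be read with a short parser. The Python sketch below is only an illustration: it assumes the columns are separated by whitespace and that the flag may have lost leading zeros if it was stored as a number, neither of which is stated in the description. `HourlyRecord` and `parse_hourly_line` are hypothetical names, the sample line is invented, and the meaning of each flag digit is left to Over and others (2010).

```python
from dataclasses import dataclass


@dataclass
class HourlyRecord:
    year: int
    month: int
    day: int
    hour: int
    value_f: float  # column 5, degrees Fahrenheit
    flag: str       # column 6, three-digit "xyz" data-source flag kept as text


def parse_hourly_line(line):
    """Parse one record; assumes six whitespace-separated columns."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 columns, got {len(parts)}")
    year, month, day, hour = (int(p) for p in parts[:4])
    # Assumption: restore leading zeros in case the flag was written as a number.
    flag = parts[5].zfill(3)
    if len(flag) != 3 or not flag.isdigit():
        raise ValueError(f"unexpected data-source flag: {parts[5]!r}")
    return HourlyRecord(year, month, day, hour, float(parts[4]), flag)


def fahrenheit_to_celsius(value_f):
    return (value_f - 32.0) * 5.0 / 9.0


# Invented sample line, for illustration only.
record = parse_hourly_line("1948 1 1 1 23.0 100")
print(record, fahrenheit_to_celsius(record.value_f))
```

The flag is kept as a string so that its three positions can be inspected individually without losing a leading zero.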
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data file for the third release of the Data Citation Corpus, produced by DataCite and Make Data Count as part of an ongoing grant project funded by the Wellcome Trust. Read more about the project.
The data file includes 5,322,388 data citation records in JSON and CSV formats. The JSON file is the version of record.
For convenience, the data is provided in batches of approximately 1 million records each. The publication date and batch number are included in the file name, ex: 2025-02-01-data-citation-corpus-01-v3.0.json.
The data citations in the file originate from the following sources:
DataCite Event Data
A project by Chan Zuckerberg Initiative (CZI) to identify mentions of datasets in the full text of articles
Data citations identified by Aligning Science Across Parkinson’s (ASAP)
Each data citation record is comprised of:
A pair of identifiers: An identifier for the dataset (a DOI or an accession number) and the DOI of the publication (journal article or preprint) in which the dataset is cited
Metadata for the cited dataset and for the citing publication
The data file includes the following fields:
| Field | Description | Required? |
|---|---|---|
| id | Internal identifier for the citation | Yes |
| created | Date of item's incorporation into the corpus | Yes |
| updated | Date of item's most recent update in corpus | Yes |
| repository | Repository where cited data is stored | No |
| publisher | Publisher for the article citing the data | No |
| journal | Journal for the article citing the data | No |
| title | Title of cited data | No |
| publication | DOI of article where data is cited | Yes |
| dataset | DOI or accession number of cited data | Yes |
| publishedDate | Date when citing article was published | No |
| source | Source where citation was harvested | Yes |
| subjects | Subject information for cited data | No |
| affiliations | Affiliation information for creator of cited data | No |
| funders | Funding information for cited data | No |
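The required fields listed above lend themselves to a simple sanity check when loading records. The Python sketch below assumes each record is a flat JSON object keyed by the field names in the list; the sample record and its values are invented, and `missing_required` is a hypothetical helper rather than anything shipped with the corpus.

```python
import json

# Fields marked "Yes" under "Required?" in the field list above.
REQUIRED_FIELDS = ("id", "created", "updated", "publication", "dataset", "source")


def missing_required(record):
    """Return the required field names that are absent, null, or empty."""
    return [name for name in REQUIRED_FIELDS if record.get(name) in (None, "")]


# Invented sample record, for illustration only.
sample = json.loads(
    '{"id": "example-1", "created": "2025-01-01", "updated": "2025-01-02",'
    ' "publication": "https://doi.org/10.1234/article",'
    ' "dataset": "https://doi.org/10.1234/data", "source": "asap",'
    ' "title": null}'
)
print(missing_required(sample))

incomplete = {"id": "example-2", "source": ""}
print(missing_required(incomplete))
```

Optional fields such as `title` are deliberately ignored, so a null value there does not count as a problem.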
Additional documentation about the citations and metadata in the file is available on the Make Data Count website.
Notes on v3.0:
The third release of the Data Citation Corpus data file reflects a few changes made to add new citations, including those from a new data source (ASAP), update and enhance citation metadata, and improve the overall usability of the file. These changes are as follows:
Add and update Event Data citations:
Add 65,524 new data citations created in DataCite Event Data between August 2024 and December 2024
Add ASAP citations:
Add 750 new data citations provided by Aligning Science Across Parkinson’s (ASAP), identified through processes to evaluate compliance with ASAP’s open science practices, which involve a partnership with DataSeer and internal curation (described here).
Citations with provenance from ASAP are identified as “asap” in the source field
Metadata enhancements:
Reconcile and normalize organization names for affiliations and funders in a subset of records with the Research Organization Registry (ROR)
Add ror_name and ror_id subfields for affiliations and funders in JSON files. Unreconciled affiliation and funder strings are identified with values of null
Add new columns affiliationsROR and fundersROR in CSV files. Unreconciled affiliation and funder strings are identified with values of NONE NONE (this is to ensure consistency in number and order of values in cases where some strings have been reconciled and others have not)
Normalize DOI formats for articles and papers as full URLs
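To make the DOI normalization concrete, here is a minimal Python sketch of turning assorted DOI spellings into full URLs. It is not the script used for the corpus (those scripts are in the project's GitHub repository); the choice of `https://doi.org/` as the target form, the list of prefixes handled, and the name `doi_to_url` are all assumptions made for illustration.

```python
# Prefixes that commonly precede a bare DOI (an assumed, non-exhaustive list).
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi):
    """Strip a known prefix, if any, and return the DOI as a full URL."""
    bare = doi.strip()
    lowered = bare.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            bare = bare[len(prefix):]
            break
    return "https://doi.org/" + bare


# Invented DOIs, for illustration only.
for raw in ("10.1234/abc", "doi:10.1234/ABC", "http://dx.doi.org/10.1234/abc"):
    print(raw, "->", doi_to_url(raw))
```

The suffix is left in its original letter case here; whether to lowercase it as well is a separate design decision that this sketch does not make.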
Additional details about the above changes, including scripts used to perform the above tasks, are available in GitHub.
Additional enhancements to the corpus are ongoing and will be addressed in the course of subsequent releases. Users are invited to submit feedback via GitHub. For general questions, email info@makedatacount.org.
The text file "Wind speed.txt" contains hourly data and associated data-source flags from January 1, 1948, to September 30, 2015. The primary source of the data is the Argonne National Laboratory, Illinois. The first four columns give the year, month, day, and hour of the observation. Column 5 is the wind speed in miles per hour. Column 6 is the three-digit data-source flag, which identifies how the wind speed data were processed. The flag consists of a three-digit sequence in the form "xyz" that describes the origin and transformations of the data values: whether the data are original or missing, the method that was used to fill the missing periods, and any other transformations of the data. The user of the data should consult Over and others (2010) for the detailed documentation of this hourly data-source flag series. Reference Cited: Over, T.M., Price, T.H., and Ishii, A.L., 2010, Development and analysis of a meteorological database, Argonne National Laboratory, Illinois: U.S. Geological Survey Open File Report 2010-1220, 67 p., http://pubs.usgs.gov/of/2010/1220/.
Guide to Publicly Available Demographic Data
This data source guide is a reference tool describing data important to workforce professionals. We created the guide because multiple federal and state organizations provide data relevant to workforce professionals, and skillful data use requires understanding:
the sources of data
how often it is collected, for what years it is available, and a link to the data release dates
the geographic level of analysis (state, county, etc.)
the variables included in the data
how to access and use the data
Data licence Germany – Attribution – Version 2.0 https://www.govdata.de/dl-de/by-2-0
License information was derived automatically
This data set shows all the sources recorded in the source register of North Rhine-Westphalia, which is managed independently by five institutions, or their sampling points, based on the state's water stationing map (gsk3c). The attribute table provides information about the number, location, and data holders of all objects represented within one source area and identifies the reference source. Sources from GeoBasis NRW, i.e. from the state survey, are always reference sources. All objects recorded within a radius of 10 m around the reference source are brought together under one source NRW_ID. Overlapping radii are combined to form a larger contiguous headwater area. If there is no reference source from GeoBasis NRW in an area, the source closest to the centroid serves as the reference source.
Information on all data sets used in the meta-analysis. Column “Name” refers to abbreviations used in Figure 2 and Figures S1–S5. “Study title” refers to the name of the original source article, with the reference given in column “PMID [Ref]”. Column “No. samples” gives the number of samples in each group in the relevant data set, with the following group names: N – Normal controls, CD – Crohn’s disease, UC – Ulcerative colitis, CDU – Un-inflamed Crohn’s disease and UCU – Un-inflamed ulcerative colitis. Column “Platform” refers to the microarray technology used in the relevant analysis.
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship adjusted hm-index, citations to papers in different authorship positions and a composite indicator (c-score). Separate data are shown for career-long and for single recent year impact. Metrics with and without self-citations and ratio of citations to citing papers are given and data on retracted papers (based on Retraction Watch database) as well as citations to/from retracted papers have been added in the most recent iteration. Scientists are classified into 22 scientific fields and 174 sub-fields according to the standard Science-Metrix classification. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2023 and single recent year data pertain to citations received during calendar year 2023. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. This version (7) is based on the August 1, 2024 snapshot from Scopus, updated to end of citation year 2023. This work uses Scopus data. Calculations were performed using all Scopus author profiles as of August 1, 2024. If an author is not on the list it is simply because the composite indicator value was not high enough to appear on the list. It does not mean that the author does not do good work. PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECTIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US.
They should be sent directly to Scopus, preferably by use of the Scopus to ORCID feedback wizard (https://orcid.scopusfeedback.com/) so that the correct data can be used in any future annual updates of the citation indicator databases. The c-score focuses on impact (citations) rather than productivity (number of publications) and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, see attached file on FREQUENTLY ASKED QUESTIONS. Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden manifesto: https://www.nature.com/articles/520429a
https://www.gnu.org/licenses/gpl-3.0.en.html
The reference data contains 65 referenced and cited literature sources from the writing process, including the title, author(s), source and other bibliographic information for each reference. The image data contains two images embedded within the body text of the paper.
This digital dataset release of the Tectonic Map of the Colorado Plateau is a courtesy publication of the legacy report published by V.C. Kelley in 1955. The original publication, "Tectonic Map of the Colorado Plateau Showing Uranium Deposits," contains elevation contours from the top of the Chinle formation at 1000 ft intervals and geologic structures such as monoclinal, synclinal, and anticlinal structures. The map was digitized to provide a more accessible dataset for public use. The original dataset was produced as part of a larger project by the University of New Mexico and its publications in geology on uranium distribution throughout the Colorado Plateau (Kelley, V.C., 1955, Regional tectonics of the Colorado Plateau and relationship to the origin and distribution of uranium: Albuquerque, University of New Mexico, Publications in Geology no. 5, 120 p., 1 sheet, scale 1:1,000,000.). The dataset includes both spatial and nonspatial data held in a single GeMS-compliant geodatabase. This geodatabase includes a geologic map feature dataset holding fault lines, iso value lines, structure contours, and other geologic lines, and nonspatial data recorded in standalone tables such as a description of map units, glossary, data source references, geomaterials dictionary, and their entities and attributes. Data source references include web links to published standards, data dictionaries, and any other referenced data within the published map. A final nonspatial table records the geologic structures digitized and identified from the legacy map plate. These structures were broken up by state (Arizona, Colorado, New Mexico, and Utah), with each structure given a numerical value (starting at 1 for each individual state), and were compiled into a synchronous Excel document to provide a digital record of the structures and features listed on the legacy map plate.
The text file "Dewpoint temperature.txt" contains hourly data and associated data-source flags from January 1, 1948, to September 30, 2015. The primary source of the data is the Argonne National Laboratory, Illinois. The first four columns give the year, month, day, and hour of the observation. Column 5 is the dewpoint temperature in degrees Fahrenheit. Column 6 is the data-source flag, which consists of a three-digit sequence in the form "xyz". The flags indicate whether the dewpoint temperature data are original or missing, the method that was used to fill the missing periods, and any other transformations of the data. The user of the data should consult Over and others (2010) for the detailed documentation of this hourly data-source flag series. Reference Cited: Over, T.M., Price, T.H., and Ishii, A.L., 2010, Development and analysis of a meteorological database, Argonne National Laboratory, Illinois: U.S. Geological Survey Open File Report 2010-1220, 67 p., http://pubs.usgs.gov/of/2010/1220/.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Part of VMREFTAB, the set of Reference Tables for the VICMAP suite of products.
The purpose of the California Wellness Plan (CWP) Data Reference Guide (Reference Guide) is to provide access to the lowest-level data for each CWP Objective; lowest-level data source, instructions to access data, and additional details are described. Some CWP Objectives do not have program leads, data sources, baselines, and/or targets, but are included because they were a result of CDPH program or partner input and were felt to be important to the reduction of chronic disease incidence, prevalence, and health disparities. Agencies, programs and/or partners identified with an objective may be either data stewards and/or engaged in activities to achieve the target, but may not have adequate resources for statewide activities. Developmental Objectives will be updated as information becomes available.
Background: The California Wellness Plan, California's Chronic Disease Prevention and Health Promotion Plan, was released in February 2014 by the California Department of Public Health (CDPH). The overarching goal of CWP is Equity in Health and Wellbeing; additional CWP Goals include: 1) Healthy Communities, 2) Optimal Health Systems Linked with Community Prevention, 3) Accessible and Usable Health Information, and 4) Prevention Sustainability and Capacity. All CWP objectives fall under the framework of Let's Get Healthy California Task Force priorities.
Green text in the “Objective” column indicates updates that were made to the California Wellness Plan objectives in 2016.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We use journal articles published in Scientometrics in 2016-2020 as the data source. By analyzing the dataset usage records in these scientometrics studies, we rank the datasets by how frequently each is used, to provide a reference for selecting datasets in scientometrics research.
This spreadsheet contains a list of component raster data layers that were used to compile our resistance surface, the classes of data represented within each of these rasters, and the resistance value we assigned to each class. It also provides a web reference for each data layer to provide additional context and information about the source datasets.
Please refer to the embedded spatial metadata and the information in our full report for details on the development of the resulting ResistanceSurface, as well as these component data layers:
ResistanceData_Roads
ResistanceData_ForestedCover
ResistanceData_Rivers
ResistanceData_Waterbodies
ResistanceData_NonForestedCover
ResistanceData_BaysEstuaries
ResistancePostProcessing_Serpentine
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Samples in this benchmark were generated by RELAI using the following data source(s):
Data Source Name: numba
Data Source Link: https://numba.readthedocs.io/en/stable/reference/index.html
Data Source License: https://numba.readthedocs.io/en/stable/developer/repomap.html?highlight=license
Data Source Authors: Anaconda, Inc. and others.
AI Benchmarks by Data Agents. 2025 RELAI.AI. Licensed under CC BY 4.0. Source: https://relai.ai