Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary material to an analysis of data citation practices based on the Data Citation Index (DCI) from Thomson Reuters. This database, launched in 2012, aims to link data sets and data studies with the citations they receive from the rest of their citation indexes. Funding bodies and research organizations increasingly require researchers to make their scientific data available in a reusable and reproducible manner, aiming to maximize the allocation of funding while providing transparency on the scientific process. The DCI harvests citations to research data from papers indexed in the Web of Knowledge. It relies on the information provided by the data repository, as data citation practices are inconsistent or nonexistent in many cases. The findings of this study show that data citation practices are far from common in most research fields. Some differences have been reported in the way researchers cite data: while in the areas of Science and Engineering & Technology data sets were the most cited, in Social Sciences and Arts & Humanities data studies play a greater role. 88.1% of the records have received no citations, but some repositories show very low uncitedness rates. While data citation practices are rare in most fields, they have expanded in disciplines such as Crystallography or Genomics. We conclude by emphasizing the role the DCI may play in encouraging consistent and standardized citation of research data, which would allow such citations to be used to follow the research process from data collection to publication.
The journals’ author guidelines and/or editorial policies were examined to determine whether they take a stance on the availability of the data underlying the submitted article. The mere explicit possibility of providing supplementary material along with the submitted article was not considered a research data policy in the present study. Furthermore, source code and algorithms were excluded from the scope of the present article, and thus policies related to them are not included in its analysis.
For the selection of journals within the field of neurosciences, Clarivate Analytics’ InCites Journal Citation Reports database was searched using the categories of neurosciences and neuroimaging. From the results, the 40 journals with the highest Impact Factor indicators (for the year 2017) were extracted for scrutiny of research data policies. Likewise, the selection of journals within the field of physics was created by performing a similar search with the categories of physics, applied; physics, atomic, molecular & chemical; physics, condensed matter; physics, fluids & plasmas; physics, mathematical; physics, multidisciplinary; physics, nuclear and physics, particles & fields. From the results, the 40 journals with the highest Impact Factor indicators were again extracted for scrutiny. Similarly, the 40 journals representing the field of operations research were extracted by using the search category of operations research and management.
Journal-specific data policies were sought from journal websites providing journal-specific author guidelines or editorial policies. Within the present study, the examination of journal data policies was done in May 2019. The primary data source was journal-specific author guidelines. If journal guidelines explicitly linked to the publisher’s general policy with regard to research data, these were used in the analyses of the present article. If the journal-specific research data policy, or lack thereof, was inconsistent with the publisher’s general policies, the journal-specific policies and guidelines were prioritized and used in the present article’s data. If a journal’s author guidelines were not openly available online due to, e.g., accepting submissions on an invite-only basis, the journal was not included in the data of the present article. Journals that exclusively publish review articles were also excluded and replaced with the journal having the next highest Impact Factor indicator, so that each set representing the three fields of science consisted of 40 journals. The final data thus consisted of 120 journals in total.
‘Public deposition’ refers to a scenario where a researcher deposits data in a public repository and thus gives the administrative role over the data to the receiving repository. ‘Scientific sharing’ refers to a scenario where a researcher administers his or her data locally and provides it to interested readers on request. Note that none of the journals examined in the present article required that all data types underlying a submitted work be deposited in a public data repository. However, some journals required public deposition of data of specific types. Within the journal research data policies examined in the present article, these data types are well represented by the Springer Nature policy on “Availability of data, materials, code and protocols” (Springer Nature, 2018), that is, DNA and RNA data; protein sequences and DNA and RNA sequencing data; genetic polymorphisms data; linked phenotype and genotype data; gene expression microarray data; proteomics data; macromolecular structures and crystallographic data for small molecules. Furthermore, the registration of clinical trials in a public repository was also considered a data type in this study. The term ‘specific data types’ used in the custom coding framework of the present study thus refers to both life sciences data and public registration of clinical trials. These data types have community-endorsed public repositories where deposition was most often mandated within the journals’ research data policies.
The term ‘location’ refers to whether the journal’s data policy provides suggestions or requirements for the repositories or services used to share the underlying data of the submitted works. A mere general reference to ‘public repositories’ was not considered a location suggestion; only references to individual repositories and services were. The category of ‘immediate release of data’ examines whether the journal’s research data policy addresses the timing of publication of the underlying data of submitted works. Note that even though a journal may only encourage public deposition of the data, its editorial processes could be set up so that they lead to publication of either the research data or the research data metadata in conjunction with publication of the submitted work.
This dataset describes how datasets published in the research data repository RADAR are referenced, combining references extracted from Google Scholar, DataCite Event Data and the Data Citation Corpus.
DOIs assigned to RADAR datasets were retrieved from the RADAR API on 2025-01-27. References in the three data sources were then identified using these DOIs. Each research output referencing a RADAR dataset was accessed to determine where the reference occurred in the full text. Author names and publication dates for datasets and referencing objects were added from OpenAlex and DataCite on 2025-02-10. Author names of datasets and referencing objects were compared to determine whether data reuse occurred.
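The author comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the description does not say how names were matched or how the result was labelled, so the normalization rule, the function names, and the two labels below are hypothetical. The sketch assumes that a reference sharing at least one author with the dataset is treated as a self-reference, and that any other reference is treated as reuse by others.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase a name, strip accents, and collapse whitespace (assumed rule)."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def classify_reference(dataset_authors, citing_authors):
    """Label a reference by author overlap (hypothetical labels)."""
    dataset_set = {normalize_name(n) for n in dataset_authors}
    citing_set = {normalize_name(n) for n in citing_authors}
    # Any shared author is taken as a sign that the creators cite their own data.
    if dataset_set & citing_set:
        return "self-reference"
    return "reuse by others"


# Invented example names, not taken from the dataset.
print(classify_reference(["Anna Müller", "Li Wei"], ["anna muller", "J. Smith"]))
print(classify_reference(["Anna Müller"], ["J. Smith"]))
```

Exact string matching after normalization will miss initials and name variants, so a real comparison would probably need identifiers such as ORCID iDs where available.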
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data were generated for an investigation of research data repository (RDR) mentions in biomedical research articles.
Supplementary Table 1 is a discrete subset of SciCrunch RDRs used to study RDR mentions in biomedical literature. We generated this list by starting with the top 1000 entries in the SciCrunch database, measured by citations, then removing entries for organizations (such as universities without a corresponding RDR) or non-relevant tools (such as reference managers), updating links, and consolidating duplicates resulting from RDR mergers and name variations. The resulting list of 737 RDRs is based on a source list of RDRs in the SciCrunch database. The file includes the Research Resource Identifier (RRID), the RDR name, and a link to the RDR record in the SciCrunch database.
Supplementary Table 2 shows the RDRs, associated journals, and article-mention pairs (records) with text snippets extracted from mined Methods text in 2020 PubMed articles. The dataset has 4 components. The first shows the list of repositories with RDR mentions, and includes the Research Resource Identifier (RRID), the RDR name, the number of articles that mention the RDR, and a link to the record in the SciCrunch database. The second shows the list of journals in the study set with at least 1 RDR mention, and includes the Journal ID, name, ESSN/ISSN, the total count of publications in 2020, the number of articles that had text available to mine, the number of article-mention pairs (records), the number of articles with RDR mentions, the number of unique RDRs mentioned, and the percentage of articles with minable text. The third shows the top 200 journals by RDR mention, normalized by the proportion of articles with available text to mine, with the same metadata as the second table. The fourth shows text snippets for each RDR mention, and includes the RRID, RDR name, PubMed ID (PMID), DOI, article publication date, journal name, journal ID, ESSN/ISSN, article title, and snippet.
This file collection is part of the ORD Landscape and Cost Analysis Project (DOI: 10.5281/zenodo.2643460), a study jointly commissioned by the SNSF and swissuniversities in 2018.
Please cite this data collection as: von der Heyde, M. (2019). Data from the International Open Data Repository Survey. Retrieved from https://doi.org/10.5281/zenodo.2643493
Further information is given in the corresponding data paper: von der Heyde, M. (2019). International Open Data Repository Survey: Description of collection, collected data, and analysis methods [Data paper]. Retrieved from https://doi.org/10.5281/zenodo.2643450
Contact
Swiss National Science Foundation (SNSF)
Open Research Data Group
E-mail: ord@snf.ch
swissuniversities
Program "Scientific Information"
Gabi Schneider
E-Mail: isci@swissuniversities.ch
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Machine-readable metadata available from dataset landing pages facilitate data citation by enabling easy integration with reference managers and other tools used in a data citation workflow. Embedding these metadata using the schema.org vocabulary with JSON-LD is emerging as the community standard. This dataset is a listing of data repositories that have implemented this approach or are in the process of doing so.
This is the first version of this dataset and was generated via community consultation. We expect to update this dataset, as an increasing number of data repositories adopt this approach, and we hope to see this information added to registries of data repositories such as re3data and FAIRsharing.
In addition to the listing of data repositories, we provide information on the schema.org properties supported by these data repositories, focusing on the required and recommended properties from the "Data Citation Roadmap for Scholarly Data Repositories".
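As a rough illustration of the approach described above, the following sketch builds a schema.org `Dataset` record as JSON-LD and wraps it in the script tag that a landing page would embed. The helper names, the example values, and the selection of properties are assumptions made for illustration; they are not taken from this dataset, and the selection is not meant to reproduce the Roadmap's list of required and recommended properties.

```python
import json


def dataset_jsonld(name, identifier, url, creators, publisher, date_published):
    """Serialize a minimal schema.org Dataset description as JSON-LD."""
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "identifier": identifier,
        "url": url,
        "creator": [{"@type": "Person", "name": c} for c in creators],
        "publisher": {"@type": "Organization", "name": publisher},
        "datePublished": date_published,
    }
    return json.dumps(record, indent=2)


def script_tag(jsonld):
    """Wrap JSON-LD in the script element used to embed it in an HTML page."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


# All values below are placeholders, not a real dataset or DOI.
example = dataset_jsonld(
    name="Example survey data",
    identifier="https://doi.org/10.1234/example",
    url="https://repository.example.org/dataset/42",
    creators=["A. Author", "B. Author"],
    publisher="Example Repository",
    date_published="2019-01-01",
)
print(script_tag(example))
```

A reference manager or crawler that understands schema.org could then read the citation fields directly from the page source instead of scraping the visible text.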
https://networkrepository.com/policy.php
Citation Networks
The data life cycle from experiments to scientific publications generally follows the schema: experiments, data analysis, interpretation, and publication of a scientific paper. Besides the publication of scientific findings, it is important to preserve the data investment and ensure its future processing. This implies a guarantee of long-term preservation and the prevention of data loss. Condensed and enriched with metadata, primary data would be a more valuable resource than data re-extracted from articles. In this context it becomes essential to change the handling and the acceptance of primary data within the scientific community. Data and publications should be honored with high attention and reputation for data publishers. Here, we present new features of the e!DAL Java API (http://edal.ipk-gatersleben.de) as a lightweight software framework for publishing and sharing research data. Its main features are version tracking, management of metadata, information retrieval, registration of persistent identifiers, an embedded HTTP(S) server for public data access, access as a network file system, and a scalable storage backend. e!DAL is available as an open-source API for local non-shared storage and for remote usage in distributed applications. IPK is an approved data center in the international DataCite consortium (http://www.datacite.org/) and applies e!DAL as its data submission and registration system. In the latest version the focus was on extending the features for the registration of Digital Object Identifiers (DOIs) and on developing a simple but sufficient approval process to regulate the assignment of persistent identifiers. In addition, we implemented some new graphical components, such as an easy installation/demo wizard, to simplify the establishment of repositories using e!DAL.
An intuitive publication tool (Figure 1) allows you to upload your data into your own private repository over the web and obtain a DOI to permanently reference the datasets and increase your “data citation” index.
No description is available. Visit https://dataone.org/datasets/urn%3Auuid%3Afc37b5f2-f69b-497e-bb85-048a19e9950a for complete metadata about this dataset.
Collected in this dataset are the slideset and abstract for a presentation on Toward a Reproducible Research Data Repository by the depositar team at International Symposium on Data Science 2023 (DSWS 2023), hosted by the Science Council of Japan in Tokyo on December 13-15, 2023. The conference was organized by the Joint Support-Center for Data Science Research (DS), Research Organization of Information and Systems (ROIS) and the Committee of International Collaborations on Data Science, Science Council of Japan. The conference programme is also included as a reference.
Toward a Reproducible Research Data Repository
Cheng-Jen Lee, Chia-Hsun Ally Wang, Ming-Syuan Ho, and Tyng-Ruey Chuang
Institute of Information Science, Academia Sinica, Taiwan
The depositar (https://data.depositar.io/) is a research data repository at Academia Sinica (Taiwan) open to researchers worldwide for the deposit, discovery, and reuse of datasets. The depositar software itself is open source and builds on top of CKAN. CKAN, an open source project initiated by the Open Knowledge Foundation and sustained by an active user community, is a leading data management system for building data hubs and portals. In addition to CKAN's out-of-the-box features such as a JSON data API and in-browser preview of uploaded data, we have added several features to the depositar, including sourcing dataset keywords from Wikidata, a citation snippet for datasets, in-browser Shapefile preview, and a persistent identifier system based on ARK (Archival Resource Keys). At the same time, the depositar team faces an increasing demand for interactive computing (e.g. Jupyter Notebook), which facilitates not just data analysis but also the replication and demonstration of scientific studies. Recently, we have provided a JupyterHub service (a multi-tenancy JupyterLab) to some of the depositar's users. However, it still requires users to first download the data files (or copy the URLs of the files) from the depositar, then upload the data files (or paste the URLs) to the Jupyter notebooks for analysis. Furthermore, a JupyterHub deployed on a single server is limited by its processing power, which may lower the service level for users. To address the above issues, we are integrating BinderHub into the depositar. BinderHub (https://binderhub.readthedocs.io/) is a Kubernetes-based service that allows users to create interactive computing environments from code repositories. Once the integration is completed, users will be able to launch Jupyter Notebooks to perform data analysis and visualization without leaving the depositar by clicking the BinderHub buttons on the datasets.
In this presentation, we will first make a brief introduction to the depositar and BinderHub along with their relationship, then we will share our experiences in incorporating interactive computation in a data repository. We shall also evaluate the possibility of integrating the depositar with other automation frameworks (e.g. the Snakemake workflow management system) in order to enable users to reproduce data analysis.
BinderHub, CKAN, Data Repositories, Interactive Computing, Reproducible Research
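The abstract above mentions CKAN's JSON data API. As a small sketch of what a metadata lookup might look like, the code below builds a request URL for CKAN's `package_show` action. It assumes that the depositar exposes the standard CKAN Action API under `/api/3/action/` at its base URL; the dataset id is a hypothetical placeholder, and no network request is made.

```python
from urllib.parse import urlencode, urljoin

# Base URL from the abstract; serving the standard CKAN API there is an assumption.
DEPOSITAR_BASE = "https://data.depositar.io/"


def package_show_url(base_url: str, dataset_id: str) -> str:
    """Build the URL of the CKAN package_show action for one dataset."""
    query = urlencode({"id": dataset_id})
    return urljoin(base_url, "/api/3/action/package_show") + "?" + query


# "example-dataset" is a placeholder id, not a real depositar dataset.
url = package_show_url(DEPOSITAR_BASE, "example-dataset")
print(url)
```

Fetching this URL from a CKAN instance returns a JSON document whose `result` field holds the dataset metadata, including the list of resource (file) URLs that a notebook could load directly instead of downloading and re-uploading files by hand.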
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Cline Center Global News Index is a searchable database of textual features extracted from millions of news stories, specifically designed to provide comprehensive coverage of events around the world. In addition to searching documents for keywords, users can query metadata and features such as named entities extracted using Natural Language Processing (NLP) methods and variables that measure sentiment and emotional valence. Archer is a web application purpose-built by the Cline Center to enable researchers to access data from the Global News Index. Archer provides a user-friendly interface for querying the Global News Index (with the back-end indexing still handled by Solr). By default, queries are built using icons and drop-down menus. More technically-savvy users can use Lucene/Solr query syntax via a ‘raw query’ option. Archer allows users to save and iterate on their queries, and to visualize faceted query results, which can be helpful for users as they refine their queries.
Additional Resources:
- Access to Archer and the Global News Index is limited to account-holders. If you are interested in signing up for an account, please fill out the Archer Access Request Form so we can determine if you are eligible for access or not.
- Current users who would like to provide feedback, such as reporting a bug or requesting a feature, can fill out the Archer User Feedback Form.
- The Cline Center sends out periodic email newsletters to the Archer Users Group. Please fill out this form to subscribe to it.
Citation Guidelines:
1) To cite the GNI codebook (or any other documentation associated with the Global News Index and Archer) please use the following citation: Cline Center for Advanced Social Research. 2025. Global News Index and Extracted Features Repository [codebook], v1.3.0. Champaign, IL: University of Illinois. June. XX. doi:10.13012/B2IDB-5649852_V6
2) To cite data from the Global News Index (accessed via Archer or otherwise) please use the following citation (filling in the correct date of access): Cline Center for Advanced Social Research. 2025. Global News Index and Extracted Features Repository [database], v1.3.0. Champaign, IL: University of Illinois. Jun. XX. Accessed Month, DD, YYYY. doi:10.13012/B2IDB-5649852_V6
*NOTE: V6 is replacing V5 with updated ‘Archer’ documents to reflect changes made to the Archer system.
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
This document contains brief descriptions of many of the treatments found in the PTSD Repository, organized by treatment category.
Note: The download is a .zip file which contains the PDF Reference Guide.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This document contains the datasets and visualizations generated by applying the methodology defined in our work: "A qualitative and quantitative citation analysis toward retracted articles: a case of study". The methodology defines a citation analysis of the Wakefield et al. [1] retracted article from a quantitative and qualitative point of view. The data contained in this repository are based on the first two steps of the methodology. The first step of the methodology (i.e. “Data gathering”) builds an annotated dataset of the citing entities; this step is also discussed at length in [2]. The second step (i.e. "Topic Modelling") runs a topic modeling analysis on the textual features contained in the dataset generated by the first step.
Note: the data are all contained inside the "method_data.zip" file. You need to unzip the file to get access to all the files and directories listed below.
Data gathering
The data generated by this step are stored in "data/":
"cits_features.csv": a dataset containing all the entities (rows in the CSV) which have cited the Wakefield et al. retracted article, and a set of features characterizing each citing entity (columns in the CSV). The features included are: DOI ("doi"), year of publication ("year"), the title ("title"), the venue identifier ("source_id"), the title of the venue ("source_title"), yes/no value in case the entity is retracted as well ("retracted"), the subject area ("area"), the subject category ("category"), the sections of the in-text citations ("intext_citation.section"), the value of the reference pointer ("intext_citation.pointer"), the in-text citation function ("intext_citation.intent"), the in-text citation perceived sentiment ("intext_citation.sentiment"), and a yes/no value to denote whether the in-text citation context mentions the retraction of the cited entity ("intext_citation.section.ret_mention"). Note: this dataset is licensed under a Creative Commons public domain dedication (CC0).
"cits_text.csv": this dataset stores the abstract ("abstract") and the in-text citations context ("intext_citation.context") for each citing entity identified using the DOI value ("doi"). Note: the data keep their original license (the one provided by their publisher). This dataset is provided in order to favor the reproducibility of the results obtained in our work.
Topic modeling
We run a topic modeling analysis on the textual features gathered (i.e. abstracts and citation contexts). The results are stored inside the "topic_modeling/" directory. The topic modeling has been done using MITAO, a tool for mashing up automatic text analysis tools, and creating a completely customizable visual workflow [3]. The topic modeling results for each textual feature are separated into two different folders, "abstracts/" for the abstracts, and "intext_cit/" for the in-text citation contexts. Both the directories contain the following directories/files:
"mitao_workflows/": the workflows of MITAO. These are JSON files that could be reloaded in MITAO to reproduce the results following the same workflows.
"corpus_and_dictionary/": it contains the dictionary and the vectorized corpus given as inputs for the LDA topic modeling.
"coherence/coherence.csv": the coherence score of several topic models trained on a number of topics from 1 - 40.
"datasets_and_views/": the datasets and visualizations generated using MITAO.
References
[1] Wakefield, A., Murch, S., Anthony, A., Linnell, J., Casson, D., Malik, M., Berelowitz, M., Dhillon, A., Thomson, M., Harvey, P., Valentine, A., Davies, S., & Walker-Smith, J. (1998). RETRACTED: Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children. The Lancet, 351(9103), 637–641. https://doi.org/10.1016/S0140-6736(97)11096-0
[2] Heibi, I., & Peroni, S. (2020). A methodology for gathering and annotating the raw-data/characteristics of the documents citing a retracted article v1 (protocols.io.bdc4i2yw) [Data set]. In protocols.io. ZappyLab, Inc. https://doi.org/10.17504/protocols.io.bdc4i2yw
[3] Ferri, P., Heibi, I., Pareschi, L., & Peroni, S. (2020). MITAO: A User Friendly and Modular Software for Topic Modelling [JD]. PuntOorg International Journal, 5(2), 135–149. https://doi.org/10.19245/25.05.pij.5.2.3
The IG PAS Data Portal is an open data platform of the Institute of Geophysics, Polish Academy of Sciences. The Portal is used to collect and share IG PAS data, including access to data held in external repositories. IG PAS data are open and provided free of charge for non-commercial use, under applicable laws on the sharing of public and publicly funded data. When the data are used by external entities, the data source must be cited following the IG PAS guidelines contained in the metadata of the IG PAS Data Portal (information on how to cite data is included in the Dataset Citation field). The Data Portal is currently a work in progress, and we will be adding much more IG PAS data in the coming months as well as developing a host of new interactive features that let users explore the data in several ways. Any feedback, requests for data and other comments are very welcome. Please use the following email address for contact: data_steward (at) igf.edu.pl
The datacitation extension for CKAN aims to facilitate proper data citation practices within the CKAN data catalog ecosystem. By providing tools and features to create and manage citations for datasets, the extension promotes discoverability and acknowledgment of data sources, enhancing the reproducibility and transparency of research and analysis based on these datasets. The available information is limited, but based on the name, the extension likely focuses on generating, displaying, and potentially exporting citation information.
Key Features (Assumed based on Extension Name):
* Dataset Citation Generation: Likely provides functionality to automatically generate citation strings for datasets based on metadata fields, adhering to common citation formats (e.g., APA, MLA, Chicago).
* Citation Metadata Management: Potentially offers tools to manage citation-related metadata within datasets, such as author names, publication dates, and version numbers, which are essential elements for creating accurate citations.
* Citation Display on Dataset Pages: It's reasonable to expect that the extension displays the generated citation information prominently on the dataset's display page, facilitating easy access for users.
* Citation Export Options: May provide options to export citations in various formats (e.g., BibTeX, RIS) to integrate with reference management software popular among researchers.
* Citation Style Customization: Possibly provides configuration options to customize the citation style used for generation, accommodating different disciplinary requirements.
Use Cases (Inferred):
1. Research Data Repositories: Data repositories can utilize datacitation to ensure that researchers cite datasets correctly, which is crucial for tracking the impact of data and recognizing the contributions of data creators.
2. Government Data Portals: Government agencies can implement the extension to promote the proper use and attribution of open government datasets, fostering transparency and accountability.
Technical Integration: Due to limited information, the integration details are speculative. However, it can be assumed that the datacitation extension likely integrates with CKAN by:
* Adding a new plugin or module to CKAN that handles citation generation and display.
* Extending the CKAN dataset schema to include citation-related metadata fields.
* Potentially providing API endpoints for programmatic access to citation information.
Benefits & Impact: The anticipated benefits of the datacitation extension include:
* Improved data discoverability and reusability through proper citation practices.
* Enhanced research reproducibility and transparency by ensuring that data sources are properly acknowledged.
* Increased recognition of data creators and contributors.
* Simplified citation management for users of CKAN-based data catalogs.
Disclaimer: The above information is largely based on assumptions derived from the extension's name and common data citation practices. The actual features and capabilities of the datacitation extension may vary due to the unavailability of a README file.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Within the ESA funded WorldCereal project we have built an open harmonized reference data repository at global extent for model training or product validation in support of land cover and crop type mapping. Data from 2017 onwards were collected from many different sources and then harmonized, annotated and evaluated. These steps are explained in the harmonization protocol (10.5281/zenodo.7584463). This protocol also clarifies the naming convention of the shape files and the WorldCereal attributes (LC, CT, IRR, valtime and sampleID) that were added to the original data sets.
This publication includes those harmonized data sets of which the original data set was published under the CC-BY license or a license similar to CC-BY. See document "_In-situ-data-World-Cereal - license - CC-BY.pdf" for an overview of the original data sets.
Current strategies for primary data management aimed at effective data publication, retrieval, sharing and citation require enhanced platforms. One driving force is recent developments in biotechnology, which are accompanied by strong growth in scientific primary data. For example, “Next-Generation-Sequencing” or “Plant-Phenotyping” technologies produce a huge amount of primary data. The analysis and publication of these data is one pillar of modern life science research. Consequently, the responsible use and efficient availability of digital resources is an important factor in the current “e-science” age. The Java-based e!DAL API is a comprehensive storage backend for primary data management. It provides the main features needed for the long-term preservation of scientific primary data and has been designed and tested using experience from several research projects and literature studies.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file collection is part of the ORD Landscape and Cost Analysis Project (DOI: 10.5281/zenodo.2643460), a study jointly commissioned by the SNSF and swissuniversities in 2018.
Please cite this data collection as: von der Heyde, M. (2019). Data and tools of the landscape and cost analysis of data repositories currently used by the Swiss research community. Retrieved from https://doi.org/10.5281/zenodo.2643495
Connected data papers are:
von der Heyde, M. (2019). Open Data Landscape: Repository Usage of the Swiss Research Community: Description of collection, collected data, and analysis methods [Data paper]. Retrieved from https://doi.org/10.5281/zenodo.2643430
von der Heyde, M. (2019). International Open Data Repository Survey: Description of collection, collected data, and analysis methods [Data paper]. Retrieved from https://doi.org/10.5281/zenodo.2643450
Connected data sets are:
von der Heyde, M. (2019). Data from the Swiss Open Data Repository Landscape survey. Retrieved from https://doi.org/10.5281/zenodo.2643487
von der Heyde, M. (2019). Data from the International Open Data Repository Survey. Retrieved from https://doi.org/10.5281/zenodo.2643493
Contact
Swiss National Science Foundation (SNSF)
Open Research Data Group
E-mail: ord@snf.ch
swissuniversities
Program "Scientific Information"
Gabi Schneider
E-Mail: isci@swissuniversities.ch
http://rightsstatements.org/vocab/InC/1.0/
Dataset available only to University of Arizona affiliates. To obtain access, you must log in to ReDATA with your NetID. Data is for research use by each individual downloader only. Sharing and/or redistribution of any portion of this dataset is prohibited.
This ReferenceUSA dataset from Data Axle (formerly Infogroup) contains household data about US consumers in annual snapshots from 2006-2021. It includes details such as family demographics, income, home ownership status, lifestyle, location and more, which can help users to create marketing plans and conduct competitive analyses.
Consumer profiles are described with 58-66 indicators. Data for all states are combined into single files for each year between 2006 and 2012 while there is a file for each state in 2013-2021. The Layout - Consumer DB Historical 2006-2012.xlsx in Documentation.zip applies to 2006-2012. Codebooks for 2013, 2014, 2015, 2017, 2018, 2019 and 2021 are not included but files in 2013-2021 have similar layouts therefore 2016 Historical Residential File Layout.xlsx and 2020 Historical Residential File Layout.xlsx in Documentation.zip apply to 2013-2021.
The University of Arizona University Libraries also subscribe to Data Axle Reference Solutions which provides this data in a searchable, online database with historical data available going back to 2003.
NOTE: The uncompressed datasets are very large.
Detailed file descriptions and MD5 hash values for each file can be found in the README.txt file.
For inquiries regarding the contents of this dataset, please contact the Corresponding Author listed in the README.txt file. Administrative inquiries (e.g., removal requests, trouble downloading, etc.) can be directed to data-management@arizona.edu
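Because the files are large and MD5 values are published in README.txt, it can be worth checking each download before use. The following is a minimal sketch using Python's standard `hashlib`; the function names and chunk size are illustrative choices, and the expected hash must be copied by hand from README.txt, whose exact layout is not assumed here.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for very large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 with an expected value, ignoring case and spaces."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `verify("some_file.zip", "<value from README.txt>")` returns `True` only when the download matches the published hash; the file name here is a placeholder. MD5 is adequate for detecting transfer corruption, though it is not a safeguard against deliberate tampering.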