100+ datasets found
  1. Data Definition Guidelines

    • catalog.data.gov
    • data.virginia.gov
    Updated Sep 8, 2025
    Cite
    Administration for Children and Families (2025). Data Definition Guidelines [Dataset]. https://catalog.data.gov/dataset/data-definition-guidelines
    Dataset updated
    Sep 8, 2025
    Dataset provided by
    Administration for Children and Families
    Description

    ACF agency-wide resource. Metadata-only record linking to the original dataset. Open original dataset below.

  2. Dataset: A Systematic Literature Review on the topic of High-value datasets

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 23, 2023
    Cite
    Anastasija Nikiforova; Nina Rizun; Magdalena Ciesielska; Charalampos Alexopoulos; Andrea Miletič (2023). Dataset: A Systematic Literature Review on the topic of High-value datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7944424
    Dataset updated
    Jun 23, 2023
    Dataset provided by
    University of the Aegean
    Gdańsk University of Technology
    University of Zagreb
    University of Tartu
    Authors
    Anastasija Nikiforova; Nina Rizun; Magdalena Ciesielska; Charalampos Alexopoulos; Andrea Miletič
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains data collected during a study ("Towards High-Value Datasets determination for data-driven development: a systematic literature review") conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb). It is being made public both to act as supplementary data for the paper "Towards High-Value Datasets determination for data-driven development: a systematic literature review" (a pre-print is available in Open Access at https://arxiv.org/abs/2305.10234) and so that other researchers can use these data in their own work.

    The protocol is intended for the systematic literature review (SLR) on the topic of high-value datasets (HVD), which aims to gather information on how HVD and their determination have been reflected in the literature over the years and what these studies have found to date, including the indicators used in them, involved stakeholders, data-related aspects, and frameworks. The data in this dataset were collected as a result of the SLR over Scopus, Web of Science, and the Digital Government Research library (DGRL) in 2023.

    Methodology

    To understand how HVD determination has been reflected in the literature over the years and what these studies have found to date, all relevant literature covering this topic was studied. To this end, the SLR was carried out by searching the digital libraries covered by Scopus, Web of Science (WoS), and the Digital Government Research library (DGRL).

    These databases were queried for the keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), which were applied to the article title, keywords, and abstract to limit the results to papers in which these objects were primary research objects rather than merely mentioned in the body, e.g., as future work. After deduplication, 11 unique articles were found and were further checked for relevance. As a result, a total of 9 articles were further examined. Each study was independently examined by at least two authors.
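
The keyword filter and the deduplication step described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout (`title`, `abstract`), the function names, and the title-based deduplication key are hypothetical and are not taken from the study, which ran its queries inside Scopus, WoS and DGRL.

```python
import re

def matches_query(text: str) -> bool:
    """Check ("open data" OR "open government data") AND
    ("high-value data*" OR "high value data*") against a text field."""
    t = text.lower()
    has_open = "open data" in t or "open government data" in t
    # "data*" is a wildcard, so "dataset" and "datasets" also match.
    has_hvd = re.search(r"high[- ]value data\w*", t) is not None
    return has_open and has_hvd

def deduplicate(records):
    """Keep the first record per normalised title (assumed dedup key)."""
    seen = set()
    unique = []
    for rec in records:
        key = re.sub(r"\W+", " ", rec["title"].lower()).strip()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

# Hypothetical search results, for illustration only.
records = [
    {"title": "High-value datasets in open government data", "abstract": ""},
    {"title": "High-Value Datasets in Open Government Data!", "abstract": ""},
    {"title": "Open data portals", "abstract": "A survey of portals."},
]
unique = deduplicate(records)
relevant = [r for r in unique if matches_query(r["title"] + " " + r["abstract"])]
```

Normalising titles before comparison catches the same paper indexed with different punctuation or capitalisation in two databases; a DOI, where present, would be a stricter key.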

    To attain the objective of our study, we developed the protocol, where the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information.

    Test procedure

    Each study was independently examined by at least two authors: after an in-depth examination of the full text of the article, the structured protocol was filled in for each study. The structure of the survey is available in the supplementary files (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx). The data collected for each study by two researchers were then synthesized into one final version by a third researcher.

    Description of the data in this data set

    Protocol_HVD_SLR provides the structure of the protocol.
    Spreadsheet #1 provides the filled protocol for relevant studies.
    Spreadsheet #2 provides the list of results after the search over three indexing databases, i.e. before filtering out irrelevant studies.

    The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information

    Descriptive information
    1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet
    2) Complete reference - the complete source information to refer to the study
    3) Year of publication - the year in which the study was published
    4) Journal article / conference paper / book chapter - the type of the paper {journal article, conference paper, book chapter}
    5) DOI / Website - a link to the website where the study can be found
    6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science
    7) Availability in OA - availability of an article in the Open Access
    8) Keywords - keywords of the paper as indicated by the authors
    9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}

    Approach- and research design-related information
    10) Objective / RQ - the research objective / aim, established research questions
    11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analysis (country, organisation, specific unit that has been analysed, e.g., the number of use-cases, scope of the SLR etc.)
    12) Contributions - the contributions of the study
    13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach?
    14) Availability of the underlying research data - whether there is a reference to the publicly available underlying research data e.g., transcriptions of interviews, collected data, or explanation why these data are not shared?
    15) Period under investigation - period (or moment) in which the study was conducted
    16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study?

    Quality- and relevance-related information
    17) Quality concerns - whether there are any quality concerns (e.g., limited information about the research methods used)?
    18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused around the HVD determination, secondary - mentioned but not studied (e.g., as part of discussion, future work etc.))

    HVD determination-related information
    19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term?
    20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, "input -> output")
    21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the relationships between these components? (detailed description)
    22) Stakeholders and their roles - what stakeholders or actors does HVD determination involve? What are their roles?
    23) Data - what data do HVD cover?
    24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)

    Format of the file: .xls, .csv (for the first spreadsheet only), .odt, .docx

    Licenses or restrictions: CC-BY

    For more info, see README.txt

  3. Meta data and supporting documentation

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Meta data and supporting documentation [Dataset]. https://catalog.data.gov/dataset/meta-data-and-supporting-documentation
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency http://www.epa.gov/
    Description

    We include a description of the data sets in the meta-data as well as sample code and results from a simulated data set.

    This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

    It can be accessed through the following means: The R code is available online here: https://github.com/warrenjl/SpGPCW.

    Format:

    Abstract: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

    Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    Description

    Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

    File format: R workspace file.

    Metadata (including data dictionary)
    • y: Vector of binary responses (1: preterm birth, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
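
The week-by-week standardization described above (subtract the weekly median, divide by the weekly IQR) can be sketched as below. This is not the authors' code, which is in R at the repository linked above; the function name `standardize_by_week`, the list-of-rows layout and the quartile method are assumptions made for illustration, since the description does not say how the quartiles were computed.

```python
from statistics import median, quantiles

def standardize_by_week(z):
    """Standardize each column (week) of z as (value - median) / IQR.

    z is a list of rows, one per individual; each row holds one raw
    exposure value per week. Assumes every week has a non-zero IQR.
    """
    n_weeks = len(z[0])
    out = [[0.0] * n_weeks for _ in z]
    for j in range(n_weeks):
        column = [row[j] for row in z]
        # Quartile definition is an assumption ("inclusive" method).
        q1, _, q3 = quantiles(column, n=4, method="inclusive")
        med = median(column)
        iqr = q3 - q1
        for i, row in enumerate(z):
            out[i][j] = (row[j] - med) / iqr
    return out

# Hypothetical raw exposures: 5 individuals, 2 weeks.
raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
standardized = standardize_by_week(raw)
```

Because the released z matrix is already standardized and the medians and IQRs are withheld, the transformation cannot be inverted to recover raw exposure levels from the public file.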

  4. Medical Service Study Area Data Dictionary

    • gis.data.chhs.ca.gov
    • data.ca.gov
    • +4 more
    Updated Sep 6, 2024
    Cite
    CA Department of Health Care Access and Information (2024). Medical Service Study Area Data Dictionary [Dataset]. https://gis.data.chhs.ca.gov/datasets/hcai::medical-service-study-area-data-dictionary
    Dataset updated
    Sep 6, 2024
    Dataset provided by
    Department of Health Care Access and Information
    Authors
    CA Department of Health Care Access and Information
    Description

    Field Name   Data Type  Description
    Statefp      Number     US Census Bureau unique identifier of the state
    Countyfp     Number     US Census Bureau unique identifier of the county
    Countynm     Text       County name
    Tractce      Number     US Census Bureau unique identifier of the census tract
    Geoid        Number     US Census Bureau unique identifier of the state + county + census tract
    Aland        Number     US Census Bureau defined land area of the census tract
    Awater       Number     US Census Bureau defined water area of the census tract
    Asqmi        Number     Area calculated in square miles from the Aland
    MSSAid       Text       ID of the Medical Service Study Area (MSSA) the census tract belongs to
    MSSAnm       Text       Name of the Medical Service Study Area (MSSA) the census tract belongs to
    Definition   Text       Type of MSSA, possible values are urban, rural and frontier.
    TotalPovPop  Number     US Census Bureau total population for whom poverty status is determined of the census tract, taken from the 2020 ACS 5 YR S1701
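
Two of the fields above are derived from others: Geoid concatenates the state, county and tract identifiers, and Asqmi is computed from Aland. The sketch below shows one way these derivations could work. It assumes the usual Census conventions (2-digit state, 3-digit county and 6-digit tract codes, and land area reported in square meters); the data dictionary itself does not state the widths or the unit, and the function names and sample values are hypothetical.

```python
SQ_METERS_PER_SQ_MILE = 2_589_988.110336  # 1609.344 m squared

def build_geoid(statefp: int, countyfp: int, tractce: int) -> str:
    """Concatenate zero-padded state (2), county (3) and tract (6) codes.

    Returned as text so that a leading zero in the state code survives;
    stored as a Number, "06001400100" would become 6001400100.
    """
    return f"{statefp:02d}{countyfp:03d}{tractce:06d}"

def land_area_sq_miles(aland_sq_m: float) -> float:
    """Convert a land area in square meters (assumed unit) to square miles."""
    return aland_sq_m / SQ_METERS_PER_SQ_MILE

# Illustrative values only.
geoid = build_geoid(6, 1, 400100)
asqmi = land_area_sq_miles(2_589_988.110336)
```

When joining this table to other Census products, comparing identifiers as zero-padded text avoids mismatches caused by dropped leading zeros.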

  5. Data from: Data Dictionary Template

    • catalog.data.gov
    • data-academy.tempe.gov
    • +8 more
    Updated Mar 18, 2023
    Cite
    City of Tempe (2023). Data Dictionary Template [Dataset]. https://catalog.data.gov/dataset/data-dictionary-template-2e170
    Dataset updated
    Mar 18, 2023
    Dataset provided by
    City of Tempe
    Description

    Data Dictionary template for Tempe Open Data.

  6. Conceptualization of public data ecosystems

    • data.niaid.nih.gov
    Updated Sep 26, 2024
    Cite
    Anastasija Nikiforova; Martin Lnenicka (2024). Conceptualization of public data ecosystems [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13842001
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    University of Tartu
    University of Hradec Králové
    Authors
    Anastasija Nikiforova; Martin Lnenicka
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains data collected during a study "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" conducted by Martin Lnenicka (University of Hradec Králové, Czech Republic), Anastasija Nikiforova (University of Tartu, Estonia), Mariusz Luterek (University of Warsaw, Warsaw, Poland), Petar Milic (University of Pristina - Kosovska Mitrovica, Serbia), Daniel Rudmark (Swedish National Road and Transport Research Institute, Sweden), Sebastian Neumaier (St. Pölten University of Applied Sciences, Austria), Karlo Kević (University of Zagreb, Croatia), Anneke Zuiderwijk (Delft University of Technology, Delft, the Netherlands), Manuel Pedro Rodríguez Bolívar (University of Granada, Granada, Spain).

    As there is a lack of understanding of the elements that constitute different types of value-adding public data ecosystems and how these elements form and shape the development of these ecosystems over time, which can lead to misguided efforts to develop future public data ecosystems, the aim of the study is: (1) to explore how public data ecosystems have developed over time and (2) to identify the value-adding elements and formative characteristics of public data ecosystems. Using an exploratory retrospective analysis and a deductive approach, we systematically review 148 studies published between 1994 and 2023. Based on the results, this study presents a typology of public data ecosystems and develops a conceptual model of elements and formative characteristics that contribute most to value-adding public data ecosystems, and develops a conceptual model of the evolutionary generation of public data ecosystems represented by six generations called Evolutionary Model of Public Data Ecosystems (EMPDE). Finally, three avenues for a future research agenda are proposed.

    This dataset is being made public to act as supplementary data for "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems", Telematics and Informatics, and for its Systematic Literature Review component that informs the study.

    Description of the data in this data set

    PublicDataEcosystem_SLR provides the structure of the protocol

    Spreadsheet #1 provides the list of results after the search over three indexing databases and filtering out irrelevant studies.

    Spreadsheet #2 provides the protocol structure.

    Spreadsheet #3 provides the filled protocol for relevant studies.

    The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, (4) public data ecosystem-related information

    Descriptive Information

    Article number

    A study number, corresponding to the study number assigned in an Excel worksheet

    Complete reference

    The complete source information to refer to the study (in APA style), including the author(s) of the study, the year in which it was published, the study's title and other source information.

    Year of publication

    The year in which the study was published.

    Journal article / conference paper / book chapter

    The type of the paper, i.e., journal article, conference paper, or book chapter.

    Journal / conference / book

    The journal, conference, or book in which the paper is published.

    DOI / Website

    A link to the website where the study can be found.

    Number of words

    The number of words in the study.

    Number of citations in Scopus and WoS

    The number of citations of the paper in Scopus and WoS digital libraries.

    Availability in Open Access

    Availability of a study in the Open Access or Free / Full Access.

    Keywords

    Keywords of the paper as indicated by the authors (in the paper).

    Relevance for our study (high / medium / low)

    What is the relevance level of the paper for our study?

    Approach- and research design-related information

    Objective / Aim / Goal / Purpose & Research Questions

    The research objective and established RQs.

    Research method (including unit of analysis)

    The methods used to collect data in the study, including the unit of analysis that refers to the country, organisation, or other specific unit that has been analysed such as the number of use-cases or policy documents, number and scope of the SLR etc.

    Study’s contributions

    The study’s contribution as defined by the authors

    Qualitative / quantitative / mixed method

    Whether the study uses a qualitative, quantitative, or mixed methods approach?

    Availability of the underlying research data

    Whether the paper has a reference to the public availability of the underlying research data e.g., transcriptions of interviews, collected data etc., or explains why these data are not openly shared?

    Period under investigation

    Period (or moment) in which the study was conducted (e.g., January 2021-March 2022)

    Use of theory / theoretical concepts / approaches? If yes, specify them

    Does the study mention any theory / theoretical concepts / approaches? If yes, what theory / concepts / approaches? If any theory is mentioned, how is theory used in the study? (e.g., mentioned to explain a certain phenomenon, used as a framework for analysis, tested theory, theory mentioned in the future research section).

    Quality-related information

    Quality concerns

    Whether there are any quality concerns (e.g., limited information about the research methods used)?

    Public Data Ecosystem-related information

    Public data ecosystem definition

    How is the public data ecosystem, or any equivalent term (most often "infrastructure"), defined in the paper? If an alternative term is used, what is the public data ecosystem called in the paper?

    Public data ecosystem evolution / development

    Does the paper define the evolution of the public data ecosystem? If yes, how is it defined and what factors affect it?

    What constitutes a public data ecosystem?

    What constitutes a public data ecosystem (components & relationships) - their "FORM / OUTPUT" presented in the paper (general description with more detailed answers to further additional questions).

    Components and relationships

    What components does the public data ecosystem consist of and what are the relationships between these components? Alternative names for components - element, construct, concept, item, helix, dimension etc. (detailed description).

    Stakeholders

    What stakeholders (e.g., governments, citizens, businesses, Non-Governmental Organisations (NGOs) etc.) does the public data ecosystem involve?

    Actors and their roles

    What actors does the public data ecosystem involve? What are their roles?

    Data (data types, data dynamism, data categories etc.)

    What data does the public data ecosystem cover (is intended / designed for)? Refer to all data-related aspects, including but not limited to data types, data dynamism (static data, dynamic, real-time data, stream), prevailing data categories / domains / topics etc.

    Processes / activities / dimensions, data lifecycle phases

    What processes, activities, dimensions and data lifecycle phases (e.g., locate, acquire, download, reuse, transform, etc.) does the public data ecosystem involve or refer to?

    Level (if relevant)

    What is the level of the public data ecosystem covered in the paper? (e.g., city, municipal, regional, national (=country), supranational, international).

    Other elements or relationships (if any)

    What other elements or relationships does the public data ecosystem consist of?

    Additional comments

    Additional comments (e.g., what other topics affected the public data ecosystems and their elements, what is expected to affect the public data ecosystems in the future, what were important topics by which the period was characterised etc.).

    New papers

    Does the study refer to any other potentially relevant papers?

    Additional references to potentially relevant papers that were found in the analysed paper (snowballing).

    Format of the file: .xls, .csv (for the first spreadsheet only), .docx

    Licenses or restrictions: CC-BY

    For more info, see README.txt

  7. English Wikipedia People Dataset

    • kaggle.com
    zip
    Updated Jul 31, 2025
    Cite
    Wikimedia (2025). English Wikipedia People Dataset [Dataset]. https://www.kaggle.com/datasets/wikimedia-foundation/english-wikipedia-people-dataset
    Available download formats: zip (4293465577 bytes)
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Wikimedia Foundation http://www.wikimedia.org/
    Authors
    Wikimedia
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Summary

    This dataset contains biographical information derived from articles on English Wikipedia as it stood in early June 2024. It was created as part of the Structured Contents initiative at Wikimedia Enterprise and is intended for evaluation and research use.

    The beta sample dataset is a subset of the Structured Contents Snapshot focusing on people with infoboxes in English Wikipedia, output as JSON files (compressed in tar.gz).

    We warmly welcome any feedback you have. Please share your thoughts, suggestions, and any issues you encounter on the discussion page for this dataset here on Kaggle.

    Data Structure

    • File name: wme_people_infobox.tar.gz
    • Size of compressed file: 4.12 GB
    • Size of uncompressed file: 21.28 GB

    Noteworthy Included Fields:
    • name - title of the article.
    • identifier - ID of the article.
    • image - main image representing the article's subject.
    • description - one-sentence description of the article for quick reference.
    • abstract - lead section, summarizing what the article is about.
    • infoboxes - parsed information from the side panel (infobox) on the Wikipedia article.
    • sections - parsed sections of the article, including links. Note: excludes other media/images, lists, tables and references or similar non-prose sections.

    The Wikimedia Enterprise Data Dictionary explains all of the fields in this dataset.

    Stats

    Infoboxes
    • Compressed: 2GB
    • Uncompressed: 11GB

    Infoboxes + sections + short description
    • Size of compressed file: 4.12 GB
    • Size of uncompressed file: 21.28 GB

    Article analysis and filtering breakdown:
    • total # of articles analyzed: 6,940,949
    • # people found with QID: 1,778,226
    • # people found with Category: 158,996
    • # people found with Biography Project: 76,150
    • Total # of people articles found: 2,013,372
    • Total # people articles with infoboxes: 1,559,985

    End stats
    • Total number of people articles in this dataset: 1,559,985
    • that have a short description: 1,416,701
    • that have an infobox: 1,559,985
    • that have article sections: 1,559,921
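
The counts in the breakdown above are internally consistent, which a short check makes explicit. The figures are copied from the listing; the variable names and the reading that the three detection routes are additive are assumptions made for this sketch.

```python
# Counts copied from the filtering breakdown above.
found_by_qid = 1_778_226
found_by_category = 158_996
found_by_biography_project = 76_150
total_people_found = 2_013_372

with_infobox = 1_559_985
with_short_description = 1_416_701
with_sections = 1_559_921

# The three detection routes add up to the reported total.
summed = found_by_qid + found_by_category + found_by_biography_project

# People found without a Wikidata QID (category + biography project);
# this equals the 235,146 figure quoted in the dataset description.
not_tagged_on_wikidata = found_by_category + found_by_biography_project

# Share of detected people articles that carry an infobox, and share of
# the final dataset that has a short description.
infobox_share = with_infobox / total_people_found
short_description_share = with_short_description / with_infobox

# Articles in the final dataset that have no parsed sections.
missing_sections = with_infobox - with_sections
```

Roughly three quarters of the detected people articles have an infobox, and those articles are the ones that make up the final dataset.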

    This dataset includes 235,146 people articles that exist on Wikipedia but aren't yet tagged on Wikidata as instance of:human.

    Maintenance and Support

    This dataset was originally extracted from the Wikimedia Enterprise APIs on June 5, 2024. The information in this dataset may therefore be out of date. This dataset isn't being actively updated or maintained, and has been shared for community use and feedback. If you'd like to retrieve up-to-date Wikipedia articles or data from other Wikiprojects, get started with Wikimedia Enterprise's APIs.

    Initial Data Collection and Normalization

    The dataset is built from the Wikimedia Enterprise HTML “snapshots”: https://enterprise.wikimedia.com/docs/snapshot/ and focuses on the Wikipedia article namespace (namespace 0 (main)).

    Who are the source language producers?

    Wikipedia is a human-generated corpus of free knowledge, written, edited, and curated by a global community of editors since 2001. It is the largest and most accessed educational resource in history, accessed over 20 billion times by half a billion people each month. Wikipedia represents almost 25 years of work by its community: the creation, curation, and maintenance of millions of articles on distinct topics. This dataset includes the biographical contents of the English-language edition of Wikipedia (https://en.wikipedia.org/), written by the community.

    Attribution

    Terms and conditions

    Wikimedia Enterprise provides this dataset under the assumption that downstream users will adhere to the relevant free culture licenses when the data is reused. In situations where attribution is required, reusers should identify the Wikimedia project from which the content was retrieved as the source of the content. Any attribution should adhere to Wikimedia’s trademark policy (available at https://foundation.wikimedia.org/wiki/Trademark_policy) and visual identity guidelines (ava...

  8. Natural Resources Data Dictionary

    • catalog.data.gov
    • datasets.ai
    • +4 more
    Updated Mar 17, 2023
    Cite
    Lake County Illinois GIS (2023). Natural Resources Data Dictionary [Dataset]. https://catalog.data.gov/dataset/natural-resources-data-dictionary-aeff9
    Dataset updated
    Mar 17, 2023
    Dataset provided by
    Lake County Illinois GIS
    Description

    An in-depth description of the various Natural Resources GIS data layers outlining terms of use, update frequency, attribute explanations, and more. District data layers include: Forest Preserve Boundaries and State Park Boundaries.

  9. 20CDA35310305 data information (README and data dictionary)

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Aug 30, 2024
    Cite
    Burchat, Natalie; Sampath, Harini (2024). 20CDA35310305 data information (README and data dictionary) [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001287002
    Dataset updated
    Aug 30, 2024
    Authors
    Burchat, Natalie; Sampath, Harini
    Description

    An intestine-specific Scd1 knockout model was developed by crossing Scd1fl/fl mice with mice expressing Cre recombinase under the control of the Villin promoter to study the specific role of intestinal SCD1 in intestinal and whole-body lipid metabolism. The intestinal, hepatic and plasma lipid content and composition of these mice were evaluated by GC-MS analysis under chow-fed and sucrose-refed conditions. The role of intestinal SCD1 in the regulation of energy balance was also evaluated under chow-fed and high-fat conditions. Bile acid content, composition, and signaling were analyzed. Additionally, metabolic phenotyping, including body composition, indirect calorimetry and glucose tolerance analyses, was conducted.

  10. Data from: Variable definition.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Mar 17, 2023
    Cite
    Min, Liangyu; Huang, Xiaohong; Zhang, Xiaorong; Zhang, Jun; Zeng, Qianqian; Liu, Jiangwei (2023). Variable definition. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000998892
    Dataset updated
    Mar 17, 2023
    Authors
    Min, Liangyu; Huang, Xiaohong; Zhang, Xiaorong; Zhang, Jun; Zeng, Qianqian; Liu, Jiangwei
    Description

    The impact of a chief executive officer’s (CEO’s) functional experience on firm performance has gained the attention of many scholars. However, the measurement of functional experience is rarely disclosed in the public database. Few studies have been conducted on the comprehensive functional experience of CEOs. This paper used the upper echelons theory and obtained deep-level curricula vitae (CVs) data through the named entity recognition technique. First, we mined 15 consecutive years of CEOs’ CVs from 2006 to 2020 from Chinese listed companies. Second, we extracted information throughout their careers and automatically classified their functional hierarchy. Finally, we constructed breadth (functional breadth: functional experience richness) and depth (functional depth: average tenure and the hierarchy of function) for empirical analysis. We found that a CEO’s breadth is significantly negatively related to firm performance, and the quadratic term is significantly positive. A CEO’s depth is significantly positively related to firm performance, and the quadratic term is significantly negative. The research results indicate a u-shaped relationship between a CEO’s breadth and firm performance and an inverted u-shaped relationship between their depth and firm performance. The study’s findings extend the literature on factors influencing firm performance and CEOs’ functional experience. The study expands from the horizontal macro to the vertical micro level, providing new evidence to support the recruitment and selection of high-level corporate talent.

  11. TxDOT Street Definition Data Dictionary

    • geoportal-mpo.opendata.arcgis.com
    • arc-gis-hub-home-arcgishub.hub.arcgis.com
    Updated Apr 24, 2025
    Cite
    Texas Department of Transportation (2025). TxDOT Street Definition Data Dictionary [Dataset]. https://geoportal-mpo.opendata.arcgis.com/documents/2c7c512e64334fb49884613fe745b406
    Explore at:
    Dataset updated
    Apr 24, 2025
    Dataset authored and provided by
    Texas Department of Transportation (http://txdot.gov/)
    Description

    Programmatically generated Data Dictionary document detailing the TxDOT Street Definition service.

        The PDF contains service metadata and a complete list of data fields.
        For any questions or issues related to the document, please contact the data owner of the service identified in the PDF and Credits of this portal item.
    
    
      Related Links
      TxDOT Street Definition Service URL
      TxDOT Street Definition Portal Item
    
  12. Wikipedia Structured Contents

    • kaggle.com
    zip
    Updated Apr 11, 2025
    Cite
    Wikimedia (2025). Wikipedia Structured Contents [Dataset]. https://www.kaggle.com/datasets/wikimedia-foundation/wikipedia-structured-contents
    Explore at:
    Available download formats: zip (25121685657 bytes)
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Wikimedia Foundation (http://www.wikimedia.org/)
    Authors
    Wikimedia
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Summary

    Early beta release of pre-parsed English and French Wikipedia articles including infoboxes. Inviting feedback.

    This dataset contains all articles of the English and French language editions of Wikipedia, pre-parsed and outputted as structured JSON files with a consistent schema. Each JSON line holds the content of one full Wikipedia article stripped of extra markdown and non-prose sections (references, etc.).

    Invitation for Feedback

    The dataset is built as part of the Structured Contents initiative and based on the Wikimedia Enterprise html snapshots. It is an early beta release to improve transparency in the development process and request feedback. This first version includes pre-parsed Wikipedia abstracts, short descriptions, main image links, infoboxes and article sections, excluding non-prose sections (e.g. references). More elements (such as lists and tables) may be added over time. For updates follow the project’s blog and our Mediawiki Quarterly software updates on MediaWiki. As this is an early beta release, we highly value your feedback to help us refine and improve this dataset. Please share your thoughts, suggestions, and any issues you encounter either on the discussion page of Wikimedia Enterprise’s homepage on Meta wiki, or on the discussion page for this dataset here on Kaggle.

    The contents of this dataset of Wikipedia articles are collectively written and curated by a global volunteer community. All original textual content is licensed under the GNU Free Documentation License (GFDL) and the Creative Commons Attribution-Share-Alike 4.0 License. Some text may be available only under the Creative Commons license; see the Wikimedia Terms of Use for details. Text written by some authors may be released under additional licenses or into the public domain.

    The dataset in its structured form is generally helpful for a wide variety of tasks, including all phases of model development, from pre-training to alignment, fine-tuning, updating/RAG as well as testing/benchmarking. We would love to hear more about your use cases.

    Data Fields

    The data fields are the same across all files. Noteworthy fields include:

    • name - title of the article.
    • identifier - ID of the article.
    • url - URL of the article.
    • version - metadata related to the latest specific revision of the article.
      • version.editor - editor-specific signals that can help contextualize the revision.
      • version.scores - assessments by ML models on the likelihood of a revision being reverted.
    • main entity - Wikidata QID the article is related to.
    • abstract - lead section, summarizing what the article is about.
    • description - one-sentence description of the article for quick reference.
    • image - main image representing the article's subject.
    • infoboxes - parsed information from the side panel (infobox) on the Wikipedia article.
    • sections - parsed sections of the article, including links. Note: excludes other media/images, lists, tables and references or similar non-prose sections.

    Full data dictionary is available here: https://enterprise.wikimedia.com/docs/data-dictionary/
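
Since each JSON line is described as holding one full article, a reader might stream the file line by line rather than load it whole. The following is a minimal sketch under assumptions: it uses only the field names listed above, assumes `sections` is a list, and runs on a made-up sample line rather than the real data. The function name `iter_articles` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_articles(lines: Iterable[str]) -> Iterator[dict]:
    """Yield a small summary dict for each non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        article = json.loads(line)
        yield {
            "name": article.get("name"),
            "identifier": article.get("identifier"),
            "url": article.get("url"),
            "abstract": article.get("abstract"),
            # Assumption: "sections" is a list; a missing field counts as empty.
            "section_count": len(article.get("sections") or []),
        }


# Made-up sample line, for illustration only (not taken from the dataset).
sample = [
    '{"name": "Example", "identifier": 1, "url": "https://example.org/wiki/Example", '
    '"abstract": "An example article.", "sections": [{"name": "History"}]}',
    "",
]
summaries = list(iter_articles(sample))
```

With a real file, the same function could be given an open file handle, since iterating over a text file yields one line at a time.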

    Curation Rationale

    This dataset has been created as part of the larger Structured Contents initiative at Wikimedia Enterprise with the aim of making Wikimedia data more machine readable. These efforts are both focused on pre-parsing Wikipedia snippets as well as connecting the different projects closer together. Even if Wikipedia is very structured to the human eye, it is a non-triv...

  13. National concept directory in National data catalogue

    • data.norge.no
    json
    Updated Oct 9, 2025
    Cite
    Digitaliseringsdirektoratet (2025). National concept directory in National data catalogue [Dataset]. https://data.norge.no/en/datasets/8fbe9c6d-4962-3362-9952-62d9d7ce17bf/national-concept-directory-in-national-data-catalogue
    Explore at:
    Available download formats: json
    Dataset updated
    Oct 9, 2025
    Dataset provided by
    Digitaliseringsdirektoratet
    Description

    The data set "National concept directory in National data catalogue" (Begrepskatalog i Felles datakatalog) contains all terms published in National concept directory in National data catalogue. Each term contains at least information about the recommended term, definition and source of definition. The terms may also include the following information if the owner of the concept has provided such information: additional information about the meaning of the term that does not belong in the definition field; permitted and advised term, example of use of the term, subject area the term belongs to, area of application, legal categories or value ranges of the term, the date the term is valid from, the date the term shall apply to and contact information by e-mail and telephone.

    Objective: To make all concepts in the National concept directory in National data catalogue available for downloading

  14. Data from: Delta Neighborhood Physical Activity Study

    • s.cnmilf.com
    • agdatacommons.nal.usda.gov
    • +1more
    Updated Jun 5, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Delta Neighborhood Physical Activity Study [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/delta-neighborhood-physical-activity-study-f82d7
    Explore at:
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    The Delta Neighborhood Physical Activity Study was an observational study designed to assess characteristics of neighborhood built environments associated with physical activity. It was an ancillary study to the Delta Healthy Sprouts Project and therefore included towns and neighborhoods in which Delta Healthy Sprouts participants resided. The 12 towns were located in the Lower Mississippi Delta region of Mississippi. Data were collected via electronic surveys between August 2016 and September 2017 using the Rural Active Living Assessment (RALA) tools and the Community Park Audit Tool (CPAT). Scale scores for the RALA Programs and Policies Assessment and the Town-Wide Assessment were computed using the scoring algorithms provided for these tools via SAS software programming. The Street Segment Assessment and CPAT do not have associated scoring algorithms and therefore no scores are provided for them. Because the towns were not randomly selected and the sample size is small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi.

    Dataset one contains data collected with the RALA Programs and Policies Assessment (PPA) tool. Dataset two contains data collected with the RALA Town-Wide Assessment (TWA) tool. Dataset three contains data collected with the RALA Street Segment Assessment (SSA) tool. Dataset four contains data collected with the Community Park Audit Tool (CPAT). [Note : title changed 9/4/2020 to reflect study name]

    Resources in this dataset:

    • Resource Title: Dataset One RALA PPA Data Dictionary. File Name: RALA PPA Data Dictionary.csv. Resource Description: Data dictionary for dataset one collected using the RALA PPA tool. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
    • Resource Title: Dataset Two RALA TWA Data Dictionary. File Name: RALA TWA Data Dictionary.csv. Resource Description: Data dictionary for dataset two collected using the RALA TWA tool. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
    • Resource Title: Dataset Three RALA SSA Data Dictionary. File Name: RALA SSA Data Dictionary.csv. Resource Description: Data dictionary for dataset three collected using the RALA SSA tool. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
    • Resource Title: Dataset Four CPAT Data Dictionary. File Name: CPAT Data Dictionary.csv. Resource Description: Data dictionary for dataset four collected using the CPAT. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
    • Resource Title: Dataset One RALA PPA. File Name: RALA PPA Data.csv. Resource Description: Data collected using the RALA PPA tool. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
    • Resource Title: Dataset Two RALA TWA. File Name: RALA TWA Data.csv. Resource Description: Data collected using the RALA TWA tool. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
    • Resource Title: Dataset Three RALA SSA. File Name: RALA SSA Data.csv. Resource Description: Data collected using the RALA SSA tool. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
    • Resource Title: Dataset Four CPAT. File Name: CPAT Data.csv. Resource Description: Data collected using the CPAT. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
    • Resource Title: Data Dictionary. File Name: DataDictionary_RALA_PPA_SSA_TWA_CPAT.csv. Resource Description: This is a combined data dictionary from each of the 4 dataset files in this set.

  15. CRITEO FAIRNESS IN JOB ADS DATASET

    • kaggle.com
    zip
    Updated Jul 1, 2024
    Cite
    Md. Abdur Rahman (2024). CRITEO FAIRNESS IN JOB ADS DATASET [Dataset]. https://www.kaggle.com/datasets/borhanitrash/fairness-in-job-ads-dataset
    Explore at:
    Available download formats: zip (201430692 bytes)
    Dataset updated
    Jul 1, 2024
    Authors
    Md. Abdur Rahman
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Summary

    This dataset is released by Criteo to foster research and innovation on Fairness in Advertising and AI systems in general. See also Criteo pledge for Fairness in Advertising.

    The dataset is intended for learning click prediction models and evaluating how much their predictions are biased between different gender groups.

    Data description

    The dataset contains pseudonymized users' context and publisher features that were collected from a job targeting campaign run for 5 months by the AdTech company Criteo. Each line represents a product that was shown to a user. Each user has an impression session where they can see several products at the same time. Each product can be clicked or not clicked by the user. The dataset consists of 1072226 rows and 55 columns.

    • features
      • user_id is a unique identifier assigned to each user. This identifier has been anonymized and does not contain any information related to the real users.
      • product_id is a unique identifier assigned to each product, i.e. job offer.
      • impression_id is a unique identifier assigned to each impression, i.e. online session that can have several products at the same time.
      • cat0 to cat5 are anonymized categorical user features.
      • cat6 to cat12 are anonymized categorical product features.
      • num13 to num47 are anonymized numerical user features.
    • labels
      • protected_attribute is a binary feature that describes user gender proxy, i.e. female is 0, male is 1. The detailed description on the meaning can be found below.
      • senior is a binary feature that describes the seniority of the job position, i.e. an assistant role is 0, a managerial role is 1. This feature was created during data processing step from the product title feature: if the product title contains words describing managerial role (e.g. 'president', 'ceo', and others), it is assigned to 1, otherwise to 0.
      • rank is a numerical feature that corresponds to the positional rank of the product on the display for given impression_id. Usually, the position on the display creates the bias with respect to the click: lower rank means higher position of the product on the display.
      • displayrandom is a binary feature that equals 1 if the display position on the banner of the products associated with the same impression_id was randomized. The click-rank metric should be computed on displayrandom = 1 to avoid positional bias.
      • click is a binary feature that equals 1 if the product product_id in the impression impression_id was clicked by the user user_id.
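
As a rough illustration of how the labels above fit together, the sketch below compares click rates between the two `protected_attribute` groups, restricted to rows with `displayrandom = 1` so that display position is randomized. This is a plain click-rate comparison, not the click-rank metric mentioned above (which is not defined in this description). The function name and the sample rows are made up for illustration.

```python
from collections import defaultdict


def click_rate_by_group(rows):
    """Return {protected_attribute: click rate} over rows with displayrandom == 1."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for row in rows:
        # Keep only impressions whose display positions were randomized.
        if int(row["displayrandom"]) != 1:
            continue
        group = int(row["protected_attribute"])
        shown[group] += 1
        clicked[group] += int(row["click"])
    return {group: clicked[group] / shown[group] for group in shown}


# Made-up rows, for illustration only (not taken from the dataset).
sample_rows = [
    {"protected_attribute": 0, "displayrandom": 1, "click": 1},
    {"protected_attribute": 0, "displayrandom": 1, "click": 0},
    {"protected_attribute": 1, "displayrandom": 1, "click": 0},
    {"protected_attribute": 1, "displayrandom": 1, "click": 0},
    {"protected_attribute": 1, "displayrandom": 0, "click": 1},  # ignored: not randomized
]
rates = click_rate_by_group(sample_rows)
gap = rates[0] - rates[1]  # difference in click rate between the two groups
```

Rows could equally come from `csv.DictReader`, since the function converts each value with `int`.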

    Data statistics

    | dimension | average |
    |:---|:---|
    | click | 0.077 |
    | protected attribute | 0.500 |
    | senior | 0.704 |

    License

    The data is released under the CC-BY-NC-SA 4.0 license. You are free to Share and Adapt this data provided that you respect the Attribution, NonCommercial and ShareAlike conditions. Please read carefully the full license before using.

    Protected attribute

    As Criteo does not have access to user demographics we report a proxy of gender as protected attribute. This proxy is reported as binary for simplicity yet we acknowledge gender is not necessarily binary.

    The value of the proxy is computed as the majority of gender attributes of products seen in the user timeline. Products having a gender attribute are typically fashion and clothing. We acknowledge that this proxy does not necessarily represent how users relate to a given gender yet we believe it to be a realistic approximation for research purposes.

    We encourage research in Fairness defined with respect to other attributes as well.

    Limitations and interpretations

    We remark that the proposed gender proxy does not give a definition of the gender. Since we do not have access to the sensitive information, this is the best solution we have identified at this stage to identify bias on pseudonymised data, and we encourage any discussion on better approximations. This proxy is reported as binary for simplicity yet we acknowledge gender is not necessarily binary. Although our research focuses on gender, this should not diminish the importance of investigating other types of algorithmic discrimination. While this dataset provides an important application of fairness-aware algorithms in a high-risk domain, there are several fundamental limitations that cannot be addressed easily through data collection or curation processes. These limitations in...

  16. Data dictionary from: Gridded National Soil Survey Geographic Database...

    • figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Ag Data Commons (2023). Data dictionary from: Gridded National Soil Survey Geographic Database (gNATSGO) [Dataset]. http://doi.org/10.6084/m9.figshare.19108361.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ag Data Commons
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data dictionary for Gridded National Soil Survey Geographic Database (gNATSGO). https://data.nal.usda.gov/node/23067

    gNATSGO has a schema that is very similar to that of SSURGO and STATSGO2. A CSV version of the data dictionary is presented.

    A data dictionary typically provides a detailed description for each element or variable in a dataset or data model. Data dictionaries are used to document important and useful information such as a descriptive name, the data type, allowed values, units, and text description.

    Dataset citation: (dataset) Soil Survey Staff. Gridded National Soil Survey Geographic (gNATSGO) Database for [State name -or- the Conterminous United States]. United States Department of Agriculture, Natural Resources Conservation Service. Available online at https://nrcs.app.box.com/v/soils. Month, day, year.

  17. Trail Centerline Data Dictionary

    • catalog.data.gov
    • data-test-lakecountyil.opendata.arcgis.com
    • +1more
    Updated Mar 17, 2023
    Cite
    Lake County Illinois GIS (2023). Trail Centerline Data Dictionary [Dataset]. https://catalog.data.gov/dataset/trail-centerline-data-dictionary-5f8be
    Explore at:
    Dataset updated
    Mar 17, 2023
    Dataset provided by
    Lake County Illinois GIS
    Description

    An in-depth description of the Trail Centerline GIS dataset outlining terms of use, update frequency, attribute explanations, and more.

  18. Elevation, Flow Accumulation, Flow Direction, and Stream Definition Data in...

    • data.usgs.gov
    • datasets.ai
    • +2more
    Updated Dec 8, 2023
    + more versions
    Cite
    Lindsey Schafer; Jennifer Sharpe (2023). Elevation, Flow Accumulation, Flow Direction, and Stream Definition Data in Support of the Illinois StreamStats Upgrade to the Basin Delineation Database [Dataset]. http://doi.org/10.5066/P9YIAUZQ
    Explore at:
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Lindsey Schafer; Jennifer Sharpe
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    2023
    Area covered
    Illinois
    Description

    The U.S. Geological Survey (USGS), in cooperation with the Illinois Center for Transportation and the Illinois Department of Transportation, prepared hydro-conditioned geographic information systems (GIS) layers for use in the Illinois StreamStats application. These data were used to delineate drainage basins and compute basin characteristics for updated peak flow and flow duration regression equations for Illinois. This dataset consists of raster grid files for elevation (dem), flow accumulation (fac), flow direction (fdr), and stream definition (str900) for each 8-digit Hydrologic Unit Code (HUC) area in Illinois merged into a single dataset. There are 51 full or partial HUC 8s represented by this data set: 04040002, 05120108, 05120109, 05120111, 05120112, 05120113, 05120114, 05120115, 05140202, 05140203, 05140204, 05140206, 07060005, 07080101, 07080104, 07090001, 07090002, 07090003, 07090004, 07090005, 07090006, 07090007, 07110001, 07110004, 07110009, 07120001, 07120002, 071200 ...

  19. Semantic Similarity with Concept Senses: new Experiment

    • data.mendeley.com
    Updated Oct 24, 2022
    + more versions
    Cite
    Francesco Taglino (2022). Semantic Similarity with Concept Senses: new Experiment [Dataset]. http://doi.org/10.17632/v2bwh7z8kj.1
    Explore at:
    Dataset updated
    Oct 24, 2022
    Authors
    Francesco Taglino
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset represents the results of the experimentation of a method for evaluating semantic similarity between concepts in a taxonomy. The method is based on the information-theoretic approach and allows senses of concepts in a given context to be considered. Relevance of senses is calculated in terms of semantic relatedness with the compared concepts. In a previous work [9], the adopted semantic relatedness method was the one described in [10], while in this work we also adopted the ones described in [11], [12], [13], [14], [15], and [16].

    We applied our proposal by extending 7 methods for computing semantic similarity in a taxonomy, selected from the literature. The methods considered in the experiment are referred to as R[2], W&P[3], L[4], J&C[5], P&S[6], A[7], and A&M[8]
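
For readers unfamiliar with the information-theoretic approach, the sketch below shows the basic Resnik [2] and Lin [4] similarity formulas on a toy taxonomy. It does not reproduce the sense-aware extension evaluated in this dataset. The taxonomy, the concept probabilities, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy taxonomy (child -> parent) with made-up concept probabilities.
PARENT = {"car": "vehicle", "bicycle": "vehicle", "vehicle": "entity", "fork": "entity"}
PROB = {"entity": 1.0, "vehicle": 0.5, "car": 0.2, "bicycle": 0.1, "fork": 0.05}


def ancestors(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def information_content(concept):
    """IC(c) = -log p(c): rarer concepts carry more information."""
    return -math.log(PROB[concept])


def resnik(c1, c2):
    """Resnik similarity: IC of the most informative common subsumer."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(information_content(c) for c in common)


def lin(c1, c2):
    """Lin similarity: shared IC normalized by the IC of both concepts."""
    denominator = information_content(c1) + information_content(c2)
    return 2 * resnik(c1, c2) / denominator if denominator else 1.0
```

In this toy setup, "car" and "bicycle" share the subsumer "vehicle", so their Resnik similarity equals the information content of "vehicle", while "car" and "fork" share only the root and score zero.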

    The experiment was run on the well-known Miller and Charles benchmark dataset [1] for assessing semantic similarity.

    The results are organized in seven folders, each with the results related to one of the above semantic relatedness methods. In each folder there is a set of files, each referring to one pair of the Miller and Charles dataset. In fact, for each pair of concepts, all the 28 pairs are considered as possible different contexts.

    REFERENCES

    [1] Miller G.A., Charles W.G. 1991. Contextual correlates of semantic similarity. Language and Cognitive Processes 6(1).
    [2] Resnik P. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy. Int. Joint Conf. on Artificial Intelligence, Montreal.
    [3] Wu Z., Palmer M. 1994. Verb semantics and lexical selection. 32nd Annual Meeting of the Associations for Computational Linguistics.
    [4] Lin D. 1998. An Information-Theoretic Definition of Similarity. Int. Conf. on Machine Learning.
    [5] Jiang J.J., Conrath D.W. 1997. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy. Inter. Conf. Research on Computational Linguistics.
    [6] Pirrò G. 2009. A Semantic Similarity Metric Combining Features and Intrinsic Information Content. Data Knowl. Eng, 68(11).
    [7] Adhikari A., Dutta B., Dutta A., Mondal D., Singh S. 2018. An intrinsic information content-based semantic similarity measure considering the disjoint common subsumers of concepts of an ontology. J. Assoc. Inf. Sci. Technol. 69(8).
    [8] Adhikari A., Singh S., Mondal D., Dutta B., Dutta A. 2016. A Novel Information Theoretic Framework for Finding Semantic Similarity in WordNet. CoRR, arXiv:1607.05422, abs/1607.05422.
    [9] Formica A., Taglino F. 2021. An Enriched Information-Theoretic Definition of Semantic Similarity in a Taxonomy. IEEE Access, vol. 9.
    [10] Information Content-based approach [Schuhmacher and Ponzetto, 2014].
    [11] Linked Data Semantic Distance (LDSD) [Passant, 2010].
    [12] Wikipedia Link-based Measure (WLM) [Witten and Milne, 2008].
    [13] Linked Open Data Description Overlap-based approach (LODDO) [Zhou et al. 2012].
    [14] Exclusivity-based [Hulpuş et al 2015].
    [15] ASRMP [El Vaigh et al. 2020].
    [16] LDSDGN [Piao and Breslin, 2016].

  20. US Industry Data by State, by Industry

    • kaggle.com
    zip
    Updated Jan 15, 2023
    Cite
    The Devastator (2023). US Industry Data by State, by Industry [Dataset]. https://www.kaggle.com/datasets/thedevastator/2012-us-industry-data-by-state-by-industry
    Explore at:
    Available download formats: zip (53066 bytes)
    Dataset updated
    Jan 15, 2023
    Authors
    The Devastator
    Area covered
    United States
    Description

    US Industry Data by State, by Industry

    Number of Establishments, Sales, Payroll, and Employees

    By Gary Hoover [source]

    About this dataset

    This data set provides a detailed look into the US economy. It includes information on establishments and nonemployer businesses, as well as sales revenue, payrolls, and the number of employees. Gleaned from the Economic Census done every five years, this data is a valuable resource to anyone curious about where the nation was economically at the time. Its columns include geographic area name, North American Industry Classification System (NAICS) codes for industries, descriptions of those codes, the meaning of the type of operation or tax status code, and annual payroll. Whether you’re a researcher studying industry patterns or an entrepreneur looking for market insight, this dataset is a useful starting point.

    How to use the dataset

    This dataset provides detailed US industry data by state, including the number of establishments, value of sales, payroll, and number of employees. All the data is based on the North American Industry Classification System (NAICS) code for each specific industry. This will allow you to easily analyze and compare industries across different states or regions.
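
A comparison across states could start by grouping rows by geographic area. The sketch below is a minimal example under assumptions: it relies only on the `Geographic area name` and `NAICS code` column names from the column list in this description, and it runs on made-up rows rather than the real CSV. The function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def naics_codes_by_area(csv_file):
    """Map each geographic area name to the set of NAICS codes listed for it."""
    codes = defaultdict(set)
    for row in csv.DictReader(csv_file):
        codes[row["Geographic area name"]].add(row["NAICS code"])
    return dict(codes)


# Made-up rows, for illustration only; real values come from the CSV file.
sample = io.StringIO(
    "Geographic area name,NAICS code\n"
    "Texas,11\n"
    "Texas,21\n"
    "Ohio,11\n"
)
by_area = naics_codes_by_area(sample)
shared = by_area["Texas"] & by_area["Ohio"]  # industries present in both areas
```

For the real data, the same function could be passed a file opened with `open(path, newline="")`, as the `csv` module documentation recommends.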

    Research Ideas

    • Analyzing the economic impact of a new business or industry trends in different states: Comparing the change in the number of establishments, payroll, and employees over time can give insight into how a state is affected by a new industry trend or introduction of a new service or product.
    • Estimating customer sales potential for businesses: This dataset can be used to estimate the potential customer base for businesses in different geographic areas. Analyzing total business done by non-employers in an area along with its estimated population can help estimate how much overall sales potential exists for a given region.
    • Tracking competitor performance: By looking at shipments, receipts, and value of business done across industries in different regions or even cities, companies can track their competitors’ performance and compare it to their own to better assess their strategies going forward.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: Dataset copyright by authors

    • You are free to:
      • Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
      • Adapt - remix, transform, and build upon the material for any purpose, even commercially.
    • You must:
      • Give appropriate credit - Provide a link to the license, and indicate if changes were made.
      • ShareAlike - You must distribute your contributions under the same license as the original.
      • Keep intact - all notices that refer to this license, including copyright notices.

    Columns

    File: 2012 Industry Data by Industry and State.csv

    | Column name | Description |
    |:---|:---|
    | Geographic area name | The name of the geographic area the data is for. (String) |
    | NAICS code | The North American Industry Classification System (NAICS) code for the industry. (String) |
    | Meaning of NAICS code | The description of the NAICS code. (String) |
    | Meaning of Type of operation or tax status code | The description of the type of operation or tax status code. (String) |
    ...

Cite
Administration for Children and Families (2025). Data Definition Guidelines [Dataset]. https://catalog.data.gov/dataset/data-definition-guidelines

Data Definition Guidelines

Explore at:
6 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Sep 8, 2025
Dataset provided by
Administration for Children and Families
Description

ACF Agency Wide resource Metadata-only record linking to the original dataset. Open original dataset below.
