100+ datasets found

Frequently leveraged external data sources for global enterprises 2020
statista.com
Updated Jul 1, 2025
Statista (2025). Frequently leveraged external data sources for global enterprises 2020 [Dataset]. https://www.statista.com/statistics/1235514/worldwide-popular-external-data-sources-companies/
Dataset updated
Jul 1, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Aug 2020
Area covered
Worldwide
Description
In 2020, according to respondents surveyed, data masters typically leveraged a variety of external data sources to enhance their insights. The most popular external data sources for data masters were publicly available competitor data, open data, and proprietary datasets from data aggregators, with **, **, and ** percent, respectively.
Data Management Plan Examples Database
borealisdata.ca
search.dataone.org
Updated Aug 27, 2024
Rebeca Gaston Jothyraj; Shrey Acharya; Isaac Pratt; Danica Evering; Sarthak Behal (2024). Data Management Plan Examples Database [Dataset]. http://doi.org/10.5683/SP3/SDITUG
Unique identifier
https://doi.org/10.5683/SP3/SDITUG
Dataset updated
Aug 27, 2024
Dataset provided by
Borealis
Authors
Rebeca Gaston Jothyraj; Shrey Acharya; Isaac Pratt; Danica Evering; Sarthak Behal
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Time period covered
2011 - 2024
Description
This dataset is a collection of example DMPs from a wide array of fields, obtained from a number of different sources outlined in the README. Data extracted from the examples include the discipline and field of study, author, institutional affiliation and funding information, location, date modified, title, research and data type, description of project, link to the DMP, and, where possible, external links to related publications, grant pages, or French-language versions. This CSV document serves as the content for a McMaster Data Management Plan (DMP) Database as part of the Research Data Management (RDM) Services website, located at https://u.mcmaster.ca/dmps. Other universities and organizations are encouraged to link to the DMP Database or use this dataset as the content for their own DMP Database. This dataset will be updated regularly to include new additions and will be versioned as such. We are gathering submissions at https://u.mcmaster.ca/submit-a-dmp to continue to expand the collection.
Algeria DZ: SPI: Pillar 4 Data Sources Score: Scale 0-100
ceicdata.com
Updated Apr 4, 2021
CEICdata.com (2021). Algeria DZ: SPI: Pillar 4 Data Sources Score: Scale 0-100 [Dataset]. https://www.ceicdata.com/en/algeria/governance-policy-and-institutions/dz-spi-pillar-4-data-sources-score-scale-0100
Dataset updated
Apr 4, 2021
Dataset provided by
CEIC Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 1, 2016 - Dec 1, 2022
Area covered
Algeria
Variables measured
Money Market Rate
Description
Algeria DZ: SPI: Pillar 4 Data Sources Score: Scale 0-100 data was reported at 45.958 NA in 2022. This records a decrease from the previous number of 49.075 NA for 2021. The data is updated yearly, averaging 49.892 NA (median) from Dec 2016 to 2022, with 7 observations. The data reached an all-time high of 52.417 NA in 2018 and a record low of 45.958 NA in 2022. The data remains in active status in CEIC and is reported by the World Bank. The data is categorized under Global Database’s Algeria – Table DZ.World Bank.WDI: Governance: Policy and Institutions. The data sources overall score is a composite measure of whether countries have data available from the following sources: censuses and surveys, administrative data, geospatial data, and private sector/citizen-generated data. The data sources (input) pillar is segmented by four types of sources: (i) those generated by the statistical office (censuses and surveys), and sources accessed from elsewhere such as (ii) administrative data, (iii) geospatial data, and (iv) private sector data and citizen-generated data. The appropriate balance between these source types will vary depending on a country’s institutional setting and the maturity of its statistical system. High scores should reflect the extent to which the sources being utilized enable the necessary statistical indicators to be generated. For example, a low score on environment statistics (in the data production pillar) may reflect a lack of use of (and low score for) geospatial data (in the data sources pillar). This type of linkage is inherent in the data cycle approach and can help highlight areas for investment required if country needs are to be met. Source: Statistical Performance Indicators, The World Bank (https://datacatalog.worldbank.org/dataset/statistical-performance-indicators). Aggregation method: weighted average.
iCoverT: A rich data source on the incidence of child maltreatment over time...
plos.figshare.com
docx
Updated Jun 4, 2023
Michelle Degli Esposti; Jonathan Taylor; David K. Humphreys; Lucy Bowes (2023). iCoverT: A rich data source on the incidence of child maltreatment over time in England and Wales [Dataset]. http://doi.org/10.1371/journal.pone.0201223
Available download formats: docx
Unique identifier
https://doi.org/10.1371/journal.pone.0201223
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS ONE
Authors
Michelle Degli Esposti; Jonathan Taylor; David K. Humphreys; Lucy Bowes
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
England
Description
Child maltreatment is a major public health problem that is plagued by research challenges. Good epidemiological data can help to establish the nature and scope of past and present child maltreatment, and monitor its progress going forward. However, high-quality data sources are currently lacking for England and Wales. We employed systematic methodology to harness pre-existing datasets (including non-digitalised datasets) and develop a rich data source on the incidence of Child maltreatment over Time (iCoverT) in England and Wales. The iCoverT consists of six databases and accompanying data documentation: Child Protection Statistics, Children In Care Statistics, Criminal Statistics, Homicide Index, Mortality Statistics and NSPCC Statistics. Each database is a unique indicator of child maltreatment incidence, with 272 data variables in total. The databases span from 1858 to 2016 and therefore extend current data sources by over 80 years. We present a proof-of-principle analysis of a subset of the data to show how time series methods may be used to address key research challenges. This example demonstrates the utility of iCoverT and indicates that it will prove to be a valuable data source for researchers, clinicians and policy-makers concerned with child maltreatment. The iCoverT is freely available at the Open Science Framework (osf.io/cf7mv).
Global Real World Evidence Solutions Market By Data Source (Electronic...
verifiedmarketresearch.com
Updated Jul 16, 2024
VERIFIED MARKET RESEARCH (2024). Global Real World Evidence Solutions Market By Data Source (Electronic Health Records, Claims Data, Registries, Medical Devices), By Therapeutic Area (Oncology, Cardiovascular Diseases, Neurology, Rare Diseases), By Application (Drug Development, Clinical Decision Support, Epidemiological Studies, Post-Marketing Surveillance), By Geographic Scope and Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/real-world-evidence-solutions-market/
Dataset updated
Jul 16, 2024
Dataset authored and provided by
VERIFIED MARKET RESEARCH
License
https://www.verifiedmarketresearch.com/privacy-policy/
Time period covered
2024 - 2031
Area covered
Global
Description
Real World Evidence Solutions Market size was valued at USD 1.30 Billion in 2024 and is projected to reach USD 3.71 Billion by 2031, growing at a CAGR of 13.92% during the forecast period 2024-2031.
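The relationship between a start value, an end value, and a compound annual growth rate can be sketched as follows. This is only an illustration of the standard CAGR formula; the function names are hypothetical, and the number of compounding periods the publisher used is not stated in the listing, so `periods` is an assumption the reader must supply and results may differ slightly from the quoted figure.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Return the compound annual growth rate as a fraction (0.10 means 10%)."""
    if start_value <= 0 or end_value <= 0 or periods <= 0:
        raise ValueError("start_value, end_value and periods must be positive")
    # Standard formula: (end / start) ** (1 / periods) - 1
    return (end_value / start_value) ** (1 / periods) - 1


def project(start_value: float, rate: float, periods: int) -> float:
    """Grow start_value at a constant annual rate for the given number of periods."""
    return start_value * (1 + rate) ** periods


if __name__ == "__main__":
    # Assumption: the caller decides how many compounding years lie between
    # the base year and the forecast year (for example 7 or 8 for 2024-2031).
    for years in (7, 8):
        print(years, round(cagr(1.30, 3.71, years) * 100, 2))
```

Trying more than one value for `periods` is a quick way to see how sensitive a quoted growth rate is to the choice of base year.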

Global Real World Evidence Solutions Market Drivers

The market drivers for the Real World Evidence Solutions Market can be influenced by various factors. These may include:

Growing Need for Evidence-Based Healthcare: Real-world evidence (RWE) is becoming more and more important in healthcare decision-making, according to stakeholders such as payers, providers, and regulators. In addition to traditional clinical trial data, RWE solutions offer important insights into the efficacy, safety, and value of healthcare interventions in real-world situations.

Growing Use of RWE by Pharmaceutical Companies: RWE solutions are being used by pharmaceutical companies to assist with market entry, post-marketing surveillance, and drug development initiatives. Pharmaceutical businesses can find new indications for their current medications, improve clinical trial designs, and convince payers and providers of the worth of their products with the use of RWE.

Increasing Priority for Value-Based Healthcare: The emphasis on proving the cost- and benefit-effectiveness of healthcare interventions in real-world settings is growing as value-based healthcare models gain traction. To assist value-based decision-making, RWE solutions are essential in evaluating the economic effect and real-world consequences of healthcare interventions.

Technological and Data Analytics Advancements: RWE solutions are becoming more capable due to advances in machine learning, artificial intelligence, and big data analytics. With the use of these technologies, healthcare stakeholders can obtain actionable insights from the analysis of vast and varied datasets, including patient-generated data, claims data, and electronic health records.

Regulatory Support for RWE Integration: RWE is being progressively integrated into regulatory decision-making processes by regulatory organisations including the European Medicines Agency (EMA) and the U.S. Food and Drug Administration (FDA). The FDA's Real-World Evidence Programme and the EMA's Adaptive Pathways and PRIority MEdicines (PRIME) programme are two examples of initiatives that are making it easier to incorporate RWE into regulatory submissions and drug development.

Increasing Emphasis on Patient-Centric Healthcare: The value of patient-reported outcomes and real-world experiences in healthcare decision-making is becoming more widely acknowledged. RWE technologies facilitate the collection and examination of patient-centered data, offering valuable insights into treatment efficacy, patient inclinations, and quality of life consequences.

Extension of RWE Use Cases: RWE solutions are being used in medication development, post-market surveillance, health economics and outcomes research (HEOR), comparative effectiveness research, and market access, among other healthcare fields. The necessity for a variety of RWE solutions catered to the needs of different stakeholders is being driven by the expansion of RWE use cases.
Table1_An open-source platform integrating emerging data sources to support...
frontiersin.figshare.com
docx
Updated May 31, 2023
Vlasios K. Dimitriadis; Stella Dimitsaki; Achilleas Chytas; George I. Gavriilidis; Christine Kakalou; Panos Bonotis; Pantelis Natsiavas (2023). Table1_An open-source platform integrating emerging data sources to support multi-modal active pharmacovigilance.DOCX [Dataset]. http://doi.org/10.3389/fdsfr.2022.1016042.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fdsfr.2022.1016042.s001
Dataset updated
May 31, 2023
Dataset provided by
Frontiers
Authors
Vlasios K. Dimitriadis; Stella Dimitsaki; Achilleas Chytas; George I. Gavriilidis; Christine Kakalou; Panos Bonotis; Pantelis Natsiavas
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
As new data sources have emerged, the data space which Pharmacovigilance (PV) processes can use has significantly expanded. However, currently available tools still do not widely exploit data sources beyond Spontaneous Report Systems built to collect Individual Case Safety Reports (ICSRs). This article presents an open-source platform enabling the integration of heterogeneous data sources to support the analysis of drug safety related information. Furthermore, the results of a comparative study conducted as part of the project’s pilot phase are also presented. Data sources were integrated in the form of four “workspaces”: (a) Individual Case Safety Reports, obtained from OpenFDA; (b) Real-World Data (RWD), using the OMOP-CDM data model; (c) social media data, collected via Twitter; and (d) scientific literature, retrieved from PubMed. Data-intensive analytics are built for each workspace (e.g., disproportionality analysis metrics are used for OpenFDA data, and descriptive statistics for OMOP-CDM data and Twitter data streams). Upon these workspaces, the end-user sets up “investigation scenarios” defined by Drug-Event Combinations (DEC). Specialized features are provided, such as detailed reporting, which could be used to support reports for regulatory purposes, and “quick views”, which facilitate use where detailed statistics might not be needed and a qualitative overview of the available information might be enough (e.g., in a clinical environment). The platform’s technical features are presented as Supplementary Material via a walkthrough of an example “investigation scenario”. The presented platform is evaluated via a comparative study against the EVDAS system, conducted by PV professionals. Results from the comparative study show that there is indeed a need for relevant technical tools and that the ability to draw recent data from heterogeneous data sources is appreciated. However, a reluctance by end-users is also outlined, as they feel technical improvements and systematic training are required before the potential adoption of the presented software. As a whole, it is concluded that integrating such a platform in a real-world setting is far from trivial, requiring significant effort on training and usability aspects.
Questions to SQL Dataset
opendatabay.com
Updated Jul 5, 2025
Datasimple (2025). Questions to SQL Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/5a0fa182-be98-46d5-96e4-60ac97c14760
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Data Science and Analytics
Description
This dataset is a large, crowd-sourced collection designed for developing natural language interfaces for relational databases. It contains hand-annotated examples of natural language questions paired with their corresponding SQL queries. The data is derived from Wikipedia tables, providing a rich context for understanding how natural language can be translated into database queries. It serves as a valuable resource for training and testing models that aim to bridge the gap between human language and structured database interactions.

Columns

phase: The stage of the data collection process. (String)

question: The user's question posed in natural language. (String)

table: The specific database table relevant to the question. (String)

sql: The SQL query that corresponds to the user's question. (String)

Distribution

The dataset is typically provided in a CSV file format. It comprises 80,654 hand-annotated examples of questions and SQL queries. These examples are distributed across 24,241 distinct tables originating from Wikipedia. Specific numbers for rows or records beyond this total are not explicitly detailed, but unique values for questions are 5,069 and for SQL queries are 15,595.
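A minimal sketch of reading the four documented columns from a CSV export might look like the following. The two sample rows are made up for illustration and are not taken from the dataset, and `load_pairs` is a hypothetical helper name; only the column names `phase`, `question`, `table`, and `sql` come from the description above.

```python
import csv
import io

# Hypothetical two-row sample using the documented column names.
SAMPLE_CSV = """phase,question,table,sql
1,How many rows mention Toronto?,table_1,SELECT COUNT(*) FROM table_1 WHERE city = 'Toronto'
2,Which year is listed first?,table_2,SELECT MIN(year) FROM table_2
"""

EXPECTED_COLUMNS = {"phase", "question", "table", "sql"}


def load_pairs(handle):
    """Return (question, sql) tuples from a CSV file-like object."""
    reader = csv.DictReader(handle)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [(row["question"], row["sql"]) for row in reader]


pairs = load_pairs(io.StringIO(SAMPLE_CSV))
unique_questions = {question for question, _ in pairs}

if __name__ == "__main__":
    print(len(pairs), "pairs,", len(unique_questions), "unique questions")
```

With a real download, the `io.StringIO` object would be replaced by an opened file handle, for example `open(path, newline="", encoding="utf-8")`.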

Usage

This dataset is ideal for several applications:

* Developing and improving natural language interfaces for relational databases.
* Building a knowledge base of frequently used SQL queries.
* Generating training sets for neural networks that convert natural language into SQL queries.

Coverage

The dataset's scope is global, reflecting its origins from Wikipedia tables which have worldwide applicability. There are no specific geographical, time range, or demographic notes on data availability for particular groups or years within the dataset itself. It focuses on the general relationship between questions and SQL queries.

License

CC0

Who Can Use It

This dataset is intended for:

* Data scientists developing machine learning models for language processing.
* AI and ML researchers focused on natural language understanding (NLU) and natural language generation (NLG) in the context of databases.
* Software developers creating intelligent database query tools or conversational AI agents that interact with databases.
* Academics and students conducting research in areas like computational linguistics, database systems, and artificial intelligence.

Dataset Name Suggestions

WikiSQL Natural Language Interface Data

Questions to SQL Dataset

NLP2SQL Database Interface Dataset

Structured Query Language Question Bank

Wiki Table Query Data

Attributes

Original Data Source: WikiSQL (Questions and SQL Queries)
State Health IT Policy Levers
kaggle.com
Updated Jan 29, 2023
The Devastator (2023). State Health IT Policy Levers [Dataset]. https://www.kaggle.com/datasets/thedevastator/state-health-it-policy-levers
Dataset updated
Jan 29, 2023
Dataset provided by
Kaggle
Authors
The Devastator
Description
State Health IT Policy Levers

300+ Examples of Advancing Interoperability and Promoting Health IT

By US Open Data Portal, data.gov [source]

About this dataset

This dataset contains over 300 examples of health IT policy levers used by states to advance interoperability, promote health IT and support delivery system reform. The U.S. Government's Office of the National Coordinator for Health Information Technology (ONC) has curated this catalog as part of its Health IT State Policy Levers Compendium. It provides a directory of the policy levers being utilized, along with information on the state enacting them and their official sources. This collection seeks to act as a comprehensive guide for government officials and healthcare providers who are interested in state-based initiatives for optimizing health information technology. Explore the strategies your own state might be using to unlock improved patient outcomes!

How to use the dataset

This dataset provides information on policy levers used by various states in the United States to promote health IT and advance interoperability. The comprehensive list includes over 300 documented examples of health IT policy levers used by these states. This catalog can be used to identify which specific policy levers are being used, as well as what activities they are associated with.

If you're interested in learning more about how states use health IT policy levers, this dataset is a great resource. It contains detailed information on each entry, including the state where it's being used, the status of that activity, a description of the activity and its purpose, and an official source for additional information about that particular entry.

Using this dataset is easy: simply search for specific states or find out which kinds of activities each state is using its health IT policy levers for. You can also look up any specific application or implementation detail from each record by opening its corresponding source URL. With all this information at hand, you can better understand how states use their health IT tools to make a difference in advancing interoperability within healthcare systems today!

Research Ideas

It can be used to provide states with potential models of successful health IT policy levers, allowing them to learn from the experiences of other states in developing and implementing health IT legislation.

The dataset can also be used by researchers looking to study the effectiveness of existing health care policy levers, as well as to identify any gaps that need to be filled in order for certain policies to have a greater overall impact.

Additionally, it could be used by industry stakeholders such as hospitals or other healthcare organizations for benchmarking their own efforts related to IT implementation, such as understanding what activities are being undertaken and which sources are being used for best practices or additional resources when making decisions about introducing new technology into an organization's operations and services.

License

Unknown License - Please check the dataset description for more information.

Columns

File: policy-levers-activities-catalog-csv-1.csv

| Column name | Description |
|:---------------------|:------------------------------------------------------------------------------------|
| state | The state in which the policy lever is being used. (String) |
| policy_lever | Type of policy lever being used. (String) |
| activity_status | Status of activity (e.g., active or inactive). (String) |
| activity_description | Description of activity. (String) |
| source | Source from which the data was gathered. (String) |
| source_url | A link that points directly back to the original source with additional information. (String) |
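Searching the catalog by state, as described in the usage notes, could be sketched along these lines. The sample rows, lever names, and URLs below are placeholders invented for illustration; only the six column names come from the table above, and the exact spelling of the status values in the real file is an assumption.

```python
import csv
import io
from collections import Counter

# Placeholder rows; the real file is policy-levers-activities-catalog-csv-1.csv.
SAMPLE_CSV = """state,policy_lever,activity_status,activity_description,source,source_url
Utah,Example lever A,active,Placeholder description,Example source,https://example.org/a
Utah,Example lever B,inactive,Placeholder description,Example source,https://example.org/b
Ohio,Example lever A,active,Placeholder description,Example source,https://example.org/c
"""


def active_levers_by_state(handle):
    """Count rows per state whose activity_status is 'active' (case-insensitive)."""
    counts = Counter()
    for row in csv.DictReader(handle):
        # Assumption: status values are spelled 'active' / 'inactive'.
        if row["activity_status"].strip().lower() == "active":
            counts[row["state"]] += 1
    return counts


def rows_for_state(handle, state):
    """Return all catalog rows for one state as dictionaries."""
    return [row for row in csv.DictReader(handle) if row["state"] == state]


counts = active_levers_by_state(io.StringIO(SAMPLE_CSV))
utah_rows = rows_for_state(io.StringIO(SAMPLE_CSV), "Utah")

if __name__ == "__main__":
    print(dict(counts))
    print([row["source_url"] for row in utah_rows])
```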

Acknowledgements

If you use this dataset in your research, please credit the original authors and the US Open Data Portal, data.gov.

Commute Source Intensity

data.wfrc.org

Updated Oct 18, 2018

Wasatch Front Regional Council (2018). Commute Source Intensity [Dataset]. https://data.wfrc.org/datasets/wfrc::commute-source-intensity/about

Dataset updated

Oct 18, 2018

Dataset authored and provided by

Wasatch Front Regional Council

License

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Description

All grid squares are approximately the size of a downtown Salt Lake City block. For each grid square, three metrics are available, each of which reflects a total from the analysis grid square and its 12 nearest grid squares (those whose center is within 0.25 miles of the boundary of the analysis grid square). Geometrically, the 12 nearest cells are two cells in each cardinal direction and one cell in each diagonal direction (see graphic below).

 The three attribute values, representing metrics for the current 2015 model base year, are:

 Nearby Employment Intensity (NEI):
 Jobs within a quarter mile of each grid square. County-level job counts are controlled to the official Gardner Policy Institute (GPI) estimates for 2015. Job locations are then determined using the WFRC/MAG Real Estate Market Model (a customized implementation of UrbanSim open source software) using county assessor tax parcel data together with generalized job data from the Department of Workforce Services as key model inputs.

 Nearby Residential Intensity (NRI):
 Households within a quarter mile of each grid square. County-level household counts are controlled to the official Gardner Policy Institute (GPI) estimates for 2015. Household locations are determined using the WFRC/MAG REM model using county assessor tax parcel data together with US Census population (block level) as key model inputs.

 Nearby Combined Intensity (NCI):
 Jobs plus scaled households within a quarter mile of each grid square. To give NEI and NRI equal weighting, the NRI household number is scaled by multiplying by 1,295,513 (the total number of jobs in the region) and dividing by 731,392 (the total number of households in the region).
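The NCI scaling described above can be written out as a short sketch. The function and constant names are hypothetical; the two regional totals are the figures quoted in the description.

```python
# Regional totals quoted in the dataset description (2015 model base year).
TOTAL_JOBS = 1_295_513
TOTAL_HOUSEHOLDS = 731_392


def nearby_combined_intensity(nei_jobs: float, nri_households: float) -> float:
    """Combine nearby jobs and nearby households with equal regional weighting.

    Households are rescaled by (total jobs / total households) so that the
    regional household total carries the same weight as the regional job total.
    """
    scaled_households = nri_households * TOTAL_JOBS / TOTAL_HOUSEHOLDS
    return nei_jobs + scaled_households


if __name__ == "__main__":
    # Illustrative inputs only, not values from the dataset.
    print(nearby_combined_intensity(500, 200))
```

The scale factor works out to roughly 1.77 jobs per household, so a grid square with many households and few jobs can score as high as one dominated by employment.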

 Quarter mile grid square example graphic:

Scorecard example small source data file | gimi9.com
gimi9.com
Scorecard example small source data file | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_scorecard-example-small-source-data-file-69e0a
License
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
🇺🇸 United States
Compilation of Public-Supply Well Construction Depths in California
s.cnmilf.com
data.usgs.gov
+1 more
Updated Jul 6, 2024
U.S. Geological Survey (2024). Compilation of Public-Supply Well Construction Depths in California [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/compilation-of-public-supply-well-construction-depths-in-california
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
California
Description
This data release is a compilation of construction depth information for 12,383 active and inactive public-supply wells (PSWs) in California from various data sources. Construction data from multiple sources were indexed by the California State Water Resources Control Board Division of Drinking Water (DDW) primary station code (PS Code). Five different data sources were compared with the following priority order: 1, Local sources from select municipalities and water purveyors (Local); 2, Local DDW district data (DDW); 3, The United States Geological Survey (USGS) National Water Information System (NWIS); 4, The California State Water Resources Control Board Groundwater Ambient Monitoring and Assessment Groundwater Information System (SWRCB); and 5, USGS attribution of California Department of Water Resources well completion report data (WCR). For all data sources, the uppermost depth to the well's open or perforated interval was attributed as depth to top of perforations (ToP). The composite depth to bottom of well (Composite BOT) field was attributed from available construction data in the following priority order: 1, Depth to bottom of perforations (BoP); 2, Depth of completed well (Well Depth); 3, Borehole depth (Hole Depth). PSW ToPs and Composite BOTs from each of the five data sources were then compared, and summary construction depths for both fields were selected for wells with multiple data sources according to the data-source priority order listed above. Case-by-case modifications to the final selected summary construction depths were made after priority order-based selection to ensure internal logical consistency (for example, ToP must not exceed Composite BOT). This data release contains eight tab-delimited text files. WellConstructionSourceData_Local.txt contains well construction-depth data, Composite BOT data-source attribution, and local agency data-source attribution for the Local data.
WellConstructionSourceData_DDW.txt contains well construction-depth data and Composite BOT data-source attribution for the DDW data. WellConstructionSourceData_NWIS.txt contains well construction-depth data, Composite BOT data-source attribution, and USGS site identifiers for the NWIS data. WellConstructionSourceData_SWRCB.txt contains well construction-depth data and Composite BOT data-source attribution for the SWRCB data. WellConstructionSourceData_WCR.txt contains well construction-depth data and Composite BOT data-source attribution for the WCR data. WellConstructionCompilation_ToP.txt contains all ToP data listed by data source. WellConstructionCompilation_BOT.txt contains all Composite BOT data listed by data source. WellConstructionCompilation_Summary.txt contains summary ToP and Composite BOT values for each well with data-source attribution for both construction fields. All construction depths are in units of feet below land surface and are reported to the nearest foot.
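The two priority orders described above (which depth field fills Composite BOT, and which data source wins when several report a value) can be sketched as a pair of first-non-missing lookups. This is an illustration of the stated rules only, not the USGS processing code; the function names are hypothetical, and the case-by-case manual corrections mentioned in the description are reduced here to a simple consistency flag.

```python
from typing import Optional

# Data-source priority order as listed in the description.
SOURCE_PRIORITY = ["Local", "DDW", "NWIS", "SWRCB", "WCR"]


def composite_bot(bop: Optional[float],
                  well_depth: Optional[float],
                  hole_depth: Optional[float]) -> Optional[float]:
    """Pick the first available value in the order BoP, Well Depth, Hole Depth."""
    for value in (bop, well_depth, hole_depth):
        if value is not None:
            return value
    return None


def select_by_source(values: dict) -> Optional[tuple]:
    """Return (source, value) from the highest-priority source that has a value."""
    for source in SOURCE_PRIORITY:
        value = values.get(source)
        if value is not None:
            return source, value
    return None


def is_consistent(top: Optional[float], bot: Optional[float]) -> bool:
    """ToP must not exceed Composite BOT; missing values are not flagged."""
    return top is None or bot is None or top <= bot


if __name__ == "__main__":
    # Illustrative depths in feet below land surface, not real well records.
    print(composite_bot(None, 250, 300))
    print(select_by_source({"WCR": 400, "NWIS": 380}))
    print(is_consistent(120, 380))
```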
Map of articles about "Teaching Open Science"
zenodo.org
data.niaid.nih.gov
Updated Jan 24, 2020
Isabel Steinhardt (2020). Map of articles about "Teaching Open Science" [Dataset]. http://doi.org/10.5281/zenodo.3371415
Unique identifier
https://doi.org/10.5281/zenodo.3371415
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Isabel Steinhardt
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This description is part of the blog post "Systematic Literature Review of teaching Open Science" https://sozmethode.hypotheses.org/839

In my opinion, we do not pay enough attention to teaching Open Science in higher education. Therefore, I designed a seminar to teach students the practices of Open Science by doing qualitative research. About this seminar, I wrote the article “Teaching Open Science and qualitative methods”. For that article, I started to review the literature on “Teaching Open Science”. The result of my literature review is that certain aspects of Open Science are used for teaching. However, Open Science with all its aspects (Open Access, Open Data, Open Methodology, Open Science Evaluation and Open Science Tools) is not an issue in publications about teaching.

Based on this insight, I have started a systematic literature review. I realized quickly that I need help to analyse and interpret the articles and to evaluate my preliminary findings. Especially different disciplinary cultures of teaching different aspects of Open Science are challenging, as I myself, as a social scientist, do not have enough insight to be able to interpret the results correctly. Therefore, I would like to invite you to participate in this research project!

I am now looking for people who would like to join a collaborative process to further explore and write the systematic literature review on “Teaching Open Science”, because I want to turn this project into a Massive Open Online Paper (MOOP). According to the 10 rules of Tennant et al. (2019) on MOOPs, it is crucial to find a core group that is enthusiastic about the topic. Therefore, I am looking for people who are interested in creating the structure of the paper and writing the paper together with me. I am also looking for people who want to search for and review literature or evaluate the literature I have already found. Together with the interested persons, I would then define the rules for the project (cf. Tennant et al. 2019). So if you are interested in contributing to the further search for articles and/or in enhancing the interpretation and writing of results, please get in touch. For everyone interested in contributing, the list of articles collected so far is freely accessible at Zotero: https://www.zotero.org/groups/2359061/teaching_open_science. The figure shown below provides a first overview of my ongoing work. I created the figure with the free software yEd and uploaded the file to Zenodo, so everyone can download and work with it:

To make transparent what I have done so far, I will first introduce what a systematic literature review is. Secondly, I describe the decisions I made to start with the systematic literature review. Third, I present the preliminary results.

Systematic literature review – an Introduction

Systematic literature reviews “are a method of mapping out areas of uncertainty, and identifying where little or no relevant research has been done.” (Petticrew/Roberts 2008: 2). Fink defines the systematic literature review as a “systemic, explicit, and reproducible method for identifying, evaluating, and synthesizing the existing body of completed and recorded work produced by researchers, scholars, and practitioners.” (Fink 2019: 6). The aim of a systematic literature review is to surpass the subjectivity of a researcher’s search for literature. However, there can never be an objective selection of articles, because the researcher has already made a preselection by deciding on search strings, for example “Teaching Open Science”. In this respect, transparency is the core criterion for a high-quality review.

In order to achieve high quality and transparency, Fink (2019: 6-7) proposes the following seven steps:

Selecting a research question.

Selecting the bibliographic database.

Choosing the search terms.

Applying practical screening criteria.

Applying methodological screening criteria.

Doing the review.

Synthesizing the results.

I have adapted these steps for the “Teaching Open Science” systematic literature review. In the following, I will present the decisions I have made.

Systematic literature review – decisions I made

Research question: I am interested in the following research questions: How is Open Science taught in higher education? Is Open Science taught in its full range with all aspects like Open Access, Open Data, Open Methodology, Open Science Evaluation and Open Science Tools? Which aspects are taught? Are there disciplinary differences as to which aspects are taught and, if so, why are there such differences?

Databases: I started my search at the Directory of Open Access Journals (DOAJ). “DOAJ is a community-curated online directory that indexes and provides access to high quality, open access, peer-reviewed journals.” (https://doaj.org/). Secondly, I used the Bielefeld Academic Search Engine (BASE). BASE is operated by Bielefeld University Library and is “one of the world’s most voluminous search engines especially for academic web resources” (base-search.net). Both platforms are non-commercial and focus on Open Access publications, and thus differ from commercial publication databases such as Web of Science and Scopus. For this project, I deliberately decided against commercial providers and against restricting the search to indexed journals, because my explicit aim was to find articles that are open in the context of Open Science.

Search terms: To identify articles about teaching Open Science I used the following search strings: “teaching open science” OR teaching “open science” OR teach “open science”. The topic search looked for the search strings in the title, abstract and keywords of articles. Since these are very narrow search terms, I decided to broaden the method. I searched the reference lists of all articles returned by this search for further relevant literature. Using Google Scholar, I checked which other authors cited the articles in the sample. If the articles checked in this way met my methodological criteria, I included them in the sample and in turn looked through their reference lists and their citations at Google Scholar. This process has not yet been completed.
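The reference-list and citation search described above can be sketched as a breadth-first traversal. This is only an illustration under stated assumptions: `references`, `cited_by` and `meets_criteria` are hypothetical stand-ins for steps that were done by hand (reading reference lists, checking Google Scholar, applying the screening criteria), and the article keys are made up.

```python
from collections import deque


def snowball(seeds, references, cited_by, meets_criteria):
    """Expand a seed sample by backward and forward citation search.

    references / cited_by: hypothetical dicts mapping an article key to
    the keys it cites / the keys that cite it (collected by hand).
    meets_criteria: hypothetical predicate for the screening criteria.
    """
    sample = list(seeds)   # articles included so far, in order of discovery
    seen = set(seeds)      # every article that has already been looked at
    queue = deque(seeds)
    while queue:
        article = queue.popleft()
        candidates = references.get(article, []) + cited_by.get(article, [])
        for candidate in candidates:
            if candidate in seen:
                continue
            seen.add(candidate)
            if meets_criteria(candidate):
                sample.append(candidate)
                queue.append(candidate)  # its own lists are searched in turn
    return sample


# Toy data with made-up article keys, for illustration only.
references = {"A": ["B", "C"], "B": ["D"]}
cited_by = {"A": ["E"], "D": ["F"]}
included = snowball(["A"], references, cited_by, lambda key: key != "C")
print(included)
```

Rejected articles are remembered in `seen` so that they are not screened twice, but their reference lists are not followed, which mirrors the rule that only articles meeting the criteria are searched further.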

Practical screening criteria: I have included English and German articles in the sample, as I speak these languages (articles in other languages are very welcome, if there are people who can interpret them!). Only journal articles, articles in edited volumes, working papers and conference papers from proceedings were included in the sample. I checked whether the journals were predatory journals; articles from such journals were not included. I did not include blog posts, books or articles from newspapers. I only included articles whose full texts are accessible via my institution (University of Kassel). As a result, articles recently published by Elsevier could not be included because of the special situation in Germany regarding Project DEAL (https://www.projekt-deal.de/about-deal/). For articles that are not freely accessible, I have checked whether there is an accessible version in a repository or whether a preprint is available. If this was not the case, the article was not included. I started the analysis in May 2019.

Methodological criteria: The method described above to check the reference lists has the problem of subjectivity. Therefore, I hope that other people will be interested in this project and evaluate my decisions. I have used the following criteria as the basis for my decisions: First, the articles must focus on teaching. For example, this means that articles must describe how a course was designed and carried out. Second, at least one aspect of Open Science has to be addressed. The aspects can be very diverse (FOSS, repositories, wiki, data management, etc.) but have to comply with the principles of openness. This means, for example, I included an article when it deals with the use of FOSS in class and addresses the aspects of openness of FOSS. I did not include articles when the authors describe the use of a particular free and open source software for teaching but did not address the principles of openness or re-use.

Doing the review: Due to the methodical approach of going through the reference lists, it is possible to create a map of how the articles relate to each other. This results in thematic clusters and connections between clusters. The starting point for the map was four articles (Cook et al. 2018; Marsden, Thompson, and Plonsky 2017; Petras et al. 2015; Toelch and Ostwald 2018) that I found using the databases and criteria described above. I used yEd to generate the network. “yEd is a powerful desktop application that can be used to quickly and effectively generate high-quality diagrams.” (https://www.yworks.com/products/yed) In the network, arrows show which articles are cited in an article and which articles are cited by others as well. In addition, I made an initial rough classification of the content using colours. This classification is based on the contents mentioned in the articles’ title and abstract. This rough content classification requires a more exact, i.e., content-based subdivision and
United States US: SPI: Pillar 4 Data Sources Score: Scale 0-100
ceicdata.com
Cite
CEICdata.com, United States US: SPI: Pillar 4 Data Sources Score: Scale 0-100 [Dataset]. https://www.ceicdata.com/en/united-states/governance-policy-and-institutions/us-spi-pillar-4-data-sources-score-scale-0100
Dataset provided by
CEIC Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 1, 2016 - Dec 1, 2023
Area covered
United States
Variables measured
Money Market Rate
Description
United States US: SPI: Pillar 4 Data Sources Score: Scale 0-100 data was reported at 85.625 in 2023. This stayed constant from the previous number of 85.625 for 2022. The data is updated yearly, averaging 82.204 (median) from Dec 2016 to 2023, with 8 observations. The data reached an all-time high of 85.625 in 2023 and a record low of 76.767 in 2020. The series remains in active status in CEIC and is reported by the World Bank. The data is categorized under Global Database’s United States – Table US.World Bank.WDI: Governance: Policy and Institutions. The data sources overall score is a composite measure of whether countries have data available from the following sources: censuses and surveys, administrative data, geospatial data, and private sector/citizen generated data. The data sources (input) pillar is segmented by four types of sources generated by (i) the statistical office (censuses and surveys), and sources accessed from elsewhere such as (ii) administrative data, (iii) geospatial data, and (iv) private sector data and citizen generated data. The appropriate balance between these source types will vary depending on a country’s institutional setting and the maturity of its statistical system. High scores should reflect the extent to which the sources being utilized enable the necessary statistical indicators to be generated. For example, a low score on environment statistics (in the data production pillar) may reflect a lack of use of (and low score for) geospatial data (in the data sources pillar). This type of linkage is inherent in the data cycle approach and can help highlight areas for investment required if country needs are to be met. Source: Statistical Performance Indicators, The World Bank (https://datacatalog.worldbank.org/dataset/statistical-performance-indicators). Aggregation method: Weighted average.
Data from: Reference Mysteries
search.dataone.org
Updated Dec 28, 2023
Cite
Elizabeth Hamilton (2023). Reference Mysteries [Dataset]. http://doi.org/10.5683/SP3/2VLBGJ
Unique identifier
https://doi.org/10.5683/SP3/2VLBGJ
Dataset updated
Dec 28, 2023
Dataset provided by
Borealis
Authors
Elizabeth Hamilton
Description
The requests we receive at the Reference Desk keep surprising us. We'll take a look at some of the best examples from the year on data questions and data solutions.
Development Economics Data Group - Statistical performance indicators (SPI):...
gimi9.com
Updated Mar 31, 2021
Cite
(2021). Development Economics Data Group - Statistical performance indicators (SPI): Pillar 4 data sources score (scale 0-100) | gimi9.com [Dataset]. https://gimi9.com/dataset/worldbank_wb_wdi_iq_spi_pil4
Dataset updated
Mar 31, 2021
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The data sources overall score is a composite measure of whether countries have data available from the following sources: censuses and surveys, administrative data, geospatial data, and private sector/citizen generated data. The data sources (input) pillar is segmented by four types of sources generated by (i) the statistical office (censuses and surveys), and sources accessed from elsewhere such as (ii) administrative data, (iii) geospatial data, and (iv) private sector data and citizen generated data. The appropriate balance between these source types will vary depending on a country’s institutional setting and the maturity of its statistical system. High scores should reflect the extent to which the sources being utilized enable the necessary statistical indicators to be generated. For example, a low score on environment statistics (in the data production pillar) may reflect a lack of use of (and low score for) geospatial data (in the data sources pillar). This type of linkage is inherent in the data cycle approach and can help highlight areas for investment required if country needs are to be met.
SCAR Southern Ocean Diet and Energetics Database
data.niaid.nih.gov
data.aad.gov.au
Updated Jul 24, 2023
Cite
Scientific Committee on Antarctic Research (2023). SCAR Southern Ocean Diet and Energetics Database [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5072527
Dataset updated
Jul 24, 2023
Dataset authored and provided by
Scientific Committee on Antarctic Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Southern Ocean
Description
Information related to diet and energy flow is fundamental to a diverse range of Antarctic and Southern Ocean biological and ecosystem studies. This metadata record describes a database of such information being collated by the SCAR Expert Groups on Antarctic Biodiversity Informatics (EG-ABI) and Birds and Marine Mammals (EG-BAMM) to assist the scientific community in this work. It includes data related to diet and energy flow from conventional (e.g. gut content) and modern (e.g. molecular) studies, stable isotopes, fatty acids, and energetic content. It is a product of the SCAR community and open for all to participate in and use.

Data have been drawn from published literature, existing trophic data collections, and unpublished data. The database comprises five principal tables, relating to (i) direct sampling methods of dietary assessment (e.g. gut, scat, and bolus content analyses, stomach flushing, and observed predation), (ii) stable isotopes, (iii) lipids, (iv) DNA-based diet assessment, and (v) energetics values. The schemas of these tables are described below, and a list of the sources used to populate the tables is provided with the data.

A range of manual and automated checks were used to ensure that the entered data were as accurate as possible. These included visual checking of transcribed values, checking of row or column sums against known totals, and checking for values outside of allowed ranges. Suspicious entries were re-checked against the original source.
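As an illustration of what such automated checks might look like, here is a minimal Python sketch. It is not the code used by the database maintainers: the range limits, the tolerance, and the function names `check_record` and `check_sum` are assumptions made for this example, and only the column names WEST, EAST, SOUTH and NORTH come from the schema.

```python
def check_record(record):
    """Return a list of problems found in one record (empty if none).

    The limits are assumed: latitudes in [-90, 90], longitudes in
    [-180, 180], following the sign conventions of the schema.
    """
    problems = []
    for name in ("SOUTH", "NORTH"):
        if not -90 <= record[name] <= 90:
            problems.append(f"{name} outside [-90, 90]")
    for name in ("WEST", "EAST"):
        if not -180 <= record[name] <= 180:
            problems.append(f"{name} outside [-180, 180]")
    if record["SOUTH"] > record["NORTH"]:
        problems.append("SOUTH is north of NORTH")
    return problems


def check_sum(parts, known_total, tolerance=1e-6):
    """Check that reported parts add up to a total given in the source."""
    return abs(sum(parts) - known_total) <= tolerance


# Made-up values, for illustration only.
good = {"WEST": 158.8, "EAST": 158.9, "SOUTH": -54.7, "NORTH": -54.5}
bad = {"WEST": 158.8, "EAST": 158.9, "SOUTH": -95.0, "NORTH": -54.5}
print(check_record(good), check_record(bad))
print(check_sum([40.0, 35.5, 24.5], 100.0))
```

No check of WEST against EAST is included, because a sampling region that crosses the 180° meridian could legitimately have a WEST value larger than its EAST value.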

Notes on names: Names have been validated against the World Register of Marine Species (http://www.marinespecies.org/). For uncertain taxa, the most specific taxonomic name has been used (e.g. prey reported in a study as "Pachyptila sp." will appear here as "Pachyptila"; "Cephalopods" will appear as "Cephalopoda"). Uncertain species identifications (e.g. "Notothenia rossii?" or "Gymnoscopelus cf. piabilis") have been assigned the genus name (e.g. "Notothenia", "Gymnoscopelus"). Original names have been retained in a separate column to allow future cross-checking. WoRMS identifiers (APHIA_ID numbers) are given where possible.

Grouped prey data in the diet sample table need to be handled with a bit of care. Papers commonly report prey statistics aggregated over groups of prey - e.g. one might give the diet composition by individual cephalopod prey species, and then an overall record for all cephalopod prey. The PREY_IS_AGGREGATE column identifies such records. This allows us to differentiate grouped data like this from unidentified prey items from a certain prey group - for example, an unidentifiable cephalopod record would be entered as Cephalopoda (the scientific name), with "N" in the PREY_IS_AGGREGATE column. A record that groups together a number of cephalopod records, possibly including some unidentifiable cephalopods, would also be entered as Cephalopoda, but with "Y" in the PREY_IS_AGGREGATE column. See the notes on PREY_IS_AGGREGATE, below.
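The effect of the PREY_IS_AGGREGATE column can be shown with a short Python sketch. The rows are made up, and `N_ITEMS` is a hypothetical measurement column introduced only for this example; PREY_NAME and PREY_IS_AGGREGATE follow the schema described in this record.

```python
import csv
import io

# Made-up rows. N_ITEMS is a hypothetical column used only in this sketch.
SAMPLE_CSV = """PREY_NAME,PREY_IS_AGGREGATE,N_ITEMS
Cephalopod species A,N,4
Cephalopoda,N,1
Cephalopoda,Y,5
Fish species B,N,10
"""


def total_without_double_counting(rows):
    """Sum N_ITEMS over rows that are not aggregates of other rows."""
    return sum(int(r["N_ITEMS"]) for r in rows if r["PREY_IS_AGGREGATE"] != "Y")


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
naive_total = sum(int(r["N_ITEMS"]) for r in rows)
safe_total = total_without_double_counting(rows)
print(naive_total, safe_total)
```

Here the unidentified cephalopod row ("N") is kept, while the grouped cephalopod row ("Y") is dropped because it repeats items already counted in the individual rows. An analysis that only needs group-level totals could instead keep the "Y" rows and drop the rows they summarise.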

There are two related R packages that provide data access and functionality for working with these data. See the package home pages for more information: https://github.com/SCAR/sohungry and https://github.com/SCAR/solong.

Data table schemas

Sources data table

SOURCE_ID: The unique identifier of this source

DETAILS: The bibliographic details for this source (e.g. "Hindell M (1988) The diet of the royal penguin Eudyptes schlegeli at Macquarie Island. Emu 88:219–226")

NOTES: Relevant notes about this source – if it’s a published paper, this is probably the abstract

DOI: The DOI of the source (paper or dataset), in the form "10.xxxx/yyyy"

Diet data table

RECORD_ID: The unique identifier of this record

SOURCE_ID: The identifier of the source study from which this record was obtained (see corresponding entry in the sources data table)

SOURCE_DETAILS, SOURCE_DOI: The details and DOI of the source, copied from the sources data table for convenience

ORIGINAL_RECORD_ID: The identifier of this data record in its original source, if it had one

LOCATION: The name of the location at which the data was collected

WEST: The westernmost longitude of the sampling region, in decimal degrees (negative values for western hemisphere longitudes)

EAST: The easternmost longitude of the sampling region, in decimal degrees (negative values for western hemisphere longitudes)

SOUTH: The southernmost latitude of the sampling region, in decimal degrees (negative values for southern hemisphere latitudes)

NORTH: The northernmost latitude of the sampling region, in decimal degrees (negative values for southern hemisphere latitudes)

ALTITUDE_MIN: The minimum altitude of the sampling region, in metres

ALTITUDE_MAX: The maximum altitude of the sampling region, in metres

DEPTH_MIN: The shallowest depth of the sampling, in metres

DEPTH_MAX: The deepest depth of the sampling, in metres

OBSERVATION_DATE_START: The start of the sampling period

OBSERVATION_DATE_END: The end of the sampling period. If sampling was carried out over multiple seasons (e.g. during January of 2002 and January of 2003), this will be the first and last dates (in this example, from 1-Jan-2002 to 31-Jan-2003)

PREDATOR_NAME: The name of the predator. This may differ from PREDATOR_NAME_ORIGINAL if, for example, taxonomy has changed since the original publication, or if the original publication had spelling errors or used common (not scientific) names

PREDATOR_NAME_ORIGINAL: The name of the predator, as it appeared in the original source

PREDATOR_APHIA_ID: The numeric identifier of the predator in the WoRMS taxonomic register

PREDATOR_WORMS_RANK, PREDATOR_WORMS_KINGDOM, PREDATOR_WORMS_PHYLUM, PREDATOR_WORMS_CLASS, PREDATOR_WORMS_ORDER, PREDATOR_WORMS_FAMILY, PREDATOR_WORMS_GENUS: The taxonomic details of the predator, from the WoRMS taxonomic register

PREDATOR_GROUP_SOKI: A descriptive label of the group to which the predator belongs (currently used in the Southern Ocean Knowledge and Information wiki, http://soki.aq)

PREDATOR_LIFE_STAGE: Life stage of the predator, e.g. "adult", "chick", "larva", "juvenile". Note that if a food sample was taken from an adult animal, but that food was destined for a juvenile, then the life stage will be "juvenile" (this is common with seabirds feeding chicks)

PREDATOR_BREEDING_STAGE: Stage of the breeding season of the predator, if applicable, e.g. "brooding", "chick rearing", "nonbreeding", "posthatching"

PREDATOR_SEX: Sex of the predator: "male", "female", "both", or "unknown"

PREDATOR_SAMPLE_COUNT: The number of predators for which data are given. If (say) 50 predators were caught but only 20 analysed, this column will contain 20. For scat content studies, this will be the number of scats analysed

PREDATOR_SAMPLE_ID: The identifier of the predator(s). If predators are being reported at the individual level (i.e. PREDATOR_SAMPLE_COUNT = 1) then PREDATOR_SAMPLE_ID is the individual animal ID. Alternatively, if the data values being entered here are from a group of predators, then the PREDATOR_SAMPLE_ID identifies that group of predators. PREDATOR_SAMPLE_ID values are unique within a source (i.e. SOURCE_ID, PREDATOR_SAMPLE_ID pairs are globally unique). Rows with the same SOURCE_ID and PREDATOR_SAMPLE_ID values relate to the same predator individual or group of individuals, and so can be combined (e.g. for prey diversity analyses). Subsamples are indicated by a decimal number S.nnn, where S is the parent PREDATOR_SAMPLE_ID, and nnn (001-999) is the subsample number. Studies will sometimes report detailed prey information for a large sample, but then report prey information for various subsamples of that sample (e.g. broken down by predator sex, or sampling season). In the simplest case, the diet of each predator will be reported only once in the study, and in this scenario the PREDATOR_SAMPLE_ID values will simply be 1 to N (for N predators).
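A minimal Python sketch of how these identifiers might be handled follows. It assumes that PREDATOR_SAMPLE_ID values are read as text (for example "12.003"); if they were parsed as floating-point numbers, trailing zeros in the subsample part would be lost. The helper names and the rows are hypothetical.

```python
from collections import defaultdict


def split_sample_id(sample_id):
    """Split 'S.nnn' into (parent, subsample); subsample is None if absent.

    Assumes the ID is text, so that '12.010' keeps subsample number 10.
    """
    parent, dot, sub = str(sample_id).partition(".")
    return (int(parent), int(sub)) if dot else (int(parent), None)


def group_by_sample(rows):
    """Collect prey names per (SOURCE_ID, PREDATOR_SAMPLE_ID) pair."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["SOURCE_ID"], row["PREDATOR_SAMPLE_ID"])
        groups[key].append(row["PREY_NAME"])
    return dict(groups)


# Made-up rows, for illustration only.
rows = [
    {"SOURCE_ID": 1, "PREDATOR_SAMPLE_ID": "1", "PREY_NAME": "Prey A"},
    {"SOURCE_ID": 1, "PREDATOR_SAMPLE_ID": "1", "PREY_NAME": "Prey B"},
    {"SOURCE_ID": 1, "PREDATOR_SAMPLE_ID": "1.001", "PREY_NAME": "Prey A"},
    {"SOURCE_ID": 2, "PREDATOR_SAMPLE_ID": "1", "PREY_NAME": "Prey C"},
]
groups = group_by_sample(rows)
print(split_sample_id("12.003"), groups[(1, "1")])
```

Grouping on the pair rather than on PREDATOR_SAMPLE_ID alone matters because the same sample ID can occur in different sources, as in the last row above.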

PREDATOR_SIZE_MIN, PREDATOR_SIZE_MAX, PREDATOR_SIZE_MEAN, PREDATOR_SIZE_SD: The minimum, maximum, mean, and standard deviation of the size of the predators in the sample

PREDATOR_SIZE_UNITS: The units of size (e.g. "mm")

PREDATOR_SIZE_NOTES: Notes on the predator size information, including a definition of what the size value represents (e.g. "total length", "standard length")

PREDATOR_MASS_MIN, PREDATOR_MASS_MAX, PREDATOR_MASS_MEAN, PREDATOR_MASS_SD: The minimum, maximum, mean, and standard deviation of the mass of the predators in the sample

PREDATOR_MASS_UNITS: The units of mass (e.g. "g", "kg")

PREDATOR_MASS_NOTES: Notes on the predator mass information, including a definition of what the mass value represents

PREY_NAME: The scientific name of the prey item (corrected, if necessary)

PREY_NAME_ORIGINAL: The name of the prey item, as it appeared in the original source

PREY_APHIA_ID: The numeric identifier of the prey in the WoRMS taxonomic register

PREY_WORMS_RANK, PREY_WORMS_KINGDOM, PREY_WORMS_PHYLUM, PREY_WORMS_CLASS, PREY_WORMS_ORDER, PREY_WORMS_FAMILY, PREY_WORMS_GENUS: The taxonomic details of the prey, from the WoRMS taxonomic register

PREY_GROUP_SOKI: A descriptive label of the group to which the prey belongs (currently used in the Southern Ocean Knowledge and Information wiki, http://soki.aq)

PREY_IS_AGGREGATE: "Y" indicates that this row is an aggregation of other rows in this data source. For example, a study might give a number of individual squid species records, and then an overall squid record that encompasses the individual records. Use the PREY_IS_AGGREGATE information to avoid double-counting during analyses

PREY_LIFE_STAGE: Life stage of the prey (e.g. "adult", "chick", "larva")

PREY_SEX: The sex of the prey ("male", "female", "both", or "unknown"). Note that this is generally "unknown"

PREY_SAMPLE_COUNT: The number of prey individuals from which size and mass measurements were made (note: this is NOT the total number of individuals of
PEST++ Version 5.0 source code, pre-compiled binaries and example problem
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
Cite
U.S. Geological Survey (2024). PEST++ Version 5.0 source code, pre-compiled binaries and example problem [Dataset]. https://catalog.data.gov/dataset/pest-version-5-0-source-code-pre-compiled-binaries-and-example-problem
Dataset updated
Jul 6, 2024
Dataset provided by
U.S. Geological Survey
Description
PEST++ Version 5 software release. This release includes ASCII format C++11 source code, precompiled binaries for Windows 10 and Linux, and input files for the example problem shown in the report.
Different sources of data for the uncovering of failures in reporting of...
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Peter Doshi; Tom Jefferson; Chris Del Mar (2023). Different sources of data for the uncovering of failures in reporting of safety and effectiveness of some examples of new drugs. [Dataset]. http://doi.org/10.1371/journal.pmed.1001201.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pmed.1001201.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS Medicine
Authors
Peter Doshi; Tom Jefferson; Chris Del Mar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Different sources of data for the uncovering of failures in reporting of safety and effectiveness of some examples of new drugs.
Jurisdictional Unit (Public) - Dataset - CKAN
nationaldataplatform.org
Updated Feb 28, 2024
Cite
(2024). Jurisdictional Unit (Public) - Dataset - CKAN [Dataset]. https://nationaldataplatform.org/catalog/dataset/jurisdictional-unit-public
Dataset updated
Feb 28, 2024
Description
Jurisdictional Unit, 2022-05-21. For use with WFDSS, IFTDSS, IRWIN, and InFORM.

This is a feature service which provides Identify and Copy Feature capabilities. If fast-drawing at coarse zoom levels is a requirement, consider using the tile (map) service layer located at https://nifc.maps.arcgis.com/home/item.html?id=3b2c5daad00742cd9f9b676c09d03d13.

Overview

The Jurisdictional Agencies dataset is developed as a national land management geospatial layer, focused on representing wildland fire jurisdictional responsibility, for interagency wildland fire applications, including WFDSS (Wildland Fire Decision Support System), IFTDSS (Interagency Fuels Treatment Decision Support System), IRWIN (Interagency Reporting of Wildland Fire Information), and InFORM (Interagency Fire Occurrence Reporting Modules). It is intended to provide federal wildland fire jurisdictional boundaries on a national scale. The agency and unit names are an indication of the primary manager name and unit name, respectively, recognizing that:

There may be multiple owner names.

Jurisdiction may be held jointly by agencies at different levels of government (i.e. State and Local), especially on private lands.

Some owner names may be blocked for security reasons.

Some jurisdictions may not allow the distribution of owner names.

Private ownerships are shown in this layer with JurisdictionalUnitIdentifier=null, JurisdictionalUnitAgency=null, JurisdictionalUnitKind=null, LandownerKind="Private", and LandownerCategory="Private". All land inside the US country boundary is covered by a polygon.

Jurisdiction for privately owned land varies widely depending on state, county, or local laws and ordinances, fire workload, and other factors, and is not available in a national dataset in most cases.

For publicly held lands the agency name is the surface managing agency, such as Bureau of Land Management, United States Forest Service, etc. The unit name refers to the descriptive name of the polygon (e.g. Northern California District, Boise National Forest, etc.).

These data are used to automatically populate fields on the WFDSS Incident Information page.

This data layer implements the NWCG Jurisdictional Unit Polygon Geospatial Data Layer Standard.

Relevant NWCG Definitions and Standards

Unit

2. A generic term that represents an organizational entity that only has meaning when it is contextualized by a descriptor, e.g. jurisdictional.

Definition Extension: When referring to an organizational entity, a unit refers to the smallest area or lowest level. Higher levels of an organization (region, agency, department, etc.) can be derived from a unit based on organization hierarchy.

Unit, Jurisdictional

The governmental entity having overall land and resource management responsibility for a specific geographical area as provided by law.

Definition Extension: 1) Ultimately responsible for the fire report to account for statistical fire occurrence; 2) Responsible for setting fire management objectives; 3) Jurisdiction cannot be re-assigned by agreement; 4) The nature and extent of the incident determines jurisdiction (for example, Wildfire vs. All Hazard); 5) Responsible for signing a Delegation of Authority to the Incident Commander.

See also: Unit, Protecting; Landowner

Unit Identifier

This data standard specifies the standard format and rules for Unit Identifier, a code used within the wildland fire community to uniquely identify a particular government organizational unit.

Landowner Kind & Category

This data standard provides a two-tier classification (kind and category) of landownership.

Attribute Fields

JurisdictionalAgencyKind: Describes the type of unit Jurisdiction using the NWCG Landowner Kind data standard. There are two valid values: Federal and Other. A value may not be populated for all polygons.

JurisdictionalAgencyCategory: Describes the type of unit Jurisdiction using the NWCG Landowner Category data standard. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State. A value may not be populated for all polygons.

JurisdictionalUnitName: The name of the Jurisdictional Unit. Where an NWCG Unit ID exists for a polygon, this is the name used in the Name field from the NWCG Unit ID database. Where no NWCG Unit ID exists, this is the “Unit Name” or other specific, descriptive unit name field from the source dataset. A value is populated for all polygons.

JurisdictionalUnitID: Where it could be determined, this is the NWCG Standard Unit Identifier (Unit ID). Where it is unknown, the value is ‘Null’. Null Unit IDs can occur because a unit may not have a Unit ID, or because one could not be reliably determined from the source data. Not every land ownership has an NWCG Unit ID. Unit ID assignment rules are available from the Unit ID standard, linked above.

LandownerKind: The landowner kind value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. There are three valid values: Federal, Private, or Other.

LandownerCategory: The landowner category value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State, Private.

DataSource: The database from which the polygon originated. Be as specific as possible; identify the geodatabase name and feature class in which the polygon originated.

SecondaryDataSource: If the Data Source is an aggregation from other sources, use this field to specify the source that supplied data to the aggregation. For example, if Data Source is "PAD-US 2.1", then for a USDA Forest Service polygon, the Secondary Data Source would be "USDA FS Automated Lands Program (ALP)". For a BLM polygon in the same dataset, Secondary Source would be "Surface Management Agency (SMA)."

SourceUniqueID: Identifier (GUID or ObjectID) in the data source. Used to trace the polygon back to its authoritative source.

MapMethod: Controlled vocabulary to define how the geospatial feature was derived. Map method may help define data quality. MapMethod will be Mixed Method by default for this layer as the data are from mixed sources. Valid values include: GPS-Driven; GPS-Flight; GPS-Walked; GPS-Walked/Driven; GPS-Unknown Travel Method; Hand Sketch; Digitized-Image; DigitizedTopo; Digitized-Other; Image Interpretation; Infrared Image; Modeled; Mixed Methods; Remote Sensing Derived; Survey/GCDB/Cadastral; Vector; Phone/Tablet; Other

DateCurrent: The date of the last edit or update of this GIS record. Date should follow the assigned NWCG Date Time data standard, using 24 hour clock, YYYY-MM-DDhh.mm.ssZ, ISO8601 Standard.

Comments: Additional information describing the feature.

GeometryID: Primary key for linking geospatial objects with other database systems. Required for every feature. This field may be renamed for each standard to fit the feature.

JurisdictionalUnitID_sansUS: NWCG Unit ID with the "US" characters removed from the beginning. Provided for backwards compatibility.

JoinMethod: Additional information on how the polygon was matched to information in the NWCG Unit ID database.

LocalName: LocalName for the polygon provided from PADUS or other source.

LegendJurisdictionalAgency: Jurisdictional Agency, but smaller landholding agencies or agencies of indeterminate status are grouped for more intuitive use in a map legend or summary table.

LegendLandownerAgency: Landowner Agency, but smaller landholding agencies or agencies of indeterminate status are grouped for more intuitive use in a map legend or summary table.

DataSourceYear: Year that the source data for the polygon were acquired.

Data Input

This dataset is based on an aggregation of 4 spatial data sources: Protected Areas Database US (PAD-US 2.1), data from Bureau of Indian Affairs regional offices, the BLM Alaska Fire Service/State of Alaska, and Census Block-Group Geometry. NWCG Unit ID and Agency Kind/Category data are tabular and sourced from UnitIDActive.txt, in the WFMI Unit ID application (https://wfmi.nifc.gov/unit_id/Publish.html). Areas with unknown Landowner Kind/Category and Jurisdictional Agency Kind/Category are assigned LandownerKind and LandownerCategory values of "Private" by use of the non-water polygons from the Census Block-Group geometry.

PAD-US 2.1: This dataset is based in large part on the USGS Protected Areas Database of the United States - PAD-US 2.1. PAD-US is a compilation of authoritative protected areas data between agencies and organizations that ultimately results in a comprehensive and accurate inventory of protected areas for the United States to meet a variety of needs (e.g. conservation, recreation, public health, transportation, energy siting, ecological, or watershed assessments and planning). Extensive documentation on PAD-US processes and data sources is available.

How these data were aggregated: Boundaries, and their descriptors, available in spatial databases (i.e. shapefiles or geodatabase feature classes) from land management agencies are the desired and primary data sources in PAD-US. If these authoritative sources are unavailable, or the agency recommends another source, data may be incorporated by other aggregators such as non-governmental organizations. Data sources are tracked for each record in the PAD-US geodatabase (see below).

BIA and Tribal Data: BIA and Tribal land management data are not available in PAD-US. As such, data were aggregated from BIA regional offices. These data date from 2012 and were substantially updated in 2022.

Indian Trust Land affiliated with Tribes, Reservations, or BIA Agencies: These data are not considered the system of record and are not intended to be used as such. The Bureau of Indian Affairs (BIA), Branch of Wildland Fire Management (BWFM) is not the originator of these data. The
Museums, Aquariums, and Zoos
kaggle.com
Updated Mar 6, 2017
Institute of Museum and Library Services (2017). Museums, Aquariums, and Zoos [Dataset]. https://www.kaggle.com/forums/f/2808/museums-aquariums-and-zoos
Dataset updated
Mar 6, 2017
Dataset provided by
Kaggle
Authors
Institute of Museum and Library Services
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Content

The museum dataset is an evolving list of museums and related organizations in the United States. The data file includes basic information about each organization (name, address, phone, website, and revenue) plus the museum type or discipline. The discipline type is based on the National Taxonomy of Exempt Entities, which the National Center for Charitable Statistics and IRS use to classify nonprofit organizations.

Non-museum organizations may be included. For example, a non-museum organization may be included in the data file because it has a museum-like name on its IRS record for tax-exempt organizations. Museum foundations may also be included.

Museums may be missing. For example, local municipal museums may be undercounted because original data sources used to create the compilation did not include them.

Museums may be listed multiple times. For example, one museum may be listed as both itself and its parent organization because it was listed differently in each original data source. Duplicate records are especially common for museums located within universities.
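
One way to surface likely duplicates is to group records that share a normalized address, since a museum and its parent organization often differ in name but not in location. The sketch below is only an illustration: the column names (name, address, city, state) and the sample rows are assumptions, not the file's actual schema, and matching on address is a heuristic that will produce both false positives and misses.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse punctuation/whitespace so near-identical strings compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", (text or "").lower()).strip()


def possible_duplicates(records, key_fields=("address", "city", "state")):
    """Group records whose normalized key fields match.

    Returns only the groups that contain more than one record.
    The default key_fields are hypothetical column names.
    """
    groups = defaultdict(list)
    for record in records:
        key = tuple(normalize(record.get(field)) for field in key_fields)
        if any(key):  # skip records with no usable address information
            groups[key].append(record)
    return [group for group in groups.values() if len(group) > 1]


# Made-up sample rows for illustration only.
sample = [
    {"name": "Example University Art Museum", "address": "100 College Ave.",
     "city": "Springfield", "state": "IL"},
    {"name": "Example University", "address": "100 college ave",
     "city": "Springfield", "state": "IL"},
    {"name": "Riverside Zoo", "address": "5 Park Rd",
     "city": "Riverside", "state": "CA"},
]
duplicate_groups = possible_duplicates(sample)
```

Groups flagged this way are candidates for manual review rather than records to delete automatically, because several distinct museums can legitimately share one campus address.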

Information about museums may be outdated. The original scan and compilation of data sources occurred in 2014. Scans are no longer being done to update the data sources or add new data sources to the compilation. Information about museums may have changed since it was originally included in the file.

Acknowledgements

The museum data was compiled from IMLS administrative records for discretionary grant recipients, IRS records for tax-exempt organizations, and private foundation grant recipients.

Inspiration

Which city or state has the most museums per capita? How many zoos or aquariums exist in the United States? What museum or related organization had the highest revenue last year? How does the composition of museum types differ across the country?
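
The per-capita question can be approached by counting records per state and dividing by a population figure. The sketch below assumes a hypothetical state column and a population table supplied separately, since the description does not list population among the file's fields; the state codes and numbers shown are made up.

```python
from collections import Counter


def museums_per_100k(records, population_by_state, state_field="state"):
    """Return museums per 100,000 residents for each state with a known population.

    state_field is an assumed column name; population_by_state must come
    from an outside source.
    """
    counts = Counter(r[state_field] for r in records if r.get(state_field))
    return {
        state: counts[state] / population_by_state[state] * 100_000
        for state in counts
        if population_by_state.get(state)
    }


# Made-up records and populations for illustration only.
records = [{"state": "AA"}, {"state": "AA"}, {"state": "BB"}]
population = {"AA": 1_000_000, "BB": 250_000}

rates = museums_per_100k(records, population)
top_state = max(rates, key=rates.get)
```

Because the file may contain duplicates and non-museum organizations, raw counts are likely to overstate the number of museums, so rates computed this way are best read as rough comparisons.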

