In 2020, according to respondents surveyed, data masters typically leverage a variety of external data sources to enhance their insights. The most popular external data sources for data masters being publicly available competitor data, open data, and proprietary datasets from data aggregators, with **, **, and ** percent, respectively.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset is comprised of a collection of example DMPs from a wide array of fields; obtained from a number of different sources outlined in the README. Data included/extracted from the examples included the discipline and field of study, author, institutional affiliation and funding information, location, date modified, title, research and data-type, description of project, link to the DMP, and where possible external links to related publications, grant pages, or French language versions. This CSV document serves as the content for a McMaster Data Management Plan (DMP) Database as part of the Research Data Management (RDM) Services website, located at https://u.mcmaster.ca/dmps. Other universities and organizations are encouraged to link to the DMP Database or use this dataset as the content for their own DMP Database. This dataset will be updated regularly to include new additions and will be versioned as such. We are gathering submissions at https://u.mcmaster.ca/submit-a-dmp to continue to expand the collection.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Algeria DZ: SPI: Pillar 4 Data Sources Score: Scale 0-100 data was reported at 45.958 NA in 2022. This records a decrease from the previous number of 49.075 NA for 2021. Algeria DZ: SPI: Pillar 4 Data Sources Score: Scale 0-100 data is updated yearly, averaging 49.892 NA from Dec 2016 (Median) to 2022, with 7 observations. The data reached an all-time high of 52.417 NA in 2018 and a record low of 45.958 NA in 2022. Algeria DZ: SPI: Pillar 4 Data Sources Score: Scale 0-100 data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Algeria – Table DZ.World Bank.WDI: Governance: Policy and Institutions. The data sources overall score is a composity measure of whether countries have data available from the following sources: Censuses and surveys, administrative data, geospatial data, and private sector/citizen generated data. The data sources (input) pillar is segmented by four types of sources generated by (i) the statistical office (censuses and surveys), and sources accessed from elsewhere such as (ii) administrative data, (iii) geospatial data, and (iv) private sector data and citizen generated data. The appropriate balance between these source types will vary depending on a country’s institutional setting and the maturity of its statistical system. High scores should reflect the extent to which the sources being utilized enable the necessary statistical indicators to be generated. For example, a low score on environment statistics (in the data production pillar) may reflect a lack of use of (and low score for) geospatial data (in the data sources pillar). This type of linkage is inherent in the data cycle approach and can help highlight areas for investment required if country needs are to be met.;Statistical Performance Indicators, The World Bank (https://datacatalog.worldbank.org/dataset/statistical-performance-indicators);Weighted average;
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Child maltreatment is a major public health problem, which is plagued with research challenges. Good epidemiological data can help to establish the nature and scope of past and present child maltreatment, and monitor its progress going forward. However, high quality data sources are currently lacking for England and Wales. We employed systematic methodology to harness pre-existing datasets (including non-digitalised datasets) and develop a rich data source on the incidence of Child maltreatment over Time (iCoverT) in England and Wales. The iCoverT consists of six databases and accompanying data documentation: Child Protection Statistics, Children In Care Statistics, Criminal Statistics, Homicide Index, Mortality Statistics and NSPCC Statistics. Each database is a unique indicator of child maltreatment incidence with 272 data variables in total. The databases span from 1858 to 2016 and therefore extends current data sources by over 80 years. We present a proof-of-principle analysis of a subset of the data to show how time series methods may be used to address key research challenges. This example demonstrates the utility of iCoverT and indicates that it will prove to be a valuable data source for researchers, clinicians and policy-makers concerned with child maltreatment. The iCoverT is freely available at the Open Science Framework (osf.io/cf7mv).
https://www.verifiedmarketresearch.com/privacy-policy/https://www.verifiedmarketresearch.com/privacy-policy/
Real World Evidence Solutions Market size was valued at USD 1.30 Billion in 2024 and is projected to reach USD 3.71 Billion by 2031, growing at a CAGR of 13.92% during the forecast period 2024-2031.
Global Real World Evidence Solutions Market Drivers
The market drivers for the Real World Evidence Solutions Market can be influenced by various factors. These may include:
Growing Need for Evidence-Based Healthcare: Real-world evidence (RWE) is becoming more and more important in healthcare decision-making, according to stakeholders such as payers, providers, and regulators. In addition to traditional clinical trial data, RWE solutions offer important insights into the efficacy, safety, and value of healthcare interventions in real-world situations. Growing Use of RWE by Pharmaceutical Companies: RWE solutions are being used by pharmaceutical companies to assist with market entry, post-marketing surveillance, and drug development initiatives. Pharmaceutical businesses can find new indications for their current medications, improve clinical trial designs, and convince payers and providers of the worth of their products with the use of RWE. Increasing Priority for Value-Based Healthcare: The emphasis on proving the cost- and benefit-effectiveness of healthcare interventions in real-world settings is growing as value-based healthcare models gain traction. To assist value-based decision-making, RWE solutions are essential in evaluating the economic effect and real-world consequences of healthcare interventions. Technological and Data Analytics Advancements: RWE solutions are becoming more capable due to advances in machine learning, artificial intelligence, and big data analytics. With the use of these technologies, healthcare stakeholders can obtain actionable insights from the analysis of vast and varied datasets, including patient-generated data, claims data, and electronic health records. Regulatory Support for RWE Integration: RWE is being progressively integrated into regulatory decision-making processes by regulatory organisations including the European Medicines Agency (EMA) and the U.S. Food and Drug Administration (FDA). The FDA's Real-World Evidence Programme and the EMA's Adaptive Pathways and PRIority MEdicines (PRIME) programme are two examples of initiatives that are making it easier to incorporate RWE into regulatory submissions and drug development. Increasing Emphasis on Patient-Centric Healthcare: The value of patient-reported outcomes and real-world experiences in healthcare decision-making is becoming more widely acknowledged. RWE technologies facilitate the collection and examination of patient-centered data, offering valuable insights into treatment efficacy, patient inclinations, and quality of life consequences. Extension of RWE Use Cases: RWE solutions are being used in medication development, post-market surveillance, health economics and outcomes research (HEOR), comparative effectiveness research, and market access, among other healthcare fields. The necessity for a variety of RWE solutions catered to the needs of different stakeholders is being driven by the expansion of RWE use cases.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As new data sources have emerged, the data space which Pharmacovigilance (PV) processes can use has significantly expanded. However, still, the currently available tools do not widely exploit data sources beyond Spontaneous Report Systems built to collect Individual Case Safety Reports (ICSRs). This article presents an open-source platform enabling the integration of heterogeneous data sources to support the analysis of drug safety related information. Furthermore, the results of a comparative study as part of the project’s pilot phase are also presented. Data sources were integrated in the form of four “workspaces”: (a) Individual Case Safety Reports—obtained from OpenFDA, (b) Real-World Data (RWD) —using the OMOP-CDM data model, (c) social media data—collected via Twitter, and (d) scientific literature—retrieved from PubMed. Data intensive analytics are built for each workspace (e.g., disproportionality analysis metrics are used for OpenFDA data, descriptive statistics for OMOP-CDM data and twitter data streams etc.). Upon these workspaces, the end-user sets up “investigation scenarios” defined by Drug-Event Combinations (DEC). Specialized features like detailed reporting which could be used to support reports for regulatory purposes and also “quick views” are provided to facilitate use where detailed statistics might not be needed and a qualitative overview of the available information might be enough (e.g., clinical environment). The platform’s technical features are presented as Supplementary Material via a walkthrough of an example “investigation scenario”. The presented platform is evaluated via a comparative study against the EVDAS system, conducted by PV professionals. Results from the comparative study, show that there is indeed a need for relevant technical tools and the ability to draw recent data from heterogeneous data sources is appreciated. However, a reluctance by end-users is also outlined as they feel technical improvements and systematic training are required before the potential adoption of the presented software. As a whole, it is concluded that integrating such a platform in real-world setting is far from trivial, requiring significant effort on training and usability aspects.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset is a large, crowd-sourced collection designed for developing natural language interfaces for relational databases. It contains hand-annotated examples of natural language questions paired with their corresponding SQL queries. The data is derived from Wikipedia tables, providing a rich context for understanding how natural language can be translated into database queries. It serves as a valuable resource for training and testing models that aim to bridge the gap between human language and structured database interactions.
The dataset is typically provided in a CSV file format. It comprises 80,654 hand-annotated examples of questions and SQL queries. These examples are distributed across 24,241 distinct tables originating from Wikipedia. Specific numbers for rows or records beyond this total are not explicitly detailed, but unique values for questions are 5,069 and for SQL queries are 15,595.
This dataset is ideal for several applications: * Developing and improving natural language interfaces for relational databases. * Building a knowledge base of frequently used SQL queries. * Generating training sets for neural networks that convert natural language into SQL queries.
The dataset's scope is global, reflecting its origins from Wikipedia tables which have worldwide applicability. There are no specific geographical, time range, or demographic notes on data availability for particular groups or years within the dataset itself. It focuses on the general relationship between questions and SQL queries.
CC0
This dataset is intended for: * Data scientists developing machine learning models for language processing. * AI and ML researchers focused on natural language understanding (NLU) and natural language generation (NLG) in the context of databases. * Software developers creating intelligent database query tools or conversational AI agents that interact with databases. * Academics and students conducting research in areas like computational linguistics, database systems, and artificial intelligence.
Original Data Source: WikiSQL (Questions and SQL Queries)`
By US Open Data Portal, data.gov [source]
This dataset contains over 300 examples of health IT policy levers used by states to advance interoperability, promote health IT and support delivery system reform. The U.S Government's Office of National Coordinator for Health Information Technology (ONC) has curated this catalog as part of its Health IT State Policy Levers Compendium. It provides an exhaustive directory on the policy levers being utilized, along with information on the state enacting them and their official sources. This collection seeks to act as a comprehensive guide for government officials and healthcare providers who are interested in state-based initiatives for optimizing health information technology. Explore the strategies your own state might be using to unlock improved patient outcomes!
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
This dataset provides information on policy levers used by various states in the United States to promote health IT and advance interoperability. The comprehensive list includes over 300 documented examples of health IT policy levers used by these states. This catalog can be used to identify which specific policy levers are being used, as well as what activities they are associated with.
If you're interested in learning more about how states use health IT policy levers, this dataset is a great resource. It contains detailed information on each entry, including the state where it's being used, the status of that activity, a description of the activity and its purpose, and an official source for additional information about that particular entry.
Using this data set is easy - simply search for specific states or find out which kinds of activities each state is using their health IT policy levers for. You can also look up any specific application or implementation detail from each record by opening up its corresponding source URL link . With all this information at hand you can better understand how states use their health IT tools to make a difference in advancing interoperability within healthcare systems today!
- It can be used to provide states with potential models of successful health IT policy levers, allowing them to learn from the experiences of other states in developing and implementing health IT legislation.
- The dataset can also be used by researchers looking to study the effectiveness of existing health care policy levers, as well as to identify any gaps that need to be filled in order for certain policies to have a greater overall impact.
- Additionally, it could be used by industry stakeholders such as hospitals or other healthcare organizations for benchmarking their own efforts related to IT implementation, such as understanding what activities are being undertaken and which sources are being used for best practices or additional resources when making decisions related to new technology implementations into an organization's operations and services
If you use this dataset in your research, please credit the original authors. Data Source
Unknown License - Please check the dataset description for more information.
File: policy-levers-activities-catalog-csv-1.csv | Column name | Description | |:-------------------------|:----------------------------------------------------------------------------------------------| | state | The state in which the policy lever is being used. (String) | | policy_lever | Type of policy lever being used. (String) | | activity_status | Status of activity (e.g., active or inactive). (String) | | activity_description | Description of activity. (String) | | source | Source from where data is gathered from. (String) | | source_url | A link that points directly back to an original sources with additional information. (String) |
If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit US Open Data Portal, data.gov.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
All grid squares are approximately the size of a downtown Salt Lake City block.For each grid square, three metrics are available, each of which reflects a total from the analysis grid square and its 12 nearest grid squares (those whose center is within 0.25 miles of the boundary of the analysis grid square). Geometrically the 12 nearest cells are two cells in each cardinal direction and 1 cell diagonally (see graphic below).
The three attribute values, representing metrics for the current 2015 model base year, are:
Nearby Employment Intensity (NEI):
Jobs within quarter mile of each grid square. County-level job counts are controlled to the official Gardner Policy Institute (GPI) estimates for 2015. Job locations are then determined using the WFRC/MAG Real Estate Market Model (an customized implementation of UrbanSim open source software) using county assessor tax parcel data together with generalized job data from the Department of Workforce Services as key of the model inputs.
Nearby Residential Intensity (NRI):
Households within Quarter Mile. County-level household counts are controlled to the official Gardner Policy Institute (GPI) estimates for 2015. Household locations are determined using the WFRC/MAG REM model using county assessor tax parcel data together with US Census population (block level) as key model inputs.
Nearby Combined Intensity (NCI):
Jobs plus scaled households within a quarter mile of each grid square. To give NEI and NRI equal weighting, the NRI household number is scaled by multiplying by 1,295,513 (total number of jobs in the region) and dividing by. 731,392 (the total number of households in the region)0
Quarter mile grid square example graphic:
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
🇺🇸 미국
This data release is a compilation of construction depth information for 12,383 active and inactive public-supply wells (PSWs) in California from various data sources. Construction data from multiple sources were indexed by the California State Water Resources Control Board Division of Drinking Water (DDW) primary station code (PS Code). Five different data sources were compared with the following priority order: 1, Local sources from select municipalities and water purveyors (Local); 2, Local DDW district data (DDW); 3, The United States Geological Survey (USGS) National Water Information System (NWIS); 4, The California State Water Resources Control Board Groundwater Ambient Monitoring and Assessment Groundwater Information System (SWRCB); and 5, USGS attribution of California Department of Water Resources well completion report data (WCR). For all data sources, the uppermost depth to the well's open or perforated interval was attributed as depth to top of perforations (ToP). The composite depth to bottom of well (Composite BOT) field was attributed from available construction data in the following priority order: 1, Depth to bottom of perforations (BoP); 2, Depth of completed well (Well Depth); 3; Borehole depth (Hole Depth). PSW ToPs and Composite BOTs from each of the five data sources were then compared and summary construction depths for both fields were selected for wells with multiple data sources according to the data-source priority order listed above. Case-by-case modifications to the final selected summary construction depths were made after priority order-based selection to ensure internal logical consistency (for example, ToP must not exceed Composite BOT). This data release contains eight tab-delimited text files. WellConstructionSourceData_Local.txt contains well construction-depth data, Composite BOT data-source attribution, and local agency data-source attribution for the Local data. WellConstructionSourceData_DDW.txt contains well construction-depth data and Composite BOT data-source attribution for the DDW data. WellConstructionSourceData_NWIS.txt contains well construction-depth data, Composite BOT data-source attribution, and USGS site identifiers for the NWIS data. WellConstructionSourceData_SWRCB.txt contains well construction-depth data and Composite BOT data-source attribution for the SWRCB data. WellConstructionSourceData_WCR.txt contains contains well construction depth data and Composite BOT data-source attribution for the WCR data. WellConstructionCompilation_ToP.txt contains all ToP data listed by data source. WellConstructionCompilation_BOT.txt contains all Composite BOT data listed by data source. WellConstructionCompilation_Summary.txt contains summary ToP and Composite BOT values for each well with data-source attribution for both construction fields. All construction depths are in units of feet below land surface and are reported to the nearest foot.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This description is part of the blog post "Systematic Literature Review of teaching Open Science" https://sozmethode.hypotheses.org/839
According to my opinion, we do not pay enough attention to teaching Open Science in higher education. Therefore, I designed a seminar to teach students the practices of Open Science by doing qualitative research.About this seminar, I wrote the article ”Teaching Open Science and qualitative methods“. For the article ”Teaching Open Science and qualitative methods“, I started to review the literature on ”Teaching Open Science“. The result of my literature review is that certain aspects of Open Science are used for teaching. However, Open Science with all its aspects (Open Access, Open Data, Open Methodology, Open Science Evaluation and Open Science Tools) is not an issue in publications about teaching.
Based on this insight, I have started a systematic literature review. I realized quickly that I need help to analyse and interpret the articles and to evaluate my preliminary findings. Especially different disciplinary cultures of teaching different aspects of Open Science are challenging, as I myself, as a social scientist, do not have enough insight to be able to interpret the results correctly. Therefore, I would like to invite you to participate in this research project!
I am now looking for people who would like to join a collaborative process to further explore and write the systematic literature review on “Teaching Open Science“. Because I want to turn this project into a Massive Open Online Paper (MOOP). According to the 10 rules of Tennant et al (2019) on MOOPs, it is crucial to find a core group that is enthusiastic about the topic. Therefore, I am looking for people who are interested in creating the structure of the paper and writing the paper together with me. I am also looking for people who want to search for and review literature or evaluate the literature I have already found. Together with the interested persons I would then define, the rules for the project (cf. Tennant et al. 2019). So if you are interested to contribute to the further search for articles and / or to enhance the interpretation and writing of results, please get in touch. For everyone interested to contribute, the list of articles collected so far is freely accessible at Zotero: https://www.zotero.org/groups/2359061/teaching_open_science. The figure shown below provides a first overview of my ongoing work. I created the figure with the free software yEd and uploaded the file to zenodo, so everyone can download and work with it:
To make transparent what I have done so far, I will first introduce what a systematic literature review is. Secondly, I describe the decisions I made to start with the systematic literature review. Third, I present the preliminary results.
Systematic literature review – an Introduction
Systematic literature reviews “are a method of mapping out areas of uncertainty, and identifying where little or no relevant research has been done.” (Petticrew/Roberts 2008: 2). Fink defines the systematic literature review as a “systemic, explicit, and reproducible method for identifying, evaluating, and synthesizing the existing body of completed and recorded work produced by researchers, scholars, and practitioners.” (Fink 2019: 6). The aim of a systematic literature reviews is to surpass the subjectivity of a researchers’ search for literature. However, there can never be an objective selection of articles. This is because the researcher has for example already made a preselection by deciding about search strings, for example “Teaching Open Science”. In this respect, transparency is the core criteria for a high-quality review.
In order to achieve high quality and transparency, Fink (2019: 6-7) proposes the following seven steps:
I have adapted these steps for the “Teaching Open Science” systematic literature review. In the following, I will present the decisions I have made.
Systematic literature review – decisions I made
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States US: SPI: Pillar 4 Data Sources Score: Scale 0-100 data was reported at 85.625 NA in 2023. This stayed constant from the previous number of 85.625 NA for 2022. United States US: SPI: Pillar 4 Data Sources Score: Scale 0-100 data is updated yearly, averaging 82.204 NA from Dec 2016 (Median) to 2023, with 8 observations. The data reached an all-time high of 85.625 NA in 2023 and a record low of 76.767 NA in 2020. United States US: SPI: Pillar 4 Data Sources Score: Scale 0-100 data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s United States – Table US.World Bank.WDI: Governance: Policy and Institutions. The data sources overall score is a composity measure of whether countries have data available from the following sources: Censuses and surveys, administrative data, geospatial data, and private sector/citizen generated data. The data sources (input) pillar is segmented by four types of sources generated by (i) the statistical office (censuses and surveys), and sources accessed from elsewhere such as (ii) administrative data, (iii) geospatial data, and (iv) private sector data and citizen generated data. The appropriate balance between these source types will vary depending on a country’s institutional setting and the maturity of its statistical system. High scores should reflect the extent to which the sources being utilized enable the necessary statistical indicators to be generated. For example, a low score on environment statistics (in the data production pillar) may reflect a lack of use of (and low score for) geospatial data (in the data sources pillar). This type of linkage is inherent in the data cycle approach and can help highlight areas for investment required if country needs are to be met.;Statistical Performance Indicators, The World Bank (https://datacatalog.worldbank.org/dataset/statistical-performance-indicators);Weighted average;
The requests we receive at the Reference Desk keep surprising us. We'll take a look at some of the best examples from the year on data questions and data solutions.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data sources overall score is a composity measure of whether countries have data available from the following sources: Censuses and surveys, administrative data, geospatial data, and private sector/citizen generated data. The data sources (input) pillar is segmented by four types of sources generated by (i) the statistical office (censuses and surveys), and sources accessed from elsewhere such as (ii) administrative data, (iii) geospatial data, and (iv) private sector data and citizen generated data. The appropriate balance between these source types will vary depending on a country’s institutional setting and the maturity of its statistical system. High scores should reflect the extent to which the sources being utilized enable the necessary statistical indicators to be generated. For example, a low score on environment statistics (in the data production pillar) may reflect a lack of use of (and low score for) geospatial data (in the data sources pillar). This type of linkage is inherent in the data cycle approach and can help highlight areas for investment required if country needs are to be met.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Information related to diet and energy flow is fundamental to a diverse range of Antarctic and Southern Ocean biological and ecosystem studies. This metadata record describes a database of such information being collated by the SCAR Expert Groups on Antarctic Biodiversity Informatics (EG-ABI) and Birds and Marine Mammals (EG-BAMM) to assist the scientific community in this work. It includes data related to diet and energy flow from conventional (e.g. gut content) and modern (e.g. molecular) studies, stable isotopes, fatty acids, and energetic content. It is a product of the SCAR community and open for all to participate in and use.
Data have been drawn from published literature, existing trophic data collections, and unpublished data. The database comprises five principal tables, relating to (i) direct sampling methods of dietary assessment (e.g. gut, scat, and bolus content analyses, stomach flushing, and observed predation), (ii) stable isotopes, (iii) lipids, (iv) DNA-based diet assessment, and (v) energetics values. The schemas of these tables are described below, and a list of the sources used to populate the tables is provided with the data.
A range of manual and automated checks were used to ensure that the entered data were as accurate as possible. These included visual checking of transcribed values, checking of row or column sums against known totals, and checking for values outside of allowed ranges. Suspicious entries were re-checked against original source.
Notes on names: Names have been validated against the World Register of Marine Species (http://www.marinespecies.org/). For uncertain taxa, the most specific taxonomic name has been used (e.g. prey reported in a study as "Pachyptila sp." will appear here as "Pachyptila"; "Cephalopods" will appear as "Cephalopoda"). Uncertain species identifications (e.g. "Notothenia rossii?" or "Gymnoscopelus cf. piabilis") have been assigned the genus name (e.g. "Notothenia", "Gymnoscopelus"). Original names have been retained in a separate column to allow future cross-checking. WoRMS identifiers (APHIA_ID numbers) are given where possible.
Grouped prey data in the diet sample table need to be handled with a bit of care. Papers commonly report prey statistics aggregated over groups of prey - e.g. one might give the diet composition by individual cephalopod prey species, and then an overall record for all cephalopod prey. The PREY_IS_AGGREGATE column identifies such records. This allows us to differentiate grouped data like this from unidentified prey items from a certain prey group - for example, an unidentifiable cephalopod record would be entered as Cephalopoda (the scientific name), with "N" in the PREY_IS_AGGREGATE column. A record that groups together a number of cephalopod records, possibly including some unidentifiable cephalopods, would also be entered as Cephalopoda, but with "Y" in the PREY_IS_AGGREGATE column. See the notes on PREY_IS_AGGREGATE, below.
There are two related R packages that provide data access and functionality for working with these data. See the package home pages for more information: https://github.com/SCAR/sohungry and https://github.com/SCAR/solong.
Data table schemas
Sources data table
SOURCE_ID: The unique identifier of this source
DETAILS: The bibliographic details for this source (e.g. "Hindell M (1988) The diet of the royal penguin Eudyptes schlegeli at Macquarie Island. Emu 88:219–226")
NOTES: Relevant notes about this source – if it’s a published paper, this is probably the abstract
DOI: The DOI of the source (paper or dataset), in the form "10.xxxx/yyyy"
Diet data table
RECORD_ID: The unique identifier of this record
SOURCE_ID: The identifier of the source study from which this record was obtained (see corresponding entry in the sources data table)
SOURCE_DETAILS, SOURCE_DOI: The details and DOI of the source, copied from the sources data table for convenience
ORIGINAL_RECORD_ID: The identifier of this data record in its original source, if it had one
LOCATION: The name of the location at which the data was collected
WEST: The westernmost longitude of the sampling region, in decimal degrees (negative values for western hemisphere longitudes)
EAST: The easternmost longitude of the sampling region, in decimal degrees (negative values for western hemisphere longitudes)
SOUTH: The southernmost latitude of the sampling region, in decimal degrees (negative values for southern hemisphere latitudes)
NORTH: The northernmost latitude of the sampling region, in decimal degrees (negative values for southern hemisphere latitudes)
ALTITUDE_MIN: The minimum altitude of the sampling region, in metres
ALTITUDE_MAX: The maximum altitude of the sampling region, in metres
DEPTH_MIN: The shallowest depth of the sampling, in metres
DEPTH_MAX: The deepest depth of the sampling, in metres
OBSERVATION_DATE_START: The start of the sampling period
OBSERVATION_DATE_END: The end of the sampling period. If sampling was carried out over multiple seasons (e.g. during January of 2002 and January of 2003), this will be the first and last dates (in this example, from 1-Jan-2002 to 31-Jan-2003)
PREDATOR_NAME: The name of the predator. This may differ from predator_name_original if, for example, taxonomy has changed since the original publication, if the original publication had spelling errors or used common (not scientific) names
PREDATOR_NAME_ORIGINAL: The name of the predator, as it appeared in the original source
PREDATOR_APHIA_ID: The numeric identifier of the predator in the WoRMS taxonomic register
PREDATOR_WORMS_RANK, PREDATOR_WORMS_KINGDOM, PREDATOR_WORMS_PHYLUM, PREDATOR_WORMS_CLASS, PREDATOR_WORMS_ORDER, PREDATOR_WORMS_FAMILY, PREDATOR_WORMS_GENUS: The taxonomic details of the predator, from the WoRMS taxonomic register
PREDATOR_GROUP_SOKI: A descriptive label of the group to which the predator belongs (currently used in the Southern Ocean Knowledge and Information wiki, http://soki.aq)
PREDATOR_LIFE_STAGE: Life stage of the predator, e.g. "adult", "chick", "larva", "juvenile". Note that if a food sample was taken from an adult animal, but that food was destined for a juvenile, then the life stage will be "juvenile" (this is common with seabirds feeding chicks)
PREDATOR_BREEDING_STAGE: Stage of the breeding season of the predator, if applicable, e.g. "brooding", "chick rearing", "nonbreeding", "posthatching"
PREDATOR_SEX: Sex of the predator: "male", "female", "both", or "unknown"
PREDATOR_SAMPLE_COUNT: The number of predators for which data are given. If (say) 50 predators were caught but only 20 analysed, this column will contain 20. For scat content studies, this will be the number of scats analysed
PREDATOR_SAMPLE_ID: The identifier of the predator(s). If predators are being reported at the individual level (i.e. PREDATOR_SAMPLE_COUNT = 1) then PREDATOR_SAMPLE_ID is the individual animal ID. Alternatively, if the data values being entered here are from a group of predators, then the PREDATOR_SAMPLE_ID identifies that group of predators. PREDATOR_SAMPLE_ID values are unique within a source (i.e. SOURCE_ID, PREDATOR_SAMPLE_ID pairs are globally unique). Rows with the same SOURCE_ID and PREDATOR_SAMPLE_ID values relate to the same predator individual or group of individuals, and so can be combined (e.g. for prey diversity analyses). Subsamples are indicated by a decimal number S.nnn, where S is the parent PREDATOR_SAMPLE_ID, and nnn (001-999) is the subsample number. Studies will sometimes report detailed prey information for a large sample, but then report prey information for various subsamples of that sample (e.g. broken down by predator sex, or sampling season). In the simplest case, the diet of each predator will be reported only once in the study, and in this scenario the PREDATOR_SAMPLE_ID values will simply be 1 to N (for N predators).
PREDATOR_SIZE_MIN, PREDATOR_SIZE_MAX, PREDATOR_SIZE_MEAN, PREDATOR_SIZE_SD: The minimum, maximum, mean, and standard deviation of the size of the predators in the sample
PREDATOR_SIZE_UNITS: The units of size (e.g. "mm")
PREDATOR_SIZE_NOTES: Notes on the predator size information, including a definition of what the size value represents (e.g. "total length", "standard length")
PREDATOR_MASS_MIN, PREDATOR_MASS_MAX, PREDATOR_MASS_MEAN, PREDATOR_MASS_SD: The minimum, maximum, mean, and standard deviation of the mass of the predators in the sample
PREDATOR_MASS_UNITS: The units of mass (e.g. "g", "kg")
PREDATOR_MASS_NOTES: Notes on the predator mass information, including a definition of what the mass value represents
PREY_NAME: The scientific name of the prey item (corrected, if necessary)
PREY_NAME_ORIGINAL: The name of the prey item, as it appeared in the original source
PREY_APHIA_ID: The numeric identifier of the prey in the WoRMS taxonomic register
PREY_WORMS_RANK, PREY_WORMS_KINGDOM, PREY_WORMS_PHYLUM, PREY_WORMS_CLASS, PREY_WORMS_ORDER, PREY_WORMS_FAMILY, PREY_WORMS_GENUS: The taxonomic details of the prey, from the WoRMS taxonomic register
PREY_GROUP_SOKI: A descriptive label of the group to which the prey belongs (currently used in the Southern Ocean Knowledge and Information wiki, http://soki.aq)
PREY_IS_AGGREGATE: "Y" indicates that this row is an aggregation of other rows in this data source. For example, a study might give a number of individual squid species records, and then an overall squid record that encompasses the individual records. Use the PREY_IS_AGGREGATE information to avoid double-counting during analyses
PREY_LIFE_STAGE: Life stage of the prey (e.g. "adult", "chick", "larva")
PREY_SEX: The sex of the prey ("male", "female", "both", or "unknown"). Note that this is generally "unknown"
PREY_SAMPLE_COUNT: The number of prey individuals from which size and mass measurements were made (note: this is NOT the total number of individuals of
PEST++ Version 5 software release. This release includes ASCII format C++11 source code, precompiled binaries for windows 10 and linux, and inputs files the example problem shown in the report
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Different sources of data for the uncovering of failures in reporting of safety and effectiveness of some examples of new drugs.
Jurisdictional Unit, 2022-05-21. For use with WFDSS, IFTDSS, IRWIN, and InFORM.This is a feature service which provides Identify and Copy Feature capabilities. If fast-drawing at coarse zoom levels is a requirement, consider using the tile (map) service layer located at https://nifc.maps.arcgis.com/home/item.html?id=3b2c5daad00742cd9f9b676c09d03d13.OverviewThe Jurisdictional Agencies dataset is developed as a national land management geospatial layer, focused on representing wildland fire jurisdictional responsibility, for interagency wildland fire applications, including WFDSS (Wildland Fire Decision Support System), IFTDSS (Interagency Fuels Treatment Decision Support System), IRWIN (Interagency Reporting of Wildland Fire Information), and InFORM (Interagency Fire Occurrence Reporting Modules). It is intended to provide federal wildland fire jurisdictional boundaries on a national scale. The agency and unit names are an indication of the primary manager name and unit name, respectively, recognizing that:There may be multiple owner names.Jurisdiction may be held jointly by agencies at different levels of government (ie State and Local), especially on private lands, Some owner names may be blocked for security reasons.Some jurisdictions may not allow the distribution of owner names. Private ownerships are shown in this layer with JurisdictionalUnitIdentifier=null,JurisdictionalUnitAgency=null, JurisdictionalUnitKind=null, and LandownerKind="Private", LandownerCategory="Private". All land inside the US country boundary is covered by a polygon.Jurisdiction for privately owned land varies widely depending on state, county, or local laws and ordinances, fire workload, and other factors, and is not available in a national dataset in most cases.For publicly held lands the agency name is the surface managing agency, such as Bureau of Land Management, United States Forest Service, etc. The unit name refers to the descriptive name of the polygon (i.e. Northern California District, Boise National Forest, etc.).These data are used to automatically populate fields on the WFDSS Incident Information page.This data layer implements the NWCG Jurisdictional Unit Polygon Geospatial Data Layer Standard.Relevant NWCG Definitions and StandardsUnit2. A generic term that represents an organizational entity that only has meaning when it is contextualized by a descriptor, e.g. jurisdictional.Definition Extension: When referring to an organizational entity, a unit refers to the smallest area or lowest level. Higher levels of an organization (region, agency, department, etc) can be derived from a unit based on organization hierarchy.Unit, JurisdictionalThe governmental entity having overall land and resource management responsibility for a specific geographical area as provided by law.Definition Extension: 1) Ultimately responsible for the fire report to account for statistical fire occurrence; 2) Responsible for setting fire management objectives; 3) Jurisdiction cannot be re-assigned by agreement; 4) The nature and extent of the incident determines jurisdiction (for example, Wildfire vs. All Hazard); 5) Responsible for signing a Delegation of Authority to the Incident Commander.See also: Unit, Protecting; LandownerUnit IdentifierThis data standard specifies the standard format and rules for Unit Identifier, a code used within the wildland fire community to uniquely identify a particular government organizational unit.Landowner Kind & CategoryThis data standard provides a two-tier classification (kind and category) of landownership. Attribute Fields JurisdictionalAgencyKind Describes the type of unit Jurisdiction using the NWCG Landowner Kind data standard. There are two valid values: Federal, and Other. A value may not be populated for all polygons.JurisdictionalAgencyCategoryDescribes the type of unit Jurisdiction using the NWCG Landowner Category data standard. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State. A value may not be populated for all polygons.JurisdictionalUnitNameThe name of the Jurisdictional Unit. Where an NWCG Unit ID exists for a polygon, this is the name used in the Name field from the NWCG Unit ID database. Where no NWCG Unit ID exists, this is the “Unit Name” or other specific, descriptive unit name field from the source dataset. A value is populated for all polygons.JurisdictionalUnitIDWhere it could be determined, this is the NWCG Standard Unit Identifier (Unit ID). Where it is unknown, the value is ‘Null’. Null Unit IDs can occur because a unit may not have a Unit ID, or because one could not be reliably determined from the source data. Not every land ownership has an NWCG Unit ID. Unit ID assignment rules are available from the Unit ID standard, linked above.LandownerKindThe landowner category value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. There are three valid values: Federal, Private, or Other.LandownerCategoryThe landowner kind value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State, Private.DataSourceThe database from which the polygon originated. Be as specific as possible, identify the geodatabase name and feature class in which the polygon originated.SecondaryDataSourceIf the Data Source is an aggregation from other sources, use this field to specify the source that supplied data to the aggregation. For example, if Data Source is "PAD-US 2.1", then for a USDA Forest Service polygon, the Secondary Data Source would be "USDA FS Automated Lands Program (ALP)". For a BLM polygon in the same dataset, Secondary Source would be "Surface Management Agency (SMA)."SourceUniqueIDIdentifier (GUID or ObjectID) in the data source. Used to trace the polygon back to its authoritative source.MapMethod:Controlled vocabulary to define how the geospatial feature was derived. Map method may help define data quality. MapMethod will be Mixed Method by default for this layer as the data are from mixed sources. Valid Values include: GPS-Driven; GPS-Flight; GPS-Walked; GPS-Walked/Driven; GPS-Unknown Travel Method; Hand Sketch; Digitized-Image; DigitizedTopo; Digitized-Other; Image Interpretation; Infrared Image; Modeled; Mixed Methods; Remote Sensing Derived; Survey/GCDB/Cadastral; Vector; Phone/Tablet; OtherDateCurrentThe last edit, update, of this GIS record. Date should follow the assigned NWCG Date Time data standard, using 24 hour clock, YYYY-MM-DDhh.mm.ssZ, ISO8601 Standard.CommentsAdditional information describing the feature. GeometryIDPrimary key for linking geospatial objects with other database systems. Required for every feature. This field may be renamed for each standard to fit the feature.JurisdictionalUnitID_sansUSNWCG Unit ID with the "US" characters removed from the beginning. Provided for backwards compatibility.JoinMethodAdditional information on how the polygon was matched information in the NWCG Unit ID database.LocalNameLocalName for the polygon provided from PADUS or other source.LegendJurisdictionalAgencyJurisdictional Agency but smaller landholding agencies, or agencies of indeterminate status are grouped for more intuitive use in a map legend or summary table.LegendLandownerAgencyLandowner Agency but smaller landholding agencies, or agencies of indeterminate status are grouped for more intuitive use in a map legend or summary table.DataSourceYearYear that the source data for the polygon were acquired.Data InputThis dataset is based on an aggregation of 4 spatial data sources: Protected Areas Database US (PAD-US 2.1), data from Bureau of Indian Affairs regional offices, the BLM Alaska Fire Service/State of Alaska, and Census Block-Group Geometry. NWCG Unit ID and Agency Kind/Category data are tabular and sourced from UnitIDActive.txt, in the WFMI Unit ID application (https://wfmi.nifc.gov/unit_id/Publish.html). Areas of with unknown Landowner Kind/Category and Jurisdictional Agency Kind/Category are assigned LandownerKind and LandownerCategory values of "Private" by use of the non-water polygons from the Census Block-Group geometry.PAD-US 2.1:This dataset is based in large part on the USGS Protected Areas Database of the United States - PAD-US 2.`. PAD-US is a compilation of authoritative protected areas data between agencies and organizations that ultimately results in a comprehensive and accurate inventory of protected areas for the United States to meet a variety of needs (e.g. conservation, recreation, public health, transportation, energy siting, ecological, or watershed assessments and planning). Extensive documentation on PAD-US processes and data sources is available.How these data were aggregated:Boundaries, and their descriptors, available in spatial databases (i.e. shapefiles or geodatabase feature classes) from land management agencies are the desired and primary data sources in PAD-US. If these authoritative sources are unavailable, or the agency recommends another source, data may be incorporated by other aggregators such as non-governmental organizations. Data sources are tracked for each record in the PAD-US geodatabase (see below).BIA and Tribal Data:BIA and Tribal land management data are not available in PAD-US. As such, data were aggregated from BIA regional offices. These data date from 2012 and were substantially updated in 2022. Indian Trust Land affiliated with Tribes, Reservations, or BIA Agencies: These data are not considered the system of record and are not intended to be used as such. The Bureau of Indian Affairs (BIA), Branch of Wildland Fire Management (BWFM) is not the originator of these data. The
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
The museum dataset is an evolving list of museums and related organizations in the United States. The data file includes basic information about each organization (name, address, phone, website, and revenue) plus the museum type or discipline. The discipline type is based on the National Taxonomy of Exempt Entities, which the National Center for Charitable Statistics and IRS use to classify nonprofit organizations.
Non-museum organizations may be included. For example, a non-museum organization may be included in the data file because it has a museum-like name on its IRS record for tax-exempt organizations. Museum foundations may also be included.
Museums may be missing. For example, local municipal museums may be undercounted because original data sources used to create the compilation did not include them.
Museums may be listed multiple times. For example, one museum may be listed as both itself and its parent organization because it was listed differently in each original data sources. Duplicate records are especially common for museums located within universities.
Information about museums may be outdated. The original scan and compilation of data sources occurred in 2014. Scans are no longer being done to update the data sources or add new data sources to the compilation. Information about museums may have changed since it was originally included in the file.
The museum data was compiled from IMLS administrative records for discretionary grant recipients, IRS records for tax-exempt organizations, and private foundation grant recipients.
Which city or state has the most museums per capita? How many zoos or aquariums exist in the United States? What museum or related organization had the highest revenue last year? How does the composition of museum types differ across the country?
In 2020, according to respondents surveyed, data masters typically leverage a variety of external data sources to enhance their insights. The most popular external data sources for data masters being publicly available competitor data, open data, and proprietary datasets from data aggregators, with **, **, and ** percent, respectively.