In 2020, according to survey respondents, data masters typically leveraged a variety of external data sources to enhance their insights. The most popular external data sources for data masters were publicly available competitor data, open data, and proprietary datasets from data aggregators, with **, **, and ** percent, respectively.
https://choosealicense.com/licenses/artistic-2.0/
from bs4 import BeautifulSoup
import requests

# Fetch the webpage
url = 'https://example.com'
response = requests.get(url)
html_content = response.text

# Parse the HTML content
soup = BeautifulSoup(html_content, 'html.parser')

# Extract all the links
for link in soup.find_all('a'):
    print(link.get('href'))
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Algeria DZ: SPI: Pillar 4 Data Sources Score: Scale 0-100 data was reported at 45.958 NA in 2022. This records a decrease from the previous number of 49.075 NA for 2021. The data is updated yearly, averaging 49.892 NA (median) from Dec 2016 to 2022, with 7 observations. The data reached an all-time high of 52.417 NA in 2018 and a record low of 45.958 NA in 2022. The series remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Algeria – Table DZ.World Bank.WDI: Governance: Policy and Institutions.
The data sources overall score is a composite measure of whether countries have data available from the following sources: censuses and surveys, administrative data, geospatial data, and private sector/citizen generated data. The data sources (input) pillar is segmented by four types of sources generated by (i) the statistical office (censuses and surveys), and sources accessed from elsewhere such as (ii) administrative data, (iii) geospatial data, and (iv) private sector data and citizen generated data. The appropriate balance between these source types will vary depending on a country’s institutional setting and the maturity of its statistical system. High scores should reflect the extent to which the sources being utilized enable the necessary statistical indicators to be generated. For example, a low score on environment statistics (in the data production pillar) may reflect a lack of use of (and low score for) geospatial data (in the data sources pillar). This type of linkage is inherent in the data cycle approach and can help highlight areas for investment required if country needs are to be met.
Source: Statistical Performance Indicators, The World Bank (https://datacatalog.worldbank.org/dataset/statistical-performance-indicators). Aggregation: weighted average.
https://datacatalog.worldbank.org/public-licenses?fragment=cc
This dataset contains metadata (title, abstract, date of publication, field, etc) for around 1 million academic articles. Each record contains additional information on the country of study and whether the article makes use of data. Machine learning tools were used to classify the country of study and data use.
Our data source of academic articles is the Semantic Scholar Open Research Corpus (S2ORC) (Lo et al. 2020). The corpus contains more than 130 million English language academic papers across multiple disciplines. The papers included in the Semantic Scholar corpus are gathered directly from publishers, from open archives such as arXiv or PubMed, and crawled from the internet.
We placed some restrictions on the articles to make them usable and relevant for our purposes. First, only articles with an abstract and a parsed PDF or LaTeX file are included in the analysis. The full text of the abstract is necessary to classify the country of study and whether the article uses data. The parsed PDF or LaTeX file is needed to extract key information such as the date of publication and field of study. This restriction eliminated a large number of articles in the original corpus. Around 30 million articles remain after keeping only articles with a parsable (i.e., suitable for digital processing) PDF, and around 26% of those 30 million are eliminated when removing articles without an abstract. Second, only articles from the year 2000 to 2020 were considered. This restriction eliminated an additional 9% of the remaining articles. Finally, articles from the following fields of study were excluded, as we aim to focus on fields that are likely to use data produced by countries’ national statistical systems: Biology, Chemistry, Engineering, Physics, Materials Science, Environmental Science, Geology, History, Philosophy, Math, Computer Science, and Art. Fields that are included are: Economics, Political Science, Business, Sociology, Medicine, and Psychology. This third restriction eliminated around 34% of the remaining articles. From an initial corpus of 136 million articles, this resulted in a final corpus of around 10 million articles.
Due to the intensive computer resources required, a set of 1,037,748 articles were randomly selected from the 10 million articles in our restricted corpus as a convenience sample.
The empirical approach employed in this project utilizes text mining with Natural Language Processing (NLP). The goal of NLP is to extract structured information from raw, unstructured text. In this project, NLP is used to extract the country of study and whether the paper makes use of data. We will discuss each of these in turn.
To determine the country or countries of study in each academic article, two approaches are employed based on information found in the title, abstract, or topic fields. The first approach uses regular expression searches based on the presence of ISO3166 country names. A defined set of country names is compiled, and the presence of these names is checked in the relevant fields. This approach is transparent, widely used in social science research, and easily extended to other languages. However, there is a potential for exclusion errors if a country’s name is spelled non-standardly.
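The regular-expression approach described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `COUNTRY_NAMES` and `find_countries` are hypothetical names, and the four entries stand in for the full ISO 3166 list that a real run would use.

```python
import re

# Hypothetical subset of ISO 3166 short names; the full list is assumed, not shown.
COUNTRY_NAMES = ["Algeria", "Japan", "Poland", "United Kingdom"]

# One case-insensitive pattern with word boundaries, so "Japan" does not match "Japanese".
COUNTRY_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in COUNTRY_NAMES) + r")\b",
    flags=re.IGNORECASE,
)


def find_countries(*fields):
    """Return the sorted list of known country names found in any text field."""
    canonical = {name.lower(): name for name in COUNTRY_NAMES}
    found = set()
    for text in fields:
        for match in COUNTRY_PATTERN.finditer(text or ""):
            found.add(canonical[match.group(1).lower()])
    return sorted(found)


title = "Household surveys in Algeria and Japan"
abstract = "We compare census coverage with the united kingdom."
print(find_countries(title, abstract))  # ['Algeria', 'Japan', 'United Kingdom']
```

A name spelled in a way that is not in the list is simply not matched, which is the exclusion error the paragraph above warns about.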
The second approach is based on Named Entity Recognition (NER), which uses machine learning to identify objects from text, utilizing the spaCy Python library. The Named Entity Recognition algorithm splits text into named entities, and NER is used in this project to identify countries of study in the academic articles. SpaCy supports multiple languages and has been trained on multiple spellings of countries, overcoming some of the limitations of the regular expression approach. If a country is identified by either the regular expression search or NER, it is linked to the article. Note that one article can be linked to more than one country.
The second task is to classify whether the paper uses data. A supervised machine learning approach is employed, where 3,500 publications were first randomly selected and manually labeled by human raters using the Mechanical Turk service (Paszke et al. 2019).[1] To make sure the human raters had a similar and appropriate definition of data in mind, they were given the following instructions before seeing their first paper:
Each of these documents is an academic article. The goal of this study is to measure whether a specific academic article is using data and from which country the data came.
There are two classification tasks in this exercise:
1. Identifying whether an academic article is using data from any country.
2. Identifying from which country that data came.
For task 1, we are looking specifically at the use of data. Data is any information that has been collected, observed, generated or created to produce research findings. As an example, a study that reports findings or analysis using survey data uses data. Some clues that a study uses data include whether a survey or census is described, a statistical model is estimated, or a table of means or summary statistics is reported.
After an article is classified as using data, please note the type of data used. The options are population or business census, survey data, administrative data, geospatial data, private sector data, and other data. If no data is used, then mark "Not applicable". In cases where multiple data types are used, please click multiple options.[2]
For task 2, we are looking at the country or countries that are studied in the article. In some cases, no country may be applicable. For instance, if the research is theoretical and has no specific country application. In some cases, the research article may involve multiple countries. In these cases, select all countries that are discussed in the paper.
We expect between 10 and 35 percent of all articles to use data.
The median amount of time that a worker spent on an article, measured as the time between when the article was accepted to be classified by the worker and when the classification was submitted was 25.4 minutes. If human raters were exclusively used rather than machine learning tools, then the corpus of 1,037,748 articles examined in this study would take around 50 years of human work time to review at a cost of $3,113,244, which assumes a cost of $3 per article as was paid to MTurk workers.
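The cost and time figures above can be checked with a few lines of arithmetic. The text does not say how a "year of human work time" is counted; the sketch below assumes continuous 24-hour days, which is the assumption that lands near the quoted 50 years.

```python
ARTICLES = 1_037_748        # sample size given in the text
MINUTES_PER_ARTICLE = 25.4  # median review time given in the text
COST_PER_ARTICLE_USD = 3    # MTurk payment per article given in the text

total_cost = ARTICLES * COST_PER_ARTICLE_USD
total_hours = ARTICLES * MINUTES_PER_ARTICLE / 60

# Assumption (not stated in the text): a year is 365 days of round-the-clock work.
years_continuous = total_hours / (24 * 365)

print(f"cost: ${total_cost:,}")  # cost: $3,113,244
print(f"hours: {total_hours:,.0f}")
print(f"years (continuous): {years_continuous:.1f}")
```

Under a conventional working year of roughly 2,000 hours the same workload would be several times longer, so the 50-year figure is best read as elapsed review time rather than staff-years.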
A model is next trained on the 3,500 labelled articles. We use a distilled version of the BERT (Bidirectional Encoder Representations from Transformers) model to encode raw text into a numeric format suitable for predictions (Devlin et al. 2018). BERT is pre-trained on a large corpus comprising the Toronto Book Corpus and Wikipedia. The distilled version (DistilBERT) is a compressed model that is 60% the size of BERT, retains 97% of its language understanding capabilities, and is 60% faster (Sanh, Debut, Chaumond, and Wolf 2019). We use PyTorch to produce a model to classify articles based on the labeled data. Of the 3,500 articles that were hand coded by the MTurk workers, 900 are fed to the machine learning model; this number was chosen because of computational limitations in training the NLP model. A classification of “uses data” was assigned if the model predicted an article used data with at least 90% confidence.
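The 90% confidence rule can be illustrated in isolation from the model itself. The sketch below is plain Python rather than the PyTorch pipeline the text describes, and it assumes the classifier emits two raw scores ordered as [no data, uses data]; that ordering and the function names are hypothetical.

```python
import math

CONFIDENCE_THRESHOLD = 0.90  # from the text: at least 90% confidence


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def uses_data(logits, threshold=CONFIDENCE_THRESHOLD):
    """Label an article as using data only if P(uses data) meets the threshold.

    Assumes logits are ordered [no_data, uses_data]; this ordering is hypothetical.
    """
    return softmax(logits)[1] >= threshold


print(uses_data([0.0, 3.0]))  # True: P(uses data) is about 0.95
print(uses_data([0.0, 1.0]))  # False: P(uses data) is about 0.73
```

A high threshold like this trades recall for precision: articles the model is only moderately sure about are left unlabelled as data users.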
The performance of the models classifying articles to countries and as using data or not can be compared to the classification by the human raters. We consider the human raters as giving us the ground truth. This may underestimate the model performance if the workers at times got the allocation wrong in a way that would not apply to the model. For instance, a human rater could mistake the Republic of Korea for the Democratic People’s Republic of Korea. If both humans and the model make the same kinds of errors, then the performance reported here will be overestimated.
The model was able to predict whether an article made use of data with 87% accuracy evaluated on the set of articles held out of the model training. The correlation between the number of articles written about each country using data estimated under the two approaches is given in the figure below. The number of articles represents an aggregate total of
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Child maltreatment is a major public health problem that is plagued by research challenges. Good epidemiological data can help to establish the nature and scope of past and present child maltreatment, and monitor its progress going forward. However, high quality data sources are currently lacking for England and Wales. We employed systematic methodology to harness pre-existing datasets (including non-digitalised datasets) and develop a rich data source on the incidence of Child maltreatment over Time (iCoverT) in England and Wales. The iCoverT consists of six databases and accompanying data documentation: Child Protection Statistics, Children In Care Statistics, Criminal Statistics, Homicide Index, Mortality Statistics and NSPCC Statistics. Each database is a unique indicator of child maltreatment incidence, with 272 data variables in total. The databases span from 1858 to 2016 and therefore extend current data sources by over 80 years. We present a proof-of-principle analysis of a subset of the data to show how time series methods may be used to address key research challenges. This example demonstrates the utility of iCoverT and indicates that it will prove to be a valuable data source for researchers, clinicians and policy-makers concerned with child maltreatment. The iCoverT is freely available at the Open Science Framework (osf.io/cf7mv).
https://www.verifiedmarketresearch.com/privacy-policy/
Real World Evidence Solutions Market size was valued at USD 1.30 Billion in 2024 and is projected to reach USD 3.71 Billion by 2032, growing at a CAGR of 13.92% during the forecast period 2026-2032.
Global Real World Evidence Solutions Market Drivers
The market drivers for the Real World Evidence Solutions Market can be influenced by various factors. These may include:
Growing Need for Evidence-Based Healthcare: Real-world evidence (RWE) is becoming more and more important in healthcare decision-making, according to stakeholders such as payers, providers, and regulators. In addition to traditional clinical trial data, RWE solutions offer important insights into the efficacy, safety, and value of healthcare interventions in real-world situations.
Growing Use of RWE by Pharmaceutical Companies: RWE solutions are being used by pharmaceutical companies to assist with market entry, post-marketing surveillance, and drug development initiatives. Pharmaceutical businesses can find new indications for their current medications, improve clinical trial designs, and convince payers and providers of the worth of their products with the use of RWE.
Increasing Priority for Value-Based Healthcare: The emphasis on proving the cost- and benefit-effectiveness of healthcare interventions in real-world settings is growing as value-based healthcare models gain traction. To assist value-based decision-making, RWE solutions are essential in evaluating the economic effect and real-world consequences of healthcare interventions.
Technological and Data Analytics Advancements: RWE solutions are becoming more capable due to advances in machine learning, artificial intelligence, and big data analytics. With the use of these technologies, healthcare stakeholders can obtain actionable insights from the analysis of vast and varied datasets, including patient-generated data, claims data, and electronic health records.
Regulatory Support for RWE Integration: RWE is being progressively integrated into regulatory decision-making processes by regulatory organisations including the European Medicines Agency (EMA) and the U.S. Food and Drug Administration (FDA). The FDA's Real-World Evidence Programme and the EMA's Adaptive Pathways and PRIority MEdicines (PRIME) programme are two examples of initiatives that are making it easier to incorporate RWE into regulatory submissions and drug development.
Increasing Emphasis on Patient-Centric Healthcare: The value of patient-reported outcomes and real-world experiences in healthcare decision-making is becoming more widely acknowledged. RWE technologies facilitate the collection and examination of patient-centered data, offering valuable insights into treatment efficacy, patient inclinations, and quality of life consequences.
Extension of RWE Use Cases: RWE solutions are being used in medication development, post-market surveillance, health economics and outcomes research (HEOR), comparative effectiveness research, and market access, among other healthcare fields. The necessity for a variety of RWE solutions catered to the needs of different stakeholders is being driven by the expansion of RWE use cases.
https://creativecommons.org/publicdomain/zero/1.0/
Mariusz Łapczyński, Cracow University of Economics, Poland, lapczynm '@' uek.krakow.pl
Sylwester Białowąs, Poznan University of Economics and Business, Poland, sylwester.bialowas '@' ue.poznan.pl
The dataset contains clickstream information from an online store offering clothing for pregnant women. Data are from five months of 2008 and include, among others, product category, location of the photo on the page, country of origin of the IP address and product price in US dollars.
The dataset contains 14 variables described in a separate file (See 'Data set description')
N/A
If you use this dataset, please cite:
Łapczyński M., Białowąs S. (2013) Discovering Patterns of Users' Behaviour in an E-shop - Comparison of Consumer Buying Behaviours in Poland and Other European Countries, “Studia Ekonomiczne”, nr 151, “La société de l'information : perspective européenne et globale : les usages et les risques d'Internet pour les citoyens et les consommateurs”, p. 144-153
========================================================
========================================================
========================================================
========================================================
following categories:
1-Australia 2-Austria 3-Belgium 4-British Virgin Islands 5-Cayman Islands 6-Christmas Island 7-Croatia 8-Cyprus 9-Czech Republic 10-Denmark 11-Estonia 12-unidentified 13-Faroe Islands 14-Finland 15-France 16-Germany 17-Greece 18-Hungary 19-Iceland 20-India 21-Ireland 22-Italy 23-Latvia 24-Lithuania 25-Luxembourg 26-Mexico 27-Netherlands 28-Norway 29-Poland 30-Portugal 31-Romania 32-Russia 33-San Marino 34-Slovakia 35-Slovenia 36-Spain 37-Sweden 38-Switzerland 39-Ukraine 40-United Arab Emirates 41-United Kingdom 42-USA 43-biz (.biz) 44-com (.com) 45-int (.int) 46-net (.net) 47-org (.org)
========================================================
========================================================
1-trousers 2-skirts 3-blouses 4-sale
========================================================
(217 products)
========================================================
1-beige 2-black 3-blue 4-brown 5-burgundy 6-gray 7-green 8-navy blue 9-of many colors 10-olive 11-pink 12-red 13-violet 14-white
========================================================
1-top left 2-top in the middle 3-top right 4-bottom left 5-bottom in the middle 6-bottom right
========================================================
1-en face 2-profile
========================================================
========================================================
the average price for the entire product category
1-yes 2-no
========================================================
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
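Because the variables above are stored as integer codes, a lookup table per variable is one simple way to make records readable. The sketch below copies three of the code lists from the description; the dictionary names, the field names (`category`, `location`, `photo_view`), and the sample record are hypothetical, since the variable names are not shown in this listing.

```python
# Code lists copied from the description above; all names here are hypothetical.
MAIN_CATEGORY = {1: "trousers", 2: "skirts", 3: "blouses", 4: "sale"}
PHOTO_LOCATION = {
    1: "top left", 2: "top in the middle", 3: "top right",
    4: "bottom left", 5: "bottom in the middle", 6: "bottom right",
}
PHOTO_VIEW = {1: "en face", 2: "profile"}


def decode_click(row):
    """Return one click record with integer codes replaced by their labels."""
    return {
        "category": MAIN_CATEGORY.get(row["category"], "unknown code"),
        "location": PHOTO_LOCATION.get(row["location"], "unknown code"),
        "photo_view": PHOTO_VIEW.get(row["photo_view"], "unknown code"),
    }


# Made-up record for illustration only.
print(decode_click({"category": 2, "location": 5, "photo_view": 1}))
```

Using `.get` with a fallback keeps a single unexpected code from stopping a full pass over the file.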
This data release is a compilation of construction depth information for 12,383 active and inactive public-supply wells (PSWs) in California from various data sources. Construction data from multiple sources were indexed by the California State Water Resources Control Board Division of Drinking Water (DDW) primary station code (PS Code). Five different data sources were compared with the following priority order: 1, Local sources from select municipalities and water purveyors (Local); 2, Local DDW district data (DDW); 3, The United States Geological Survey (USGS) National Water Information System (NWIS); 4, The California State Water Resources Control Board Groundwater Ambient Monitoring and Assessment Groundwater Information System (SWRCB); and 5, USGS attribution of California Department of Water Resources well completion report data (WCR).
For all data sources, the uppermost depth to the well's open or perforated interval was attributed as depth to top of perforations (ToP). The composite depth to bottom of well (Composite BOT) field was attributed from available construction data in the following priority order: 1, Depth to bottom of perforations (BoP); 2, Depth of completed well (Well Depth); 3, Borehole depth (Hole Depth). PSW ToPs and Composite BOTs from each of the five data sources were then compared, and summary construction depths for both fields were selected for wells with multiple data sources according to the data-source priority order listed above. Case-by-case modifications to the final selected summary construction depths were made after priority order-based selection to ensure internal logical consistency (for example, ToP must not exceed Composite BOT).
This data release contains eight tab-delimited text files:
WellConstructionSourceData_Local.txt contains well construction-depth data, Composite BOT data-source attribution, and local agency data-source attribution for the Local data.
WellConstructionSourceData_DDW.txt contains well construction-depth data and Composite BOT data-source attribution for the DDW data.
WellConstructionSourceData_NWIS.txt contains well construction-depth data, Composite BOT data-source attribution, and USGS site identifiers for the NWIS data.
WellConstructionSourceData_SWRCB.txt contains well construction-depth data and Composite BOT data-source attribution for the SWRCB data.
WellConstructionSourceData_WCR.txt contains well construction-depth data and Composite BOT data-source attribution for the WCR data.
WellConstructionCompilation_ToP.txt contains all ToP data listed by data source.
WellConstructionCompilation_BOT.txt contains all Composite BOT data listed by data source.
WellConstructionCompilation_Summary.txt contains summary ToP and Composite BOT values for each well with data-source attribution for both construction fields.
All construction depths are in units of feet below land surface and are reported to the nearest foot.
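The two priority orders described above amount to "take the first non-missing value in a ranked list". A minimal sketch of that selection logic follows; the function names, dictionary layout, and depth values are hypothetical, and the consistency check only flags a problem, whereas the release describes case-by-case manual corrections.

```python
SOURCE_PRIORITY = ["Local", "DDW", "NWIS", "SWRCB", "WCR"]
BOT_FIELD_PRIORITY = ["BoP", "Well Depth", "Hole Depth"]


def first_available(values, priority):
    """Return (key, value) for the highest-priority key with a non-missing value."""
    for key in priority:
        value = values.get(key)
        if value is not None:
            return key, value
    return None, None


def summarize_well(top_by_source, bot_by_source):
    """Pick summary ToP and Composite BOT by source priority and flag inconsistencies."""
    top_source, top = first_available(top_by_source, SOURCE_PRIORITY)
    bot_source, bot = first_available(bot_by_source, SOURCE_PRIORITY)
    # ToP must not exceed Composite BOT; missing values cannot be checked.
    consistent = top is None or bot is None or top <= bot
    return {"ToP": top, "ToP_source": top_source,
            "BOT": bot, "BOT_source": bot_source, "consistent": consistent}


# Made-up depths in feet below land surface, for illustration only.
composite = first_available({"BoP": None, "Well Depth": 420, "Hole Depth": 450},
                            BOT_FIELD_PRIORITY)
print(composite)  # ('Well Depth', 420)
print(summarize_well({"DDW": 180, "WCR": 200}, {"NWIS": 420, "WCR": 450}))
```

Note that ToP and Composite BOT are selected independently, so the two summary values for one well may come from different sources, which is why the summary file carries a source attribution for each field.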
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this course, you will learn to work within the free and open-source R environment with a specific focus on working with and analyzing geospatial data. We will cover a wide variety of data and spatial data analytics topics, and you will learn how to code in R along the way. The Introduction module provides more background info about the course and course set up.
This course is designed for someone with some prior GIS knowledge. For example, you should know the basics of working with maps, map projections, and vector and raster data. You should be able to perform common spatial analysis tasks and make map layouts. If you do not have a GIS background, we would recommend checking out the West Virginia View GIScience class. We do not assume that you have any prior experience with R or with coding. So, don't worry if you haven't developed these skill sets yet. That is a major goal in this course.
Background material will be provided using code examples, videos, and presentations. We have provided assignments to offer hands-on learning opportunities. Data links for the lecture modules are provided within each module while data for the assignments are linked to the assignment buttons below. Please see the sequencing document for our suggested order in which to work through the material.
After completing this course you will be able to:
Prepare, manipulate, query, and generally work with data in R.
Perform data summarization, comparisons, and statistical tests.
Create quality graphs, map layouts, and interactive web maps to visualize data and findings.
Present your research, methods, results, and code as web pages to foster reproducible research.
Work with spatial data in R.
Analyze vector and raster geospatial data to answer a question with a spatial component.
Make spatial models and predictions using regression and machine learning.
Code in the R language at an intermediate level.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Information related to diet and energy flow is fundamental to a diverse range of Antarctic and Southern Ocean biological and ecosystem studies. This metadata record describes a database of such information being collated by the SCAR Expert Groups on Antarctic Biodiversity Informatics (EG-ABI) and Birds and Marine Mammals (EG-BAMM) to assist the scientific community in this work. It includes data related to diet and energy flow from conventional (e.g. gut content) and modern (e.g. molecular) studies, stable isotopes, fatty acids, and energetic content. It is a product of the SCAR community and open for all to participate in and use.
Data have been drawn from published literature, existing trophic data collections, and unpublished data. The database comprises five principal tables, relating to (i) direct sampling methods of dietary assessment (e.g. gut, scat, and bolus content analyses, stomach flushing, and observed predation), (ii) stable isotopes, (iii) lipids, (iv) DNA-based diet assessment, and (v) energetics values. The schemas of these tables are described below, and a list of the sources used to populate the tables is provided with the data.
A range of manual and automated checks were used to ensure that the entered data were as accurate as possible. These included visual checking of transcribed values, checking of row or column sums against known totals, and checking for values outside of allowed ranges. Suspicious entries were re-checked against the original source.
Notes on names: Names have been validated against the World Register of Marine Species (http://www.marinespecies.org/). For uncertain taxa, the most specific taxonomic name has been used (e.g. prey reported in a study as "Pachyptila sp." will appear here as "Pachyptila"; "Cephalopods" will appear as "Cephalopoda"). Uncertain species identifications (e.g. "Notothenia rossii?" or "Gymnoscopelus cf. piabilis") have been assigned the genus name (e.g. "Notothenia", "Gymnoscopelus"). Original names have been retained in a separate column to allow future cross-checking. WoRMS identifiers (APHIA_ID numbers) are given where possible.
Grouped prey data in the diet sample table need to be handled with a bit of care. Papers commonly report prey statistics aggregated over groups of prey - e.g. one might give the diet composition by individual cephalopod prey species, and then an overall record for all cephalopod prey. The PREY_IS_AGGREGATE column identifies such records. This allows us to differentiate grouped data like this from unidentified prey items from a certain prey group - for example, an unidentifiable cephalopod record would be entered as Cephalopoda (the scientific name), with "N" in the PREY_IS_AGGREGATE column. A record that groups together a number of cephalopod records, possibly including some unidentifiable cephalopods, would also be entered as Cephalopoda, but with "Y" in the PREY_IS_AGGREGATE column. See the notes on PREY_IS_AGGREGATE, below.
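To make the double-counting risk concrete, the sketch below totals prey counts with and without aggregate rows. Only PREY_NAME and PREY_IS_AGGREGATE come from the schema; the `count` field, the species placeholders, and all values are made up for illustration.

```python
# Made-up diet records; "count" is a hypothetical field, not a column from the schema.
records = [
    {"PREY_NAME": "Cephalopod species A", "PREY_IS_AGGREGATE": "N", "count": 5},
    {"PREY_NAME": "Cephalopod species B", "PREY_IS_AGGREGATE": "N", "count": 4},
    # Unidentifiable cephalopods: a genuine record, so not an aggregate.
    {"PREY_NAME": "Cephalopoda", "PREY_IS_AGGREGATE": "N", "count": 3},
    # Overall cephalopod row that groups the three records above.
    {"PREY_NAME": "Cephalopoda", "PREY_IS_AGGREGATE": "Y", "count": 12},
]


def non_aggregate(rows):
    """Keep only rows that are not roll-ups of other rows in the same source."""
    return [row for row in rows if row["PREY_IS_AGGREGATE"] != "Y"]


naive_total = sum(row["count"] for row in records)
safe_total = sum(row["count"] for row in non_aggregate(records))
print(naive_total)  # 24: the aggregate row is counted on top of its parts
print(safe_total)   # 12
```

Filtering on the scientific name alone would not work here, because both the unidentified record and the grouped record are entered as Cephalopoda.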
There are two related R packages that provide data access and functionality for working with these data. See the package home pages for more information: https://github.com/SCAR/sohungry and https://github.com/SCAR/solong.
Data table schemas
Sources data table
SOURCE_ID: The unique identifier of this source
DETAILS: The bibliographic details for this source (e.g. "Hindell M (1988) The diet of the royal penguin Eudyptes schlegeli at Macquarie Island. Emu 88:219–226")
NOTES: Relevant notes about this source – if it’s a published paper, this is probably the abstract
DOI: The DOI of the source (paper or dataset), in the form "10.xxxx/yyyy"
Diet data table
RECORD_ID: The unique identifier of this record
SOURCE_ID: The identifier of the source study from which this record was obtained (see corresponding entry in the sources data table)
SOURCE_DETAILS, SOURCE_DOI: The details and DOI of the source, copied from the sources data table for convenience
ORIGINAL_RECORD_ID: The identifier of this data record in its original source, if it had one
LOCATION: The name of the location at which the data was collected
WEST: The westernmost longitude of the sampling region, in decimal degrees (negative values for western hemisphere longitudes)
EAST: The easternmost longitude of the sampling region, in decimal degrees (negative values for western hemisphere longitudes)
SOUTH: The southernmost latitude of the sampling region, in decimal degrees (negative values for southern hemisphere latitudes)
NORTH: The northernmost latitude of the sampling region, in decimal degrees (negative values for southern hemisphere latitudes)
ALTITUDE_MIN: The minimum altitude of the sampling region, in metres
ALTITUDE_MAX: The maximum altitude of the sampling region, in metres
DEPTH_MIN: The shallowest depth of the sampling, in metres
DEPTH_MAX: The deepest depth of the sampling, in metres
OBSERVATION_DATE_START: The start of the sampling period
OBSERVATION_DATE_END: The end of the sampling period. If sampling was carried out over multiple seasons (e.g. during January of 2002 and January of 2003), this will be the first and last dates (in this example, from 1-Jan-2002 to 31-Jan-2003)
PREDATOR_NAME: The name of the predator. This may differ from PREDATOR_NAME_ORIGINAL if, for example, taxonomy has changed since the original publication, or if the original publication had spelling errors or used common (not scientific) names
PREDATOR_NAME_ORIGINAL: The name of the predator, as it appeared in the original source
PREDATOR_APHIA_ID: The numeric identifier of the predator in the WoRMS taxonomic register
PREDATOR_WORMS_RANK, PREDATOR_WORMS_KINGDOM, PREDATOR_WORMS_PHYLUM, PREDATOR_WORMS_CLASS, PREDATOR_WORMS_ORDER, PREDATOR_WORMS_FAMILY, PREDATOR_WORMS_GENUS: The taxonomic details of the predator, from the WoRMS taxonomic register
PREDATOR_GROUP_SOKI: A descriptive label of the group to which the predator belongs (currently used in the Southern Ocean Knowledge and Information wiki, http://soki.aq)
PREDATOR_LIFE_STAGE: Life stage of the predator, e.g. "adult", "chick", "larva", "juvenile". Note that if a food sample was taken from an adult animal, but that food was destined for a juvenile, then the life stage will be "juvenile" (this is common with seabirds feeding chicks)
PREDATOR_BREEDING_STAGE: Stage of the breeding season of the predator, if applicable, e.g. "brooding", "chick rearing", "nonbreeding", "posthatching"
PREDATOR_SEX: Sex of the predator: "male", "female", "both", or "unknown"
PREDATOR_SAMPLE_COUNT: The number of predators for which data are given. If (say) 50 predators were caught but only 20 analysed, this column will contain 20. For scat content studies, this will be the number of scats analysed
PREDATOR_SAMPLE_ID: The identifier of the predator(s). If predators are being reported at the individual level (i.e. PREDATOR_SAMPLE_COUNT = 1) then PREDATOR_SAMPLE_ID is the individual animal ID. Alternatively, if the data values being entered here are from a group of predators, then the PREDATOR_SAMPLE_ID identifies that group of predators. PREDATOR_SAMPLE_ID values are unique within a source (i.e. SOURCE_ID, PREDATOR_SAMPLE_ID pairs are globally unique). Rows with the same SOURCE_ID and PREDATOR_SAMPLE_ID values relate to the same predator individual or group of individuals, and so can be combined (e.g. for prey diversity analyses). Subsamples are indicated by a decimal number S.nnn, where S is the parent PREDATOR_SAMPLE_ID, and nnn (001-999) is the subsample number. Studies will sometimes report detailed prey information for a large sample, but then report prey information for various subsamples of that sample (e.g. broken down by predator sex, or sampling season). In the simplest case, the diet of each predator will be reported only once in the study, and in this scenario the PREDATOR_SAMPLE_ID values will simply be 1 to N (for N predators).
PREDATOR_SIZE_MIN, PREDATOR_SIZE_MAX, PREDATOR_SIZE_MEAN, PREDATOR_SIZE_SD: The minimum, maximum, mean, and standard deviation of the size of the predators in the sample
PREDATOR_SIZE_UNITS: The units of size (e.g. "mm")
PREDATOR_SIZE_NOTES: Notes on the predator size information, including a definition of what the size value represents (e.g. "total length", "standard length")
PREDATOR_MASS_MIN, PREDATOR_MASS_MAX, PREDATOR_MASS_MEAN, PREDATOR_MASS_SD: The minimum, maximum, mean, and standard deviation of the mass of the predators in the sample
PREDATOR_MASS_UNITS: The units of mass (e.g. "g", "kg")
PREDATOR_MASS_NOTES: Notes on the predator mass information, including a definition of what the mass value represents
PREY_NAME: The scientific name of the prey item (corrected, if necessary)
PREY_NAME_ORIGINAL: The name of the prey item, as it appeared in the original source
PREY_APHIA_ID: The numeric identifier of the prey in the WoRMS taxonomic register
PREY_WORMS_RANK, PREY_WORMS_KINGDOM, PREY_WORMS_PHYLUM, PREY_WORMS_CLASS, PREY_WORMS_ORDER, PREY_WORMS_FAMILY, PREY_WORMS_GENUS: The taxonomic details of the prey, from the WoRMS taxonomic register
PREY_GROUP_SOKI: A descriptive label of the group to which the prey belongs (currently used in the Southern Ocean Knowledge and Information wiki, http://soki.aq)
PREY_IS_AGGREGATE: "Y" indicates that this row is an aggregation of other rows in this data source. For example, a study might give a number of individual squid species records, and then an overall squid record that encompasses the individual records. Use the PREY_IS_AGGREGATE information to avoid double-counting during analyses
PREY_LIFE_STAGE: Life stage of the prey (e.g. "adult", "chick", "larva")
PREY_SEX: The sex of the prey ("male", "female", "both", or "unknown"). Note that this is generally "unknown"
PREY_SAMPLE_COUNT: The number of prey individuals from which size and mass measurements were made (note: this is NOT the total number of individuals of
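As a minimal sketch of how the PREDATOR_SAMPLE_ID and PREY_IS_AGGREGATE conventions described above might be applied during analysis, the Python below splits a subsample identifier of the form S.nnn into its parent and subsample number, and drops aggregate prey rows before counting. The function names and the example rows are hypothetical (they are not taken from the database), and the sketch assumes identifiers are read as strings so that trailing zeros in nnn are preserved.

```python
def split_sample_id(sample_id):
    """Split 'S.nnn' into (parent, subsample); subsample is None for a parent sample."""
    parent, sep, sub = str(sample_id).partition(".")
    if not sep:
        return parent, None
    # The description above gives nnn as a three-digit number in 001-999.
    if not (len(sub) == 3 and sub.isdigit() and sub != "000"):
        raise ValueError(f"unexpected subsample suffix in {sample_id!r}")
    return parent, int(sub)


def non_aggregate_rows(rows):
    """Keep only rows that are not aggregations of other rows, to avoid double-counting."""
    return [r for r in rows if r.get("PREY_IS_AGGREGATE") != "Y"]


# Hypothetical rows for illustration only.
rows = [
    {"SOURCE_ID": 1, "PREDATOR_SAMPLE_ID": "3", "PREY_NAME": "squid species A", "PREY_IS_AGGREGATE": "N"},
    {"SOURCE_ID": 1, "PREDATOR_SAMPLE_ID": "3", "PREY_NAME": "squid species B", "PREY_IS_AGGREGATE": "N"},
    {"SOURCE_ID": 1, "PREDATOR_SAMPLE_ID": "3", "PREY_NAME": "all squid", "PREY_IS_AGGREGATE": "Y"},
]

print(split_sample_id("3.002"))
print(len(non_aggregate_rows(rows)))
```

Reading the identifier as a string rather than a float matters here: a float would turn a subsample such as 3.010 into 3.01 and lose the three-digit form.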
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Japan JP: SPI: Pillar 4 Data Sources Score: Scale 0-100 data was reported at 84.050 NA in 2024. This stayed constant from the previous number of 84.050 NA for 2023. Japan JP: SPI: Pillar 4 Data Sources Score: Scale 0-100 data is updated yearly, averaging 78.317 NA from Mar 2017 (Median) to 2024, with 8 observations. The data reached an all-time high of 84.050 NA in 2024 and a record low of 71.542 NA in 2017. Japan JP: SPI: Pillar 4 Data Sources Score: Scale 0-100 data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Japan – Table JP.World Bank.WDI: Governance: Policy and Institutions. The data sources overall score is a composite measure of whether countries have data available from the following sources: Censuses and surveys, administrative data, geospatial data, and private sector/citizen generated data. The data sources (input) pillar is segmented by four types of sources generated by (i) the statistical office (censuses and surveys), and sources accessed from elsewhere such as (ii) administrative data, (iii) geospatial data, and (iv) private sector data and citizen generated data. The appropriate balance between these source types will vary depending on a country’s institutional setting and the maturity of its statistical system. High scores should reflect the extent to which the sources being utilized enable the necessary statistical indicators to be generated. For example, a low score on environment statistics (in the data production pillar) may reflect a lack of use of (and low score for) geospatial data (in the data sources pillar). This type of linkage is inherent in the data cycle approach and can help highlight areas for investment required if country needs are to be met.;Statistical Performance Indicators, The World Bank (https://datacatalog.worldbank.org/dataset/statistical-performance-indicators);Weighted average;
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Syria SY: SPI: Pillar 4 Data Sources Score: Scale 0-100 data was reported at 13.317 NA in 2023. This stayed constant from the previous number of 13.317 NA for 2022. Syria SY: SPI: Pillar 4 Data Sources Score: Scale 0-100 data is updated yearly, averaging 15.267 NA from Dec 2017 (Median) to 2023, with 7 observations. The data reached an all-time high of 18.417 NA in 2017 and a record low of 11.967 NA in 2021. Syria SY: SPI: Pillar 4 Data Sources Score: Scale 0-100 data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Syrian Arab Republic – Table SY.World Bank.WDI: Governance: Policy and Institutions. The data sources overall score is a composite measure of whether countries have data available from the following sources: Censuses and surveys, administrative data, geospatial data, and private sector/citizen generated data. The data sources (input) pillar is segmented by four types of sources generated by (i) the statistical office (censuses and surveys), and sources accessed from elsewhere such as (ii) administrative data, (iii) geospatial data, and (iv) private sector data and citizen generated data. The appropriate balance between these source types will vary depending on a country’s institutional setting and the maturity of its statistical system. High scores should reflect the extent to which the sources being utilized enable the necessary statistical indicators to be generated. For example, a low score on environment statistics (in the data production pillar) may reflect a lack of use of (and low score for) geospatial data (in the data sources pillar). This type of linkage is inherent in the data cycle approach and can help highlight areas for investment required if country needs are to be met.;Statistical Performance Indicators, The World Bank (https://datacatalog.worldbank.org/dataset/statistical-performance-indicators);Weighted average;
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data supporting the master's thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" (Monitoring open data practices - challenges in finding data publications using the example of publications by researchers at TU Dresden) - Katharina Zinke, Institut für Bibliotheks- und Informationswissenschaften, Humboldt-Universität Berlin, 2023
This ZIP file contains the data the thesis is based on, interim exports of the results, and the R script with all pre-processing, data merging and analyses carried out. The documentation of the additional, explorative analysis is also available. The actual PDFs and text files of the scientific papers used are not included as they are published open access.
The folder structure is shown below with the file names and a brief description of the contents of each file. For details concerning the analysis approach, please refer to the master's thesis (publication following soon).
## Data sources
Folder 01_SourceData/
- PLOS-Dataset_v2_Mar23.csv (PLOS-OSI dataset)
- ScopusSearch_ExportResults.csv (export of Scopus search results from Scopus)
- ScopusSearch_ExportResults.ris (export of Scopus search results from Scopus)
- Zotero_Export_ScopusSearch.csv (export of the file names and DOIs of the Scopus search results from Zotero)
## Automatic classification
Folder 02_AutomaticClassification/
- (NOT INCLUDED) PDFs folder (Folder for PDFs of all publications identified by the Scopus search, named AuthorLastName_Year_PublicationTitle_Title)
- (NOT INCLUDED) PDFs_to_text folder (Folder for all texts extracted from the PDFs by ODDPub, named AuthorLastName_Year_PublicationTitle_Title)
- PLOS_ScopusSearch_matched.csv (merge of the Scopus search results with the PLOS_OSI dataset for the files contained in both)
- oddpub_results_wDOIs.csv (results file of the ODDPub classification)
- PLOS_ODDPub.csv (merge of the results file of the ODDPub classification with the PLOS-OSI dataset for the publications contained in both)
## Manual coding
Folder 03_ManualCheck/
- CodeSheet_ManualCheck.txt (Code sheet with descriptions of the variables for manual coding)
- ManualCheck_2023-06-08.csv (Manual coding results file)
- PLOS_ODDPub_Manual.csv (Merge of the results file of the ODDPub and PLOS-OSI classification with the results file of the manual coding)
## Explorative analysis for the discoverability of open data
Folder 04_FurtherAnalyses/
- Proof_of_of_Concept_Open_Data_Monitoring.pdf (Description of the explorative analysis of the discoverability of open data publications using the example of a researcher) - in German
## R-Script
- Analyses_MA_OpenDataMonitoring.R (R-Script for preparing, merging and analyzing the data and for performing the ODDPub algorithm)
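Several of the files listed above are merges of two result sets "for the publications contained in both", which amounts to an inner join. The actual processing is done in the R script; purely as an illustration, the Python sketch below joins two tables on a shared DOI column. The column names (`DOI`, `Title`, `data_shared`), the DOI values, and the normalisation step (trimming and lower-casing) are assumptions made for the example, not details taken from the dataset.

```python
import csv
import io


def normalise_doi(doi):
    """Trim whitespace and lower-case a DOI so that equal DOIs compare equal."""
    return doi.strip().lower()


def inner_join_on_doi(left_rows, right_rows, key="DOI"):
    """Return merged rows for DOIs present in both inputs (right columns win on clashes)."""
    right_by_doi = {normalise_doi(r[key]): r for r in right_rows}
    merged = []
    for row in left_rows:
        doi = normalise_doi(row[key])
        match = right_by_doi.get(doi)
        if match is not None:
            merged.append({**row, **match, key: doi})
    return merged


# Hypothetical in-memory stand-ins for two CSV files.
scopus_csv = "DOI,Title\n10.1000/A1,Paper one\n10.1000/b2,Paper two\n"
plos_csv = "DOI,data_shared\n10.1000/a1 ,yes\n10.1000/c3,no\n"

scopus = list(csv.DictReader(io.StringIO(scopus_csv)))
plos = list(csv.DictReader(io.StringIO(plos_csv)))
matched = inner_join_on_doi(scopus, plos)
print(matched)
```

Only the publication present in both tables survives the join; rows found in just one source are dropped, which is why the merged files can be smaller than either input.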
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Palau SPI: Pillar 4 Data Sources Score: Scale 0-100 data was reported at 53.817 NA in 2023. This stayed constant from the previous number of 53.817 NA for 2022. Palau SPI: Pillar 4 Data Sources Score: Scale 0-100 data is updated yearly, averaging 53.317 NA from Sep 2020 (Median) to 2023, with 4 observations. The data reached an all-time high of 53.817 NA in 2023 and a record low of 52.817 NA in 2021. Palau SPI: Pillar 4 Data Sources Score: Scale 0-100 data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Palau – Table PW.World Bank.WDI: Governance: Policy and Institutions. The data sources overall score is a composite measure of whether countries have data available from the following sources: Censuses and surveys, administrative data, geospatial data, and private sector/citizen generated data. The data sources (input) pillar is segmented by four types of sources generated by (i) the statistical office (censuses and surveys), and sources accessed from elsewhere such as (ii) administrative data, (iii) geospatial data, and (iv) private sector data and citizen generated data. The appropriate balance between these source types will vary depending on a country’s institutional setting and the maturity of its statistical system. High scores should reflect the extent to which the sources being utilized enable the necessary statistical indicators to be generated. For example, a low score on environment statistics (in the data production pillar) may reflect a lack of use of (and low score for) geospatial data (in the data sources pillar). This type of linkage is inherent in the data cycle approach and can help highlight areas for investment required if country needs are to be met.;Statistical Performance Indicators, The World Bank (https://datacatalog.worldbank.org/dataset/statistical-performance-indicators);Weighted average;
Jurisdictional Unit, 2022-05-21. For use with WFDSS, IFTDSS, IRWIN, and InFORM.
This is a feature service which provides Identify and Copy Feature capabilities. If fast-drawing at coarse zoom levels is a requirement, consider using the tile (map) service layer located at https://nifc.maps.arcgis.com/home/item.html?id=3b2c5daad00742cd9f9b676c09d03d13.
Overview
The Jurisdictional Agencies dataset is developed as a national land management geospatial layer, focused on representing wildland fire jurisdictional responsibility, for interagency wildland fire applications, including WFDSS (Wildland Fire Decision Support System), IFTDSS (Interagency Fuels Treatment Decision Support System), IRWIN (Interagency Reporting of Wildland Fire Information), and InFORM (Interagency Fire Occurrence Reporting Modules). It is intended to provide federal wildland fire jurisdictional boundaries on a national scale. The agency and unit names are an indication of the primary manager name and unit name, respectively, recognizing that:
- There may be multiple owner names.
- Jurisdiction may be held jointly by agencies at different levels of government (i.e. State and Local), especially on private lands.
- Some owner names may be blocked for security reasons.
- Some jurisdictions may not allow the distribution of owner names.
Private ownerships are shown in this layer with JurisdictionalUnitIdentifier=null, JurisdictionalUnitAgency=null, JurisdictionalUnitKind=null, and LandownerKind="Private", LandownerCategory="Private". All land inside the US country boundary is covered by a polygon.
Jurisdiction for privately owned land varies widely depending on state, county, or local laws and ordinances, fire workload, and other factors, and is not available in a national dataset in most cases.
For publicly held lands the agency name is the surface managing agency, such as Bureau of Land Management, United States Forest Service, etc. The unit name refers to the descriptive name of the polygon (i.e. Northern California District, Boise National Forest, etc.).
These data are used to automatically populate fields on the WFDSS Incident Information page.
This data layer implements the NWCG Jurisdictional Unit Polygon Geospatial Data Layer Standard.
Relevant NWCG Definitions and Standards
Unit
2. A generic term that represents an organizational entity that only has meaning when it is contextualized by a descriptor, e.g. jurisdictional.
Definition Extension: When referring to an organizational entity, a unit refers to the smallest area or lowest level. Higher levels of an organization (region, agency, department, etc) can be derived from a unit based on organization hierarchy.
Unit, Jurisdictional
The governmental entity having overall land and resource management responsibility for a specific geographical area as provided by law.
Definition Extension: 1) Ultimately responsible for the fire report to account for statistical fire occurrence; 2) Responsible for setting fire management objectives; 3) Jurisdiction cannot be re-assigned by agreement; 4) The nature and extent of the incident determines jurisdiction (for example, Wildfire vs. All Hazard); 5) Responsible for signing a Delegation of Authority to the Incident Commander.
See also: Unit, Protecting; Landowner
Unit Identifier
This data standard specifies the standard format and rules for Unit Identifier, a code used within the wildland fire community to uniquely identify a particular government organizational unit.
Landowner Kind & Category
This data standard provides a two-tier classification (kind and category) of landownership.
Attribute Fields
JurisdictionalAgencyKind: Describes the type of unit Jurisdiction using the NWCG Landowner Kind data standard. There are two valid values: Federal, and Other. A value may not be populated for all polygons.
JurisdictionalAgencyCategory: Describes the type of unit Jurisdiction using the NWCG Landowner Category data standard. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State. A value may not be populated for all polygons.
JurisdictionalUnitName: The name of the Jurisdictional Unit. Where an NWCG Unit ID exists for a polygon, this is the name used in the Name field from the NWCG Unit ID database. Where no NWCG Unit ID exists, this is the “Unit Name” or other specific, descriptive unit name field from the source dataset. A value is populated for all polygons.
JurisdictionalUnitID: Where it could be determined, this is the NWCG Standard Unit Identifier (Unit ID). Where it is unknown, the value is ‘Null’. Null Unit IDs can occur because a unit may not have a Unit ID, or because one could not be reliably determined from the source data. Not every land ownership has an NWCG Unit ID. Unit ID assignment rules are available from the Unit ID standard, linked above.
LandownerKind: The landowner kind value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. There are three valid values: Federal, Private, or Other.
LandownerCategory: The landowner category value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State, Private.
DataSource: The database from which the polygon originated. Be as specific as possible, identify the geodatabase name and feature class in which the polygon originated.
SecondaryDataSource: If the Data Source is an aggregation from other sources, use this field to specify the source that supplied data to the aggregation. For example, if Data Source is "PAD-US 2.1", then for a USDA Forest Service polygon, the Secondary Data Source would be "USDA FS Automated Lands Program (ALP)". For a BLM polygon in the same dataset, Secondary Source would be "Surface Management Agency (SMA)."
SourceUniqueID: Identifier (GUID or ObjectID) in the data source. Used to trace the polygon back to its authoritative source.
MapMethod: Controlled vocabulary to define how the geospatial feature was derived. Map method may help define data quality. MapMethod will be Mixed Method by default for this layer as the data are from mixed sources. Valid Values include: GPS-Driven; GPS-Flight; GPS-Walked; GPS-Walked/Driven; GPS-Unknown Travel Method; Hand Sketch; Digitized-Image; Digitized-Topo; Digitized-Other; Image Interpretation; Infrared Image; Modeled; Mixed Methods; Remote Sensing Derived; Survey/GCDB/Cadastral; Vector; Phone/Tablet; Other
DateCurrent: The last edit, update, of this GIS record. Date should follow the assigned NWCG Date Time data standard, using 24 hour clock, YYYY-MM-DDhh.mm.ssZ, ISO8601 Standard.
Comments: Additional information describing the feature.
GeometryID: Primary key for linking geospatial objects with other database systems. Required for every feature. This field may be renamed for each standard to fit the feature.
JurisdictionalUnitID_sansUS: NWCG Unit ID with the "US" characters removed from the beginning. Provided for backwards compatibility.
JoinMethod: Additional information on how the polygon was matched to information in the NWCG Unit ID database.
LocalName: LocalName for the polygon provided from PADUS or other source.
LegendJurisdictionalAgency: Jurisdictional Agency, but smaller landholding agencies, or agencies of indeterminate status, are grouped for more intuitive use in a map legend or summary table.
LegendLandownerAgency: Landowner Agency, but smaller landholding agencies, or agencies of indeterminate status, are grouped for more intuitive use in a map legend or summary table.
DataSourceYear: Year that the source data for the polygon were acquired.
Data Input
This dataset is based on an aggregation of 4 spatial data sources: Protected Areas Database US (PAD-US 2.1), data from Bureau of Indian Affairs regional offices, the BLM Alaska Fire Service/State of Alaska, and Census Block-Group Geometry. NWCG Unit ID and Agency Kind/Category data are tabular and sourced from UnitIDActive.txt, in the WFMI Unit ID application (https://wfmi.nifc.gov/unit_id/Publish.html). Areas with unknown Landowner Kind/Category and Jurisdictional Agency Kind/Category are assigned LandownerKind and LandownerCategory values of "Private" by use of the non-water polygons from the Census Block-Group geometry.
PAD-US 2.1:
This dataset is based in large part on the USGS Protected Areas Database of the United States - PAD-US 2.1. PAD-US is a compilation of authoritative protected areas data between agencies and organizations that ultimately results in a comprehensive and accurate inventory of protected areas for the United States to meet a variety of needs (e.g. conservation, recreation, public health, transportation, energy siting, ecological, or watershed assessments and planning). Extensive documentation on PAD-US processes and data sources is available.
How these data were aggregated:
Boundaries, and their descriptors, available in spatial databases (i.e. shapefiles or geodatabase feature classes) from land management agencies are the desired and primary data sources in PAD-US. If these authoritative sources are unavailable, or the agency recommends another source, data may be incorporated by other aggregators such as non-governmental organizations. Data sources are tracked for each record in the PAD-US geodatabase (see below).
BIA and Tribal Data:
BIA and Tribal land management data are not available in PAD-US. As such, data were aggregated from BIA regional offices. These data date from 2012 and were substantially updated in 2022. Indian Trust Land affiliated with Tribes, Reservations, or BIA Agencies: These data are not considered the system of record and are not intended to be used as such. The Bureau of Indian Affairs (BIA), Branch of Wildland Fire Management (BWFM) is not the originator of these data. The
This Zenodo repository contains datasets related to the detected and inferred positions of individuals within a study area (approximately 15 km by 20 km) surrounding the city of Zofingen, Switzerland. The data provide aggregated and anonymized positions of individuals at a fine-grained level, corresponding to road segments or railway tracks, and span a 44-day period from June 17th to July 30th, 2017. This period includes a severe local flood event as well as social events. The datasets also include the postal codes of the municipalities of residence of individuals who traveled to the study area. The data were acquired by the University of Bern from Swisscom, at a total cost of 32400 CHF (around 35796 USD as of December 2024). The related peer-reviewed research article will be available soon.
The network topology within the study area surrounding the city of Zofingen, Switzerland, is defined by two JSON files: "nodes.json" and "edges.json". These files respectively contain the attributes of all nodes and edges within the study area.
The JSON file for the nodes is an array of nodes that contains the following information:
The JSON file for the edges is an array of edges that contains the following information:
These datasets provide estimated counts of people observed along road and rail networks, within a roughly 15 km by 20 km area surrounding the city of Zofingen, Switzerland. These estimates rely on Swisscom's market penetration and are calculated using two methodologies that share their names with the datasets (POSACT and PATHACT) and are explained in the "explanation_of_POSACT_and_PATHACT.pdf" file. To safeguard user privacy, estimates for edges with fewer than 20 detected users are not included (missing data for an edge is assumed to represent fewer than 20 observations).
Both POSACT and PATHACT datasets have the following headers:
| edgeStart | edgeEnd | hourOfDay | estimatedCount |
An explanation of the headers:
This dataset provides estimates of the number of users traveling from their municipality of residence to the local area of study.
The "plzs" dataset has the following headers:
| municipalityID | locationName | hourOfDay | estimatedCount |
An explanation of the headers:
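As an illustration of working with the POSACT and PATHACT columns listed earlier (edgeStart, edgeEnd, hourOfDay, estimatedCount), the Python sketch below sums the estimates per hour and looks up a single edge. It is not part of the Swisscom package: the on-disk file format is not stated here, so rows are represented as plain dictionaries, the node identifiers and counts are made up, and hourOfDay is assumed to be an integer hour. A missing edge is returned as `None` rather than zero, following the note that absent edges stand for fewer than 20 detected users.

```python
from collections import defaultdict


def hourly_totals(rows):
    """Sum estimatedCount over all reported edges for each hourOfDay."""
    totals = defaultdict(float)
    for row in rows:
        totals[int(row["hourOfDay"])] += float(row["estimatedCount"])
    return dict(totals)


def count_for_edge(rows, edge_start, edge_end, hour):
    """Return the estimate for one edge and hour, or None if it was not reported.

    None means 'suppressed or unobserved' (fewer than 20 detected users), not zero.
    """
    for row in rows:
        if (row["edgeStart"], row["edgeEnd"], int(row["hourOfDay"])) == (edge_start, edge_end, hour):
            return float(row["estimatedCount"])
    return None


# Hypothetical rows; identifiers and counts are invented for the example.
rows = [
    {"edgeStart": "n1", "edgeEnd": "n2", "hourOfDay": "8", "estimatedCount": "120.5"},
    {"edgeStart": "n2", "edgeEnd": "n3", "hourOfDay": "8", "estimatedCount": "40"},
    {"edgeStart": "n1", "edgeEnd": "n2", "hourOfDay": "9", "estimatedCount": "75"},
]

print(hourly_totals(rows))
print(count_for_edge(rows, "n2", "n3", 9))
```

Because suppressed edges are simply absent, hourly totals computed this way are lower bounds on the estimated activity rather than complete counts.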
The information presented here on Zenodo is based on the details originally provided by Swisscom in their dataset package. The dissemination of this information on Zenodo strictly adheres to the contractual agreements in place between Swisscom and the University of Bern, as well as to the explicit consent granted by Swisscom via email, dated January 31, 2024. This consent specifically authorized the publication of this dataset package, including its data descriptions and explanations.
S.L. conceived the idea of using mobile phone data, initiated and established the collaboration with Swisscom, uploaded the data to Zenodo, and wrote the data descriptions (based on the information provided by Swisscom). M.K. and A.Z. secured funding to support the data acquisition efforts.
Simone Loreti
PEST++ Version 5 software release. This release includes ASCII format C++11 source code, precompiled binaries for Windows 10 and Linux, and input files for the example problem shown in the report.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Mackinac Island population distribution across 18 age groups. It lists the population in each age group along with its percentage of the total population of Mackinac Island. The dataset can be used to understand the population distribution of Mackinac Island by age. For example, using this dataset, we can identify the largest age group in Mackinac Island.
Key observations
The largest age group in Mackinac Island, MI was the 15 to 19 years group, with a population of 83 (12.52%), according to the ACS 2019-2023 5-Year Estimates. At the same time, the smallest age group in Mackinac Island, MI was the Under 5 years group, with a population of 2 (0.30%). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates
When available, the data consist of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Mackinac Island Population by Age.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As new data sources have emerged, the data space which Pharmacovigilance (PV) processes can use has significantly expanded. However, currently available tools still do not widely exploit data sources beyond Spontaneous Report Systems built to collect Individual Case Safety Reports (ICSRs). This article presents an open-source platform enabling the integration of heterogeneous data sources to support the analysis of drug safety related information. The results of a comparative study conducted as part of the project’s pilot phase are also presented. Data sources were integrated in the form of four “workspaces”: (a) Individual Case Safety Reports obtained from OpenFDA; (b) Real-World Data (RWD) using the OMOP-CDM data model; (c) social media data collected via Twitter; and (d) scientific literature retrieved from PubMed. Data intensive analytics are built for each workspace (e.g., disproportionality analysis metrics are used for OpenFDA data, descriptive statistics for OMOP-CDM data and Twitter data streams, etc.). Upon these workspaces, the end-user sets up “investigation scenarios” defined by Drug-Event Combinations (DEC). Specialized features are provided, such as detailed reporting, which could be used to support reports for regulatory purposes, and “quick views”, which facilitate use where detailed statistics might not be needed and a qualitative overview of the available information might be enough (e.g., clinical environment). The platform’s technical features are presented as Supplementary Material via a walkthrough of an example “investigation scenario”. The presented platform is evaluated via a comparative study against the EVDAS system, conducted by PV professionals. Results from the comparative study show that there is indeed a need for relevant technical tools and that the ability to draw recent data from heterogeneous data sources is appreciated.
However, reluctance among end-users is also noted, as they feel technical improvements and systematic training are required before the potential adoption of the presented software. As a whole, it is concluded that integrating such a platform in a real-world setting is far from trivial, requiring significant effort on training and usability aspects.
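The abstract mentions disproportionality analysis metrics without naming them. Two widely used ones are the proportional reporting ratio (PRR) and the reporting odds ratio (ROR), both computed from a 2x2 table of report counts for a Drug-Event Combination. The Python sketch below is a generic illustration of those two formulas only; whether the platform uses these exact metrics is an assumption, and the counts are invented.

```python
def prr(a, b, c, d):
    """Proportional reporting ratio from a 2x2 table of report counts.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports without the drug, with the event
    d: reports with neither the drug nor the event
    """
    return (a / (a + b)) / (c / (c + d))


def ror(a, b, c, d):
    """Reporting odds ratio from the same 2x2 table."""
    return (a * d) / (b * c)


# Hypothetical counts for one drug-event combination.
a, b, c, d = 10, 90, 20, 880
print(round(prr(a, b, c, d), 3))
print(round(ror(a, b, c, d), 3))
```

Values well above 1 indicate that the event is reported disproportionately often with the drug; in practice such ratios are read together with confidence intervals and minimum case counts, which this sketch omits.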
This data set is a collection of anonymized sample fundraising data sets so that practitioners within our field can practice and share examples using a common data source.
If you have any anonymous data that you would like to include here let me know: Michael Pawlus (pawlus@usc.edu)
Thanks to everyone who has shared data so far to make this possible.