Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Trusted Research Environments (TREs) enable analysis of sensitive data under strict security assertions that protect the data with technical, organizational, and legal measures from (accidentally) being leaked outside the facility. While many TREs exist in Europe, little information is publicly available on their architecture, descriptions of their building blocks, and their slight technical variations. To shed light on these problems, we give an overview of existing, publicly described TREs and a bibliography linking to the system descriptions. We further analyze their technical characteristics, especially their commonalities and variations, and provide insight into their data type characteristics and availability. Our literature study shows that 47 TREs worldwide provide access to sensitive data, of which two-thirds provide data themselves, predominantly via secure remote access. Statistical offices make available the majority of the sensitive data records included in this study.
We performed a literature study covering 47 TREs worldwide using scholarly databases (Scopus, Web of Science, IEEE Xplore, Science Direct), a computer science library (dblp.org), Google, and grey literature, focusing on retrieving the following source material:
The goal of this literature study is to discover existing TREs and analyze their characteristics and data availability, giving an overview of the available infrastructure for sensitive data research, as many European initiatives have been emerging in recent months.
This dataset consists of five comma-separated values (.csv) files describing our inventory:
Additionally, a MariaDB (10.5 or higher) schema definition (.sql) file is needed to properly model the database schema:
The analysis was done in a Jupyter notebook, which can be found in our source code repository: https://gitlab.tuwien.ac.at/martin.weise/tres/-/blob/master/analysis.ipynb
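As a sketch of how the inventory could be explored, the following Python snippet loads one of the .csv files with pandas; the file name tre.csv is hypothetical, since the five files are not named above.

```python
import pandas as pd

# Hypothetical file name -- the five inventory .csv files are not named
# in the description above; substitute one of the actual files.
tres = pd.read_csv("tre.csv")

# First look at the inventory records.
print(tres.shape)
print(tres.head())
```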
This dataset contains the monthly means of ECMWF ERA-40 reanalysis isentropic level analysis data.
The modeled data in these archives are in the NetCDF format (https://www.unidata.ucar.edu/software/netcdf/). NetCDF (Network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. It is also a community standard for sharing scientific data. The Unidata Program Center supports and maintains netCDF programming interfaces for C, C++, Java, and Fortran. Programming interfaces are also available for Python, IDL, MATLAB, R, Ruby, and Perl. Data in netCDF format is:
• Self-describing. A netCDF file includes information about the data it contains.
• Portable. A netCDF file can be accessed by computers with different ways of storing integers, characters, and floating-point numbers.
• Scalable. Small subsets of large datasets in various formats may be accessed efficiently through netCDF interfaces, even from remote servers.
• Appendable. Data may be appended to a properly structured netCDF file without copying the dataset or redefining its structure.
• Sharable. One writer and multiple readers may simultaneously access the same netCDF file.
• Archivable. Access to all earlier forms of netCDF data will be supported by current and future versions of the software.
Pub_figures.tar.zip contains the NCL scripts for figures 1-5 and the Chesapeake Bay Airshed shapefile. The directory structure of the archive is ./Pub_figures/Fig#_data, where # is the figure number from 1-5.
EMISS.data.tar.zip contains two NetCDF files with the emission totals for the 2011ec and 2040ei emission inventories. The file names contain the year of the inventory, and the file headers describe each variable and its units.
EPIC.data.tar.zip contains the monthly mean EPIC data in NetCDF format for ammonium fertilizer application (files with ANH3 in the name) and soil ammonium concentration (files with NH3 in the name) for the historical (Hist directory) and future (RCP-4.5 directory) simulations.
WRF.data.tar.zip contains mean monthly and seasonal data from the 36 km downscaled WRF simulations in NetCDF format for the historical (Hist directory) and future (RCP-4.5 directory) simulations.
CMAQ.data.tar.zip contains the mean monthly and seasonal data in NetCDF format from the 36 km CMAQ simulations for the historical (Hist directory), future (RCP-4.5 directory), and future-with-historical-emissions (RCP-4.5-hist-emiss directory) cases.
This dataset is associated with the following publication: Campbell, P., J. Bash, C. Nolte, T. Spero, E. Cooter, K. Hinson, and L. Linker. Projections of Atmospheric Nitrogen Deposition to the Chesapeake Bay Watershed. Journal of Geophysical Research - Biogeosciences. American Geophysical Union, Washington, DC, USA, 12(11): 3307-3326, (2019).
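The self-describing property above means a file can be interrogated for its own contents. Below is a minimal sketch using Unidata's netCDF4 Python interface (Python being one of the supported languages listed above); the file name is hypothetical and stands in for any file extracted from these archives.

```python
from netCDF4 import Dataset  # Unidata's Python interface to netCDF

# Hypothetical file name -- substitute a file extracted from, e.g.,
# CMAQ.data.tar.zip or WRF.data.tar.zip.
with Dataset("example_monthly_mean.nc", "r") as nc:
    # Self-describing: metadata travels with the data.
    print(nc.data_model)  # e.g. NETCDF4 or NETCDF3_CLASSIC
    for name, var in nc.variables.items():
        print(name, var.dimensions, var.shape)
    # Scalable: a small subset can be read without loading whole arrays,
    # e.g. nc.variables["SOME_VAR"][0, ...] for the first time step.
```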
Animal ecologists often collect hierarchically structured data and analyze these with linear mixed-effects models. Specific complications arise when the effect sizes of covariates vary on multiple levels (e.g., within vs. among subjects). Mean-centering of covariates within subjects offers a useful approach in such situations, but is not without problems. A statistical model represents a hypothesis about the underlying biological process. Mean-centering within clusters assumes that the lower-level responses (e.g., within subjects) depend on the deviation from the subject mean (relative values) rather than on absolute values of the covariate. This may or may not be biologically realistic. We show that a mismatch between the nature of the generating (i.e., biological) process and the form of the statistical analysis produces major conceptual and operational challenges for empiricists. We explored the consequences of mismatches by simulating data with three response-generating processes differing in the source of correlation between a covariate and the response. These data were then analyzed with three different analysis equations. We asked how robustly different analysis equations estimate key parameters of interest and under which circumstances biases arise. Mismatches between generating and analytical equations created several intractable problems for estimating key parameters. The most widely misestimated parameter was the among-subject variance in response. We found that no single analysis equation was robust in estimating all parameters generated by all equations. Importantly, even when response-generating and analysis equations matched mathematically, bias in some parameters arose when sampling across the range of the covariate was limited. Our results have general implications for how we collect and analyze data. They also remind us more generally that conclusions from statistical analysis of data are conditional on a hypothesis, sometimes implicit, for the process(es) that generated the attributes we measure. We discuss strategies for real data analysis in the face of uncertainty about the underlying biological process.
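A minimal pandas sketch of the within-subject mean-centering discussed above, splitting a covariate into between-subject and within-subject components; the data are toy values, not from the study.

```python
import pandas as pd

# Toy data: repeated measurements of a covariate x per subject.
df = pd.DataFrame({
    "subject": ["a", "a", "a", "b", "b", "b"],
    "x":       [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Within-subject (cluster) mean-centering: the subject mean carries the
# between-subject component; the deviation from it is the within-subject
# component.
df["x_between"] = df.groupby("subject")["x"].transform("mean")
df["x_within"] = df["x"] - df["x_between"]

# A mixed model would then enter x_within and x_between as separate
# fixed effects, e.g. y ~ x_within + x_between + (1 | subject).
print(df)
```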
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The goal of the project is to analyze altmetric data for scientific papers published in 2016 and, specifically, to determine which interdisciplinary fields are impactful. After taking random samples, the program uses data clustering and data representation techniques to analyze the altmetric data. To classify the papers into different levels of impact, k-means clustering is applied in a creative way. With the focus on interdisciplinary fields, three kinds of matrices are calculated to illustrate the strength of the connection between every possible pairing of subjects: average altmetric score, percentage of published papers in the interdisciplinary field, and total altmetric score. Sorting the values obtained in the matrices and comparing the three matrices can yield insightful results and help people better understand the connections between different subjects.
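As an illustration of the clustering step described above, the following sketch applies k-means to one-dimensional altmetric scores; the scores and the choice of k = 3 impact levels are hypothetical, not taken from the project.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical altmetric scores for sampled 2016 papers (one feature).
scores = np.array([[1.2], [0.8], [15.0], [14.1], [210.0], [198.5]])

# Cluster papers into impact levels; k=3 here is an illustrative choice
# (low / medium / high impact), not necessarily the project's.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scores)
print(kmeans.labels_)           # impact-level assignment per paper
print(kmeans.cluster_centers_)  # mean altmetric score per level
```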
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Excel-based tool was developed to analyze means-end chain data. The tool consists of a user manual, a calculator file for analyzing your data, and instructional videos.
The purpose of this tool is to aggregate laddering data into hierarchical value maps showing means-end chains. The summarized results consist of (1) a summary overview,
(2) a matrix, and (3) output for copy/pasting into NodeXL to generate hierarchical value maps (HVMs). To use this tool, you must have collected data via laddering interviews. Ladders are codes linked together consisting of attributes, consequences, and values (ACVs).
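As a sketch of how laddering data aggregate into the matrix underlying an HVM, the snippet below counts direct links between ACV codes across ladders; the ladders shown are hypothetical, and the Excel tool's internal format may differ.

```python
from collections import Counter

# Hypothetical ladders: each is a coded attribute -> consequence -> value
# (ACV) chain from a laddering interview.
ladders = [
    ["low price", "save money", "security"],
    ["low price", "save money", "comfort"],
    ["organic", "health", "long life"],
]

# Aggregate adjacent links into an implication matrix: how often code A
# leads directly to code B. Tools such as NodeXL can then draw this as a
# hierarchical value map.
links = Counter()
for ladder in ladders:
    for a, b in zip(ladder, ladder[1:]):
        links[(a, b)] += 1

for (a, b), n in links.items():
    print(f"{a} -> {b}: {n}")
```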
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The validity of empirical research often relies upon the accuracy of self-reported behavior and beliefs. Yet eliciting truthful answers in surveys is challenging, especially when studying sensitive issues such as racial prejudice, corruption, and support for militant groups. List experiments have recently attracted much attention as a potential solution to this measurement problem. Many researchers, however, have used a simple difference-in-means estimator without being able to efficiently examine multivariate relationships between respondents' characteristics and their answers to sensitive items. Moreover, no systematic means exist to investigate the role of underlying assumptions. We fill these gaps by developing a set of new statistical methods for list experiments. We identify the commonly invoked assumptions, propose new multivariate regression estimators, and develop methods to detect and adjust for potential violations of key assumptions. For empirical illustration, we analyze list experiments concerning racial prejudice. Open-source software is made available to implement the proposed methodology.
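For reference, the simple difference-in-means estimator criticized above takes a few lines: with J control items, the treatment group sees J + 1 items (the sensitive item added), so the difference in mean reported item counts estimates the prevalence of the sensitive trait. The counts below are hypothetical.

```python
import numpy as np

# Hypothetical counts of "yes" items reported by respondents.
control = np.array([1, 2, 0, 3, 1, 2])    # J control items only
treatment = np.array([2, 3, 1, 3, 2, 3])  # J control items + sensitive item

# Difference-in-means estimator: the estimated proportion of respondents
# answering the sensitive item affirmatively.
tau_hat = treatment.mean() - control.mean()
print(f"Estimated prevalence of sensitive trait: {tau_hat:.2f}")
```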
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This research is entitled "A Semiotic Analysis of the Music Video of You Belong with Me". The aim of this research was to investigate and analyze the verbal and visual signs, and their meaning, in the music video of "You Belong with Me" by Taylor Swift. This was a qualitative study. In collecting data, the writer used observation and documentation, classifying the video into pictures in the form of sequences. The results indicate that the semiotic signs in this music video take two forms: visual signs, expressed through body language in the video, which tells of a male friend whom Swift likes who already has a lover; and verbal signs, which appear as paper notes containing writing used to communicate. Based on the results of the analysis, the signs fall into two classifications: verbal signs and visual signs. Eight verbal-sign data items and seven visual-sign data items were found. The concept of the music video of You Belong with Me describes someone who is in love with a person who is with a lover who does not appreciate them at all. In the data found, the verbal and visual signs express caring, disappointment, jealousy, and feelings.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset extracted from the post "10 Important Questions on Fundamental Analysis of Stocks – Meaning, Parameters, and Step-by-Step Guide" on Smart Investello.
In-depth proteome exploration of complex body fluids is a challenging task that requires optimal sample preparation and analysis in order to reach novel and meaningful insights. Analysis of follicular fluid is as difficult as that of blood serum due to the ubiquitous presence of several highly abundant proteins and a wide range of protein concentrations. The accessibility of this complex body fluid to liquid chromatography-tandem mass spectrometry (LC/MS/MS) analysis is therefore a challenging opportunity to gain insights into physiological status or to identify new diagnostic and prognostic markers, e.g. for the treatment of infertility. We compared different sample preparation methods (FASP, eFASP, and in-solution digestion) and three different data analysis software packages (Proteome Discoverer with SEQUEST and Mascot, and MaxQuant with Andromeda) in conjunction with semi- and full-tryptic database search approaches in order to obtain maximum coverage of the proteome.
By Health [source]
The Behavioral Risk Factor Surveillance System (BRFSS) is an annual, state-based telephone survey of adults in the United States. It collects a variety of health-related data, including Health-Related Quality of Life (HRQOL). This dataset contains results from the HRQOL survey for a range of locations across the US for the year indicated.
This dataset includes 14 columns which summarize and quantify different aspects of HRQOL topics. The year, location abbreviation, location description, and geo-location provide background context for each row. The question column indicates the survey question to which respondents replied, while the category column classifies it into overarching groupings. Additional columns cover the sample size and data-value attributes such as standard error, unit, and type, giving informative insight into how Americans' quality of life is changing over time, all presented in one concise dataset.
To analyze this dataset, it is important to have a good understanding of its columns. The columns provide various pieces of information, such as the year collected, location abbreviation, location name, and type of data value collected. Understanding what each column means is essential for proper interpretation and analysis; for example, knowing that the data value indicates what percentage of respondents answered a certain way, or that Sample_Size shows how many people were surveyed, helps you make better decisions when looking for patterns in the data.
Once you understand the general structure of the dataset, you should also familiarize yourself with basic statistical tools such as mean/median/mode calculations and comparative/correlational analysis, so you can gain insight into how health-related quality of life differs across populations, countries, or regions. For even more meaningful results, consider adding other variables or datasets that correlate with HRQOL, such as poverty rate or average income level, so you can draw clearer conclusions about potential contributing factors behind the patterns you uncover.
- Identifying trends between geolocation and health-related quality of life indicators to better understand how environmental factors may impact specific communities.
- Visualizing the correlations between health-related quality of life variables across different locations over time to gain insight into potential driving developmental or environmental factors.
- Monitoring the effects of public health initiatives dealing with qualitative health data, such as those conducted by the CDC, the Department of Health and Human Services, and other organizations, by tracking changes in different aspects of HRQOL measures over time and across multiple locations.
If you use this dataset in your research, please credit the original authors.
Data source: see the dataset description for more information.
File: rows.csv

| Column name | Description |
|:---|:---|
| Year | Year when the data was collected. (Integer) |
| LocationAbbr | Abbreviations of various locations where data was recorded. (String) |
| LocationDesc | Full names of states whose records are included in this survey. (String) |
| Category | Particular topic chosen for research such as "Healthy People 2010 Topics" or "Older Adults Issues". (String) |
| Question | Each question corresponds to metrics tracked within each topic. (String) |
| DataSource | Source from which survey responses were collected. (String) |
| Data_Value_Unit | Units taken for recording survey types... |
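A short pandas sketch of the mean/median summaries suggested above, using column names from this data dictionary; the value column is assumed to be named Data_Value (the dictionary is truncated), so check it against the actual file.

```python
import pandas as pd

# Load the file described above; column names follow the data dictionary.
df = pd.read_csv("rows.csv")

# Data_Value is an assumed name for the measure column (the dictionary
# above is truncated); Sample_Size shows how many people were surveyed.
summary = (
    df.groupby(["Year", "LocationAbbr"])["Data_Value"]
      .agg(["mean", "median"])
      .reset_index()
)
print(summary.head())
```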
Background: Gene expression profiling among different tissues is of paramount interest in various areas of biomedical research. We have developed a novel method (DADA, Digital Analysis of cDNA Abundance) that calculates the relative abundance of genes in cDNA libraries. Results: DADA is based upon multiple restriction fragment length analysis of pools of clones from cDNA libraries and the identification of gene-specific restriction fingerprints in the resulting complex fragment mixtures. A specific cDNA cloning vector had to be constructed to control for missing or incomplete cDNA inserts, which would generate misleading fingerprints in standard cloning vectors. Double-stranded cDNA was synthesized using an anchored oligo(dT) primer and uni-directionally inserted into the DADA vector, and cDNA libraries were constructed in E. coli. The cDNA fingerprints were generated in a PCR-free procedure that allows for parallel plasmid preparation, labeling, restriction digestion, and fragment separation of pools of 96 colonies each. This multiplexing significantly enhanced throughput in comparison to sequence-based methods (e.g., the EST approach). The data on the fragment mixtures were integrated into a relational database system and queried with fingerprints experimentally produced by analyzing single colonies. Due to the limited predictability of the position of DNA fragments of a given size on polyacrylamide gels, fingerprints derived solely from cDNA sequences were not accurate enough to be used for the analysis. We applied DADA to the analysis of gene expression profiles in a model for impaired wound healing (treatment of mice with dexamethasone). Conclusions: The method proved capable of identifying pharmacologically relevant target genes that had not been identified by other standard methods routinely used to find differentially expressed genes. Due to the above-mentioned limited predictability of the fingerprints, the method has so far been tested only with a limited number of experimentally determined fingerprints and was able to detect differences in gene expression of transcripts representing 0.05% of the total mRNA population (e.g., medium-abundance gene transcripts).
https://artefacts.ceda.ac.uk/licences/specific_licences/ecmwf-era-products.pdf
This dataset contains ensemble means of ERA5 surface level analysis parameter data (see the linked dataset for spreads). ERA5 is the 5th generation reanalysis project from the European Centre for Medium-Range Weather Forecasts (ECMWF) - see the linked documentation for further details. The ensemble means and spreads are calculated from the 10-member ERA5 ensemble, which is run at a reduced resolution compared with the single high-resolution 'HRES' realisation (hourly output at 31 km grid spacing) in order to provide an uncertainty estimate for it. This dataset contains a limited selection of the available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during extraction from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool linked from this record.
Note, ensemble standard deviation is often referred to as ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation: it is calculated by dividing by 10 (N) rather than 9 (N-1). See the linked datasets for ensemble member and ensemble mean data.
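A small numpy illustration of this convention: dividing by N = 10 corresponds to ddof=0, while the sample formula (N - 1) corresponds to ddof=1. The member values below are hypothetical.

```python
import numpy as np

# Ten hypothetical ensemble members (control included) at one grid point.
members = np.array([281.2, 281.5, 280.9, 281.1, 281.4,
                    281.0, 281.3, 281.6, 280.8, 281.2])

# Ensemble spread as described above: divide by N = 10 (ddof=0),
# not by N - 1 = 9 as in the sample standard deviation (ddof=1).
spread = members.std(ddof=0)      # divides by 10
sample_std = members.std(ddof=1)  # divides by 9, shown for comparison
print(spread, sample_std)
```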
The ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. It follows on from the ERA-15, ERA-40, and ERA-Interim reanalysis projects.
An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These data are subsequently reviewed before being released by ECMWF as quality-assured data within 3 months. CEDA holds a 6-month rolling copy of the latest ERA5t data; see the related datasets linked from this record. However, for the period 2000-2006 the initial ERA5 release was found to suffer from stratospheric temperature biases, so new runs to address this issue were performed, resulting in the ERA5.1 release (see linked datasets). Note, though, that Simmons et al. 2020 (technical memo 859) report that "ERA5.1 is very close to ERA5 in the lower and middle troposphere"; users of data from this period should read technical memo 859 for further details.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset is associated with the paper Knoop et al. (2019) titled "A generic gust definition and detection method based on wavelet-analysis" published in "Advances in Science and Research (ASR)" within the Special Issue: 18th EMS Annual Meeting: European Conference for Applied Meteorology and Climatology 2018. It contains the data and analysis software required to recreate all figures in the publication.
This dataset contains the original data, analysis data, and a results synopsis of 12 slug tests performed in 7 wells completed in unconfined fractured bedrock near the North Shore of Lake Superior in Minnesota. Aquifers tested include extrusive and intrusive volcanic rocks and slate. Estimated hydraulic conductivities range from 10.2 to 2x10-6 feet/day. The mean and median hydraulic conductivities are 3.7 and 1.6 feet/day, respectively. The highest and lowest hydraulic conductivities were in slate and fractured lava, respectively. Compressed-air and traditional displacement-tube methods were employed. Water levels were measured with barometrically compensated (11 tests) and absolute pressure transducers (1 test) and recorded with data loggers. Test data were analyzed with AQTESOLV software using the unconfined KGS model (Hyder and others, 1994; 9 tests) and the Bouwer-Rice (1976) model (3 tests). Data files include the original recorded data, data files transformed into the form required by AQTESOLV, AQTESOLV analysis and results files, and a compilation of well information and slug-test results. All files are formatted as tab-delimited ASCII except for the AQTESOLV analysis and results files, which are proprietary .aqt and PDF files, respectively. For convenience, a Microsoft Excel file is included that contains a synopsis of the well data and slug-test results; original, transformed, and plotted slug-test data; data formats; constants and variables used in the data analysis; and notes about each test.
https://www.datainsightsmarket.com/privacy-policy
The Data as a Service market was valued at USD XXX Million in 2023 and is projected to reach USD XXX Million by 2032, at an expected CAGR of 20.00% over the forecast period. Data as a Service (DaaS), in its simplest form, is an on-demand, cloud-based service model for data and analytics. The model helps businesses use the power of data without large upfront investments in data storage, processing, and analysis infrastructure; DaaS is therefore simple to manage, reduces operational costs, and accelerates time-to-value. DaaS suppliers deliver a collection of data services that may include data integration, data cleansing, data enrichment, and data analytics. These services let businesses access, and thereby use, hundreds or thousands of internal and external data sources for valuable insight and informed decisions. DaaS primarily helps organizations that lack the internal resources, expertise, or means to gather, handle, and process significant amounts of data; by outsourcing these tasks, they can attend to the core competencies of the business. Recent developments include: September 2022: Asigra Inc., an ultra-secure backup and recovery pioneer, declared the general availability of Tigris Data Protection software with Content Disarm & Reconstruction (CDR). The addition of CDR makes Asigra the most security-forward backup and recovery software platform available, adding to its extensive suite of security features. June 2022: IMAT Solutions, a real-time healthcare data management and population health reporting solutions provider, announced the launch of a new Data-as-a-Service (DaaS) offering for health payers. The new DaaS solution meets the new Centers for Medicare & Medicaid Services (CMS) effort to transition all quality measures used in its reporting programs to digital quality measures (dQMs). Key drivers for this market are: growing penetration of data-based decisions among enterprises, and transformation of enterprises leading to demand for real-time analytics. Potential restraints include: concerns regarding privacy and security. Notable trends are: the BFSI sector is expected to witness high growth.
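For the growth arithmetic, a market growing at a 20.00% CAGR scales by (1 + 0.20)^n over n years; since the report elides the base value as USD XXX Million, the sketch below uses a hypothetical one.

```python
# Compound annual growth: m0 * (1 + cagr) ** n after n years.
m0 = 100.0   # hypothetical 2023 market size in USD million (report elides it)
cagr = 0.20  # the 20.00% CAGR stated above

for year in range(2023, 2033):
    size = m0 * (1 + cagr) ** (year - 2023)
    print(year, round(size, 1))
```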
This dataset provides the raw anonymised (quantitative) data from the EDSA demand analysis. The data were gathered from surveys of people who identify as data scientists, and of managers of data scientists, in different sectors across Europe. The data cover the level of current expertise of the individual or team (data scientist and manager, respectively) in eight key areas. The dataset also includes the importance of the eight key areas as capabilities of a data scientist, as well as a breakdown of the key tools, technologies, and training delivery methods required to enhance the skill set of data scientists across Europe. The EDSA dashboard provides an interactive view of this dataset and demonstrates how it is being used within the project. The dataset forms part of the European Data Science Academy (EDSA) project, which received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 643937. This three-year project ran from February 2015 to January 2018. Important note on privacy: this dataset has been collected and made available in a pseudonymous way, as agreed by participants. This means that while each record represents a person, no sensitive identifiable information, such as name, email, or affiliation, is available (we don't even collect it). Pseudonymisation is never foolproof; however, the project's privacy impact assessment concluded that the risk resulting from de-anonymisation of the data is extremely low. Note that data are not included for participants who did not explicitly agree that they could be shared pseudonymously (this was due to a change of terms after the survey had started gathering responses, meaning early responses had come from people who had not seen this clause). If you have any concerns, please contact the data publisher via the links below.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Unsupervised exploratory data analysis (EDA) is often the first step in understanding complex data sets. While summary statistics are among the most efficient and convenient tools for exploring and describing sets of data, they are often overlooked in EDA. In this paper, we show multiple case studies that compare the performance, including clustering, of a series of summary statistics in EDA. The summary statistics considered here are pattern recognition entropy (PRE), the mean, standard deviation (STD), 1-norm, range, sum of squares (SSQ), and X4, which are compared with principal component analysis (PCA), multivariate curve resolution (MCR), and/or cluster analysis. PRE and the other summary statistics are direct methods for analyzing data; they are not factor-based approaches. To quantify the performance of summary statistics, we use the concept of the "critical pair," which is employed in chromatography. The data analyzed here come from different analytical methods. Hyperspectral images, including one of a biological material, are also analyzed. In general, PRE outperforms the other summary statistics, especially in image analysis, although a suite of summary statistics is useful in exploring complex data sets. While PRE results were generally comparable to those from PCA and MCR, PRE is easier to apply. For example, there is no need to determine the number of factors that describe a data set. Finally, we introduce the concept of divided spectrum-PRE (DS-PRE) as a new EDA method. DS-PRE increases the discrimination power of PRE. We also show that DS-PRE can be used to provide the inputs for the k-nearest neighbor (kNN) algorithm. We recommend PRE and DS-PRE as rapid new tools for unsupervised EDA.
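A rough numpy sketch of such a suite of summary statistics follows; the PRE and X4 formulas here are assumptions (a Shannon-type entropy of the normalized signal and a sum of fourth powers, respectively), included only to illustrate the approach rather than to reproduce the paper's exact definitions.

```python
import numpy as np

def summary_stats(x):
    """Suite of summary statistics for one spectrum x (1-D array)."""
    # Entropy-style PRE over the normalized, nonnegative signal.
    # This and X4 below are assumed formulas, for illustration only.
    p = np.abs(x) / np.abs(x).sum()
    p = p[p > 0]
    return {
        "PRE":    float(-(p * np.log2(p)).sum()),
        "mean":   x.mean(),
        "STD":    x.std(ddof=1),
        "1-norm": np.abs(x).sum(),
        "range":  x.max() - x.min(),
        "SSQ":    (x ** 2).sum(),
        "X4":     (x ** 4).sum(),  # assumed: sum of fourth powers
    }

spectrum = np.random.default_rng(0).gamma(2.0, size=256)  # toy spectrum
print(summary_stats(spectrum))
```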
This dataset contains ensemble means of ERA5 initial release (ERA5t) surface level analysis parameter data (see the linked dataset for spreads). ERA5t is the initial release of the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis, available up to 5 days behind the present date. CEDA will maintain a 6-month rolling archive of these data with overlap to the verified ERA5 data - see the linked datasets on this record. The ensemble means and spreads are calculated from the 10-member ERA5t ensemble, which is run at a reduced resolution compared with the single high-resolution 'HRES' realisation (hourly output at 31 km grid spacing) in order to provide an uncertainty estimate for it. This dataset contains a limited selection of the available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during extraction from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool linked from this record. See the linked datasets for ensemble member and spread data. Note, ensemble standard deviation is often referred to as ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation: it is calculated by dividing by 10 (N) rather than 9 (N-1). The ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. It follows on from the ERA-15, ERA-40, and ERA-Interim reanalysis projects. An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These data are subsequently reviewed and, if required, amended before the full ERA5 release. CEDA holds a 6-month rolling copy of the latest ERA5t data. See the related datasets linked from this record.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains tabular files with information about the usage preferences of speakers of Maltese English with regard to 63 pairs of lexical expressions. These pairs (e.g. truck-lorry or realization-realisation) are known to differ in usage between BrE and AmE (cf. Algeo 2006). The data were elicited with a questionnaire that asks informants to indicate whether they always use one of the two variants, prefer one over the other, have no preference, or do not use either expression (see Krug and Sell 2013 for methodological details). Usage preferences were therefore measured on a symmetric 5-point ordinal scale. Data were collected between 2008 and 2018, as part of a larger research project on lexical and grammatical variation in settings where English is spoken as a native, second, or foreign language. The current dataset, which we use for our methodological study on ordinal data modeling strategies, consists of a subset of 500 speakers that is roughly balanced on year of birth. Abstract of the related publication: In empirical work, ordinal variables are typically analyzed using means based on numeric scores assigned to categories. While this strategy has met with justified criticism in the methodological literature, it also generates simple and informative data summaries, a standard often not met by statistically more adequate procedures. Motivated by a survey of how ordered variables are dealt with in language research, we draw attention to an un(der)used latent-variable approach to ordinal data modeling, which constitutes an alternative perspective on the most widely used form of ordered regression, the cumulative model. Since the latent-variable approach does not feature in any of the studies in our survey, we believe it is worthwhile to promote its benefits. To this end, we draw on questionnaire-based preference ratings by speakers of Maltese English, who indicated on a 5-point scale which of two synonymous expressions (e.g. package-parcel) they (tend to) use. We demonstrate that a latent-variable formulation of the cumulative model affords nuanced and interpretable data summaries that can be visualized effectively, while at the same time avoiding limitations inherent in mean response models (e.g. distortions induced by floor and ceiling effects). The online supplementary materials include a tutorial for its implementation in R.
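A rough Python analogue of the cumulative (proportional-odds) model discussed in the abstract, using statsmodels' OrderedModel; the publication's own tutorial is in R, and the data below are simulated stand-ins for the 5-point ratings and a year-of-birth covariate, not the actual dataset.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Simulated stand-in data shaped like the study: one 5-point ordinal
# preference rating per speaker plus a year-of-birth covariate.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "rating": pd.Categorical(rng.integers(1, 6, size=500), ordered=True),
    "birth_year": rng.integers(1940, 2000, size=500),
})

# Cumulative model: the observed ordinal rating is a thresholded latent
# variable, here with a logistic link (proportional odds).
model = OrderedModel(df["rating"], df[["birth_year"]], distr="logit")
res = model.fit(method="bfgs", disp=False)
print(res.summary())
```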