Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cronbach’s alpha (coefficient α) is the conventional statistic communication scholars use to estimate the reliability of multi-item measurement instruments. For many, if not most, communication measures, α should not be calculated for reliability estimation. Instead, coefficient omega (ω) should be reported, as it aligns with the definition of reliability itself. In this primer, we review α and ω, and explain why ω should be the new ‘gold standard’ in reliability estimation. Using Mplus, we demonstrate how ω is calculated on an available data set and show how preliminary scales can be revised with ‘ω if item deleted.’ We also list several easy-to-use resources to calculate ω in other software programs. Communication researchers should routinely report ω instead of α.
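The abstract describes computing ω in Mplus; as a rough illustration of what ω measures, the single-factor (congeneric) formula can be evaluated directly from standardized factor loadings and residual variances. The loadings below are invented for illustration, not taken from the primer's data set:

```python
# Coefficient omega for a single-factor (congeneric) model:
# omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances)

def coefficient_omega(loadings, residual_variances):
    """McDonald's omega from standardized factor loadings."""
    loading_sum = sum(loadings)
    true_score_var = loading_sum ** 2      # variance attributable to the factor
    error_var = sum(residual_variances)    # variance attributable to item error
    return true_score_var / (true_score_var + error_var)

# Hypothetical standardized loadings for a 4-item scale.
loadings = [0.7, 0.8, 0.6, 0.75]
# For standardized items, residual variance = 1 - loading^2.
residuals = [1 - l ** 2 for l in loadings]

print(round(coefficient_omega(loadings, residuals), 3))
```

Unlike α, which assumes equal loadings across items (tau-equivalence), this formula lets each item carry its own loading, which is why ω matches the definition of reliability more closely.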
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data accompanying the article Exploring Definitions of Quality and Diversity in Sonic Measurement Spaces. The Innovation Engine algorithm is used to evolve sounds, where Quality Diversity search is guided by behaviour definitions derived from unsupervised models and by full-reference and no-reference quality evaluation approaches. Sonic discoveries have shaped and transformed creative processes in sound art and music production. Compositions prompted by new timbres influence and improve our lives. Modern technology offers a vast space of sonic possibilities to explore. Background and expertise influence an explorer's ability to navigate that space of possibilities. Efforts have been made to develop automated systems that can systematically generate and explore these sonic possibilities. One route of such efforts has involved the search for diversity and quality with evolutionary algorithms, automating the evaluation of those metrics with supervised models. We continue on that path of investigation by further exploring possible definitions of quality and diversity in sonic measurement spaces, applying and dynamically redefining unsupervised models to autonomously illuminate sonic search spaces. In particular, we investigate the applicability of unsupervised dimensionality reduction models for defining dynamically expanding, structured containers for a quality diversity search algorithm to operate within. Furthermore, we evaluate different approaches for defining sonic characteristics with different feature extraction approaches. Results demonstrate considerable ability in autonomously discovering a diversity of sounds, as well as limitations of simulating evolution within the confines of a single, structured, albeit dynamically redefined, search landscape.
Sound objects discovered in traversals through such autonomously illuminated sonic spaces can serve as resources in shaping our lives and steering them through diverse creative paths, along which stepping stones towards interesting innovations can be collected and used as input to human culture.
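The structured-container idea described above can be sketched in the spirit of MAP-Elites, the canonical Quality Diversity grid container. The behaviour descriptor and quality function below are toy stand-ins, not the unsupervised models or audio metrics from the article:

```python
import random

# Toy MAP-Elites-style container: each cell of a discretized behaviour
# space keeps only the highest-quality solution found so far (the "elite").

GRID = 10  # cells per behaviour dimension

def behaviour(sound):
    """Stand-in 2-D behaviour descriptor mapped onto the grid."""
    return (min(int(sound[0] * GRID), GRID - 1),
            min(int(sound[1] * GRID), GRID - 1))

def quality(sound):
    """Stand-in quality measure (higher is better)."""
    return -sum((x - 0.5) ** 2 for x in sound)

random.seed(1)
archive = {}  # cell -> (quality, solution)
for _ in range(2000):
    candidate = [random.random() for _ in range(2)]
    cell = behaviour(candidate)
    if cell not in archive or quality(candidate) > archive[cell][0]:
        archive[cell] = (quality(candidate), candidate)

print(f"{len(archive)} of {GRID * GRID} cells illuminated")
```

The article's contribution replaces this fixed grid with containers whose axes are redefined dynamically by unsupervised dimensionality reduction; the keep-the-best-per-cell mechanic stays the same.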
Acoustic Doppler current profiler (ADCP) discharge measurement data were collected and analyzed for use in developing an operational uncertainty analysis tool known as QUant (Moore and others, 2017). These ADCP measurements were originally collected in the United States, Canada, and New Zealand as part of research conducted to validate ADCP discharge measurements made with Teledyne RD Instruments RiverRay and SonTek M9 ADCPs (Boldt and Oberg, 2016). The data were chosen to represent a variety of geographic and streamflow conditions, such as mean depth and mean velocity. Due to current limitations in the QUant software, only measurements collected using Teledyne RD Instruments Rio Grande and StreamPro ADCPs were used. All measurements were collected and processed with WinRiver II (Teledyne RD Instruments, 2016). An appropriate method for estimating flow near the water surface and the streambed was obtained by means of the extrap software (Mueller, 2013). The extrapolation method and parameters obtained with extrap were entered into WinRiver II and the data were reprocessed before use in QUant. Due to the complexity of an ADCP data file and the various algorithms applied to compute the streamflow from ADCP data, these data are most useful in their original raw data format, which can be opened and processed in WinRiver II, available without cost at: http://www.teledynemarine.com/rdi/support#. Each measurement consists of: (1) a .mmt file: an XML configuration file used by WinRiver II for instrument setup, specific measurement data entry, and the filenames of the raw transect data files (.pd0); (2) .pd0 files: the raw binary data collected by WinRiver II, in the format defined in Teledyne RD Instruments (2016); (3) .txt files: raw ASCII data from external sensors such as GPS receivers; these data are used neither in WinRiver II nor in the present analyses;
(4) a *_extrap.txt file: a summary of the method and parameters selected for estimating near-surface and near-bed discharges; (5) a WinRiver.pdf file: a summary of the discharge measurement in PDF format.
References:
Boldt, J.A., and Oberg, K.A., 2016, Validation of streamflow measurements made with M9 and RiverRay Acoustic Doppler current profilers: Journal of Hydraulic Engineering, v. 142, no. 2. [Also available at https://doi.org/10.1061/(asce)hy.1943-7900.0001087.]
Moore, S.A., Jamieson, E.C., Rainville, F., Rennie, C.D., and Mueller, D.S., 2017, Monte Carlo approach for uncertainty analysis of Acoustic Doppler current profiler discharge measurement by moving boat: Journal of Hydraulic Engineering, v. 143, no. 3. [Also available at https://doi.org/10.1061/(asce)hy.1943-7900.0001249.]
Mueller, D.S., 2013, extrap: Software to assist the selection of extrapolation methods for moving-boat ADCP streamflow measurements: Computers & Geosciences, v. 54, p. 211–218. [Also available at https://doi.org/10.1016/j.cageo.2013.02.001.]
Teledyne RD Instruments, Inc., 2016, WinRiver II Software User's Guide, P/N 957-6231-00, San Diego, CA, 310 p.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract This article characterizes the mathematical and pedagogical knowledge of the measurement estimation concept possessed by Chilean primary school teachers, based on the way they propose to use activities aimed at working on this concept. In the analysis, we use a definition of the measurement estimation concept constructed from the previous work of different authors. The methodology is of a descriptive and interpretative nature. The results show weaknesses in the teachers' knowledge about measurement estimation and its use in the classroom: some teachers confuse measurement estimation activities with measurement activities, or interpret them as activities in which a random response can be given without justification, evidencing the need to include measurement estimation in teacher training.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This dataset provides measurements of raw water storage levels in reservoirs crucial for public water supply. The reservoirs included in this dataset are natural bodies of water that have been dammed to store untreated water.

Key Definitions
Aggregation: The process of summarizing or grouping data to obtain a single or reduced set of information, often for analysis or reporting purposes.
Capacity: The maximum volume of water a reservoir can hold above the natural level of the surrounding land, with thresholds for regulation at 10,000 cubic meters in England, Wales and Northern Ireland and a modified threshold of 25,000 cubic meters in Scotland pending full implementation of the Reservoirs (Scotland) Act 2011.
Current Level: The present volume of water held in a reservoir, measured above a set baseline, crucial for safety and regulatory compliance.
Current Percentage: The current water volume in a reservoir as a percentage of its total capacity, indicating how full the reservoir is at any given time.
Dataset: Structured and organized collection of related elements, often stored digitally, used for analysis and interpretation in various fields.
Granularity: Data granularity is a measure of the level of detail in a data structure. In time-series data, for example, the granularity of measurement might be based on intervals of years, months, weeks, days, or hours.
ID: Abbreviation for identification; any means of verifying the unique identifier assigned to each asset for the purposes of tracking, management, and maintenance.
Open Data Triage: The process carried out by a Data Custodian to determine if there is any evidence of sensitivities associated with Data Assets, their associated Metadata, and Software Scripts used to process Data Assets if they are used as Open Data.
Reservoir: Large natural lake used for storing raw water intended for human consumption. Its volume is measurable, allowing for careful management and monitoring to meet demand for clean, safe water.
Reservoir Type: The classification of a reservoir based on the method of construction, the purpose it serves, or the source of water it stores.
Schema: Structure for organizing and handling data within a dataset, defining the attributes, their data types, and the relationships between different entities. It acts as a framework that ensures data integrity and consistency by specifying permissible data types and constraints for each attribute.
Units: Standard measurements used to quantify and compare different physical quantities.

Data History
Data Origin: Reservoir level data are sourced from water companies, which may also publish this information on their websites, and from government publications such as the Water Situation Reports provided by the UK government.

Data Triage Considerations
Identification of Critical Infrastructure: Special attention is given to safeguarding data on essential reservoirs, in line with the National Infrastructure Act, to mitigate security risks and ensure the resilience of public water systems. Currently, only reservoirs with a location already available in the public domain are included in this dataset.
Commercial Risks and Anonymisation: The risk of personal information exposure is minimal to none, since the data concern reservoir levels, which are not linked to individuals or households.
Data Freshness: It is not currently possible to make the dataset live. Some companies have digital monitoring, while others measure reservoir levels analogically. This dataset may not be used to determine reservoir level in place of visual checks where these are advised.
Data Triage Review Frequency: Annually, unless otherwise requested.

Data Specifications
Data specifications define what is included and excluded in the dataset to maintain clarity and focus. For this dataset:
- Each dataset covers measurements taken by the publisher.
- This dataset is published periodically in line with the publisher's capabilities.
- Historical datasets may be provided for comparison but are not required.
- The location data provided may be a point from anywhere within the body of water or on its boundary.
Reservoirs included in the dataset must be:
- Open bodies of water used to store raw/untreated water
- Filled naturally
- Measurable
- Holders of water that may go on to be used for public supply

Context
This dataset must not be used to determine the implementation or removal of low-supply or high-supply measures such as hosepipe bans. Please await guidance from your water supplier regarding any changes required to your usage of water. Particularly high or low reservoir levels may be considered normal or as expected given the season or recent weather. This dataset does not remove the requirement for visual checks on reservoir level that are in place for caving/potholing safety. Some water companies calculate the capacity of reservoirs differently from others: capacity can mean the useable volume of the reservoir or the overall volume that can be held in the reservoir, including water below the water table.

Data Publish Frequency: Annually
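The Current Percentage definition above is a straightforward ratio of current level to capacity; a minimal sketch (the function and parameter names are mine, not field names from the dataset schema):

```python
def current_percentage(current_level_m3, capacity_m3):
    """How full a reservoir is, as a percentage of its capacity."""
    if capacity_m3 <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * current_level_m3 / capacity_m3

# A reservoir holding 7,500 of a possible 10,000 cubic meters is 75% full.
print(current_percentage(7_500, 10_000))  # 75.0
```

Note the caveat from the Context section: because companies define capacity differently (useable vs. overall volume), percentages are not directly comparable across publishers.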
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this paper, we generalize the notion of measurement error on deterministic sample datasets to accommodate sample data that are random-variable-valued. This leads to the formulation of two distinct kinds of measurement error: intrinsic measurement error, and incidental measurement error. Incidental measurement error will be recognized as the traditional kind that arises from a set of deterministic sample measurements, and upon which the traditional measurement error modelling literature is based, while intrinsic measurement error reflects some subjective quality of either the measurement tool or the measurand itself. We define calibrating conditions that generalize common and classical types of measurement error models to this broader measurement domain, and explain how the notion of generalized Berkson error in particular mathematicizes what it means to be an expert assessor or rater for a measurement process. We then explore how classical point estimation, inference, and likelihood theory can be generalized to accommodate sample data composed of generic random-variable-valued measurements.
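As a brief sketch of the classical baseline being generalized (notation mine, not necessarily the paper's): in the classical error model the observation scatters around the truth, while in the Berkson model the truth scatters around the assigned value:

```latex
% Classical measurement error: observation W scatters around the truth X
W = X + U, \qquad \mathbb{E}[U \mid X] = 0
% Berkson measurement error: truth X scatters around the assigned value W
X = W + U, \qquad \mathbb{E}[U \mid W] = 0
```

The paper's notion of generalized Berkson error extends the second condition to random-variable-valued measurements, which is what formalizes the idea of an expert assessor or rater.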
Three streamflow measurements are used to demonstrate the use of equations developed in Mueller (in review). All three measurements are from various locations on the Mississippi River. These data were not collected for the purpose of this paper but provide practical examples of the effect of heading errors. The use of data from the Mississippi River allows the collection of 500 or more ensembles in each transect, which reduces the overall effect of random errors that could complicate the identification of effects due to heading errors. In addition, by using wide cross sections, the effect of GPS errors due to vegetation near the boundaries of the river is minimized. These three data sets represent three different situations: 1) availability of heading data from a GPS compass (Mississippi River near Hickman, KY), 2) transects intentionally collected at different speeds (Mississippi River near Vicksburg, MS), and 3) GPS data collected where there is minimal influence from a moving bed (Mississippi River near Clinton, IA). All data were collected using Teledyne RD Instruments Rio Grande ADCPs with WinRiver II (Teledyne RD Instruments, 2016) and processed with QRev version 3.43 (Mueller, 2016). Due to the complexity of an ADCP data file and the various algorithms applied to compute the streamflow from ADCP data, these data are most useful in either 1) their original raw data format, which can be opened and processed in WinRiver II or QRev, or 2) their processed format, which can be opened and processed by QRev or opened by Matlab or any software that can read Matlab formatted files. Both WinRiver II and QRev are distributed free.
WinRiver II can be obtained from: http://www.teledynemarine.com/rdi/support#
QRev can be obtained from: https://hydroacoustics.usgs.gov/movingboat/QRev.shtml
Each measurement consists of:
1) *.mmt file: an XML configuration file used by WinRiver II for setup, specific measurement data entry, and the filenames of the raw transect data files (.pd0).
2) *.pd0 files are the raw binary data collected by WinRiver II. The format for these files is defined in Teledyne RD Instruments (2016).
3) *.txt files contain raw ASCII data from external sensors such as GPS receivers. These data are not used by WinRiver II or QRev but provide the raw external data strings sent by the GPS receiver.
4) *.mat files are the saved data processed by QRev. These files can be opened and processed by QRev or loaded into Matlab or software that can read Matlab formatted files. The variable definitions are documented in Mueller (2016).
5) *.xml are summaries of the data processed by QRev. The variable definitions are documented in Mueller (2016).
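A heading error rotates the ADCP's measured boat velocity relative to the GPS-derived velocity; the geometry can be illustrated with a simple rotation (this is a simplified illustration, not the equations from Mueller, in review):

```python
import math

def rotate(east, north, heading_error_deg):
    """Rotate an (east, north) velocity vector by a heading error in degrees."""
    a = math.radians(heading_error_deg)
    return (east * math.cos(a) + north * math.sin(a),
            -east * math.sin(a) + north * math.cos(a))

# A 2-degree heading error applied to a 1 m/s easterly boat velocity leaks
# roughly 3.5 cm/s into the north component -- a cross-track bias that grows
# with boat speed and matters for GPS-referenced discharge.
east, north = rotate(1.0, 0.0, 2.0)
print(round(east, 4), round(north, 4))
```

This is why the Vicksburg data set, with transects intentionally collected at different boat speeds, is useful: the bias scales with boat velocity while the water velocity does not.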
This dataset contains data collected within limestone cedar glades at Stones River National Battlefield (STRI) near Murfreesboro, Tennessee. This dataset contains measurements of volumetric soil water content at certain quadrat locations (points) within 12 selected cedar glades. All soil water content measurements in this file were obtained using a portable time-domain reflectometry (TDR) unit (FieldScout TDR 300 Soil Moisture Meter; Spectrum Technologies, Inc., Plainfield, IL, USA) that was fitted with the shortest probe length, 3.8 centimeters (cm). Six measurements were obtained per quadrat, and the values present in the fields of this dataset represent the means of these six measurements. In some cases, equipment malfunction caused missing data values; in these cases the values present in the fields represent means of at least three measurements per point. Missing measurements (either because fewer than 3 measurements were taken for a given point, or because a point was not sampled on a given day) are represented by the value -99999. This file contains observations only for those points that were candidates for TDR measurement, defined by having no exposed bedrock and at least 2 of 4 depth-to-bedrock measurements greater than 4 cm. Please note that this dataset can be used in conjunction with two associated datasets: STRI_glades_soil_Water_Content_OD.shp, which contains soil water content measurements obtained by gravimetric (oven-drying) analysis for points with insufficient soil depth for TDR measurement, and STRI_glades_soil_water_content_ALL.shp, which contains soil water content measurements obtained by both TDR and oven-drying methods wherein TDR measurements have been transformed to oven-drying-method equivalents using a calibration curve.
For details on this calibration procedure to compare soil water content measurements obtained by the two methods, see the metadata section for the file STRI_glades_soil_water_content_ALL.shp. Detailed descriptions of experimental design, field data collection procedures, laboratory procedures, and data analysis are presented in Cartwright (2014).
References:
Cartwright, J. (2014). Soil ecology of a rock outcrop ecosystem: abiotic stresses, soil respiration, and microbial community profiles in limestone cedar glades. Ph.D. dissertation, Tennessee State University.
Cofer, M., Walck, J., and Hidayati, S. (2008). Species richness and exotic species invasion in Middle Tennessee cedar glades in relation to abiotic and biotic factors. The Journal of the Torrey Botanical Society, 135(4), 540–553.
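The per-quadrat averaging and the -99999 missing-value convention described above can be reproduced with a small helper (a sketch of the stated rules; the dataset's own processing code may differ):

```python
MISSING = -99999

def quadrat_mean(readings, min_valid=3):
    """Mean volumetric soil water content for one quadrat.

    Readings equal to the MISSING sentinel are dropped; if fewer than
    min_valid readings remain (the dataset requires at least three),
    the quadrat value itself is flagged MISSING.
    """
    valid = [r for r in readings if r != MISSING]
    if len(valid) < min_valid:
        return MISSING
    return sum(valid) / len(valid)

print(quadrat_mean([12.1, 13.0, 11.5, 12.4, 12.0, 13.2]))   # mean of six readings
print(quadrat_mean([12.1, MISSING, MISSING, MISSING,
                    MISSING, MISSING]))                      # flagged -99999
```

Filtering the sentinel before averaging matters: treating -99999 as a real reading would wreck the mean, which is a common pitfall with sentinel-coded environmental data.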
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Abstract
Objectives: To assess the feasibility, acceptability, and clinical sensibility of a novel survey, the Advance Care Planning (ACP) Engagement Survey, in various health care settings.
Setting: A target sample of 50 patients from each of primary care, hospital, cancer care, and dialysis care settings.
Participants: A convenience sample of patients without cognitive impairment who could speak and read English was recruited. Patients 50 years and older were eligible in primary care; patients 80 and older, or 55 years and older with clinical markers of advanced chronic disease, were recruited in hospital; patients aged 19 and older were recruited in cancer and renal dialysis centres.
Outcomes: We assessed the feasibility, acceptability, and clinical sensibility of the ACP Engagement Survey using a 6-point scale. The ACP Engagement Survey measures ACP processes (knowledge, contemplation, self-efficacy, readiness) on 5-point Likert scales and actions (yes/no).
Results: 196 patients (38 to 96 years old, 50.5% women) participated. Mean (± standard deviation) time to administer was 48.8 ± 19.6 minutes. Mean acceptability scores ranged from 3.2 ± 1.3 in hospital to 4.7 ± 0.9 in primary care, and mean relevance ranged from 3.5 ± 1.0 in hospital to 4.9 ± 0.9 in dialysis centres (p values < 0.001 for both). The mean process score was 3.1 ± 0.6 and the mean action score was 11.2 ± 5.6 (of a possible 25).
Conclusions: The ACP Engagement Survey demonstrated feasibility and acceptability in out-patient settings but was less feasible and acceptable among hospitalized patients due to its length. A shorter version may improve feasibility. Engagement in ACP was low to moderate.
Usage notes
README: The Readme file includes a list of files in this data package, and a description of the variables that were removed from the dataset to protect participant identity. Please see the "Data dictionary" for a description of the variables that were included in the dataset, and the "Summary table of indirect identifier data" for a summary of values reported at removed variables.
Data Dictionary - Canadian ACP engagement sample BMJ Open: This file describes the variables that were included in the dataset and their allowable values. (Canadian ACP engagement sample BMJ Open_data dictionary.xlsx)
Canadian ACP engagement survey pilot: This file contains the responses of 196 patients in acute care, primary care, cancer care, and renal care to a 108-item ACP engagement survey. Process measures (knowledge, contemplation, self-efficacy, and readiness; 5-point Likert scales) and action measures (yes/no whether an ACP behavior was completed) are included. (Canadian ACP engagement sample_BMJ Open_indirect identifiers removed.xlsx)
Summary table of indirect identifier data - Canadian ACP engagement_BMJ Open: This file contains descriptive analysis summary tables of indirect identifiers that were removed from the dataset. (Canadian ACP engagement_BMJ Open_summary table of indirect identifier data.docx)
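The two score types reported in the abstract (a mean of 5-point Likert process items and a count of completed actions out of 25) can be illustrated with a toy scorer; the item counts and responses below are illustrative, not the 108-item instrument:

```python
def process_score(likert_responses):
    """Mean of 5-point Likert process items
    (knowledge, contemplation, self-efficacy, readiness)."""
    return sum(likert_responses) / len(likert_responses)

def action_score(yes_no_responses):
    """Count of ACP actions completed (yes = 1), out of a possible 25."""
    return sum(1 for r in yes_no_responses if r)

likert = [3, 4, 2, 5, 3]              # illustrative 5-point responses
actions = [True] * 11 + [False] * 14  # 11 of 25 actions completed

print(process_score(likert), action_score(actions))
```

On these toy numbers the process score sits in the same "low to moderate" range (around 3 of 5) that the study reports for its sample.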
The STEP (Skills Toward Employment and Productivity) Measurement program is the first ever initiative to generate internationally comparable data on skills available in developing countries. The program implements standardized surveys to gather information on the supply and distribution of skills and the demand for skills in labor market of low-income countries.
The uniquely-designed Household Survey includes modules that measure the cognitive skills (reading, writing and numeracy), socio-emotional skills (personality, behavior and preferences) and job-specific skills (subset of transversal skills with direct job relevance) of a representative sample of adults aged 15 to 64 living in urban areas, whether they work or not. The cognitive skills module also incorporates a direct assessment of reading literacy based on the Survey of Adult Skills instruments. Modules also gather information about family, health and language.
The survey covered the following regions: Western, Central, Greater Accra, Volta, Eastern, Ashanti, Brong Ahafo, Northern, Upper East and Upper West.
- Areas are classified as urban based on each country's official definition.
The units of analysis are the individual respondents and households. A household roster is undertaken at the start of the survey, and the individual respondent is randomly selected among all household members aged 15 to 64 inclusive. The random selection process was designed by the STEP team, and compliance with the procedure is carefully monitored during fieldwork.
The target population for the Ghana STEP survey comprises all non-institutionalized persons 15 to 64 years of age (inclusive) living in private dwellings in urban areas of the country at the time of data collection. This includes all residents except foreign diplomats and non-nationals working for international organizations. Exclusions: Military barracks were excluded from the Ghana target population.
Sample survey data [ssd]
The Ghana sample design is a four-stage sample design. There was no explicit stratification but the sample was implicitly stratified by Region. [Note: Implicit stratification was achieved by sorting the PSUs (i.e., EACode) by RegnCode and selecting a systematic sample of PSUs.]
First Stage Sample The primary sample unit (PSU) was a Census Enumeration Area (EA). Each PSU was uniquely defined by the sample frame variables RegnCode and EACode. The sample frame was sorted by RegnCode to implicitly stratify the sample frame PSUs by region. The sampling objective was to select 250 PSUs, comprising 200 initial PSUs and 50 reserve PSUs. Although 250 PSUs were selected, only 201 PSUs were activated. The PSUs were selected using a systematic probability proportional to size (PPS) sampling method, where the measure of size was the population size (i.e., EAPopn) of a PSU.
Second Stage Sample The second stage sample unit is a PSU partition. It was considered necessary to partition 'large' PSUs into smaller areas to facilitate the listing process. After the partitioning of the PSUs, the survey firm randomly selected one partition. The selected partition was fully listed for subsequent enumeration in accordance with the field procedures.
Third Stage Sample The third stage sample unit (SSU) is a household. The sampling objective was to obtain interviews at 15 households within each selected PSU. The households were selected in each PSU using a systematic random method.
Fourth Stage Sample The fourth stage sample unit was an individual aged 15-64 (inclusive). The sampling objective was to select one individual with equal probability from each selected household.
Sample Size The Ghana firm's sampling objective was to obtain interviews from 3000 individuals in the urban areas of the country. In order to provide a sufficient sample to allow for a worst-case scenario of a 50% response rate, the number of sampled cases was doubled in each selected PSU. Although 50 extra PSUs were selected for use in case it was impossible to conduct any interviews in one or more initially selected PSUs, only one reserve PSU was activated. Therefore, the Ghana firm conducted the STEP data collection in a total of 201 PSUs.
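The systematic PPS selection used in the first stage can be sketched as follows; this is a textbook implementation of the method, not the STEP team's actual selection code, and the PSU sizes are invented:

```python
def systematic_pps(sizes, n, start):
    """Systematic probability-proportional-to-size sampling.

    sizes: measure of size per PSU (e.g., EA population);
    n: number of PSUs to select;
    start: random start drawn from [0, interval).
    Returns the indices of the selected PSUs (a large PSU whose size
    exceeds the interval can be selected more than once).
    """
    total = sum(sizes)
    interval = total / n
    assert 0 <= start < interval
    selected, cumulative, k = [], 0.0, 0
    for i, size in enumerate(sizes):
        cumulative += size
        # select PSU i for every selection point start + k*interval it covers
        while k < n and start + k * interval < cumulative:
            selected.append(i)
            k += 1
    return selected

# 8 PSUs with unequal populations; select 3 with a fixed random start of 10.
print(systematic_pps([120, 80, 200, 50, 90, 160, 40, 60], 3, 10.0))  # [0, 2, 5]
```

Sorting the frame by RegnCode before running this procedure is what produces the implicit stratification by region noted above: the equally spaced selection points then spread the sample across regions in proportion to their population.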
Sampling methodologies are described for each country in two documents: (i) The National Survey Design Planning Report (NSDPR) (ii) The weighting documentation
Face-to-face [f2f]
The STEP survey instruments include: (i) a Background Questionnaire developed by the WB STEP team (ii) a Reading Literacy Assessment developed by Educational Testing Services (ETS).
All countries adapted and translated both instruments following the STEP Technical Standards: 2 independent translators adapted and translated the Background Questionnaire and Reading Literacy Assessment, while reconciliation was carried out by a third translator. The WB STEP team and ETS collaborated closely with the survey firms during the process and reviewed the adaptation and translation (using a back translation). In the case of Ghana, no translation was necessary, but the adaptation process ensured that the English used in the Background Questionnaire and Reading Literacy Assessment closely reflected local use.
STEP Data Management Process
1. Raw data is sent by the survey firm.
2. The WB STEP team runs data checks on the Background Questionnaire data; ETS runs data checks on the Reading Literacy Assessment data. Comments and questions are sent back to the survey firm.
3. The survey firm reviews comments and questions. When a data entry error is identified, the survey firm corrects the data.
4. The WB STEP team and ETS check that the data files are clean. This might require additional iterations with the survey firm.
5. Once the data have been checked and cleaned, the WB STEP team computes the weights. Weights are computed by the STEP team to ensure consistency across sampling methodologies.
6. ETS scales the Reading Literacy Assessment data.
7. The WB STEP team merges the Background Questionnaire data with the Reading Literacy Assessment data and computes derived variables.
Detailed information on data processing in STEP surveys is provided in the 'Guidelines for STEP Data Entry Programs' document, provided as an external resource. The template do-file used by the STEP team to check the raw background questionnaire data is provided as an external resource.
An overall response rate of 83.2% was achieved in the Ghana STEP Survey. Table 20 of the weighting documentation provides the detailed percentage distribution by final status code.
Weighting documentation was prepared for each participating country and provides some information on sampling errors. The weighting documentation is provided as an external resource.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
An unambiguous algorithm, a study of the applicability domain, and appropriate measures of goodness of fit and robustness are the key characteristics that should ideally be fulfilled for a QSAR model to be considered for regulatory purposes. In this paper, we propose a new algorithm (RINH) based on the rivality index for the construction of QSAR classification models. This index predicts the activity of the data set molecules by measuring the rivality between their nearest neighbors belonging to different classes, contributing a robust measurement of the reliability of the predictions. In order to demonstrate the merits of the proposed algorithm, we selected four independent and orthogonally different benchmark data sets (balanced/unbalanced and high/low modelable) and compared the results with those obtained using 12 different machine learning algorithms. These results were validated using 20 data sets of different balancing and sizes, corroborating that the proposed algorithm is able to generate highly accurate classification models and contribute valuable measurements of the reliability of the predictions and the applicability domain of the built models.
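The abstract does not give the index's formula, so purely as an illustration of the nearest-neighbour idea, one plausible distance-based variant (negative when a molecule's nearest same-class neighbour is closer than its nearest rival) could look like this. This is an invented stand-in, not the RINH definition from the paper:

```python
def rivality(i, X, y):
    """Illustrative rivality-style index for sample i.

    Compares the distance to the nearest neighbour of the same class
    with the distance to the nearest neighbour of a different class
    (the "rival"). Negative values suggest a reliable prediction.
    NOT the paper's definition -- an illustrative stand-in.
    """
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    d_same = min(dist(X[i], X[j]) for j in range(len(X))
                 if j != i and y[j] == y[i])
    d_rival = min(dist(X[i], X[j]) for j in range(len(X))
                  if y[j] != y[i])
    return (d_same - d_rival) / (d_same + d_rival)

# Two well-separated classes in 1-D: every index should be negative,
# i.e., every molecule's nearest neighbour is a same-class molecule.
X = [[0.0], [1.0], [10.0], [11.0]]
y = [0, 0, 1, 1]
print([round(rivality(i, X, y), 2) for i in range(len(X))])
```

An index of this kind yields a per-molecule reliability score alongside the class prediction, which is the property the abstract highlights for applicability-domain assessment.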
The study included four separate surveys:
1. The LSMS survey of the general population of Serbia in 2002
2. The survey of Family Income Support (MOP in Serbian) recipients in 2002
These two datasets are published together, separately from the 2003 datasets.
3. The LSMS survey of the general population of Serbia in 2003 (panel survey)
4. The survey of Roma from Roma settlements in 2003
These two datasets are published together.
Objectives
LSMS represents a multi-topical study of household living standards and is based on international experience in designing and conducting this type of research. The basic survey was carried out in 2002 on a representative sample of households in Serbia (without Kosovo and Metohija). Its goal was to establish a poverty profile according to comprehensive data on the welfare of households and to identify vulnerable groups. Its aim was also to assess the targeting of safety net programs by collecting detailed information from individuals on participation in specific government social programs. This study was used as the basic document in developing the Poverty Reduction Strategy (PRS) in Serbia, which was adopted by the Government of the Republic of Serbia in October 2003.
The survey was repeated in 2003 on a panel sample (the households which participated in 2002 survey were re-interviewed).
Analysis of the take-up and profile of the population in 2003 was the first step towards formulating the system of monitoring in the Poverty Reduction Strategy (PRS). The survey was conducted in accordance with the same methodological principles used in 2002 survey, with necessary changes referring only to the content of certain modules and the reduction in sample size. The aim of the repeated survey was to obtain panel data to enable monitoring of the change in the living standard within a period of one year, thus indicating whether there had been a decrease or increase in poverty in Serbia in the course of 2003. [Note: Panel data are the data obtained on the sample of households which participated in the both surveys. These data made possible tracking of living standard of the same persons in the period of one year.]
Along with these two comprehensive surveys, conducted on nationally and regionally representative samples to give a picture of the general population, there were also two surveys with particular emphasis on vulnerable groups. In 2002, a survey of the living standard of Family Income Support recipients was conducted, with the aim of evaluating this state-supported social welfare program. In 2003, a survey of Roma from Roma settlements was conducted. Since all available experience indicated that this was one of the most vulnerable groups on the territory of Serbia and Montenegro, yet no ample research on the poverty of the Roma population existed, the aim of the survey was to compare the poverty of this group with that of the general population and to establish which categories of the Roma population were at the greatest risk of poverty in 2003. However, it should be stressed that the LSMS of the Roma population covered the potentially most imperilled Roma, while Roma integrated into the general population were not included in this study.
The surveys were conducted on the whole territory of Serbia (without Kosovo and Metohija).
Sample survey data [ssd]
The sample frame for both surveys of the general population (LSMS) in 2002 and 2003 consisted of all permanent residents of Serbia, excluding the population of Kosovo and Metohija, according to the definition of permanently resident population contained in the UN Recommendations for Population Censuses, as applied in the 2002 Census of Population in the Republic of Serbia. Permanent residents were thus all persons living in the territory of Serbia for longer than one year, with the exception of diplomatic and consular staff.
The sample frame for the survey of Family Income Support recipients included all current recipients of this program on the territory of Serbia, based on the official list of recipients provided by the Ministry of Social Affairs.
Defining the Roma population from Roma settlements faced obstacles, since precise data on the total Roma population in Serbia are not available. According to the last population Census from 2002 there were 108,000 Roma citizens, but the Census data are thought to significantly underestimate the total Roma population. However, since no more precise data were available, this number was taken as the basis for estimating the Roma population from Roma settlements. Based on the 2002 Census, settlements in which at least 7% of the total population declared themselves as belonging to the Roma nationality were selected. A total of 83%, or 90,000, self-declared Roma lived in the settlements defined in this way, and this number was taken as the sample frame for Roma from Roma settlements.
Planned sample: In 2002 the planned sample of the general population included 6,500 households. The sample was both nationally and regionally representative (representative for each individual stratum). In 2003 the planned panel sample size was 3,000 households. In order to preserve the representativeness of the sample, every other census block unit of the large sample realized in 2002 was kept, which preserved the identical allocation by strata. In each selected census block unit, the same households were interviewed as in the basic 2002 survey. The planned sample of Family Income Support recipients in 2002 and of Roma from Roma settlements in 2003 was 500 households for each group.
Sample type: In both national surveys the implemented sample was a two-stage stratified sample. Units of the first stage were enumeration districts, and units of the second stage were households. In the basic 2002 survey, enumeration districts were selected with probability proportional to the number of households, so that enumeration districts with a larger number of households had a higher probability of selection. In the repeated survey in 2003, first-stage units (census block units) were selected from the basic 2002 sample by including only even-numbered census block units; in practice this meant that every second census block unit from the previous survey was included in the sample. In each selected enumeration district, the same households interviewed in the previous round were included and interviewed. On finishing the survey in 2003, the cases were merged at both the household and the member level.
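For illustration only (a generic sketch, not the procedure actually used by the survey teams), probability-proportional-to-size selection of enumeration districts can be implemented with the standard systematic cumulative-size method:

```python
import random

def pps_systematic(units, sizes, n):
    """Systematic PPS sampling: select n units with probability
    proportional to size using cumulative totals and a fixed interval.
    Units larger than the interval can be selected more than once."""
    total = sum(sizes)
    interval = total / n
    start = random.uniform(0, interval)
    points = [start + i * interval for i in range(n)]
    chosen, cum = [], 0
    it = iter(zip(units, sizes))
    unit, size = next(it)
    for p in points:
        # advance through the cumulative size ranges until p falls inside one
        while cum + size < p:
            cum += size
            unit, size = next(it)
        chosen.append(unit)
    return chosen
```

Districts with more households occupy a wider slice of the cumulative range and are therefore hit by the equally spaced selection points with proportionally higher probability.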
Stratification: Municipalities are stratified into the following six territorial strata: Vojvodina, Belgrade, Western Serbia, Central Serbia (Šumadija and Pomoravlje), Eastern Serbia and South-east Serbia. Primary units of selection are further stratified into enumeration districts which belong to urban type of settlements and enumeration districts which belong to rural type of settlement.
The sample of Family Income Support recipients consisted of cases chosen randomly from the official list of recipients provided by the Ministry of Social Affairs. The sample of Roma from Roma settlements was, as in the national survey, a two-stage stratified sample, but the units of the first stage were settlements where the Roma population was represented at over 7%, and the units of the second stage were Roma households. Settlements were stratified into three territorial strata: Vojvodina, Belgrade and Central Serbia.
Face-to-face [f2f]
In all surveys the same questionnaire, with minimal changes, was used. It included different modules - topically separate areas aimed at capturing the living standard of households from different angles. The topic areas were the following:
1. Roster with demography.
2. Housing conditions and durables, with information on the age of durables owned by a household and a special block on energy billing, payments, and usage.
3. Diary of food expenditures (weekly), including home production, gifts and transfers in kind.
4. Main expenditure-based recall periods sufficient to enable construction of annual consumption at the household level, including home production, gifts and transfers in kind.
5. Agricultural production, for all households which cultivate 10+ acres of land or breed cattle.
6. Participation and social transfers, with a detailed breakdown by program.
7. Labour market, in line with a simplified version of the Labour Force Survey (LFS), with special additional questions to capture various informal sector activities and to provide information on earnings.
8. Health, with a focus on utilization of services and expenditures (including informal payments).
9. Education, covering pre-school, compulsory primary education, secondary education and university education.
10. Special income block, focusing on sources of income not covered in other parts (with a focus on remittances).
During field work, interviewers kept a precise diary of interviews, recording both successful and unsuccessful visits. Particular attention was paid to reasons why some households were not interviewed. Separate marks were given for households which were not interviewed due to refusal and for cases when a given household could not be found on the territory of the chosen census block.
In 2002 a total of 7,491 households were contacted. Of this number, a total of 6,386 households in 621 census rounds were interviewed. Interviewers did not manage to collect data for 1,106, or 14.8%, of the selected households. Out of this number 634 households
The STEP (Skills Toward Employment and Productivity) Measurement program is the first ever initiative to generate internationally comparable data on skills available in developing countries. The program implements standardized surveys to gather information on the supply and distribution of skills and the demand for skills in labor market of low-income countries.
The uniquely-designed Household Survey includes modules that measure the cognitive skills (reading, writing and numeracy), socio-emotional skills (personality, behavior and preferences) and job-specific skills (subset of transversal skills with direct job relevance) of a representative sample of adults aged 15 to 64 living in urban areas, whether they work or not. The cognitive skills module also incorporates a direct assessment of reading literacy based on the Survey of Adults Skills instruments. Modules also gather information about family, health and language.
Areas are classified as urban based on each country's official definition. Some STEP surveys used narrower urban sampling; in Yunnan Province the sample covered the urban areas of Kunming. Detailed information is provided in the weighting documentation.
The units of analysis are the individual respondents and households. A household roster is completed at the start of the survey, and the individual respondent is randomly selected among all household members aged 15 to 64 inclusive. The random selection process was designed by the STEP team, and compliance with the procedure was carefully monitored during fieldwork.
The STEP target population is persons aged 15 to 64 inclusive living in urban areas, as defined by each country's statistical office. The target population for the China-Yunnan STEP survey comprised all non-institutionalized persons 15 to 64 years of age (inclusive) living in private dwellings in urban areas of Kunming at the time of data collection.
The following are excluded from the sample:
- Residents of institutions (prisons, hospitals, etc.)
- Residents of senior homes and hospices
- Residents of other group dwellings such as college dormitories, halfway homes, workers' quarters, etc.
- Persons living outside the country at the time of data collection
In some countries, extremely remote villages or conflict-ridden regions could not be surveyed. These cases are listed in the weighting documentation.
Sample survey data [ssd]
The China-Yunnan survey firm implemented a partial literacy assessment design. The partial assessment required each selected person to attempt to complete a General Booklet comprising Reading Components and a set of Core Literacy Items. The partial assessment sampling objective was to have a minimum of about 2000 selected persons attempt the General Booklet. The target population for the China-Yunnan STEP survey comprised all non-institutionalized persons 15 to 64 years of age (inclusive) living in private dwellings in urban areas of Kunming at the time of data collection. The sample frame for the selection of first stage sample units was the Excel file 'sampling frame for STEP _CHINA' that was provided by the China-Yunnan survey firm. The frame is a complete list of first stage sampling units in the urban areas of Kunming. The source of this sample frame is the National Population Census, November, 2010. The sample frame includes 5564 PSUs in 299 Census Enumeration Areas. According to the sample frame, there are 1,067,256 households in the 5564 PSUs.
The China-Yunnan sample design was a 3 stage cluster sample design.
First Stage Sample: The primary sample unit (PSU) is a Census Enumeration Area (CEA) Block. The sampling objective was to conduct interviews in 135 CEA Blocks. At the first stage of sample selection, 27 additional PSUs were also selected as reserve PSUs to be used in the event that it was impossible to obtain any interviews in one or more of the initial PSUs. A total of 162 PSUs were selected with probability proportional to size, where the measure of size was the number of households in a PSU. Subsequently, from the file of 162 sampled PSUs, a PPS sample of 135 PSUs was selected to be the 'Initial' PSU sample. Note that none of the 27 reserve PSUs was activated during data collection.
Second Stage Sample: The second stage sample unit (SSU) is a household. The sampling objective was to obtain interviews at 15 households within each selected PSU. At the second stage of sample selection, 30 households were selected in each PSU using a systematic random method. The 30 households were randomly divided into 15 'Initial' households and 15 'Reserve' households, ranked according to the random sample selection order.
Third Stage Sample: The third stage sample unit was an individual aged 15-64 (inclusive). The sampling objective was to select one individual with equal probability from each selected household.
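The second- and third-stage logic can be illustrated with a small sketch (hypothetical function and variable names; the actual selection was implemented by the STEP team and the survey firm):

```python
import random

def select_households(household_ids, n_total=30, n_initial=15):
    """Stage 2 sketch: systematic random selection of n_total households
    from a PSU listing, then a random split into 'initial' households and
    ranked 'reserve' households."""
    step = len(household_ids) / n_total
    start = random.uniform(0, step)
    picked = [household_ids[int(start + i * step)] for i in range(n_total)]
    random.shuffle(picked)  # random ordering determines the reserve ranking
    return picked[:n_initial], picked[n_initial:]

def select_respondent(member_ages):
    """Stage 3 sketch: select one member aged 15-64 (inclusive)
    with equal probability; returns the member's roster index, or None."""
    eligible = [i for i, age in enumerate(member_ages) if 15 <= age <= 64]
    return random.choice(eligible) if eligible else None
```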
Face-to-face [f2f]
The STEP survey instruments include:
- the Background Questionnaire, developed by the WB STEP team
- the Reading Literacy Assessment, developed by Educational Testing Service (ETS).
All countries adapted and translated both instruments following the STEP Technical Standards: 2 independent translators adapted and translated the Background Questionnaire and Reading Literacy Assessment, while reconciliation was carried out by a third translator.
The WB STEP team and ETS collaborated closely with the Chinese survey firm during the process and reviewed the adaptation and translation to Mandarin using a back translation.
The survey instruments were both piloted as part of the survey pretest.
The adapted Background Questionnaires are provided in English as external resources. The Reading Literacy Assessment is protected by copyright and will not be published.
STEP Data Management Process:
1) Raw data is sent by the survey firm.
2) The WB STEP team runs data checks on the Background Questionnaire data, and ETS runs data checks on the Reading Literacy Assessment data. Comments and questions are sent back to the survey firm.
3) The survey firm reviews the comments and questions. When a data entry error is identified, the survey firm corrects the data.
4) The WB STEP team and ETS check that the data files are clean. This might require additional iterations with the survey firm.
5) Once the data has been checked and cleaned, the WB STEP team computes the weights. Weights are computed by the STEP team to ensure consistency across sampling methodologies.
6) ETS scales the Reading Literacy Assessment data.
7) The WB STEP team merges the Background Questionnaire data with the Reading Literacy Assessment data and computes derived variables.
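The final merge amounts to a key-based join of the two files. A minimal sketch, assuming a hypothetical shared respondent identifier 'resp_id' (the real files use the STEP team's own identifiers):

```python
def merge_datasets(background, assessment):
    """Left-join assessment records onto background questionnaire records
    by a shared respondent identifier (hypothetical key 'resp_id')."""
    scores = {row["resp_id"]: row for row in assessment}
    merged = []
    for row in background:
        out = dict(row)                             # keep all questionnaire fields
        out.update(scores.get(row["resp_id"], {}))  # add assessment fields if present
        merged.append(out)
    return merged
```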
Detailed information on data processing in STEP surveys is provided in the 'Guidelines for STEP Data Entry Programs' document, provided as an external resource. The template do-file used by the STEP team to check the raw Background Questionnaire data is also provided as an external resource.
The response rate for Yunnan Province (urban) was 98% (see STEP Methodology Note, Table 4).
Weighting documentation was prepared for each participating country and provides some information on sampling errors. All country weighting documentation is provided as an external resource.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0)https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Measurements were done in collaboration with the National Marine Information and Research Centre (NatMIRC), Swakopmund, Namibia. Dataset update 2018-04-11: measurement of 2016-12-01 to 2017-11-24 were added.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the following data and source code and results.
https://edmond.mpg.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.17617/3.2C
This collection contains data obtained with an unmanned aircraft equipped with an analyser measuring carbon dioxide dry air mole fraction, ambient temperature, humidity and pressure. Measurement flights were carried out as part of the ScaleX 2016 campaign in southern Germany (47.833 °N, 11.060 °E, WGS84) in July 2016. Evan Flatt, Richard H. Grant (both at Purdue University, West Lafayette, IN, USA), Martin Kunz and Jost V. Lavric (both at Max Planck Institute for Biogeochemistry, Jena, Germany) have contributed to these measurements. Furthermore, this collection contains output of the STILT (Stochastic Time-Inverted Lagrangian Transport) model, which was run on the basis of meteorological data from the ECMWF IFS (European Centre for Medium-Range Weather Forecasts Integrated Forecast System). Christoph Gerbig (Max Planck Institute for Biogeochemistry, Jena, Germany) and Frank-Thomas Koch (Deutscher Wetterdienst, Meteorologisches Observatorium Hohenpeissenberg, Germany) contributed this data. Finally, this collection also contains all necessary scripts to obtain estimates of surface flux from the aforementioned data by means of a nocturnal boundary layer budget approach. Author information is included in the script files. Instructions on how to run these scripts are given in the file "readme.txt".
In 1992, Bosnia-Herzegovina, one of the six republics in former Yugoslavia, became an independent nation. A civil war started soon thereafter, lasting until 1995 and causing widespread destruction and loss of life. Following the Dayton accord, Bosnia-Herzegovina (BiH) emerged as an independent state comprised of two entities, namely the Federation of Bosnia-Herzegovina (FBiH) and the Republika Srpska (RS), and the district of Brcko. In addition to the destruction of physical infrastructure, there was considerable social disruption and a decline in living standards for a large section of the population. Alongside these events, a period of economic transition to a market economy was occurring. The distributive impacts of this transition, both positive and negative, are unknown. In short, while it is clear that welfare levels have changed, there is very little information on poverty and social indicators on which to base policies and programs. In the post-war process of rebuilding the economic and social base of the country, the government has faced the problems created by having little relevant data at the household level. The three statistical organizations in the country (the State Agency for Statistics for BiH - BHAS, the RS Institute of Statistics - RSIS, and the FBiH Institute of Statistics - FIS) have been working to improve the data available to policy makers, both at the macro and the household level. One facet of their activities is to design and implement a series of household surveys. The first of these surveys is the Living Standards Measurement Study survey (LSMS). Later surveys will include the Household Budget Survey (an Income and Expenditure Survey) and a Labour Force Survey. A subset of the LSMS households will be re-interviewed in the two years following the LSMS to create a panel data set.
The three statistical organizations began work on the design of the Living Standards Measurement Study Survey (LSMS) in 1999. The purpose of the survey was to collect data needed for assessing the living standards of the population and for providing the key indicators needed for social and economic policy formulation. The survey was to provide data at the country and the entity level and to allow valid comparisons between entities to be made. The LSMS survey was carried out in the Fall of 2001 by the three statistical organizations with financial and technical support from the Department for International Development of the British Government (DfID), the United Nations Development Program (UNDP), the Japanese Government, and the World Bank (WB). The creation of a Master Sample for the survey was supported by the Swedish Government through SIDA, the European Commission, the Department for International Development of the British Government and the World Bank. The overall management of the project was carried out by the Steering Board, comprised of the Directors of the RS and FBiH Statistical Institutes, the Management Board of the State Agency for Statistics and representatives from DfID, UNDP and the WB. The day-to-day project activities were carried out by the Survey Management Team, made up of two professionals from each of the three statistical organizations. The LSMS, in addition to collecting the information necessary to obtain as comprehensive a measure as possible of the basic dimensions of household living standards, has three basic objectives, as follows: 1. To provide the public sector, government, the business community, scientific institutions, international donor organizations and social organizations with information on different indicators of the population's living conditions, as well as on available resources for satisfying basic needs. 2.
To provide information for the evaluation of the results of different forms of government policy and programs developed with the aim of improving the population's living standard. The survey will enable the analysis of the relations between and among different aspects of living standards (housing, consumption, education, health, labour) at a given time, as well as within a household. 3. To provide key contributions for the development of the government's Poverty Reduction Strategy Paper, based on the analysed data.
National coverage
Households
Sample survey data [ssd]
(a) SAMPLE SIZE
A total sample of 5,400 households was determined to be adequate for the needs of the survey: 2,400 in the Republika Srpska and 3,000 in the Federation of BiH. The difficulty was in selecting a probability sample that would be representative of the country's population. The sample design for any survey depends upon the availability of information on the universe of households and individuals in the country. Usually this comes from a census or administrative records. In the case of BiH the most recent census was done in 1991. The data from this census were rendered obsolete due to the simple passage of time but, more importantly, due to the massive population displacements that occurred during the war. At the initial stages of this project it was decided that a master sample should be constructed. Experts from Statistics Sweden developed the plan for the master sample and provided the procedures for its construction. From this master sample, the households for the LSMS were selected.
Master Sample [This section is based on Peter Lynn's note "LSMS Sample Design and Weighting - Summary", April 2002, Essex University, commissioned by DfID.]
The master sample is based on a selection of municipalities and a full enumeration of the selected municipalities. Optimally, one would prefer smaller units (geographic or administrative) than municipalities. However, while the population estimates of municipalities were considered reasonably accurate, this was not the case for smaller geographic or administrative areas. To avoid the error involved in sampling smaller areas with very uncertain population estimates, municipalities were used as the base unit for the master sample. The Statistics Sweden team proposed two options based on this same method, with the only difference being the number of municipalities included and enumerated.
(b) SAMPLE DESIGN
For reasons of funding, the smaller option proposed by the team, Option B, was used.
Stratification of Municipalities: The first step in creating the Master Sample was to group the 146 municipalities in the country into three strata - Urban, Rural and Mixed - within each of the two entities. Urban municipalities are those where 65 percent or more of the households are considered to be urban, and rural municipalities are those where the proportion of urban households is below 35 percent. The remaining municipalities were classified as Mixed (Urban and Rural) municipalities. Brcko was excluded from the sampling frame.
Urban, Rural and Mixed Municipalities: It is worth noting that the urban-rural definitions used in BiH are unusual, with such large administrative units as municipalities classified as if they were completely homogeneous. Their classification into urban, rural and mixed comes from the 1991 Census, which used the predominant type of income of households in the municipality to define the municipality. This definition is imperfect in two ways. First, the distribution of income sources may have changed dramatically from pre-war times: populations have shifted, large industries have closed, and much agricultural land remains unusable due to the presence of land mines. Second, the definition is not comparable to that of other countries, where villages, towns and cities are classified into rural or urban by population size or by the types of services and infrastructure available. Clearly, the types of communities within a municipality vary substantially in terms of both population and infrastructure. However, these imperfections are not detrimental to the sample design (the urban/rural definition may not be very useful for analysis purposes, but that is a separate issue).
Face-to-face [f2f]
(a) DATA ENTRY
An integrated approach to data entry and fieldwork was adopted in Bosnia and Herzegovina. Data entry proceeded side by side with data gathering to ensure verification and correction in the field. Data entry stations were located in the regional offices of the entity institutes and were equipped with computers, modem and a dedicated telephone line. The completed questionnaires were delivered to these stations each day for data entry. Twenty data entry operators (10 from Federation and 10 from RS) were trained in two training sessions held for a week each in Sarajevo and Banja Luka. The trainers were the staff of the two entity institutes who had undergone training in the CSPro software earlier and had participated in the workshops of the Pilot survey. Prior to the training, laptop computers were provided to the entity institutes, and the CSPro software was installed in them. The training for the data entry operators covered the following elements:
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
We combine multivariate ratio analysis (MRA) of body measurements and analyses of mitochondrial and nuclear data to examine the status of several species of European paper wasps (Polistes Latreille, 1802) closely related to P. gallicus. Our analyses unambiguously reveal the presence of a cryptic species in Europe, as two distinct species can be recognized in what has hitherto been considered Polistes bischoffi Weyrauch, 1937. One species is almost as light coloured as P. gallicus, and is mainly recorded from Southern Europe and Western Asia. The other species is darker and has a more northern distribution in Central Europe. Both species occur syntopically in Switzerland. Given that the lost lectotype of P. bischoffi originated from Sardinia, we selected a female of the southern species as a neotype. The northern species is described as P. helveticus sp. n. here. We also provide a redescription of P. bischoffi rev. stat. and an identification key including three more closely related species, P. biglumis, P. gallicus and P. hellenicus.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Complete dataset of “Film Circulation on the International Film Festival Network and the Impact on Global Film Culture”
A peer-reviewed data paper for this dataset is under review for publication in NECSUS_European Journal of Media Studies, an open access journal aimed at enhancing data transparency and reusability, and will be available from https://necsus-ejms.org/ and https://mediarep.org
Please cite this when using the dataset.
Detailed description of the dataset:
1 Film Dataset: Festival Programs
The Film Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.
The codebook (csv file “1_codebook_film-dataset_festival-program”) offers a detailed description of all variables within the Film Dataset. Along with the definition of variables it lists explanations for the units of measurement, data sources, coding and information on missing data.
The csv file “1_film-dataset_festival-program_long” comprises a dataset of all films and the festivals, festival sections, and the year of the festival edition that they were sampled from. The dataset is structured in the long format, i.e. the same film can appear in several rows when it appeared in more than one sample festival. However, films are identifiable via their unique ID.
The csv file “1_film-dataset_festival-program_wide” consists of the dataset listing only unique films (n=9,348). The dataset is in the wide format, i.e. each row corresponds to a unique film, identifiable via its unique ID. For easy analysis, and since the overlap is only six percent, in this dataset the variable sample festival (fest) corresponds to the first sample festival at which the film appeared. For instance, if a film was first shown at Berlinale (in February) and then at Frameline (in June of the same year), the sample festival will list “Berlinale”. This file includes information on unique and IMDb IDs, the film title, production year, length, categorization by length, production countries, regional attribution, director names, genre attribution, the festival, festival section and festival edition the film was sampled from, and information on whether festival run information is available through the IMDb data.
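The relation between the long and the wide table described above can be sketched in a few lines (illustrative only, with simplified, hypothetical column names; the published files contain many more variables):

```python
def long_to_wide(rows):
    """Collapse long-format rows (one per film-festival pair) into
    wide-format rows (one per unique film), keeping the first sample
    festival in which each film appeared."""
    wide = {}
    for row in rows:
        wide.setdefault(row["film_id"], row)  # first occurrence wins
    return list(wide.values())
```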
2 Survey Dataset
The Survey Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.
The codebook “2_codebook_survey-dataset” includes coding information for both survey datasets. It lists the definition of the variables or survey questions (corresponding to Samoilova/Loist 2019), units of measurement, data source, variable type, range and coding, and information on missing data.
The csv file “2_survey-dataset_long-festivals_shared-consent” consists of a subset (n=161) of the original survey dataset (n=454), where respondents provided festival run data for films (n=206) and gave consent to share their data for research purposes. This dataset consists of the festival data in a long format, so that each row corresponds to the festival appearance of a film.
The csv file “2_survey-dataset_wide-no-festivals_shared-consent” consists of a subset (n=372) of the original dataset (n=454) of survey responses corresponding to sample films. It includes data only for those films for which respondents provided consent to share their data for research purposes. This dataset is shown in wide format of the survey data, i.e. information for each response corresponding to a film is listed in one row. This includes data on film IDs, film title, survey questions regarding completeness and availability of provided information, information on number of festival screenings, screening fees, budgets, marketing costs, market screenings, and distribution. As the file name suggests, no data on festival screenings is included in the wide format dataset.
3 IMDb & Scripts
The IMDb dataset consists of a data scheme image file, one codebook and eight datasets, all in csv format. It also includes the R scripts that we used for scraping and matching.
The codebook “3_codebook_imdb-dataset” includes information for all IMDb datasets. This includes ID information and their data source, coding and value ranges, and information on missing data.
The csv file “3_imdb-dataset_aka-titles_long” contains film title data in different languages scraped from IMDb in a long format, i.e. each row corresponds to a title in a given language.
The csv file “3_imdb-dataset_awards_long” contains film award data in a long format, i.e. each row corresponds to an award of a given film.
The csv file “3_imdb-dataset_companies_long” contains data on production and distribution companies of films. The dataset is in a long format, so that each row corresponds to a particular company of a particular film.
The csv file “3_imdb-dataset_crew_long” contains data on names and roles of crew members in a long format, i.e. each row corresponds to each crew member. The file also contains binary gender assigned to directors based on their first names using the GenderizeR application.
The csv file “3_imdb-dataset_festival-runs_long” contains festival run data scraped from IMDb in a long format, i.e. each row corresponds to the festival appearance of a given film. The dataset does not include every screening of a film, only the first screening of a film at a given festival within a given year. The data covers festival runs up to 2019.
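The "first screening per festival and year" rule can be sketched as a deduplication pass (a Python sketch with illustrative field names; the actual processing was done in R):

```python
def first_screenings(rows):
    """Keep only the earliest screening per (film, festival, year).
    Rows carry ISO dates; field names are illustrative assumptions."""
    first = {}
    for r in sorted(rows, key=lambda r: r["date"]):
        key = (r["film_id"], r["festival"], r["date"][:4])
        first.setdefault(key, r)  # first (earliest) row wins
    return list(first.values())

# Two 2018 Sundance screenings collapse to the earliest; 2019 is kept.
rows = [
    {"film_id": "F1", "festival": "Sundance", "date": "2018-01-25"},
    {"film_id": "F1", "festival": "Sundance", "date": "2018-01-20"},
    {"film_id": "F1", "festival": "Sundance", "date": "2019-01-28"},
]
out = first_screenings(rows)
```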
The csv file “3_imdb-dataset_general-info_wide” contains general information about films such as genre as defined by IMDb, languages in which a film was shown, ratings, and budget. The dataset is in wide format, so that each row corresponds to a unique film.
The csv file “3_imdb-dataset_release-info_long” contains data about non-festival releases (e.g., theatrical, digital, TV, DVD/Blu-ray). The dataset is in a long format, so that each row corresponds to a particular release of a particular film.
The csv file “3_imdb-dataset_websites_long” contains data on available websites (official websites, miscellaneous, photos, video clips). The dataset is in a long format, so that each row corresponds to a website of a particular film.
The dataset includes eight text files containing the scripts for web scraping. They were written in R 3.6.3 for Windows.
The R script “r_1_unite_data” demonstrates the structure of the dataset that we use in the following steps to identify, scrape, and match the film data.
The R script “r_2_scrape_matches” reads in the dataset with the film characteristics described in “r_1_unite_data” and uses various R packages to create a search URL for each film from the core dataset on the IMDb website. The script attempts to match each film from the core dataset to IMDb records, first by conducting an advanced search based on the film title and year, and then, if no matches are found, by using an alternative title and a basic search. The script scrapes the title, release year, directors, running time, genre, and IMDb film URL from the first page of suggested records on the IMDb website. The script then defines a loop that matches each film in the core dataset with the suggested films on the IMDb search page and computes matching scores. Matching uses data on directors, production year (+/- one year), and title, with a fuzzy matching approach based on two methods, “cosine” and “osa”: cosine similarity matches titles with a high degree of overall similarity, while the OSA algorithm matches titles that may contain typos or minor variations.
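The two string measures behave differently, which is why both are used. The following self-contained Python sketch shows minimal implementations of each (the original scripts are R; these stand-ins are for illustration only):

```python
import math
from collections import Counter

def osa_distance(a, b):
    """Optimal string alignment (OSA) distance: Levenshtein edits
    plus swaps of adjacent characters, each transposition costing 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def qgram_cosine(a, b, q=2):
    """Cosine similarity between character q-gram count vectors."""
    ga = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    gb = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    dot = sum(ga[g] * gb[g] for g in ga)
    na = math.sqrt(sum(v * v for v in ga.values()))
    nb = math.sqrt(sum(v * v for v in gb.values()))
    return dot / (na * nb) if na and nb else 0.0

# A single adjacent transposition ("ti" -> "it") costs only 1 under OSA,
# which is why OSA is forgiving of this kind of typo:
assert osa_distance("festival", "fesitval") == 1
```

Cosine similarity on q-grams scores overall character overlap regardless of position, while OSA counts individual edits, so the two catch complementary kinds of title variation.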
The script “r_3_matching” creates a dataset with the matches for a manual check. Each pair of films (the original film from the core dataset and the suggested match from the IMDb website) was assigned to one of five categories: a) 100% match: perfect match on title, year, and director; b) likely good match; c) maybe match; d) unlikely match; and e) no match. The script also checks for possible duplicates in the dataset and flags them for a manual check.
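A categorization rule of this shape can be sketched as a simple threshold function. The thresholds and inputs below are hypothetical illustrations, not the values used in “r_3_matching”:

```python
# Hypothetical thresholds showing how a fuzzy-match score plus exact
# agreement on title/year/director could map to the five categories.
def categorize(score, exact_title, exact_year, exact_director):
    if exact_title and exact_year and exact_director:
        return "100% match"
    if score >= 0.9:
        return "likely good match"
    if score >= 0.75:
        return "maybe match"
    if score >= 0.5:
        return "unlikely match"
    return "no match"
```

Only the middle categories need the manual check; perfect matches and clear non-matches are unambiguous by construction.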
The script “r_4_scraping_functions” creates functions for scraping the data from the identified matches (based on the scripts described above and manually checked). These functions are used for scraping the data in the next script.
The script “r_5a_extracting_info_sample” uses the functions defined in “r_4_scraping_functions” to scrape the IMDb data for the identified matches. This script does so for the first 100 films only, to check that everything works. Scraping the entire dataset took a few hours, so a test run on a subsample of 100 films is advisable.
The script “r_5b_extracting_info_all” extracts the data for the entire dataset of the identified matches.
The script “r_5c_extracting_info_skipped” checks the films with missing data (where data was not scraped) and tries to extract the data one more time, to make sure that the errors were not caused by disruptions in the internet connection or other technical issues.
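The retry idea behind this script can be sketched as follows (a Python sketch; `fetch` is a hypothetical stand-in for the scraping function, and the real script is R):

```python
import time

def scrape_with_retries(url, fetch, attempts=3, wait=1.0):
    """Retry `fetch(url)` a few times so that transient network errors
    are not mistaken for genuinely missing data. Returns None only
    after all attempts fail."""
    for i in range(attempts):
        try:
            return fetch(url)
        except Exception:
            if i == attempts - 1:
                return None
            time.sleep(wait)

# Demo: a fetch that fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "scraped record"

result = scrape_with_retries("https://example.org/film", flaky_fetch, wait=0)
```

Records that still fail after all retries are the ones worth flagging as genuinely missing rather than victims of a flaky connection.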
The script “r_check_logs” is used for troubleshooting and for tracking the progress of all the R scripts used. It reports the number of missing values and errors.
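A log summary of this kind boils down to counting line types. The sketch below assumes a simple `ERROR`/`MISSING` line-prefix convention, which is an illustration, not the actual format written by the R scripts:

```python
def summarize_log(lines):
    """Count successes, missing values, and errors in a scraping log.
    The 'ERROR'/'MISSING' prefixes are assumed for illustration only."""
    summary = {"ok": 0, "missing": 0, "error": 0}
    for line in lines:
        if line.startswith("ERROR"):
            summary["error"] += 1
        elif line.startswith("MISSING"):
            summary["missing"] += 1
        else:
            summary["ok"] += 1
    return summary
```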
4 Festival Library Dataset
The Festival Library Dataset consists of a data schema image file, one codebook, and one dataset, all in csv format.
The codebook (csv file “4_codebook_festival-library_dataset”) offers a detailed description of all variables within the Library Dataset. It lists the definitions of variables such as location, festival name, and festival categories.
CERN-LHC. The WZ production cross section in proton-proton collisions at sqrt(s) = 13 TeV is measured with the CMS experiment at the LHC using a data sample corresponding to an integrated luminosity of 2.3 inverse fb. The measurement is performed in the leptonic decay modes WZ to l nu l' l', where l, l' = e, mu. The measured cross section for the range 60 < m[l'l'] < 120 GeV is 39.9 +/- 3.2 (stat) +2.9/-3.1 (syst) +/- 0.4 (theo) +/- 1.3 (lumi) pb, consistent with the standard model prediction.
Fiducial region definition:
- leading Z lepton pt > 20 GeV
- subleading Z lepton pt > 10 GeV
- W lepton pt > 20 GeV
- lepton |eta| < 2.5
- Z invariant mass between 60 and 120 GeV
- any same-flavor opposite-sign lepton pair invariant mass > 4 GeV

Total cross section region definition:
- Z invariant mass between 60 and 120 GeV
- any same-flavor opposite-sign lepton pair invariant mass > 4 GeV
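As a quick sanity check, the quoted uncertainty components of the cross section can be combined in quadrature, under the conventional assumption that they are independent (this combination is ours, not stated in the record; the asymmetric systematic uncertainty is treated side by side):

```python
import math

# Uncertainty components quoted for the WZ cross section (in pb):
stat, theo, lumi = 3.2, 0.4, 1.3
syst_up, syst_down = 2.9, 3.1

# Quadrature sum, assuming the components are independent.
total_up = math.sqrt(stat**2 + syst_up**2 + theo**2 + lumi**2)
total_down = math.sqrt(stat**2 + syst_down**2 + theo**2 + lumi**2)
```

This yields a total uncertainty of roughly +4.5/-4.7 pb on the measured 39.9 pb, dominated by the statistical and systematic components.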