Data collected to assess water quality conditions in the natural creeks, aquifers and lakes in the Austin area. This is raw data, provided directly from our Water Resources Monitoring database (WRM) and should be considered provisional. Data may or may not have been reviewed by project staff. A map of site locations can be found by searching for LOCATION.WRM_SAMPLE_SITES; you may then use those WRM_SITE_IDs to filter in this dataset using the field SAMPLE_SITE_NO.
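The filtering step described above can be sketched in pandas. This is only an illustration: it assumes the dataset has been exported to a DataFrame, and the site IDs and result values below are invented, not real WRM records.

```python
import pandas as pd

# Hypothetical frame standing in for an export of this dataset; the
# column name SAMPLE_SITE_NO comes from the description above, the
# values are invented.
samples = pd.DataFrame({
    "SAMPLE_SITE_NO": [101, 102, 103, 101],
    "PARAMETER": ["PH", "DO", "PH", "TEMP"],
    "RESULT": [7.8, 8.1, 7.5, 21.4],
})

# WRM_SITE_IDs looked up beforehand in LOCATION.WRM_SAMPLE_SITES.
sites_of_interest = [101, 103]

# Keep only results from the chosen sites.
subset = samples[samples["SAMPLE_SITE_NO"].isin(sites_of_interest)]
```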
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Alabama Real-time Coastal Observing System (ARCOS) with support of the Dauphin Island Sea Lab is a network of continuously sampling observing stations that collect observations of meteorological and hydrographic data from fixed stations operating across coastal Alabama. Data were collected from 2003 through the present and include parameters such as air temperature, relative humidity, solar and quantum radiation, barometric pressure, wind speed, wind direction, precipitation amounts, water temperature, salinity, dissolved oxygen, water height, and other water quality data. Stations, when possible, are designed to collect the same data in the same way, though there are exceptions given unique location needs (see individual accession abstracts for details). Stations are strategically placed to sample across salinity gradients, from delta to offshore, and the width of the coast.
This asset includes Superfund site-specific sampling information, including the location of samples, types of samples, and analytical chemistry characteristics of samples. Information is associated with a particular contaminated site, as there is no national database of this information.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the data and code for the paper "How Twitter Data Sampling Biases U.S. Voter Behavior Characterizations."
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Despite the wide application of longitudinal studies, they are often plagued by missing data and attrition. The majority of methodological approaches focus on participant retention or modern missing data analysis procedures. This paper, however, takes a new approach by examining how researchers may supplement the sample with additional participants. First, refreshment samples use the same selection criteria as the initial study. Second, replacement samples identify auxiliary variables that may help explain patterns of missingness and select new participants based on those characteristics. A simulation study compares these two strategies for a linear growth model with five measurement occasions. Overall, the results suggest that refreshment samples lead to less relative bias, greater relative efficiency, and more acceptable coverage rates than replacement samples or not supplementing the missing participants in any way. Refreshment samples also have high statistical power. The comparative strengths of the refreshment approach are further illustrated through a real data example. These findings have implications for assessing change over time when researching at-risk samples with high levels of permanent attrition.
Within the frame of PCBS' efforts to provide official Palestinian statistics on the different aspects of life of Palestinian society, and given the wide spread of computers, the Internet and mobile phones among the Palestinian people and the important role they may play in spreading knowledge and culture and in shaping public opinion, PCBS conducted the Household Survey on Information and Communications Technology, 2014.
The main objective of this survey is to provide statistical data on Information and Communication Technology in Palestine, in addition to providing data on the following:
· Prevalence of computers and access to the Internet.
· Penetration and purpose of technology use.
Palestine (West Bank and Gaza Strip), type of locality (urban, rural, refugee camps) and governorate
Household. Person 10 years and over.
All Palestinian households and individuals whose usual place of residence was in Palestine, with a focus on persons aged 10 years and over, in 2014.
Sample survey data [ssd]
Sampling Frame The sampling frame consists of a list of enumeration areas adopted in the Population, Housing and Establishments Census of 2007. Each enumeration area has an average size of about 124 households. These were used in the first phase as Preliminary Sampling Units in the process of selecting the survey sample.
Sample Size The total sample size of the survey was 7,268 households, of which 6,000 responded.
Sample Design The sample is a stratified clustered systematic random sample. The design comprised three phases:
Phase I: Random sample of 240 enumeration areas.
Phase II: Selection of 25 households from each enumeration area selected in phase one, using systematic random selection.
Phase III: Selection of one individual (aged 10 years or more) from each selected household; Kish tables were used to ensure random selection.
Sample Strata Distribution of the sample was stratified by:
1- Governorate (16 governorates, J1).
2- Type of locality (urban, rural and camps).
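The three selection phases above can be sketched in Python. The enumeration-area counts and household lists below are illustrative stand-ins, and `random.choice` only approximates the equal-probability draw that a Kish table provides.

```python
import random

random.seed(7)

# Stand-in frame: enumeration areas of ~124 households each, as in the
# 2007 census frame described above (both numbers illustrative).
enumeration_areas = {ea: list(range(124)) for ea in range(1000)}

# Phase I: random sample of 240 enumeration areas.
selected_eas = random.sample(sorted(enumeration_areas), 240)

def systematic_sample(units, n):
    """Phase II: systematic random selection of n units from a list."""
    step = len(units) / n
    start = random.uniform(0, step)
    return [units[int(start + i * step)] for i in range(n)]

households = {ea: systematic_sample(enumeration_areas[ea], 25)
              for ea in selected_eas}

# Phase III: one member aged 10+ per household; a Kish table assigns
# this deterministically by household roster, which random.choice
# only approximates here.
respondent = random.choice(["member_a", "member_b", "member_c"])
```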
Face-to-face [f2f]
The survey questionnaire consists of identification data, quality controls and three main sections: Section I: Data on household members that include identification fields, the characteristics of household members (demographic and social) such as the relationship of individuals to the head of household, sex, date of birth and age.
Section II: Household data include information regarding computer processing, access to the Internet, and possession of various media and computer equipment. This section includes information on topics related to the use of computer and Internet, as well as supervision by households of their children (5-17 years old) while using the computer and Internet, and protective measures taken by the household in the home.
Section III: Data on persons (aged 10 years and over) about computer use, access to the Internet and possession of a mobile phone.
Preparation of Data Entry Program: This stage included preparation of the data entry programs using an ACCESS package and defining data entry control rules to avoid errors, plus validation inquiries to examine the data after it had been captured electronically.
Data Entry: The data entry process started on 8 May 2014 and ended on 23 June 2014. The data entry took place at the main PCBS office and in field offices using 28 data clerks.
Editing and Cleaning procedures: Several measures were taken to avoid non-sampling errors. These included editing of questionnaires before data entry to check field errors, using a data entry application that does not allow mistakes during the process of data entry, and then examining the data by using frequency and cross tables. This ensured that data were error free; cleaning and inspection of the anomalous values were conducted to ensure harmony between the different questions on the questionnaire.
Response rate = 79%
There are many aspects to the concept of data quality, ranging from the initial planning of the survey to the dissemination of the results and how well users understand and use the data. There are three components to the quality of statistics: accuracy, comparability, and quality control procedures.
Checks on data accuracy cover many aspects of the survey and include statistical errors due to the use of a sample, non-statistical errors resulting from field workers or survey tools, and response rates and their effect on estimations. This section includes:
Statistical Errors Data of this survey may be affected by statistical errors due to the use of a sample and not a complete enumeration. Therefore, certain differences can be expected in comparison with the real values obtained through censuses. Variances were calculated for the most important indicators.
Variance calculations revealed that there is no problem in disseminating results nationally or regionally (the West Bank, Gaza Strip), but some indicators show high variance by governorate, as noted in the tables of the main report.
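To make the idea of sampling error concrete, the sketch below estimates the standard error of an indicator proportion under a clustered design. The proportion, sample size, and design effect are assumed values for illustration, not figures from the survey's variance tables.

```python
import math

p = 0.5        # an indicator proportion (assumed)
n = 6000       # responding households (from the sample size above)
deff = 1.5     # assumed design effect for the clustered sample

# Standard error under simple random sampling, then inflated by the
# design effect to account for clustering within enumeration areas.
se_srs = math.sqrt(p * (1.0 - p) / n)
se = se_srs * math.sqrt(deff)

# Approximate 95% confidence interval for the proportion.
ci95 = (p - 1.96 * se, p + 1.96 * se)
```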
Non-Statistical Errors Non-statistical errors are possible at all stages of the project, during data collection or processing. These are referred to as non-response errors, response errors, interviewing errors and data entry errors. To avoid errors and reduce their effects, strenuous efforts were made to train the field workers intensively. They were trained on how to carry out the interview, what to discuss and what to avoid, and practical and theoretical training took place during the training course. Training manuals were provided for each section of the questionnaire, along with practical exercises in class and instructions on how to approach respondents to reduce refused cases. Data entry staff were trained on the data entry program, which was tested before starting the data entry process.
The sources of non-statistical errors can be summarized as:
1. Some households were not at home and could not be interviewed, and some households refused to be interviewed.
2. In a few cases, errors occurred because of the way interviewers asked the questions, and respondents misunderstood some of the questions.
The data was collected using the High Frequency Survey (HFS), the new regional data collection tool & methodology launched in the Americas. The survey allowed for better reaching populations of interest with new remote modalities (phone interviews and self-administered surveys online) and improved sampling guidance and strategies. It includes a set of standardized regional core questions while allowing for operation-specific customizations. The core questions revolve around populations of interest's demographic profile, difficulties during their journey, specific protection needs, access to documentation & regularization, health access, coverage of basic needs, coping capacity & negative mechanisms used, and well-being & local integration. The data collected has been used by countries in their protection monitoring analysis and vulnerability analysis.
Whole country
Household
All people of concern.
Sample survey data [ssd]
In the absence of a well-developed sampling frame for forcibly displaced populations in the Americas, the High Frequency Survey employed a multi-frame sampling strategy where respondents entered the sample through one of three channels: (i) those who opted in to complete an online self-administered version of the questionnaire, which was widely circulated through refugee social media; (ii) persons identified through UNHCR and partner databases who were interviewed remotely by phone; and (iii) random selection from the cases approaching UNHCR for registration or assistance. The total sample size was 3,950 households. At the time of the survey, the population of concern was estimated at around 500,000 individuals.
Other [oth]
The questionnaire contained the following sections: journey, family composition, vulnerability, basic needs, coping capacity, well-being, COVID-19 impact.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Tool support in software engineering often depends on relationships, regularities, patterns, or rules mined from sampled code. Examples are approaches to bug prediction, code recommendation, and code autocompletion. Sampling is what makes the analysis of such data scale. Many such samples consist of software projects taken from GitHub; however, the specifics of sampling may influence how well the patterns generalize.
In this paper, we focus on how to sample software projects that are clients of libraries and frameworks when mining for inter-library usage patterns. We notice that when limiting the sample to a very specific library, inter-library patterns in the form of implications from one library to another may not generalize well. Using a simulation and a real case study, we analyze different sampling methods. Most importantly, our simulation shows that only when sampling for the disjunction of both libraries involved in the implication does the implication generalize well. Second, we show that real empirical data sampled from GitHub does not behave as we would expect from our simulation. This identifies a potential problem with using such an API for studying inter-library usage patterns.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sampling intervals highlighted in bold numbers indicate the approximate vertical extent of the oxygen minimum zone (O2 ≤ 45 µmol kg⁻¹). D = Discovery cruise, MSM = Maria S. Merian cruises, UTC = Coordinated Universal Time, O2 min = lowest oxygen concentration at the respective station, O2 min depth = depth of the oxygen minimum at the respective station, SST = sea surface temperature, n.d. = no data, * = stations analysed for copepod abundance.
This data release links fish survey data from a suite of programs in the Chesapeake Bay watershed to the NHDPlus High Resolution Region 02 networks, hereafter referred to as NHDPlusHR. The data set contains site name, survey program, coordinates of sample, ancillary information such as sample date and site location information where available, and HR Permanent Identifier. It also includes a confidence classification category for each of NHD assignment based on a set of pre-determined rules. In total there were 15 confidence categories ranging from high confidence to low confidence. We caution the use of sampling points which were given anything other than "high" confidence in their assignment to a given NHD catchment/flowline to avoid spurious/inappropriate attribution of geospatial data to fish data samples represented herein and refer the potential user to the Confidence Dictionary.csv which describes the criteria for each confidence category.
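A minimal sketch of the recommended filtering, assuming the release has been loaded into a pandas DataFrame. The column names and values below are illustrative stand-ins, not the release's exact schema.

```python
import pandas as pd

# Hypothetical frame mirroring the fields described above (site name,
# HR Permanent Identifier, confidence category); values are invented.
links = pd.DataFrame({
    "site_name": ["A", "B", "C", "D"],
    "hr_permanent_identifier": ["p1", "p2", "p3", "p4"],
    "confidence": ["high", "medium", "high", "low"],
})

# Keep only high-confidence NHD assignments, as the release advises,
# before attributing geospatial data to the fish samples.
high_conf = links[links["confidence"] == "high"]
```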
Dataset Card for "sampling-distill-train-data-kgw-k0-gamma0.25-delta1"
Training data for sampling-based watermark distillation using the KGW k=0, γ=0.25, δ=1 watermarking strategy in the paper On the Learnability of Watermarks for Language Models. Llama 2 7B with decoding-based watermarking was used to generate 640,000 watermarked samples, each 256 tokens long. Each sample is prompted with 50-token prefixes from OpenWebText (prompts not… See the full description on the dataset page: https://huggingface.co/datasets/cygu/sampling-distill-train-data-kgw-k0-gamma0.25-delta1.
https://www.bgs.ac.uk/information-hub/licensing/
This layer provides geochemical analysis associated with offshore sampling activities. It contains analysis of 38 elements and should be used as a baseline for chemical element concentrations in seabed sediments, against which samples collected in the future may be assessed. Related data in Offshore Sample Data - Activity & Scan collection.
Information about sampling locations for data from the California Environmental Data Exchange Network (CEDEN). This set of station/project combinations can be combined with other data sets from CEDEN to provide more information. CEDEN is the California State Water Board's data system for surface water quality in California, and seeks to include all available statewide data (such as that produced by research and volunteer organizations). Data in CEDEN include field, sediment and water column data collected from freshwater, estuarine, and marine environments. Examples of data in CEDEN come from laboratory, physical and biological analyses and include data types associated with chemical, toxicological, field, bioassessment, invertebrate, fish, and bacteriological assay assessments.
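The combining step described above can be sketched with a pandas join. The table layouts and the `StationCode` join key are assumptions for illustration, not the actual CEDEN schema.

```python
import pandas as pd

# Hypothetical station table standing in for this station/project set.
stations = pd.DataFrame({
    "StationCode": ["S1", "S2"],
    "lat": [37.1, 36.9],
    "lon": [-122.0, -121.8],
})

# Hypothetical water-column results from another CEDEN data set.
results = pd.DataFrame({
    "StationCode": ["S1", "S1", "S2"],
    "analyte": ["Cu", "Zn", "Cu"],
    "value": [1.2, 8.4, 0.9],
})

# Attach station locations to each measurement record.
merged = results.merge(stations, on="StationCode", how="left")
```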
This archived Paleoclimatology Study is available from the NOAA National Centers for Environmental Information (NCEI), under the World Data Service (WDS) for Paleoclimatology. The associated NCEI study type is Coral. The data include parameters of corals and sclerosponges with a geographic location of New Caledonia, Melanesia. The time period coverage is from 8 to -1 in calendar years before present (BP). See metadata information for parameter and study location details. Please cite this study when using the data.
The free energy landscapes of several fundamental processes are characterized by high barriers separating long-lived metastable states. In order to explore these types of landscapes, enhanced sampling methods are used. While many such methods are able to obtain sufficient sampling to draw the free energy, the transition states are often sparsely sampled. We propose an approach based on the Variationally Enhanced Sampling method to enhance sampling in the transition region. To this effect, we introduce a dynamic target distribution which uses the derivative of the instantaneous free energy surface to locate the transition regions on the fly and modulate the probability of sampling different regions. Finally, we exemplify the effectiveness of this approach in enriching the number of configurations in the transition state region in the cases of a chemical reaction and of a nucleation process.
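A toy numerical sketch of the idea, not the actual Variationally Enhanced Sampling implementation: given a one-dimensional free energy profile, the derivative flags the transition regions, and an assumed modulation form up-weights them in the target distribution.

```python
import numpy as np

# Toy double-well free energy along a collective variable s.
s = np.linspace(-2.0, 2.0, 401)
F = (s**2 - 1.0) ** 2          # minima at s = ±1, barrier at s = 0

# The derivative of the free energy flags the flanks of the barrier,
# i.e. where the system transits between the metastable states.
dFds = np.gradient(F, s)

# Dynamic target distribution: start from the Boltzmann weight and
# up-weight regions with large |dF/ds| (the modulation form here is
# an assumption for illustration).
beta = 1.0
weight = 1.0 + 5.0 * np.abs(dFds)
p_target = weight * np.exp(-beta * F)

ds = s[1] - s[0]
p_target /= p_target.sum() * ds   # normalize to a probability density
```

With this weighting, the barrier flanks (large |dF/ds|) receive more probability mass relative to the minima than they would under the plain Boltzmann distribution.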
The data was collected using the High Frequency Survey (HFS), the new regional data collection tool & methodology launched in the Americas. The survey allowed for better reaching populations of interest with new remote modalities (phone interviews and self-administered surveys online) and improved sampling guidance and strategies. It includes a set of standardized regional core questions while allowing for operation-specific customizations. The core questions revolve around populations of interest’s demographic profile, difficulties during their journey, specific protection needs, access to documentation & regularization, health access, coverage of basic needs, coping capacity & negative mechanisms used, and well-being & local integration. The data collected has been used by countries in their protection monitoring analysis and vulnerability analysis.
National coverage
Household
All people of concern.
Sample survey data [ssd]
In the absence of a well-developed sampling frame for forcibly displaced populations in the Americas, the High Frequency Survey employed a multi-frame sampling strategy where respondents entered the sample through one of three channels: (i) those who opted in to complete an online self-administered version of the questionnaire, which was widely circulated through refugee social media; (ii) persons identified through UNHCR and partner databases who were interviewed remotely by phone; and (iii) random selection from the cases approaching UNHCR for registration or assistance. The total sample size was 183 refugee households.
Other [oth]
The questionnaire contained the following sections: journey, family composition, vulnerability, basic Needs, coping capacity, well-being, COVID-19 Impact.
Establishment specific sampling results for Raw Beef sampling projects. Current data is updated quarterly; archive data is updated annually. Data is split by FY. See the FSIS website for additional information.
The purpose of this analysis is the development of an efficient sampling protocol for plastic waste streams.
A model waste was formulated from different polymers, rich in ABS and containing PS, PP and PE in smaller proportions. Additionally, one bromine-containing flame retardant was added to a final concentration of either 500 ppm or 50 ppm. Different sampling approaches were followed, including extrusion and/or cryogenic grinding as a homogenization step. Each approach was assessed for homogenization efficiency via various analytical techniques.
This dataset contains raw MFR data of the model waste from the different sampling approaches. The content is:
· One Excel file containing MFR data of the model waste, where the approach was based on extrusion, and the measurement protocol
· One Excel file containing MFR data of the model waste, where the approach was based on cryogenic grinding, and the measurement protocol
· One Word file containing further information about the methodology and nomenclature
This dataset was generated in the framework of the PRecycling Horizon Europe project (101058670).
The documented dataset covers Enterprise Survey (ES) panel data collected in Paraguay in 2006, 2010 and 2017, as part of Latin America and the Caribbean Enterprise Surveys rollout, an initiative of the World Bank. The objective of the study is to obtain feedback from enterprises in client countries on the state of the private sector as well as to help in building a panel of enterprise data that will make it possible to track changes in the business environment over time, thus allowing, for example, impact assessments of reforms. Through face-to-face interviews with firms in the manufacturing and services sectors, the survey assesses the constraints to private sector growth and creates statistically significant business environment indicators that are comparable across countries.
Enterprise Surveys target a sample consisting of longitudinal (panel) observations and new cross-sectional data. Panel firms are prioritized in the sample selection, comprising up to 50% of the sample. For all panel firms, regardless of the sample, current eligibility or operating status is determined and included in panel datasets.
Paraguay ES 2010 was conducted in June 2010 and April 2011, Paraguay ES 2006 was carried out in March and October 2006. Stratified random sampling was used to select the surveyed businesses. Data was collected using face-to-face interviews.
Data from 1,338 establishments was analyzed: 460 businesses were from 2006 only, 153 from 2010 only, 246 from 2017 only, 110 firms were from 2010 and 2017, 180 from 2006 and 2010, and 186 firms were from 2006, 2010 and 2017.
The standard Enterprise Survey topics include firm characteristics, gender participation, access to finance, annual sales, costs of inputs and labor, workforce composition, bribery, licensing, infrastructure, trade, crime, competition, capacity utilization, land and permits, taxation, informality, business-government relations, innovation and technology, and performance measures. Over 90 percent of the questions objectively measure characteristics of a country’s business environment. The remaining questions assess the survey respondents’ opinions on what are the obstacles to firm growth and performance.
National
The primary sampling unit of the study is an establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must make its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
The whole population, or universe, of the study is the non-agricultural economy. It comprises all manufacturing sectors according to the group classification of ISIC Revision 3.1 (group D), the construction sector (group F), the services sector (groups G and H), and the transport, storage, and communications sector (group I). Note that this definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities sectors.
Sample survey data [ssd]
Three levels of stratification were used in this country: industry, establishment size, and region.
Industry stratification was designed as follows: the universe was stratified into Manufacturing industries (ISIC Rev. 3.1 codes 15- 37), Retail industries (ISIC code 52) and Other Services (ISIC codes 45, 50, 51, 55, 60-64, and 72).
Size stratification was defined as follows: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).
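The size strata above can be captured in a small helper. The establishment frame below is invented, and crossing industry with size band mimics how the sampling strata are formed; this is a sketch, not the ES sampling code.

```python
from collections import Counter

def size_band(employees):
    """Map an employee count to the survey's size strata:
    small (5-19), medium (20-99), large (100+)."""
    if employees < 5:
        return None          # below the survey's eligibility cutoff
    if employees <= 19:
        return "small"
    if employees <= 99:
        return "medium"
    return "large"

# Illustrative frame of (industry, employee count) pairs; the industry
# labels follow the strata named above, the counts are invented.
frame = [("Manufacturing", 12), ("Retail", 45), ("Other Services", 7),
         ("Manufacturing", 250), ("Retail", 3)]

# Cross industry with size band to form the sampling strata, dropping
# ineligible establishments (fewer than 5 employees).
strata = Counter((ind, size_band(n)) for ind, n in frame if size_band(n))
```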
In 2010, two sample frames were used. The first was supplied by the World Bank and consists of enterprises interviewed in Paraguay 2006. The World Bank required that attempts should be made to re-interview establishments responding to the Paraguay 2006 survey where they were within the selected geographical locations and met eligibility criteria. That sample is referred to as the Panel.
The two sample frames were then used for the selection of a sample with the aim of obtaining interviews with 360 establishments with five or more employees.
Face-to-face [f2f]
Data entry and quality controls are implemented by the contractor and data is delivered to the World Bank in batches (typically 10%, 50% and 100%). These data deliveries are checked for logical consistency, out of range values, skip patterns, and duplicate entries. Problems are flagged by the World Bank and corrected by the implementing contractor through data checks, callbacks, and revisiting establishments.
Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.
Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect "Refusal to respond" (-8) as a different option from "Don't know" (-9). b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary.
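A sketch of how an analyst might recode these special values before analysis, assuming the ES convention above of -8 for "Refusal to respond" and -9 for "Don't know"; the column name and values are illustrative.

```python
import pandas as pd

# Invented responses to a sensitive question, using the ES codes.
df = pd.DataFrame({"bribery_pct": [3, -8, 0, -9, 5]})

# Recode both special codes to missing so they do not bias estimates.
MISSING_CODES = {-8: pd.NA, -9: pd.NA}
df["bribery_pct"] = df["bribery_pct"].replace(MISSING_CODES)

# Valid answers only, for substantive analysis; item non-response can
# then be reported separately.
valid = df["bribery_pct"].dropna()
```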
Survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals.
GOCAL was an internationally coordinated long-term sampling program in the Gulf of California, designed to examine the temporal and spatial variability in the biogeochemical properties of the region. This program includes Drs. S. Alvarez Borrego, R. Lara Lara, G. Gaxiola and H. Maske from the Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Mexico, Dr. E. Valdez from the University of Sonora, Hermosillo, Mexico, Drs. J. Mueller and C. Trees of San Diego State University, San Diego, CA, and Dr. Ron Zaneveld, Dr. Scott Pegau, and Andrew Barnard of Oregon State University, Corvallis, OR. Components of this program were funded by the NASA SIMBIOS initiative.