Biological sampling data is information derived from biological samples of fish harvested in Virginia, collected for aging purposes to aid coastal stock assessments.
Establishment-specific sampling results for Raw Beef sampling projects. Current data are updated quarterly; archive data are updated annually. Data are split by fiscal year (FY). See the FSIS website for additional information.
A data set of cross-nationally comparable microdata samples for 15 Economic Commission for Europe (ECE) countries (Bulgaria, Canada, Czech Republic, Estonia, Finland, Hungary, Italy, Latvia, Lithuania, Romania, Russia, Switzerland, Turkey, UK, USA) based on the 1990 national population and housing censuses in countries of Europe and North America, compiled to study the social and economic conditions of older persons. These samples have been designed to allow research on a wide range of issues related to aging, as well as on other social phenomena. A common set of nomenclatures and classifications, derived from a study of census data comparability in Europe and North America, was adopted as a standard for recoding. This series was formerly called Dynamics of Population Aging in ECE Countries. The recommendations regarding the design and size of the samples drawn from the 1990 round of censuses envisaged: (1) drawing individual-based samples of about one million persons; (2) progressive oversampling with age in order to ensure sufficient representation of various categories of older people; and (3) retaining information on all persons co-residing in the sampled individual's dwelling unit. Estonia, Latvia, and Lithuania provided the entire population over age 50, while Finland sampled it with progressive oversampling. Canada, Italy, Russia, Turkey, the UK, and the US provided samples that had not been drawn specially for this project and cover the entire population without oversampling. Given its wide user base, the US 1990 PUMS was not recoded. Instead, PAU offers mapping modules, which recode the PUMS variables into the project's classifications, nomenclatures, and coding schemes.
Because of the high sampling density, these data cover various small groups of older people; contain as much geographic detail as possible under each country's confidentiality requirements; include more extensive information on housing conditions than many other data sources; and provide information for a number of countries whose data were not accessible until recently. Data Availability: Eight of the fifteen participating countries have signed the standard data release agreement making their data available through NACDA/ICPSR (see links below). Hungary and Switzerland require a clearance to be obtained from their national statistical offices for the use of microdata; however, the documents signed between the PAU and these countries include clauses stipulating that, in general, all scholars interested in social research will be granted access. Russia requested that certain provisions for archiving the microdata samples be removed from its data release arrangement. The PAU has an agreement with several British scholars to facilitate access to the 1991 UK data through collaborative arrangements. Statistics Canada and the Italian Institute of Statistics (ISTAT) provide access to data from Canada and Italy, respectively.
* Dates of Study: 1989-1992
* Study Features: International, Minority Oversamples
* Sample Size: Approx. 1 million/country
Links:
* Bulgaria (1992), http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/02200
* Czech Republic (1991), http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/06857
* Estonia (1989), http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/06780
* Finland (1990), http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/06797
* Romania (1992), http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/06900
* Latvia (1989), http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/02572
* Lithuania (1989), http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/03952
* Turkey (1990), http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/03292
* U.S. (1990), http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/06219
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
GIRT-Data is the first and largest dataset of issue report templates (IRTs) in both YAML and Markdown format. This dataset and its corresponding open-source crawler tool are intended to support research in this area and to encourage more developers to use IRTs in their repositories. The stable version of the dataset contains 1,084,300 repositories, 50,032 of which support IRTs.
For more details see the GitHub page of the dataset: https://github.com/kargaranamir/girt-data
The dataset paper was accepted at the MSR 2023 conference under the title "GIRT-Data: Sampling GitHub Issue Report Templates".
Multiple sampling campaigns were conducted near Boulder, Colorado, to quantify constituent concentrations and loads in Boulder Creek and its tributary, South Boulder Creek. Diel sampling was initiated at approximately 1100 hours on September 17, 2019, and continued until approximately 2300 hours on September 18, 2019. During this time period, samples were collected at two locations on Boulder Creek approximately every 3.5 hours to quantify the diel variability of constituent concentrations at low flow. Synoptic sampling campaigns on South Boulder Creek and Boulder Creek were conducted October 15-18, 2019, to develop spatial profiles of concentration, streamflow, and load. Numerous main stem and inflow locations were sampled during each synoptic campaign using the simple grab technique (17 main stem and 2 inflow locations on South Boulder Creek; 34 main stem and 17 inflow locations on Boulder Creek). Streamflow at each main stem location was measured using acoustic Doppler velocimetry. Bulk samples from all sampling campaigns were processed within one hour of sample collection. Processing steps included measurement of pH and specific conductance, and filtration using 0.45-micron filters. Laboratory analyses were subsequently conducted to determine dissolved and total recoverable constituent concentrations. Filtered samples were analyzed for a suite of dissolved anions using ion chromatography. Filtered, acidified samples and unfiltered, acidified samples were analyzed by inductively coupled plasma-mass spectrometry and inductively coupled plasma-optical emission spectroscopy to determine dissolved and total recoverable cation concentrations, respectively. This data release includes three data tables, three photographs, and a kmz file showing the sampling locations. Additional information on the data table contents, including the presentation of data below the analytical detection limits, is provided in a Data Dictionary.
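Pairing each concentration with the streamflow measured at the same main stem location is what turns these synoptic profiles into load estimates. As a minimal sketch (the function name, units, and numbers below are illustrative, not taken from this data release), the standard conversion from a dissolved concentration and a streamflow to an instantaneous load is:

```python
def instantaneous_load_kg_per_day(conc_mg_per_L: float, flow_m3_per_s: float) -> float:
    """Load (kg/day) = concentration (mg/L) * streamflow (m^3/s) * 86.4.

    Unit bookkeeping: 1 m^3 = 1000 L and 1 mg = 1e-6 kg, so
    mg/L * m^3/s = 1 g/s; over 86400 s/day that is 86.4 kg/day.
    """
    return conc_mg_per_L * flow_m3_per_s * 86.4

# Hypothetical example: 2.5 mg/L dissolved constituent at 0.75 m^3/s
load = instantaneous_load_kg_per_day(2.5, 0.75)  # 162.0 kg/day
```

Comparing such loads between successive main stem locations is how synoptic profiles localize the stream reaches where constituents enter.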
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Alabama Real-time Coastal Observing System (ARCOS), operated with the support of the Dauphin Island Sea Lab, is a network of continuously sampling observing stations that collect meteorological and hydrographic data from fixed stations operating across coastal Alabama. Data have been collected from 2003 through the present and include parameters such as air temperature, relative humidity, solar and quantum radiation, barometric pressure, wind speed, wind direction, precipitation amounts, water temperature, salinity, dissolved oxygen, water height, and other water quality data. Stations, when possible, are designed to collect the same data in the same way, though there are exceptions given unique location needs (see individual accession abstracts for details). Stations are strategically placed to sample across salinity gradients, from delta to offshore, and across the width of the coast.
This archived Paleoclimatology Study is available from the NOAA National Centers for Environmental Information (NCEI), under the World Data Service (WDS) for Paleoclimatology. The associated NCEI study type is Coral. The data include parameters of corals and sclerosponges with a geographic location of New Caledonia, Melanesia. The time period coverage is from 8 to -1 calendar years before present (BP). See the metadata for parameter and study location details. Please cite this study when using the data.
Data collected to assess water quality conditions in the natural creeks, aquifers, and lakes of the Austin area. These are raw data, provided directly from our Water Resources Monitoring (WRM) database, and should be considered provisional; they may or may not have been reviewed by project staff. A map of site locations can be found by searching for LOCATION.WRM_SAMPLE_SITES; you may then use those WRM_SITE_IDs to filter this dataset using the field SAMPLE_SITE_NO.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Tool support in software engineering often depends on relationships, regularities, patterns, or rules mined from sampled code; examples include approaches to bug prediction, code recommendation, and code autocompletion. Sampling is what makes the analysis of such data scale. Many such samples consist of software projects taken from GitHub; however, the specifics of sampling may influence how well the mined patterns generalize.
In this paper, we focus on how to sample software projects that are clients of libraries and frameworks when mining for inter-library usage patterns. We notice that when limiting the sample to clients of one specific library, inter-library patterns in the form of implications from one library to another may not generalize well. Using a simulation and a real case study, we analyze different sampling methods. Most importantly, our simulation shows that only when sampling for the disjunction of both libraries involved in the implication does the implication generalize well. Second, we show that real empirical data sampled from GitHub does not behave as we would expect from our simulation. This identifies a potential problem with using such APIs for studying inter-library usage patterns.
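The sensitivity of mined pattern statistics to the sampling frame can be seen in a toy simulation (this is an illustration under made-up assumptions, not the paper's actual setup: the libraries A and B, their usage probabilities, and the support measure are all hypothetical):

```python
import random

random.seed(0)

# Toy population of projects: each uses hypothetical library A with
# probability 0.3, and library B with probability 0.8 if it uses A,
# else 0.1 (so "uses A -> uses B" is a strong inter-library pattern).
population = [
    {"A": (a := random.random() < 0.3),
     "B": random.random() < (0.8 if a else 0.1)}
    for _ in range(100_000)
]

def support(sample):
    """Fraction of sampled projects using both A and B together."""
    return sum(p["A"] and p["B"] for p in sample) / len(sample)

# Two sampling frames: clients of A only, vs. the disjunction (A or B).
only_a = [p for p in population if p["A"]]
a_or_b = [p for p in population if p["A"] or p["B"]]

# The measured support of the {A, B} co-usage pattern differs sharply
# between the two frames, even though the population is the same.
support_only_a = support(only_a)   # roughly P(B | A)
support_a_or_b = support(a_or_b)   # roughly P(A and B) / P(A or B)
```

The point of the sketch is only that the same mined pattern receives different statistics under different sampling frames, which is why the choice of frame matters for generalization.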
The increase in the number of new chemicals synthesized in recent decades has driven constant growth in the development and application of computational models for predicting the activity and safety profiles of chemicals. Such models and their applications must often deal with imbalanced chemical data, and it is a genuine challenge to construct a classifier from an imbalanced data set. In this study, we analyzed and validated the importance of different sampling methods over a non-sampling approach for achieving well-balanced sensitivity and specificity in a machine learning model trained on imbalanced chemical data. Additionally, this study achieved an accuracy of 93.00%, an AUC of 0.94, an F1 measure of 0.90, a sensitivity of 96.00%, and a specificity of 91.00% using SMOTE sampling and a Random Forest classifier for the prediction of Drug-Induced Liver Injury (DILI). Our results suggest that, irrespective of the data set used, sampling methods can have a major influence on reducing the gap between the sensitivity and specificity of a model. This study demonstrates the efficacy of different sampling methods for the class imbalance problem using binary chemical data sets.
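SMOTE rebalances a data set by synthesizing new minority-class samples rather than duplicating existing ones. A minimal sketch of the core idea, interpolating between a minority sample and one of its nearest minority-class neighbours, is below (illustration only; in practice one would use a library implementation such as imbalanced-learn's SMOTE, and the data here are random):

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_oversample(X_min: np.ndarray, n_new: int, k: int = 5) -> np.ndarray:
    """Synthesize n_new minority samples by linear interpolation between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the essence of SMOTE, without the library machinery)."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        x = X_min[i]
        # Distances to all minority samples; skip index 0 of the sort (x itself).
        d = np.linalg.norm(X_min - x, axis=1)
        neighbours = np.argsort(d)[1:k + 1]
        j = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(x + gap * (X_min[j] - x))
    return np.array(synthetic)

# Toy minority class: 6 points in 2-D, oversampled with 10 synthetic points.
X_min = rng.normal(size=(6, 2))
X_new = smote_oversample(X_min, n_new=10, k=3)
```

The synthetic points lie on segments between existing minority points, which is what lets the classifier learn a broader minority region instead of memorizing duplicates.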
A building's plumbing system and water service line (pipes) can be made up of different types of materials. Each type of material can affect drinking water differently, so it is useful to conduct what is known as "sequential sampling". In sequential sampling, all water usage in a building is stopped for several hours, a period known as "stagnation". Next, water is collected from the faucet in a series of bottles, without wasting any water or running the water before filling the bottles. The first few bottles represent water that was in contact with the faucet or building plumbing during stagnation. The later bottles represent water that was in contact with the water service line. These sample results can help determine whether treatment is working.
Learn more at Michigan.gov/FlintWater
This was a series of four cruises with the aim of collecting replicate samples of specific biotopes:
* 2004: Mud biotope (Irish Sea and Celtic Deep) - Endeavour 09/2004
* 2005: Sand biotope (Bristol Channel) - Corystes 01/2005
* 2006: Gravel biotope and sand banks (Southern North Sea) - Endeavour 10a/2006
* 2007: Shell gravel (Western English Channel) - Endeavour 06/2007
Raw data, cruise reports, and photos taken from these cruises can be found as links. Titles are provided for papers written on the data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Online imbalanced learning is an emerging topic that combines the challenges of class imbalance and concept drift. However, most current works address class imbalance or concept drift in isolation, and only a few have considered the two issues simultaneously. To this end, this paper proposes an entropy-based dynamic ensemble classification algorithm (EDAC) that handles data streams with class imbalance and concept drift simultaneously. First, to address imbalanced learning in training data chunks arriving at different times, EDAC adopts an entropy-based balancing strategy: it divides the data chunks into multiple balanced sample pairs based on the differences in information entropy between the classes within each chunk. Additionally, we propose a density-based sampling method that improves the accuracy of classifying minority-class samples by separating them into high-quality samples and common samples according to the density of similar samples; high-quality and common samples are then randomly selected for training the classifier. Finally, to handle concept drift, EDAC designs and implements an ensemble classifier that uses a self-feedback strategy to determine the initial weight of the classifier, adjusting the weight of each sub-classifier according to its performance on the arriving data chunks. The experimental results demonstrate that EDAC outperforms five state-of-the-art algorithms on four synthetic and one real-world data streams.
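The entropy-based strategy rests on a simple quantity: the Shannon entropy of the class distribution within a data chunk, which is maximal when the chunk is balanced and shrinks as it becomes more skewed. A sketch of that measure (an illustration of the kind of balance score such a strategy uses, not the paper's actual code):

```python
import math
from collections import Counter

def class_entropy(labels) -> float:
    """Shannon entropy (in bits) of the class distribution of a chunk.

    1.0 bit for a perfectly balanced binary chunk; approaches 0 as one
    class dominates, signalling that rebalancing is needed.
    """
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

balanced = class_entropy([0, 1] * 50)        # 50/50 chunk -> 1.0 bit
skewed = class_entropy([0] * 95 + [1] * 5)   # 95/5 chunk  -> ~0.29 bits
```

Comparing such per-chunk entropies is one way to decide how aggressively each arriving chunk must be split into balanced sample pairs.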
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Number of samples for each sampling interval.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data products are the sampling results from FSIS' National Antimicrobial Resistance Monitoring System (NARMS) cecal sampling program. Data for sampling results from the NARMS product sampling program are currently posted on the FSIS website, grouped by commodity (https://www.fsis.usda.gov/science-data/data-sets-visualizations/laboratory-sampling-data). The antimicrobials and bacteria tested under NARMS are selected based on their importance to human health and their use in food-producing animals (FDA Guidance for Industry #152, https://www.fda.gov/media/69949/download). Cecal contents from cattle, swine, chickens, and turkeys were sampled as part of FSIS' routine NARMS cecal sampling program for major species.
If the Substance Abuse and Mental Health Services Administration (SAMHSA) is to move NSDUH to a hybrid ABS/field-enumerated frame, several questions will need to be answered, procedures will need to be developed and tested, and costs and benefits will need to be weighed. This report outlines what is known to date, how it may be applied to NSDUH, and what additional considerations need to be addressed.
Dataset Card for "sampling-distill-train-data-kth-shift4"
Training data for sampling-based watermark distillation using the KTH s=4 watermarking strategy in the paper "On the Learnability of Watermarks for Language Models". Llama 2 7B with decoding-based watermarking was used to generate 640,000 watermarked samples, each 256 tokens long. Each sample is prompted with a 50-token prefix from OpenWebText (prompts are not included in the samples).
Survey research in the Global South has traditionally required large budgets and lengthy fieldwork. The expansion of digital connectivity presents an opportunity for researchers to engage global subject pools and study settings where in-person contact is challenging. This paper evaluates Facebook advertisements as a tool for recruiting diverse survey samples in the Global South. Using Facebook's advertising platform, we quota-sample respondents in Mexico, Kenya, and Indonesia and assess how well these samples perform on a range of survey indicators, identify sources of bias, replicate a canonical experiment, and highlight trade-offs for researchers to consider. This method can quickly and cheaply recruit respondents, but the resulting samples tend to be more educated than the corresponding national populations. Weighting ameliorates these sample imbalances. The method generates data comparable to a commercial online sample at a fraction of the cost. Our analysis demonstrates the potential of Facebook advertisements for cost-effective research in diverse settings.
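The weighting step works by up-weighting strata that are under-represented in the recruited sample relative to the population. A minimal post-stratification sketch (the education strata, counts, and population shares below are made up for illustration, not the paper's data):

```python
def poststratification_weights(sample_counts: dict, population_shares: dict) -> dict:
    """Weight per stratum = population share / sample share, so that the
    weighted sample composition matches the population composition."""
    n = sum(sample_counts.values())
    return {
        stratum: population_shares[stratum] / (count / n)
        for stratum, count in sample_counts.items()
    }

# Hypothetical case: the sample skews toward tertiary-educated respondents.
weights = poststratification_weights(
    sample_counts={"primary": 100, "secondary": 300, "tertiary": 600},
    population_shares={"primary": 0.35, "secondary": 0.45, "tertiary": 0.20},
)
# Under-represented strata get weights above 1; over-represented, below 1.
```

Applying these weights to survey estimates is what corrects the education imbalance without changing who was recruited.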
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
6,261 global import shipment records for sampling pumps, with prices, volumes, and current buyer-supplier relationships, based on an actual global export trade database.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
NEON collects data that characterize a suite of terrestrial plants, animals, pathogens, and microbes at terrestrial field sites across the continent. The project's organismal sampling design captures the long-term dynamics of abundance, diversity, pathogen prevalence, phenology, and productivity. NEON integrates terrestrial organismal sampling with tower sensor measurements, soil sensor measurements and sampling, and airborne remote sensing data to support ecosystem-level characterization of processes and conditions such as carbon cycling, biodiversity, and ecosystem productivity. Where logistically possible, NEON co-locates aquatic sites with terrestrial sites to support understanding of linkages across atmospheric, terrestrial, and aquatic ecosystems.