Biological sampling data are information derived from biological samples of fish harvested in Virginia for aging purposes, collected to aid in coastal stock assessments.
Establishment-specific sampling results for Raw Beef sampling projects. Current data are updated quarterly; archive data are updated annually. Data are split by fiscal year (FY). See the FSIS website for additional information.
Data collected to assess water quality conditions in the natural creeks, aquifers, and lakes of the Austin area. These are raw data, provided directly from our Water Resources Monitoring (WRM) database, and should be considered provisional; they may or may not have been reviewed by project staff. A map of site locations can be found by searching for LOCATION.WRM_SAMPLE_SITES; the WRM_SITE_IDs from that layer can then be used to filter this dataset on the field SAMPLE_SITE_NO.
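As a rough illustration of that filtering step, the following sketch assumes both layers have been exported to CSV and uses pandas; the file names and the SITE_TYPE column are hypothetical, and only WRM_SITE_ID and SAMPLE_SITE_NO come from the description above.

```python
import pandas as pd

# Hypothetical CSV exports of the WRM_SAMPLE_SITES layer and of this dataset;
# actual file names and delivery format may differ.
sites = pd.read_csv("wrm_sample_sites.csv")
results = pd.read_csv("wrm_monitoring_results.csv")

# Pick the WRM_SITE_IDs of interest from the site layer
# (SITE_TYPE is a made-up column used only for this example).
lake_site_ids = sites.loc[sites["SITE_TYPE"] == "Lake", "WRM_SITE_ID"]

# Filter the monitoring results to those sites via the SAMPLE_SITE_NO field.
lake_results = results[results["SAMPLE_SITE_NO"].isin(lake_site_ids)]
print(lake_results.head())
```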
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Despite the wide application of longitudinal studies, they are often plagued by missing data and attrition. The majority of methodological approaches focus on participant retention or modern missing data analysis procedures. This paper, however, takes a new approach by examining how researchers may supplement the sample with additional participants. First, refreshment samples use the same selection criteria as the initial study. Second, replacement samples identify auxiliary variables that may help explain patterns of missingness and select new participants based on those characteristics. A simulation study compares these two strategies for a linear growth model with five measurement occasions. Overall, the results suggest that refreshment samples lead to less relative bias, greater relative efficiency, and more acceptable coverage rates than replacement samples or not supplementing the missing participants in any way. Refreshment samples also have high statistical power. The comparative strengths of the refreshment approach are further illustrated through a real data example. These findings have implications for assessing change over time when researching at-risk samples with high levels of permanent attrition.
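To make the supplementation design concrete, here is a minimal, purely illustrative numpy sketch (not the authors' simulation code): it generates data from a linear growth model with five occasions, imposes permanent attrition, and then adds a refreshment sample drawn from the same population model using the same selection criteria; all parameter values are made up.

```python
import numpy as np

rng = np.random.default_rng(42)
n, waves = 500, 5                                  # initial sample, five measurement occasions
t = np.arange(waves)

def growth_data(n_obs):
    """Linear growth model: random intercept + random slope + residual noise."""
    intercept = rng.normal(10.0, 2.0, size=n_obs)
    slope = rng.normal(1.5, 0.5, size=n_obs)
    noise = rng.normal(0.0, 1.0, size=(n_obs, waves))
    return intercept[:, None] + slope[:, None] * t + noise

y = growth_data(n)

# Permanent attrition: once a participant drops out, all later waves are missing.
dropout_wave = rng.integers(1, waves + 1, size=n)  # wave after which each person is gone
observed = np.where(t[None, :] < dropout_wave[:, None], y, np.nan)

# Refreshment sample: new participants recruited with the same selection criteria,
# i.e. drawn from the same population model, to offset the participants lost by wave 5.
n_refresh = int(np.isnan(observed[:, -1]).sum())
supplemented = np.vstack([observed, growth_data(n_refresh)])
print(supplemented.shape)
```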
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
GIRT-Data is the first and largest dataset of issue report templates (IRTs) in both YAML and Markdown formats. This dataset and its corresponding open-source crawler tool are intended to support research in this area and to encourage more developers to use IRTs in their repositories. The stable version of the dataset contains 1,084,300 repositories, 50,032 of which support IRTs.
For more details see the GitHub page of the dataset: https://github.com/kargaranamir/girt-data
The dataset was accepted at the MSR 2023 conference under the title "GIRT-Data: Sampling GitHub Issue Report Templates".
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Tool support in software engineering often depends on relationships, regularities, patterns, or rules mined from sampled code. Examples are approaches to bug prediction, code recommendation, and code autocompletion. Sampling is what keeps the analysis of such data scalable. Many such samples consist of software projects taken from GitHub; however, the specifics of sampling may influence how well the mined patterns generalize.
In this paper, we focus on how to sample software projects that are clients of libraries and frameworks when mining for inter-library usage patterns. We notice that when the sample is limited to a very specific library, inter-library patterns in the form of implications from one library to another may not generalize well. Using a simulation and a real case study, we analyze different sampling methods. Most importantly, our simulation shows that the implication generalizes well only when sampling for the disjunction of both libraries involved in the implication. Second, we show that real empirical data sampled from GitHub does not behave as our simulation would predict. This identifies a potential problem with the use of such APIs for studying inter-library usage patterns.
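The comparison can be sketched with a toy example; everything below (library names, the project representation, the sampling frames) is made up for illustration and is not the paper's simulation. It only shows how the confidence of an implication such as "uses libA implies uses libB" can be estimated under a single-library sample versus a sample drawn for the disjunction of both libraries; note that in a sample restricted to libA clients, the reverse implication (libB to libA) is trivially satisfied, which is one way the sampling frame can distort inter-library patterns.

```python
import random

random.seed(0)
LIBS = ["libA", "libB", "libC", "libD"]                 # made-up library names

# Toy universe of projects: each project is the set of libraries it uses.
universe = [set(random.sample(LIBS, k=random.randint(1, 3))) for _ in range(10_000)]

def confidence(projects, antecedent, consequent):
    """Confidence of the implication 'uses antecedent -> uses consequent'."""
    with_antecedent = [p for p in projects if antecedent in p]
    return sum(consequent in p for p in with_antecedent) / len(with_antecedent)

# Sampling frame 1: only clients of libA (a very specific library).
sample_single = [p for p in universe if "libA" in p][:500]
# Sampling frame 2: clients of libA OR libB (the disjunction of both libraries).
sample_disjunction = [p for p in universe if "libA" in p or "libB" in p][:500]

for name, sample in [("libA-only", sample_single), ("disjunction", sample_disjunction)]:
    print(f"{name:12s}  libA->libB: {confidence(sample, 'libA', 'libB'):.2f}"
          f"  libB->libA: {confidence(sample, 'libB', 'libA'):.2f}")
```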
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Online imbalanced learning is an emerging topic that combines the challenges of class imbalance and concept drift. However, most current works address class imbalance or concept drift in isolation, and only a few have considered both issues simultaneously. To this end, this paper proposes an entropy-based dynamic ensemble classification algorithm (EDAC) that handles data streams with class imbalance and concept drift at the same time. First, to address imbalanced learning in training data chunks arriving at different times, EDAC adopts an entropy-based balancing strategy: it divides each data chunk into multiple balanced sample pairs based on the differences in information entropy between the classes in the chunk. Additionally, we propose a density-based sampling method that improves the classification accuracy for the minority class by dividing minority-class samples into high-quality samples and common samples according to the density of similar samples; high-quality and common samples are then randomly selected for training the classifier. Finally, to address concept drift, EDAC employs an ensemble classifier with a self-feedback strategy that sets the initial weight of each sub-classifier and adjusts it according to its performance on the arriving data chunks. Experimental results demonstrate that EDAC outperforms five state-of-the-art algorithms on four synthetic and one real-world data stream.
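As a rough illustration of the entropy-based balancing idea described above (not the authors' implementation; all details here are assumptions), the sketch below computes the class entropy of a chunk and partitions the majority class into minority-sized slices to form balanced sample pairs:

```python
import numpy as np

def class_entropy(y):
    """Shannon entropy (bits) of the class distribution in a data chunk."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def balanced_pairs(X, y, rng):
    """Split an imbalanced binary chunk into balanced (X, y) sample pairs.

    Illustrative only: the majority class is partitioned into minority-sized
    slices, each paired with the full minority class.
    """
    classes, counts = np.unique(y, return_counts=True)
    minority, majority = classes[np.argmin(counts)], classes[np.argmax(counts)]
    min_idx = np.where(y == minority)[0]
    maj_idx = rng.permutation(np.where(y == majority)[0])
    step = len(min_idx)
    return [(X[np.r_[min_idx, maj_idx[s:s + step]]], y[np.r_[min_idx, maj_idx[s:s + step]]])
            for s in range(0, len(maj_idx), step)]

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.05).astype(int)          # ~5% minority class
print(f"chunk entropy: {class_entropy(y):.3f} bits, "
      f"{len(balanced_pairs(X, y, rng))} balanced pairs")
```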
Multiple sampling campaigns were conducted near Boulder, Colorado, to quantify constituent concentrations and loads in Boulder Creek and its tributary, South Boulder Creek. Diel sampling was initiated at approximately 1100 hours on September 17, 2019, and continued until approximately 2300 hours on September 18, 2019. During this time period, samples were collected at two locations on Boulder Creek approximately every 3.5 hours to quantify the diel variability of constituent concentrations at low flow. Synoptic sampling campaigns on South Boulder Creek and Boulder Creek were conducted October 15-18, 2019, to develop spatial profiles of concentration, streamflow, and load. Numerous main stem and inflow locations were sampled during each synoptic campaign using the simple grab technique (17 main stem and 2 inflow locations on South Boulder Creek; 34 main stem and 17 inflow locations on Boulder Creek). Streamflow at each main stem location was measured using acoustic Doppler velocimetry. Bulk samples from all sampling campaigns were processed within one hour of sample collection. Processing steps included measurement of pH and specific conductance, and filtration using 0.45-micron filters. Laboratory analyses were subsequently conducted to determine dissolved and total recoverable constituent concentrations. Filtered samples were analyzed for a suite of dissolved anions using ion chromatography. Filtered, acidified samples and unfiltered, acidified samples were analyzed by inductively coupled plasma-mass spectrometry and inductively coupled plasma-optical emission spectroscopy to determine dissolved and total recoverable cation concentrations, respectively. This data release includes three data tables, three photographs, and a KMZ file showing the sampling locations. Additional information on the data table contents, including the presentation of data below the analytical detection limits, is provided in a Data Dictionary.
The increase in the number of new chemicals synthesized in recent decades has resulted in constant growth in the development and application of computational models for predicting the activity as well as the safety profiles of chemicals. Most of the time, such computational models and their applications must deal with imbalanced chemical data, and it is a challenge to construct a classifier from an imbalanced data set. In this study, we analyzed and validated the importance of different sampling methods over a non-sampling approach for achieving a well-balanced sensitivity and specificity in a machine learning model trained on imbalanced chemical data. This study achieved an accuracy of 93.00%, an AUC of 0.94, an F1 measure of 0.90, a sensitivity of 96.00%, and a specificity of 91.00% using SMOTE sampling and a Random Forest classifier for the prediction of drug-induced liver injury (DILI). Our results suggest that, irrespective of the data set used, sampling methods can have a major influence on reducing the gap between the sensitivity and specificity of a model. This study demonstrates the efficacy of different sampling methods for the class imbalance problem using binary chemical data sets.
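A minimal sketch of the kind of pipeline described, using imbalanced-learn's SMOTE and scikit-learn's Random Forest on a synthetic stand-in for an imbalanced chemical data set (the DILI data and descriptor preparation are not reproduced here):

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an imbalanced chemical data set (~10% positives).
X, y = make_classification(n_samples=2000, n_features=50, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0)

# Oversample the minority class on the training split only, then fit the forest.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)
clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_res, y_res)

y_pred = clf.predict(X_test)
print("AUC:        ", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
print("Sensitivity:", recall_score(y_test, y_pred, pos_label=1))
print("Specificity:", recall_score(y_test, y_pred, pos_label=0))
```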
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Alabama Real-time Coastal Observing System (ARCOS), supported by the Dauphin Island Sea Lab, is a network of continuously sampling observing stations that collect meteorological and hydrographic data from fixed stations operating across coastal Alabama. Data have been collected from 2003 through the present and include parameters such as air temperature, relative humidity, solar and quantum radiation, barometric pressure, wind speed, wind direction, precipitation amounts, water temperature, salinity, dissolved oxygen, water height, and other water quality data. Stations, where possible, are designed to collect the same data in the same way, though there are exceptions given unique location needs (see individual accession abstracts for details). Stations are strategically placed to sample across salinity gradients, from delta to offshore, and across the width of the coast.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Information on samples submitted for RNAseq
Rows are individual samples
Columns are: ID; Sample Name; Date sampled; Species; Sex; Tissue; Geographic location; Date extracted; Extracted by; Nanodrop Conc. (ng/µl); 260/280; 260/230; RIN; Plate ID; Position; Index name; Index Seq; Qubit BR kit Conc. (ng/ul); BioAnalyzer Conc. (ng/ul); BioAnalyzer bp (region 200-1200); Submission reference; Date submitted; Conc. (nM); Volume provided; PE/SE; Number of reads; Read length
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sampling intervals highlighted in bold numbers indicate the approximate vertical extent of the oxygen minimum zone (O2 ≤ 45 µmol kg⁻¹). D = Discovery cruise, MSM = Maria S. Merian cruises, UTC = coordinated universal time, O2 min = lowest oxygen concentration at the respective station, O2 min depth = depth of the oxygen minimum at the respective station, SST = sea surface temperature, n.d. = no data, * = stations analysed for copepod abundance.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sample names, sampling descriptions and contextual data.
https://webtechsurvey.com/terms
A complete list of live websites using the Sample Data technology, compiled through global website indexing conducted by WebTechSurvey.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description of five large-scale public benchmark datasets.
Survey research in the Global South has traditionally required large budgets and lengthy fieldwork. The expansion of digital connectivity presents an opportunity for researchers to engage global subject pools and study settings where in-person contact is challenging. This paper evaluates Facebook advertisements as a tool to recruit diverse survey samples in the Global South. Using Facebook's advertising platform, we quota-sample respondents in Mexico, Kenya, and Indonesia and assess how well these samples perform on a range of survey indicators, identify sources of bias, replicate a canonical experiment, and highlight trade-offs for researchers to consider. This method can quickly and cheaply recruit respondents, but the resulting samples tend to be more educated than the corresponding national populations. Weighting ameliorates these sample imbalances. The method generates data comparable to a commercial online sample at a fraction of the cost. Our analysis demonstrates the potential of Facebook advertisements for cost-effectively conducting research in diverse settings.
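As an illustration of the weighting step, the sketch below computes simple post-stratification weights from education shares; the categories, shares, and column names are made up and are not from the study's data.

```python
import pandas as pd

# Hypothetical respondents and census education shares (values are illustrative only).
respondents = pd.DataFrame({
    "education": ["primary", "secondary", "tertiary", "tertiary", "secondary",
                  "tertiary", "secondary", "tertiary", "primary", "tertiary"],
    "outcome":   [1, 0, 1, 1, 0, 1, 0, 1, 0, 1],
})
population_share = pd.Series({"primary": 0.35, "secondary": 0.45, "tertiary": 0.20})

# Post-stratification weight = population share / sample share for each category.
sample_share = respondents["education"].value_counts(normalize=True)
respondents["weight"] = respondents["education"].map(population_share / sample_share)

unweighted = respondents["outcome"].mean()
weighted = (respondents["outcome"] * respondents["weight"]).sum() / respondents["weight"].sum()
print(f"unweighted: {unweighted:.2f}  weighted: {weighted:.2f}")
```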
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Number of samples for each sampling interval.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data products are the sampling results from FSIS's National Antimicrobial Resistance Monitoring System (NARMS) cecal sampling program. Sampling results from the NARMS product sampling program are currently posted on the FSIS website, grouped by commodity (https://www.fsis.usda.gov/science-data/data-sets-visualizations/laboratory-sampling-data). The antimicrobials and bacteria tested under NARMS are selected based on their importance to human health and their use in food-producing animals (FDA Guidance for Industry #152, https://www.fda.gov/media/69949/download). Cecal contents from cattle, swine, chickens, and turkeys were sampled as part of FSIS's routine NARMS cecal sampling program for major species.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Two different problems are considered: a low-dimensional (LD) problem and a high-dimensional (HD) problem. The LD problem has 2 variables for a 4-ply symmetric square composite laminate, while the HD problem has 16 variables for a 32-ply symmetric square composite laminate. The value of h for the LD and HD problems is taken as 0.005 and 0.04, respectively.
For each problem, three different sampling techniques are adopted: random sampling (RS), Latin hypercube sampling (LHS) [1], and Hammersley sampling (HS) [2]. RS, LHS, and HS differ primarily in the uniformity of the sample points over the design space, with RS giving the least uniform and HS the most uniform distribution of points. Based on the recommendations of Jin et al. [3] and Zhao and Xue [4], 72 and 612 sample points are used in each training dataset for the LD and HD problems, respectively.
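For reference, the sketch below contrasts random sampling with Latin hypercube sampling for the LD case (72 points in 2 variables) using scipy.stats.qmc; Hammersley sampling is not provided by SciPy and is omitted here, and the design-variable bounds shown are illustrative only.

```python
import numpy as np
from scipy.stats import qmc

n_points, n_vars = 72, 2                       # LD case: 72 training points in 2 variables

rng = np.random.default_rng(0)
rs_points = rng.random((n_points, n_vars))     # random sampling (RS)

lhs_points = qmc.LatinHypercube(d=n_vars, seed=0).random(n=n_points)   # LHS

# Discrepancy as a simple uniformity check: lower values indicate a more uniform design.
print("RS  discrepancy:", qmc.discrepancy(rs_points))
print("LHS discrepancy:", qmc.discrepancy(lhs_points))

# Scale the unit-hypercube samples to design-variable bounds (bounds are illustrative).
design = qmc.scale(lhs_points, l_bounds=[0.0, 0.0], u_bounds=[45.0, 90.0])
print(design[:3])
```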
Based on the FE formulation, several high-fidelity datasets for the LD and HD problems are generated, as presented in the Supplementary Material file "Predictive modelling of laminated composite plates.xlsx" in nine sheets, organized as detailed in Table 1.
References:
[1] McKay, M. D.; Beckman, R. J.; Conover, W. J. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 2000, 42, 55-61.
[2] Hammersley, J. M. Monte Carlo methods for solving multivariable problems. Annals of the New York Academy of Sciences, 1960, 86, 844-874.
[3] Jin, R.; Chen, W.; Simpson, T. W. Comparative studies of metamodelling techniques under multiple modelling criteria. Structural and Multidisciplinary Optimization, 2001, 23, 1-13.
[4] Zhao, D.; Xue, D. A comparative study of metamodeling methods considering sample quality merits. Structural and Multidisciplinary Optimization, 2010, 42, 923-938.
If the Substance Abuse and Mental Health Services Administration (SAMHSA) is to move NSDUH to a hybrid ABS/field-enumerated frame, several questions will need to be answered, procedures will need to be developed and tested, and costs and benefits will need to be weighed. This report outlines what is known to date, how it may be applied to NSDUH, and what additional considerations need to be addressed.