Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In genomic studies, log transformation is a common preprocessing step to adjust for skewness in the data. This standard approach often assumes that the log-transformed data are normally distributed, and a two-sample t-test (or one of its modifications) is used to detect differences between two experimental conditions. However, it was recently shown that the two-sample t-test can lead to exaggerated false positives, and the Wilcoxon-Mann-Whitney (WMW) test was proposed as an alternative for studies with larger sample sizes. In addition, studies have demonstrated that the specific distribution used in modeling genomic data has a profound impact on the interpretation and validity of results. The aim of this paper is three-fold: 1) to present the Exp-gamma (exponential-gamma) distribution, the distribution of log-transformed gamma-distributed data, as a proper biological and statistical model for the analysis of log-transformed protein abundance data from single-cell experiments; 2) to demonstrate the inappropriateness of the two-sample t-test and the WMW test in analyzing log-transformed protein abundance data; 3) to propose and evaluate statistical inference methods for hypothesis testing and confidence interval estimation when comparing two independent samples under Exp-gamma distributions. The proposed methods are applied to analyze protein abundance data from a single-cell dataset.
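For intuition, a minimal sketch (illustrative parameters only, not the paper's proposed method): if X follows a gamma distribution, then log(X) follows the Exp-gamma distribution described above, so Exp-gamma samples can be generated directly and fed to the two tests the abstract cautions against.

```python
# Minimal sketch: generate Exp-gamma (log-transformed gamma) samples under two
# conditions and apply the two tests discussed above. Shape/scale values are
# illustrative assumptions, not taken from the paper.
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

rng = np.random.default_rng(1)
y1 = np.log(rng.gamma(shape=2.0, scale=1.0, size=100))  # condition 1, Exp-gamma
y2 = np.log(rng.gamma(shape=2.0, scale=1.5, size=100))  # condition 2, Exp-gamma

print(ttest_ind(y1, y2))     # two-sample t-test on the log scale
print(mannwhitneyu(y1, y2))  # Wilcoxon-Mann-Whitney test
```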
The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.
The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.
As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.
National
The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
As it is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).
Sample survey data [ssd]
The samples for the Slovenia ES 2009, 2013, and 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for the Slovenia 2009 ES and the Slovenia 2013 ES, and in the Sampling Note for the 2019 Slovenia ES.
Three levels of stratification were used in this country: industry, establishment size, and oblast (region). The original sample designs with specific information on the industries and regions chosen are included in the attached Excel file (Sampling Report.xls) for the Slovenia 2009 ES. For the Slovenia 2013 and 2019 ES, specific information on the industries and regions chosen is described in "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports, respectively, in Appendix E.
For the Slovenia 2009 ES, industry stratification was designed as follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries, sample sizes were inflated by about 17% to account for potential non-response when requesting sensitive financial data and because of likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals), sample sizes were inflated by about 12% to account for undersampling of firms in service industries.
For the Slovenia 2013 ES, industry stratification was designed as follows: the universe was stratified into one manufacturing industry and two service industries (retail and other services).
Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).
For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.
For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.
For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.
Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).
Computer Assisted Personal Interview [capi]
The questionnaires share common questions (the core module) plus additional manufacturing- and services-specific questions. The eligible manufacturing industries were surveyed using the Manufacturing questionnaire (which includes the core module plus manufacturing-specific questions). Retail firms were interviewed using the Services questionnaire (which includes the core module plus retail-specific questions), and the residual eligible services were covered using the Services questionnaire (which includes the core module). Each variation of the questionnaire is identified by the index variable, a0.
Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether, whereas the latter refers to refusals to answer specific questions. Enterprise Surveys suffer from both problems, and different strategies were used to address these issues.
Item non-response was addressed by two strategies: (a) for sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to record a refusal to respond as (-8); (b) establishments with incomplete information were re-contacted in order to complete this information whenever necessary. However, there were clear cases of low response.
For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.
For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey), and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates in Slovenia may be selection bias and not frame inaccuracy.
For 2013, the rate of realized interviews per contacted establishment was 25%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey), and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 44%.
Finally, for 2019, the rate of interviews per contacted establishment was 9.7%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey), and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data in social and behavioral sciences are routinely collected using questionnaires, and each domain of interest is tapped by multiple indicators. Structural equation modeling (SEM) is one of the most widely used methods to analyze such data. However, conventional methods for SEM face difficulty when the number of variables (p) is large even when the sample size (N) is also rather large. This article addresses the issue of model inference with the likelihood ratio statistic T_ml. Using the method of empirical modeling, mean-and-variance corrected statistics for SEM with many variables are developed. Results show that the new statistics not only perform much better than T_ml but also are substantial improvements over other corrections to T_ml. When combined with a robust transformation, the new statistics also perform well with non-normally distributed data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
###Preamble###
This upload contains initialization files and data for simulations reported in:
https://arxiv.org/abs/1506.09008: Coarse-grained modelling of strong DNA bending II: Cyclization
The initialization files allow a user to repeat the reported simulations using the oxDNA model. oxDNA is available for download from:
https://dna.physics.ox.ac.uk/index.php/Main_Page.
The use and meaning of the input and output files are documented extensively on this wiki.
This is only a practice upload, and so only a small part of the total material is included.
###Organisation###
Simulations are organised by system type. Folder DXXCYY corresponds to simulation of a cyclization system with Nd = XX and Nbp=YY, with the meaning of these symbols given in the text referenced above. "_seq" indicates simulation of a specific sequence, as listed in Table S1 of the reference.
For each system, a "closed" and an "open" folder are present. These correspond to the two windows of umbrella sampling that were performed separately.
###Content###
Within each folder are the necessary initialization files to run the simulations exactly as reported in the reference above, simply by calling oxDNA from within the folder, using "inputVMMC" as the input file.
Also included are output files for a single realisation of the simulation.
Note that the results in the reference above were all obtained from 5 independent replicas, using different initial conditions and different seeds. These can be (statistically) recreated simply by drawing random starting configurations from the single available traj_hist file.
Water-quality replicate sample data and field blank data were collected at the Colorado River above Imperial Dam, Colorado River below Cooper Wasteway, Yuma Main Drain, and 242 Lateral during 2017 and 2018. Instantaneous discharge data were collected at the Cooper Wasteway, Yuma Main Drain, and 242 Lateral from January 2017 to March 2019; readings were recorded at a fixed interval of 5 minutes. Mean daily discharge data were collected at the Colorado River above Imperial Dam, Cooper Wasteway, Yuma Main Drain, and 242 Lateral from January 2017 to March 2019. Instantaneous discharge and mean daily discharge data were provided to the USGS by the International Boundary and Water Commission (IBWC). Discrete water-quality samples were collected at the Colorado River above Imperial Dam, Colorado River below Cooper Wasteway, Yuma Main Drain, and 242 Lateral from 2017 through March 2019, and the values were used to compute dissolved solids concentrations using BOR's method.
analyze the current population survey (cps) annual social and economic supplement (asec) with r

the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no.

despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.

the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
- download the fixed-width file containing household, family, and person records
- import by separating this file into three tables, then merge 'em together at the person-level
- download the fixed-width file containing the person-level replicate weights
- merge the rectangular person-level file with the replicate weights, then store it in a sql database
- create a new variable - one - in the data table

2012 asec - analysis examples.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- perform a boatload of analysis examples

replicate census estimates - 2011.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- match the sas output shown in the png file below

2011 asec replicate weight sas output.png: statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

click here to view these three scripts

for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
- the census bureau's current population survey page
- the bureau of labor statistics' current population survey page
- the current population survey's wikipedia article

notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.

confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
This statistic illustrates the results of a survey regarding the opinion on the meaning of the term fake news in Turkey in 2018. According to data published by IPSOS, ** percent of Turkish adults stated that they personally thought of politicians and the media using the term to discredit news they did not agree with.
The 1961 Census Microdata Individual File for Great Britain: 5% Sample dataset was created from existing digital records from the 1961 Census under a project known as Enhancing and Enriching Historic Census Microdata Samples (EEHCM), which was funded by the Economic and Social Research Council with input from the Office for National Statistics and National Records of Scotland. The project ran from 2012-2014 and was led from the UK Data Archive, University of Essex, in collaboration with the Cathie Marsh Institute for Social Research (CMIST) at the University of Manchester and the Census Offices. In addition to the 1961 data, the team worked on files from the 1971 Census and 1981 Census.
The original 1961 records preceded current data archival standards and were created before microdata sets for secondary use were anticipated. A process of data recovery and quality checking was necessary to maximise their utility for current researchers, though some imperfections remain (see the User Guide for details). Three other 1961 Census datasets have been created:
This statistic illustrates the results of a survey regarding the opinion on the meaning of the term fake news in Serbia in 2018. According to data published by IPSOS, 66 percent of Serbian adults stated that they personally thought of politicians using the term to support their side of the argument.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Whilst the outcome variables on the Bells test (e.g., total omissions) are easily quantified, process data (e.g., search strategy) have been more difficult to quantify objectively. CancellationTools is a software package that calculates cancellation task-based process variables following input of cancellation target identification order. The primary aim of the current study was to examine the psychometric properties of several CancellationTools process variables for the Bells test.
Method: The CancellationTools process variables (mean distance, standardized mean distance, standardized angle, best r, and intersections rate) were calculated for the Bells Test in a diverse neurological sample (n=101) and a healthy Australian sample (n=57). Ratings of cancellation path organization using an ordinal categorical variable (systematic, disorganized, indeterminate) were completed by two experienced clinicians. Construct validity, criterion validity, known-groups validity, test-retest reliability, and test operating characteristics were examined.
Results: Mean distance, standardized angle, best r and intersections rate, but not standardized mean distance, showed good construct (convergent) validity with clinician ratings. Standardized angle, best r and intersections rate showed good divergent validity when compared with outcome variables. Criterion validity was established for best r and intersections rate. The CancellationTools measures did not show known groups validity, although clinician ratings did. Good test-retest reliability was demonstrated for best r (ICC = .79) and clinician-rated search strategy (ICC = .75). Best r explained 88% of the area under the curve when classifying disorganized vs other search strategies based on clinician ratings.
Conclusion: Best r emerged as the most psychometrically robust of the CancellationTools measures.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database includes simulated data showing the accuracy of estimated probability distributions of project durations when limited data are available for the project activities. The base project networks are taken from PSPLIB. Then, various stochastic project networks are synthesized by changing the variability and skewness of project activity durations.
Number of variables: 20
Number of cases/rows: 114240
Variable List:
• Experiment ID: The ID of the experiment
• Experiment for network: The ID of the experiment for each of the synthesized networks
• Network ID: ID of the synthesized network
• #Activities: Number of activities in the network, including start and finish activities
• Variability: Variance of the activities in the network (this value can be high, low, medium, or rand, where rand denotes a random combination of low, medium, and high variance across the network activities)
• Skewness: Skewness of the activities in the network (this value can be right, left, none, or rand, where rand denotes a random combination of right-skewed, left-skewed, and non-skewed activities in the network)
• Fitted distribution type: Distribution type fitted to the sampled data
• Sample size: Number of sampled data points used for the experiment, resembling the limited-data condition
• Benchmark 10th percentile: 10th percentile of project duration in the benchmark stochastic project network
• Benchmark 50th percentile: 50th percentile of project duration in the benchmark stochastic project network
• Benchmark 90th percentile: 90th percentile of project duration in the benchmark stochastic project network
• Benchmark mean: Mean of project duration in the benchmark stochastic project network
• Benchmark variance: Variance of project duration in the benchmark stochastic project network
• Experiment 10th percentile: 10th percentile of project duration distribution for the experiment
• Experiment 50th percentile: 50th percentile of project duration distribution for the experiment
• Experiment 90th percentile: 90th percentile of project duration distribution for the experiment
• Experiment mean: Mean of project duration distribution for the experiment
• Experiment variance: Variance of project duration distribution for the experiment
• K-S: Kolmogorov–Smirnov test statistic comparing the benchmark distribution and the project duration distribution of the experiment (a computation sketch follows this list)
• P_value: The p-value based on the distance calculated in the K-S test
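As a hypothetical illustration (stand-in gamma distributions, not data from this database), a K-S distance and p-value like the two fields above can be computed with scipy's two-sample K-S test:

```python
# Hypothetical sketch: K-S comparison of a benchmark duration distribution
# against an experiment's sample, analogous to the K-S and P_value columns.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
benchmark = rng.gamma(shape=4.0, scale=10.0, size=5000)  # stand-in benchmark durations
experiment = rng.gamma(shape=4.2, scale=9.5, size=200)   # stand-in experiment sample

result = ks_2samp(benchmark, experiment)
print(result.statistic, result.pvalue)  # K-S distance and its p-value
```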
A random sample of households was invited to participate in this survey. In the dataset, you will find the respondent-level data in each row, with the questions in each column. The numbers represent a scale option from the survey, such as 1=Excellent, 2=Good, 3=Fair, 4=Poor. The question stem, response options, and scale information for each field can be found in the "variable labels" and "value labels" sheets. VERY IMPORTANT NOTE: The scientific survey data were weighted, meaning that the demographic profile of respondents was compared to the demographic profile of adults in Bloomington from US Census data. Statistical adjustments were made to bring the respondent profile into balance with the population profile. This means that some records were given more "weight" and some records were given less weight. The weights that were applied are found in the field "wt". If you do not apply these weights, you will not obtain the same results as can be found in the report delivered to Bloomington. The easiest way to replicate these results is likely to create pivot tables and use the sum of the "wt" field rather than a count of responses, as in the sketch below.
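A minimal pandas sketch of that weighted tabulation (the file name and the question column "q1" are hypothetical; only the "wt" field is named in the text):

```python
# Hypothetical sketch: weighted response percentages via the "wt" field.
import pandas as pd

df = pd.read_csv("bloomington_survey.csv")  # hypothetical file name

# Weighted distribution for one question, e.g. a column "q1" coded 1=Excellent..4=Poor:
weighted = df.groupby("q1")["wt"].sum()
print(100 * weighted / weighted.sum())  # weighted percentages, as in the report

# Unweighted percentages, for comparison:
print(100 * df["q1"].value_counts(normalize=True))
```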
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper evaluates the claim that Welch’s t-test (WT) should replace the independent-samples t-test (IT) as the default approach for comparing sample means. Simulations involving unequal and equal variances, skewed distributions, and different sample sizes were performed. For normal distributions, we confirm that the WT maintains the false positive rate close to the nominal level of 0.05 when sample sizes and standard deviations are unequal. However, the WT was found to yield inflated false positive rates under skewed distributions with unequal sample sizes. A complementary empirical study based on gender differences in two psychological scales corroborates these findings. Finally, we contend that the null hypothesis of unequal variances together with equal means lacks plausibility, and that empirically, a difference in means typically coincides with differences in variance and skewness. An additional analysis using the Kolmogorov-Smirnov and Anderson-Darling tests demonstrates that examining entire distributions, rather than just their means, can provide a more suitable alternative when facing unequal variances or skewed distributions. Given these results, researchers should remain cautious with software defaults, such as R favoring Welch’s test.
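A minimal simulation sketch in the spirit of the study (normal data, unequal sample sizes and standard deviations; the parameters are illustrative assumptions, not the paper's):

```python
# Sketch: false positive rates of Student's t vs. Welch's t under a true null
# of equal means, with the smaller group having the larger standard deviation.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
n_sims, alpha = 10_000, 0.05
n1, sd1 = 20, 3.0  # small sample, large SD
n2, sd2 = 80, 1.0  # large sample, small SD

student_fp = welch_fp = 0
for _ in range(n_sims):
    x = rng.normal(0.0, sd1, n1)
    y = rng.normal(0.0, sd2, n2)
    student_fp += ttest_ind(x, y, equal_var=True).pvalue < alpha   # pooled-variance t
    welch_fp += ttest_ind(x, y, equal_var=False).pvalue < alpha    # Welch's t

print("Student's t false positive rate:", student_fp / n_sims)  # inflated above 0.05
print("Welch's t false positive rate:  ", welch_fp / n_sims)    # near 0.05
```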
This statistic illustrates the results of a survey regarding the opinion on the meaning of the term fake news in Great Britain in 2018. According to data published by IPSOS, 42 percent of British adults stated that they personally thought of politicians and the media using the term to discredit news they did not agree with.
Your Client WOMart is a leading nutrition and supplement retail chain that offers a comprehensive range of products for all your wellness and fitness needs.
WOMart follows a multi-channel distribution strategy with 350+ retail stores spread across 100+ cities.
Effective forecasting of store sales gives essential insight into upcoming cash flow, meaning WOMart can plan cash flow at the store level more accurately.
Sales data for 18 months from 365 WOMart stores are available, along with information on the store type, location type, and region code of each store, the discount offered by the store on each day, the number of orders received each day, etc.
Your task is to predict the store sales for each store in the test set for the next two months.
Train Data

|Variable |Definition |
|---------|-----------|
|ID |Unique identifier for a row |
|Store_id |Unique ID for each store |
|Store_Type |Type of the store |
|Location_Type |Type of the location where the store is located |
|Region_Code |Code of the region where the store is located |
|Date |Information about the date |
|Holiday |If there is a holiday on the given date, 1: Yes, 0: No |
|Discount |If a discount is offered by the store on the given date, Yes/No |
|#Orders |Number of orders received by the store on the given day |
|Sales |Total sales for the store on the given day |

Test Data

|Variable |Definition |
|---------|-----------|
|ID |Unique identifier for a row |
|Store_id |Unique ID for each store |
|Store_Type |Type of the store |
|Location_Type |Type of the location where the store is located |
|Region_Code |Code of the region where the store is located |
|Date |Information about the date |
|Holiday |If there is a holiday on the given date, 1: Yes, 0: No |
|Discount |If a discount is offered by the store on the given date, Yes/No |

Sample_Submission

|Variable |Definition |
|---------|-----------|
|ID |Unique identifier for a row |
|Sales |Total sales for the store on the given day |
Public and Private Split
The Sales column you submit is compared to the actual values as in the following example, except over 22,266 items instead of 8; the score is 1000 times the mean squared log error (the metric function is available in sklearn).

Sample Input:

actual = [27.5, 55.9, 25.8, 17.7, 27.6, 55.9, 25.7, 17.8]
predicted = [24.0, 49.1, 21.0, 16.2, 23.3, 47.0, 12.1, 15.2]

Sample Output:

82.9949678377161
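The sample output above equals 1000 times sklearn's mean squared log error on these eight values, so, assuming that is indeed the competition metric, the score can be reproduced as follows:

```python
# Reproducing the sample score as 1000 * mean squared log error (MSLE).
from sklearn.metrics import mean_squared_log_error

actual = [27.5, 55.9, 25.8, 17.7, 27.6, 55.9, 25.7, 17.8]
predicted = [24.0, 49.1, 21.0, 16.2, 23.3, 47.0, 12.1, 15.2]

score = 1000 * mean_squared_log_error(actual, predicted)
print(score)  # 82.9949678377161
```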
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this article, we propose a new projection test for linear hypotheses on regression coefficient matrices in linear models with high-dimensional responses. We systematically study the theoretical properties of the proposed test. We first derive the optimal projection matrix for any given projection dimension to achieve the best power and provide an upper bound for the optimal dimension of the projection matrix. We further provide insights into how to construct the optimal projection matrix. One- and two-sample mean problems can be formulated as special cases of the linear hypotheses studied in this article. We demonstrate both theoretically and empirically that the proposed test can outperform existing ones for one- and two-sample mean problems. We conduct Monte Carlo simulations to examine the finite-sample performance and illustrate the proposed test with a real data example.
The 1997 Jordan Population and Family Health Survey (JPFHS) is a national sample survey carried out by the Department of Statistics (DOS) as part of its National Household Surveys Program (NHSP). The JPFHS was specifically aimed at providing information on fertility, family planning, and infant and child mortality. Information was also gathered on breastfeeding, on maternal and child health care and nutritional status, and on the characteristics of households and household members. The survey will provide policymakers and planners with important information for use in formulating informed programs and policies on reproductive behavior and health.
National
Sample survey data
SAMPLE DESIGN AND IMPLEMENTATION
The 1997 JPFHS sample was designed to produce reliable estimates of major survey variables for the country as a whole, for urban and rural areas, for the three regions (each composed of a group of governorates), and for the three major governorates, Amman, Irbid, and Zarqa.
The 1997 JPFHS sample is a subsample of the master sample that was designed using the frame obtained from the 1994 Population and Housing Census. A two-stage sampling procedure was employed. First, primary sampling units (PSUs) were selected with probability proportional to the number of housing units in the PSU. A total of 300 PSUs were selected at this stage. In the second stage, in each selected PSU, occupied housing units were selected with probability inversely proportional to the number of housing units in the PSU. This design maintains a self-weighted sampling fraction within each governorate.
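A toy sketch (hypothetical PSU sizes, not the JPFHS frame) of why this two-stage design is self-weighting: PPS selection at stage one combined with a within-PSU selection probability inversely proportional to size at stage two gives every housing unit the same overall selection probability.

```python
# Toy illustration of self-weighting two-stage sampling. Stage one selects
# PSUs with probability proportional to size (PPS); stage two selects a fixed
# take of housing units per PSU, i.e. with probability inversely proportional
# to PSU size. PSU sizes below are hypothetical.
psu_sizes = [120, 80, 200, 50, 150]  # housing units per PSU
total_units = sum(psu_sizes)
n_psus = 2         # PSUs drawn at stage one
take_per_psu = 10  # housing units drawn within each selected PSU

for i, size in enumerate(psu_sizes):
    p1 = n_psus * size / total_units  # stage-one PPS inclusion probability
    p2 = take_per_psu / size          # stage-two probability, proportional to 1/size
    print(f"PSU {i}: overall selection probability = {p1 * p2:.4f}")
# Every line prints the same value: n_psus * take_per_psu / total_units
```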
UPDATING OF SAMPLING FRAME
Prior to the main fieldwork, mapping operations were carried out and the sample units/blocks were selected and then identified and located in the field. The selected blocks were delineated and the outer boundaries were demarcated with special signs. During this process, the numbers on buildings and housing units were updated, listed and documented, along with the name of the owner/tenant of the unit or household and the name of the household head. These activities took place between January 7 and February 28, 1997.
Note: See detailed description of sample design in APPENDIX A of the survey report.
Face-to-face
The 1997 JPFHS used two questionnaires, one for the household interview and the other for eligible women. Both questionnaires were developed in English and then translated into Arabic. The household questionnaire was used to list all members of the sampled households, including usual residents as well as visitors. For each member of the household, basic demographic and social characteristics were recorded and women eligible for the individual interview were identified. The individual questionnaire was developed utilizing the experience gained from previous surveys, in particular the 1983 and 1990 Jordan Fertility and Family Health Surveys (JFFHS).
The 1997 JPFHS individual questionnaire consists of 10 sections:
- Respondent’s background
- Marriage
- Reproduction (birth history)
- Contraception
- Pregnancy, breastfeeding, health and immunization
- Fertility preferences
- Husband’s background, woman’s work and residence
- Knowledge of AIDS
- Maternal mortality
- Height and weight of children and mothers
Fieldwork and data processing activities overlapped. After a week of data collection, and after field editing of questionnaires for completeness and consistency, the questionnaires for each cluster were packaged together and sent to the central office in Amman where they were registered and stored. Special teams were formed to carry out office editing and coding.
Data entry started after a week of office data processing. The process of data entry, editing, and cleaning was done by means of the ISSA (Integrated System for Survey Analysis) program DHS has developed especially for such surveys. The ISSA program allows data to be edited while being entered. Data entry was completed on November 14, 1997. A data processing specialist from Macro made a trip to Jordan in November and December 1997 to identify problems in data entry, editing, and cleaning, and to work on tabulations for both the preliminary and final report.
A total of 7,924 occupied housing units were selected for the survey; from among those, 7,592 households were found. Of the occupied households, 7,335 (97 percent) were successfully interviewed. In those households, 5,765 eligible women were identified, and complete interviews were obtained with 5,548 of them (96 percent of all eligible women). Thus, the overall response rate of the 1997 JPFHS was 93 percent (the product of the household and individual response rates: 0.97 × 0.96 ≈ 0.93). The principal reason for nonresponse among the women was the failure of interviewers to find them at home despite repeated callbacks.
Note: See summarized response rates by place of residence in Table 1.1 of the survey report.
The estimates from a sample survey are subject to two types of errors: nonsampling errors and sampling errors. Nonsampling errors are the result of mistakes made in implementing data collection and data processing (such as failure to locate and interview the correct household, misunderstanding questions either by the interviewer or the respondent, and data entry errors). Although during the implementation of the 1997 JPFHS numerous efforts were made to minimize this type of error, nonsampling errors are not only impossible to avoid but also difficult to evaluate statistically.
Sampling errors, on the other hand, can be evaluated statistically. The respondents selected in the 1997 JPFHS constitute only one of many samples that could have been selected from the same population, given the same design and expected size. Each of those samples would have yielded results differing somewhat from the results of the sample actually selected. Sampling errors are a measure of the variability among all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.
A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.
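As a hypothetical worked example (values invented for illustration, not survey estimates): for an estimated proportion of 0.30 with a standard error of 0.01, the interval described above is

$$\hat{p} \pm 2\,\mathrm{SE}(\hat{p}) = 0.30 \pm 2(0.01) = (0.28,\ 0.32),$$

so in roughly 95 percent of samples of identical size and design, an interval computed this way would contain the true population value.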
If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, since the 1997 JPFHS sample resulted from a multistage stratified design, formulae of higher complexity had to be used. The computer software used to calculate sampling errors for the 1997 JPFHS was the ISSA Sampling Error Module, which uses the Taylor linearization method of variance estimation for survey estimates that are means or proportions. The Jackknife repeated replication method is used for variance estimation of more complex statistics, such as fertility and mortality rates.
Note: See detailed estimate of sampling error calculation in APPENDIX B of the survey report.
Data Quality Tables:
- Household age distribution
- Age distribution of eligible and interviewed women
- Completeness of reporting
- Births by calendar years
- Reporting of age at death in days
- Reporting of age at death in months
Note: See detailed tables in APPENDIX C of the survey report.
The 1971 Census Microdata for Great Britain: 9% Sample: Secure Access dataset was created from existing digital records from the 1971 Census. It comprises a larger population sample than the other files available from the 1971 Census (see below) and so contains sufficient information to constitute personal data, meaning that it is only available to Accredited Researchers, under restrictive Secure Access conditions. See Access section for further details.
The file was created under a project known as Enhancing and Enriching Historic Census Microdata Samples (EEHCM), which was funded by the Economic and Social Research Council with input from the Office for National Statistics and National Records of Scotland. The project ran from 2012-2014 and was led from the UK Data Archive, University of Essex, in collaboration with the Cathie Marsh Institute for Social Research (CMIST) at the University of Manchester and the Census Offices. In addition to the 1971 data, the team worked on files from the 1961 Census and 1981 Census.
The original 1971 records preceded current data archival standards and were created before microdata sets for secondary use were anticipated. A process of data recovery and quality checking was necessary to maximise their utility for current researchers, though some imperfections remain (see the User Guide for details).
Three other 1971 Census datasets have been created; users should obtain the other datasets in the series first to see whether they are sufficient for their research needs before considering making an application for this study (SN 8271), the Secure Access version:
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Data and Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, for 2010 the 2010 Census provides the official counts of the population and housing units for the nation, states, counties, cities and towns.

Explanation of Symbols:
- A "**" entry in the margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
- A "-" entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
- A "-" following a median estimate means the median falls in the lowest interval of an open-ended distribution.
- A "+" following a median estimate means the median falls in the upper interval of an open-ended distribution.
- A "***" entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended distribution. A statistical test is not appropriate.
- A "*****" entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate.
- An "N" entry in the estimate and margin of error columns indicates that data for this geographic area cannot be displayed because the number of sample cases is too small.
- An "(X)" means that the estimate is not applicable or not available.

Estimates of urban and rural population, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2000 data. Boundaries for urban areas have not been updated since Census 2000. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

While the 2010 American Community Survey (ACS) data generally reflect the December 2009 Office of Management and Budget (OMB) definitions of metropolitan and micropolitan statistical areas, in certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB definitions due to differences in the effective dates of the geographic entities.

The Census Bureau introduced a new set of disability questions in the 2008 ACS questionnaire. Accordingly, comparisons of disability data from 2008 or later with data from prior years are not recommended. For more information on these questions and their evaluation in the 2006 ACS Content Test, see the Evaluation Report Covering Disability.

Data for year of entry of the native population reflect the year of entry into the U.S. by people who were born in Puerto Rico or U.S. Island Areas, or born outside the U.S. to a U.S. citizen parent, and who subsequently moved to the U.S.

Ancestry listed in this table refers to the total number of people who responded with a particular ancestry; for example, the estimate given for Russian represents the number of people who listed Russian as either their first or second ancestry. This table lists only the largest ancestry groups; see the Detailed Tables for more categories. Race and Hispanic origin groups are not included in this table because official data for those groups come from the Race and Hispanic origin questions rather than the ancestry question (see Demographic Table).

Starting in 2008, the Scotch-Irish category does not include Irish-Scotch. People who reported Irish-Scotch ancestry are classified under "Other groups," whereas in 2007 and earlier they were classified as Scotch-Irish.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

Source: U.S. Census Bureau, 2010 American Community Survey
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset contains a list of COVID fake news items and claims that have been shared across the internet.
Content
Headlines: string attribute containing the headline/claim that was shared.
Outcome: binary label where 0 means the headline is fake and 1 means it is true.
Inspiration
A common question across many research portals was whether a combined fake news dataset is available; that question led to the publication of this dataset.