100+ datasets found

Collection of example datasets used for the book - R Programming -...
figshare.com
txt
Updated Dec 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.24728073.v1
Dataset updated
Dec 4, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Kingsley Okoye; Samira Hosseini
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source software and object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provides a wide range of functions for programming and analyzing of data. Unlike many of the existing statistical softwares, R has the added benefit of allowing the users to write more efficient codes by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allows the users to define their own (customized) functions on how they expect the program to behave while handling the data, which can also be stored in the simple object system.For all intents and purposes, this book serves as both textbook and manual for R statistics particularly in academic research, data analytics, and computer programming targeted to help inform and guide the work of the R users or statisticians. It provides information about different types of statistical data analysis and methods, and the best scenarios for use of each case in R. It gives a hands-on step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the different conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test for reliability and validity of the available datasets. Different research experiments, case scenarios and examples are explained in this book. It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R particularly for research purposes with examples. Ranging from how to import and store datasets in R as Objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. Thus, congruence of Statistics and Computer programming for Research.
Z
Data Analysis for the Systematic Literature Review of DL4SE
data.niaid.nih.gov
data-staging.niaid.nih.gov
Updated Jul 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk (2024). Data Analysis for the Systematic Literature Review of DL4SE [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4768586
Explore at:
Dataset updated
Jul 19, 2024
Dataset provided by
Washington and Lee University
College of William and Mary
Authors
Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data Analysis is the process that supports decision-making and informs arguments in empirical studies. Descriptive statistics, Exploratory Data Analysis (EDA), and Confirmatory Data Analysis (CDA) are the approaches that compose Data Analysis (Xia & Gong; 2014). An Exploratory Data Analysis (EDA) comprises a set of statistical and data mining procedures to describe data. We ran EDA to provide statistical facts and inform conclusions. The mined facts allow attaining arguments that would influence the Systematic Literature Review of DL4SE.

The Systematic Literature Review of DL4SE requires formal statistical modeling to refine the answers for the proposed research questions and formulate new hypotheses to be addressed in the future. Hence, we introduce DL4SE-DA, a set of statistical processes and data mining pipelines that uncover hidden relationships among Deep Learning reported literature in Software Engineering. Such hidden relationships are collected and analyzed to illustrate the state-of-the-art of DL techniques employed in the software engineering context.

Our DL4SE-DA is a simplified version of the classical Knowledge Discovery in Databases, or KDD (Fayyad, et al; 1996). The KDD process extracts knowledge from a DL4SE structured database. This structured database was the product of multiple iterations of data gathering and collection from the inspected literature. The KDD involves five stages:

Selection. This stage was led by the taxonomy process explained in section xx of the paper. After collecting all the papers and creating the taxonomies, we organize the data into 35 features or attributes that you find in the repository. In fact, we manually engineered features from the DL4SE papers. Some of the features are venue, year published, type of paper, metrics, data-scale, type of tuning, learning algorithm, SE data, and so on.

Preprocessing. The preprocessing applied was transforming the features into the correct type (nominal), removing outliers (papers that do not belong to the DL4SE), and re-inspecting the papers to extract missing information produced by the normalization process. For instance, we normalize the feature “metrics” into “MRR”, “ROC or AUC”, “BLEU Score”, “Accuracy”, “Precision”, “Recall”, “F1 Measure”, and “Other Metrics”. “Other Metrics” refers to unconventional metrics found during the extraction. Similarly, the same normalization was applied to other features like “SE Data” and “Reproducibility Types”. This separation into more detailed classes contributes to a better understanding and classification of the paper by the data mining tasks or methods.

Transformation. In this stage, we omitted to use any data transformation method except for the clustering analysis. We performed a Principal Component Analysis to reduce 35 features into 2 components for visualization purposes. Furthermore, PCA also allowed us to identify the number of clusters that exhibit the maximum reduction in variance. In other words, it helped us to identify the number of clusters to be used when tuning the explainable models.

Data Mining. In this stage, we used three distinct data mining tasks: Correlation Analysis, Association Rule Learning, and Clustering. We decided that the goal of the KDD process should be oriented to uncover hidden relationships on the extracted features (Correlations and Association Rules) and to categorize the DL4SE papers for a better segmentation of the state-of-the-art (Clustering). A clear explanation is provided in the subsection “Data Mining Tasks for the SLR od DL4SE”. 5.Interpretation/Evaluation. We used the Knowledge Discover to automatically find patterns in our papers that resemble “actionable knowledge”. This actionable knowledge was generated by conducting a reasoning process on the data mining outcomes. This reasoning process produces an argument support analysis (see this link).

We used RapidMiner as our software tool to conduct the data analysis. The procedures and pipelines were published in our repository.

Overview of the most meaningful Association Rules. Rectangles are both Premises and Conclusions. An arrow connecting a Premise with a Conclusion implies that given some premise, the conclusion is associated. E.g., Given that an author used Supervised Learning, we can conclude that their approach is irreproducible with a certain Support and Confidence.

Support = Number of occurrences this statement is true divided by the amount of statements Confidence = The support of the statement divided by the number of occurrences of the premise
Ten quick tips for getting the most scientific value out of numerical data
plos.figshare.com
pdf
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lars Ole Schwen; Sabrina Rueschenbaum (2023). Ten quick tips for getting the most scientific value out of numerical data [Dataset]. http://doi.org/10.1371/journal.pcbi.1006141
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pcbi.1006141
Dataset updated
May 30, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Lars Ole Schwen; Sabrina Rueschenbaum
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Most studies in the life sciences and other disciplines involve generating and analyzing numerical data of some type as the foundation for scientific findings. Working with numerical data involves multiple challenges. These include reproducible data acquisition, appropriate data storage, computationally correct data analysis, appropriate reporting and presentation of the results, and suitable data interpretation.Finding and correcting mistakes when analyzing and interpreting data can be frustrating and time-consuming. Presenting or publishing incorrect results is embarrassing but not uncommon. Particular sources of errors are inappropriate use of statistical methods and incorrect interpretation of data by software. To detect mistakes as early as possible, one should frequently check intermediate and final results for plausibility. Clearly documenting how quantities and results were obtained facilitates correcting mistakes. Properly understanding data is indispensable for reaching well-founded conclusions from experimental results. Units are needed to make sense of numbers, and uncertainty should be estimated to know how meaningful results are. Descriptive statistics and significance testing are useful tools for interpreting numerical results if applied correctly. However, blindly trusting in computed numbers can also be misleading, so it is worth thinking about how data should be summarized quantitatively to properly answer the question at hand. Finally, a suitable form of presentation is needed so that the data can properly support the interpretation and findings. By additionally sharing the relevant data, others can access, understand, and ultimately make use of the results.These quick tips are intended to provide guidelines for correctly interpreting, efficiently analyzing, and presenting numerical data in a useful way.
i
Household Health Survey 2012-2013, Economic Research Forum (ERF)...
catalog.ihsn.org
datacatalog.ihsn.org
Updated Jun 26, 2017
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Central Statistical Organization (CSO) (2017). Household Health Survey 2012-2013, Economic Research Forum (ERF) Harmonization Data - Iraq [Dataset]. https://catalog.ihsn.org/index.php/catalog/6937
Explore at:
Dataset updated
Jun 26, 2017
Dataset provided by
Kurdistan Regional Statistics Office (KRSO)
Economic Research Forum
Central Statistical Organization (CSO)
Time period covered
2012 - 2013
Area covered
Iraq
Description
Abstract

The harmonized data set on health, created and published by the ERF, is a subset of Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules, collected in the context of the above mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.

----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:

Iraq is considered a leader in household expenditure and income surveys where the first was conducted in 1946 followed by surveys in 1954 and 1961. After the establishment of Central Statistical Organization, household expenditure and income surveys were carried out every 3-5 years in (1971/ 1972, 1976, 1979, 1984/ 1985, 1988, 1993, 2002 / 2007). Implementing the cooperation between CSO and WB, Central Statistical Organization (CSO) and Kurdistan Region Statistics Office (KRSO) launched fieldwork on IHSES on 1/1/2012. The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

The survey has six main objectives. These objectives are:

Provide data for poverty analysis and measurement and monitor, evaluate and update the implementation Poverty Reduction National Strategy issued in 2009.

Provide comprehensive data system to assess household social and economic conditions and prepare the indicators related to the human development.

Provide data that meet the needs and requirements of national accounts.

Provide detailed indicators on consumption expenditure that serve making decision related to production, consumption, export and import.

Provide detailed indicators on the sources of households and individuals income.

Provide data necessary for formulation of a new consumer price index number.

The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum, to create a comparable version with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variables' names, labels and some definitions. See: Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf provided in the external resources for further information on the mapping of the original variables on the harmonized ones, in addition to more indications on the variables' availability in both survey years and relevant comments.

Geographic coverage

National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.

Analysis unit

1- Household/family. 2- Individual/person.

Universe

The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

Kind of data

Sample survey data [ssd]

Sampling procedure

----> Design:

Sample size was (25488) household for the whole Iraq, 216 households for each district of 118 districts, 2832 clusters each of which includes 9 households distributed on districts and governorates for rural and urban.

----> Sample frame:

Listing and numbering results of 2009-2010 Population and Housing Survey were adopted in all the governorates including Kurdistan Region as a frame to select households, the sample was selected in two stages: Stage 1: Primary sampling unit (blocks) within each stratum (district) for urban and rural were systematically selected with probability proportional to size to reach 2832 units (cluster). Stage two: 9 households from each primary sampling unit were selected to create a cluster, thus the sample size of total survey clusters was 25488 households distributed on the governorates, 216 households in each district.

----> Sampling Stages:

In each district, the sample was selected in two stages: Stage 1: based on 2010 listing and numbering frame 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, in addition to the implicit breakdown urban and rural and geographic breakdown (sub-district, quarter, street, county, village and block). Stage 2: Using households as secondary sampling units, 9 households were selected from each sample point using systematic equal probability sampling. Sampling frames of each stages can be developed based on 2010 building listing and numbering without updating household lists. In some small districts, random selection processes of primary sampling may lead to select less than 24 units therefore a sampling unit is selected more than once , the selection may reach two cluster or more from the same enumeration unit when it is necessary.

Mode of data collection

Face-to-face [f2f]

Research instrument

----> Preparation:

The questionnaire of 2006 survey was adopted in designing the questionnaire of 2012 survey on which many revisions were made. Two rounds of pre-test were carried out. Revision were made based on the feedback of field work team, World Bank consultants and others, other revisions were made before final version was implemented in a pilot survey in September 2011. After the pilot survey implemented, other revisions were made in based on the challenges and feedbacks emerged during the implementation to implement the final version in the actual survey.

----> Questionnaire Parts:

The questionnaire consists of four parts each with several sections: Part 1: Socio – Economic Data: - Section 1: Household Roster - Section 2: Emigration - Section 3: Food Rations - Section 4: housing - Section 5: education - Section 6: health - Section 7: Physical measurements - Section 8: job seeking and previous job

Part 2: Monthly, Quarterly and Annual Expenditures: - Section 9: Expenditures on Non – Food Commodities and Services (past 30 days). - Section 10 : Expenditures on Non – Food Commodities and Services (past 90 days). - Section 11: Expenditures on Non – Food Commodities and Services (past 12 months). - Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days). - Section 12, Table 1: Meals Had Within the Residential Unit. - Section 12, table 2: Number of Persons Participate in the Meals within Household Expenditure Other Than its Members.

Part 3: Income and Other Data: - Section 13: Job - Section 14: paid jobs - Section 15: Agriculture, forestry and fishing - Section 16: Household non – agricultural projects - Section 17: Income from ownership and transfers - Section 18: Durable goods - Section 19: Loans, advances and subsidies - Section 20: Shocks and strategy of dealing in the households - Section 21: Time use - Section 22: Justice - Section 23: Satisfaction in life - Section 24: Food consumption during past 7 days

Part 4: Diary of Daily Expenditures: Diary of expenditure is an essential component of this survey. It is left at the household to record all the daily purchases such as expenditures on food and frequent non-food items such as gasoline, newspapers…etc. during 7 days. Two pages were allocated for recording the expenditures of each day, thus the roster will be consists of 14 pages.

Cleaning operations

----> Raw Data:

Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages: 1. Interviewer: Checks all answers on the household questionnaire, confirming that they are clear and correct. 2. Local Supervisor: Checks to make sure that questions has been correctly completed. 3. Statistical analysis: After exporting data files from excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or non-logical values in addition to auditing some variables. 4. World Bank consultants in coordination with the CSO data management team: the World Bank technical consultants use additional programs in SPSS and STAT to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected parameter for each variable.

----> Harmonized Data:

The SPSS package is used to harmonize the Iraq Household Socio Economic Survey (IHSES) 2007 with Iraq Household Socio Economic Survey (IHSES) 2012.

The harmonization process starts with raw data files received from the Statistical Office.

A program is generated for each dataset to create harmonized variables.

Data is saved on the household and individual level, in SPSS and then converted to STATA, to be disseminated.

Response rate

Iraq Household Socio Economic Survey (IHSES) reached a total of 25488 households. Number of households refused to response was 305, response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%) while the lowest rates were in Sulaimaniya (92%).
i
Population and Family Health Survey 2002 - Jordan
catalog.ihsn.org
microdata.worldbank.org
Updated Mar 29, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Department of Statistics (DOS) (2019). Population and Family Health Survey 2002 - Jordan [Dataset]. http://catalog.ihsn.org/catalog/183
Explore at:
Dataset updated
Mar 29, 2019
Dataset authored and provided by
Department of Statistics (DOS)
Time period covered
2002
Area covered
Jordan
Description
Abstract

The JPFHS is part of the worldwide Demographic and Health Surveys Program, which is designed to collect data on fertility, family planning, and maternal and child health. The primary objective of the Jordan Population and Family Health Survey (JPFHS) is to provide reliable estimates of demographic parameters, such as fertility, mortality, family planning, fertility preferences, as well as maternal and child health and nutrition that can be used by program managers and policy makers to evaluate and improve existing programs. In addition, the JPFHS data will be useful to researchers and scholars interested in analyzing demographic trends in Jordan, as well as those conducting comparative, regional or crossnational studies.

The content of the 2002 JPFHS was significantly expanded from the 1997 survey to include additional questions on women’s status, reproductive health, and family planning. In addition, all women age 15-49 and children less than five years of age were tested for anemia.

Geographic coverage

National

Analysis unit

Household

Children under five years

Women age 15-49

Men

Kind of data

Sample survey data

Sampling procedure

The estimates from a sample survey are affected by two types of errors: 1) nonsampling errors and 2) sampling errors. Nonsampling errors are the result of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 2002 JPFHS to minimize this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.

Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 2002 JPFHS is only one of many samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.

A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.

If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the 2002 JPFHS sample is the result of a multistage stratified design and, consequently, it was necessary to use more complex formulas. The computer software used to calculate sampling errors for the 2002 JPFHS is the ISSA Sampling Error Module (ISSAS). This module used the Taylor linearization method of variance estimation for survey estimates that are means or proportions. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.

Note: See detailed description of sample design in APPENDIX B of the survey report.

Mode of data collection

Face-to-face

Research instrument

The 2002 JPFHS used two questionnaires – namely, the Household Questionnaire and the Individual Questionnaire. Both questionnaires were developed in English and translated into Arabic. The Household Questionnaire was used to list all usual members of the sampled households and to obtain information on each member’s age, sex, educational attainment, relationship to the head of household, and marital status. In addition, questions were included on the socioeconomic characteristics of the household, such as source of water, sanitation facilities, and the availability of durable goods. The Household Questionnaire was also used to identify women who are eligible for the individual interview: ever-married women age 15-49. In addition, all women age 15-49 and children under five years living in the household were measured to determine nutritional status and tested for anemia.

The household and women’s questionnaires were based on the DHS Model “A” Questionnaire, which is designed for use in countries with high contraceptive prevalence. Additions and modifications to the model questionnaire were made in order to provide detailed information specific to Jordan, using experience gained from the 1990 and 1997 Jordan Population and Family Health Surveys. For each evermarried woman age 15 to 49, information on the following topics was collected:

Respondent’s background

Birth history

Knowledge and practice of family planning

Maternal care, breastfeeding, immunization, and health of children under five years of age

Marriage

Fertility preferences

Husband’s background and respondent’s employment

Knowledge of AIDS and STIs

In addition, information on births and pregnancies, contraceptive use and discontinuation, and marriage during the five years prior to the survey was collected using a monthly calendar.

Cleaning operations

Fieldwork and data processing activities overlapped. After a week of data collection, and after field editing of questionnaires for completeness and consistency, the questionnaires for each cluster were packaged together and sent to the central office in Amman where they were registered and stored. Special teams were formed to carry out office editing and coding of the open-ended questions.

Data entry and verification started after one week of office data processing. The process of data entry, including one hundred percent re-entry, editing and cleaning, was done by using PCs and the CSPro (Census and Survey Processing) computer package, developed specially for such surveys. The CSPro program allows data to be edited while being entered. Data processing operations were completed by the end of October 2002. A data processing specialist from ORC Macro made a trip to Jordan in October and November 2002 to follow up data editing and cleaning and to work on the tabulation of results for the survey preliminary report. The tabulations for the present final report were completed in December 2002.

Response rate

A total of 7,968 households were selected for the survey from the sampling frame; among those selected households, 7,907 households were found. Of those households, 7,825 (99 percent) were successfully interviewed. In those households, 6,151 eligible women were identified, and complete interviews were obtained with 6,006 of them (98 percent of all eligible women). The overall response rate was 97 percent.

Note: See summarized response rates by place of residence in Table 1.1 of the survey report.

Sampling error estimates

The estimates from a sample survey are affected by two types of errors: 1) nonsampling errors and 2) sampling errors. Nonsampling errors are the result of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 2002 JPFHS to minimize this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.

Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 2002 JPFHS is only one of many samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.

A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.

If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the 2002 JPFHS sample is the result of a multistage stratified design and, consequently, it was necessary to use more complex formulas. The computer software used to calculate sampling errors for the 2002 JPFHS is the ISSA Sampling Error Module (ISSAS). This module used the Taylor linearization method of variance estimation for survey estimates that are means or proportions. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.

Note: See detailed
h
Data Acquisition & Analysis Software Market Statistics & Facts
htfmarketinsights.com
pdf & excel
Updated Oct 23, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
HTF Market Intelligence (2025). Data Acquisition & Analysis Software Market Statistics & Facts [Dataset]. https://htfmarketinsights.com/report/4390503-data-acquisition-analysis-software-market
Explore at:
pdf & excelAvailable download formats
Dataset updated
Oct 23, 2025
Dataset authored and provided by
HTF Market Intelligence
License
https://www.htfmarketinsights.com/privacy-policyhttps://www.htfmarketinsights.com/privacy-policy
Time period covered
2019 - 2031
Area covered
Global
Description
Global Data Acquisition & Analysis Software Market is segmented by Application (Pharma R&D, Clinical Trials, Academic Research, Quality Control, Environmental Testing), Type (Lab Data Management Software, Process Analytics Software, Statistical Analysis Software, Instrument Control Software, LIMS Integration Tools), and Geography (North America_ LATAM_ West Europe_Central & Eastern Europe_ Northern Europe_ Southern Europe_ East Asia_ Southeast Asia_ South Asia_ Central Asia_ Oceania_ MEA)
f
Data from: ODM Data Analysis—A tool for the automatic validation, monitoring...
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Jun 22, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Doods, Justin; Ständer, Sonja; Brix, Tobias Johannes; Bruland, Philipp; Ernsting, Jan; Dugas, Martin; Neuhaus, Philipp; Storck, Michael; Sarfraz, Saad (2018). ODM Data Analysis—A tool for the automatic validation, monitoring and generation of generic descriptive statistics of patient data [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000711292
Explore at:
Dataset updated
Jun 22, 2018
Authors
Doods, Justin; Ständer, Sonja; Brix, Tobias Johannes; Bruland, Philipp; Ernsting, Jan; Dugas, Martin; Neuhaus, Philipp; Storck, Michael; Sarfraz, Saad
Description
IntroductionA required step for presenting results of clinical studies is the declaration of participants demographic and baseline characteristics as claimed by the FDAAA 801. The common workflow to accomplish this task is to export the clinical data from the used electronic data capture system and import it into statistical software like SAS software or IBM SPSS. This software requires trained users, who have to implement the analysis individually for each item. These expenditures may become an obstacle for small studies. Objective of this work is to design, implement and evaluate an open source application, called ODM Data Analysis, for the semi-automatic analysis of clinical study data.MethodsThe system requires clinical data in the CDISC Operational Data Model format. After uploading the file, its syntax and data type conformity of the collected data is validated. The completeness of the study data is determined and basic statistics, including illustrative charts for each item, are generated. Datasets from four clinical studies have been used to evaluate the application’s performance and functionality.ResultsThe system is implemented as an open source web application (available at https://odmanalysis.uni-muenster.de) and also provided as Docker image which enables an easy distribution and installation on local systems. Study data is only stored in the application as long as the calculations are performed which is compliant with data protection endeavors. Analysis times are below half an hour, even for larger studies with over 6000 subjects.DiscussionMedical experts have ensured the usefulness of this application to grant an overview of their collected study data for monitoring purposes and to generate descriptive statistics without further user interaction. The semi-automatic analysis has its limitations and cannot replace the complex analysis of statisticians, but it can be used as a starting point for their examination and reporting.
Market Research and Statistical Services in Australia - Market Research...
ibisworld.com
Updated Sep 15, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
IBISWorld (2024). Market Research and Statistical Services in Australia - Market Research Report (2015-2030) [Dataset]. https://www.ibisworld.com/au/industry/market-research-statistical-services/565/
Explore at:
Dataset updated
Sep 15, 2024
Dataset authored and provided by
IBISWorld
License
https://www.ibisworld.com/about/termsofuse/https://www.ibisworld.com/about/termsofuse/
Time period covered
2014 - 2029
Area covered
Australia
Description
The Market Research and Statistical Services industry has performed poorly because of mixed demand across years for market research and related services. Industry revenue is anticipated to shrink at an annualised 1.3% over the five years through 2024-25, totalling $3.6 billion, with revenue falling by 1.5% in the current year. The overall revenue decrease can be attributed to mixed growth in prior years because of uncertainty and demand changes in response to the COVID-19 pandemic and ABS funding volatility. Industry revenue displays significant volatility from year to year, mainly because of fluctuations in ABS funding by the Federal Government. As the next census is set to occur in 2026, ABS revenue over the past two years has been constrained. Some companies that previously used industry businesses have been increasingly performing market research and statistical analysis in-house. Many external companies have improved their technology and data collection capabilities, which has made it more cost-effective to perform these activities internally. While the introduction of artificial intelligence has provided cost-cutting opportunities for market research businesses, it has also encouraged clients to bring industry services in-house, reducing demand. Profitability has also waned because of heightened price competition and wage costs increasing as a share of revenue. Ongoing growth in online media and big data presents both challenges and opportunities for market research businesses. Mounting demand for research and statistics relating to new media audience numbers and advertising effectiveness represents a potential opportunity. Even so, market research businesses will face challenges in developing effective measurement systems, and competition from information technology specialists that are developing similar systems will intensify. Despite these challenges, industry revenue is forecast to increase at an annualised 2.0% through 2029-30 to reach $3.9 billion.
D
Statistical Tolerance Analysis Software Market Research Report 2033
dataintelo.com
csv, pdf, pptx
Updated Sep 30, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataintelo (2025). Statistical Tolerance Analysis Software Market Research Report 2033 [Dataset]. https://dataintelo.com/report/statistical-tolerance-analysis-software-market
Explore at:
pdf, pptx, csvAvailable download formats
Dataset updated
Sep 30, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Statistical Tolerance Analysis Software Market Outlook

According to our latest research, the global Statistical Tolerance Analysis Software market size reached USD 1.32 billion in 2024. The market is currently experiencing robust expansion, registering a compound annual growth rate (CAGR) of 9.1% from 2025 to 2033. By the end of 2033, the market is forecasted to attain a value of USD 2.87 billion, driven by increasing adoption across manufacturing, automotive, aerospace, and electronics sectors. The primary growth factor is the escalating demand for precision engineering and quality assurance in complex product designs, which is propelling organizations to invest in advanced statistical tolerance analysis solutions for enhanced efficiency and reduced production errors.

The growth of the Statistical Tolerance Analysis Software market is primarily fueled by the burgeoning trend toward digital transformation in the manufacturing sector. As industries transition from traditional manufacturing methods to Industry 4.0 paradigms, there is a heightened emphasis on integrating simulation and analysis tools into product development cycles. This shift is enabling manufacturers to predict potential assembly issues, minimize costly rework, and optimize design processes. Moreover, the proliferation of smart factories and the adoption of IoT-enabled devices are further augmenting the need for robust statistical analysis tools. These solutions facilitate real-time data collection and analysis, empowering engineers to make data-driven decisions that enhance product reliability and compliance with international quality standards.

Another significant growth driver is the increasing complexity of products, especially in sectors such as automotive, aerospace, and electronics. As products become more intricate, the need for precise tolerance analysis becomes paramount to ensure that all components fit and function seamlessly. Statistical tolerance analysis software enables engineers to simulate and analyze various assembly scenarios, accounting for manufacturing variations and environmental factors. This capability not only reduces the risk of part misalignment but also accelerates time-to-market by identifying potential issues early in the design phase. Furthermore, regulatory requirements for product safety and reliability are compelling organizations to adopt advanced tolerance analysis tools, thereby bolstering market growth.

Additionally, the growing focus on cost optimization and resource efficiency is encouraging enterprises to invest in statistical tolerance analysis software. By leveraging these tools, organizations can significantly reduce material wastage, minimize production downtime, and enhance overall operational efficiency. The integration of artificial intelligence and machine learning algorithms into these software solutions is further amplifying their value proposition, allowing for predictive analytics and automated decision-making. This technological evolution is expected to open new avenues for market expansion, particularly among small and medium enterprises seeking to enhance their competitive edge through digital innovation.

Regionally, North America remains the dominant market for Statistical Tolerance Analysis Software, owing to the presence of leading manufacturing and automotive companies, as well as a strong focus on innovation and quality control. However, Asia Pacific is emerging as the fastest-growing region, driven by rapid industrialization, increasing investments in advanced manufacturing technologies, and the expansion of the automotive and electronics sectors in countries such as China, Japan, and South Korea. Europe also holds a significant share, supported by stringent regulatory standards and the presence of major aerospace and automotive OEMs. These regional dynamics are shaping the competitive landscape and influencing the adoption patterns of statistical tolerance analysis solutions worldwide.

Component Analysis

The component segment of the Statistical Tolerance Analysis Software market is bifurcated into software and services, each playing a pivotal role in the market’s value chain. The software segment dominates the market, accounting for a substantial share due to the increasing adoption of advanced simulation and analysis tools across various industries. These software solutions are designed to facilitate precise tolerance analysis, enabling engineers to predict and mitigate ass
Leading data collection methods among U.S. consumers 2023
statista.com
Updated Nov 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Leading data collection methods among U.S. consumers 2023 [Dataset]. https://www.statista.com/statistics/1269920/first-party-data-usa/
Explore at:
Dataset updated
Nov 28, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Dec 2023
Area covered
United States
Description
During a 2023 survey carried out among working-age consumers from the United States, nearly ******* respondents stated that they preferred for their data to be collected via interactive surveys. Roughly a ***** name a loyalty card/program as their favored data collection method.
m
COVID-19 Combined Data-set with Improved Measurement Errors
data.mendeley.com
Updated May 13, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Afshin Ashofteh (2020). COVID-19 Combined Data-set with Improved Measurement Errors [Dataset]. http://doi.org/10.17632/nw5m4hs3jr.3
Explore at:
Unique identifier
https://doi.org/10.17632/nw5m4hs3jr.3
Dataset updated
May 13, 2020
Authors
Afshin Ashofteh
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Public health-related decision-making on policies aimed at controlling the COVID-19 pandemic outbreak depends on complex epidemiological models that are compelled to be robust and use all relevant available data. This data article provides a new combined worldwide COVID-19 dataset obtained from official data sources with improved systematic measurement errors and a dedicated dashboard for online data visualization and summary. The dataset adds new measures and attributes to the normal attributes of official data sources, such as daily mortality, and fatality rates. We used comparative statistical analysis to evaluate the measurement errors of COVID-19 official data collections from the Chinese Center for Disease Control and Prevention (Chinese CDC), World Health Organization (WHO) and European Centre for Disease Prevention and Control (ECDC). The data is collected by using text mining techniques and reviewing pdf reports, metadata, and reference data. The combined dataset includes complete spatial data such as countries area, international number of countries, Alpha-2 code, Alpha-3 code, latitude, longitude, and some additional attributes such as population. The improved dataset benefits from major corrections on the referenced data sets and official reports such as adjustments in the reporting dates, which suffered from a one to two days lag, removing negative values, detecting unreasonable changes in historical data in new reports and corrections on systematic measurement errors, which have been increasing as the pandemic outbreak spreads and more countries contribute data for the official repositories. Additionally, the root mean square error of attributes in the paired comparison of datasets was used to identify the main data problems. The data for China is presented separately and in more detail, and it has been extracted from the attached reports available on the main page of the CCDC website. This dataset is a comprehensive and reliable source of worldwide COVID-19 data that can be used in epidemiological models assessing the magnitude and timeline for confirmed cases, long-term predictions of deaths or hospital utilization, the effects of quarantine, stay-at-home orders and other social distancing measures, the pandemic’s turning point or in economic and social impact analysis, helping to inform national and local authorities on how to implement an adaptive response approach to re-opening the economy, re-open schools, alleviate business and social distancing restrictions, design economic programs or allow sports events to resume.
Software tools used for data collection and analysis.
plos.figshare.com
xls
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
John A. Borghi; Ana E. Van Gulick (2023). Software tools used for data collection and analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0252047.t003
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0252047.t003
Dataset updated
May 30, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
John A. Borghi; Ana E. Van Gulick
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Software tools used to collect and analyze data. Parentheses for analysis software indicate the tools participants were taught to use as part of their education in research methods and statistics. “Other” responses for data collection software were largely comprised of survey tools (e.g. Survey Monkey, LimeSurvey) and tools for building and running behavioral experiments (e.g. Gorilla, JsPsych). “Other” responses for data analysis software largely consisted of neuroimaging-related tools (e.g. SPM, AFNI).
D
Quantitative Research Platform Market Research Report 2033
dataintelo.com
csv, pdf, pptx
Updated Sep 30, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataintelo (2025). Quantitative Research Platform Market Research Report 2033 [Dataset]. https://dataintelo.com/report/quantitative-research-platform-market
Explore at:
csv, pdf, pptxAvailable download formats
Dataset updated
Sep 30, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Quantitative Research Platform Market Outlook

As per our latest research, the global Quantitative Research Platform market size reached USD 5.2 billion in 2024, driven by the increasing demand for data-driven decision-making across industries. The market is projected to expand at a robust CAGR of 12.4% from 2025 to 2033, with the total market size anticipated to reach USD 14.8 billion by 2033. This significant growth is attributed to the proliferation of big data analytics, the rising adoption of cloud-based research solutions, and the growing emphasis on evidence-based strategies in both academic and corporate sectors.

One of the primary growth factors propelling the Quantitative Research Platform market is the escalating need for actionable insights derived from vast data sets. Organizations across sectors such as finance, healthcare, and retail are increasingly leveraging quantitative research tools to enhance their understanding of market trends, consumer behavior, and operational efficiency. The integration of advanced analytics, artificial intelligence, and machine learning within these platforms has further amplified their value proposition, enabling users to process complex data sets with greater accuracy and speed. Additionally, the surge in digital transformation initiatives globally has encouraged enterprises to invest in sophisticated research platforms that can provide a competitive edge through predictive analytics and real-time reporting.

Another key driver for the Quantitative Research Platform market is the growing adoption of cloud-based solutions. Cloud deployment offers several advantages, including scalability, cost-efficiency, and remote accessibility, making it an attractive option for organizations of all sizes. The shift towards cloud-based research platforms has been further accelerated by the increasing prevalence of remote work and the need for collaborative research environments. This trend is particularly pronounced in sectors such as academic research and financial services, where teams often span multiple geographies and require seamless access to data and analytical tools. As a result, vendors are continually enhancing their cloud offerings with improved security, integration capabilities, and user-friendly interfaces to cater to evolving customer needs.

Furthermore, the rising importance of regulatory compliance and data privacy is shaping the evolution of the Quantitative Research Platform market. With stricter regulations such as GDPR and HIPAA in place, organizations are prioritizing platforms that offer robust security features and compliance management tools. This has led to increased investments in platforms equipped with advanced encryption, audit trails, and data governance functionalities. Moreover, the growing focus on ethical research practices and transparency is prompting vendors to develop solutions that facilitate better documentation, reproducibility, and accountability in quantitative research processes. These factors collectively contribute to the sustained growth and innovation within the market.

From a regional perspective, North America continues to dominate the Quantitative Research Platform market, accounting for the largest share in 2024, followed closely by Europe and Asia Pacific. The strong presence of leading technology providers, high adoption rates of advanced analytics, and robust research infrastructure have positioned North America as a key growth hub. Meanwhile, Asia Pacific is emerging as the fastest-growing region, fueled by rapid digitalization, increasing research investments, and the expansion of corporate and academic sectors in countries like China, India, and Japan. Latin America and the Middle East & Africa, though smaller in market share, are witnessing steady growth as organizations in these regions recognize the value of quantitative research in driving business and policy decisions.

Component Analysis

The Quantitative Research Platform market is segmented by component into Software and Services, each playing a pivotal role in shaping the overall market landscape. The software segment encompasses a wide range of solutions including data collection tools, statistical analysis software, survey platforms, and visualization tools. These software products are designed to facilitate the end-to-end process of quantitative research, from data gathering and cleansing to advanced analytics a
Household Survey on Information and Communications Technology 2014 - West...
datacatalog.ihsn.org
catalog.ihsn.org
Updated Oct 14, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Palestinian Central Bureau of Statistics (2021). Household Survey on Information and Communications Technology 2014 - West Bank and Gaza [Dataset]. https://datacatalog.ihsn.org/catalog/9840
Explore at:
Dataset updated
Oct 14, 2021
Dataset authored and provided by
Palestinian Central Bureau of Statisticshttps://pcbs.gov/
Time period covered
2014
Area covered
Gaza Strip, West Bank, Gaza
Description
Abstract

Within the frame of PCBS' efforts in providing official Palestinian statistics in the different life aspects of Palestinian society and because the wide spread of Computer, Internet and Mobile Phone among the Palestinian people, and the important role they may play in spreading knowledge and culture and contribution in formulating the public opinion, PCBS conducted the Household Survey on Information and Communications Technology, 2014.

The main objective of this survey is to provide statistical data on Information and Communication Technology in the Palestine in addition to providing data on the following: - Prevalence of computers and access to the Internet. - Study the penetration and purpose of Technology use.

Geographic coverage

Palestine (West Bank and Gaza Strip), type of locality (urban, rural, refugee camps) and governorate.

Analysis unit

Household.

Persons 10 years and over .

Universe

All Palestinian households and individuals whose usual place of residence in Palestine with focus on persons aged 10 years and over in year 2014.

Kind of data

Sample survey data [ssd]

Sampling procedure

Sampling Frame The sampling frame consists of a list of enumeration areas adopted in the Population, Housing and Establishments Census of 2007. Each enumeration area has an average size of about 124 households. These were used in the first phase as Preliminary Sampling Units in the process of selecting the survey sample.

Sample Size The total sample size of the survey was 7,268 households, of which 6,000 responded.

Sample Design The sample is a stratified clustered systematic random sample. The design comprised three phases:

Phase I: Random sample of 240 enumeration areas. Phase II: Selection of 25 households from each enumeration area selected in phase one using systematic random selection. Phase III: Selection of an individual (10 years or more) in the field from the selected households; KISH TABLES were used to ensure indiscriminate selection.

Sample Strata Distribution of the sample was stratified by: 1- Governorate (16 governorates, J1). 2- Type of locality (urban, rural and camps).

Mode of data collection

Face-to-face [f2f]

Research instrument

The survey questionnaire consists of identification data, quality controls and three main sections: Section I: Data on household members that include identification fields, the characteristics of household members (demographic and social) such as the relationship of individuals to the head of household, sex, date of birth and age.

Section II: Household data include information regarding computer processing, access to the Internet, and possession of various media and computer equipment. This section includes information on topics related to the use of computer and Internet, as well as supervision by households of their children (5-17 years old) while using the computer and Internet, and protective measures taken by the household in the home.

Section III: Data on persons (aged 10 years and over) about computer use, access to the Internet and possession of a mobile phone.

Cleaning operations

Preparation of Data Entry Program: This stage included preparation of the data entry programs using an ACCESS package and defining data entry control rules to avoid errors, plus validation inquiries to examine the data after it had been captured electronically.

Data Entry: The data entry process started on the 8th of May 2014 and ended on the 23rd of June 2014. The data entry took place at the main PCBS office and in field offices using 28 data clerks.

Editing and Cleaning procedures: Several measures were taken to avoid non-sampling errors. These included editing of questionnaires before data entry to check field errors, using a data entry application that does not allow mistakes during the process of data entry, and then examining the data by using frequency and cross tables. This ensured that data were error free; cleaning and inspection of the anomalous values were conducted to ensure harmony between the different questions on the questionnaire.

Response rate

Response Rates: 79%

Sampling error estimates

There are many aspects of the concept of data quality; this includes the initial planning of the survey to the dissemination of the results and how well users understand and use the data. There are three components to the quality of statistics: accuracy, comparability, and quality control procedures.

Checks on data accuracy cover many aspects of the survey and include statistical errors due to the use of a sample, non-statistical errors resulting from field workers or survey tools, and response rates and their effect on estimations. This section includes:

Statistical Errors Data of this survey may be affected by statistical errors due to the use of a sample and not a complete enumeration. Therefore, certain differences can be expected in comparison with the real values obtained through censuses. Variances were calculated for the most important indicators.

Variance calculations revealed that there is no problem in disseminating results nationally or regionally (the West Bank, Gaza Strip), but some indicators show high variance by governorate, as noted in the tables of the main report.

Non-Statistical Errors Non-statistical errors are possible at all stages of the project, during data collection or processing. These are referred to as non-response errors, response errors, interviewing errors and data entry errors. To avoid errors and reduce their effects, strenuous efforts were made to train the field workers intensively. They were trained on how to carry out the interview, what to discuss and what to avoid, and practical and theoretical training took place during the training course. Training manuals were provided for each section of the questionnaire, along with practical exercises in class and instructions on how to approach respondents to reduce refused cases. Data entry staff were trained on the data entry program, which was tested before starting the data entry process.

Several measures were taken to avoid non-sampling errors. These included editing of questionnaires before data entry to check field errors, using a data entry application that does not allow mistakes during the process of data entry, and then examining the data by using frequency and cross tables. This ensured that data were error free; cleaning and inspection of the anomalous values were conducted to ensure harmony between the different questions on the questionnaire.

The sources of non-statistical errors can be summarized as: 1. Some of the households were not at home and could not be interviewed, and some households refused to be interviewed. 2. In unique cases, errors occurred due to the way the questions were asked by interviewers and respondents misunderstood some of the questions.
S
children’s selective trust
scidb.cn
Updated Nov 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Chenshan Huang (2025). children’s selective trust [Dataset]. http://doi.org/10.57760/sciencedb.30953
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.57760/sciencedb.30953
Dataset updated
Nov 5, 2025
Dataset provided by
Science Data Bank
Authors
Chenshan Huang
License
Open Data Commons Attribution License (ODC-By) v1.0https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Description
Data Source & Collection Process：The data originated from kindergartens, targeting child participants. Data collection adopted a one-on-one interaction mode—researchers conducted task-based tests with each child individually (e.g., cognitive tasks, behavioral observation tasks) to ensure the authenticity and accuracy of each child’s response, avoiding interference between participants.Data Processing Tool：The raw data (e.g., task completion scores, response time records) were organized and analyzed using SPSS (full name: Statistical Package for the Social Sciences). This tool was mainly used for basic data cleaning (e.g., removing invalid samples with missing key information) and preliminary statistical processing (e.g., calculating descriptive statistics), which is a common statistical tool in educational and psychological research involving child participants.
f
Project for Statistics on Living Standards and Development 1993 - South...
microdata.fao.org
catalog.ihsn.org
+2more
Updated Oct 20, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Southern Africa Labour and Development Research Unit (2020). Project for Statistics on Living Standards and Development 1993 - South Africa [Dataset]. https://microdata.fao.org/index.php/catalog/1527
Explore at:
Dataset updated
Oct 20, 2020
Dataset authored and provided by
Southern Africa Labour and Development Research Unit
Time period covered
1993
Area covered
South Africa
Description
Abstract

The Project for Statistics on Living standards and Development was a countrywide World Bank Living Standards Measurement Survey. It covered approximately 9000 households, drawn from a representative sample of South African households. The fieldwork was undertaken during the nine months leading up to the country's first democratic elections at the end of April 1994. The purpose of the survey was to collect statistical information about the conditions under which South Africans live in order to provide policymakers with the data necessary for planning strategies. This data would aid the implementation of goals such as those outlined in the Government of National Unity's Reconstruction and Development Programme.

Geographic coverage

National

Analysis unit

Households

Universe

All Household members. Individuals in hospitals, old age homes, hotels and hostels of educational institutions were not included in the sample. Migrant labour hostels were included. In addition to those that turned up in the selected ESDs, a sample of three hostels was chosen from a national list provided by the Human Sciences Research Council and within each of these hostels a representative sample was drawn on a similar basis as described above for the households in ESDs.

Kind of data

Sample survey data [ssd]

Sampling procedure

(a) SAMPLING DESIGN

Sample size is 9,000 households. The sample design adopted for the study was a two-stage self-weighting design in which the first stage units were Census Enumerator Subdistricts (ESDs, or their equivalent) and the second stage were households. The advantage of using such a design is that it provides a representative sample that need not be based on accurate census population distribution in the case of South Africa, the sample will automatically include many poor people, without the need to go beyond this and oversample the poor. Proportionate sampling as in such a self-weighting sample design offers the simplest possible data files for further analysis, as weights do not have to be added. However, in the end this advantage could not be retained, and weights had to be added.

(b) SAMPLE FRAME

The sampling frame was drawn up on the basis of small, clearly demarcated area units, each with a population estimate. The nature of the self-weighting procedure adopted ensured that this population estimate was not important for determining the final sample, however. For most of the country, census ESDs were used. Where some ESDs comprised relatively large populations as for instance in some black townships such as Soweto, aerial photographs were used to divide the areas into blocks of approximately equal population size. In other instances, particularly in some of the former homelands, the area units were not ESDs but villages or village groups. In the sample design chosen, the area stage units (generally ESDs) were selected with probability proportional to size, based on the census population. Systematic sampling was used throughout that is, sampling at fixed interval in a list of ESDs, starting at a randomly selected starting point. Given that sampling was self-weighting, the impact of stratification was expected to be modest. The main objective was to ensure that the racial and geographic breakdown approximated the national population distribution. This was done by listing the area stage units (ESDs) by statistical region and then within the statistical region by urban or rural. Within these sub-statistical regions, the ESDs were then listed in order of percentage African. The sampling interval for the selection of the ESDs was obtained by dividing the 1991 census population of 38,120,853 by the 300 clusters to be selected. This yielded 105,800. Starting at a randomly selected point, every 105,800th person down the cluster list was selected. This ensured both geographic and racial diversity (ESDs were ordered by statistical sub-region and proportion of the population African). In three or four instances, the ESD chosen was judged inaccessible and replaced with a similar one. In the second sampling stage the unit of analysis was the household. In each selected ESD a listing or enumeration of households was carried out by means of a field operation. From the households listed in an ESD a sample of households was selected by systematic sampling. Even though the ultimate enumeration unit was the household, in most cases "stands" were used as enumeration units. However, when a stand was chosen as the enumeration unit all households on that stand had to be interviewed.

Mode of data collection

Face-to-face [f2f]

Cleaning operations

All the questionnaires were checked when received. Where information was incomplete or appeared contradictory, the questionnaire was sent back to the relevant survey organization. As soon as the data was available, it was captured using local development platform ADE. This was completed in February 1994. Following this, a series of exploratory programs were written to highlight inconsistencies and outlier. For example, all person level files were linked together to ensure that the same person code reported in different sections of the questionnaire corresponded to the same person. The error reports from these programs were compared to the questionnaires and the necessary alterations made. This was a lengthy process, as several files were checked more than once, and completed at the beginning of August 1994. In some cases, questionnaires would contain missing values, or comments that the respondent did not know, or refused to answer a question.

These responses are coded in the data files with the following values: VALUE MEANING -1 : The data was not available on the questionnaire or form -2 : The field is not applicable -3 : Respondent refused to answer -4 : Respondent did not know answer to question

Data appraisal

The data collected in clusters 217 and 218 should be viewed as highly unreliable and therefore removed from the data set. The data currently available on the web site has been revised to remove the data from these clusters. Researchers who have downloaded the data in the past should revise their data sets. For information on the data in those clusters, contact SALDRU http://www.saldru.uct.ac.za/.
C
Clinical Data Management and Statistical Analysis Report
archivemarketresearch.com
doc, pdf, ppt
Updated Feb 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Archive Market Research (2025). Clinical Data Management and Statistical Analysis Report [Dataset]. https://www.archivemarketresearch.com/reports/clinical-data-management-and-statistical-analysis-29229
Explore at:
doc, ppt, pdfAvailable download formats
Dataset updated
Feb 15, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global clinical data management and statistical analysis market is projected to reach USD 32.4 billion by 2033, expanding at a CAGR of 10.1%. Increasing clinical trial outsourcing, rising adoption of electronic data capture (EDC) systems, and growing healthcare expenditure drive market growth. The market is segmented into data management and statistical analysis, with the former accounting for a larger share. Key players in the market include Clinipace, Charles River Laboratories, LabCorp, ICON PLC, and Parexel. North America is the largest market for clinical data management and statistical analysis, followed by Europe and Asia Pacific. The region's large population base, presence of major pharmaceutical companies, and extensive adoption of advanced technologies contribute to its dominance. The Asia Pacific region is expected to experience the fastest growth due to increasing investments in healthcare infrastructure, government initiatives to promote clinical research, and rising demand for clinical data outsourcing services.
d
Data from: A protocol for conducting and presenting results of...
search.dataone.org
datadryad.org
Updated Apr 18, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alain F. Zuur; Elena N. Ieno (2025). A protocol for conducting and presenting results of regression-type analyses [Dataset]. http://doi.org/10.5061/dryad.v4t42
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.v4t42
Dataset updated
Apr 18, 2025
Dataset provided by
Dryad Digital Repository
Authors
Alain F. Zuur; Elena N. Ieno
Time period covered
Jan 1, 2017
Description
Scientific investigation is of value only insofar as relevant results are obtained and communicated, a task that requires organizing, evaluating, analysing and unambiguously communicating the significance of data. In this context, working with ecological data, reflecting the complexities and interactions of the natural world, can be a challenge. Recent innovations for statistical analysis of multifaceted interrelated data make obtaining more accurate and meaningful results possible, but key decisions of the analyses to use, and which components to present in a scientific paper or report, may be overwhelming. We offer a 10-step protocol to streamline analysis of data that will enhance understanding of the data, the statistical models and the results, and optimize communication with the reader with respect to both the procedure and the outcomes. The protocol takes the investigator from study design and organization of data (formulating relevant questions, visualizing data collection, data...
f
UC_vs_US Statistic Analysis.xlsx
figshare.com
xlsx
Updated Jul 9, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.23644/uu.12631628.v1
Dataset updated
Jul 9, 2020
Dataset provided by
Utrecht University
Authors
F. (Fabiano) Dalpiaz
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the used measures described in the paper. For each subject, it includes multiple columns: A. a sequential student ID B an ID that defines a random group label and the notation C. the used notation: user Story or use Cases D. the case they were assigned to: IFA, Sim, or Hos E. the subject's exam grade (total points out of 100). Empty cells mean that the subject did not take the first exam F. a categorical representation of the grade L/M/H, where H is greater or equal to 80, M is between 65 included and 80 excluded, L otherwise G. the total number of classes in the student's conceptual model H. the total number of relationships in the student's conceptual model I. the total number of classes in the expert's conceptual model J. the total number of relationships in the expert's conceptual model K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below) P. the researchers' judgement on how well the derivation process explanation was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping ), or not present.

Tagging scheme: Aligned (AL) - A concept is represented as a class in both models, either

with the same name or using synonyms or clearly linkable names; Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than class, or (ii) using a generic term (e.g., user'' instead ofurban planner''); System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent legacy system or the system under design (portal, simulator) are legitimate; Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud; Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.

All the calculations and information provided in the following sheets

originate from that raw data.

Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,

including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

Sheet 3 (Size-Ratio):

The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade) . The primary focus in this study is on the number of classes. However, we also provided the size ratio for the number of relationships between student and expert model.

Sheet 4 (Overall):

Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that is fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR) and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.

For sheet 4 as well as for the following four sheets, diverging stacked bar

charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated witch solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

Sheet 5 (By-Notation):

Model correctness and model completeness is compared by notation - UC, US.

Sheet 6 (By-Case):

Model correctness and model completeness is compared by case - SIM, HOS, IFA.

Sheet 7 (By-Process):

Model correctness and model completeness is compared by how well the derivation process is explained - well explained, partially explained, not present.

Sheet 8 (By-Grade):

Model correctness and model completeness is compared by the exam grades, converted to categorical values High, Low , and Medium.
d
Data from: Lifestyle and sense of coherence: A comparative analysis among...
datadryad.org
data.niaid.nih.gov
zip
Updated Jun 14, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
João Paulo Costa Braga; Eduardo Wolfgram; João Paulo Batista de Souza; Roberto de Almeida; Cezar Rangel Pestana (2023). Lifestyle and sense of coherence: A comparative analysis among university students in different areas of knowledge [Dataset]. http://doi.org/10.5061/dryad.bcc2fqzhd
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.bcc2fqzhd
Dataset updated
Jun 14, 2023
Dataset provided by
Dryad
Authors
João Paulo Costa Braga; Eduardo Wolfgram; João Paulo Batista de Souza; Roberto de Almeida; Cezar Rangel Pestana
Time period covered
Jun 2, 2023
Description
Data gathering The researchers invited the students to answer an online form - through Google Forms virtual platform - containing the questionnaires: sociodemographic information, FANTASTIC questionnaire on Lifestyle, and a questionnaire on Sense of Coherence. The researchers clearly explained the research objectives and collection procedures on the home page, and the participants were given the Free and Informed Consent Form. The data gathered in the online form were transferred to a spreadsheet in Microsoft Excel. The results were filtered, classified, and treated in order to be in line with the desired statistical analysis and could feed the statistical programs used. Statistical analysis The statistical analyses were performed by the JASP statistical software, and part of the graphics by the SPSS software. First, the researchers submitted the results to normality (Shapiro Wilk) and homogeneity (Levene test) analysis. Next, the normal homogeneous data were submitted to the ANOVA anal...

Facebook

Twitter

Click to copy link

Link copied

Cite

Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1

Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research

Explore at:

txtAvailable download formats

Unique identifier

https://doi.org/10.6084/m9.figshare.24728073.v1

Dataset updated

Dec 4, 2023

Dataset provided by

Figsharehttp://figshare.com/

Authors

Kingsley Okoye; Samira Hosseini

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source software and object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provides a wide range of functions for programming and analyzing of data. Unlike many of the existing statistical softwares, R has the added benefit of allowing the users to write more efficient codes by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allows the users to define their own (customized) functions on how they expect the program to behave while handling the data, which can also be stored in the simple object system.For all intents and purposes, this book serves as both textbook and manual for R statistics particularly in academic research, data analytics, and computer programming targeted to help inform and guide the work of the R users or statisticians. It provides information about different types of statistical data analysis and methods, and the best scenarios for use of each case in R. It gives a hands-on step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the different conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test for reliability and validity of the available datasets. Different research experiments, case scenarios and examples are explained in this book. It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R particularly for research purposes with examples. Ranging from how to import and store datasets in R as Objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. Thus, congruence of Statistics and Computer programming for Research.

Clear search

Close search

Google apps

Main menu

Collection of example datasets used for the book - R Programming -...

Data Analysis for the Systematic Literature Review of DL4SE

Ten quick tips for getting the most scientific value out of numerical data

Household Health Survey 2012-2013, Economic Research Forum (ERF)...

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Mode of data collection

Research instrument

Cleaning operations

Response rate

Population and Family Health Survey 2002 - Jordan

Abstract

Geographic coverage

Analysis unit

Kind of data

Sampling procedure

Mode of data collection

Research instrument

Cleaning operations

Response rate

Sampling error estimates

Data Acquisition & Analysis Software Market Statistics & Facts

Data from: ODM Data Analysis—A tool for the automatic validation, monitoring...

Market Research and Statistical Services in Australia - Market Research...

Statistical Tolerance Analysis Software Market Research Report 2033

Statistical Tolerance Analysis Software Market Outlook

Component Analysis

Leading data collection methods among U.S. consumers 2023

COVID-19 Combined Data-set with Improved Measurement Errors

Software tools used for data collection and analysis.

Quantitative Research Platform Market Research Report 2033

Quantitative Research Platform Market Outlook

Component Analysis

Household Survey on Information and Communications Technology 2014 - West...

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Mode of data collection

Research instrument

Cleaning operations

Response rate

Sampling error estimates

children’s selective trust

Project for Statistics on Living Standards and Development 1993 - South...

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Mode of data collection

Cleaning operations

Data appraisal

Clinical Data Management and Statistical Analysis Report

Data from: A protocol for conducting and presenting results of...

UC_vs_US Statistic Analysis.xlsx

Data from: Lifestyle and sense of coherence: A comparative analysis among...

Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research