Background: Microarray experiments offer a potent solution to the problem of making and comparing large numbers of gene expression measurements either in different cell types or in the same cell type under different conditions. Inferences about the biological relevance of observed changes in expression depend on the statistical significance of the changes. In the absence of many replicates with which to determine accurate intensity means and variances, reliable estimates of statistical significance remain problematic. Without such estimates, overly conservative choices for significance must be enforced. Results: A simple statistical method for estimating variances from microarray control data, which does not require multiple replicates, is presented. Comparison of datasets from two commercial entities using this difference-averaging method demonstrates that the standard deviation of the signal scales at a level intermediate between the signal intensity and its square root. Application of the method to a dataset related to the β-catenin pathway yields a larger number of biologically reasonable genes whose expression is altered than the ratio method. Conclusions: The difference-averaging method enables determination of variances as a function of signal intensities by averaging over the entire dataset. The method also provides a platform-independent view of important statistical properties of microarray data.
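As a rough illustration of how variance can be estimated as a function of intensity without many replicates, the Python sketch below bins paired control measurements by mean intensity and averages half the squared difference within each bin. It is a generic difference-based estimator written under stated assumptions, not the published difference-averaging procedure. The function name, the equal-count binning, and the synthetic data (noise SD proportional to intensity to the power 0.75) are all hypothetical.

```python
import random
import statistics

def variance_by_intensity(pairs, n_bins=5):
    """Estimate variance as a function of intensity from paired control spots.

    pairs: list of (a, b) intensities for the same gene measured twice under
    identical conditions. For such a pair E[(a - b)^2] = 2 * variance, so
    (a - b)^2 / 2 is an unbiased per-pair variance estimate.
    Returns a list of (mean_intensity, variance_estimate), one per bin.
    """
    ordered = sorted(pairs, key=lambda p: (p[0] + p[1]) / 2)
    size = len(ordered) // n_bins
    result = []
    for i in range(n_bins):
        # The last bin also takes any leftover pairs.
        chunk = ordered[i * size:] if i == n_bins - 1 else ordered[i * size:(i + 1) * size]
        mean_intensity = statistics.fmean((a + b) / 2 for a, b in chunk)
        variance = statistics.fmean((a - b) ** 2 / 2 for a, b in chunk)
        result.append((mean_intensity, variance))
    return result

# Synthetic control data (an assumption for this demo only).
rng = random.Random(0)
pairs = []
for _ in range(5000):
    mu = rng.uniform(100, 10000)
    sd = mu ** 0.75  # SD between sqrt(intensity) and intensity
    pairs.append((rng.gauss(mu, sd), rng.gauss(mu, sd)))

curve = variance_by_intensity(pairs)
for intensity, var in curve:
    print(f"intensity ~ {intensity:8.1f}   SD ~ {var ** 0.5:7.1f}")
```

Averaging within intensity bins is what lets a single pair of control arrays stand in for many replicates: each gene contributes one noisy variance estimate, and the bin average smooths them.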

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset for Linear Regression with two Independent variables and one Dependent variable. Focused on Testing, Visualization and Statistical Analysis. The dataset is synthetic and contains 100 instances.
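A minimal sketch of the kind of analysis such a dataset supports: fitting y = b0 + b1*x1 + b2*x2 by ordinary least squares. The data below are generated on the spot as a stand-in (100 synthetic instances with assumed true coefficients 2, 3 and -1.5), and the function name is hypothetical; nothing here is taken from the dataset's actual files.

```python
import random

def fit_two_predictor_ols(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    # Augmented matrix [X^T X | X^T y], 3 rows by 4 columns.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]

# Stand-in synthetic data: 100 instances, two predictors, one response.
rng = random.Random(42)
x1 = [rng.uniform(0, 10) for _ in range(100)]
x2 = [rng.uniform(0, 10) for _ in range(100)]
y = [2.0 + 3.0 * a - 1.5 * b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]

b0, b1, b2 = fit_two_predictor_ols(x1, x2, y)
print(f"intercept={b0:.3f}  slope_x1={b1:.3f}  slope_x2={b2:.3f}")
```

Solving the 3x3 normal equations directly keeps the example dependency-free; for larger or ill-conditioned problems a QR-based solver from a numerical library would usually be preferred.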

https://dataverse.no/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.18710/WSU7I6
The dataset comprises three dynamic scenes characterized by both simple and complex lighting conditions. The number of cameras ranges from 4 to 512 (4, 6, 8, 10, 12, 14, 16, 32, 64, 128, 256, and 512). The point clouds are randomly generated.

Dataset for the statistical analysis of the article "Empowerment through Participatory Game Creation: A Case Study with Adults with Intellectual Disability".

The harmonized data set on health, created and published by the ERF, is a subset of Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules, collected in the context of the above mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.
----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:
Iraq is considered a leader in household expenditure and income surveys: the first was conducted in 1946, followed by surveys in 1954 and 1961. After the establishment of the Central Statistical Organization (CSO), household expenditure and income surveys were carried out every 3-5 years (1971/1972, 1976, 1979, 1984/1985, 1988, 1993, 2002/2007). As part of the cooperation between the CSO and the World Bank (WB), the CSO and the Kurdistan Region Statistics Office (KRSO) launched fieldwork on IHSES on 1/1/2012. The survey was carried out over a full year, covering all governorates including those in the Kurdistan Region.
The survey has six main objectives. These objectives are:
The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum, to create a comparable version with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variables' names, labels and some definitions. See: Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf provided in the external resources for further information on the mapping of the original variables on the harmonized ones, in addition to more indications on the variables' availability in both survey years and relevant comments.
National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.
1- Household/family. 2- Individual/person.
The survey was carried out over a full year covering all governorates including those in Kurdistan Region.
Sample survey data [ssd]
----> Design:
The sample size was 25488 households for the whole of Iraq: 216 households in each of the 118 districts, organized into 2832 clusters of 9 households each, distributed across districts and governorates for rural and urban areas.
----> Sample frame:
The listing and numbering results of the 2009-2010 Population and Housing Survey were adopted in all governorates, including the Kurdistan Region, as the frame for selecting households. The sample was selected in two stages. Stage 1: Primary sampling units (blocks) within each stratum (district), for urban and rural areas, were systematically selected with probability proportional to size to reach 2832 units (clusters). Stage 2: 9 households were selected from each primary sampling unit to create a cluster. The total sample across all survey clusters was thus 25488 households distributed across the governorates, with 216 households in each district.
----> Sampling Stages:
In each district, the sample was selected in two stages. Stage 1: Based on the 2010 listing and numbering frame, 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, in addition to the implicit urban/rural breakdown and the geographic breakdown (sub-district, quarter, street, county, village and block). Stage 2: Using households as secondary sampling units, 9 households were selected from each sample point using systematic equal-probability sampling. Sampling frames for each stage can be developed based on the 2010 building listing and numbering without updating household lists. In some small districts, random selection of primary sampling units may yield fewer than 24 units; in that case a sampling unit is selected more than once, and two or more clusters may be drawn from the same enumeration unit when necessary.
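The two-stage selection described above can be sketched in Python as follows. This is an illustrative reconstruction under assumptions, not the survey's actual procedure: the function names, the block sizes, and the number of blocks in the district are hypothetical, and implicit stratification (sorting the frame by urban/rural and geography before selection) is omitted.

```python
import random

def systematic_pps(sizes, n_select, rng):
    """Stage 1: systematic sampling with probability proportional to size.

    sizes: measure of size per primary sampling unit (e.g. households per block).
    Returns the indices of the selected units. A unit larger than the sampling
    interval can be selected more than once.
    """
    interval = sum(sizes) / n_select
    start = rng.uniform(0, interval)
    selected = []
    cumulative = 0.0
    unit = 0
    for k in range(n_select):
        target = start + k * interval
        # Advance to the unit whose cumulative size range contains the target.
        while unit < len(sizes) - 1 and cumulative + sizes[unit] <= target:
            cumulative += sizes[unit]
            unit += 1
        selected.append(unit)
    return selected

def systematic_equal(n_items, n_select, rng):
    """Stage 2: systematic equal-probability sample of item indices."""
    step = n_items / n_select
    start = rng.uniform(0, step)
    return [min(int(start + k * step), n_items - 1) for k in range(n_select)]

# Hypothetical district: 200 blocks with 30 to 300 listed households each.
rng = random.Random(1)
block_sizes = [rng.randint(30, 300) for _ in range(200)]

blocks = systematic_pps(block_sizes, 24, rng)            # 24 sample points
sample = [(b, h) for b in blocks
          for h in systematic_equal(block_sizes[b], 9, rng)]  # 9 households each
print(len(blocks), "clusters,", len(sample), "households")
```

With 24 sample points of 9 households each, every district contributes 24 x 9 = 216 households, and 118 districts give 118 x 216 = 25488 households in 118 x 24 = 2832 clusters, matching the figures quoted in the text.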
Face-to-face [f2f]
----> Preparation:
The questionnaire of the 2006 survey was used as the basis for designing the 2012 questionnaire, to which many revisions were made. Two rounds of pre-tests were carried out. Revisions were made based on feedback from the fieldwork team, World Bank consultants and others, and further revisions were made before the final version was implemented in a pilot survey in September 2011. After the pilot survey, additional revisions were made based on the challenges and feedback that emerged during its implementation, yielding the final version used in the actual survey.
----> Questionnaire Parts:
The questionnaire consists of four parts each with several sections: Part 1: Socio – Economic Data: - Section 1: Household Roster - Section 2: Emigration - Section 3: Food Rations - Section 4: housing - Section 5: education - Section 6: health - Section 7: Physical measurements - Section 8: job seeking and previous job
Part 2: Monthly, Quarterly and Annual Expenditures: - Section 9: Expenditures on Non – Food Commodities and Services (past 30 days). - Section 10 : Expenditures on Non – Food Commodities and Services (past 90 days). - Section 11: Expenditures on Non – Food Commodities and Services (past 12 months). - Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days). - Section 12, Table 1: Meals Had Within the Residential Unit. - Section 12, table 2: Number of Persons Participate in the Meals within Household Expenditure Other Than its Members.
Part 3: Income and Other Data: - Section 13: Job - Section 14: paid jobs - Section 15: Agriculture, forestry and fishing - Section 16: Household non – agricultural projects - Section 17: Income from ownership and transfers - Section 18: Durable goods - Section 19: Loans, advances and subsidies - Section 20: Shocks and strategy of dealing in the households - Section 21: Time use - Section 22: Justice - Section 23: Satisfaction in life - Section 24: Food consumption during past 7 days
Part 4: Diary of Daily Expenditures: The expenditure diary is an essential component of this survey. It is left with the household to record all daily purchases, such as expenditures on food and frequent non-food items (gasoline, newspapers, etc.), over 7 days. Two pages were allocated for recording the expenditures of each day, so the roster consists of 14 pages.
----> Raw Data:
Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages: 1. Interviewer: Checks all answers on the household questionnaire, confirming that they are clear and correct. 2. Local Supervisor: Checks that the questions have been correctly completed. 3. Statistical analysis: After exporting data files from Excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or illogical values, in addition to auditing some variables. 4. World Bank consultants in coordination with the CSO data management team: The World Bank technical consultants use additional programs in SPSS and STAT to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected parameter for each variable.
----> Harmonized Data:
The Iraq Household Socio Economic Survey (IHSES) reached a total of 25488 households. The number of households that refused to respond was 305, and the response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%), while the lowest rate was in Sulaimaniya (92%).

https://www.datainsightsmarket.com/privacy-policy
The global Multivariate Analysis Software market is poised for significant expansion, projected to reach an estimated market size of USD 4,250 million in 2025, with a robust Compound Annual Growth Rate (CAGR) of 12.5% anticipated through 2033. This growth is primarily fueled by the increasing adoption of advanced statistical techniques across a wide spectrum of industries, including the burgeoning pharmaceutical sector, sophisticated chemical research, and complex manufacturing processes. The demand for data-driven decision-making, coupled with the ever-growing volume of complex datasets, is compelling organizations to invest in powerful analytical tools. Key drivers include the rising need for predictive modeling in drug discovery and development, quality control in manufacturing, and risk assessment in financial applications. Emerging economies, particularly in the Asia Pacific region, are also contributing to this upward trajectory as they invest heavily in technological advancements and R&D, further amplifying the need for sophisticated analytical solutions. The market is segmented by application into Medical, Pharmacy, Chemical, Manufacturing, and Marketing. The Pharmacy and Medical applications are expected to witness the highest growth owing to the critical need for accurate data analysis in drug efficacy studies, clinical trials, and personalized medicine. In terms of types, the market encompasses a variety of analytical methods, including Multiple Linear Regression Analysis, Multiple Logistic Regression Analysis, Multivariate Analysis of Variance (MANOVA), Factor Analysis, and Cluster Analysis. While advanced techniques like MANOVA and Factor Analysis are gaining traction for their ability to uncover intricate relationships within data, the foundational Multiple Linear and Logistic Regression analyses remain widely adopted. 
Restraints, such as the high cost of specialized software and the need for skilled personnel to effectively utilize these tools, are being addressed by the emergence of more user-friendly interfaces and cloud-based solutions. Leading companies like Hitachi High-Tech America, OriginLab Corporation, and Minitab are at the forefront, offering comprehensive suites that cater to diverse analytical needs. This report provides an in-depth analysis of the global Multivariate Analysis Software market, encompassing a study period from 2019 to 2033, with a base and estimated year of 2025 and a forecast period from 2025 to 2033, building upon historical data from 2019-2024. The market is projected to witness significant expansion, driven by increasing data complexity and the growing need for advanced analytical capabilities across various industries. The estimated market size for Multivariate Analysis Software is expected to reach $2.5 billion by 2025, with projections indicating a substantial growth to $5.8 billion by 2033, demonstrating a robust compound annual growth rate (CAGR) of approximately 11.5% during the forecast period.
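For readers who want to check growth figures of this kind, the compound annual growth rate follows from the start value, end value and number of years. The short Python sketch below applies the standard formula to the $2.5 billion (2025) and $5.8 billion (2033) figures quoted above; the function names are illustrative only.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1

def project(start_value, rate, years):
    """Value after compounding `rate` annually for `years` years."""
    return start_value * (1 + rate) ** years

# Figures quoted in the text: $2.5 billion in 2025, $5.8 billion in 2033.
implied = cagr(2.5, 5.8, 2033 - 2025)
print(f"Implied CAGR 2025-2033: {implied:.1%}")  # prints 11.1%

# Round trip: compounding the implied rate recovers the end value.
print(f"Projected 2033 value: ${project(2.5, implied, 8):.2f} billion")
```

The same two functions can be used to compare any pair of base-year and forecast-year figures against a stated growth rate.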

Officer Involved Shooting (OIS) Database and Statistical Analysis. Data is updated after there is an officer involved shooting.
PIU#
Incident # - the number associated with either the incident or used as reference to store the items in our evidence rooms
Date of Occurrence Month - month the incident occurred (Note the year is labeled on the tab of the spreadsheet)
Date of Occurrence Day - day of the month the incident occurred (Note the year is labeled on the tab of the spreadsheet)
Time of Occurrence - time the incident occurred
Address of incident - the location the incident occurred
Division - the LMPD division in which the incident actually occurred
Beat - the LMPD beat in which the incident actually occurred
Investigation Type - the type of investigation (shooting or death)
Case Status - status of the case (open or closed)
Suspect Name - the name of the suspect involved in the incident
Suspect Race - the race of the suspect involved in the incident (W-White, B-Black)
Suspect Sex - the gender of the suspect involved in the incident
Suspect Age - the age of the suspect involved in the incident
Suspect Ethnicity - the ethnicity of the suspect involved in the incident (H-Hispanic, N-Not Hispanic)
Suspect Weapon - the type of weapon the suspect used in the incident
Officer Name - the name of the officer involved in the incident
Officer Race - the race of the officer involved in the incident (W-White, B-Black, A-Asian)
Officer Sex - the gender of the officer involved in the incident
Officer Age - the age of the officer involved in the incident
Officer Ethnicity - the ethnicity of the officer involved in the incident (H-Hispanic, N-Not Hispanic)
Officer Years of Service - the number of years the officer has been serving at the time of the incident
Lethal Y/N - whether or not the incident involved a death (Y-Yes, N-No, continued-pending)
Narrative - a description of what was determined from the investigation
Contact:
Carol Boyle
carol.boyle@louisvilleky.gov

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Regression analysis of the business environment well-being index is presented.

The aim of this study is to provide datasets for teaching and testing the methods embedded in the Advanced Statistical Analysis. For each datafile, there is an accompanying document describing (i) which models could be run and tested with this particular data and (ii) the steps for doing so.

Descriptive and statistical analysis of the mental health measures regarding the course levels students are currently enrolled in (n = 2,437).

Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The “Students Performance Data” dataset provides academic and demographic information of students. It includes their marks in Maths, Science, and English along with attendance and city details. This dataset is ideal for beginners learning data entry, analysis, and visualization using tools like Excel or Kaggle Notebooks.

Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This is the experimental fostering care publication, comprising datasets.
Source agency: Office for Standards in Education, Children's Services and Skills
Designation: Experimental Official Statistics
Language: English
Alternative title: Experimental statistics: fostering care datasets

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: SPEED Stat is a new spreadsheet program for univariate statistical analyses, focused on the dominant profile of agricultural experimentation. The program can perform analysis of variance; tests for normality, homoscedasticity, additivity, and outliers; complex contrasts; multiple comparison tests; Scott-Knott's grouping analysis; regression analysis; and others. It is available at speedstatsoftware.wordpress.com.

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Replication data for the article Eady, Gregory (2016) "The Statistical Analysis of Misreporting on Sensitive Survey Questions"

This page lists ad-hoc statistics released during the period April - June 2020. These are additional analyses not included in any of the Department for Digital, Culture, Media and Sport’s standard publications.
If you would like any further information please contact evidence@culture.gov.uk.
These are experimental estimates of the quarterly GVA in chained volume measures by DCMS sectors and subsectors between 2010 and 2018, which have been produced to help the department estimate the effect of shocks to the economy. Due to substantial revisions to the base data and methodology used to construct the tourism satellite account, estimates for the tourism sector are only available for 2017. For this reason “All DCMS Sectors” excludes tourism. Further, as chained volume measures are not available for Civil Society at present, this sector is also not included.
The methods used to produce these estimates are experimental. The data here are not comparable to those published previously and users should refer to the annual reports for estimates of GVA by businesses in DCMS sectors.
GVA generated by businesses in DCMS sectors (excluding Tourism and Civil Society) increased by 31.0% between the fourth quarters of 2010 and 2018. The UK economy grew by 16.7% over the same period.
All individual DCMS sectors (excluding Tourism and Civil Society) grew faster than the UK average between quarter 4 of 2010 and 2018, apart from the Telecoms sector, which decreased by 10.1%.
MS Excel Spreadsheet, 57KB
This data shows the proportion of the total turnover in DCMS sectors in 2017 that was generated by businesses according to individual businesses turnover, and by the number of employees.
In 2017 a larger share of total turnover was generated by DCMS sector businesses with an annual turnover of less than one million pounds (11.4%) than the UK average (8.6%). In general, individual DCMS sectors tended to have a higher proportion of total turnover generated by businesses with individual turnover of less than one million pounds, with the exception of the Gambling (0.2%), Digital (8.2%) and Telecoms (2.0%, wholly within Digital) sectors.
DCMS sectors tended to have a higher proportion of total turnover generated by large (250 employees or more) businesses (57.8%) than the UK average (51.4%). The exceptions were the Creative Industries (41.7%) and the Cultural sector (42.4%). Of all DCMS sectors, the Gambling sector had the highest proportion of total turnover generated by large businesses (97.5%).
MS Excel Spreadsheet, 43.4KB

This collection automatically captures metadata, the source of which is the GOVERNMENT OF THE REPUBLIC OF SLOVENIA STATISTICAL USE OF THE REPUBLIC OF SLOVENIA and corresponding to the source collection entitled “Statutory population and feminite index, statistical regions, Slovenia, half-yearly”.
The actual data are available in Px-Axis format (.px). Additional links give access to the source portal page for viewing and selecting data, and to the PX-Win program, which can be downloaded free of charge. Both allow you to select data for display, change the format of the printout, store it in different formats, view and print tables of unlimited size, and produce some basic statistical analyses and graphics.

Abstract: Structural design procedures are based on simplified hypotheses that attempt to approximate the actual behaviour. Depending on the adopted hypothesis, the design procedure may not satisfactorily describe the actual structural behaviour. Such a condition occurs in the design of reinforced concrete pipes, where there are uncertainties related especially to the internal forces and the installation type of the pipe. Moreover, the main design hypothesis is that the cross section is plane and perpendicular to the deformed axis. Based on principles of strength of materials, it is known that this hypothesis is unsatisfactory for pipes with an aspect ratio lower than ten. Note that commercial reinforced concrete pipes usually have an aspect ratio well below ten. In light of the foregoing, the main objective of this paper is to analyse the accuracy of the design procedure for reinforced concrete pipes. To that end, statistical processes were used to compare design values with experimental results. The comparisons in this paper showed that the design procedure results in oversized pipes.

Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
CO2 sorption in physical solvents is one of the promising approaches for carbon capture from highly concentrated CO2 streams at high pressures. Identifying an efficient solvent and evaluating its solubility data at different operating conditions are highly essential for effective capture, which generally involves expensive and time-consuming experimental procedures. This work presents a machine learning based ultrafast alternative for accurate prediction of CO2 solubility in physical solvents using their physical, thermodynamic, and structural properties data. First, a database is established with which several linear, nonlinear, and ensemble models were trained through a systematic cross-validation and grid search method and found that kernel ridge regression (KRR) is the optimum model. Second, the descriptors are ranked based on their complete decomposition contributions derived using principal component analysis. Further, optimum key descriptors (KDs) are evaluated through an iterative sequential addition method with the objective of maximizing the prediction accuracy of the reduced order KRR (r-KRR) model. Finally, the study resulted in the r-KRR model with nine KDs exhibiting the highest prediction accuracy with a minimum root-mean-square error (0.0023), mean absolute error (0.0016), and maximum R2 (0.999). Also, the validity of the database created and ML models developed is ensured through detailed statistical analysis.

Uranium (U) is a radiologically and chemically toxic element that occurs naturally in water, soil, and rock at generally low levels. However, anthropogenic uranium can also leach into groundwater sources due to mining, ore refining, and improper nuclear waste management. Over the last few decades, various methods for measuring uranium have emerged; however, most of these techniques require skilled scientists to run samples on expensive instrumentation for detection or require the pretreatment of samples in complex procedures. In this work, a Schiff base ligand (P1) is used to develop a simple spectrophotometric method for measuring the concentration of uranium (VI) with an accurate and affordable light-emitting diode (LED) spectrophotometer. A test for a higher-order polynomial relationship was used to objectively determine the calibration data’s linearity. This test was done with a Python program on a Raspberry Pi computer that captured the spectrophotometer’s calibration and sample measurement data.

The AnalysisToolbox is a set of Python tools for statistically analyzing correlated data. This includes aspects of lattice QCD applications related to QCD phenomenology. We advertise briefly here some features of the AnalysisToolbox: General statistics: Jackknife, bootstrap, Gaussian bootstrap, error propagation, estimate integrated autocorrelation time, and curve fitting with and without Bayesian priors. We stress that these methods are useful generally, independent of physics contexts. QCD physics: Hadron resonance gas model, HotQCD equation of state, QCD beta function, physical constants, and critical exponents for various universality classes. These methods are useful for QCD phenomenology, independent of lattice contexts. Lattice QCD: Continuum-limit extrapolation, Polyakov loop observables, SU(3) gauge fields, reading in gauge fields, and the static quark-antiquark potential. These methods rather target lattice QCD.
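To give a flavour of the general-statistics methods listed above, here is a minimal leave-one-out jackknife in plain Python. It does not use or reproduce the AnalysisToolbox API; the function name and signature are hypothetical. It also assumes independent samples, whereas correlated data would typically be averaged into bins before resampling.

```python
import statistics

def jackknife(estimator, data):
    """Leave-one-out jackknife estimate and error for estimator(data).

    Assumes the entries of `data` are independent samples.
    Returns (bias_corrected_estimate, jackknife_error).
    """
    n = len(data)
    # Re-evaluate the estimator with each sample removed in turn.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = statistics.fmean(leave_one_out)
    full = estimator(data)
    bias_corrected = n * full - (n - 1) * loo_mean
    error = ((n - 1) / n * sum((v - loo_mean) ** 2 for v in leave_one_out)) ** 0.5
    return bias_corrected, error

samples = [1.0, 2.0, 3.0, 4.0, 5.0]

# For the plain mean, the jackknife error equals the usual standard error.
mean, err = jackknife(statistics.fmean, samples)
print(f"mean = {mean:.4f} +/- {err:.4f}")

# The same routine handles nonlinear functions of the data, e.g. the mean squared.
value, value_err = jackknife(lambda d: statistics.fmean(d) ** 2, samples)
print(f"mean^2 = {value:.4f} +/- {value_err:.4f}")
```

The appeal of the jackknife is that error propagation through an arbitrary estimator needs no analytic derivatives, only repeated evaluation on reduced samples.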