Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pen-and-paper homework and project-based learning are both commonly used instructional methods in introductory statistics courses. However, few studies have compared these two methods directly. In this case study, each method was used in two different sections of the same introductory statistics course at a regional state university. Students’ statistical literacy was measured by exam scores across the course, including the final. The comparison of the two instructional methods uses descriptive statistics and two-sample t-tests, as well as the authors’ reflections on the instructional methods. Results indicated no statistically discernible difference between the two instructional methods in the introductory statistics course.
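As a rough illustration of the kind of two-sample comparison described above, the sketch below computes a t statistic for two sections' exam scores. The abstract does not say which variant of the t-test was used; Welch's unequal-variance form is an assumption here, and the scores are made-up placeholders, not data from the study.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance uses the n - 1 (sample) denominator.
    sq_se_a = statistics.variance(sample_a) / n_a
    sq_se_b = statistics.variance(sample_b) / n_b
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t = diff / math.sqrt(sq_se_a + sq_se_b)
    df = (sq_se_a + sq_se_b) ** 2 / (
        sq_se_a ** 2 / (n_a - 1) + sq_se_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical exam scores for illustration only (NOT from the study).
homework_section = [72, 85, 78, 90, 66, 81, 77, 88]
project_section = [75, 80, 70, 92, 68, 84, 79, 86]

t_stat, dof = welch_t(homework_section, project_section)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

A small absolute t value relative to the critical value for the computed degrees of freedom would correspond to the "no statistically discernible difference" wording used in the abstract.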
A data set of cross-nationally comparable microdata samples for 15 Economic Commission for Europe (ECE) countries (Bulgaria, Canada, Czech Republic, Estonia, Finland, Hungary, Italy, Latvia, Lithuania, Romania, Russia, Switzerland, Turkey, UK, USA) based on the 1990 national population and housing censuses in countries of Europe and North America to study the social and economic conditions of older persons. These samples have been designed to allow research on a wide range of issues related to aging, as well as on other social phenomena. A common set of nomenclatures and classifications, derived on the basis of a study of census data comparability in Europe and North America, was adopted as a standard for recoding. This series was formerly called Dynamics of Population Aging in ECE Countries. The recommendations regarding the design and size of the samples drawn from the 1990 round of censuses envisaged: (1) drawing individual-based samples of about one million persons; (2) progressive oversampling with age in order to ensure sufficient representation of various categories of older people; and (3) retaining information on all persons co-residing in the sampled individual's dwelling unit. Estonia, Latvia and Lithuania provided the entire population over age 50, while Finland sampled it with progressive over-sampling. Canada, Italy, Russia, Turkey, UK, and the US provided samples that had not been drawn specially for this project, and cover the entire population without over-sampling. Given its wide user base, the US 1990 PUMS was not recoded. Instead, PAU offers mapping modules, which recode the PUMS variables into the project's classifications, nomenclatures, and coding schemes.
Because of the high sampling density, these data cover various small groups of older people; contain as much geographic detail as possible under each country's confidentiality requirements; include more extensive information on housing conditions than many other data sources; and provide information for a number of countries whose data were not accessible until recently.
Data Availability: Eight of the fifteen participating countries have signed the standard data release agreement making their data available through NACDA/ICPSR (see links below). Hungary and Switzerland require a clearance to be obtained from their national statistical offices for the use of microdata; however, the documents signed between the PAU and these countries include clauses stipulating that, in general, all scholars interested in social research will be granted access. Russia requested that certain provisions for archiving the microdata samples be removed from its data release arrangement. The PAU has an agreement with several British scholars to facilitate access to the 1991 UK data through collaborative arrangements. Statistics Canada and the Italian Institute of Statistics (ISTAT) provide access to data from Canada and Italy, respectively.
* Dates of Study: 1989-1992
* Study Features: International, Minority Oversamples
* Sample Size: Approx. 1 million/country
Links:
* Bulgaria (1992), http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/02200
* Czech Republic (1991), http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/06857
* Estonia (1989), http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/06780
* Finland (1990), http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/06797
* Romania (1992), http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/06900
* Latvia (1989), http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/02572
* Lithuania (1989), http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/03952
* Turkey (1990), http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/03292
* U.S. (1990), http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/06219
BACKGROUND
The data contained in the compressed file has been extracted from the Marketing Carrier On-Time Performance (Beginning January 2018) data table of the "On-Time" database from the TranStats data library. The time period is indicated in the name of the compressed file; for example, XXX_XXXXX_2001_1 contains data of the first month of the year 2001.
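The naming convention above can be read mechanically. The following is a minimal sketch, assuming the file name always ends in `_<year>_<month>` as in the example; the helper name `parse_period` is hypothetical and not part of TranStats.

```python
def parse_period(archive_name):
    """Split a name of the form '<prefix>_<year>_<month>' into (year, month)."""
    # rsplit from the right so underscores inside the prefix are left alone.
    _prefix, year_text, month_text = archive_name.rsplit("_", 2)
    year, month = int(year_text), int(month_text)
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {archive_name!r}")
    return year, month


print(parse_period("XXX_XXXXX_2001_1"))  # (2001, 1)
```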
RECORD LAYOUT
Below are fields in the order that they appear on the records:
Year: Year
Quarter: Quarter (1-4)
Month: Month
DayofMonth: Day of Month
DayOfWeek: Day of Week
FlightDate: Flight Date (yyyymmdd)
Marketing_Airline_Network: Unique Marketing Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years.
Operated_or_Branded_Code_Share_Partners: Reporting Carrier Operated or Branded Code Share Partners
DOT_ID_Marketing_Airline: An identification number assigned by US DOT to identify a unique airline (carrier). A unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation.
IATA_Code_Marketing_Airline: Code assigned by IATA and commonly used to identify a carrier. As the same code may have been assigned to different carriers over time, the code is not always unique. For analysis, use the Unique Carrier Code.
Flight_Number_Marketing_Airline: Flight Number
Originally_Scheduled_Code_Share_Airline: Unique Scheduled Operating Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years.
DOT_ID_Originally_Scheduled_Code_Share_Airline: An identification number assigned by US DOT to identify a unique airline (carrier). A unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation.
IATA_Code_Originally_Scheduled_Code_Share_Airline: Code assigned by IATA and commonly used to identify a carrier. As the same code may have been assigned to different carriers over time, the code is not always unique. For analysis, use the Unique Carrier Code.
Flight_Num_Originally_Scheduled_Code_Share_Airline: Flight Number
Operating_Airline: Unique Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years.
DOT_ID_Operating_Airline: An identification number assigned by US DOT to identify a unique airline (carrier). A unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation.
IATA_Code_Operating_Airline: Code assigned by IATA and commonly used to identify a carrier. As the same code may have been assigned to different carriers over time, the code is not always unique. For analysis, use the Unique Carrier Code.
Tail_Number: Tail Number
Flight_Number_Operating_Airline: Flight Number
OriginAirportID: Origin Airport, Airport ID. An identification number assigned by US DOT to identify a unique airport. Use this field for airport analysis across a range of years because an airport can change its airport code and airport codes can be reused.
OriginAirportSeqID: Origin Airport, Airport Sequence ID. An identification number assigned by US DOT to identify a unique airport at a given point of time. Airport attributes, such as airport name or coordinates, may change over time.
OriginCityMarketID: Origin Airport, City Market ID. City Market ID is an identification number assigned by US DOT to identify a city market. Use this field to consolidate airports serving the same city market.
Origin: Origin Airport
OriginCityName: Origin Airport, City Name
OriginState: Origin Airport, State Code
OriginStateFips: Origin Airport, State Fips
OriginStateName: Origin Airport, State Name
OriginWac: Origin Airport, World Area Code
DestAirportID: Destination Airport, Airport ID. An identification number assigned by US DOT to identify a unique airport. Use this field for airport analysis across a range of years because an airport can change its airport code and airport codes can be reused.
DestAirportSeqID: Destination Airport, Airport Sequence ID. An identification number assigned by US DOT to identify a unique airport at a given point of time. Airport attributes, such as airport name or coordinates, may change over time.
DestCityMarketID: Destination Airport, City Market ID. City Market ID is an identification number assigned by US DOT to identify a city market. Use this field to consolidate airports serving the same city market.
Dest: Destination Airport
DestCityName: Destination Airport, City Name
DestState: Destination Airport, State Code
DestStateFips: D...
The basic goal of this survey is to provide the necessary database for formulating national policies at various levels. It represents the contribution of the household sector to the Gross National Product (GNP). Household Surveys help as well in determining the incidence of poverty, and providing weighted data which reflects the relative importance of the consumption items to be employed in determining the benchmark for rates and prices of items and services. Generally, the Household Expenditure and Consumption Survey is a fundamental cornerstone in the process of studying the nutritional status in the Palestinian territory.
The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality. Data is a public good, in the interest of the region, and it is consistent with the Economic Research Forum's mandate to make micro data available, aiding regional research on this important topic.
The survey data covers urban, rural and camp areas in the West Bank and Gaza Strip.
1- Household/families. 2- Individuals.
The survey covered all Palestinian households whose usual residence is in the Palestinian Territory.
Sample survey data [ssd]
The sampling frame consists of all enumeration areas which were enumerated in 1997; the enumeration area consists of buildings and housing units and is composed of an average of 120 households. The enumeration areas were used as Primary Sampling Units (PSUs) in the first stage of the sampling selection. The enumeration areas of the master sample were updated in 2003.
The sample is a stratified cluster systematic random sample with two stages: First stage: selection of a systematic random sample of 299 enumeration areas. Second stage: selection of a systematic random sample of 12-18 households from each enumeration area selected in the first stage. A person (18 years and more) was selected from each household in the second stage.
The population was divided by: 1- Governorate 2- Type of Locality (urban, rural, refugee camps)
The calculated sample size is 3,781 households.
The target cluster size or "sample-take" is the average number of households to be selected per PSU. In this survey, the sample take is around 12 households.
Detailed information/formulas on the sampling design are available in the user manual.
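The two-stage selection described above can be sketched as follows. This is only an illustration under stated assumptions: the frame size of 3,000 enumeration areas is hypothetical, every area is given exactly 120 households (the source quotes this as an average), the take is fixed at 12 households, and stratification by governorate and locality type is omitted. The helper `systematic_sample` is not taken from the survey documentation.

```python
import random


def systematic_sample(units, n, rng):
    """Pick n units at a fixed interval after a random start."""
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and len(units)")
    step = len(units) / n            # sampling interval (may be fractional)
    start = rng.random() * step      # random start within the first interval
    last = len(units) - 1
    # min() guards against floating-point overshoot on the final index.
    return [units[min(int(start + i * step), last)] for i in range(n)]


rng = random.Random(2004)  # fixed seed so the sketch is repeatable

# Stage 1: 299 enumeration areas (PSUs) from a hypothetical frame of 3,000.
frame = [f"EA{i:04d}" for i in range(3000)]
psus = systematic_sample(frame, 299, rng)

# Stage 2: 12 households from each selected area of (assumed) 120 households.
households = {
    ea: systematic_sample(list(range(1, 121)), 12, rng) for ea in psus
}

print(len(psus), sum(len(h) for h in households.values()))  # 299 3588
```

With a fixed take of 12 the sketch yields 3,588 households; the survey's reported 3,781 reflects takes that vary between 12 and 18 per area.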
Face-to-face [f2f]
The PECS questionnaire consists of two main sections:
First section: Some items of the form are filled in at the beginning of the month, and the remainder at the end of the month. The questionnaire includes the following items:
Cover sheet: Contains detailed particulars of the family, the date of visit, particulars of the field/office work team, and the number and sex of the family members.
Statement of the family members: Contains social, economic and demographic particulars of the selected family.
Statement of the long-lasting commodities and income generation activities: Includes a number of basic and indispensable items (e.g., livestock or agricultural land).
Housing Characteristics: Includes information and data pertaining to the housing conditions, including type of shelter, number of rooms, ownership, rent, water, electricity supply, connection to the sewer system, source of cooking and heating fuel, and remoteness/proximity of the house to education and health facilities.
Monthly and Annual Income: Data pertaining to the income of the family is collected from different sources at the end of the registration / recording period.
Second section: The second section of the questionnaire includes a list of 54 consumption and expenditure groups, itemized and serially numbered according to their importance to the family. Each of these groups contains important commodities. Across all groups, the total number of items stood at 667 commodities and services. Groups 1-21 include food, drink, and cigarettes. Group 22 includes homemade commodities. Groups 23-45 include all items except for food, drink and cigarettes. Groups 50-54 include all of the long-lasting commodities. Data on each of these groups was collected over different intervals of time so as to reflect expenditure over a period of one full year.
Both data entry and tabulation were performed using the ACCESS and SPSS software programs. The data entry process was organized in 6 files, corresponding to the main parts of the questionnaire. A data entry template was designed to reflect an exact image of the questionnaire, and included various electronic checks: logical check, range checks, consistency checks and cross-validation. Complete manual inspection was made of results after data entry was performed, and questionnaires containing field-related errors were sent back to the field for corrections.
The survey sample consists of about 3,781 households interviewed over a twelve-month period between January 2004 and January 2005. There were 3,098 households that completed the interview, of which 2,060 were in the West Bank and 1,038 were in the Gaza Strip. The response rate was 82% in the Palestinian Territory.
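The figures above can be checked with simple arithmetic. The sketch below treats the response rate as completed interviews divided by sampled households, which is an assumption; the official calculation may adjust for ineligible or vacant units.

```python
sampled_households = 3781
completed_west_bank = 2060
completed_gaza_strip = 1038

# The two regional counts should add up to the reported total of 3,098.
completed_total = completed_west_bank + completed_gaza_strip

response_rate = completed_total / sampled_households
print(completed_total, f"{response_rate:.1%}")  # 3098 81.9%
```

Rounded to a whole percent, this matches the reported 82%.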
The calculations of standard errors for the main survey estimations enable the user to identify the accuracy of estimations and the survey reliability. Total errors of the survey can be divided into two kinds: statistical errors, and non-statistical errors. Non-statistical errors are related to the procedures of statistical work at different stages, such as the failure to explain questions in the questionnaire, unwillingness or inability to provide correct responses, bad statistical coverage, etc. These errors depend on the nature of the work, training, supervision, and conducting all various related activities. The work team spared no effort at different stages to minimize non-statistical errors; however, it is difficult to estimate numerically such errors due to absence of technical computation methods based on theoretical principles to tackle them. On the other hand, statistical errors can be measured. Frequently they are measured by the standard error, which is the positive square root of the variance. The variance of this survey has been computed by using the “programming package” CENVAR.
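To make the relationship between variance and standard error concrete, here is a minimal sketch. It does not reproduce CENVAR; the estimate and variance are hypothetical numbers, and the 95% interval based on a normal approximation (z = 1.96) is an added assumption rather than something stated in the survey documentation.

```python
import math


def standard_error(variance):
    """Standard error is the positive square root of the variance."""
    if variance < 0:
        raise ValueError("variance cannot be negative")
    return math.sqrt(variance)


def confidence_interval(estimate, variance, z=1.96):
    """Normal-approximation interval around an estimate (assumed method)."""
    se = standard_error(variance)
    return estimate - z * se, estimate + z * se


# Hypothetical estimate and variance, for illustration only.
low, high = confidence_interval(estimate=100.0, variance=4.0)
print(standard_error(4.0), round(low, 2), round(high, 2))
```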
Since the beginning of the 1960s, Statistics Sweden, in collaboration with various research institutions, has carried out follow-up surveys in the school system. These surveys have taken place within the framework of the IS project (Individual Statistics Project) at the University of Gothenburg and the UGU project (Evaluation through follow-up of students) at the University of Teacher Education in Stockholm, which since 1990 have been merged into a research project called 'Evaluation through Follow-up'. The follow-up surveys are part of the central evaluation of the school and are based on large nationally representative samples from different cohorts of students.
Evaluation through follow-up (UGU) is one of the country's largest research databases in the field of education. UGU is part of the central evaluation of the school and is based on large nationally representative samples from different cohorts of students. The longitudinal database contains information on nationally representative samples of school pupils from ten cohorts, born between 1948 and 2004. The sampling process was based on the student's birthday for the first two and on the school class for the other cohorts.
For each cohort, data of mainly two types are collected. School administrative data is collected annually by Statistics Sweden during the time that pupils are in the general school system (primary and secondary school), for most cohorts starting in compulsory school year 3. This information is provided by the school offices and, among other things, includes characteristics of school, class, special support, study choices and grades. Information obtained has varied somewhat, e.g. due to changes in curricula. A more detailed description of this data collection can be found in reports published by Statistics Sweden and linked to datasets for each cohort.
Survey data from the pupils are collected for the first time in compulsory school year 6 (for most cohorts). The year 6 questionnaire includes questions related to self-perception and interest in learning, attitudes to school, hobbies, school motivation and future plans. For some cohorts, questionnaire data are also collected in year 3 and year 9 in compulsory school and in upper secondary school.
Furthermore, results from various intelligence tests and standardized knowledge tests are included in the data collection in year 6. The intelligence tests have been identical for all cohorts (except the cohort born in 1987, from which questionnaire data were first collected in year 9). The intelligence test consists of a verbal, a spatial and an inductive test, each containing 40 tasks and specially designed for the UGU project. The verbal test is a vocabulary test of the opposites type. The spatial test is a so-called ‘sheet metal folding test’ and the inductive test is made up of series of numbers. The reliability of the test, intercorrelations and connection with school grades are reported by Svensson (1971).
For the first three cohorts (1948, 1953 and 1967), the standardized knowledge tests in year 6 consist of the standard tests in Swedish, mathematics and English that, up to and including the beginning of the 1980s, were offered to all pupils in compulsory school year 6. For the 1972 cohort, specially prepared tests in reading and mathematics were used. The reading test consists of 27 tasks and aimed to identify students with reading difficulties. The mathematics test, which was also offered to the fifth cohort (1977), includes 19 assignments. A revised version of the test was used for the cohort born in 1982, because the previously used test was judged to be somewhat too simple. Results on the mathematics test are not available for the 1987 cohort. The mathematics test was not offered to the students in the 1992 cohort, as the test did not seem to fully correspond with current curriculum intentions in mathematics. For further information, see the description of the dataset for each cohort.
For several of the samples, questionnaires were also collected from the students' parents and teachers in year 6. The teacher questionnaire contains questions about the teacher, class size and composition, the teacher's assessments of the class' knowledge level, etc., school resources, working methods and parental involvement and questions about the existence of evaluations. The questionnaire for the guardians includes questions about the child's upbringing conditions, ambitions and wishes regarding the child's education, views on the school's objectives and the parents' own educational and professional situation.
The students are followed up even after they have left primary school. Among other things, data collection is done during the time they are in high school. School administrative data are then collected, e.g. choice of upper secondary school line / program and grades after completing studies. For some of the cohorts, in addition to school administrative data, questionnaire data were also collected from the students.
The sample consisted of students born on the 5th, 15th and 25th of any month in 1953, a total of 10,723 students.
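The birthday-based selection rule can be expressed as a simple predicate. The sketch below is illustrative only; the function name `in_1953_sample` is hypothetical, and the share it prints is the fraction of calendar days covered by the rule, not the fraction of students actually sampled.

```python
from datetime import date, timedelta

SAMPLE_DAYS = {5, 15, 25}  # days of the month named in the description


def in_1953_sample(birth_date):
    """True if the birth date falls on a sampled day of any month in 1953."""
    return birth_date.year == 1953 and birth_date.day in SAMPLE_DAYS


# Enumerate every calendar day of 1953 (not a leap year, so 365 days).
days_1953 = [date(1953, 1, 1) + timedelta(days=i) for i in range(365)]
selected_days = [d for d in days_1953 if in_1953_sample(d)]

print(len(selected_days), f"{len(selected_days) / len(days_1953):.1%}")  # 36 9.9%
```

If births were spread evenly over the year, the rule would therefore pick roughly one in ten pupils of the cohort.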
The data obtained in 1966 were: 1. School administrative data (school form, class type, year and grades). 2. Information about the parents' profession and education, number of siblings, the distance between home and school, etc.
This information was collected for 93% of all those born on the days in question. The shortfall is due to reduced resources at Statistics Sweden for follow-up work (reminders etc.). Annual data for the 1953 cohort were collected by Statistics Sweden up to and including academic year 1972/73.
The response rate for test and questionnaire data is 88%. Standard test results were received for just over 85% of those who took the tests.
The sample included a total of 9955 students, for whom some form of information was obtained.
Part of the "Individual Statistics Project" together with cohort 1953.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Key Table Information
Table Title: Manufacturing: E-Commerce Statistics for the U.S.: 2022
Table ID: ECNECOMM2022.EC2231ECOMM
Survey/Program: Economic Census
Year: 2022
Dataset: ECN Core Statistics Manufacturing: E-Commerce Statistics for the U.S.: 2022
Release Date: 2025-01-23
Release Schedule: The Economic Census occurs every five years, in years ending in 2 and 7. The data in this file come from the 2022 Economic Census data files released on a flow basis starting in January 2024 with First Look Statistics. Preliminary U.S. totals released in January 2024 are superseded with final data shown in the releases of later economic census statistics through March 2026. For more information about economic census planned data product releases, see 2022 Economic Census Release Schedule.
Dataset Universe: The dataset universe consists of all establishments that are in operation for at least some part of 2022, are located in one of the 50 U.S. states, associated offshore areas, or the District of Columbia, have paid employees, and are classified in one of nineteen in-scope sectors defined by the 2022 North American Industry Classification System (NAICS).
Methodology
Data Items and Other Identifying Records:
Sales, value of shipments, or revenue ($1,000)
E-Shipments value ($1,000)
E-Shipments as percent of total sales, value of shipments, or revenue (%)
Range indicating imputed percentage of total sales, value of shipments, or revenue
Definitions can be found by clicking on the column header in the table or by accessing the Economic Census Glossary.
Unit(s) of Observation: The reporting units for the economic census are employer establishments. An establishment is generally a single physical location where business is conducted or where services or industrial operations are performed. A company or firm is comprised of one or more in-scope establishments that operate under the ownership or control of a single organization. For some industries, the reporting units are instead groups of all establishments in the same industry belonging to the same firm.
Geography Coverage: The data are shown for the U.S. level only. For information about economic census geographies, including changes for 2022, see Geographies.
Industry Coverage: The data are shown at the 2- through 3-digit 2022 NAICS code levels for the U.S. For information about NAICS, see Economic Census Code Lists.
Sampling: The 2022 Economic Census sample includes all active operating establishments of multi-establishment firms and approximately 1.7 million single-establishment firms, stratified by industry and state. Establishments selected to the sample receive a questionnaire. For all data on this table, establishments not selected into the sample are represented with administrative data. For more information about the sample design, see 2022 Economic Census Methodology.
Confidentiality: The Census Bureau has reviewed this data product to ensure appropriate access, use, and disclosure avoidance protection of the confidential source data (Project No. 7504609, Disclosure Review Board (DRB) approval number: CBDRB-FY23-099). To protect confidentiality, the U.S. Census Bureau suppresses cell values to minimize the risk of identifying a particular business’ data or identity. To comply with disclosure avoidance guidelines, data rows with fewer than three contributing firms or three contributing establishments are not presented. Additionally, establishment counts are suppressed when other select statistics in the same row are suppressed. More information on disclosure avoidance is available in the 2022 Economic Census Methodology.
Technical Documentation/Methodology: For detailed information about the methods used to collect data and produce statistics, survey questionnaires, Primary Business Activity/NAICS codes, NAPCS codes, and more, see Economic Census Technical Documentation.
Weights: No weighting applied as establishments not sampled are represented with administrative data.
Table Information
FTP Download: https://www2.census.gov/programs-surveys/economic-census/data/2022/sector31/
API Information: Economic census data are housed in the Census Bureau Application Programming Interface (API).
Symbols:
D - Withheld to avoid disclosing data for individual companies; data are included in higher level totals
N - Not available or not comparable
S - Estimate does not meet publication standards because of high sampling variability, poor response quality, or other concerns about the estimate quality. Unpublished estimates derived from this table by subtraction are subject to these same limitations and should not be attributed to the U.S. Census Bureau. For a description of publication standards and the total quantity response rate, see link to program methodology page.
X - Not applicable
A - Relative standard error of 100% or more
r - Revised
s - Relative standard error exceeds 40%
For a complete list of symbols, see Economic Census Data Dictionary.
Data-Specific Notes: Data users who create their own es...
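The row-level threshold mentioned under Confidentiality (rows with fewer than three contributing firms or three contributing establishments are not presented) can be sketched as a simple filter. This is an illustration only: the row labels and counts are hypothetical, and the Census Bureau's actual disclosure avoidance process involves further steps that are not modelled here.

```python
MIN_CONTRIBUTORS = 3  # threshold quoted in the Confidentiality note


def is_presentable(firms, establishments):
    """A row is shown only if both counts reach the threshold."""
    return firms >= MIN_CONTRIBUTORS and establishments >= MIN_CONTRIBUTORS


# Hypothetical rows for illustration; not taken from the table.
rows = [
    {"label": "row A", "firms": 2, "establishments": 5},
    {"label": "row B", "firms": 3, "establishments": 3},
    {"label": "row C", "firms": 10, "establishments": 2},
    {"label": "row D", "firms": 40, "establishments": 55},
]

presented = [
    r["label"] for r in rows if is_presentable(r["firms"], r["establishments"])
]
print(presented)  # ['row B', 'row D']
```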
As part of PCBS' efforts to provide official Palestinian statistics on the different aspects of life in Palestinian society, and because of the wide spread of computers, the Internet and mobile phones among the Palestinian people and the important role they may play in spreading knowledge and culture and in shaping public opinion, PCBS conducted the Household Survey on Information and Communications Technology, 2014.
The main objective of this survey is to provide statistical data on information and communication technology in Palestine, in addition to providing data on the following:
· Prevalence of computers and access to the Internet.
· Penetration and purpose of technology use.
Palestine (West Bank and Gaza Strip), type of locality (urban, rural, refugee camps) and governorate.
Household. Person 10 years and over.
All Palestinian households and individuals whose usual place of residence is in Palestine, with a focus on persons aged 10 years and over in 2014.
Sample survey data [ssd]
Sampling Frame The sampling frame consists of a list of enumeration areas adopted in the Population, Housing and Establishments Census of 2007. Each enumeration area has an average size of about 124 households. These were used in the first phase as Primary Sampling Units in the process of selecting the survey sample.
Sample Size The total sample size of the survey was 7,268 households, of which 6,000 responded.
Sample Design The sample is a stratified clustered systematic random sample. The design comprised three phases:
Phase I: Random sample of 240 enumeration areas. Phase II: Selection of 25 households from each enumeration area selected in phase one using systematic random selection. Phase III: Selection of an individual (10 years or more) in the field from the selected households; KISH TABLES were used to ensure indiscriminate selection.
Sample Strata Distribution of the sample was stratified by: 1- Governorate (16 governorates, J1). 2- Type of locality (urban, rural and camps).
Face-to-face [f2f]
The survey questionnaire consists of identification data, quality controls and three main sections: Section I: Data on household members that include identification fields, the characteristics of household members (demographic and social) such as the relationship of individuals to the head of household, sex, date of birth and age.
Section II: Household data include information regarding computer processing, access to the Internet, and possession of various media and computer equipment. This section includes information on topics related to the use of computer and Internet, as well as supervision by households of their children (5-17 years old) while using the computer and Internet, and protective measures taken by the household in the home.
Section III: Data on persons (aged 10 years and over) about computer use, access to the Internet and possession of a mobile phone.
Preparation of Data Entry Program: This stage included preparation of the data entry programs using an ACCESS package and defining data entry control rules to avoid errors, plus validation inquiries to examine the data after it had been captured electronically.
Data Entry: The data entry process started on 8 May 2014 and ended on 23 June 2014. The data entry took place at the main PCBS office and in field offices using 28 data clerks.
Editing and Cleaning procedures: Several measures were taken to avoid non-sampling errors. These included editing of questionnaires before data entry to check field errors, using a data entry application that does not allow mistakes during the process of data entry, and then examining the data by using frequency and cross tables. This ensured that data were error free; cleaning and inspection of the anomalous values were conducted to ensure harmony between the different questions on the questionnaire.
Response rate: 79%
There are many aspects of the concept of data quality; this includes the initial planning of the survey to the dissemination of the results and how well users understand and use the data. There are three components to the quality of statistics: accuracy, comparability, and quality control procedures.
Checks on data accuracy cover many aspects of the survey and include statistical errors due to the use of a sample, non-statistical errors resulting from field workers or survey tools, and response rates and their effect on estimations. This section includes:
Statistical Errors: Data from this survey may be affected by statistical errors due to the use of a sample rather than a complete enumeration. Therefore, certain differences can be expected in comparison with the real values obtained through censuses. Variances were calculated for the most important indicators.
Variance calculations revealed that there is no problem in disseminating results nationally or regionally (the West Bank, Gaza Strip), but some indicators show high variance by governorate, as noted in the tables of the main report.
Non-Statistical Errors: Non-statistical errors are possible at all stages of the project, during data collection or processing. These include non-response errors, response errors, interviewing errors and data entry errors. To avoid errors and reduce their effects, strenuous efforts were made to train the field workers intensively. They were trained on how to carry out the interview, what to discuss and what to avoid, and the training course combined practical and theoretical sessions. Training manuals were provided for each section of the questionnaire, along with practical exercises in class and instructions on how to approach respondents to reduce refusals. Data entry staff were trained on the data entry program, which was tested before starting the data entry process.
The sources of non-statistical errors can be summarized as: 1. Some households were not at home and could not be interviewed, and some households refused to be interviewed. 2. In isolated cases, errors occurred because of the way interviewers asked the questions, and respondents misunderstood some of the questions.
The dataset contains the analytical results for environmental and quality-control replicate sample sets and the computed relative percent differences (RPD) greater than 25 percent for the data collected during the surface-water sampling for the Triangle Area Water Supply Monitoring Project. The data are from samples collected during October 2017 through September 2019. Several study sites contained in this dataset were sampled for other USGS projects during the same time frame. Unless the samples at these sites were collected in conjunction with the Triangle Area Water Supply Monitoring Project, the data for other projects are not included in the dataset.
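As a rough illustration of how replicate pairs might be screened against the 25 percent threshold, the sketch below uses the conventional definition of relative percent difference (absolute difference divided by the mean of the two results, times 100). The dataset description does not spell out the formula, so that definition is an assumption, and the concentration values are made up.

```python
def relative_percent_difference(a: float, b: float) -> float:
    """RPD between an environmental result and its replicate, in percent.

    Assumes the conventional definition |a - b| / mean(a, b) * 100.
    """
    mean = (a + b) / 2.0
    if mean == 0:
        return 0.0  # both results are zero, so they agree exactly
    return abs(a - b) / abs(mean) * 100.0

# Made-up (environmental, replicate) pairs, for illustration only.
pairs = [(10.0, 10.5), (2.0, 3.0), (0.40, 0.41)]

THRESHOLD = 25.0  # percent, the cut-off named in the dataset description
flagged = [
    (a, b) for a, b in pairs
    if relative_percent_difference(a, b) > THRESHOLD
]
```

Only the second pair exceeds the threshold here, because a difference of 1.0 against a mean of 2.5 is 40 percent.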
https://creativecommons.org/publicdomain/zero/1.0/
This project is built on the AdventureWorks dataset, originally provided by Microsoft for SQL Server samples. This comprehensive dataset models a bicycle manufacturer and its sales to global markets, offering a realistic foundation for a data analytics portfolio.
The raw data can be accessed and downloaded directly from the official Microsoft GitHub repository: https://github.com/microsoft/sql-server-samples/tree/master/samples/databases/adventure-works
The work presented in this portfolio project demonstrates my end-to-end data analysis skills, from initial data cleaning and modeling to creating an interactive, insight-driven dashboard. Within this project, you will find examples of various data visualizations and a dashboard layout that follows the F-pattern for optimized user experience.
I encourage you to download the dataset and follow along with my analysis. Feel free to replicate my work, critique my methods, or build upon it with your own creative insights and improvements. Your feedback and engagement are very welcome!
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project assessed sewage contamination using human fecal indicators detected by molecular methods. For harbour and coastal water samples obtained from 18 cities across five continents (n=442), nearly half had evidence of sewage contamination using human fecal bacteria as molecular indicators. In contrast, traditional measures using cultured E. coli or enterococci were elevated in ~18% of the samples, with less than half of those confirmed for sewage contamination. Importantly, given the human health risk, loss of ecosystem services, and economic costs associated with contaminated coastal waters, more reliable methods are needed for quantifying sewage contamination in urban waterways.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.
This project provides a freely accessible three-dimensional statistical shape model (SSM) of the tibia, the MATLAB scripts for generating an SSM and the segmented surface models of the cortical and trabecular bone. Information on the use of code and data can be found in the read-me file contained within the download.
Further, this dataset and the associated statistical shape models can be used in several ways to assist with skeletal-focused research of the tibia-fibula. We do not have the scope to highlight every potential application, but we have provided a series of example cases of where and how the shape models may be used. Our hope is that these examples can be used directly or can help guide other uses.
Case 1: Generating Surface Samples — this example case demonstrates how to use the shape model data to reconstruct a randomly sampled 'population' of surfaces.
Case 2: Predicting and Generating Trabecular Volumes — this example case demonstrates how to combine the tibia and trabecular shape models to predict and generate the trabecular volume from a tibial surface.
Case 3: Generating Tibia-Fibula Surfaces from Landmarks — this example case demonstrates how to use the tibia-fibula shape model to estimate and reconstruct surfaces from palpable landmarks on the tibia and fibula.
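To give a feel for what drawing a random surface from a shape model involves (Case 1), here is a minimal Python sketch. It assumes a PCA-style model in which a shape is the mean plus a weighted sum of components, each scaled by the square root of its variance; the project's MATLAB scripts may work differently. The function name and the toy numbers are hypothetical.

```python
import random
from math import sqrt

def sample_shape(mean_shape, components, variances, rng, max_sd=3.0):
    """Draw one shape from a PCA-style shape model (assumed form).

    mean_shape: flat list of vertex coordinates (x1, y1, z1, x2, ...).
    components: list of component vectors, each as long as mean_shape.
    variances:  variance explained by each component.
    Scores are standard-normal draws clipped to +/- max_sd.
    """
    shape = list(mean_shape)
    for component, variance in zip(components, variances):
        score = max(-max_sd, min(max_sd, rng.gauss(0.0, 1.0)))
        weight = score * sqrt(variance)
        for i, value in enumerate(component):
            shape[i] += weight * value
    return shape

# Toy model: two vertices in 3D and a single component (illustrative only).
mean_shape = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
components = [[0.0, 0.0, 0.0, 1.0, 0.0, 0.0]]
variances = [0.04]

rng = random.Random(42)
population = [sample_shape(mean_shape, components, variances, rng) for _ in range(5)]
```

Clipping the scores keeps sampled surfaces within a plausible range of the training population; the limit of three standard deviations is a common convention, not a value taken from the project.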
Please cite our work if you use this code or data.
This project includes the following software/data packages:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These four labeled data sets are targeted at ordinal quantification. The goal of quantification is not to predict the label of each individual instance, but the distribution of labels in unlabeled sets of data.
With the scripts provided, you can extract CSV files from the UCI machine learning repository and from OpenML. The ordinal class labels stem from a binning of a continuous regression label.
We complement this data set with the indices of data items that appear in each sample of our evaluation. Hence, you can precisely replicate our samples by drawing the specified data items. The indices stem from two evaluation protocols that are well suited for ordinal quantification. To this end, each row in the files app_val_indices.csv, app_tst_indices.csv, app-oq_val_indices.csv, and app-oq_tst_indices.csv represents one sample.
Our first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with an equal probability. The second protocol, APP-OQ, is a variant thereof, where only the smoothest 20% of all APP samples are considered. This variant is targeted at ordinal quantification tasks, where classes are ordered and a similarity of neighboring classes can be assumed.
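One way to picture the APP step is to draw a label distribution uniformly from the probability simplex and then draw items to match it. The Python sketch below does this by normalizing exponential draws, which is equivalent to a flat Dirichlet distribution. It is an illustration only: the authors' Julia implementation may sample differently, drawing with replacement is an assumption, and the function names are hypothetical. The APP-OQ smoothness filter is not reproduced because its exact definition is not given here.

```python
import random

def sample_prevalences(n_classes, rng):
    """Draw class prevalences uniformly from the probability simplex."""
    draws = [rng.expovariate(1.0) for _ in range(n_classes)]
    total = sum(draws)
    return [d / total for d in draws]

def draw_sample_indices(labels, prevalences, sample_size, rng):
    """Pick item indices so that classes follow the given prevalences.

    Assumes one prevalence per distinct label, in sorted label order,
    and samples items with replacement.
    """
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    classes = sorted(by_class)
    chosen = rng.choices(classes, weights=prevalences, k=sample_size)
    return [rng.choice(by_class[c]) for c in chosen]

rng = random.Random(0)
labels = [1, 1, 2, 2, 3, 3, 4, 4]  # toy ordinal labels
prevalences = sample_prevalences(4, rng)
indices = draw_sample_indices(labels, prevalences, 10, rng)
```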
Usage
You can extract four CSV files through the provided script extract-oq.jl, which is conveniently wrapped in a Makefile. The Project.toml and Manifest.toml specify the Julia package dependencies, similar to a requirements file in Python.
Preliminaries: You have to have a working Julia installation. We have used Julia v1.6.5 in our experiments.
Data Extraction: In your terminal, you can call either
make
(recommended), or
julia --project="." --eval "using Pkg; Pkg.instantiate()"
julia --project="." extract-oq.jl
Outcome: The first row in each CSV file is the header. The first column, named "class_label", is the ordinal class.
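Once a CSV file has been extracted, replicating one evaluation sample amounts to selecting the listed data items and looking at their label distribution. The Python sketch below shows the idea on a tiny in-memory CSV. Only the `class_label` column name comes from the description above; the feature column `x1`, the demo values, and the assumption that indices are one-based (as is usual in Julia) are hypothetical and should be checked against the real files.

```python
import csv
import io
from collections import Counter

def select_rows(rows, index_row, one_based=True):
    """Pick the data items named in one row of an index file."""
    offset = 1 if one_based else 0
    return [rows[int(i) - offset] for i in index_row]

def class_prevalences(rows):
    """Fraction of items per ordinal class, the target of quantification."""
    counts = Counter(row["class_label"] for row in rows)
    total = sum(counts.values())
    return {label: counts[label] / total for label in sorted(counts)}

# Stand-in for an extracted file: header first, then one item per line.
demo_csv = "class_label,x1\n1,0.2\n2,0.5\n2,0.1\n3,0.9\n"
rows = list(csv.DictReader(io.StringIO(demo_csv)))

# Stand-in for one row of an index file such as app_val_indices.csv.
sample = select_rows(rows, ["1", "2", "2", "4"])
prevalences = class_prevalences(sample)
```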
Further Reading
Implementation of our experiments: https://github.com/mirkobunse/regularized-oq
The sediment sampling data hosted on the Douglas Shoal Data Hub should be used in conjunction with the Sediment Sampling Data Dictionary and the Douglas Shoal Remediation Project: Site Assessment by Advisian Pty Ltd. The Data Dictionary provides comprehensive attribute names, aliases and descriptions of the sampling survey and analysis data. The Data Dictionary will be an ongoing resource as additional layers are made available.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Statistically underpowered studies can result in experimental failure even when all other experimental considerations have been addressed impeccably. In fMRI the combination of a large number of dependent variables, a relatively small number of observations (subjects), and a need to correct for multiple comparisons can decrease statistical power dramatically. This problem has been clearly addressed yet remains controversial—especially in regards to the expected effect sizes in fMRI, and especially for between-subjects effects such as group comparisons and brain-behavior correlations. We aimed to clarify the power problem by considering and contrasting two simulated scenarios of such possible brain-behavior correlations: weak diffuse effects and strong localized effects. Sampling from these scenarios shows that, particularly in the weak diffuse scenario, common sample sizes (n = 20–30) display extremely low statistical power, poorly represent the actual effects in the full sample, and show large variation on subsequent replications. Empirical data from the Human Connectome Project resembles the weak diffuse scenario much more than the localized strong scenario, which underscores the extent of the power problem for many studies. Possible solutions to the power problem include increasing the sample size, using less stringent thresholds, or focusing on a region-of-interest. However, these approaches are not always feasible and some have major drawbacks. The most prominent solutions that may help address the power problem include model-based (multivariate) prediction methods and meta-analyses with related synthesis-oriented approaches.
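The scale of the power problem for a single brain-behavior correlation can be approximated with the Fisher z-transformation. The Python sketch below is a textbook approximation, not the simulation procedure used in the study. The example effect size of 0.1 is an assumed stand-in for a "weak" effect, and no correction for multiple comparisons is applied (a stricter `alpha` would lower the power further).

```python
from math import atanh, sqrt
from statistics import NormalDist

def correlation_power(rho, n, alpha=0.05):
    """Approximate two-sided power to detect a true correlation `rho`
    with `n` subjects, using the Fisher z normal approximation."""
    if n <= 3:
        raise ValueError("the approximation needs n > 3")
    standard_normal = NormalDist()
    z_crit = standard_normal.inv_cdf(1 - alpha / 2)
    shift = atanh(rho) * sqrt(n - 3)
    # Probability of landing in either rejection region.
    return standard_normal.cdf(shift - z_crit) + standard_normal.cdf(-shift - z_crit)

# Assumed weak effect at the sample sizes mentioned in the abstract.
for n in (20, 30, 100, 1000):
    print(n, round(correlation_power(0.1, n), 3))
```

Under these assumptions the power at n = 20 to 30 stays below 10 percent, and it only becomes substantial at sample sizes far larger than those typical of single studies.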
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
The U.S. Geological Survey Groundwater Ambient Monitoring and Assessment-Priority Basin Project (USGS GAMA-PBP) collected samples to be analyzed for per- and polyfluoroalkyl substances (PFAS) from domestic and public supply wells from May 2019 to June 2021. The datasets presented here include identification of the 28 PFAS constituents monitored by the project, identification and brief characterization of the 395 GAMA-PBP wells for which samples were analyzed for PFAS during the study period, and analytical results for those groundwater samples, along with results for quality control samples.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In accordance with the initial project design, an intensive Monthly Survey was initiated in August 1998 in a subset of villages from the original sampling frame. In total there are 16 villages, four villages in each of the four original changwats. Specifically, one tambon per changwat was chosen from the 12 possibilities of the initial 1997 cross-section. That tambon displayed relatively little variation in the collected environmental variables across the four villages, thus allowing for the control, in a sense, of the environmental variation across villages and for the relatively large variation across the four villages in the collected economic institutional variables: informal networks, local village institutions, and/or use of national level institutions. Again, this selection was consistent with a primary goal of the overall project: a micro-level evaluation of family networks, markets, and formal institutions in credit and insurance. As the selected tambon in each changwat was also surveyed in the initial 1997 cross section, 15 of the households in each of the four villages had been interviewed previously, and soil samples taken. A target of 30 additional households was added so that the total would be 45 per village. Thus the overall target is 720 households. This monthly survey began as an initial village-wide census. Each structure and household was enumerated, and one individual per residential structure was interviewed concerning individuals who sleep or eat in that structure. This means that all individuals, households, and residential structures in each of the 16 villages can be identified in subsequent, monthly responses. The monthly survey itself began in August 1998 with a baseline interview on initial conditions of sampled households. These answers trigger further questions or forms which gather more specific information on the use of contracts and informal institutions, for example.
Rosters provide an enumeration or list of items to be tracked in subsequent monthly interviews, and the monthly interviews themselves track inputs, outputs, and changing conditions. As the activities of a household may change, new forms are occasionally administered. In the Monthly Household Panel, each of the four-village clusters is assigned to a local team consisting of 12 enumerators, one field supervisor, one field editor, and one soil/environmental person. Much of the team consists of individuals hired from the local area, and they commute to work each day. The rest of the team, including those from the Bangkok office, reside in the local office. All interviewers speak the language of the households to which they are assigned - Thai, Lao, Khmer, or Sui. Common meals are eaten at the district office, and the first round of data entry takes place there. There are both periodic and random visits from the Bangkok staff, including the project director. Questionnaires, data disks, and environmental samples and measurements are sent to Bangkok, and about 10% of recently completed interviews are double-checked with random re-interviews of the surveyed households. Data are double blind entered into an ACCESS database that has dual Thai and English language capabilities. The enumerators themselves enter the data of another enumerator. All data entry is supervised and checked by the field supervisor and team leader. In Bangkok data entry takes place on ten PCs connected to a LAN system, with a separate data entry staff. Thai language answers are entered, translated into English, and then entered into a separate database.
The 2002 Vietnam Demographic and Health Survey (VNDHS 2002) is a nationally representative sample survey of 5,665 ever-married women age 15-49 selected from 205 sample points (clusters) throughout Vietnam. It provides information on levels of fertility, family planning knowledge and use, infant and child mortality, and indicators of maternal and child health. The survey included a Community/Health Facility Questionnaire that was implemented in each of the sample clusters.
The survey was designed to measure change in reproductive health indicators over the five years since the VNDHS 1997, especially in the 18 provinces that were targeted in the Population and Family Health Project of the Committee for Population, Family and Children. Consequently, all provinces were separated into “project” and “nonproject” groups to permit separate estimates for each. Data collection for the survey took place from 1 October to 21 December 2002.
The Vietnam Demographic and Health Survey 2002 (VNDHS 2002) was the third DHS in Vietnam, with prior surveys implemented in 1988 and 1997. The VNDHS 2002 was carried out in the framework of the activities of the Population and Family Health Project of the Committee for Population, Family and Children (previously the National Committee for Population and Family Planning).
The main objectives of the VNDHS 2002 were to collect up-to-date information on family planning, childhood mortality, and health issues such as breastfeeding practices, pregnancy care, vaccination of children, treatment of common childhood illnesses, and HIV/AIDS, as well as utilization of health and family planning services. The primary objectives of the survey were to estimate changes in family planning use in comparison with the results of the VNDHS 1997, especially on issues in the scope of the project of the Committee for Population, Family and Children.
VNDHS 2002 data confirm the pattern of rapidly declining fertility that was observed in the VNDHS 1997. They also show a sharp decline in child mortality, as well as a modest increase in contraceptive use. Differences between project and non-project provinces are generally small.
The 2002 Vietnam Demographic and Health Survey (VNDHS 2002) is a nationally representative sample survey. The VNDHS 1997 was designed to provide separate estimates for the whole country, urban and rural areas, for 18 project provinces and the remaining nonproject provinces as well. Project provinces refer to 18 focus provinces targeted for the strengthening of their primary health care systems by the Government's Population and Family Health Project to be implemented over a period of seven years, from 1996 to 2002 (At the outset of this project there were 15 focus provinces, which became 18 by the creation of 3 new provinces from the initial set of 15). These provinces were selected according to criteria based on relatively low health and family planning status, no substantial family planning donor presence, and regional spread. These criteria resulted in the selection of the country's poorer provinces. Nine of these provinces have significant proportions of ethnic minorities among their population.
The population covered by the 2002 VNDHS is defined as the universe of all women age 15-49 in Vietnam.
Sample survey data
The sample for the VNDHS 2002 was based on that used in the VNDHS 1997, which in turn was a subsample of the 1996 Multi-Round Demographic Survey (MRS), a semi-annual survey of about 243,000 households undertaken regularly by GSO. The MRS sample consisted of 1,590 sample areas known as enumeration areas (EAs) spread throughout the 53 provinces/cities of Vietnam, with 30 EAs in each province. On average, an EA comprises about 150 households. For the VNDHS 1997, a subsample of 205 EAs was selected, with 26 households in each urban EA and 39 households for each rural EA. A total of 7,150 households was selected for the survey. The VNDHS 1997 was designed to provide separate estimates for the whole country, urban and rural areas, for 18 project provinces and the remaining nonproject provinces as well. Because the main objective of the VNDHS 2002 was to measure change in reproductive health indicators over the five years since the VNDHS 1997, the sample design for the VNDHS 2002 was as similar as possible to that of the VNDHS 1997.
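The figures quoted for the VNDHS 1997 subsample can be cross-checked with a little arithmetic. The Python sketch below solves for the number of urban and rural EAs implied by 205 EAs, 7,150 households, and the per-EA household counts, under the assumption that every EA contributed exactly the stated number of households. The urban/rural split is derived here and is not stated in the text; the function name is hypothetical.

```python
def split_enumeration_areas(total_eas, total_households, per_urban, per_rural):
    """Solve u + r = total_eas and per_urban*u + per_rural*r = total_households
    for whole numbers of urban (u) and rural (r) enumeration areas."""
    numerator = per_rural * total_eas - total_households
    denominator = per_rural - per_urban
    if denominator == 0 or numerator % denominator != 0:
        raise ValueError("the figures do not imply a whole-number split")
    urban = numerator // denominator
    rural = total_eas - urban
    if urban < 0 or rural < 0:
        raise ValueError("the figures imply a negative count")
    return urban, rural

# MRS frame: 53 provinces/cities with 30 EAs each.
mrs_sample_areas = 53 * 30  # 1590, matching the stated 1,590 EAs

# VNDHS 1997 subsample: 205 EAs, 26 households per urban EA, 39 per rural EA.
urban, rural = split_enumeration_areas(205, 7150, 26, 39)
# urban = 65, rural = 140, since 65 * 26 + 140 * 39 = 7150
```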
Although it would have been ideal to have returned to the same households or at least the same sample points as were selected for the VNDHS 1997, several factors made this undesirable. Revisiting the same households would have held the sample artificially rigid over time and would not allow for newly formed households. This would have conflicted with the other major survey objective, which was to provide up-to-date, representative data for the whole of Vietnam. Revisiting the same sample points that were covered in 1997 was complicated by the fact that the country had conducted a population census in 1999, which allowed for a more representative sample frame.
In order to balance the two main objectives of measuring change and providing representative data, it was decided to select enumeration areas from the 1999 Population Census, but to cover the same communes that were sampled in the VNDHS 1997 and attempt to obtain a sample point as close as possible to that selected in 1997. Consequently, the VNDHS 2002 sample also consisted of 205 sample points and reflects the oversampling in the 20 provinces that fall in the World Bank-supported Population and Family Health Project. The sample was designed to produce about 7,000 completed household interviews and 5,600 completed interviews with ever-married women age 15-49.
Face-to-face
As in the VNDHS 1997, three types of questionnaires were used in the 2002 survey: the Household Questionnaire, the Individual Woman's Questionnaire, and the Community/Health Facility Questionnaire. The first two questionnaires were based on the DHS Model A Questionnaire, with additions and modifications made during an ORC Macro staff visit in July 2002. The questionnaires were pretested in two clusters in Hanoi (one in a rural area and another in an urban area). After the pretest and consultation with ORC Macro, the drafts were revised for use in the main survey.
a) The Household Questionnaire was used to enumerate all usual members and visitors in selected households and to collect information on age, sex, education, marital status, and relationship to the head of household. The main purpose of the Household Questionnaire was to identify persons who were eligible for individual interview (i.e. ever-married women age 15-49). In addition, the Household Questionnaire collected information on characteristics of the household such as water source, type of toilet facilities, material used for the floor and roof, and ownership of various durable goods.
b) The Individual Questionnaire was used to collect information on ever-married women aged 15-49 in surveyed households. These women were interviewed on the following topics:
- Respondent's background characteristics (education, residential history, etc.);
- Reproductive history;
- Contraceptive knowledge and use;
- Antenatal and delivery care;
- Infant feeding practices;
- Child immunization;
- Fertility preferences and attitudes about family planning;
- Husband's background characteristics;
- Women's work information; and
- Knowledge of AIDS.
c) The Community/Health Facility Questionnaire was used to collect information on all communes in which the interviewed women lived and on services offered at the nearest health stations. The Community/Health Facility Questionnaire consisted of four sections. The first two sections collected information from community informants on some characteristics such as the major economic activities of residents, distance from people's residence to civic services and the location of the nearest sources of health care. The last two sections involved visiting the nearest commune health centers and intercommune health centers, if these centers were located within 30 kilometers from the surveyed cluster. For each visited health center, information was collected on the type of health services offered and the number of days services were offered per week; the number of assigned staff and their training; medical equipment and medicines available at the time of the visit.
The first stage of data editing was implemented by the field editors soon after each interview. Field editors and team leaders checked the completeness and consistency of all items in the questionnaires. The completed questionnaires were sent to the GSO headquarters in Hanoi by post for data processing. The editing staff of the GSO first checked the questionnaires for completeness. The data were then entered into microcomputers and edited using a software program specially developed for the DHS program, the Census and Survey Processing System, or CSPro. Data were verified on a 100 percent basis, i.e., the data were entered separately twice and the two results were compared and corrected. The data processing and editing staff of the GSO were trained and supervised for two weeks by a data processing specialist from ORC Macro. Office editing and processing activities were initiated immediately after the beginning of the fieldwork and were completed in late December 2002.
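The 100 percent verification described above boils down to comparing two independently keyed copies of each questionnaire and listing every disagreement for correction. The Python sketch below illustrates that comparison in general terms; it does not use CSPro, and the record identifiers and field names are hypothetical.

```python
def compare_entries(first_pass, second_pass):
    """List disagreements between two independent data-entry passes.

    Each pass maps a questionnaire id to a dict of field values.
    Returns (record_id, field, first_value, second_value) tuples.
    """
    mismatches = []
    for record_id in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(record_id, {})
        b = second_pass.get(record_id, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((record_id, field, a.get(field), b.get(field)))
    return mismatches

# Hypothetical keyed data: the second clerk mistyped one age.
first_pass = {
    "001": {"age": "34", "children_ever_born": "2"},
    "002": {"age": "41", "children_ever_born": "3"},
}
second_pass = {
    "001": {"age": "34", "children_ever_born": "2"},
    "002": {"age": "14", "children_ever_born": "3"},
}

to_review = compare_entries(first_pass, second_pass)
```

Each listed mismatch would then be resolved against the paper questionnaire, which is what makes double entry effective against keying errors.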
The results of the household and individual
http://opendatacommons.org/licenses/dbcl/1.0/
The underlying data is from Stack Overflow's 2019 Developer Survey Responses and can be found at https://stackoverflow.blog/2019/04/09/the-2019-stack-overflow-developer-survey-results-are-in/ Please note that my intent in uploading this is to showcase my experience working with the datasets. My goal is to build a centralized portfolio.
Please note that we are using a randomized sample of 1/10th of the original data set. Conclusions may not reflect the real world.
The goal of this project was to explore, analyze, and visualize.
Follow this link to see the Cognos Dashboard I created: https://dataplatform.cloud.ibm.com/dashboards/ee7bf962-3882-4145-a41c-ecdda9323484/view/4427dc2d63b71c921ee1e6e4079c29002c362d5fe4bb860ad18c7b495d607297f3614099c82f4d5bde135661a7e8400f9d
Feel free to filter and play with the dashboard as you want.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the raw experimental data and supplementary materials for the "Asymmetry Effects in Virtual Reality Rod and Frame Test". The materials included are:
• Raw Experimental Data: older.csv and young.csv
• Mathematica Notebooks: a collection of Mathematica notebooks used for data analysis and visualization. These notebooks provide scripts for processing the experimental data, performing statistical analyses, and generating the figures used in the project.
• Unity Package: a Unity package featuring a sample scene related to the project. The scene was built using Unity’s Universal Render Pipeline (URP). To use this package, ensure that URP is enabled in your Unity project. Instructions for enabling URP can be found in the Unity URP Documentation.
Requirements:
• For Data Files: software capable of opening CSV files (e.g., Microsoft Excel, Google Sheets, or any programming language that can read CSV formats).
• For Mathematica Notebooks: Wolfram Mathematica software to run and modify the notebooks.
• For Unity Package: Unity Editor version compatible with URP (2019.3 or later recommended). URP must be installed and enabled in your Unity project.
Usage Notes:
• The dataset facilitates comparative studies between different age groups based on the collected variables.
• Users can modify the Mathematica notebooks to perform additional analyses.
• The Unity scene serves as a reference to the project setup and can be expanded or integrated into larger projects.
Citation: Please cite this dataset when using it in your research or publications.