85 datasets found
  1. Project for Statistics on Living Standards and Development 1993 - South...

    • microdata.fao.org
    • catalog.ihsn.org
    • +2 more
    Updated Oct 20, 2020
    Cite
    Southern Africa Labour and Development Research Unit (2020). Project for Statistics on Living Standards and Development 1993 - South Africa [Dataset]. https://microdata.fao.org/index.php/catalog/1527
    Dataset updated
    Oct 20, 2020
    Dataset authored and provided by
    Southern Africa Labour and Development Research Unit
    Time period covered
    1993
    Area covered
    South Africa
    Description

    Abstract

    The Project for Statistics on Living Standards and Development was a countrywide World Bank Living Standards Measurement Survey. It covered approximately 9000 households, drawn from a representative sample of South African households. The fieldwork was undertaken during the nine months leading up to the country's first democratic elections at the end of April 1994. The purpose of the survey was to collect statistical information about the conditions under which South Africans live in order to provide policymakers with the data necessary for planning strategies. This data would aid the implementation of goals such as those outlined in the Government of National Unity's Reconstruction and Development Programme.

    Geographic coverage

    National

    Analysis unit

    Households

    Universe

    All Household members. Individuals in hospitals, old age homes, hotels and hostels of educational institutions were not included in the sample. Migrant labour hostels were included. In addition to those that turned up in the selected ESDs, a sample of three hostels was chosen from a national list provided by the Human Sciences Research Council and within each of these hostels a representative sample was drawn on a similar basis as described above for the households in ESDs.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    (a) SAMPLING DESIGN

    Sample size is 9,000 households. The sample design adopted for the study was a two-stage self-weighting design in which the first-stage units were Census Enumerator Subdistricts (ESDs, or their equivalent) and the second-stage units were households. The advantage of such a design is that it provides a representative sample that need not be based on an accurate census population distribution. In the case of South Africa, the sample will automatically include many poor people, without the need to go beyond this and oversample the poor. Proportionate sampling, as in such a self-weighting sample design, offers the simplest possible data files for further analysis, as weights do not have to be added. However, in the end this advantage could not be retained, and weights had to be added.

    (b) SAMPLE FRAME

    The sampling frame was drawn up on the basis of small, clearly demarcated area units, each with a population estimate. The nature of the self-weighting procedure adopted ensured, however, that this population estimate was not important for determining the final sample. For most of the country, census ESDs were used. Where some ESDs comprised relatively large populations, as for instance in some black townships such as Soweto, aerial photographs were used to divide the areas into blocks of approximately equal population size. In other instances, particularly in some of the former homelands, the area units were not ESDs but villages or village groups.

    In the sample design chosen, the area stage units (generally ESDs) were selected with probability proportional to size, based on the census population. Systematic sampling was used throughout, that is, sampling at a fixed interval in a list of ESDs, starting at a randomly selected starting point. Given that sampling was self-weighting, the impact of stratification was expected to be modest. The main objective was to ensure that the racial and geographic breakdown approximated the national population distribution. This was done by listing the area stage units (ESDs) by statistical region and then, within the statistical region, by urban or rural. Within these sub-statistical regions, the ESDs were then listed in order of percentage African. The sampling interval for the selection of the ESDs was obtained by dividing the 1991 census population of 38,120,853 by the 300 clusters to be selected. This yielded 105,800. Starting at a randomly selected point, every 105,800th person down the cluster list was selected. This ensured both geographic and racial diversity (ESDs were ordered by statistical sub-region and proportion of the population African). In three or four instances, the ESD chosen was judged inaccessible and replaced with a similar one.

    In the second sampling stage the unit of analysis was the household. In each selected ESD a listing or enumeration of households was carried out by means of a field operation. From the households listed in an ESD, a sample of households was selected by systematic sampling. Even though the ultimate enumeration unit was the household, in most cases "stands" were used as enumeration units. However, when a stand was chosen as the enumeration unit, all households on that stand had to be interviewed.
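
    The first-stage selection described above (systematic sampling with probability proportional to size) can be sketched roughly as follows. This is a minimal illustration, not the survey team's actual program: the function names, the toy unit sizes and the seeded random generator are all assumptions. Note that dividing the quoted figures, 38,120,853 by 300, gives roughly 127,070 rather than 105,800, so the sketch derives the interval from its inputs instead of hard-coding either number.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def sampling_interval(total_population, n_clusters):
    """Fixed interval: total population divided by the number of clusters."""
    return total_population / n_clusters


def pps_systematic_sample(sizes, n_clusters, rng=None):
    """Pick n_clusters area units with probability proportional to size.

    sizes: population estimate per area unit, already listed in the
    stratification order (region, urban/rural, percentage African).
    Returns the indices of the selected units; a very large unit can
    be selected more than once.
    """
    rng = rng or random.Random()
    cumulative = list(accumulate(sizes))            # running population total
    interval = sampling_interval(cumulative[-1], n_clusters)
    start = rng.uniform(0, interval)                # random starting point
    picks = []
    for k in range(n_clusters):
        target = start + k * interval               # the k-th "person" hit
        idx = bisect_left(cumulative, target)       # unit containing that person
        picks.append(min(idx, len(sizes) - 1))      # guard against float overshoot
    return picks


# Toy frame (made-up sizes): ten units of 100 people, five clusters wanted.
toy_sizes = [100] * 10
picks = pps_systematic_sample(toy_sizes, n_clusters=5, rng=random.Random(1))
```

    With equal-sized units the selection simply takes every second unit; with unequal sizes, larger units are proportionally more likely to contain one of the fixed-interval hits.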

    Mode of data collection

    Face-to-face [f2f]

    Cleaning operations

    All the questionnaires were checked when received. Where information was incomplete or appeared contradictory, the questionnaire was sent back to the relevant survey organization. As soon as the data was available, it was captured using local development platform ADE. This was completed in February 1994. Following this, a series of exploratory programs was written to highlight inconsistencies and outliers. For example, all person-level files were linked together to ensure that the same person code reported in different sections of the questionnaire corresponded to the same person. The error reports from these programs were compared to the questionnaires and the necessary alterations made. This was a lengthy process, as several files were checked more than once, and it was completed at the beginning of August 1994. In some cases, questionnaires would contain missing values, or comments that the respondent did not know, or refused to answer a question.

    These responses are coded in the data files with the following values:

    VALUE   MEANING
    -1      The data was not available on the questionnaire or form
    -2      The field is not applicable
    -3      Respondent refused to answer
    -4      Respondent did not know answer to question
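
    When loading the data, the four special codes above are usually separated from real answers before any arithmetic is done. A minimal sketch, assuming the values arrive as plain numbers; the helper name split_missing is hypothetical and not part of any SALDRU tooling:

```python
# Special codes as listed in the survey documentation.
MISSING_CODES = {
    -1: "not available on the questionnaire or form",
    -2: "not applicable",
    -3: "respondent refused to answer",
    -4: "respondent did not know the answer",
}


def split_missing(value):
    """Return (clean_value, reason).

    clean_value is None when the raw value is one of the special codes,
    and reason then carries the documented meaning. Caution: for variables
    that can legitimately be negative, this simple check is not enough.
    """
    if value in MISSING_CODES:
        return None, MISSING_CODES[value]
    return value, None


# Made-up raw column for illustration only.
raw_ages = [34, -3, 61, -1]
clean_ages = [split_missing(v)[0] for v in raw_ages]
```

    Keeping the reason alongside the cleaned value makes it possible to distinguish refusals from inapplicable fields later in the analysis.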

    Data appraisal

    The data collected in clusters 217 and 218 should be viewed as highly unreliable and therefore removed from the data set. The data currently available on the web site has been revised to remove the data from these clusters. Researchers who have downloaded the data in the past should revise their data sets. For information on the data in those clusters, contact SALDRU http://www.saldru.uct.ac.za/.
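
    For anyone holding an older download, the revision described above amounts to filtering out the two clusters. A minimal sketch, assuming each record is a dictionary and that the cluster identifier is stored under a key called "cluster" (a hypothetical name; check the actual variable name in the data files):

```python
# Clusters flagged as highly unreliable in the data appraisal note.
UNRELIABLE_CLUSTERS = {217, 218}


def drop_unreliable(records, cluster_key="cluster"):
    """Return only the records that do not belong to a flagged cluster."""
    return [r for r in records if r[cluster_key] not in UNRELIABLE_CLUSTERS]


# Made-up records for illustration only.
records = [
    {"cluster": 216, "hhid": 1},
    {"cluster": 217, "hhid": 2},
    {"cluster": 218, "hhid": 3},
    {"cluster": 219, "hhid": 4},
]
kept = drop_unreliable(records)
```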

  2. Data from: A Case Study of an Evaluation of Pen-and-Paper Homework and...

    • tandf.figshare.com
    pdf
    Updated May 12, 2025
    Cite
    Kristin Lilly; Basil M. Conway (2025). A Case Study of an Evaluation of Pen-and-Paper Homework and Project-Based Learning of Statistical Literacy in an Introductory Statistics Course [Dataset]. http://doi.org/10.6084/m9.figshare.28351452.v1
    Available download formats: pdf
    Dataset updated
    May 12, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Kristin Lilly; Basil M. Conway
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pen-and-paper homework and project-based learning are both commonly used instructional methods in introductory statistics courses. However, there have been few studies comparing these two methods exclusively. In this case study, each was used in two different sections of the same introductory statistics course at a regional state university. Students’ statistical literacy was measured by exam scores across the course, including the final. The comparison of the two instructional methods includes descriptive statistics and two-sample t-tests, as well as the authors’ reflections on the instructional methods. Results indicated that there is no statistically discernible difference between the two instructional methods in the introductory statistics course.

  3. Excel projects

    • kaggle.com
    zip
    Updated Jul 23, 2024
    Cite
    BTaffetani (2024). Excel projects [Dataset]. https://www.kaggle.com/datasets/btaffetani/excel-projects
    Available download formats: zip (189455 bytes)
    Dataset updated
    Jul 23, 2024
    Authors
    BTaffetani
    Description

    This is a collection of statistical projects where I used Microsoft Excel. The definition of each project was given by ProfessionAI, while the statistical analysis part was done by me. More specifically:

    • customer_complaints_assignment is an example of Introduction to Data Analytics where, given a dataset with complaints of customers of financial companies, tasks about filtering, counting and basic analytics were done;
    • trades_on_exchanges is a project for Advanced Data Analytics where statistical analysis of trading operations was done;
    • progetto_finale_inferenza is a project about Statistical Inference where, from a toy dataset about the population of a city, inference analysis was made.

  4. Flight Delay Statistics Project 2024

    • kaggle.com
    zip
    Updated Nov 5, 2025
    Cite
    Cindy Zhao (2025). Flight Delay Statistics Project 2024 [Dataset]. https://www.kaggle.com/datasets/cindyxingzhao/flight-delay-statistics-project-2024
    Available download formats: zip (162728114 bytes)
    Dataset updated
    Nov 5, 2025
    Authors
    Cindy Zhao
    Description

    BACKGROUND

    The data contained in the compressed file has been extracted from the Marketing Carrier On-Time Performance (Beginning January 2018) data table of the "On-Time" database from the TranStats data library. The time period is indicated in the name of the compressed file; for example, XXX_XXXXX_2001_1 contains data of the first month of the year 2001.

    RECORD LAYOUT

    Below are fields in the order that they appear on the records:

    Year: Year
    Quarter: Quarter (1-4)
    Month: Month
    DayofMonth: Day of Month
    DayOfWeek: Day of Week
    FlightDate: Flight Date (yyyymmdd)
    Marketing_Airline_Network: Unique Marketing Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years.
    Operated_or_Branded_Code_Share_Partners: Reporting Carrier Operated or Branded Code Share Partners
    DOT_ID_Marketing_Airline: An identification number assigned by US DOT to identify a unique airline (carrier). A unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation.
    IATA_Code_Marketing_Airline: Code assigned by IATA and commonly used to identify a carrier. As the same code may have been assigned to different carriers over time, the code is not always unique. For analysis, use the Unique Carrier Code.
    Flight_Number_Marketing_Airline: Flight Number
    Originally_Scheduled_Code_Share_Airline: Unique Scheduled Operating Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years.
    DOT_ID_Originally_Scheduled_Code_Share_Airline: An identification number assigned by US DOT to identify a unique airline (carrier). A unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation.
    IATA_Code_Originally_Scheduled_Code_Share_Airline: Code assigned by IATA and commonly used to identify a carrier. As the same code may have been assigned to different carriers over time, the code is not always unique. For analysis, use the Unique Carrier Code.
    Flight_Num_Originally_Scheduled_Code_Share_Airline: Flight Number
    Operating_Airline: Unique Carrier Code. When the same code has been used by multiple carriers, a numeric suffix is used for earlier users, for example, PA, PA(1), PA(2). Use this field for analysis across a range of years.
    DOT_ID_Operating_Airline: An identification number assigned by US DOT to identify a unique airline (carrier). A unique airline (carrier) is defined as one holding and reporting under the same DOT certificate regardless of its Code, Name, or holding company/corporation.
    IATA_Code_Operating_Airline: Code assigned by IATA and commonly used to identify a carrier. As the same code may have been assigned to different carriers over time, the code is not always unique. For analysis, use the Unique Carrier Code.
    Tail_Number: Tail Number
    Flight_Number_Operating_Airline: Flight Number
    OriginAirportID: Origin Airport, Airport ID. An identification number assigned by US DOT to identify a unique airport. Use this field for airport analysis across a range of years because an airport can change its airport code and airport codes can be reused.
    OriginAirportSeqID: Origin Airport, Airport Sequence ID. An identification number assigned by US DOT to identify a unique airport at a given point of time. Airport attributes, such as airport name or coordinates, may change over time.
    OriginCityMarketID: Origin Airport, City Market ID. City Market ID is an identification number assigned by US DOT to identify a city market. Use this field to consolidate airports serving the same city market.
    Origin: Origin Airport
    OriginCityName: Origin Airport, City Name
    OriginState: Origin Airport, State Code
    OriginStateFips: Origin Airport, State Fips
    OriginStateName: Origin Airport, State Name
    OriginWac: Origin Airport, World Area Code
    DestAirportID: Destination Airport, Airport ID. An identification number assigned by US DOT to identify a unique airport. Use this field for airport analysis across a range of years because an airport can change its airport code and airport codes can be reused.
    DestAirportSeqID: Destination Airport, Airport Sequence ID. An identification number assigned by US DOT to identify a unique airport at a given point of time. Airport attributes, such as airport name or coordinates, may change over time.
    DestCityMarketID: Destination Airport, City Market ID. City Market ID is an identification number assigned by US DOT to identify a city market. Use this field to consolidate airports serving the same city market.
    Dest: Destination Airport
    DestCityName: Destination Airport, City Name
    DestState: Destination Airport, State Code
    DestStateFips: D...
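
    Two of the layout notes translate directly into code: FlightDate is documented as yyyymmdd, and OriginAirportID is the recommended key for airport analysis across years. The sketch below assumes each record has already been read into a dictionary keyed by the field names above and that the date really is stored in the documented yyyymmdd form; the sample rows and airport IDs are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def parse_flight_date(raw):
    """Parse a FlightDate string in the documented yyyymmdd form."""
    return datetime.strptime(raw, "%Y%m%d").date()


def flights_per_origin(rows):
    """Count flights per OriginAirportID, the stable DOT airport identifier.

    Grouping on this ID rather than on the Origin code follows the layout's
    advice, since airport codes can change and be reused over time.
    """
    return Counter(row["OriginAirportID"] for row in rows)


# Made-up rows for illustration only.
rows = [
    {"FlightDate": "20240105", "OriginAirportID": 10001},
    {"FlightDate": "20240105", "OriginAirportID": 10002},
    {"FlightDate": "20240106", "OriginAirportID": 10001},
]
counts = flights_per_origin(rows)
first_day = parse_flight_date(rows[0]["FlightDate"])
```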

  5. Methodological aspects in the development of research projects in Clinical...

    • scielo.figshare.com
    tiff
    Updated Jun 1, 2023
    Cite
    Deyliane Aparecida De almeida Pereira; Sarah Aparecida Vieira; Aline Siqueira Fogal; Andréia Queiroz Ribeiro; Sylvia do Carmo Castro Franceschini (2023). Methodological aspects in the development of research projects in Clinical Nutrition [Dataset]. http://doi.org/10.6084/m9.figshare.20018318.v1
    Available download formats: tiff
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Deyliane Aparecida De almeida Pereira; Sarah Aparecida Vieira; Aline Siqueira Fogal; Andréia Queiroz Ribeiro; Sylvia do Carmo Castro Franceschini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This text aims to foster the reflection and criticism in the process of developing research projects in clinical nutrition. We present aspects regarding the evidence, validity, and reliability of results of studies in this field. Appropriate study planning is critical, from defining the design and type of experiment, going through the ethical aspects, population choice, and calculation of sample size, to the assessment of the feasibility of the risks involved in study execution. Once the information is collected, the next stages correspond to the description of the results, statistical analyses, verification of the consistency of these results, and ultimately their correct interpretation.

  6. Grant Giving Statistics for Metro Ideas Project

    • instrumentl.com
    Updated Jan 6, 2022
    Cite
    (2022). Grant Giving Statistics for Metro Ideas Project [Dataset]. https://www.instrumentl.com/990-report/metro-ideas-project
    Dataset updated
    Jan 6, 2022
    Variables measured
    Total Assets, Total Giving
    Description

    Financial overview and grant giving statistics of Metro Ideas Project

  7. 5-Minute Projects and Design Ideas's YouTube Channel Statistics

    • vidiq.com
    Cite
    vidIQ, 5-Minute Projects and Design Ideas's YouTube Channel Statistics [Dataset]. https://vidiq.com/youtube-stats/channel/UCfa9hkgsTHfK8bpDoiQZI5g/
    Dataset authored and provided by
    vidIQ
    Time period covered
    Nov 1, 2025 - Dec 1, 2025
    Area covered
    YouTube, US
    Variables measured
    subscribers, video count, video views, engagement rate, upload frequency, estimated earnings
    Description

    Comprehensive YouTube channel statistics for 5-Minute Projects and Design Ideas, featuring 313,000 subscribers and 48,555,113 total views. This dataset includes detailed performance metrics such as subscriber growth, video views, engagement rates, and estimated revenue. The channel operates in the Lifestyle category and is based in the US. Track 1,212 videos with daily and monthly performance data, including view counts, subscriber changes, and earnings estimates. Analyze growth trends and engagement patterns, and compare performance against similar channels in the same category.

  8. Statistical Shape Model of the Tibia

    • simtk.org
    Updated Aug 9, 2022
    Cite
    Meghan Keast; Aaron Fox (2022). Statistical Shape Model of the Tibia [Dataset]. https://simtk.org/frs/?group_id=2166
    Dataset updated
    Aug 9, 2022
    Dataset provided by
    Deakin University
    Authors
    Meghan Keast; Aaron Fox
    Description

    This project provides a freely accessible three-dimensional statistical shape model (SSM) of the tibia, the MATLAB scripts for generating an SSM, and the segmented surface models of the cortical and trabecular bone. Information on the use of code and data can be found in the read-me file contained within the download.

    Further, this dataset and associated statistical shape models can be used in several ways to assist with skeletal-focused research of the tibia-fibula. We do not have the scope to highlight each and every potential application, but we have provided a series of example cases of where and how the shape models may be used. Our hope is that these examples can be directly used, or assist in guiding other uses.

    Case 1: Generating Surface Samples — this example case demonstrates how to use the shape model data to reconstruct a randomly sampled 'population' of surfaces.

    Case 2: Predicting and Generating Trabecular Volumes — this example case demonstrates how to combine the tibia and trabecular shape models to predict and generate the trabecular volume from a tibial surface.

    Case 3: Generating Tibia-Fibula Surfaces from Landmarks — this example case demonstrates how to use the tibia-fibula shape model to estimate and reconstruct surfaces from palpable landmarks on the tibia and fibula.

    Please cite our work if you use this code or data.


    This project includes the following software/data packages:

    • Statistical Shape Model Tibia: This file contains the main shape model code and data associated with the project; it also contains three example cases. For a complete description, view the read-me file contained within the archive.

  9. Community Survey 2007 - South Africa

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1 more
    Updated May 28, 2019
    Cite
    Statistics South Africa (2019). Community Survey 2007 - South Africa [Dataset]. https://microdata.worldbank.org/index.php/catalog/918
    Dataset updated
    May 28, 2019
    Dataset authored and provided by
    Statistics South Africa (http://www.statssa.gov.za/)
    Time period covered
    2007
    Area covered
    South Africa
    Description

    Abstract

    The Community Survey (CS) is a nationally representative, large-scale household survey which was conducted from February to March 2007. The Community Survey is designed to provide information on the trends and levels of demographic and socio-economic data, such as population size and distribution; the extent of poor households; access to facilities and services, and the levels of employment/unemployment at national, provincial and municipality level. The data can be used to assist government and the private sector in the planning, evaluation and monitoring of programmes and policies. The information collected can also be used to assess the impact of socio-economic policies and provide an indication as to how far the country has gone in its strides to eradicate poverty.

    Censuses 1996 and 2001 are the only all-inclusive censuses that Statistics South Africa has thus far conducted under the new democratic dispensation. Demographic and socio-economic data were collected and the results have enabled government and all other users of this information to make informed decisions. When cabinet took a decision that Stats SA should not conduct a census in 2006, it created a gap in information or data between Census 2001 and the next Census scheduled to be carried out in 2011. A decision was therefore taken to carry out the Community Survey in 2007.

    The main objectives of the survey were:

    • To provide estimates at lower geographical levels than existing household surveys;
    • To build human, management and logistical capacities for Census 2011; and
    • To provide inputs into the preparation of the mid-year population projections.

    The wider project strategic theme is to provide relevant statistical information that meets user needs and aspirations. Some of the main topics that are covered by the survey include demography, migration, disability and social grants, educational levels, employment and economic activities.

    Geographic coverage

    The survey covered the whole of South Africa, including all nine provinces as well as the four settlement types - urban-formal, urban-informal, rural-formal (commercial farms) and rural-informal (tribal areas).

    Analysis unit

    Households

    Universe

    The Community Survey covered all de jure household members (usual residents) in South Africa. The survey excluded collective living quarters (institutions) and some households in EAs classified as recreational areas or institutions. However, an approximation of the out-of-scope population was made from the 2001 Census and added to the final estimates of the CS 2007 results.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    Sample Design

    The sampling procedure that was adopted for the CS was a two-stage stratified random sampling process. Stage one involved the selection of enumeration areas, and stage two was the selection of dwelling units.

    Since the data are required for each local municipality, each municipality was considered as an explicit stratum. The stratification is done for those municipalities classified as category B municipalities (local municipalities) and category A municipalities (metropolitan areas) as proclaimed at the time of Census 2001. However, the newly proclaimed boundaries as well as any other higher level of geography such as province or district municipality, were considered as any other domain variable based on their link to the smallest geographic unit - the enumeration area.

    The Frame

    The Census 2001 enumeration areas were used because they give a full geographic coverage of the country without any overlap. Although changes in settlement type, growth or movement of people have occurred, the enumeration areas assisted in getting a spatial comparison over time. Out of 80 787 enumeration areas countrywide, 79 466 were considered in the frame. A total of 1 321 enumeration areas were excluded (919 covering institutions and 402 recreational areas).

    On the second level, the listing exercise yielded the dwelling frame which facilitated the selection of dwellings to be visited. The dwelling unit is a structure or part of a structure or group of structures occupied or meant to be occupied by one or more households. Some of these structures may be vacant and/or under construction, but can be lived in at the time of the survey. A dwelling unit may also be within collective living quarters where applicable (examples of each are a house, a group of huts, a flat, hostels, etc.).

    The Community Survey universe at the second-level frame is dependent on whether the different structures are classified as dwelling units (DUs) or not. Structures where people stay/live were listed and classified as dwelling units. However, there are special cases of collective living quarters that were also included in the CS frame. These are religious institutions such as convents or monasteries, and guesthouses where people stay for an extended period (more than a month). Student residences - based on how long people have stayed (more than a month) - and old-age homes not similar to hospitals (where people are living in a communal set-up) were treated the same as hostels, thereby listing either the bed or room. In addition, any other family staying in separate quarters within the premises of an institution (like wardens' quarters, military family quarters, teachers' quarters and medical staff quarters) were considered as part of the CS frame. The inclusion of such group quarters in the frame is based on the living circumstances within these structures. Members are independent of each other with the exception that they sleep under one roof.

    The remaining group quarters were excluded from the CS frame because they are difficult to access and have no stable composition. Excluded dwelling types were prisons, hotels, hospitals, military barracks, etc. This is in addition to the exclusion on first level of the enumeration areas (EAs) classified as institutions (military bases) or recreational areas (national parks).

    The Selection of Enumeration Areas (EAs)

    The EAs within each municipality were ordered by geographic type and EA type. The selection was done by using systematic random sampling. The criteria used were as follows:

    • In municipalities with fewer than 30 EAs, all EAs were automatically selected.
    • In municipalities with 30 or more EAs, the sample selection used a fixed proportion of 19% of all sampled EAs.
    • However, if the selected EAs in a municipality numbered fewer than 30, the sample in the municipality was increased to 30 EAs.
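
    The three criteria above reduce to a small sizing rule per municipality. The following is a rough sketch rather than Stats SA's actual procedure; the function name is hypothetical, and rounding the 19% share up to a whole EA is an assumption, since the text does not say how fractions were handled.

```python
def ea_sample_size(n_eas, percent=19, minimum=30):
    """Number of EAs to select in a municipality with n_eas enumeration areas.

    - fewer than `minimum` EAs: take all of them
    - otherwise: take `percent` of the EAs (rounded up, an assumption),
      but never fewer than `minimum`
    """
    if n_eas < minimum:
        return n_eas
    share = (n_eas * percent + 99) // 100   # integer ceiling of percent share
    return max(minimum, share)


# Illustrative municipality sizes (made up).
sizes = {n: ea_sample_size(n) for n in (12, 30, 100, 200, 1000)}
```

    Under these assumptions the floor of 30 EAs is the binding rule for every municipality with up to 157 EAs, because 19% of 157 is still just under 30.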

    The Selection of Dwelling Units

    The second level of the frame required a full re-listing of dwelling units. The listing exercise was undertaken before the selection of DUs. The adopted listing methodology ensured that the listing route was determined by the lister. This approach facilitated the serpentine selection of dwelling units. The listing exercise provided a complete list of dwelling units in the selected EAs. Only those structures that were classified as dwelling units were considered for selection, whether vacant or occupied. This exercise yielded a total of 2 511 314 dwelling units.

    The selection of the dwelling units was also based on a fixed proportion of 10% of the total listed dwellings in an EA. A constraint was imposed on small-size EAs where, if the listed dwelling units were fewer than 10 dwellings, the selection was increased to 10 dwelling units. All households within the selected dwelling units were covered. There was no replacement of refusals, vacant dwellings or non-contacts owing to their impact on the probability of selection.
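
    A second-stage draw along these lines might look like the sketch below. Several points are assumptions rather than documented facts: the draw is modelled as a systematic selection down the lister's route, the 10% share is rounded up, the minimum of 10 is applied to the number selected, and an EA with fewer than 10 listed dwellings simply contributes all of them. The function name is hypothetical.

```python
import random


def select_dwellings(listed, percent=10, minimum=10, rng=None):
    """Systematically select dwelling units from an EA listing.

    listed: dwelling units in listing (route) order.
    Takes `percent` of the listing, at least `minimum` units, and never
    more than were listed. No replacements are drawn afterwards.
    """
    rng = rng or random.Random()
    n = len(listed)
    size = min(n, max(minimum, (n * percent + 99) // 100))
    if size == 0:
        return []
    step = n / size                         # fixed interval along the route
    start = rng.random() * step             # random starting point
    return [listed[min(n - 1, int(start + k * step))] for k in range(size)]


# Made-up listing of 250 dwelling units in one EA.
chosen = select_dwellings(list(range(250)), rng=random.Random(0))
small_ea = select_dwellings(list(range(7)), rng=random.Random(0))
```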

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    Consultation on Questionnaire Design

    Ten stakeholder workshops were held across the country during August and September 2004. Approximately 367 stakeholders, predominantly from national, provincial and local government departments, as well as from research and educational institutions, attended. The workshops aimed to achieve two objectives, namely to better understand the type of information stakeholders need to meet their objectives, and to consider the proposed data items to be included in future household surveys. The output from this process was a set of data items relating to a specific, defined focus area and outcomes that culminated with the data collection instrument (see Annexure B for all the data items).

    Questionnaire Design

    The design of the CS questionnaire was household-based and intended to collect information on 10 people. It was developed in line with the household-based survey questionnaires conducted by Stats SA. The questions were based on the data items generated out of the consultation process described above. Both the design and questionnaire layout were pre-tested in October 2005 and adjustments were made for the pilot in February 2006. Further adjustments were done after the pilot results had been finalised.

    Cleaning operations

    Editing

    The automated cleaning was implemented based on an editing rules specification defined with reference to the approved questionnaire. Most of the editing rules were categorised into structural edits, which look into the relationship between different record types; the minimum processability rules, which removed false positive readings or noise; the logical editing, which determines the inconsistency between fields of the same statistical unit; and the inferential editing, which searches for similarities across the domain. The edit specifications document for the structural, population, mortality and housing edits was developed by a team of Stats SA subject-matter specialists, demographers, and programmers. The process was successfully

  10. Random Data Analysis and Linear Regression

    • kaggle.com
    zip
    Updated Nov 5, 2025
    Cite
    Tommaso Ruzza (2025). Random Data Analysis and Linear Regression [Dataset]. https://www.kaggle.com/datasets/tommasoruzza/random-data-analysis-and-linear-regression
    Explore at:
    zip(82141 bytes)Available download formats
    Dataset updated
    Nov 5, 2025
    Authors
    Tommaso Ruzza
    Description

    Created a multi-tab Excel statistical project where I generated synthetic normally-distributed data, built random sample extraction logic, calculated descriptive and inferential statistics, analysed variable correlations and performed linear regression with visualisation.
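    The workflow described above can be sketched outside Excel as well. The following is a minimal Python illustration, not the author's workbook: the distribution parameters, the sample size of 50, the seed and the function names fit_line and pearson_r are all assumptions made for the example.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Synthetic, normally distributed population (all parameters are assumed).
rng = random.Random(42)
population_x = [rng.gauss(50.0, 10.0) for _ in range(500)]
population_y = [2.0 * x + 5.0 + rng.gauss(0.0, 3.0) for x in population_x]

# Random sample extraction without replacement.
picked = rng.sample(range(len(population_x)), 50)
sample_x = [population_x[i] for i in picked]
sample_y = [population_y[i] for i in picked]

# Descriptive statistics and a normal-approximation 95% interval for the mean.
mean_x = statistics.fmean(sample_x)
sd_x = statistics.stdev(sample_x)
half_width = 1.96 * sd_x / math.sqrt(len(sample_x))
interval_x = (mean_x - half_width, mean_x + half_width)

# Correlation and regression on the sample.
sample_r = pearson_r(sample_x, sample_y)
sample_slope, sample_intercept = fit_line(sample_x, sample_y)
print(f"mean={mean_x:.2f} sd={sd_x:.2f} r={sample_r:.3f} slope={sample_slope:.3f}")
```

    The regression is written out by hand so the sketch depends only on the standard library; a plotting step is omitted.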

  11. Cement Craft Ideas - DIY Projects's YouTube Channel Statistics

    • vidiq.com
    Updated Nov 29, 2025
    Cite
    vidIQ (2025). Cement Craft Ideas - DIY Projects's YouTube Channel Statistics [Dataset]. https://vidiq.com/youtube-stats/channel/UCL44bjEQeiu3xOzCwy6wr0A/
    Explore at:
    Dataset updated
    Nov 29, 2025
    Dataset authored and provided by
    vidIQ
    Time period covered
    Nov 1, 2025 - Nov 30, 2025
    Area covered
    US
    Variables measured
    subscribers, video count, video views, engagement rate, upload frequency, estimated earnings
    Description

    Comprehensive YouTube channel statistics for Cement Craft Ideas - DIY Projects, featuring 758,000 subscribers and 209,665,371 total views. This dataset includes detailed performance metrics such as subscriber growth, video views, engagement rates, and estimated revenue. The channel operates in the Lifestyle category and is based in US. Track 253 videos with daily and monthly performance data, including view counts, subscriber changes, and earnings estimates. Analyze growth trends, engagement patterns, and compare performance against similar channels in the same category.

  12. National Agricultural Sample Census Pilot (Private Farmer) Fishery 2007 -...

    • microdata.worldbank.org
    • microdata.fao.org
    • +2more
    Updated Oct 30, 2024
    Cite
    National Bureau of Statistics (2024). National Agricultural Sample Census Pilot (Private Farmer) Fishery 2007 - Nigeria [Dataset]. https://microdata.worldbank.org/index.php/catalog/6382
    Explore at:
    Dataset updated
    Oct 30, 2024
    Dataset provided by
    National Bureau of Statistics, Nigeria
    Authors
    National Bureau of Statistics
    Time period covered
    2007
    Area covered
    Nigeria
    Description

    Abstract

    The programme for the World Census of Agriculture 2000 is the eighth in the series for promoting a global approach to agricultural census taking. The first and second programmes were sponsored by the International Institute of Agriculture (IIA) in 1930 and 1940. Subsequent ones up to 1990 were promoted by the Food and Agriculture Organization of the United Nations (FAO). FAO recommends that each country conduct at least one agricultural census in each census programme decade; its programme for the World Census of Agriculture 2000, for instance, corresponds to agricultural censuses to be undertaken during the decade 1996 to 2005. Many countries do not have sufficient resources to conduct an agricultural census. It has therefore been accepted practice since 1960 for countries lacking the resources required for a complete enumeration to conduct the agricultural census on a sample basis.

    In Nigeria's case, a combination of complete enumeration and sample enumeration is adopted, whereby the rural (peasant) holdings are covered on a sample basis while the modern holdings are covered by complete enumeration. The project name “National Agricultural Sample Census” derives from this practice. Nigeria, through the National Agricultural Sample Census (NASC), participated in the 1970s, 1980s and 1990s programmes of the World Census of Agriculture. Nigeria failed to conduct the Agricultural Census in 2003/2004 because of lack of funding. The NBS regular annual agriculture surveys have been irregular since 1996, and many years of backlogged data sets are still unprocessed. The baseline agricultural data are yet to be updated, while the regular annual surveys have suffered setbacks. There is an urgent need for the governments (Federal, State, LGA), sector agencies, FAO and other international organizations to come together to undertake the agricultural census exercise, which is long overdue. The 2006/2008 National Agricultural Sample Census Survey is now on course, with the pilot exercise carried out in the third quarter of 2007.

    The National Agricultural Sample Census (NASC) 2006/08 is imperative for strengthening the weak agricultural data in Nigeria. The project is phased into three sub-projects for ease of implementation: the Pilot Survey, the Modern Agricultural Holding and the Main Census. It commenced in the third quarter of 2006 and is to terminate in the first quarter of 2008. The pilot survey was implemented collaboratively by the National Bureau of Statistics.

    The main objective of the pilot survey was to test the adequacy of the survey instruments, equipment, administration of questionnaires, data processing arrangements and report writing. The pilot survey, conducted in July 2007, covered the two NBS survey systems: the National Integrated Survey of Households (NISH) and the National Integrated Survey of Establishment (NISE). The survey instruments were designed to be applied using the two survey systems, while the use of the Global Positioning System (GPS) was introduced as an additional new tool for implementing the project.

    The stakeholders' workshop held in Kaduna on 21-23 May 2007 was one of the initial benchmarks for the take-off of the pilot survey. The pilot survey implementation started with the first-level training (training of trainers) at the NBS headquarters on 13-15 June 2007. The second-level training for all levels of field personnel was held at the headquarters of the twelve (12) states concerned on 2-6 July 2007. The fieldwork of the pilot survey commenced on 9 July and ended on 13 July 2007. IMPS and SPSS were the statistical packages used to develop the data entry programme.

    Geographic coverage

    State

    Analysis unit

    Households of fish farmers

    Universe

    The survey covered all de jure household members (usual residents) who were engaged in fish production.

    Kind of data

    Census/enumeration data [cen]

    Sampling procedure

    The survey was carried out in 12 states across the 6 geo-political zones, with 2 states covered in each zone. In each selected state, 2 local government areas were studied. In each local government area, 2 rural enumeration areas were covered, and in each enumeration area 3 fish farming housing units were systematically selected and canvassed.
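    The stage-by-stage counts in this design multiply out to the planned number of housing units. A minimal Python sketch of that arithmetic follows; the names STAGES and cumulative_counts are illustrative only, and the figures are the planned design values quoted above, not achieved counts.

```python
from math import prod

# Units selected at each stage of the design, as stated in the description.
STAGES = [
    ("geo-political zones", 6),
    ("states per zone", 2),
    ("local government areas per state", 2),
    ("rural enumeration areas per local government area", 2),
    ("fish farming housing units per enumeration area", 3),
]

def cumulative_counts(stages):
    """Return the running total of selected units after each stage."""
    totals = []
    running = 1
    for label, count in stages:
        running *= count
        totals.append((label, running))
    return totals

totals = cumulative_counts(STAGES)
for label, total in totals:
    print(f"{total:4d}  after selecting {label}")

# Planned number of housing units to canvass under the stated design.
planned_housing_units = prod(count for _, count in STAGES)
```

    Under the stated design this gives 12 states, 24 local government areas, 48 enumeration areas and 144 housing units.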

    Sampling deviation

    There were deviations from the original sample design.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The NASC fishery questionnaire was divided into the following sections:

    • Holding identification: to identify the holder through HU serial number, HH serial number, and demographic characteristics
    • Type of fishing sites used by holder
    • Sources and quantities of fishing inputs
    • Quantity of aquatic production by type
    • Quantity sold and value of sale of aquatic products
    • Funds committed to fishing by source and others

    Cleaning operations

    The data processing and analysis plan involved five main stages: training of data processing staff; manual editing and coding; development of the data entry programme; data entry and editing; and tabulation. Census and Survey Processing System (CSPro) software was used for data entry, Statistical Package for Social Sciences (SPSS) and CSPro for editing, and a combination of SPSS, Statistical Analysis Software (SAS) and EXCEL for table generation. Subject-matter specialists and computer personnel from the NBS and CBN implemented the data processing work. Tabulation plans were also developed by these officers for their areas and topics covered in the three-survey system used for the exercise. Data editing was done in two phases. The first was manual editing before data entry, in which editors at the various zones manually edited the questionnaires to ensure consistency of the information. The second was computer editing, that is, cleaning of the data already entered. The completed questionnaires were collated and edited manually: (a) office editing and coding were done by the editor using visual control of the questionnaire before data entry; (b) CSPro was used to design the data entry template, which is provided as an external resource; (c) ten operators, two supervisors and two programmers were used; (d) ten machines were used for data entry; (e) after data entry, the data entry supervisor ran frequencies on each section to check that all the questionnaires had been entered.

    Response rate

    The response rate at both the Enumeration Area (EA) level and the fish holder level was 100 per cent.

    Sampling error estimates

    No computation of sampling error

    Data appraisal

    The Quality Control measures were carried out during the survey, essentially to ensure quality of data

  13. Living Standards Measurement Survey 2001 (Wave 1 Panel) - Bosnia and...

    • microdata.fao.org
    Updated Nov 8, 2022
    Cite
    State Agency for Statistics (BHAS) (2022). Living Standards Measurement Survey 2001 (Wave 1 Panel) - Bosnia and Herzegovina [Dataset]. https://microdata.fao.org/index.php/catalog/1532
    Explore at:
    Dataset updated
    Nov 8, 2022
    Dataset provided by
    Federation of BiH Institute of Statistics (FIS)
    State Agency for Statistics (BHAS)
    Republika Srpska Institute of Statistics (RSIS)
    Time period covered
    2001
    Area covered
    Bosnia and Herzegovina
    Description

    Abstract

    In 1992, Bosnia-Herzegovina, one of the six republics in former Yugoslavia, became an independent nation. A civil war started soon thereafter, lasting until 1995 and causing widespread destruction and loss of life. Following the Dayton accord, Bosnia-Herzegovina (BiH) emerged as an independent state comprised of two entities, namely, the Federation of Bosnia-Herzegovina (FBiH) and the Republika Srpska (RS), and the district of Brcko. In addition to the destruction caused to the physical infrastructure, there was considerable social disruption and decline in living standards for a large section of the population. Alongside these events, a period of economic transition to a market economy was occurring. The distributive impacts of this transition, both positive and negative, are unknown. In short, while it is clear that welfare levels have changed, there is very little information on poverty and social indicators on which to base policies and programs. In the post-war process of rebuilding the economic and social base of the country, the government has faced the problems created by having little relevant data at the household level. The three statistical organizations in the country (State Agency for Statistics for BiH - BHAS, the RS Institute of Statistics - RSIS, and the FBiH Institute of Statistics - FIS) have been active in working to improve the data available to policy makers at both the macro and the household level. One facet of their activities is to design and implement a series of household surveys. The first of these surveys is the Living Standards Measurement Study survey (LSMS). Later surveys will include the Household Budget Survey (an Income and Expenditure Survey) and a Labour Force Survey. A subset of the LSMS households will be re-interviewed in the two years following the LSMS to create a panel data set.

    The three statistical organizations began work on the design of the Living Standards Measurement Study Survey (LSMS) in 1999. The purpose of the survey was to collect data needed for assessing the living standards of the population and for providing the key indicators needed for social and economic policy formulation. The survey was to provide data at the country and the entity level and to allow valid comparisons between entities to be made. The LSMS survey was carried out in the Fall of 2001 by the three statistical organizations with financial and technical support from the Department for International Development of the British Government (DfID), United Nations Development Program (UNDP), the Japanese Government, and the World Bank (WB). The creation of a Master Sample for the survey was supported by the Swedish Government through SIDA, the European Commission, the Department for International Development of the British Government and the World Bank. The overall management of the project was carried out by the Steering Board, comprised of the Directors of the RS and FBiH Statistical Institutes, the Management Board of the State Agency for Statistics and representatives from DfID, UNDP and the WB. The day-to-day project activities were carried out by the Survey Management Team, made up of two professionals from each of the three statistical organizations.

    In addition to collecting the information necessary to obtain as comprehensive a measure as possible of the basic dimensions of household living standards, the LSMS has three basic objectives:

    1. To provide the public sector, government, the business community, scientific institutions, international donor organizations and social organizations with information on different indicators of the population's living conditions, as well as on available resources for satisfying basic needs.
    2. To provide information for the evaluation of the results of different forms of government policy and programs developed with the aim of improving the population's living standard. The survey will enable the analysis of the relations between and among different aspects of living standards (housing, consumption, education, health, labour) at a given time, as well as within a household.
    3. To provide key contributions for the development of the government's Poverty Reduction Strategy Paper, based on analysed data.

    Geographic coverage

    National coverage

    Analysis unit

    Households

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    (a) SAMPLE SIZE

    A total sample of 5,400 households was determined to be adequate for the needs of the survey: with 2,400 in the Republika Srpska and 3,000 in the Federation of BiH. The difficulty was in selecting a probability sample that would be representative of the country's population. The sample design for any survey depends upon the availability of information on the universe of households and individuals in the country. Usually this comes from a census or administrative records. In the case of BiH the most recent census was done in 1991. The data from this census were rendered obsolete due to both the simple passage of time but, more importantly, due to the massive population displacements that occurred during the war. At the initial stages of this project it was decided that a master sample should be constructed. Experts from Statistics Sweden developed the plan for the master sample and provided the procedures for its construction. From this master sample, the households for the LSMS were selected.

    Master Sample [This section is based on Peter Lynn's note "LSMS Sample Design and Weighting - Summary". April, 2002. Essex University, commissioned by DfID.]

    The master sample is based on a selection of municipalities and a full enumeration of the selected municipalities. Optimally, one would prefer smaller units (geographic or administrative) than municipalities. However, while it was considered that the population estimates of municipalities were reasonably accurate, this was not the case for smaller geographic or administrative areas. To avoid the error involved in sampling smaller areas with very uncertain population estimates, municipalities were used as the base unit for the master sample. The Statistics Sweden team proposed two options based on this same method, with the only difference being in the number of municipalities included and enumerated.

    (b) SAMPLE DESIGN

    For reasons of funding, the smaller option proposed by the team was used, or Option B.

    Stratification of Municipalities

    The first step in creating the Master Sample was to group the 146 municipalities in the country into three strata - Urban, Rural and Mixed - within each of the two entities. Urban municipalities are those where 65 percent or more of the households are considered to be urban, and rural municipalities are those where the proportion of urban households is below 35 percent. The remaining municipalities were classified as Mixed (Urban and Rural) Municipalities. Brcko was excluded from the sampling frame.

    Urban, Rural and Mixed Municipalities

    It is worth noting that the urban-rural definitions used in BiH are unusual with such large administrative units as municipalities classified as if they were completely homogeneous. Their classification into urban, rural, mixed comes from the 1991 Census which used the predominant type of income of households in the municipality to define the municipality. This definition is imperfect in two ways. First, the distribution of income sources may have changed dramatically from the pre-war times: populations have shifted, large industries have closed, and much agricultural land remains unusable due to the presence of land mines. Second, the definition is not comparable to other countries' where villages, towns and cities are classified by population size into rural or urban or by types of services and infrastructure available. Clearly, the types of communities within a municipality vary substantially in terms of both population and infrastructure. However, these imperfections are not detrimental to the sample design (the urban/rural definition may not be very useful for analysis purposes, but that is a separate issue).
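    The stratification thresholds quoted above (65 percent or more urban households, below 35 percent, and everything in between) can be written as a small rule. The Python sketch below is illustrative only: the function name classify_municipality, the example municipality names and their urban shares are hypothetical and are not taken from the master sample.

```python
from collections import defaultdict

def classify_municipality(urban_share):
    """Assign a stratum from the share of households considered urban (0 to 1)."""
    if not 0.0 <= urban_share <= 1.0:
        raise ValueError("urban_share must be between 0 and 1")
    if urban_share >= 0.65:
        return "Urban"
    if urban_share < 0.35:
        return "Rural"
    return "Mixed"

# Hypothetical municipalities: (entity, name, share of urban households).
municipalities = [
    ("FBiH", "Municipality A", 0.82),
    ("FBiH", "Municipality B", 0.35),
    ("RS", "Municipality C", 0.10),
    ("RS", "Municipality D", 0.65),
]

# Strata are formed within each entity, so the key is (entity, stratum).
strata = defaultdict(list)
for entity, name, share in municipalities:
    strata[(entity, classify_municipality(share))].append(name)

for key in sorted(strata):
    print(key, strata[key])
```

    Note the boundary behaviour implied by the text: a share of exactly 65 percent is Urban, while a share of exactly 35 percent falls into Mixed.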

    Mode of data collection

    Face-to-face [f2f]

    Cleaning operations

    (a) DATA ENTRY

    An integrated approach to data entry and fieldwork was adopted in Bosnia and Herzegovina. Data entry proceeded side by side with data gathering to ensure verification and correction in the field. Data entry stations were located in the regional offices of the entity institutes and were equipped with computers, modem and a dedicated telephone line. The completed questionnaires were delivered to these stations each day for data entry. Twenty data entry operators (10 from Federation and 10 from RS) were trained in two training sessions held for a week each in Sarajevo and Banja Luka. The trainers were the staff of the two entity institutes who had undergone training in the CSPro software earlier and had participated in the workshops of the Pilot survey. Prior to the training, laptop computers were provided to the entity institutes, and the CSPro software was installed in them. The training for the data entry operators covered the following elements:

    • Introduction to the LSMS Survey questionnaire; Introduction to the personal computers/laptop computers; Copying data on diskette and printing of output
    • The Data entry programme (CSPro). Understanding of the Round 1 data entry screens (Modules 1-10)
    • Practice of Round 1 (data entry trainees enter questionnaires completed by interviewer trainees during practice interviews)
    • Understanding of Round 2 Data entry screen (Modules 11-13)
    • Practice of Round 2 Data entry screens (data entry trainees entered the questionnaires completed by interviewer trainees)
    • Control Procedures; Copying
  14. Household Expenditure and Income Survey 2010, Economic Research Forum (ERF)...

    • catalog.ihsn.org
    Updated Mar 29, 2019
    Cite
    The Hashemite Kingdom of Jordan Department of Statistics (DOS) (2019). Household Expenditure and Income Survey 2010, Economic Research Forum (ERF) Harmonization Data - Jordan [Dataset]. https://catalog.ihsn.org/index.php/catalog/7662
    Explore at:
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    The Hashemite Kingdom of Jordan Department of Statistics (DOS)
    Time period covered
    2010 - 2011
    Area covered
    Jordan
    Description

    Abstract

    The main objective of the HEIS survey is to obtain detailed data on household expenditure and income, linked to various demographic and socio-economic variables, to enable computation of poverty indices and determine the characteristics of the poor and prepare poverty maps. Therefore, to achieve these goals, the sample had to be representative on the sub-district level. The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality.

    Data collected through the survey helped in achieving the following objectives:

    1. Provide data weights that reflect the relative importance of consumer expenditure items used in the preparation of the consumer price index
    2. Study the consumer expenditure pattern prevailing in the society and the impact of demographic and socio-economic variables on those patterns
    3. Calculate the average annual income of the household and the individual, and assess the relationship between income and different economic and social factors, such as profession and educational level of the head of the household and other indicators
    4. Study the distribution of individuals and households by income and expenditure categories and analyze the factors associated with it
    5. Provide the necessary data for the national accounts related to overall consumption and income of the household sector
    6. Provide the necessary income data to serve in calculating poverty indices and identifying the poor characteristics as well as drawing poverty maps
    7. Provide the data necessary for the formulation, follow-up and evaluation of economic and social development programs, including those addressed to eradicate poverty

    Geographic coverage

    National

    Analysis unit

    • Households
    • Individuals

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The Household Expenditure and Income Survey sample for 2010 was designed to serve the basic objectives of the survey by providing a relatively large sample in each sub-district to enable drawing a poverty map of Jordan. The General Census of Population and Housing in 2004 provided a detailed framework of housing and households for different administrative levels in the country. Jordan is administratively divided into 12 governorates. Each governorate is composed of a number of districts, and each district (Liwa) includes one or more sub-districts (Qada). In each sub-district, there are a number of communities (cities and villages). Each community was divided into a number of blocks, with between 60 and 100 houses in each block. Nomads and persons living in collective dwellings such as hotels, hospitals and prisons were excluded from the survey framework.

    A two-stage stratified cluster sampling technique was used. In the first stage, clusters were selected with probability proportional to size, where the number of households in each cluster was taken as the cluster's size measure. In the second stage, a sample of 8 households was selected from each cluster using a systematic sampling technique, along with another 4 households selected as a backup for the basic sample. The 4 backup households were to be used during the first visit to the block if a visit to an originally selected household was not possible for any reason. For the purposes of this survey, each sub-district was considered a separate stratum so that results could be produced at the sub-district level. In this respect, the survey framework adopted the divisions provided by the General Census of Population and Housing in defining the sample strata. To estimate the sample size, the coefficient of variation and the design effect of the expenditure variable from the Household Expenditure and Income Survey for 2008 were calculated for each sub-district. These results were used to estimate the sample size at the sub-district level so that the coefficient of variation of the expenditure variable in each sub-district is less than 10%, subject to a minimum of 6 clusters per sub-district. This ensures adequate representation of clusters in different administrative areas to enable drawing an indicative poverty map.
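    The two stages described above can be illustrated with a short Python sketch. It is not the Department of Statistics procedure: the description does not say how the size-proportional selection was carried out or how the 4 backup households were drawn, so the systematic PPS method, the draw of backups from the remaining households, the function names and the block sizes below are all assumptions.

```python
import random

def pps_systematic(sizes, n, rng):
    """Pick n cluster indices with probability proportional to size.

    Uses systematic selection along the cumulative size scale (an assumed
    method). A cluster larger than the sampling step could be hit twice.
    """
    step = sum(sizes) / n
    start = rng.random() * step
    points = [start + k * step for k in range(n)]
    chosen, cumulative, next_point = [], 0, 0
    for index, size in enumerate(sizes):
        cumulative += size
        while next_point < n and points[next_point] < cumulative:
            chosen.append(index)
            next_point += 1
    return chosen

def systematic_sample(items, n, rng):
    """Pick n items at a fixed interval from a random start."""
    step = len(items) / n
    start = rng.random() * step
    return [items[min(int(start + k * step), len(items) - 1)] for k in range(n)]

def select_households(households, rng, main_n=8, backup_n=4):
    """Second stage: 8 main households plus 4 backups from the remainder."""
    main = systematic_sample(households, main_n, rng)
    remaining = [h for h in households if h not in main]
    backup = systematic_sample(remaining, backup_n, rng)
    return main, backup

rng = random.Random(2010)
# Hypothetical sub-district: 40 blocks of 60 to 100 households each.
block_sizes = [rng.randint(60, 100) for _ in range(40)]
clusters = pps_systematic(block_sizes, 6, rng)  # 6 clusters, the stated minimum
households = list(range(block_sizes[clusters[0]]))
main, backup = select_households(households, rng)
print(clusters, main, backup)
```

    Drawing the backups from the households left over after the main selection guarantees that no household appears in both lists.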

    It should be noted that, in addition to the standard non-response rate assumed, higher rates were expected in areas of major cities where poor households are concentrated. These were taken into consideration during the sampling design phase, and a higher number of households was selected from those areas to ensure good coverage of all areas where poverty is concentrated.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    • General form
    • Expenditure on food commodities form
    • Expenditure on non-food commodities form

    Cleaning operations

    Raw Data:

    • Organizing forms/questionnaires: A compatible archive system was used to classify the forms according to different rounds throughout the year. A registry was prepared to indicate different stages of the process of data checking, coding and entry till forms were back to the archive system.
    • Data office checking: This phase was achieved concurrently with the data collection phase in the field where questionnaires completed in the field were immediately sent to data office checking phase.
    • Data coding: A team was trained to work on the data coding phase, which in this survey is only limited to education specialization, profession and economic activity. In this respect, international classifications were used, while for the rest of the questions, coding was predefined during the design phase.
    • Data entry/validation: A team consisting of system analysts, programmers and data entry personnel were working on the data at this stage. System analysts and programmers started by identifying the survey framework and questionnaire fields to help build computerized data entry forms. A set of validation rules were added to the entry form to ensure accuracy of data entered. A team was then trained to complete the data entry process. Forms prepared for data entry were provided by the archive department to ensure forms are correctly extracted and put back in the archive system. A data validation process was run on the data to ensure the data entered is free of errors.
    • Results tabulation and dissemination: After the completion of all data processing operations, ORACLE was used to tabulate the survey final results. Those results were further checked using similar outputs from SPSS to ensure that tabulations produced were correct. A check was also run on each table to guarantee consistency of figures presented, together with required editing for tables' titles and report formatting.

    Harmonized Data:

    • The Statistical Package for Social Science (SPSS) was used to clean and harmonize the datasets.
    • The harmonization process started with cleaning all raw data files received from the Statistical Office.
    • Cleaned data files were then merged to produce one data file on the individual level containing all variables subject to harmonization.
    • A country-specific program was generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.
    • A post-harmonization cleaning process was run on the data.
    • Harmonized data was saved on the household as well as the individual level, in SPSS and converted to STATA format.

  15. Multiple Indicator Cluster Survey 2000 - Viet Nam

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1more
    Updated Oct 26, 2023
    Cite
    General Statistics Office (2023). Multiple Indicator Cluster Survey 2000 - Viet Nam [Dataset]. https://microdata.worldbank.org/index.php/catalog/722
    Explore at:
    Dataset updated
    Oct 26, 2023
    Dataset authored and provided by
    General Statistics Office
    Time period covered
    2000
    Area covered
    Vietnam
    Description

    Abstract

    The Viet Nam Multiple Indicator Cluster Survey (MICS) was carried out by the General Statistics Office of Viet Nam (GSO) in collaboration with the Viet Nam Committee for Population, Family and Children (VCPFC). Financial and technical support was provided by the United Nations Children's Fund (UNICEF).

    In the World Summit for children held in New York in 1990, the Government of Vietnam committed itself to the implementation of the World Declaration and Plan of Action for children.

    In implementation of directive 34/1999/CT-TTg of 27 December 1999 on promoting the implementation of the end-decade goals for children, reviewing the National Plan of Action for Children, 1991-2000, and designing the National Plan of Action for Children, 2001-2010, and in the framework of the “Development of Social Indicators” project, the General Statistical Office (GSO) chaired and coordinated with the Viet Nam Committee for the Protection and Care for Children (CPCC) to conduct the survey evaluating the end-decade goals for children, 1991-2000 (MICS). MICS covered a sample of 7,628 households in 240 communes and wards, representing the whole country, the urban area, the rural area and the 8 geographical areas in 61 towns/provinces. Field activities to collect data lasted 2 months, May-June 2000. The survey was technically supported by statisticians from EAPRO, UNICEF regional offices and UNICEF Hanoi on sample and questionnaire design, data input software and, not least, the software for analyzing the data and calculating the estimates that generalize the survey results.

    Survey Objectives: The end-decade survey on children is aimed at:

    • Providing up-to-date and reliable data to analyse the situation of children and women in 2000.
    • Providing data to assess the implementation of the World summit goals for children and of the National Plan of Action for Vietnamese Children, 1991-2000.
    • Serving as a basis (with baseline data and information) for development of the National Plan of Action for Children, 2001-2010.
    • Building professional capacity in monitoring, managing and evaluating all the goals of child protection, care and education at all levels.

    Geographic coverage

    The 2000 MICS of Vietnam was a nationally representative sample survey.

    Analysis unit

    Households, Women, Child.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sample for the Viet Nam Multiple Indicator Cluster Survey (MICSII) was designed to provide reliable estimates on a large number of indicators on the situation of children and women at the national level, for urban and rural areas, and for 8 regions: Red River Delta, North West, North East, North Central Coast, South Central Coast, Central Highlands, South East, and Mekong River Delta. Regions were identified as the main sampling domains and the sample was selected in two stages. At the first stage, 240 enumeration areas (EAs) were selected. After a household listing was carried out within the selected enumeration areas, a systematic sample of 1/3 of households in each EA was drawn. The survey managed to visit all 240 selected EAs during the fieldwork period. The sample was stratified by region and is not self-weighting. For reporting national level results, sample weights are used.
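    The second-stage selection described above (a systematic sample of 1/3 of listed households) can be sketched roughly as follows. This is only an illustration: the household identifiers, the listing size of 90 and the function name `systematic_sample` are hypothetical, and the survey's actual selection procedure may have differed in detail.

```python
import random


def systematic_sample(listing, step, seed=None):
    """Take every `step`-th unit from `listing`, starting at a random offset."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random start in 0 .. step-1
    return listing[start::step]


# Hypothetical household listing for one enumeration area (not survey data).
households = [f"HH{i:03d}" for i in range(1, 91)]

# A step of 3 yields a 1/3 sampling fraction.
sample = systematic_sample(households, step=3, seed=1)
print(len(sample), sample[:3])
```

    Because the listing length is a multiple of the step, every random start gives the same sample size; with other listing sizes the size can vary by one household.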

    Sampling deviation

    No major deviations from the original sample design were made. All sample enumeration areas were accessed and successfully interviewed with good response rates.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The questionnaires for MICS in Vietnam are based on the New York UNICEF module questionnaires with some modifications and additions to fit in with Vietnam's context and to evaluate the goals set out in the National Plan of Action. The questionnaires have been arranged in such a way as to prevent the loss of questionnaire sheets and to facilitate the logic control between the items in the modules. Questionnaires include 3 sections. Section 1: general questions to be administered to families and family members. Section 2: questions for child bearing-age women (aged 15-49). Section 3: for children under 5.

    Section 1: Household questionnaire Part A: Household information panel Part B: Household listing form Part C: Education Part D: Child labour Part E: Maternal mortality Part F: Water and sanitation Part G: Salt iodization

    Section 2: Questionnaire for child bearing-age women Part A: Child mortality Part B: Tetanus toxoid (TT) Part C: Maternal and newborn health Part D: Contraceptive use Part E: HIV/AIDS

    Section 3: Questionnaire for children under five Part A: Birth registration and early learning Part B: Vitamin A Part C: Breastfeeding Part D: Care of illness Part E: Malaria Part F: Immunization Part G: Anthropometry

    Apart from the questionnaires to collect information at family level, questionnaires are also designed to gather information at community level supplementary to some indicators that can not have data collected at family level. The information garnered includes local population, socio-economic and physical conditions, education, health and progress of projects/plans of actions for children.

    Cleaning operations

    To minimize the errors made by data entry staff members, all the records were double-entered by two different members. Any discrepancy detected between the two entries was re-checked to find out which one was wrong. Data cleaning started in early September. This process was closely observed to ensure the accuracy, quality and practicality of all the data collected.

    To minimize the errors due to wrong statements of respondents or wrong registration by interviewers, a cleaning programme was used to check the consistency and logic in the items of questionnaires and between the questionnaires. The cleaning programme printed out all the errors, then questionnaires were checked by qualified officials.

    Response rate

    8356 households were selected for the sample. Of these all were found to be occupied households and 8355 were successfully interviewed for a response rate of 100%. Within these households, 10063 eligible women aged 15-49 were identified for interview, of which 9473 were successfully interviewed (response rate 94.1%), and 2707 children aged 0-4 were identified for whom the mother or caretaker was successfully interviewed for 2680 children (response rate 99%).
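    The response rates quoted above follow directly from the counts of selected and interviewed units. A minimal sketch of the arithmetic, using only the figures given in this description (the function and dictionary names are illustrative):

```python
def response_rate(completed, eligible):
    """Response rate in percent: completed interviews over eligible units."""
    return 100.0 * completed / eligible


# (completed, eligible) counts as stated in the description above.
figures = {
    "households": (8355, 8356),
    "women aged 15-49": (9473, 10063),
    "children aged 0-4": (2680, 2707),
}

for group, (completed, eligible) in figures.items():
    print(f"{group}: {response_rate(completed, eligible):.1f}%")
```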

    Sampling error estimates

    Estimates from a sample survey are affected by two types of errors: 1) non-sampling errors and 2) sampling errors. Non-sampling errors are the results of mistakes made in the implementation of data collection and data processing. Numerous efforts were made during implementation of the MICS - 3 to minimize this type of error; however, non-sampling errors are impossible to avoid and difficult to evaluate statistically.

    Sampling errors can be evaluated statistically. The sample of respondents to the MICS - 3 is only one of many possible samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability in the results of the survey between all possible samples, and, although the degree of variability is not known exactly, it can be estimated from the survey results. The sampling errors are measured in terms of the standard error for a particular statistic (mean or percentage), which is the square root of the variance. Confidence intervals are calculated for each statistic within which the true value for the population can be assumed to fall. Plus or minus two standard errors of the statistic is used for key statistics presented in MICS, equivalent to a 95 percent confidence interval.

    If the sample of respondents had been a simple random sample, it would have been possible to use straightforward formulae for calculating sampling errors. However, the MICS - 3 sample is the result of a two-stage stratified design, and consequently needs to use more complex formulae. The SPSS complex samples module has been used to calculate sampling errors for the MICS - 3. This module uses the Taylor linearization method of variance estimation for survey estimates that are means or proportions. This method is documented in the SPSS file CSDescriptives.pdf found under the Help, Algorithms options in SPSS.

    Sampling errors have been calculated for a select set of statistics (all of which are proportions due to the limitations of the Taylor linearization method) for the national sample, urban and rural areas, and for each of the five regions. For each statistic, the following are reported: the estimate, its standard error, the coefficient of variation (or relative error -- the ratio between the standard error and the estimate), the design effect, and the square root design effect (DEFT -- the ratio between the standard error using the given sample design and the standard error that would result if a simple random sample had been used), as well as the 95 percent confidence intervals (+/-2 standard errors).
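    The quantities listed above are related by simple formulae, sketched below for a proportion. The input values are hypothetical, not survey results. The simple-random-sample standard error is taken here as sqrt(p(1-p)/n) with no finite population correction, which is an assumption of this sketch; the design-based standard error itself would come from survey software such as the SPSS complex samples module mentioned above.

```python
import math


def sampling_error_summary(estimate, se_design, n):
    """Summarise sampling error for a proportion.

    estimate  -- estimated proportion (between 0 and 1)
    se_design -- standard error under the actual (complex) sample design
    n         -- number of sample cases behind the estimate
    """
    # Standard error a simple random sample of the same size would give
    # (assumed formula, ignoring any finite population correction).
    se_srs = math.sqrt(estimate * (1.0 - estimate) / n)
    deft = se_design / se_srs
    return {
        "cv": se_design / estimate,   # coefficient of variation (relative error)
        "deft": deft,                 # square root of the design effect
        "deff": deft ** 2,            # design effect
        "ci": (estimate - 2 * se_design, estimate + 2 * se_design),
    }


# Hypothetical inputs chosen so the arithmetic is easy to follow.
summary = sampling_error_summary(estimate=0.5, se_design=0.02, n=2500)
print(summary)
```

    With these inputs the simple-random-sample standard error is 0.01, so the design doubles the standard error (DEFT of 2) and the interval is 0.5 plus or minus 0.04.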

    Data appraisal

    A series of data quality tables and graphs are available to review the quality of the data and include the following:

    Age distribution of the household population Age distribution of eligible women and interviewed women Age distribution of eligible children and children for whom the mother or caretaker was interviewed Age distribution of children under age 5 by 3 month groups Age and period ratios at boundaries of eligibility Percent of observations with missing information on selected variables Presence of mother in

  16. Kickstarter Project Statistics

    • kaggle.com
    zip
    Updated Nov 14, 2019
    Cite
    Cathie So (2019). Kickstarter Project Statistics [Dataset]. https://www.kaggle.com/socathie/kickstarter-project-statistics
    Explore at:
    Available download formats: zip (1270675 bytes)
    Dataset updated
    Nov 14, 2019
    Authors
    Cathie So
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Crowdfunding has become one of the main sources of initial capital for small businesses and start-up companies that are looking to launch their first products. Websites like Kickstarter and Indiegogo provide a platform for millions of creators to present their innovative ideas to the public. This is a win-win situation where creators can accumulate initial funds while the public gets access to cutting-edge prototype products that are not available in the market yet.

    At any given point, Indiegogo has around 10,000 live campaigns while Kickstarter has 6,000. It has become increasingly difficult for projects to stand out from the crowd. Of course, advertisements via various channels are by far the most important factor in a successful campaign. However, for creators with a smaller budget, this leaves them wondering:

    "How do we increase the probability of success of our campaign starting from the very moment we create our project on these websites?"

    Data Sources

    All of my raw data are scraped from Kickstarter.com.

    1. First 4000 live projects that are currently campaigning on Kickstarter (live.csv)

      • Last updated: 2016-10-29 5pm PDT
      • amt.pledged: amount pledged (float)
      • blurb: project blurb (string)
      • by: project creator (string)
      • country: abbreviated country code (string of length 2)
      • currency: currency type of amt.pledged (string of length 3)
      • end.time: campaign end time (string "YYYY-MM-DDThh:mm:ss-TZD")
      • location: mostly city (string)
      • pecentage.funded: unit % (int)
      • state: mostly US states (string of length 2) and others (string)
      • title: project title (string)
      • type: type of location (string: County/Island/LocalAdmin/Suburb/Town/Zip)
      • url: project url after domain (string)
    2. Top 4000 most backed projects ever on Kickstarter (most_backed.csv)

      • Last updated: 2016-10-30 10pm PDT
      • amt.pledged
      • blurb
      • by
      • category: project category (string)
      • currency
      • goal: original pledge goal (float)
      • location
      • num.backers: total number of backers (int)
      • num.backers.tier: number of backers corresponding to each pledge amount in pledge.tier (int[len(pledge.tier)])
      • pledge.tier: pledge tiers in USD (float[])
      • title
      • url
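    As a rough illustration of working with the columns listed above, the sketch below reads a few rows in the shape of most_backed.csv and derives the percentage funded from amt.pledged and goal. The rows are invented for the example and are not taken from the dataset; only the column names come from the description.

```python
import csv
import io

# Invented rows using column names from the most_backed.csv description.
SAMPLE = """title,amt.pledged,goal,num.backers
Example Gadget,15000.0,10000.0,300
Example Game,2500.0,5000.0,80
"""


def percent_funded(row):
    """Amount pledged as a percentage of the original pledge goal."""
    return 100.0 * float(row["amt.pledged"]) / float(row["goal"])


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
funded = {row["title"]: percent_funded(row) for row in rows}
print(funded)
```

    To read the real file, `io.StringIO(SAMPLE)` would be replaced by an opened file handle; the serialisation of the list-valued columns (pledge.tier, num.backers.tier) is not documented here and would need to be checked first.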

    See more at http://datapolymath.paperplane.io/

  17. Data from: Statistical Thinking in quality improvement: use, difficulties...

    • scielo.figshare.com
    jpeg
    Updated May 31, 2023
    Cite
    Jose Carlos de Toledo; Fabiane Letícia Lizarelli; Adriana Barbosa dos Santos; Artur Ishizaka (2023). Statistical Thinking in quality improvement: use, difficulties and benefits of its implantation in industries of the Brazilian State of São Paulo [Dataset]. http://doi.org/10.6084/m9.figshare.7418576.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    May 31, 2023
    Dataset provided by
    SciELO http://www.scielo.org/
    Authors
    Jose Carlos de Toledo; Fabiane Letícia Lizarelli; Adriana Barbosa dos Santos; Artur Ishizaka
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    State of São Paulo
    Description

    Abstract. Paper aims: Identify the use of statistical thinking and techniques and their main difficulties and benefits in the Brazilian State of São Paulo industries. Originality: There are few empirical studies on the application of statistical thinking and techniques which study their difficulties of implementation and their benefits in manufacturing companies. Research method: A web survey of a sample of 243 manufacturing companies. Main findings: The companies, in general, use some statistical principles and basic techniques for process control and improvement; however, companies that use principles and techniques more consistently have greater operational and team benefits. The main difficulties are associated with lack of culture and knowledge. Implications for theory and practice: The statistical application enables effective process improvements and is associated with motivation for further improvements, consolidation of improvement programs and culture of quality. This finding suggests managerial implications such as planning actions to deploy and disseminate the culture of statistical thinking in an evolutionary way, training and support for use, and overcoming barriers.

  18. The relation between statistical power and inference in fMRI

    • plos.figshare.com
    qt
    Updated May 31, 2023
    Cite
    Henk R. Cremers; Tor D. Wager; Tal Yarkoni (2023). The relation between statistical power and inference in fMRI [Dataset]. http://doi.org/10.1371/journal.pone.0184923
    Explore at:
    Available download formats: qt
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Henk R. Cremers; Tor D. Wager; Tal Yarkoni
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Statistically underpowered studies can result in experimental failure even when all other experimental considerations have been addressed impeccably. In fMRI the combination of a large number of dependent variables, a relatively small number of observations (subjects), and a need to correct for multiple comparisons can decrease statistical power dramatically. This problem has been clearly addressed yet remains controversial—especially in regards to the expected effect sizes in fMRI, and especially for between-subjects effects such as group comparisons and brain-behavior correlations. We aimed to clarify the power problem by considering and contrasting two simulated scenarios of such possible brain-behavior correlations: weak diffuse effects and strong localized effects. Sampling from these scenarios shows that, particularly in the weak diffuse scenario, common sample sizes (n = 20–30) display extremely low statistical power, poorly represent the actual effects in the full sample, and show large variation on subsequent replications. Empirical data from the Human Connectome Project resembles the weak diffuse scenario much more than the localized strong scenario, which underscores the extent of the power problem for many studies. Possible solutions to the power problem include increasing the sample size, using less stringent thresholds, or focusing on a region-of-interest. However, these approaches are not always feasible and some have major drawbacks. The most prominent solutions that may help address the power problem include model-based (multivariate) prediction methods and meta-analyses with related synthesis-oriented approaches.

  19. Law Enforcement Management and Administrative Statistics (LEMAS): 2000...

    • icpsr.umich.edu
    ascii, delimited, sas +2
    Updated Dec 8, 2008
    Cite
    United States Department of Justice. Office of Justice Programs. Bureau of Justice Statistics (2008). Law Enforcement Management and Administrative Statistics (LEMAS): 2000 Sample Survey of Law Enforcement Agencies [Dataset]. http://doi.org/10.3886/ICPSR03565.v2
    Explore at:
    Available download formats: stata, spss, sas, ascii, delimited
    Dataset updated
    Dec 8, 2008
    Dataset provided by
    Inter-university Consortium for Political and Social Research https://www.icpsr.umich.edu/web/pages/
    Authors
    United States Department of Justice. Office of Justice Programs. Bureau of Justice Statistics
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/3565/terms

    Time period covered
    2000
    Area covered
    United States
    Description

    This survey, the sixth in the Bureau of Justice Statistics' program on Law Enforcement Management and Administrative Statistics (LEMAS), presents information on law enforcement agencies in the United States: state police, county police, special police (state and local), municipal police, and sheriff's departments. Variables include size of the population served by the police or sheriff's department, levels of employment and spending, various functions of the department, average salary levels for uniformed officers, policies and programs, and other matters related to management and personnel.

  20. Evaluation through follow-up - pupils born in 1953

    • researchdata.se
    Updated Aug 15, 2024
    Cite
    Kjell Härnqvist; Sven-Erik Reuterberg; Allan Svensson; Airi Rovio-Johansson (2024). Evaluation through follow-up - pupils born in 1953 [Dataset]. https://researchdata.se/en/catalogue/dataset/snd0480-2
    Explore at:
    Dataset updated
    Aug 15, 2024
    Dataset provided by
    University of Gothenburg
    Authors
    Kjell Härnqvist; Sven-Erik Reuterberg; Allan Svensson; Airi Rovio-Johansson
    Time period covered
    1966 - 1973
    Area covered
    Sweden
    Description

    Since the beginning of the 1960s, Statistics Sweden, in collaboration with various research institutions, has carried out follow-up surveys in the school system. These surveys have taken place within the framework of the IS project (Individual Statistics Project) at the University of Gothenburg and the UGU project (Evaluation through follow-up of students) at the University of Teacher Education in Stockholm, which since 1990 have been merged into a research project called 'Evaluation through Follow-up'. The follow-up surveys are part of the central evaluation of the school and are based on large nationally representative samples from different cohorts of students.

    Evaluation through follow-up (UGU) is one of the country's largest research databases in the field of education. UGU is part of the central evaluation of the school and is based on large nationally representative samples from different cohorts of students. The longitudinal database contains information on nationally representative samples of school pupils from ten cohorts, born between 1948 and 2004. The sampling process was based on the student's birthday for the first two and on the school class for the other cohorts.

    For each cohort, data of mainly two types are collected. School administrative data is collected annually by Statistics Sweden during the time that pupils are in the general school system (primary and secondary school), for most cohorts starting in compulsory school year 3. This information is provided by the school offices and, among other things, includes characteristics of school, class, special support, study choices and grades. Information obtained has varied somewhat, e.g. due to changes in curricula. A more detailed description of this data collection can be found in reports published by Statistics Sweden and linked to datasets for each cohort.

    Survey data from the pupils are collected for the first time in compulsory school year 6 (for most cohorts). The year 6 questionnaire includes questions related to self-perception and interest in learning, attitudes to school, hobbies, school motivation and future plans. For some cohorts, questionnaire data are also collected in year 3 and year 9 in compulsory school and in upper secondary school.

    Furthermore, results from various intelligence tests and standardized knowledge tests are included in the year 6 data collection. The intelligence tests have been identical for all cohorts (except for the cohort born in 1987, from which questionnaire data were first collected in year 9). The intelligence test consists of a verbal, a spatial and an inductive test, each containing 40 tasks and specially designed for the UGU project. The verbal test is a vocabulary test of the opposite type. The spatial test is a so-called ‘sheet metal folding test’ and the inductive test is made up of series of numbers. The reliability of the test, intercorrelations and connection with school grades are reported by Svensson (1971).

    For the first three cohorts (1948, 1953 and 1967), the standardized knowledge tests in year 6 consist of the standard tests in Swedish, mathematics and English that, up to and including the beginning of the 1980s, were offered to all pupils in compulsory school year 6. For the 1972 cohort, specially prepared tests in reading and mathematics were used. The reading test consists of 27 tasks and aimed to identify students with reading difficulties. The mathematics test, which was also offered to the fifth cohort (1977), includes 19 assignments. A revised version of the test was used for the cohort born in 1982, because the previously used test was judged to be somewhat too simple. Results on the mathematics test are not available for the 1987 cohort. The mathematics test was not offered to the students in the 1992 cohort, as the test did not seem to fully correspond with current curriculum intentions in mathematics. For further information, see the description of the dataset for each cohort.

    For several of the samples, questionnaires were also collected from the students' parents and teachers in year 6. The teacher questionnaire contains questions about the teacher, class size and composition, the teacher's assessments of the class's knowledge level, etc., school resources, working methods and parental involvement, and questions about the existence of evaluations. The questionnaire for the guardians includes questions about the child's upbringing conditions, ambitions and wishes regarding the child's education, views on the school's objectives and the parents' own educational and professional situation.

    The students are followed up even after they have left primary school. Among other things, data are collected during the time they are in high school. School administrative data are then gathered, such as choice of upper secondary school line / program and grades after completing studies. For some of the cohorts, in addition to school administrative data, questionnaire data were also collected from the students.

    The sample consisted of students born on the 5th, 15th and 25th of any month in 1953, a total of 10,723 students.
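    The birthday rule above amounts to selecting three fixed days in each month. A small sketch of that rule and of the share of calendar days it covers is given below; the function name `in_sample` is illustrative, and the calculation says nothing about how births were actually distributed over the year.

```python
from datetime import date, timedelta

SAMPLE_DAYS = {5, 15, 25}  # days of the month that define the sample


def in_sample(birth_date):
    """True if a pupil with this birth date belongs to the 1953 sample."""
    return birth_date.year == 1953 and birth_date.day in SAMPLE_DAYS


# All calendar days of 1953 (not a leap year, so 365 days).
days_1953 = [date(1953, 1, 1) + timedelta(days=i) for i in range(365)]
sampled_days = [d for d in days_1953 if in_sample(d)]

share = len(sampled_days) / len(days_1953)
print(len(sampled_days), f"{share:.1%}")
```

    Three days in each of twelve months gives 36 sampled days, so the rule covers just under a tenth of the birth dates in the year.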

    The data obtained in 1966 were: 1. School administrative data (school form, class type, year and grades). 2. Information about the parents' profession and education, number of siblings, the distance between home and school, etc.

    This information was collected for 93% of all pupils born on the days in question. The shortfall is due to reduced resources at Statistics Sweden for follow-up work (reminders etc.). Annual data for the 1953 cohort were collected by Statistics Sweden up to and including academic year 1972/73.

    1. Answers to certain questions that shed light on students' school motivation, leisure activities and study and career plans. Some of the questions changed significantly compared to the cohort in 1948 due to the fact that they did not function satisfactorily from a metrological point of view.
    2. Results on three aptitude tests, one verbal, one spatial and one inductive.
    3. Standard test results in reading, writing, mathematics and English, which were offered to the students who belonged to year 6.

    The response rate for test and questionnaire data is 88%. Standard test results were received for just over 85% of those who took the tests.

    The sample included a total of 9955 students, for whom some form of information was obtained.

    Part of the "Individual Statistics Project" together with cohort 1953.

Cite
Southern Africa Labour and Development Research Unit (2020). Project for Statistics on Living Standards and Development 1993 - South Africa [Dataset]. https://microdata.fao.org/index.php/catalog/1527

Project for Statistics on Living Standards and Development 1993 - South Africa

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Oct 20, 2020
Dataset authored and provided by
Southern Africa Labour and Development Research Unit
Time period covered
1993
Area covered
South Africa
Description

Abstract

The Project for Statistics on Living standards and Development was a countrywide World Bank Living Standards Measurement Survey. It covered approximately 9000 households, drawn from a representative sample of South African households. The fieldwork was undertaken during the nine months leading up to the country's first democratic elections at the end of April 1994. The purpose of the survey was to collect statistical information about the conditions under which South Africans live in order to provide policymakers with the data necessary for planning strategies. This data would aid the implementation of goals such as those outlined in the Government of National Unity's Reconstruction and Development Programme.

Geographic coverage

National

Analysis unit

Households

Universe

All Household members. Individuals in hospitals, old age homes, hotels and hostels of educational institutions were not included in the sample. Migrant labour hostels were included. In addition to those that turned up in the selected ESDs, a sample of three hostels was chosen from a national list provided by the Human Sciences Research Council and within each of these hostels a representative sample was drawn on a similar basis as described above for the households in ESDs.

Kind of data

Sample survey data [ssd]

Sampling procedure

(a) SAMPLING DESIGN

Sample size is 9,000 households. The sample design adopted for the study was a two-stage self-weighting design in which the first stage units were Census Enumerator Subdistricts (ESDs, or their equivalent) and the second stage units were households. The advantage of using such a design is that it provides a representative sample that need not be based on an accurate census population distribution. In the case of South Africa, the sample will automatically include many poor people, without the need to go beyond this and oversample the poor. Proportionate sampling as in such a self-weighting sample design offers the simplest possible data files for further analysis, as weights do not have to be added. However, in the end this advantage could not be retained, and weights had to be added.

(b) SAMPLE FRAME

The sampling frame was drawn up on the basis of small, clearly demarcated area units, each with a population estimate. The nature of the self-weighting procedure adopted ensured that this population estimate was not important for determining the final sample, however. For most of the country, census ESDs were used. Where some ESDs comprised relatively large populations, as for instance in some black townships such as Soweto, aerial photographs were used to divide the areas into blocks of approximately equal population size. In other instances, particularly in some of the former homelands, the area units were not ESDs but villages or village groups. In the sample design chosen, the area stage units (generally ESDs) were selected with probability proportional to size, based on the census population. Systematic sampling was used throughout, that is, sampling at a fixed interval in a list of ESDs, starting at a randomly selected starting point. Given that sampling was self-weighting, the impact of stratification was expected to be modest. The main objective was to ensure that the racial and geographic breakdown approximated the national population distribution. This was done by listing the area stage units (ESDs) by statistical region and then within the statistical region by urban or rural. Within these sub-statistical regions, the ESDs were then listed in order of percentage African. The sampling interval for the selection of the ESDs was obtained by dividing the 1991 census population of 38,120,853 by the 300 clusters to be selected. This yielded 105,800. Starting at a randomly selected point, every 105,800th person down the cluster list was selected. This ensured both geographic and racial diversity (ESDs were ordered by statistical sub-region and proportion of the population African). In three or four instances, the ESD chosen was judged inaccessible and replaced with a similar one. In the second sampling stage the unit of analysis was the household. 
In each selected ESD a listing or enumeration of households was carried out by means of a field operation. From the households listed in an ESD a sample of households was selected by systematic sampling. Even though the ultimate enumeration unit was the household, in most cases "stands" were used as enumeration units. However, when a stand was chosen as the enumeration unit all households on that stand had to be interviewed.
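The first-stage selection described above, systematic sampling with probability proportional to size, can be sketched on toy data as follows. The area names, populations and the function name `pps_systematic` are hypothetical, and the ordering of the list by region, urban or rural status and percentage African is assumed to have been done beforehand.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def pps_systematic(units, n_clusters, seed=None):
    """Select clusters with probability proportional to size.

    units      -- ordered list of (name, population) pairs
    n_clusters -- number of selections to make
    Returns the names of the selected units; a unit larger than the
    sampling interval can be selected more than once.
    """
    total = sum(pop for _, pop in units)
    interval = total / n_clusters                     # sampling interval
    start = random.Random(seed).uniform(0, interval)  # random starting point
    cumulative = list(accumulate(pop for _, pop in units))
    selected = []
    for k in range(n_clusters):
        target = start + k * interval
        # First unit whose cumulative population reaches the target person.
        index = min(bisect_left(cumulative, target), len(units) - 1)
        selected.append(units[index][0])
    return selected


# Toy frame of three areas (not census data).
frame = [("A", 100), ("B", 300), ("C", 600)]
chosen = pps_systematic(frame, n_clusters=5, seed=42)
print(chosen)
```

In this toy frame the interval is 200 persons, so area C, with 600 persons, is always hit at least twice, which is why large areas were split into blocks of roughly equal size in the survey.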

Mode of data collection

Face-to-face [f2f]

Cleaning operations

All the questionnaires were checked when received. Where information was incomplete or appeared contradictory, the questionnaire was sent back to the relevant survey organization. As soon as the data was available, it was captured using local development platform ADE. This was completed in February 1994. Following this, a series of exploratory programs were written to highlight inconsistencies and outliers. For example, all person level files were linked together to ensure that the same person code reported in different sections of the questionnaire corresponded to the same person. The error reports from these programs were compared to the questionnaires and the necessary alterations made. This was a lengthy process, as several files were checked more than once, and was completed at the beginning of August 1994. In some cases, questionnaires would contain missing values, or comments that the respondent did not know, or refused to answer a question.

These responses are coded in the data files with the following values:

VALUE MEANING
-1 : The data was not available on the questionnaire or form
-2 : The field is not applicable
-3 : Respondent refused to answer
-4 : Respondent did not know answer to question
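When analysing the data, these codes need to be set aside before computing statistics, otherwise they are treated as real negative values. A minimal sketch, assuming the variable in question has no legitimate values between -1 and -4 (the helper names and the example values are illustrative):

```python
# Missing-value codes as documented for the data files.
MISSING_CODES = {
    -1: "not available on the questionnaire or form",
    -2: "not applicable",
    -3: "respondent refused to answer",
    -4: "respondent did not know the answer",
}


def split_missing(value):
    """Return (value, None) for real data, or (None, reason) for a missing code."""
    if value in MISSING_CODES:
        return None, MISSING_CODES[value]
    return value, None


def mean_valid(values):
    """Mean over values that are not missing codes; None if nothing is valid."""
    valid = [v for v in values if v not in MISSING_CODES]
    return sum(valid) / len(valid) if valid else None


# Invented responses for one variable, including two missing codes.
responses = [10, -1, 20, -4]
print(mean_valid(responses))
print(split_missing(-3))
```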

Data appraisal

The data collected in clusters 217 and 218 should be viewed as highly unreliable and therefore removed from the data set. The data currently available on the web site has been revised to remove the data from these clusters. Researchers who have downloaded the data in the past should revise their data sets. For information on the data in those clusters, contact SALDRU http://www.saldru.uct.ac.za/.
