100+ datasets found
  1. Collection of example datasets used for the book - R Programming -...

    • figshare.com
    txt
    Updated Dec 4, 2023
    Cite
    Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
    Available download formats: txt
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kingsley Okoye; Samira Hosseini
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers, and explains how to perform different types of statistical data analysis for research purposes using the R programming language. R is open-source software and an object-oriented programming language, with a development environment (IDE) called RStudio, for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provide a wide range of functions for programming and analyzing data. Unlike many existing statistical software packages, R has the added benefit of allowing users to write more efficient code by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allow users to define their own (customized) functions for how they expect the program to behave while handling the data, which can also be stored in the simple object system.

    For all intents and purposes, this book serves as both a textbook and a manual for R statistics, particularly in academic research, data analytics, and computer programming, and is intended to help inform and guide the work of R users and statisticians. It provides information about different types of statistical data analysis and methods, and the best scenarios for the use of each in R. It gives a hands-on, step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test for reliability and validity of the available datasets. Different research experiments, case scenarios and examples are explained in this book.

    It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples. It ranges from how to import and store datasets in R as objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. The book thus brings together statistics and computer programming for research.

  2. Statistics review 2: Samples and populations

    • data.virginia.gov
    • catalog.data.gov
    html
    Updated Sep 6, 2025
    Cite
    National Institutes of Health (2025). Statistics review 2: Samples and populations [Dataset]. https://data.virginia.gov/dataset/statistics-review-2-samples-and-populations
    Available download formats: html
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    National Institutes of Health
    Description

    The previous review in this series introduced the notion of data description and outlined some of the more common summary measures used to describe a dataset. However, a dataset is typically only of interest for the information it provides regarding the population from which it was drawn. The present review focuses on estimation of population values from a sample.

  3. Project for Statistics on Living Standards and Development 1993 - South...

    • microdata.fao.org
    • catalog.ihsn.org
    • +2 more
    Updated Oct 20, 2020
    Cite
    Southern Africa Labour and Development Research Unit (2020). Project for Statistics on Living Standards and Development 1993 - South Africa [Dataset]. https://microdata.fao.org/index.php/catalog/1527
    Dataset updated
    Oct 20, 2020
    Dataset authored and provided by
    Southern Africa Labour and Development Research Unit
    Time period covered
    1993
    Area covered
    South Africa
    Description

    Abstract

    The Project for Statistics on Living Standards and Development was a countrywide World Bank Living Standards Measurement Survey. It covered approximately 9000 households, drawn from a representative sample of South African households. The fieldwork was undertaken during the nine months leading up to the country's first democratic elections at the end of April 1994. The purpose of the survey was to collect statistical information about the conditions under which South Africans live in order to provide policymakers with the data necessary for planning strategies. This data would aid the implementation of goals such as those outlined in the Government of National Unity's Reconstruction and Development Programme.

    Geographic coverage

    National

    Analysis unit

    Households

    Universe

    All Household members. Individuals in hospitals, old age homes, hotels and hostels of educational institutions were not included in the sample. Migrant labour hostels were included. In addition to those that turned up in the selected ESDs, a sample of three hostels was chosen from a national list provided by the Human Sciences Research Council and within each of these hostels a representative sample was drawn on a similar basis as described above for the households in ESDs.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    (a) SAMPLING DESIGN

    The sample size is 9,000 households. The sample design adopted for the study was a two-stage self-weighting design in which the first-stage units were Census Enumerator Subdistricts (ESDs, or their equivalent) and the second-stage units were households. The advantage of such a design is that it provides a representative sample that need not be based on an accurate census population distribution. In the case of South Africa, the sample will automatically include many poor people, without the need to go beyond this and oversample the poor. Proportionate sampling, as in such a self-weighting sample design, offers the simplest possible data files for further analysis, as weights do not have to be added. However, in the end this advantage could not be retained, and weights had to be added.

    (b) SAMPLE FRAME

    The sampling frame was drawn up on the basis of small, clearly demarcated area units, each with a population estimate. The nature of the self-weighting procedure adopted ensured, however, that this population estimate was not important for determining the final sample. For most of the country, census ESDs were used. Where some ESDs comprised relatively large populations, as for instance in some black townships such as Soweto, aerial photographs were used to divide the areas into blocks of approximately equal population size. In other instances, particularly in some of the former homelands, the area units were not ESDs but villages or village groups.

    In the sample design chosen, the area stage units (generally ESDs) were selected with probability proportional to size, based on the census population. Systematic sampling was used throughout, that is, sampling at a fixed interval in a list of ESDs, starting at a randomly selected starting point. Given that sampling was self-weighting, the impact of stratification was expected to be modest. The main objective was to ensure that the racial and geographic breakdown approximated the national population distribution. This was done by listing the area stage units (ESDs) by statistical region and then within the statistical region by urban or rural. Within these sub-statistical regions, the ESDs were then listed in order of percentage African. The sampling interval for the selection of the ESDs was obtained by dividing the 1991 census population of 38,120,853 by the 300 clusters to be selected. This yielded 105,800. Starting at a randomly selected point, every 105,800th person down the cluster list was selected. This ensured both geographic and racial diversity (ESDs were ordered by statistical sub-region and proportion of the population African). In three or four instances, the ESD chosen was judged inaccessible and replaced with a similar one.

    In the second sampling stage the unit of analysis was the household. In each selected ESD a listing or enumeration of households was carried out by means of a field operation. From the households listed in an ESD a sample of households was selected by systematic sampling. Even though the ultimate enumeration unit was the household, in most cases "stands" were used as enumeration units. However, when a stand was chosen as the enumeration unit all households on that stand had to be interviewed.
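
    The first-stage selection described above (systematic sampling with probability proportional to size) can be sketched as follows. This is a minimal illustration under stated assumptions, not the survey's actual procedure: the function name `systematic_pps`, the ESD names, and the populations are hypothetical. The interval is computed as total population divided by the number of clusters; for reference, 38,120,853 / 300 is about 127,070, while an interval near 105,800 corresponds to roughly 360 clusters, so the sketch takes the cluster count as a parameter.

```python
import random
from itertools import accumulate


def systematic_pps(units, n_clusters, rng):
    """Systematic selection with probability proportional to size.

    units: list of (name, population) pairs, already sorted in the desired
    order (e.g. by region, then urban/rural). Returns (interval, names).
    A unit larger than the interval can be selected more than once.
    """
    total = sum(pop for _, pop in units)
    interval = total / n_clusters
    start = rng.random() * interval          # random start in [0, interval)
    cumulative = list(accumulate(pop for _, pop in units))
    selected, idx = [], 0
    for k in range(n_clusters):
        point = start + k * interval         # every interval-th "person"
        while idx < len(units) - 1 and cumulative[idx] <= point:
            idx += 1
        selected.append(units[idx][0])
    return interval, selected


# Hypothetical frame of five area units with population estimates.
esds = [("ESD-A", 1200), ("ESD-B", 800), ("ESD-C", 3000),
        ("ESD-D", 500), ("ESD-E", 2500)]
interval, chosen = systematic_pps(esds, n_clusters=4, rng=random.Random(0))
print(interval, chosen)
```

    Because the selection points are spaced one interval apart along the cumulative population, larger units are proportionally more likely to be hit, which is what makes the design self-weighting when a fixed number of households is then taken per cluster.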

    Mode of data collection

    Face-to-face [f2f]

    Cleaning operations

    All the questionnaires were checked when received. Where information was incomplete or appeared contradictory, the questionnaire was sent back to the relevant survey organization. As soon as the data was available, it was captured using the local development platform ADE. This was completed in February 1994. Following this, a series of exploratory programs were written to highlight inconsistencies and outliers. For example, all person-level files were linked together to ensure that the same person code reported in different sections of the questionnaire corresponded to the same person. The error reports from these programs were compared to the questionnaires and the necessary alterations made. This was a lengthy process, as several files were checked more than once, and it was completed at the beginning of August 1994. In some cases, questionnaires would contain missing values, or comments that the respondent did not know, or refused to answer a question.

    These responses are coded in the data files with the following values:

    VALUE  MEANING
    -1     The data was not available on the questionnaire or form
    -2     The field is not applicable
    -3     Respondent refused to answer
    -4     Respondent did not know answer to question
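
    When analysing the files, these special codes need to be separated from real values before computing any statistic. A minimal sketch, assuming numeric fields where the codes -1 to -4 never occur as legitimate values (this should be checked per variable); the helper names and the sample incomes are hypothetical:

```python
# Special codes as listed in the data documentation above.
MISSING_CODES = {
    -1: "not available on the questionnaire or form",
    -2: "not applicable",
    -3: "respondent refused to answer",
    -4: "respondent did not know the answer",
}


def split_value(raw):
    """Return (value, reason); value is None when raw is a special code."""
    if raw in MISSING_CODES:
        return None, MISSING_CODES[raw]
    return raw, None


def mean_ignoring_codes(values):
    """Mean of the valid entries only; None if nothing valid remains."""
    valid = [v for v, reason in map(split_value, values) if reason is None]
    return sum(valid) / len(valid) if valid else None


incomes = [1200, -3, 800, -1, 1000, -4]   # hypothetical field values
print(mean_ignoring_codes(incomes))
```

    Keeping the reason alongside the dropped value makes it possible to report refusals and "don't know" answers separately rather than lumping them into one missing category.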

    Data appraisal

    The data collected in clusters 217 and 218 should be viewed as highly unreliable and therefore removed from the data set. The data currently available on the web site has been revised to remove the data from these clusters. Researchers who have downloaded the data in the past should revise their data sets. For information on the data in those clusters, contact SALDRU http://www.saldru.uct.ac.za/.

  4. Confidence Interval Examples

    • figshare.com
    application/cdfv2
    Updated Jun 28, 2016
    Cite
    Emily Rollinson (2016). Confidence Interval Examples [Dataset]. http://doi.org/10.6084/m9.figshare.3466364.v2
    Available download formats: application/cdfv2
    Dataset updated
    Jun 28, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Emily Rollinson
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Examples demonstrating how confidence intervals change depending on the level of confidence (90% versus 95% versus 99%) and on the size of the sample (CI for n=20 versus n=10 versus n=2). Developed for BIO211 (Statistics and Data Analysis: A Conceptual Approach) at Stony Brook University in Fall 2015.
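
    The effect described above can be reproduced with a short sketch. It uses normal quantiles from the standard library so that it stays self-contained; with samples as small as n=2 a t-based interval would normally be used instead, so treat this as an illustration of the trend only. The sample mean and standard deviation are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sd, n, level):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # two-sided critical value
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


mean, sd = 50.0, 10.0   # hypothetical sample mean and standard deviation
for n in (20, 10, 2):
    for level in (0.90, 0.95, 0.99):
        lo, hi = z_interval(mean, sd, n, level)
        print(f"n={n:2d} level={level:.0%} CI=({lo:.2f}, {hi:.2f})")
```

    The printed intervals widen as the confidence level rises and as the sample size shrinks.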

  5. Household Health Survey 2012-2013, Economic Research Forum (ERF)...

    • catalog.ihsn.org
    • datacatalog.ihsn.org
    Updated Jun 26, 2017
    Cite
    Central Statistical Organization (CSO) (2017). Household Health Survey 2012-2013, Economic Research Forum (ERF) Harmonization Data - Iraq [Dataset]. https://catalog.ihsn.org/index.php/catalog/6937
    Dataset updated
    Jun 26, 2017
    Dataset provided by
    Central Statistical Organization (CSO)
    Economic Research Forum
    Kurdistan Regional Statistics Office (KRSO)
    Time period covered
    2012 - 2013
    Area covered
    Iraq
    Description

    Abstract

    The harmonized data set on health, created and published by the ERF, is a subset of Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules, collected in the context of the above mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.

    ----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:

    Iraq is considered a leader in household expenditure and income surveys: the first was conducted in 1946, followed by surveys in 1954 and 1961. After the establishment of the Central Statistical Organization, household expenditure and income surveys were carried out every 3-5 years (1971/1972, 1976, 1979, 1984/1985, 1988, 1993, 2002/2007). Implementing the cooperation between the CSO and the World Bank (WB), the Central Statistical Organization (CSO) and the Kurdistan Region Statistics Office (KRSO) launched fieldwork on IHSES on 1/1/2012. The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

    The survey has six main objectives. These objectives are:

    1. Provide data for poverty analysis and measurement, and monitor, evaluate and update the implementation of the Poverty Reduction National Strategy issued in 2009.
    2. Provide a comprehensive data system to assess household social and economic conditions and prepare the indicators related to human development.
    3. Provide data that meet the needs and requirements of national accounts.
    4. Provide detailed indicators on consumption expenditure that support decision making related to production, consumption, export and import.
    5. Provide detailed indicators on the sources of household and individual income.
    6. Provide the data necessary for the formulation of a new consumer price index.

    The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum, to create a comparable version with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variables' names, labels and some definitions. See: Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf provided in the external resources for further information on the mapping of the original variables on the harmonized ones, in addition to more indications on the variables' availability in both survey years and relevant comments.

    Geographic coverage

    National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.

    Analysis unit

    1- Household/family. 2- Individual/person.

    Universe

    The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    ----> Design:

    The sample size was 25,488 households for the whole of Iraq: 216 households in each of 118 districts, in 2,832 clusters of 9 households each, distributed across districts and governorates for rural and urban areas.

    ----> Sample frame:

    Listing and numbering results of the 2009-2010 Population and Housing Survey were adopted in all the governorates, including Kurdistan Region, as a frame to select households. The sample was selected in two stages. Stage 1: Primary sampling units (blocks) within each stratum (district) for urban and rural areas were systematically selected with probability proportional to size to reach 2,832 units (clusters). Stage 2: 9 households from each primary sampling unit were selected to create a cluster; thus the total sample was 25,488 households distributed across the governorates, 216 households in each district.

    ----> Sampling Stages:

    In each district, the sample was selected in two stages. Stage 1: Based on the 2010 listing and numbering frame, 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, in addition to the implicit breakdown into urban and rural and the geographic breakdown (sub-district, quarter, street, county, village and block). Stage 2: Using households as secondary sampling units, 9 households were selected from each sample point using systematic equal probability sampling. Sampling frames for each stage can be developed based on the 2010 building listing and numbering without updating household lists. In some small districts, the random selection of primary sampling units may yield fewer than 24 distinct units, so a sampling unit may be selected more than once; where necessary, two or more clusters may be drawn from the same enumeration unit.
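
    The sample arithmetic and the second-stage selection can be checked with a short sketch. The counts (118 districts, 24 sample points per district, 9 households per point) come from the description above; the function name `systematic_sample` and the household listing are hypothetical.

```python
import random

DISTRICTS = 118
POINTS_PER_DISTRICT = 24
HOUSEHOLDS_PER_POINT = 9

clusters = DISTRICTS * POINTS_PER_DISTRICT                             # 2832
households_per_district = POINTS_PER_DISTRICT * HOUSEHOLDS_PER_POINT   # 216
total_households = clusters * HOUSEHOLDS_PER_POINT                     # 25488


def systematic_sample(items, k, rng):
    """Equal-probability systematic sample of k items from an ordered list.

    Assumes len(items) >= k so that the selected positions are distinct.
    """
    step = len(items) / k
    start = rng.random() * step               # random start in [0, step)
    last = len(items) - 1
    return [items[min(int(start + i * step), last)] for i in range(k)]


# Hypothetical listing of 130 households in one sample point.
listing = [f"HH-{i:03d}" for i in range(1, 131)]
sample = systematic_sample(listing, HOUSEHOLDS_PER_POINT, random.Random(1))
print(clusters, households_per_district, total_households)
print(sample)
```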

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    ----> Preparation:

    The questionnaire of the 2006 survey was adopted as the basis for the 2012 questionnaire, to which many revisions were made. Two rounds of pre-tests were carried out. Revisions were made based on feedback from the fieldwork team, World Bank consultants and others, and further revisions were made before the final version was implemented in a pilot survey in September 2011. After the pilot survey, further revisions were made based on the challenges and feedback that emerged during its implementation, leading to the final version used in the actual survey.

    ----> Questionnaire Parts:

    The questionnaire consists of four parts, each with several sections:

    Part 1: Socio-Economic Data:
    - Section 1: Household Roster
    - Section 2: Emigration
    - Section 3: Food Rations
    - Section 4: Housing
    - Section 5: Education
    - Section 6: Health
    - Section 7: Physical measurements
    - Section 8: Job seeking and previous job

    Part 2: Monthly, Quarterly and Annual Expenditures:
    - Section 9: Expenditures on Non-Food Commodities and Services (past 30 days)
    - Section 10: Expenditures on Non-Food Commodities and Services (past 90 days)
    - Section 11: Expenditures on Non-Food Commodities and Services (past 12 months)
    - Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days)
    - Section 12, Table 1: Meals Had Within the Residential Unit
    - Section 12, Table 2: Number of Persons Participate in the Meals within Household Expenditure Other Than its Members

    Part 3: Income and Other Data:
    - Section 13: Job
    - Section 14: Paid jobs
    - Section 15: Agriculture, forestry and fishing
    - Section 16: Household non-agricultural projects
    - Section 17: Income from ownership and transfers
    - Section 18: Durable goods
    - Section 19: Loans, advances and subsidies
    - Section 20: Shocks and strategy of dealing in the households
    - Section 21: Time use
    - Section 22: Justice
    - Section 23: Satisfaction in life
    - Section 24: Food consumption during past 7 days

    Part 4: Diary of Daily Expenditures: The diary of expenditure is an essential component of this survey. It is left with the household to record all daily purchases, such as expenditures on food and frequent non-food items (gasoline, newspapers, etc.), during 7 days. Two pages were allocated for recording the expenditures of each day, so the roster consists of 14 pages.

    Cleaning operations

    ----> Raw Data:

    Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages:
    1. Interviewer: Checks all answers on the household questionnaire, confirming that they are clear and correct.
    2. Local Supervisor: Checks to make sure that questions have been correctly completed.
    3. Statistical analysis: After exporting data files from Excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or non-logical values, in addition to auditing some variables.
    4. World Bank consultants in coordination with the CSO data management team: the World Bank technical consultants use additional programs in SPSS and STATA to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected parameter for each variable.

    ----> Harmonized Data:

    • The SPSS package is used to harmonize the Iraq Household Socio Economic Survey (IHSES) 2007 with Iraq Household Socio Economic Survey (IHSES) 2012.
    • The harmonization process starts with raw data files received from the Statistical Office.
    • A program is generated for each dataset to create harmonized variables.
    • Data is saved on the household and individual level, in SPSS and then converted to STATA, to be disseminated.

    Response rate

    The Iraq Household Socio Economic Survey (IHSES) reached a total of 25,488 households. The number of households that refused to respond was 305; the response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%), while the lowest rate was in Sulaimaniya (92%).

  6. UC_vs_US Statistic Analysis.xlsx

    • figshare.com
    xlsx
    Updated Jul 9, 2020
    Cite
    F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
    Available download formats: xlsx
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Utrecht University
    Authors
    F. (Fabiano) Dalpiaz
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes multiple columns:
    A. a sequential student ID
    B. an ID that defines a random group label and the notation
    C. the notation used: User Stories or Use Cases
    D. the case they were assigned to: IFA, Sim, or Hos
    E. the subject's exam grade (total points out of 100). Empty cells mean that the subject did not take the first exam
    F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
    G. the total number of classes in the student's conceptual model
    H. the total number of relationships in the student's conceptual model
    I. the total number of classes in the expert's conceptual model
    J. the total number of relationships in the expert's conceptual model
    K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
    P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
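
    The grade categories in column F follow directly from the thresholds given for column E. A minimal sketch; the function name is hypothetical, and returning None for an empty cell is an assumption about how missing grades might be handled:

```python
def grade_category(points):
    """Map an exam grade (0-100) to L/M/H; None stands for an empty cell."""
    if points is None:
        return None          # subject did not take the first exam
    if points >= 80:
        return "H"           # greater than or equal to 80
    if points >= 65:
        return "M"           # 65 included, 80 excluded
    return "L"


print([grade_category(p) for p in (92, 80, 79.5, 65, 64, None)])
```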

    Tagging scheme:
    - Aligned (AL): A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names.
    - Wrongly represented (WR): A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner").
    - System-oriented (SO): A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate.
    - Omitted (OM): A class in CM-Expert that does not appear in any way in CM-Stud.
    - Missing (MI): A class in CM-Stud that does not appear in any way in CM-Expert.

    All the calculations and information provided in the following sheets originate from that raw data.

    Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection, including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

    Sheet 3 (Size-Ratio): The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes. However, we also provide the size ratio for the number of relationships between student and expert model.

    Sheet 4 (Overall): Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR) and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
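
    The two measures defined for Sheet 4 can be written out as a short sketch. The function names and the tag counts are hypothetical, and returning None for an empty denominator is an assumption rather than something the dataset documents:

```python
def correctness(al, wr, so, om):
    """AL / (AL + OM + SO + WR), as defined for Sheet 4."""
    total = al + om + so + wr
    return al / total if total else None


def completeness(al, wr, om):
    """(AL + WR) / (AL + WR + OM), as defined for Sheet 4."""
    total = al + wr + om
    return (al + wr) / total if total else None


# Hypothetical tag counts for one student model.
counts = {"al": 12, "wr": 3, "so": 2, "om": 3}
print(correctness(**counts))                                   # 12 / 20
print(completeness(counts["al"], counts["wr"], counts["om"]))  # 15 / 18
```

    Note that Missing (MI) concepts do not enter either formula, and System-oriented (SO) concepts lower correctness without affecting completeness.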

    For sheet 4 as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

    Sheet 5 (By-Notation): Model correctness and model completeness are compared by notation - UC, US.

    Sheet 6 (By-Case): Model correctness and model completeness are compared by case - SIM, HOS, IFA.

    Sheet 7 (By-Process): Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.

    Sheet 8 (By-Grade): Model correctness and model completeness are compared by the exam grades, converted to categorical values High, Low, and Medium.
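
    For readers who want to recompute the effect sizes reported at the bottom of these sheets, Hedges' g can be sketched as below. This uses the pooled standard deviation and the common approximate small-sample correction 1 - 3 / (4(n1 + n2) - 9); the online tool used by the authors may apply a slightly different correction, and the two score lists are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def hedges_g(a, b):
    """Hedges' g for two independent groups (approximate correction)."""
    n1, n2 = len(a), len(b)
    pooled_sd = sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b))
                     / (n1 + n2 - 2))
    d = (mean(a) - mean(b)) / pooled_sd            # Cohen's d
    correction = 1 - 3 / (4 * (n1 + n2) - 9)       # small-sample bias factor
    return d * correction


# Hypothetical completeness scores for two notation groups.
group_uc = [0.70, 0.80, 0.90]
group_us = [0.50, 0.60, 0.70]
g = hedges_g(group_uc, group_us)
print(round(g, 3))
```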

  7. Current Population Survey (CPS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r

    the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no.

    despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.

    the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

    this new github repository contains three scripts:

    2005-2012 asec - download all microdata.R
    • download the fixed-width file containing household, family, and person records
    • import by separating this file into three tables, then merge 'em together at the person-level
    • download the fixed-width file containing the person-level replicate weights
    • merge the rectangular person-level file with the replicate weights, then store it in a sql database
    • create a new variable - one - in the data table

    2012 asec - analysis examples.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • perform a boatload of analysis examples

    replicate census estimates - 2011.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • match the sas output shown in the png file below

    2011 asec replicate weight sas output.png
    • statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

    for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
    • the census bureau's current population survey page
    • the bureau of labor statistics' current population survey page
    • the current population survey's wikipedia article

    notes:

    interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name.

    as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.

    confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D

  8. example 1 - time series - USD RUB 1 year data

    • kaggle.com
    zip
    Updated Sep 19, 2024
    Cite
    Denis Andrikov (2024). example 1 - time series - USD RUB 1 year data [Dataset]. https://www.kaggle.com/datasets/denisandrikov/example-1-time-series-usd-rub-1-year-data
    Available download formats: zip (675 bytes)
    Dataset updated
    Sep 19, 2024
    Authors
    Denis Andrikov
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    A simple time-series table for school probability and statistics. The goal is to learn how to investigate data: value over time. What we try to compute:
    • mean: the average, i.e. the sum of all values divided by the number of values.
    • median: the middle number when the values are in order.
    • mode: the most common number.
    • range: the largest number minus the smallest number.
    • standard deviation: a measure of how dispersed the data is in relation to the mean.
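    The summary measures named in this description can be computed with Python's standard statistics module. This is only a minimal sketch: the rates list is made up for illustration and is not taken from the dataset.

```python
import statistics

# Hypothetical USD/RUB closing rates (illustrative values, not from the dataset).
rates = [90.0, 92.0, 91.0, 92.0, 95.0]

mean = statistics.mean(rates)          # sum of all values / number of values
median = statistics.median(rates)      # middle value once the data are sorted
mode = statistics.mode(rates)          # most common value
value_range = max(rates) - min(rates)  # largest value minus smallest value
stdev = statistics.stdev(rates)        # sample standard deviation (n - 1 denominator)

print(mean, median, mode, value_range, round(stdev, 3))
```

    Note that statistics.stdev uses the sample (n - 1) formula; statistics.pstdev gives the population version if the series is treated as the whole population.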

  9. Dataset of development of business during the COVID-19 crisis

    • data.mendeley.com
    • narcis.nl
    Updated Nov 9, 2020
    Cite
    Tatiana N. Litvinova (2020). Dataset of development of business during the COVID-19 crisis [Dataset]. http://doi.org/10.17632/9vvrd34f8t.1
    Dataset updated
    Nov 9, 2020
    Authors
    Tatiana N. Litvinova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second wave of the pandemic), which are presented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. The arithmetic averages were calculated, along with the change (increase) in indicators such as profitability and profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators for all countries of the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data is collected in a general Microsoft Excel table. The dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. The dataset is flexible data that can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Because the data in the dataset are not ready-made numbers but formulas, when values in the original table at the beginning of the dataset are added and/or changed, most of the subsequent tables are automatically recalculated and the graphs are updated. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that provide data visualization. The dataset contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020.
The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship, substituting various predicted morbidity and mortality rates in risk assessment tables and obtaining automatically calculated consequences (changes) on the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified in the process and following the results of the second wave of the pandemic to check the reliability of pre-made forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of a pandemic and COVID-19 crisis for international entrepreneurship.

  10. Example Investigator Collected Data for Students Learning Statistics...

    • dataverse-staging.rdmc.unc.edu
    tsv
    Updated May 5, 2022
    Cite
    Cyra Christina Mehta; Renee' H. Moore (2022). Example Investigator Collected Data for Students Learning Statistics Collaboration Skills [Dataset]. http://doi.org/10.15139/S3/JKLBZF
    Available download formats: tsv (2825)
    Dataset updated
    May 5, 2022
    Dataset provided by
    UNC Dataverse
    Authors
    Cyra Christina Mehta; Renee' H. Moore
    License

    https://dataverse-staging.rdmc.unc.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.15139/S3/JKLBZF

    Description

    This Excel file contains example data as would be provided by an investigator to a collaborative statistician to analyze. Data are a permuted and edited version of real data provided to the authors during a statistical collaboration. The data are presented as commonly collected by investigators prior to working with a statistician, including several tabs of data in different domains (Set1, Set2, Demographics), colored cells, merged cells, cells with more than one data type, etc. as well as incomplete data and two systems of ID numbers. The file also includes a tab to link the different ID systems as well as tabs that have a "cleaned" version of the data (REVISEDSet1, REVISEDSet2) that would typically be provided after quality control identified some issues with the data that were then resolved by the investigator.

  11. Examples of descriptive statistics that can be gleaned from Tracker data...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Feb 20, 2013
    Cite
    Kim, Eugene Z.; Griffith, Leslie C.; Slawson, Justin B.; Vecsey, Christopher G.; Donelson, Nathan; Huber, Robert (2013). Examples of descriptive statistics that can be gleaned from Tracker data that could not be determined from standard beam cross data (Track CASK-β N = 30, Track Control N = 29). [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001718438
    Dataset updated
    Feb 20, 2013
    Authors
    Kim, Eugene Z.; Griffith, Leslie C.; Slawson, Justin B.; Vecsey, Christopher G.; Donelson, Nathan; Huber, Robert
    Description

    Examples of descriptive statistics that can be gleaned from Tracker data that could not be determined from standard beam cross data (Track CASK-β N = 30, Track Control N = 29).

  12. Demographic and Health Survey 1998 - Ghana

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1more
    Updated Jun 6, 2017
    Cite
    Ghana Statistical Service (GSS) (2017). Demographic and Health Survey 1998 - Ghana [Dataset]. https://microdata.worldbank.org/index.php/catalog/1385
    Dataset updated
    Jun 6, 2017
    Dataset provided by
    Ghana Statistical Services
    Authors
    Ghana Statistical Service (GSS)
    Time period covered
    1998 - 1999
    Area covered
    Ghana
    Description

    Abstract

    The 1998 Ghana Demographic and Health Survey (GDHS) is the latest in a series of national-level population and health surveys conducted in Ghana and it is part of the worldwide MEASURE DHS+ Project, designed to collect data on fertility, family planning, and maternal and child health.

    The primary objective of the 1998 GDHS is to provide current and reliable data on fertility and family planning behaviour, child mortality, children’s nutritional status, and the utilisation of maternal and child health services in Ghana. Additional data on knowledge of HIV/AIDS are also provided. This information is essential for informed policy decisions, planning and monitoring and evaluation of programmes at both the national and local government levels.

    The long-term objectives of the survey include strengthening the technical capacity of the Ghana Statistical Service (GSS) to plan, conduct, process, and analyse the results of complex national sample surveys. Moreover, the 1998 GDHS provides comparable data for long-term trend analyses within Ghana, since it is the third in a series of demographic and health surveys implemented by the same organisation, using similar data collection procedures. The GDHS also contributes to the ever-growing international database on demographic and health-related variables.

    Geographic coverage

    National

    Analysis unit

    • Household
    • Children under five years
    • Women age 15-49
    • Men age 15-59

    Kind of data

    Sample survey data

    Sampling procedure

    The major focus of the 1998 GDHS was to provide updated estimates of important population and health indicators including fertility and mortality rates for the country as a whole and for urban and rural areas separately. In addition, the sample was designed to provide estimates of key variables for the ten regions in the country.

    The list of Enumeration Areas (EAs) with population and household information from the 1984 Population Census was used as the sampling frame for the survey. The 1998 GDHS is based on a two-stage stratified nationally representative sample of households. At the first stage of sampling, 400 EAs were selected using systematic sampling with probability proportional to size (PPS-Method). The selected EAs comprised 138 in the urban areas and 262 in the rural areas. A complete household listing operation was then carried out in all the selected EAs to provide a sampling frame for the second stage selection of households. At the second stage of sampling, a systematic sample of 15 households per EA was selected in all regions, except in the Northern, Upper West and Upper East Regions. In order to obtain adequate numbers of households to provide reliable estimates of key demographic and health variables in these three regions, the number of households in each selected EA in the Northern, Upper West and Upper East regions was increased to 20. The sample was weighted to adjust for over sampling in the three northern regions (Northern, Upper East and Upper West), in relation to the other regions. Sample weights were used to compensate for the unequal probability of selection between geographically defined strata.
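    The first-stage selection named above, systematic sampling with probability proportional to size, can be sketched as follows. The EA sizes, the function name systematic_pps and the seed are hypothetical. The survey's actual procedure (stratification, frame ordering) is described in Appendix A of the report and is not reproduced here.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def systematic_pps(sizes, n, rng):
    """Pick n units by systematic sampling with probability proportional to size.

    sizes: measure of size per unit (e.g. households per EA in the frame).
    Returns the indices of the selected units, in frame order.
    """
    cumulative = list(accumulate(sizes))   # running total of sizes
    interval = cumulative[-1] / n          # sampling interval
    start = rng.random() * interval        # random start in [0, interval)
    selected = []
    for j in range(n):
        point = start + j * interval
        # The unit whose cumulative-size range contains the selection point.
        index = min(bisect_right(cumulative, point), len(sizes) - 1)
        selected.append(index)
    return selected


# Hypothetical frame of six EAs with their household counts.
ea_sizes = [120, 300, 80, 500, 200, 400]
selected = systematic_pps(ea_sizes, 3, random.Random(42))
print(selected)
```

    A unit larger than the sampling interval can be hit more than once; real designs usually handle such units separately.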

    The survey was designed to obtain completed interviews of 4,500 women age 15-49. In addition, all males age 15-59 in every third selected household were interviewed, to obtain a target of 1,500 men. In order to take cognisance of non-response, a total of 6,375 households nation-wide were selected.

    Note: See detailed description of sample design in APPENDIX A of the survey report.

    Mode of data collection

    Face-to-face

    Research instrument

    Three types of questionnaires were used in the GDHS: the Household Questionnaire, the Women’s Questionnaire, and the Men’s Questionnaire. These questionnaires were based on model survey instruments developed for the international MEASURE DHS+ programme and were designed to provide information needed by health and family planning programme managers and policy makers. The questionnaires were adapted to the situation in Ghana and a number of questions pertaining to on-going health and family planning programmes were added. These questionnaires were developed in English and translated into five major local languages (Akan, Ga, Ewe, Hausa, and Dagbani).

    The Household Questionnaire was used to enumerate all usual members and visitors in a selected household and to collect information on the socio-economic status of the household. The first part of the Household Questionnaire collected information on the relationship to the household head, residence, sex, age, marital status, and education of each usual resident or visitor. This information was used to identify women and men who were eligible for the individual interview. For this purpose, all women age 15-49, and all men age 15-59 in every third household, whether usual residents of a selected household or visitors who slept in a selected household the night before the interview, were deemed eligible and interviewed. The Household Questionnaire also provides basic demographic data for Ghanaian households. The second part of the Household Questionnaire contained questions on the dwelling unit, such as the number of rooms, the flooring material, the source of water and the type of toilet facilities, and on the ownership of a variety of consumer goods.

    The Women’s Questionnaire was used to collect information on the following topics: respondent’s background characteristics, reproductive history, contraceptive knowledge and use, antenatal, delivery and postnatal care, infant feeding practices, child immunisation and health, marriage, fertility preferences and attitudes about family planning, husband’s background characteristics, women’s work, knowledge of HIV/AIDS and STDs, as well as anthropometric measurements of children and mothers.

    The Men’s Questionnaire collected information on respondent’s background characteristics, reproduction, contraceptive knowledge and use, marriage, fertility preferences and attitudes about family planning, as well as knowledge of HIV/AIDS and STDs.

    Response rate

    A total of 6,375 households were selected for the GDHS sample. Of these, 6,055 were occupied. Interviews were completed for 6,003 households, which represent 99 percent of the occupied households. A total of 4,970 eligible women from these households and 1,596 eligible men from every third household were identified for the individual interviews. Interviews were successfully completed for 4,843 women or 97 percent and 1,546 men or 97 percent. The principal reason for nonresponse among individual women and men was the failure of interviewers to find them at home despite repeated callbacks.

    Note: See summarized response rates by place of residence in Table 1.1 of the survey report.
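    The response rates quoted above follow directly from the counts in the text. A short check, using only the figures given in this section (the helper name response_rate is ours):

```python
def response_rate(completed: int, eligible: int) -> float:
    """Completed interviews as a percentage of eligible units."""
    return 100.0 * completed / eligible


household_rate = response_rate(6003, 6055)  # occupied households
women_rate = response_rate(4843, 4970)      # eligible women
men_rate = response_rate(1546, 1596)        # eligible men

print(round(household_rate), round(women_rate), round(men_rate))
```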

    Sampling error estimates

    The estimates from a sample survey are affected by two types of errors: (1) nonsampling errors, and (2) sampling errors. Nonsampling errors are the results of shortfalls made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 1998 GDHS to minimize this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.

    Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 1998 GDHS is only one of many samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.

    A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.
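    As a small illustration of the rule described above, the interval is the estimate plus or minus two standard errors, where the standard error is the square root of the variance. The estimate and variance below are hypothetical values, not results from the 1998 GDHS.

```python
import math


def confidence_interval(estimate, variance, multiplier=2.0):
    """Return (lower, upper) as estimate +/- multiplier * standard error."""
    standard_error = math.sqrt(variance)
    return (estimate - multiplier * standard_error,
            estimate + multiplier * standard_error)


# Hypothetical proportion of 0.40 with variance 0.0004 (standard error 0.02).
lower, upper = confidence_interval(0.40, 0.0004)
print(round(lower, 2), round(upper, 2))
```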

    If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the 1998 GDHS sample is the result of a two-stage stratified design, and, consequently, it was necessary to use more complex formulae. The computer software used to calculate sampling errors for the 1998 GDHS is the ISSA Sampling Error Module. This module uses the Taylor linearization method of variance estimation for survey estimates that are means or proportions. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.
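    To give a feel for the Jackknife repeated replication idea mentioned above, here is a generic delete-one-cluster jackknife for a ratio estimate. It is a sketch of the general technique, not the ISSA Sampling Error Module: it ignores stratification and weights, and the cluster totals are made up.

```python
def jackknife_ratio_variance(clusters):
    """Delete-one-cluster jackknife for a ratio r = sum(y) / sum(x).

    clusters: list of (y_total, x_total) pairs, one per cluster.
    Returns (r, variance of r).
    """
    k = len(clusters)
    total_y = sum(y for y, _ in clusters)
    total_x = sum(x for _, x in clusters)
    r = total_y / total_x

    pseudo_values = []
    for y, x in clusters:
        # Estimate recomputed with this one cluster left out.
        r_without = (total_y - y) / (total_x - x)
        pseudo_values.append(k * r - (k - 1) * r_without)

    variance = sum((r_i - r) ** 2 for r_i in pseudo_values) / (k * (k - 1))
    return r, variance


# Hypothetical clusters of one observation each, so r is a simple mean.
ratio, variance = jackknife_ratio_variance([(2, 1), (4, 1), (6, 1)])
print(ratio, variance)
```

    With one observation per cluster the result reduces to the familiar s²/n for a mean, which is a convenient sanity check.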

    Data appraisal

    Data Quality Tables

    • Household age distribution
    • Age distribution of eligible and interviewed women
    • Age distribution of eligible and interviewed men
    • Completeness of reporting
    • Births by calendar years
    • Reporting of age at death in days
    • Reporting of age at death in months

    Note: See detailed tables in APPENDIX C of the survey report.

  13. European Union Statistics on Income and Living Conditions 2013 -...

    • catalog.ihsn.org
    • datacatalog.ihsn.org
    Updated Mar 29, 2019
    + more versions
    Cite
    Eurostat (2019). European Union Statistics on Income and Living Conditions 2013 - Cross-Sectional User Database - Netherlands [Dataset]. https://catalog.ihsn.org/index.php/catalog/7684
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    Eurostat: https://ec.europa.eu/eurostat
    Time period covered
    2013
    Area covered
    Netherlands
    Description

    Abstract

    In 2013, the EU-SILC instrument covered all EU Member States plus Iceland, Turkey, Norway, Switzerland and Croatia. EU-SILC has become the EU reference source for comparative statistics on income distribution and social exclusion at European level, particularly in the context of the "Program of Community action to encourage cooperation between Member States to combat social exclusion" and for producing structural indicators on social cohesion for the annual spring report to the European Council. The first priority is to be given to the delivery of comparable, timely and high quality cross-sectional data.

    There are two types of datasets: 1) Cross-sectional data pertaining to fixed time periods, with variables on income, poverty, social exclusion and living conditions. 2) Longitudinal data pertaining to individual-level changes over time, observed periodically - usually over four years.

    Social exclusion and housing-condition information is collected at household level. Income at a detailed component level is collected at personal level, with some components included in the "Household" section. Labor, education and health observations only apply to persons aged 16 and over. EU-SILC was established to provide data on structural indicators of social cohesion (at-risk-of-poverty rate, S80/S20 and gender pay gap) and to provide relevant data for the two 'open methods of coordination' in the field of social inclusion and pensions in Europe.

    This is the 1st version of the 2013 Cross-Sectional User Database as released in July 2015.

    Geographic coverage

    The survey covers the following countries: Austria; Belgium; Bulgaria; Croatia; Cyprus; Czech Republic; Denmark; Estonia; Finland; France; Germany; Greece; Spain; Ireland; Italy; Latvia; Lithuania; Luxembourg; Hungary; Malta; Netherlands; Poland; Portugal; Romania; Slovenia; Slovakia; Serbia; Sweden; United Kingdom; Iceland; Norway; Turkey; Switzerland

    Small parts of the national territory amounting to no more than 2% of the national population and the national territories listed below may be excluded from EU-SILC: France - French Overseas Departments and territories; Netherlands - The West Frisian Islands with the exception of Texel; Ireland - All offshore islands with the exception of Achill, Bull, Cruit, Gorumna, Inishnee, Lettermore, Lettermullan and Valentia; United Kingdom - Scotland north of the Caledonian Canal, the Scilly Islands.

    Analysis unit

    • Households;
    • Individuals 16 years and older.

    Universe

    The survey covered all household members over 16 years old. Persons living in collective households and in institutions are generally excluded from the target population.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    On the basis of various statistical and practical considerations and the precision requirements for the most critical variables, the minimum effective sample sizes to be achieved were defined. Sample size for the longitudinal component refers, for any pair of consecutive years, to the number of households successfully interviewed in the first year in which all or at least a majority of the household members aged 16 or over are successfully interviewed in both the years.

    For the cross-sectional component, the plans are to achieve the minimum effective sample size of around 131.000 households in the EU as a whole (137.000 including Iceland and Norway). The allocation of the EU sample among countries represents a compromise between two objectives: the production of results at the level of individual countries, and production for the EU as a whole. Requirements for the longitudinal data will be less important. For this component, an effective sample size of around 98.000 households (103.000 including Iceland and Norway) is planned.

    Member States using registers for income and other data may use a sample of persons (selected respondents) rather than a sample of complete households in the interview survey. The minimum effective sample size in terms of the number of persons aged 16 or over to be interviewed in detail is in this case taken as 75 % of the figures shown in columns 3 and 4 of the table I, for the cross-sectional and longitudinal components respectively.

    The reference is to the effective sample size, which is the size required if the survey were based on simple random sampling (design effect in relation to the 'risk of poverty rate' variable = 1.0). The actual sample sizes will have to be larger to the extent that the design effects exceed 1.0 and to compensate for all kinds of non-response. Furthermore, the sample size refers to the number of valid households which are households for which, and for all members of which, all or nearly all the required information has been obtained. For countries with a sample of persons design, information on income and other data shall be collected for the household of each selected respondent and for all its members.
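    The relationship described above can be sketched numerically. The formula below is a common planning approximation and an assumption on our part, since the text only says the actual sample must be larger to cover design effects above 1.0 and non-response: inflate the effective size by the design effect, then divide by the expected response rate. All input values are hypothetical.

```python
import math


def required_sample_size(effective_size, design_effect, response_rate):
    """Households to select so that the achieved sample has the target effective size.

    Assumes actual = effective * design_effect / response_rate (planning approximation).
    """
    return math.ceil(effective_size * design_effect / response_rate)


# Hypothetical country: effective size 4,000, design effect 1.5, 75% response.
households_to_select = required_sample_size(4000, 1.5, 0.75)
print(households_to_select)
```

    With a design effect of 1.0 and full response the function returns the effective size itself, matching the simple random sampling case in the text.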

    At the beginning, a cross-sectional representative sample of households is selected. It is divided into say 4 sub-samples, each by itself representative of the whole population and similar in structure to the whole sample. One sub-sample is purely cross-sectional and is not followed up after the first round. Respondents in the second sub-sample are requested to participate in the panel for 2 years, in the third sub-sample for 3 years, and in the fourth for 4 years. From year 2 onwards, one new panel is introduced each year, with request for participation for 4 years. In any one year, the sample consists of 4 sub-samples, which together constitute the cross-sectional sample. In year 1 they are all new samples; in all subsequent years, only one is new sample. In year 2, three are panels in the second year; in year 3, one is a panel in the second year and two in the third year; in subsequent years, one is a panel for the second year, one for the third year, and one for the fourth (final) year.
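    The rotation scheme described above can be traced with a few lines of code. This is a sketch of the scheme as worded in the text; the function name and the (start year, duration) representation are ours.

```python
def waves_in_year(year):
    """Return the interview wave (1-4) of each sub-sample in the given survey year."""
    # Year 1: four sub-samples asked to stay for 1, 2, 3 and 4 years.
    panels = [(1, duration) for duration in (1, 2, 3, 4)]
    # From year 2 onwards: one new panel per year, each asked to stay 4 years.
    panels += [(start, 4) for start in range(2, year + 1)]
    return sorted(year - start + 1
                  for start, duration in panels
                  if start <= year < start + duration)


for year in range(1, 6):
    print(year, waves_in_year(year))
```

    Every year has exactly four sub-samples, and from year 4 onwards the sample settles into one panel in each of its first to fourth years.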

    According to the Commission Regulation on sampling and tracing rules, the selection of the sample will be drawn according to the following requirements:

    1. For all components of EU-SILC (whether survey or register based), the cross-sectional and longitudinal (initial sample) data shall be based on a nationally representative probability sample of the population residing in private households within the country, irrespective of language, nationality or legal residence status. All private households and all persons aged 16 and over within the household are eligible for the operation.
    2. Representative probability samples shall be achieved both for households, which form the basic units of sampling, data collection and data analysis, and for individual persons in the target population.
    3. The sampling frame and methods of sample selection shall ensure that every individual and household in the target population is assigned a known and non-zero probability of selection.
    4. By way of exception, paragraphs 1 to 3 shall apply in Germany exclusively to the part of the sample based on probability sampling according to Article 8 of the Regulation of the European Parliament and of the Council (EC) No 1177/2003 concerning Community Statistics on Income and Living Conditions.

    Article 8 of the EU-SILC Regulation of the European Parliament and of the Council mentions:

    1. The cross-sectional and longitudinal data shall be based on nationally representative probability samples.
    2. By way of exception to paragraph 1, Germany shall supply cross-sectional data based on a nationally representative probability sample for the first time for the year 2008. For the year 2005, Germany shall supply data for one fourth based on probability sampling and for three fourths based on quota samples, the latter to be progressively replaced by random selection so as to achieve fully representative probability sampling by 2008. For the longitudinal component, Germany shall supply for the year 2006 one third of longitudinal data (data for year 2005 and 2006) based on probability sampling and two thirds based on quota samples. For the year 2007, half of the longitudinal data relating to years 2005, 2006 and 2007 shall be based on probability sampling and half on quota sample. After 2007 all of the longitudinal data shall be based on probability sampling.

    Detailed information about sampling is available in Quality Reports in Related Materials.

    Mode of data collection

    Mixed

  14. Normal and Skewed Example Data

    • figshare.com
    txt
    Updated Dec 21, 2021
    Cite
    Jesus Rogel-Salazar (2021). Normal and Skewed Example Data [Dataset]. http://doi.org/10.6084/m9.figshare.17306285.v1
    Available download formats: txt
    Dataset updated
    Dec 21, 2021
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Jesus Rogel-Salazar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example data for normally distributed and skewed datasets.

  15. Data from: A 24-hour dynamic population distribution dataset based on mobile...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 16, 2022
    Cite
    Claudia Bergroth; Olle Järv; Henrikki Tenkanen; Matti Manninen; Tuuli Toivonen (2022). A 24-hour dynamic population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4724388
    Dataset updated
    Feb 16, 2022
    Dataset provided by
    Elisa Corporation
    Digital Geography Lab, Department of Geosciences and Geography, University of Helsinki
    Unit of Urban Research and Statistics, City of Helsinki / Digital Geography Lab, Department of Geosciences and Geography, University of Helsinki
    Department of Built Environment, Aalto University / Centre for Advanced Spatial Analysis, University College London
    Authors
    Claudia Bergroth; Olle Järv; Henrikki Tenkanen; Matti Manninen; Tuuli Toivonen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Helsinki Metropolitan Area, Finland
    Description

    Related article: Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39.

    In this dataset:

    We present temporally dynamic population distribution data from the Helsinki Metropolitan Area, Finland, at the level of 250 m by 250 m statistical grid cells. Three hourly population distribution datasets are provided for regular workdays (Mon – Thu), Saturdays and Sundays. The data are based on aggregated mobile phone data collected by the biggest mobile network operator in Finland. Mobile phone data are assigned to statistical grid cells using an advanced dasymetric interpolation method based on ancillary data about land cover, buildings and a time use survey. The data were validated by comparing population register data from Statistics Finland for night-time hours and a daytime workplace registry. The resulting 24-hour population data can be used to reveal the temporal dynamics of the city and examine population variations relevant to, for instance, spatial accessibility analyses, crisis management and planning.

    Please cite this dataset as:

    Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39. https://doi.org/10.1038/s41597-021-01113-4

    Organization of data

    The dataset is packaged into a single zip file, Helsinki_dynpop_matrix.zip, which contains the following files:

    HMA_Dynamic_population_24H_workdays.csv represents the dynamic population for an average workday in the study area.

    HMA_Dynamic_population_24H_sat.csv represents the dynamic population for an average Saturday in the study area.

    HMA_Dynamic_population_24H_sun.csv represents the dynamic population for an average Sunday in the study area.

    target_zones_grid250m_EPSG3067.geojson represents the statistical grid in ETRS89/ETRS-TM35FIN projection that can be used to visualize the data on a map using e.g. QGIS.

    Column names

    YKR_ID : a unique identifier for each statistical grid cell (n=13,231). The identifier is compatible with the statistical YKR grid cell data by Statistics Finland and Finnish Environment Institute.

    H0, H1 ... H23 : Each field represents the proportional distribution of the total population in the study area between grid cells during a one-hour period. In total, 24 fields are formatted as “Hx”, where x stands for the hour of the day (values ranging from 0-23). For example, H0 stands for the first hour of the day: 00:00 - 00:59. The sum of all cell values for each field equals 100 (i.e. 100% of the total population for each one-hour period).

    In order to visualize the data on a map, the result tables can be joined with the target_zones_grid250m_EPSG3067.geojson data. The data can be joined by using the field YKR_ID as a common key between the datasets.
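The join described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not part of the dataset: the sample rows, the grid cell IDs and the function names are made up, the CSV is assumed to be comma-delimited, YKR_ID is assumed to be stored as an integer in both files, and only two of the 24 hour columns are shown. With the real files, the CSV text and the GeoJSON object would be read from the unpacked zip file instead.

```python
import csv
import io
import json

# Made-up stand-in for one of the HMA_Dynamic_population_24H_*.csv files.
# Real files carry H0 ... H23; two columns keep the sketch short.
SAMPLE_CSV = """YKR_ID,H0,H1
1001,60.0,55.0
1002,40.0,45.0
"""

# Made-up stand-in for target_zones_grid250m_EPSG3067.geojson.
# Geometries are omitted (null) because only the attribute join is shown.
SAMPLE_GEOJSON = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"YKR_ID": 1001}, "geometry": None},
        {"type": "Feature", "properties": {"YKR_ID": 1002}, "geometry": None},
        {"type": "Feature", "properties": {"YKR_ID": 1003}, "geometry": None},
    ],
}


def read_hourly_shares(csv_text):
    """Return {YKR_ID: {"H0": share, ...}} parsed from CSV text."""
    shares = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        ykr_id = int(row.pop("YKR_ID"))
        shares[ykr_id] = {col: float(val) for col, val in row.items()}
    return shares


def column_totals(shares):
    """Sum each hour column over all grid cells (expected to be 100)."""
    totals = {}
    for hours in shares.values():
        for col, val in hours.items():
            totals[col] = totals.get(col, 0.0) + val
    return totals


def join_to_grid(geojson, shares):
    """Attach hourly shares to grid features that have a matching YKR_ID."""
    joined = []
    for feature in geojson["features"]:
        ykr_id = feature["properties"]["YKR_ID"]
        if ykr_id in shares:  # inner join: cells without data are dropped
            props = {**feature["properties"], **shares[ykr_id]}
            joined.append({**feature, "properties": props})
    return {"type": "FeatureCollection", "features": joined}


shares = read_hourly_shares(SAMPLE_CSV)
totals = column_totals(shares)
joined = join_to_grid(SAMPLE_GEOJSON, shares)
print(json.dumps(joined["features"][0]["properties"]))
```

Checking the column totals before joining is a cheap way to confirm that a file was parsed with the right delimiter and number format, since every hour column should sum to 100.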

    License Creative Commons Attribution 4.0 International.

    Related datasets

    Järv, Olle; Tenkanen, Henrikki & Toivonen, Tuuli. (2017). Multi-temporal function-based dasymetric interpolation tool for mobile phone data. Zenodo. https://doi.org/10.5281/zenodo.252612

    Tenkanen, Henrikki, & Toivonen, Tuuli. (2019). Helsinki Region Travel Time Matrix [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3247564

  16. Data from: Data Fission: Splitting a Single Data Point

    • tandf.figshare.com
    txt
    Updated Dec 14, 2023
    Cite
    James Leiner; Boyan Duan; Larry Wasserman; Aaditya Ramdas (2023). Data Fission: Splitting a Single Data Point [Dataset]. http://doi.org/10.6084/m9.figshare.24328745.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 14, 2023
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    James Leiner; Boyan Duan; Larry Wasserman; Aaditya Ramdas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Suppose we observe a random vector X from some distribution in a known family with unknown parameters. We ask the following question: when is it possible to split X into two pieces f(X) and g(X) such that neither part is sufficient to reconstruct X by itself, but both together can recover X fully, and their joint distribution is tractable? One common solution to this problem when multiple samples of X are observed is data splitting, but Rasines and Young offer an alternative approach that uses additive Gaussian noise; this enables post-selection inference in finite samples for Gaussian distributed data and asymptotically when errors are non-Gaussian. In this article, we offer a more general methodology for achieving such a split in finite samples by borrowing ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting. We call our method data fission, as an alternative to data splitting, data carving and p-value masking. We exemplify the method on several prototypical applications, such as post-selection inference for trend filtering and other regression problems, and effect size estimation after interactive multiple testing. Supplementary materials for this article are available online.
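The additive Gaussian noise idea mentioned in the abstract can be illustrated for a single observation X ~ N(mu, sigma^2) with known sigma. The sketch below is not the authors' code, and its function names and the tuning parameter tau are illustrative assumptions. It draws independent noise Z ~ N(0, sigma^2) and sets f(X) = X + tau*Z and g(X) = X - Z/tau. The two pieces are uncorrelated (and, being jointly Gaussian, independent), while X is recovered exactly from both as (f + tau^2 * g) / (1 + tau^2).

```python
import random


def gaussian_fission(x, sigma, tau, rng):
    """Split one Gaussian observation x into two pieces (f, g).

    Assumes x ~ N(mu, sigma^2) with sigma known; tau > 0 controls how much
    information goes into each piece.
    """
    z = rng.gauss(0.0, sigma)  # independent noise with the same variance as x
    return x + tau * z, x - z / tau


def reconstruct(f, g, tau):
    """Recover the original observation from both pieces."""
    return (f + tau ** 2 * g) / (1 + tau ** 2)


def sample_covariance(a, b):
    """Unbiased sample covariance of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


rng = random.Random(0)
sigma, tau = 1.0, 1.0
xs = [rng.gauss(3.0, sigma) for _ in range(20000)]
pieces = [gaussian_fission(x, sigma, tau, rng) for x in xs]

fs = [f for f, _ in pieces]
gs = [g for _, g in pieces]
print("empirical covariance of f and g:", sample_covariance(fs, gs))
print("largest reconstruction error:",
      max(abs(reconstruct(f, g, tau) - x) for x, (f, g) in zip(xs, pieces)))
```

Larger values of tau put more noise into f and leave g closer to X, which is the sense in which this split acts as a continuous analog of choosing a train/test proportion in data splitting.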

  17. Replication Data for: A Three-Year Mixed Methods Study of Undergraduates’...

    • dataverse.no
    • dataverse.azure.uit.no
    • +2 more
    Updated Oct 8, 2024
    Cite
    Ellen Nierenberg; Ellen Nierenberg (2024). Replication Data for: A Three-Year Mixed Methods Study of Undergraduates’ Information Literacy Development: Knowing, Doing, and Feeling [Dataset]. http://doi.org/10.18710/SK0R1N
    Explore at:
    Available download formats: txt(21865), txt(19475), csv(55030), txt(14751), txt(26578), txt(16861), txt(28211), pdf(107685), pdf(657212), txt(12082), txt(16243), text/x-fixed-field(55030), pdf(65240), txt(8172), pdf(634629), txt(31896), application/x-spss-sav(51476), txt(4141), pdf(91121), application/x-spss-sav(31612), txt(35011), txt(23981), text/x-fixed-field(15653), txt(25369), txt(17935), csv(15653)
    Dataset updated
    Oct 8, 2024
    Dataset provided by
    DataverseNO
    Authors
    Ellen Nierenberg; Ellen Nierenberg
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Aug 8, 2019 - Jun 10, 2022
    Area covered
    Norway
    Description

    This data set contains the replication data and supplements for the article "Knowing, Doing, and Feeling: A three-year, mixed-methods study of undergraduates’ information literacy development." The survey data is from two samples:

    - cross-sectional sample (different students at the same point in time)
    - longitudinal sample (the same students at different points in time)

    Surveys were distributed via Qualtrics during the students' first and sixth semesters. Quantitative and qualitative data were collected and used to describe students' IL development over 3 years. Statistics from the quantitative data were analyzed in SPSS. The qualitative data was coded and analyzed thematically in NVivo. The qualitative, textual data is from semi-structured interviews with sixth-semester students in psychology at UiT, both focus groups and individual interviews. All data were collected as part of the contact author's PhD research on information literacy (IL) at UiT.

    The following files are included in this data set:

    1. A README file which explains the quantitative data files. (2 file formats: .txt, .pdf)
    2. The consent form for participants (in Norwegian). (2 file formats: .txt, .pdf)
    3. Six data files with survey results from UiT psychology undergraduate students for the cross-sectional (n=209) and longitudinal (n=56) samples, in 3 formats (.dat, .csv, .sav). The data was collected in Qualtrics from fall 2019 to fall 2022.
    4. Interview guide for 3 focus group interviews. File format: .txt
    5. Interview guides for 7 individual interviews - first round (n=4) and second round (n=3). File format: .txt
    6. The 21-item IL test (Tromsø Information Literacy Test = TILT), in English and Norwegian. TILT is used for assessing students' knowledge of three aspects of IL: evaluating sources, using sources, and seeking information. The test is multiple choice, with four alternative answers for each item. This test is a "KNOW-measure," intended to measure what students know about information literacy. (2 file formats: .txt, .pdf)
    7. Survey questions related to interest - specifically students' interest in being or becoming information literate - in 3 parts (all in English and Norwegian): a) information and questions about the 4 phases of interest; b) interest questionnaire with 26 items in 7 subscales (Tromsø Interest Questionnaire - TRIQ); c) survey questions about IL and interest, need, and intent. (2 file formats: .txt, .pdf)
    8. Information about the assignment-based measures used to measure what students do in practice when evaluating and using sources. Students were evaluated with these measures in their first and sixth semesters. (2 file formats: .txt, .pdf)
    9. The Norwegian Centre for Research Data's (NSD) 2019 assessment of the notification form for personal data for the PhD research project. In Norwegian. (Format: .pdf)

  18. Summary statistics for the study sample (raw data, not log transformed).

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Aug 27, 2014
    Cite
    Pomeroy, Emma; Stock, Jay T.; Wells, Jonathan C. K.; O'Callaghan, Michael; Cole, Tim J. (2014). Summary statistics for the study sample (raw data, not log transformed). [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001202647
    Explore at:
    Dataset updated
    Aug 27, 2014
    Authors
    Pomeroy, Emma; Stock, Jay T.; Wells, Jonathan C. K.; O'Callaghan, Michael; Cole, Tim J.
    Description

    a = 1 missing data point. b = 2 missing data points. c = 3 missing data points. Summary statistics for the study sample (raw data, not log transformed).

  19. Fire statistics data tables

    • gov.uk
    • s3.amazonaws.com
    Updated Oct 23, 2025
    + more versions
    Cite
    Ministry of Housing, Communities and Local Government (2025). Fire statistics data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/fire-statistics-data-tables
    Explore at:
    Dataset updated
    Oct 23, 2025
    Dataset provided by
    GOV.UK
    Authors
    Ministry of Housing, Communities and Local Government
    Description

    On 1 April 2025 responsibility for fire and rescue transferred from the Home Office to the Ministry of Housing, Communities and Local Government.

    This information covers fires, false alarms and other incidents attended by fire crews, and the statistics include the numbers of incidents, fires, fatalities and casualties as well as information on response times to fires. The Ministry of Housing, Communities and Local Government (MHCLG) also collect information on the workforce, fire prevention work, health and safety and firefighter pensions. All data tables on fire statistics are below.

    MHCLG has responsibility for fire services in England. The vast majority of data tables produced by the Ministry of Housing, Communities and Local Government are for England, but some tables (0101, 0103, 0201, 0501, 1401) are for Great Britain split by nation. In the past the Department for Communities and Local Government (which previously had responsibility for fire services in England) produced data tables for Great Britain and at times the UK. Similar information for the devolved administrations is available at Scotland: Fire and Rescue Statistics (https://www.firescotland.gov.uk/about/statistics/), Wales: Community safety (https://statswales.gov.wales/Catalogue/Community-Safety-and-Social-Inclusion/Community-Safety) and Northern Ireland: Fire and Rescue Statistics (https://www.nifrs.org/home/about-us/publications/).

    If you use assistive technology (for example, a screen reader) and need a version of any of these documents in a more accessible format, please email alternativeformats@communities.gov.uk. Please tell us what format you need. It will help us if you say what assistive technology you use.

    Related content

    Fire statistics guidance
    Fire statistics incident level datasets

    Incidents attended

    FIRE0101: Incidents attended by fire and rescue services by nation and population (MS Excel Spreadsheet, 143 KB), https://assets.publishing.service.gov.uk/media/68f0f810e8e4040c38a3cf96/FIRE0101.xlsx; see also: Previous FIRE0101 tables

    FIRE0102: Incidents attended by fire and rescue services in England, by incident type and fire and rescue authority (MS Excel Spreadsheet, 2.12 MB), https://assets.publishing.service.gov.uk/media/68f0ffd528f6872f1663ef77/FIRE0102.xlsx; see also: Previous FIRE0102 tables

    FIRE0103: Fires attended by fire and rescue services by nation and population (MS Excel Spreadsheet, 197 KB), https://assets.publishing.service.gov.uk/media/68f20a3e06e6515f7914c71c/FIRE0103.xlsx; see also: Previous FIRE0103 tables

    FIRE0104: Fire false alarms by reason for false alarm, England (MS Excel Spreadsheet, 443 KB), https://assets.publishing.service.gov.uk/media/68f20a552f0fc56403a3cfef/FIRE0104.xlsx; see also: Previous FIRE0104 tables

    Dwelling fires attended

    FIRE0201: Dwelling fires attended by fire and rescue services by motive, population and nation (MS Excel Spreadsheet, 192 KB), https://assets.publishing.service.gov.uk/media/68f100492f0fc56403a3cf94/FIRE0201.xlsx; see also: Previous FIRE0201 tables


  20. An example data set to demonstrate the usage of M.o.R., a shiny app for...

    • data.nist.gov
    • datasets.ai
    • +2 more
    Updated Jan 23, 2018
    + more versions
    Cite
    Mark Alexander Henn (2018). An example data set to demonstrate the usage of M.o.R., a shiny app for model-based metrology. [Dataset]. http://doi.org/10.18434/T4/1426859
    Explore at:
    Dataset updated
    Jan 23, 2018
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Authors
    Mark Alexander Henn
    License

    https://www.nist.gov/open/license

    Description

    This data set consists of several files that were created to accompany M.o.R., a shiny app created by the Surface & Nanostructure Metrology Group in the Engineering Physics Division of the Physical Measurement Laboratory (PML) at the National Institute of Standards and Technology. It was created to simplify model-based metrology. A detailed explanation of the proper usage can be found in the M.o.R. documentation.

Cite
Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1

Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research

Explore at:
Available download formats: txt
Dataset updated
Dec 4, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Kingsley Okoye; Samira Hosseini
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers, and explains how to perform different types of statistical data analysis for research purposes using the R programming language. R is open-source software and an object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provide a wide range of functions for programming and analyzing data. Unlike many existing statistical software packages, R has the added benefit of allowing users to write more efficient code by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allow users to define their own (customized) functions for how they expect the program to behave while handling the data, which can also be stored in the simple object system. For all intents and purposes, this book serves as both a textbook and a manual for R statistics, particularly in academic research, data analytics, and computer programming, and aims to help inform and guide the work of R users and statisticians. It provides information about different types of statistical data analysis and methods, and the best scenarios for the use of each in R. It gives a hands-on, step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the different conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test for reliability and validity of the available datasets. Different research experiments, case scenarios and examples are explained in this book.
It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples. Topics range from how to import and store datasets in R as objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. The book thus represents a congruence of statistics and computer programming for research.
