100+ datasets found

Collection of example datasets used for the book - R Programming -...
figshare.com
txt
Updated Dec 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.24728073.v1
Dataset updated
Dec 4, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Kingsley Okoye; Samira Hosseini
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source software and object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provides a wide range of functions for programming and analyzing of data. Unlike many of the existing statistical softwares, R has the added benefit of allowing the users to write more efficient codes by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allows the users to define their own (customized) functions on how they expect the program to behave while handling the data, which can also be stored in the simple object system.For all intents and purposes, this book serves as both textbook and manual for R statistics particularly in academic research, data analytics, and computer programming targeted to help inform and guide the work of the R users or statisticians. It provides information about different types of statistical data analysis and methods, and the best scenarios for use of each case in R. It gives a hands-on step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the different conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test for reliability and validity of the available datasets. Different research experiments, case scenarios and examples are explained in this book. It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R particularly for research purposes with examples. Ranging from how to import and store datasets in R as Objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. Thus, congruence of Statistics and Computer programming for Research.
Examples datasets for Microbiology (statistics)
zenodo.org
zip
Updated Jan 21, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tristan Cordier; Inès Barrenechea; Tristan Cordier; Inès Barrenechea (2020). Examples datasets for Microbiology (statistics) [Dataset]. http://doi.org/10.5281/zenodo.2620409
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.2620409
Dataset updated
Jan 21, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Tristan Cordier; Inès Barrenechea; Tristan Cordier; Inès Barrenechea
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Three examples dataset to perform biostatistics analysis.
Normal and Skewed Example Data
figshare.com
txt
Updated Dec 21, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jesus Rogel-Salazar (2021). Normal and Skewed Example Data [Dataset]. http://doi.org/10.6084/m9.figshare.17306285.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.17306285.v1
Dataset updated
Dec 21, 2021
Dataset provided by
Figsharehttp://figshare.com/
figshare
Authors
Jesus Rogel-Salazar
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Example data for normally distributed and skewed datasets.
m
Dataset of development of business during the COVID-19 crisis
data.mendeley.com
narcis.nl
Updated Nov 9, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tatiana N. Litvinova (2020). Dataset of development of business during the COVID-19 crisis [Dataset]. http://doi.org/10.17632/9vvrd34f8t.1
Explore at:
Unique identifier
https://doi.org/10.17632/9vvrd34f8t.1
Dataset updated
Nov 9, 2020
Authors
Tatiana N. Litvinova
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second full of pandemics), which are presented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. The arithmetic averages were calculated and the change (increase) in indicators such as profitability and profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators for all countries of the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data is collected in a general Microsoft Excel table. Dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. The dataset is flexible data that can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Due to the fact that the data in the dataset are not ready-made numbers, but formulas, when adding and / or changing the values in the original table at the beginning of the dataset, most of the subsequent tables will be automatically recalculated and the graphs will be updated. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that provide data visualization. The dataset contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship, substituting various predicted morbidity and mortality rates in risk assessment tables and obtaining automatically calculated consequences (changes) on the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified in the process and following the results of the second wave of the pandemic to check the reliability of pre-made forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of a pandemic and COVID-19 crisis for international entrepreneurship.
f
UC_vs_US Statistic Analysis.xlsx
figshare.com
xlsx
Updated Jul 9, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.23644/uu.12631628.v1
Dataset updated
Jul 9, 2020
Dataset provided by
Utrecht University
Authors
F. (Fabiano) Dalpiaz
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the used measures described in the paper. For each subject, it includes multiple columns: A. a sequential student ID B an ID that defines a random group label and the notation C. the used notation: user Story or use Cases D. the case they were assigned to: IFA, Sim, or Hos E. the subject's exam grade (total points out of 100). Empty cells mean that the subject did not take the first exam F. a categorical representation of the grade L/M/H, where H is greater or equal to 80, M is between 65 included and 80 excluded, L otherwise G. the total number of classes in the student's conceptual model H. the total number of relationships in the student's conceptual model I. the total number of classes in the expert's conceptual model J. the total number of relationships in the expert's conceptual model K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below) P. the researchers' judgement on how well the derivation process explanation was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping ), or not present.

Tagging scheme: Aligned (AL) - A concept is represented as a class in both models, either

with the same name or using synonyms or clearly linkable names; Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than class, or (ii) using a generic term (e.g., user'' instead ofurban planner''); System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent legacy system or the system under design (portal, simulator) are legitimate; Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud; Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.

All the calculations and information provided in the following sheets

originate from that raw data.

Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,

including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

Sheet 3 (Size-Ratio):

The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade) . The primary focus in this study is on the number of classes. However, we also provided the size ratio for the number of relationships between student and expert model.

Sheet 4 (Overall):

Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that is fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR) and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.

For sheet 4 as well as for the following four sheets, diverging stacked bar

charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated witch solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

Sheet 5 (By-Notation):

Model correctness and model completeness is compared by notation - UC, US.

Sheet 6 (By-Case):

Model correctness and model completeness is compared by case - SIM, HOS, IFA.

Sheet 7 (By-Process):

Model correctness and model completeness is compared by how well the derivation process is explained - well explained, partially explained, not present.

Sheet 8 (By-Grade):

Model correctness and model completeness is compared by the exam grades, converted to categorical values High, Low , and Medium.
Pre and Post-Exercise Heart Rate Analysis
kaggle.com
zip
Updated Sep 29, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abdullah M Almutairi (2024). Pre and Post-Exercise Heart Rate Analysis [Dataset]. https://www.kaggle.com/datasets/abdullahmalmutairi/pre-and-post-exercise-heart-rate-analysis
Explore at:
zip(3857 bytes)Available download formats
Dataset updated
Sep 29, 2024
Authors
Abdullah M Almutairi
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Overview:

This dataset contains simulated (hypothetical) but almost realistic (based on AI) data related to sleep, heart rate, and exercise habits of 500 individuals. It includes both pre-exercise and post-exercise resting heart rates, allowing for analyses such as a dependent t-test (Paired Sample t-test) to observe changes in heart rate after an exercise program. The dataset also includes additional health-related variables, such as age, hours of sleep per night, and exercise frequency.

The data is designed for tasks involving hypothesis testing, health analytics, or even machine learning applications that predict changes in heart rate based on personal attributes and exercise behavior. It can be used to understand the relationships between exercise frequency, sleep, and changes in heart rate.

File: Filename: heart_rate_data.csv File Format: CSV

- Features (Columns):

Age: Description: The age of the individual. Type: Integer Range: 18-60 years Relevance: Age is an important factor in determining heart rate and the effects of exercise.

Sleep Hours: Description: The average number of hours the individual sleeps per night. Type: Float Range: 3.0 - 10.0 hours Relevance: Sleep is a crucial health metric that can impact heart rate and exercise recovery.

Exercise Frequency (Days/Week): Description: The number of days per week the individual engages in physical exercise. Type: Integer Range: 1-7 days/week Relevance: More frequent exercise may lead to greater heart rate improvements and better cardiovascular health.

Resting Heart Rate Before: Description: The individual’s resting heart rate measured before beginning a 6-week exercise program. Type: Integer Range: 50 - 100 bpm (beats per minute) Relevance: This is a key health indicator, providing a baseline measurement for the individual’s heart rate.

Resting Heart Rate After: Description: The individual’s resting heart rate measured after completing the 6-week exercise program. Type: Integer Range: 45 - 95 bpm (lower than the "Resting Heart Rate Before" due to the effects of exercise). Relevance: This variable is essential for understanding how exercise affects heart rate over time, and it can be used to perform a dependent t-test analysis.

Max Heart Rate During Exercise: Description: The maximum heart rate the individual reached during exercise sessions. Type: Integer Range: 120 - 190 bpm Relevance: This metric helps in understanding cardiovascular strain during exercise and can be linked to exercise frequency or fitness levels.

Potential Uses: Dependent T-Test Analysis: The dataset is particularly suited for a dependent (paired) t-test where you compare the resting heart rate before and after the exercise program for each individual.

Exploratory Data Analysis (EDA):Investigate relationships between sleep, exercise frequency, and changes in heart rate. Potential analyses include correlations between sleep hours and resting heart rate improvement, or regression analyses to predict heart rate after exercise.

Machine Learning: Use the dataset for predictive modeling, and build a beginner regression model to predict post-exercise heart rate using age, sleep, and exercise frequency as features.

Health and Fitness Insights: This dataset can be useful for studying how different factors like sleep and age influence heart rate changes and overall cardiovascular health.

License: Choose an appropriate open license, such as:

CC BY 4.0 (Attribution 4.0 International).

Inspiration for Kaggle Users: How does exercise frequency influence the reduction in resting heart rate? Is there a relationship between sleep and heart rate improvements post-exercise? Can we predict the post-exercise heart rate using other health variables? How do age and exercise frequency interact to affect heart rate?

Acknowledgments: This is a simulated dataset for educational purposes, generated to demonstrate statistical and machine learning applications in the field of health analytics.
example 1 - time series - USD RUB 1 year data
kaggle.com
zip
Updated Sep 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Denis Andrikov (2024). example 1 - time series - USD RUB 1 year data [Dataset]. https://www.kaggle.com/datasets/denisandrikov/example-1-time-series-usd-rub-1-year-data
Explore at:
zip(675 bytes)Available download formats
Dataset updated
Sep 19, 2024
Authors
Denis Andrikov
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
A simple table time series for school probability and statistics. We have to learn how to investigate data: value via time. What we try to do: - mean: average is the sum of all values divided by the number of values. It is also sometimes referred to as mean. - median is the middle number, when in order. Mode is the most common number. Range is the largest number minus the smallest number. - standard deviation s a measure of how dispersed the data is in relation to the mean.
Environmental data associated to particular health events example dataset
data.europa.eu
unknown
Updated Jul 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zenodo (2025). Environmental data associated to particular health events example dataset [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-5823426?locale=el
Explore at:
unknown(6689542)Available download formats
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Zenodohttp://zenodo.org/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data set is a collection of environmental records associated with the individual events. The data set has been generated using the serdif-api wrapper (https://github.com/navarral/serdif-api) when sending a CSV file with example events for the Republic of Ireland. The serdif-api send a semantic query that (i) selects the environmental data sets within the region of the event, (ii) filters by the specific period of interest from the event, (iii) aggregates the data sets using the minimum, maximum, average or sum for each of the available variables for a specific time unit. The aggregation method and the time unit can be passed to the serdif-api through the Command Line Interface (CLI) (see example in https://github.com/navarral/serdif-api). The resulting data set format can be also specified as data table (CSV) or as graph (RDF) for analysis and publication as FAIR data. The open-ready data for research is retrieved as a zip file that contains: (i) data as csv: environmental data associated to particular events as a data table (ii) data as rdf: environmental data associated to particular events as a graph (iii) metadata for publication as rdf: metadata record with generalized information about the data that do not contain personal data as a graph; therefore, publishable. (iv) metadata for research as rdf: metadata records with detailed information about the data, such as individual dates, regions, data sets used and data lineage; which could lead to data privacy issues if published without approval from the Data Protection Officer (DPO) and data controller.
Environmental data associated to particular health events example dataset
data.europa.eu
zenodo.org
unknown
Updated Mar 6, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zenodo (2023). Environmental data associated to particular health events example dataset [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-6817101?locale=bg
Explore at:
unknown(162)Available download formats
Dataset updated
Mar 6, 2023
Dataset authored and provided by
Zenodohttp://zenodo.org/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data represents and example output for environmental data (i.e. climate and pollution) linked with individual events through location and time. The linkage is the result of a semantic query that integrates environmental data within an area relevant to the event and selects a period of data before the event. The resulting event-environmental linked data contains: The data for analysis as a data table (.csv) and graph (.ttl) The metadata describing the linkage process and the data (.csv and .ttl) The interactive report to explore the (meta)data (.html) The graph files are ready to be shared and published as Findable, Accessible, Interoperable and Reusable (FAIR) data, including the necessary information to be reused by other researchers in different contexts.
N
Klemme, IA Population Breakdown by Gender and Age Dataset: Male and Female...
neilsberg.com
csv, json
Updated Feb 24, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2025). Klemme, IA Population Breakdown by Gender and Age Dataset: Male and Female Population Distribution Across 18 Age Groups // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/e1ea6c9b-f25d-11ef-8c1b-3860777c1fe6/
Explore at:
json, csvAvailable download formats
Dataset updated
Feb 24, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Iowa, Klemme
Variables measured
Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Klemme by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Klemme. The dataset can be utilized to understand the population distribution of Klemme by gender and age. For example, using this dataset, we can identify the largest age group for both Men and Women in Klemme. Additionally, it can be used to see how the gender ratio changes from birth to senior most age group and male to female ratio across each age group for Klemme.

Key observations

Largest age group (population): Male # 15-19 years (44) | Female # 10-14 years (29). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Scope of gender :

Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis.

Variables / Data Columns

Age Group: This column displays the age group for the Klemme population analysis. Total expected values are 18 and are define above in the age groups section.

Population (Male): The male population in the Klemme is shown in the following column.

Population (Female): The female population in the Klemme is shown in the following column.

Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in Klemme for each age group.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Klemme Population by Gender. You can refer the same here
d
Current Population Survey (CPS)
search.dataone.org
dataverse.harvard.edu
Updated Nov 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/AK4FDD
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Damico, Anthony
Description
analyze the current population survey (cps) annual social and economic supplement (asec) with r the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics ( bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups b y state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be t reated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show. this new github repository contains three scripts: 2005-2012 asec - download all microdata.R down load the fixed-width file containing household, family, and person records import by separating this file into three tables, then merge 'em together at the person-level download the fixed-width file containing the person-level replicate weights merge the rectangular person-level file with the replicate weights, then store it in a sql database create a new variable - one - in the data table 2012 asec - analysis examples.R connect to the sql database created by the 'download all microdata' progr am create the complex sample survey object, using the replicate weights perform a boatload of analysis examples replicate census estimates - 2011.R connect to the sql database created by the 'download all microdata' program create the complex sample survey object, using the replicate weights match the sas output shown in the png file below 2011 asec replicate weight sas output.png statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document. click here to view these three scripts for more detail about the current population survey - annual social and economic supplement (cps-asec), visit: the census bureau's current population survey page the bureau of labor statistics' current population survey page the current population survey's wikipedia article notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current populat ion survey to talk about america, subract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research. confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
Z
Data from: A 24-hour dynamic population distribution dataset based on mobile...
data.niaid.nih.gov
zenodo.org
Updated Feb 16, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Claudia Bergroth; Olle Järv; Henrikki Tenkanen; Matti Manninen; Tuuli Toivonen (2022). A 24-hour dynamic population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4724388
Explore at:
Dataset updated
Feb 16, 2022
Dataset provided by
Elisa Corporation
Department of Built Environment, Aalto University / Centre for Advanced Spatial Analysis, University College London
Digital Geography Lab, Department of Geosciences and Geography, University of Helsinki
Unit of Urban Research and Statistics, City of Helsinki / Digital Geography Lab, Department of Geosciences and Geography, University of Helsinki
Authors
Claudia Bergroth; Olle Järv; Henrikki Tenkanen; Matti Manninen; Tuuli Toivonen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Helsinki Metropolitan Area, Finland
Description
Related article: Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39.

In this dataset:

We present temporally dynamic population distribution data from the Helsinki Metropolitan Area, Finland, at the level of 250 m by 250 m statistical grid cells. Three hourly population distribution datasets are provided for regular workdays (Mon – Thu), Saturdays and Sundays. The data are based on aggregated mobile phone data collected by the biggest mobile network operator in Finland. Mobile phone data are assigned to statistical grid cells using an advanced dasymetric interpolation method based on ancillary data about land cover, buildings and a time use survey. The data were validated by comparing population register data from Statistics Finland for night-time hours and a daytime workplace registry. The resulting 24-hour population data can be used to reveal the temporal dynamics of the city and examine population variations relevant to for instance spatial accessibility analyses, crisis management and planning.

Please cite this dataset as:

Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39. https://doi.org/10.1038/s41597-021-01113-4

Organization of data

The dataset is packaged into a single Zipfile Helsinki_dynpop_matrix.zip which contains following files:

HMA_Dynamic_population_24H_workdays.csv represents the dynamic population for average workday in the study area.

HMA_Dynamic_population_24H_sat.csv represents the dynamic population for average saturday in the study area.

HMA_Dynamic_population_24H_sun.csv represents the dynamic population for average sunday in the study area.

target_zones_grid250m_EPSG3067.geojson represents the statistical grid in ETRS89/ETRS-TM35FIN projection that can be used to visualize the data on a map using e.g. QGIS.

Column names

YKR_ID : a unique identifier for each statistical grid cell (n=13,231). The identifier is compatible with the statistical YKR grid cell data by Statistics Finland and Finnish Environment Institute.

H0, H1 ... H23 : Each field represents the proportional distribution of the total population in the study area between grid cells during a one-hour period. In total, 24 fields are formatted as “Hx”, where x stands for the hour of the day (values ranging from 0-23). For example, H0 stands for the first hour of the day: 00:00 - 00:59. The sum of all cell values for each field equals to 100 (i.e. 100% of total population for each one-hour period)

In order to visualize the data on a map, the result tables can be joined with the target_zones_grid250m_EPSG3067.geojson data. The data can be joined by using the field YKR_ID as a common key between the datasets.

License Creative Commons Attribution 4.0 International.

Related datasets

Järv, Olle; Tenkanen, Henrikki & Toivonen, Tuuli. (2017). Multi-temporal function-based dasymetric interpolation tool for mobile phone data. Zenodo. https://doi.org/10.5281/zenodo.252612

Tenkanen, Henrikki, & Toivonen, Tuuli. (2019). Helsinki Region Travel Time Matrix [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3247564
N
Gratis, OH Population Breakdown by Gender and Age
neilsberg.com
csv, json
Updated Sep 14, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2023). Gratis, OH Population Breakdown by Gender and Age [Dataset]. https://www.neilsberg.com/research/datasets/66af7481-3d85-11ee-9abe-0aa64bf2eeb2/
Explore at:
json, csvAvailable download formats
Dataset updated
Sep 14, 2023
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Gratis, Ohio
Variables measured
Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Gratis by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Gratis. The dataset can be utilized to understand the population distribution of Gratis by gender and age. For example, using this dataset, we can identify the largest age group for both Men and Women in Gratis. Additionally, it can be used to see how the gender ratio changes from birth to senior most age group and male to female ratio across each age group for Gratis.

Key observations

Largest age group (population): Male # 0-4 years (74) | Female # 25-29 years (74). Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Scope of gender :

Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis.

Variables / Data Columns

Age Group: This column displays the age group for the Gratis population analysis. Total expected values are 18 and are define above in the age groups section.

Population (Male): The male population in the Gratis is shown in the following column.

Population (Female): The female population in the Gratis is shown in the following column.

Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in Gratis for each age group.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Gratis Population by Gender. You can refer the same here
N
Malta, OH Population Breakdown by Gender and Age
neilsberg.com
csv, json
Updated Sep 14, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2023). Malta, OH Population Breakdown by Gender and Age [Dataset]. https://www.neilsberg.com/research/datasets/6703b61d-3d85-11ee-9abe-0aa64bf2eeb2/
Explore at:
csv, jsonAvailable download formats
Dataset updated
Sep 14, 2023
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Ohio, Malta
Variables measured
Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Malta by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Malta. The dataset can be utilized to understand the population distribution of Malta by gender and age. For example, using this dataset, we can identify the largest age group for both Men and Women in Malta. Additionally, it can be used to see how the gender ratio changes from birth to senior most age group and male to female ratio across each age group for Malta.

Key observations

Largest age group (population): Male # 35-39 years (52) | Female # 25-29 years (62). Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Scope of gender :

Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis.

Variables / Data Columns

Age Group: This column displays the age group for the Malta population analysis. Total expected values are 18 and are define above in the age groups section.

Population (Male): The male population in the Malta is shown in the following column.

Population (Female): The female population in the Malta is shown in the following column.

Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in Malta for each age group.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Malta Population by Gender. You can refer the same here
Toy Dataset
kaggle.com
zip
Updated Dec 10, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Carlo Lepelaars (2018). Toy Dataset [Dataset]. https://www.kaggle.com/datasets/carlolepelaars/toy-dataset
Explore at:
zip(1184308 bytes)Available download formats
Dataset updated
Dec 10, 2018
Authors
Carlo Lepelaars
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

A fictional dataset for exploratory data analysis (EDA) and to test simple prediction models.

This toy dataset features 150000 rows and 6 columns.

Columns

Note: All data is fictional. The data has been generated so that their distributions are convenient for statistical analysis.

Number: A simple index number for each row

City: The location of a person (Dallas, New York City, Los Angeles, Mountain View, Boston, Washington D.C., San Diego and Austin)

Gender: Gender of a person (Male or Female)

Age: The age of a person (Ranging from 25 to 65 years)

Income: Annual income of a person (Ranging from -674 to 177175)

Illness: Is the person Ill? (Yes or No)

Acknowledgements

Stock photo by Mika Baumeister on Unsplash.
Mortality Statistics in US Cities
kaggle.com
zip
Updated Jan 23, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). Mortality Statistics in US Cities [Dataset]. https://www.kaggle.com/datasets/thedevastator/mortality-statistics-in-us-cities
Explore at:
zip(96624 bytes)Available download formats
Dataset updated
Jan 23, 2023
Authors
The Devastator
Area covered
United States
Description
Mortality Statistics in US Cities

Deaths by Age and Cause of Death in 2016

By Health [source]

About this dataset

This dataset contains mortality statistics for 122 U.S. cities in 2016, providing detailed information about all deaths that occurred due to any cause, including pneumonia and influenza. The data is voluntarily reported from cities with populations of 100,000 or more, and it includes the place of death and the week during which the death certificate was filed. Data is provided broken down by age group and includes a flag indicating the reliability of each data set to help inform analysis. Each row also provides longitude and latitude information for each reporting area in order to make further analysis easier. These comprehensive mortality statistics are invaluable resources for tracking disease trends as well as making comparisons between different areas across the country in order to identify public health risks quickly and effectively

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

This dataset contains mortality rates for 122 U.S. cities in 2016, including deaths by age group and cause of death. The data can be used to study various trends in mortality and contribute to the understanding of how different diseases impact different age groups across the country.

In order to use the data, firstly one has to identify which variables they would like to use from this dataset. These include: reporting area; MMWR week; All causes by age greater than 65 years; All causes by age 45-64 years; All causes by age 25-44 years; All causes by age 1-24 years; All causes less than 1 year old; Pneumonia and Influenza total fatalities; Location (1 & 2); flag indicating reliability of data.

Once you have identified the variables that you are interested in,you will need to filter the dataset so that it only includes relevant information for your analysis or research purposes. For example, if you are looking at trends between different ages, then all you would need is information on those 3 specific cause groups (greater than 65, 45-64 and 25-44). You can do this using a selection tool that allows you to pick only certain columns from your data set or an excel filter tool if your data is stored as a csv file type .

Next step is preparing your data - it’s important for efficient analysis also helpful when there are too many variables/columns which can confuse our analysis process – eliminate unnecessary columns, rename column labels where needed etc ... In addition , make sure we clean up any missing values / outliers / incorrect entries before further investigation .Remember , outliers or corrupt entries may lead us into incorrect conclusions upon analyzing our set ! Once we complete the cleaning steps , now its safe enough transit into drawing insights !

The last step involves using statistical methods such as linear regression with multiple predictors or descriptive statistical measures such as mean/median etc ..to draw key insights based on analysis done so far and generate some actionable points !

With these steps taken care off , now its easier for anyone who decides dive into another project involving this particular dataset with added advantage formulated out of existing work done over our previous investigations!

Research Ideas

Creating population health profiles for cities in the U.S.

Tracking public health trends across different age groups

Analyzing correlations between mortality and geographical locations

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: Dataset copyright by authors - You are free to: - Share - copy and redistribute the material in any medium or format for any purpose, even commercially. - Adapt - remix, transform, and build upon the material for any purpose, even commercially. - You must: - Give appropriate credit - Provide a link to the license, and indicate if changes were made. - ShareAlike - You must distribute your contributions under the same license as the original. - Keep intact - all notices that refer to this license, including copyright notices.

Columns

File: rows.csv | Column name | Description | |:--------------------------------------------|:-----------------------------------...
p
Representative synthetic dataset of Luxembourg’s citizens
data.public.lu
csv
Updated Dec 1, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Luxembourg National Data Service (2023). Representative synthetic dataset of Luxembourg’s citizens [Dataset]. https://data.public.lu/en/datasets/representative-synthetic-dataset-of-luxembourgs-citizens/
Explore at:
csv(10936553), csv(108540)Available download formats
Dataset updated
Dec 1, 2023
Dataset authored and provided by
Luxembourg National Data Service
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Luxembourg
Description
The dataset has been created by using the open-source code released by LNDS (Luxembourg National Data Service). It is meant to be an example of the dataset structure anyone can generate and personalize in terms of some fixed parameter, including the sample size. The file format is .csv, and the data are organized by individual profiles on the rows and their personal features on the columns. The information in the dataset has been generated based on the statistical information about the age-structure distribution, the number of populations over municipalities, the number of different nationalities present in Luxembourg, and salary statistics per municipality. The STATEC platform, the statistics portal of Luxembourg, is the public source we used to gather the real information that we ingested into our synthetic generation model. Other features like Date of birth, Social matricule, First name, Surname, Ethnicity, and physical attributes have been obtained by a logical relationship between variables without exploiting any additional real information. We are in compliance with the law in putting close to zero the risk of identifying a real person completely by chance.
N
Miami-Dade County, FL Population Breakdown by Gender and Age Dataset: Male...
neilsberg.com
csv, json
Updated Feb 24, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2025). Miami-Dade County, FL Population Breakdown by Gender and Age Dataset: Male and Female Population Distribution Across 18 Age Groups // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/e1f11dea-f25d-11ef-8c1b-3860777c1fe6/
Explore at:
csv, jsonAvailable download formats
Dataset updated
Feb 24, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Florida, Miami-Dade County
Variables measured
Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Miami-Dade County by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Miami-Dade County. The dataset can be utilized to understand the population distribution of Miami-Dade County by gender and age. For example, using this dataset, we can identify the largest age group for both Men and Women in Miami-Dade County. Additionally, it can be used to see how the gender ratio changes from birth to senior most age group and male to female ratio across each age group for Miami-Dade County.

Key observations

Largest age group (population): Male # 30-34 years (98,361) | Female # 55-59 years (98,343). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Scope of gender :

Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis.

Variables / Data Columns

Age Group: This column displays the age group for the Miami-Dade County population analysis. Total expected values are 18 and are define above in the age groups section.

Population (Male): The male population in the Miami-Dade County is shown in the following column.

Population (Female): The female population in the Miami-Dade County is shown in the following column.

Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in Miami-Dade County for each age group.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Miami-Dade County Population by Gender. You can refer the same here
All Seaborn Built-in Datasets 📊✨
kaggle.com
zip
Updated Aug 27, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abdelrahman Mohamed (2024). All Seaborn Built-in Datasets 📊✨ [Dataset]. https://www.kaggle.com/datasets/abdoomoh/all-seaborn-built-in-datasets
Explore at:
zip(1383218 bytes)Available download formats
Dataset updated
Aug 27, 2024
Authors
Abdelrahman Mohamed
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Description: - This dataset includes all 22 built-in datasets from the Seaborn library, a widely used Python data visualization tool. Seaborn's built-in datasets are essential resources for anyone interested in practicing data analysis, visualization, and machine learning. They span a wide range of topics, from classic datasets like the Iris flower classification to real-world data such as Titanic survival records and diamond characteristics.

Included Datasets:

Anagrams: Analysis of word anagram patterns.

Anscombe: Anscombe's quartet demonstrating the importance of data visualization.

Attention: Data on attention span variations in different scenarios.

Brain Networks: Connectivity data within brain networks.

Car Crashes: US car crash statistics.

Diamonds: Data on diamond properties including price, cut, and clarity.

Dots: Randomly generated data for scatter plot visualization.

Dow Jones: Historical records of the Dow Jones Industrial Average.

Exercise: The relationship between exercise and health metrics.

Flights: Monthly passenger numbers on flights.

FMRI: Functional MRI data capturing brain activity.

Geyser: Eruption times of the Old Faithful geyser.

Glue: Strength of glue under different conditions.

Health Expenditure: Health expenditure statistics across countries.

Iris: Famous dataset for classifying Iris species.

MPG: Miles per gallon for various vehicles.

Penguins: Data on penguin species and their features.

Planets: Characteristics of discovered exoplanets.

Sea Ice: Measurements of sea ice extent.

Taxis: Taxi trips data in a city.

Tips: Tipping data collected from a restaurant.

Titanic: Survival data from the Titanic disaster.

This complete collection serves as an excellent starting point for anyone looking to improve their data science skills, offering a wide array of datasets suitable for both beginners and advanced users.
Labour Market Statistics Statistical Bulletin time series dataset - Dataset...
ckan.publishing.service.gov.uk
Updated Aug 30, 2013
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ckan.publishing.service.gov.uk (2013). Labour Market Statistics Statistical Bulletin time series dataset - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/labour-market-statistics-statisticial-bulletin
Explore at:
Dataset updated
Aug 30, 2013
Dataset provided by
CKANhttps://ckan.org/
Description
This is a large dataset which contains the labour market statistics data series published in the monthly Labour Market Statistics Statistical Bulletin. The dataset is overwritten every month and it therefore always contains the latest published data. The Time Series dataset facility is primarily designed for users who wish to customise their own datasets. For example, users can create a single spreadsheet including series for unemployment, claimant count, employment and workforce jobs, rather than extracting the required data from several separate spreadsheets published on the website.

Facebook

Twitter

Click to copy link

Link copied

Cite

Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1

Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research

Explore at:

txtAvailable download formats

Unique identifier

https://doi.org/10.6084/m9.figshare.24728073.v1

Dataset updated

Dec 4, 2023

Dataset provided by

Figsharehttp://figshare.com/

Authors

Kingsley Okoye; Samira Hosseini

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source software and object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provides a wide range of functions for programming and analyzing of data. Unlike many of the existing statistical softwares, R has the added benefit of allowing the users to write more efficient codes by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allows the users to define their own (customized) functions on how they expect the program to behave while handling the data, which can also be stored in the simple object system.For all intents and purposes, this book serves as both textbook and manual for R statistics particularly in academic research, data analytics, and computer programming targeted to help inform and guide the work of the R users or statisticians. It provides information about different types of statistical data analysis and methods, and the best scenarios for use of each case in R. It gives a hands-on step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the different conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test for reliability and validity of the available datasets. Different research experiments, case scenarios and examples are explained in this book. It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R particularly for research purposes with examples. Ranging from how to import and store datasets in R as Objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. Thus, congruence of Statistics and Computer programming for Research.

Clear search

Close search

Google apps

Main menu

Collection of example datasets used for the book - R Programming -...

Examples datasets for Microbiology (statistics)

Normal and Skewed Example Data

Dataset of development of business during the COVID-19 crisis

UC_vs_US Statistic Analysis.xlsx

Pre and Post-Exercise Heart Rate Analysis

example 1 - time series - USD RUB 1 year data

Environmental data associated to particular health events example dataset

Environmental data associated to particular health events example dataset

Klemme, IA Population Breakdown by Gender and Age Dataset: Male and Female...

About this dataset

Content

Inspiration

Recommended for further research

Current Population Survey (CPS)

Data from: A 24-hour dynamic population distribution dataset based on mobile...

Gratis, OH Population Breakdown by Gender and Age

About this dataset

Content

Inspiration

Recommended for further research

Malta, OH Population Breakdown by Gender and Age

About this dataset

Content

Inspiration

Recommended for further research

Toy Dataset

Context

Columns

Acknowledgements

Mortality Statistics in US Cities

Mortality Statistics in US Cities

Deaths by Age and Cause of Death in 2016

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Research Ideas

Acknowledgements

License

Columns

Representative synthetic dataset of Luxembourg’s citizens

Miami-Dade County, FL Population Breakdown by Gender and Age Dataset: Male...

About this dataset

Content

Inspiration

Recommended for further research

All Seaborn Built-in Datasets 📊✨

Labour Market Statistics Statistical Bulletin time series dataset - Dataset...

Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research