100+ datasets found

Collection of example datasets used for the book - R Programming -...
figshare.com
txt
Updated Dec 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.24728073.v1
Dataset updated
Dec 4, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Kingsley Okoye; Samira Hosseini
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source software and object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provides a wide range of functions for programming and analyzing of data. Unlike many of the existing statistical softwares, R has the added benefit of allowing the users to write more efficient codes by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allows the users to define their own (customized) functions on how they expect the program to behave while handling the data, which can also be stored in the simple object system.For all intents and purposes, this book serves as both textbook and manual for R statistics particularly in academic research, data analytics, and computer programming targeted to help inform and guide the work of the R users or statisticians. It provides information about different types of statistical data analysis and methods, and the best scenarios for use of each case in R. It gives a hands-on step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the different conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test for reliability and validity of the available datasets. Different research experiments, case scenarios and examples are explained in this book. It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R particularly for research purposes with examples. Ranging from how to import and store datasets in R as Objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. Thus, congruence of Statistics and Computer programming for Research.
d
Political Analysis Using R: Example Code and Data, Plus Data for Practice...
search.dataone.org
dataverse.harvard.edu
Updated Nov 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Monogan, Jamie (2023). Political Analysis Using R: Example Code and Data, Plus Data for Practice Problems [Dataset]. http://doi.org/10.7910/DVN/ARKOTI
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/ARKOTI
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Monogan, Jamie
Description
Each R script replicates all of the example code from one chapter from the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.
Data Analysis in R
kaggle.com
zip
Updated May 16, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rajdeep Kaur Bajwa (2022). Data Analysis in R [Dataset]. https://www.kaggle.com/datasets/rajdeepkaurbajwa/data-analysis-r
Explore at:
zip(5321 bytes)Available download formats
Dataset updated
May 16, 2022
Authors
Rajdeep Kaur Bajwa
Description
Dataset

This dataset was created by Rajdeep Kaur Bajwa

Contents
Statistical Data Analysis using R
figshare.com
txt
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Samuel Barsanelli Costa (2023). Statistical Data Analysis using R [Dataset]. http://doi.org/10.6084/m9.figshare.5501035.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.5501035.v1
Dataset updated
May 30, 2023
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Samuel Barsanelli Costa
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
R Scripts contain statistical data analisys for streamflow and sediment data, including Flow Duration Curves, Double Mass Analysis, Nonlinear Regression Analysis for Suspended Sediment Rating Curves, Stationarity Tests and include several plots.
d
Data from: R Manual for QCA
search.dataone.org
dataverse.harvard.edu
Updated Nov 17, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mello, Patrick A. (2023). R Manual for QCA [Dataset]. http://doi.org/10.7910/DVN/KYF7VJ
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/KYF7VJ
Dataset updated
Nov 17, 2023
Dataset provided by
Harvard Dataverse
Authors
Mello, Patrick A.
Description
The R Manual for QCA entails a PDF file that describes all the steps and code needed to prepare and conduct a Qualitative Comparative Analysis (QCA) study in R. This is complemented by an R Script that can be customized as needed. The dataset further includes two files with sample data, for the set-theoretic analysis and the visualization of QCA results. The R Manual for QCA is the online appendix to "Qualitative Comparative Analysis: An Introduction to Research Design and Application", Georgetown University Press, 2021.
Basic R for Data Analysis
kaggle.com
zip
Updated Dec 8, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kebba Ndure (2024). Basic R for Data Analysis [Dataset]. https://www.kaggle.com/datasets/kebbandure/basic-r-for-data-analysis/code
Explore at:
zip(279031 bytes)Available download formats
Dataset updated
Dec 8, 2024
Authors
Kebba Ndure
Description
ABOUT DATASET

This is the R markdown notebook. It contains step by step guide for working on Data Analysis with R. It helps you with installing the relevant packages and how to load them. it also provides a detailed summary of the "dplyr" commands that you can use to manipulate your data in the R environment.

Anyone new to R and wish to carry out some data analysis on R can check it out!
f
Data_Sheet_4_“R” U ready?: a case study using R to analyze changes in gene...
frontiersin.figshare.com
docx
Updated Mar 22, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder (2024). Data_Sheet_4_“R” U ready?: a case study using R to analyze changes in gene expression during evolution.docx [Dataset]. http://doi.org/10.3389/feduc.2024.1379910.s004
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.3389/feduc.2024.1379910.s004
Dataset updated
Mar 22, 2024
Dataset provided by
Frontiers
Authors
Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.
R code dataset derivation centralized.
plos.figshare.com
txt
Updated Nov 14, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Romain Jégou; Camille Bachot; Charles Monteil; Eric Boernert; Jacek Chmiel; Mathieu Boucher; David Pau (2024). R code dataset derivation centralized. [Dataset]. http://doi.org/10.1371/journal.pone.0312697.s011
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0312697.s011
Dataset updated
Nov 14, 2024
Dataset provided by
PLOShttp://plos.org/
Authors
Romain Jégou; Camille Bachot; Charles Monteil; Eric Boernert; Jacek Chmiel; Mathieu Boucher; David Pau
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MethodsThe objective of this project was to determine the capability of a federated analysis approach using DataSHIELD to maintain the level of results of a classical centralized analysis in a real-world setting. This research was carried out on an anonymous synthetic longitudinal real-world oncology cohort randomly splitted in three local databases, mimicking three healthcare organizations, stored in a federated data platform integrating DataSHIELD. No individual data transfer, statistics were calculated simultaneously but in parallel within each healthcare organization and only summary statistics (aggregates) were provided back to the federated data analyst.Descriptive statistics, survival analysis, regression models and correlation were first performed on the centralized approach and then reproduced on the federated approach. The results were then compared between the two approaches.ResultsThe cohort was splitted in three samples (N1 = 157 patients, N2 = 94 and N3 = 64), 11 derived variables and four types of analyses were generated. All analyses were successfully reproduced using DataSHIELD, except for one descriptive variable due to data disclosure limitation in the federated environment, showing the good capability of DataSHIELD. For descriptive statistics, exactly equivalent results were found for the federated and centralized approaches, except some differences for position measures. Estimates of univariate regression models were similar, with a loss of accuracy observed for multivariate models due to source database variability.ConclusionOur project showed a practical implementation and use case of a real-world federated approach using DataSHIELD. The capability and accuracy of common data manipulation and analysis were satisfying, and the flexibility of the tool enabled the production of a variety of analyses while preserving the privacy of individual data. The DataSHIELD forum was also a practical source of information and support. In order to find the right balance between privacy and accuracy of the analysis, set-up of privacy requirements should be established prior to the start of the analysis, as well as a data quality review of the participating healthcare organization.
w
Dataset of books called An introduction to data analysis in R : hands-on...
workwithdata.com
Updated Apr 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Work With Data (2025). Dataset of books called An introduction to data analysis in R : hands-on coding, data mining, visualization and statistics from scratch [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=An+introduction+to+data+analysis+in+R+%3A+hands-on+coding%2C+data+mining%2C+visualization+and+statistics+from+scratch
Explore at:
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 1 row and is filtered where the book is An introduction to data analysis in R : hands-on coding, data mining, visualization and statistics from scratch. It features 7 columns including author, publication date, language, and book publisher.
Data Analysis Project In R
kaggle.com
zip
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jerraldo1705 (2023). Data Analysis Project In R [Dataset]. https://www.kaggle.com/datasets/jerraldo1705/data-analysis-project-in-r
Explore at:
zip(863273 bytes)Available download formats
Dataset updated
May 30, 2023
Authors
Jerraldo1705
Description
Dataset

This dataset was created by Jerraldo1705

Contents
R code generate analysis centralized.
plos.figshare.com
txt
Updated Nov 14, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Romain Jégou; Camille Bachot; Charles Monteil; Eric Boernert; Jacek Chmiel; Mathieu Boucher; David Pau (2024). R code generate analysis centralized. [Dataset]. http://doi.org/10.1371/journal.pone.0312697.s012
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0312697.s012
Dataset updated
Nov 14, 2024
Dataset provided by
PLOShttp://plos.org/
Authors
Romain Jégou; Camille Bachot; Charles Monteil; Eric Boernert; Jacek Chmiel; Mathieu Boucher; David Pau
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MethodsThe objective of this project was to determine the capability of a federated analysis approach using DataSHIELD to maintain the level of results of a classical centralized analysis in a real-world setting. This research was carried out on an anonymous synthetic longitudinal real-world oncology cohort randomly splitted in three local databases, mimicking three healthcare organizations, stored in a federated data platform integrating DataSHIELD. No individual data transfer, statistics were calculated simultaneously but in parallel within each healthcare organization and only summary statistics (aggregates) were provided back to the federated data analyst.Descriptive statistics, survival analysis, regression models and correlation were first performed on the centralized approach and then reproduced on the federated approach. The results were then compared between the two approaches.ResultsThe cohort was splitted in three samples (N1 = 157 patients, N2 = 94 and N3 = 64), 11 derived variables and four types of analyses were generated. All analyses were successfully reproduced using DataSHIELD, except for one descriptive variable due to data disclosure limitation in the federated environment, showing the good capability of DataSHIELD. For descriptive statistics, exactly equivalent results were found for the federated and centralized approaches, except some differences for position measures. Estimates of univariate regression models were similar, with a loss of accuracy observed for multivariate models due to source database variability.ConclusionOur project showed a practical implementation and use case of a real-world federated approach using DataSHIELD. The capability and accuracy of common data manipulation and analysis were satisfying, and the flexibility of the tool enabled the production of a variety of analyses while preserving the privacy of individual data. The DataSHIELD forum was also a practical source of information and support. In order to find the right balance between privacy and accuracy of the analysis, set-up of privacy requirements should be established prior to the start of the analysis, as well as a data quality review of the participating healthcare organization.
D
R code for data analysis
researchdata.ntu.edu.sg
txt
Updated May 2, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ser Huay Janice Teresa Lee; Ser Huay Janice Teresa Lee (2019). R code for data analysis [Dataset]. http://doi.org/10.21979/N9/A0LK3I
Explore at:
txt(11667), txt(4812)Available download formats
Unique identifier
https://doi.org/10.21979/N9/A0LK3I
Dataset updated
May 2, 2019
Dataset provided by
DR-NTU (Data)
Authors
Ser Huay Janice Teresa Lee; Ser Huay Janice Teresa Lee
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset funded by
Ministry of Education (MOE)
Description
R code for running GLMM and BRT analysis
f
Data analysis: R code
datasetcatalog.nlm.nih.gov
figshare.com
Updated Oct 9, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kullin, Brian; Hilton, Caroline; du Toit, Elloise; Bellairs, Gregory; Welp, Kirsten; Gardner-Lubbe, Sugnet; Chicken, Anika; Claassen-Weitz, Shantelle; Brink, Adrian; Livingstone, Hannah (2023). Data analysis: R code [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001471800
Explore at:
Dataset updated
Oct 9, 2023
Authors
Kullin, Brian; Hilton, Caroline; du Toit, Elloise; Bellairs, Gregory; Welp, Kirsten; Gardner-Lubbe, Sugnet; Chicken, Anika; Claassen-Weitz, Shantelle; Brink, Adrian; Livingstone, Hannah
Description
The complexity of contexts and varied purposes for which biome donation are requested is unknown in South Africa. The aim of this study was to provide strategic data towards actualisation of whether a gastrointestinal (GIT) stool donor bank may be established as a collaborative between Western Cape Blood Services (WCBS) and the University of Cape Town (UCT).We designed a cross-sectional, questionnaire-based survey to determine willingness of WCBS blood donors to donate stool specimens for microbiome biobanking. The prospective observational pilot study was conducted between 1 June 2022 and 1 July 2022 at three WCBS donation centres in Cape Town, South Africa. Anonymous blood donors who met the inclusion criteria were provided with infographics on stool donation and a stool collection kit. Anonymised demographic and interview data was aggregated for descriptive purposes, and for statistical analysis.Analysis of responses from 209/231 blood donors demonstrated in a logistic regression model that compensation (p = 3.139e-05) and ' societal benefit outweighs inconvenience’ beliefs (p = 7.751e-05) were covariates significantly associated with willingness to donate stool. Age was borderline significant at a 5% level (p = 0.0556). Most willing stool donors indicated that donating stool samples would not affect blood donations (140/157, 90%). Factors decreasing willingness to donate were stool collection being unpleasant or embarrassing.The survey provides strategic data for the WCBS and UCT towards establishment of a stool bank and provided an understanding of the underlying determinants governing participants decision process with regards to becoming potential donors.
f
Data analysis and plotting code in R Markdown.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Feb 5, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Domberg, Andreas; Köymen, Bahar; Tomasello, Michael (2021). Data analysis and plotting code in R Markdown. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000772989
Explore at:
Dataset updated
Feb 5, 2021
Authors
Domberg, Andreas; Köymen, Bahar; Tomasello, Michael
Description
Reproduces all analyses and plots using the dataset in S3 File. (RMD)
Fitness Tracker Data Analysis with R
kaggle.com
zip
Updated Jun 3, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nargis Karimova (2022). Fitness Tracker Data Analysis with R [Dataset]. https://www.kaggle.com/datasets/nargiskarimova/fitness-tracker-data-analysis-with-r
Explore at:
zip(31712 bytes)Available download formats
Dataset updated
Jun 3, 2022
Authors
Nargis Karimova
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Nargis Karimova

Released under CC0: Public Domain

Contents
f
R code for data analysis.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Feb 19, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zhao, Mengchu; Li, Xiaoran; Song, Xiaomei; Hong, Xinlei; Zhou, Mi; Jin, Yanxuan; Zhao, Zikun; He, Haoning (2025). R code for data analysis. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001301523
Explore at:
Dataset updated
Feb 19, 2025
Authors
Zhao, Mengchu; Li, Xiaoran; Song, Xiaomei; Hong, Xinlei; Zhou, Mi; Jin, Yanxuan; Zhao, Zikun; He, Haoning
Description
BackgroundPhysical activity (PA) is important for students in secondary school, however, trends in PA among secondary school students have shown a significant decline. There is a need to understand the PA of middle school students.ObjectiveThe first objective is to identify the PA levels and screen time of students in middle school. The second objective of the study is to examine the PA levels and screen time among students of different genders.MethodsParticipants from four consecutive two-year cycles of National Health and Nutrition Examination Survey (NHANES, 2011–2012, 2013–2014, 2015–2016, and 2017–2018) were included in this study. Spearman correlation model was used to identify the correlation between participants’ demographics, PA, and screen time data. Negative binomial regression model was used to describe students’ PA and screen time (Dependent variable) in different grades (Independent variables). Gender and Age were taken as control variables.ResultsAfter the data preprocessing, 2516 participants were included in this study. A significant correlation has been found between grade and PA, instead of screen time. Negative binomial regression shows that students have the lowest PA in their transition year grade 6, and their screen time decreased with the grade increased. Significant differences can be found across gender. Future efforts should focus on developing school transition support programs designed to improve PA.
Data_Sheet_1_NeuroDecodeR: a package for neural decoding in R.docx
frontiersin.figshare.com
docx
Updated Jan 3, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ethan M. Meyers (2024). Data_Sheet_1_NeuroDecodeR: a package for neural decoding in R.docx [Dataset]. http://doi.org/10.3389/fninf.2023.1275903.s001
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.3389/fninf.2023.1275903.s001
Dataset updated
Jan 3, 2024
Dataset provided by
Frontiers Mediahttp://www.frontiersin.org/
Authors
Ethan M. Meyers
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Neural decoding is a powerful method to analyze neural activity. However, the code needed to run a decoding analysis can be complex, which can present a barrier to using the method. In this paper we introduce a package that makes it easy to perform decoding analyses in the R programing language. We describe how the package is designed in a modular fashion which allows researchers to easily implement a range of different analyses. We also discuss how to format data to be able to use the package, and we give two examples of how to use the package to analyze real data. We believe that this package, combined with the rich data analysis ecosystem in R, will make it significantly easier for researchers to create reproducible decoding analyses, which should help increase the pace of neuroscience discoveries.
Crime Data Analysis
kaggle.com
Updated Aug 9, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Candace Gostinski (2024). Crime Data Analysis [Dataset]. https://www.kaggle.com/datasets/candacegostinski/crime-data-analysis
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 9, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Candace Gostinski
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
In a world of increasing crime, many organizations are interested in examining incident details to learn from and prevent future crime. Our client, based in Los Angeles County, was interested in this exact thing. They asked us to examine the data to answer several questions; among them, what was the rate of increase or decrease in crime from 2020 to 2023, and which ethnicity or group of people were targeted the most.

Our data was collected from Kaggle.com at the following link:

https://www.kaggle.com/datasets/nathaniellybrand/los-angeles-crime-dataset-2020-present

It was cleaned, examined for further errors, and the analysis performed using RStudio. The results of this analysis are in the attached PDF entitled: "crime_data_analysis_report." Please feel free to review the results as well as follow along with the dataset on your own machine.
Friends - R Package Dataset
kaggle.com
zip
Updated Nov 11, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lucas Yukio Imafuko (2024). Friends - R Package Dataset [Dataset]. https://www.kaggle.com/datasets/lucasyukioimafuko/friends-r-package-dataset
Explore at:
zip(2018791 bytes)Available download formats
Dataset updated
Nov 11, 2024
Authors
Lucas Yukio Imafuko
Description
The whole data and source can be found at https://emilhvitfeldt.github.io/friends/

"The goal of friends to provide the complete script transcription of the Friends sitcom. The data originates from the Character Mining repository which includes references to scientific explorations using this data. This package simply provides the data in tibble format instead of json files."

Content

friends.csv - Contains the scenes and lines for each character, including season and episodes.

friends_emotions.csv - Contains sentiments for each scene - for the first four seasons only.

friends_info.csv - Contains information regarding each episode, such as imdb_rating, views, episode title and directors.

Uses

Text mining, sentiment analysis and word statistics.

Data visualizations.
d
Health and Retirement Study (HRS)
search.dataone.org
dataverse.harvard.edu
Updated Nov 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Damico, Anthony (2023). Health and Retirement Study (HRS) [Dataset]. http://doi.org/10.7910/DVN/ELEKOY
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/ELEKOY
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Damico, Anthony
Description
analyze the health and retirement study (hrs) with r the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death d o us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research, if you apply for an interviewer job with them, i hope you like werther's original. figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking arou nd on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle. but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you. the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked. this new github repository contains five scripts: 1992 - 2010 download HRS microdata.R loop through every year and every file, download, then unzip everything in one big party impor t longitudinal RAND contributed files.R create a SQLite database (.db) on the local disk load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram) longitudinal RAND - analysis examples.R connect to the sql database created by the 'import longitudinal RAND contributed files' program create tw o database-backed complex sample survey object, using a taylor-series linearization design perform a mountain of analysis examples with wave weights from two different points in the panel import example HRS file.R load a fixed-width file using only the sas importation script directly into ram with < a href="http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html">SAScii parse through the IF block at the bottom of the sas importation script, blank out a number of variables save the file as an R data file (.rda) for fast loading later replicate 2002 regression.R connect to the sql database created by the 'import longitudinal RAND contributed files' program create a database-backed complex sample survey object, using a taylor-series linearization design exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document . click here to view these five scripts for more detail about the health and retirement study (hrs), visit: michigan's hrs homepage rand's hrs homepage the hrs wikipedia page a running list of publications using hrs notes: exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you c an think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself. confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D

Facebook

Twitter

Click to copy link

Link copied

Cite

Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1

Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research

Explore at:

txtAvailable download formats

Unique identifier

https://doi.org/10.6084/m9.figshare.24728073.v1

Dataset updated

Dec 4, 2023

Dataset provided by

Figsharehttp://figshare.com/

Authors

Kingsley Okoye; Samira Hosseini

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source software and object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provides a wide range of functions for programming and analyzing of data. Unlike many of the existing statistical softwares, R has the added benefit of allowing the users to write more efficient codes by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allows the users to define their own (customized) functions on how they expect the program to behave while handling the data, which can also be stored in the simple object system.For all intents and purposes, this book serves as both textbook and manual for R statistics particularly in academic research, data analytics, and computer programming targeted to help inform and guide the work of the R users or statisticians. It provides information about different types of statistical data analysis and methods, and the best scenarios for use of each case in R. It gives a hands-on step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the different conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test for reliability and validity of the available datasets. Different research experiments, case scenarios and examples are explained in this book. It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R particularly for research purposes with examples. Ranging from how to import and store datasets in R as Objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. Thus, congruence of Statistics and Computer programming for Research.

Clear search

Close search

Google apps

Main menu

Collection of example datasets used for the book - R Programming -...

Political Analysis Using R: Example Code and Data, Plus Data for Practice...

Data Analysis in R

Dataset

Contents

Statistical Data Analysis using R

Data from: R Manual for QCA

Basic R for Data Analysis

Data_Sheet_4_“R” U ready?: a case study using R to analyze changes in gene...

R code dataset derivation centralized.

Dataset of books called An introduction to data analysis in R : hands-on...

Data Analysis Project In R

Dataset

Contents

R code generate analysis centralized.

R code for data analysis

Data analysis: R code

Data analysis and plotting code in R Markdown.

Fitness Tracker Data Analysis with R

Dataset

Contents

R code for data analysis.

Data_Sheet_1_NeuroDecodeR: a package for neural decoding in R.docx

Crime Data Analysis

Friends - R Package Dataset

Content

Uses

Health and Retirement Study (HRS)

Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research