65 datasets found
  1. data_neo.Rdata

    • psycharchives.org
    Updated Dec 20, 2021
    Cite
    (2021). data_neo.Rdata [Dataset]. https://psycharchives.org/handle/20.500.12034/4717
    Explore at:
    Dataset updated
    Dec 20, 2021
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R is a very powerful language for statistical computing in many research disciplines, but it has a steep learning curve. The software is open source, freely available, and has a thriving community. This crash course provides an overview of Base-R concepts for beginners and covers the topics 1) introduction to R, 2) reading, saving, and viewing data, 3) selecting and changing objects in R, and 4) descriptive statistics. This course was held by Lisa Spitzer on September 3, 2021, as a precursor to the R tidyverse Workshop by Aurélien Ginolhac and Roland Krause (September 8 - 10, 2021). This entry features the slides, exercises/results, and chat messages of the crash course. Related to this entry are the recordings of the course and the R tidyverse workshop materials. Click on "related PsychArchives objects" to view or download the recordings of the workshop.

  2. DataCamp Courses

    • kaggle.com
    zip
    Updated Feb 16, 2020
    Cite
    Christoffer Karlsson (2020). DataCamp Courses [Dataset]. https://www.kaggle.com/datasets/christoffer/datacamp-courses/data
    Explore at:
    zip (17307 bytes)
    Dataset updated
    Feb 16, 2020
    Authors
    Christoffer Karlsson
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    DataCamp is one of the leading online course providers for Data Science.
    They have more than 300 courses using R, Python, and other technologies, which made me wonder whether it would be possible to identify current trends by looking at their course descriptions.

    Content

    The course data was scraped from DataCamp's site, extracting course titles and descriptions, and which technology the course aims to teach.

    Acknowledgements

    The scraping was an absolute pleasure thanks to the excellent rvest R package!

    Inspiration

    Some ideas:

    • Can you identify the most popular techniques?
    • Can you find out which tasks are more popular to do in R or Python?
  3. Data from: A Customizable Inquiry-Based Statistics Teaching Application for...

    • qubeshub.org
    Updated Apr 5, 2024
    Cite
    Mikus Abolins-Abols*; Natalie Christian; Jeffery Masters; Rachel Pigg (2024). A Customizable Inquiry-Based Statistics Teaching Application for Introductory Biology Students [Dataset]. https://qubeshub.org/publications/4651/?v=1
    Explore at:
    Dataset updated
    Apr 5, 2024
    Dataset provided by
    QUBES
    Authors
    Mikus Abolins-Abols*; Natalie Christian; Jeffery Masters; Rachel Pigg
    Description

    Building strong quantitative skills prepares undergraduate biology students for successful careers in science and medicine. While math and statistics anxiety can negatively impact student learning within biology classrooms, instructors may reduce this anxiety by steadily building student competency in quantitative reasoning through instructional scaffolding, application-based approaches, and simple computer program interfaces. However, few statistical programs exist that meet all needs of an inclusive, inquiry-based laboratory course. These needs include an open-source program, a simple interface, little required background knowledge in statistics for student users, and customizability to minimize cognitive load, align with course learning outcomes, and create desirable difficulty. To address these needs, we used the Shiny package in R to develop a custom statistical analysis application. Our “BioStats” app provides students with scaffolded learning experiences in applied statistics that promote student agency and is customizable by the instructor. It introduces students to the strengths of the R interface, while eliminating the need for complex coding in the R programming language. It also prioritizes practical implementation of statistical analyses over learning statistical theory. To our knowledge, this is the first statistics teaching tool that presents students with basic statistics initially and more complex analyses as they advance, and that includes an option to learn R statistical coding. The BioStats app interface yields a simplified introduction to applied statistics that is adaptable to many biology laboratory courses.

    Primary Image: Singing Junco. A sketch of a junco singing on a pine tree branch, created by the lead author of this paper.

  4. University SET data, with faculty and courses characteristics

    • openicpsr.org
    Updated Sep 12, 2021
    + more versions
    Cite
    Under blind review in refereed journal (2021). University SET data, with faculty and courses characteristics [Dataset]. http://doi.org/10.3886/E149801V1
    Explore at:
    Dataset updated
    Sep 12, 2021
    Authors
    Under blind review in refereed journal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper explores a unique dataset of all the SET ratings provided by students of one university in Poland at the end of the winter semester of the 2020/2021 academic year. The SET questionnaire used by this university is presented in Appendix 1. The dataset is unique for several reasons. It covers all SET surveys filled by students in all fields and levels of study offered by the university. In the period analysed, the university was entirely in the online regime amid the Covid-19 pandemic. While the expected learning outcomes formally have not been changed, the online mode of study could have affected the grading policy and could have implications for some of the studied SET biases. This Covid-19 effect is captured by econometric models and discussed in the paper. The average SET scores were matched with the characteristics of the teacher for degree, seniority, gender, and SET scores in the past six semesters; the course characteristics for time of day, day of the week, course type, course breadth, class duration, and class size; the attributes of the SET survey responses as the percentage of students providing SET feedback; and the grades of the course for the mean, standard deviation, and percentage failed. Data on course grades are also available for the previous six semesters. This rich dataset allows many of the biases reported in the literature to be tested for and new hypotheses to be formulated, as presented in the introduction section. The unit of observation or the single row in the data set is identified by three parameters: teacher unique id (j), course unique id (k) and the question number in the SET questionnaire (n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9} ). It means that for each pair (j,k), we have nine rows, one for each SET survey question, or sometimes less when students did not answer one of the SET questions at all. 
For example, the dependent variable SET_score_avg(j,k,n) for the triplet (j=John Smith, k=Calculus, n=2) is calculated as the average of all Likert-scale answers to question nr 2 in the SET survey distributed to all students that took the Calculus course taught by John Smith. The data set has 8,015 such observations or rows. The full list of variables or columns in the data set included in the analysis is presented in the attached file section. Their description refers to the triplet (teacher id = j, course id = k, question number = n). When the last value of the triplet (n) is dropped, it means that the variable takes the same values for all n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9}.

    Two attachments:
    • Word file with variables description
    • Rdata file with the data set (for R language)

    Appendix 1. The SET questionnaire used for this paper.

    Evaluation survey of the teaching staff of [university name]

    Please, complete the following evaluation form, which aims to assess the lecturer’s performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5- I strongly agree; 4- I agree; 3- Neutral; 2- I don’t agree; 1- I strongly don’t agree.

    Questions (each answered on the scale 1 2 3 4 5):
    1. I learnt a lot during the course.
    2. I think that the knowledge acquired during the course is very useful.
    3. The professor used activities to make the class more engaging.
    4. If it was possible, I would enroll for the course conducted by this lecturer again.
    5. The classes started on time.
    6. The lecturer always used time efficiently.
    7. The lecturer delivered the class content in an understandable and efficient way.
    8. The lecturer was available when we had doubts.
    9. The lecturer treated all students equally regardless of their race, background and ethnicity.
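
    The averaging behind SET_score_avg can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: the tuple layout, the identifiers T01, T02 and C10, the function name set_score_avg, and all answer values are invented for the example. The published Rdata file holds the averaged rows, not raw responses.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw responses: (teacher id j, course id k, question n, Likert answer 1-5).
# All values below are invented for illustration.
responses = [
    ("T01", "C10", 2, 5),
    ("T01", "C10", 2, 4),
    ("T01", "C10", 2, 3),
    ("T01", "C10", 1, 4),
    ("T02", "C10", 2, 2),
]


def set_score_avg(rows):
    """Average the Likert answers for every (j, k, n) triplet."""
    grouped = defaultdict(list)
    for teacher_id, course_id, question, answer in rows:
        grouped[(teacher_id, course_id, question)].append(answer)
    # One output row per triplet, mirroring the unit of observation in the data set.
    return {triplet: mean(answers) for triplet, answers in grouped.items()}


scores = set_score_avg(responses)
print(scores[("T01", "C10", 2)])
```

    Each key of the result corresponds to one row of the data set, so a teacher-course pair with answers to all nine questions would contribute nine keys.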

  5. Data for: Integrating open education practices with data analysis of open...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Jul 27, 2024
    Cite
    Marja Bakermans (2024). Data for: Integrating open education practices with data analysis of open science in an undergraduate course [Dataset]. http://doi.org/10.5061/dryad.37pvmcvst
    Explore at:
    Dataset updated
    Jul 27, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Marja Bakermans
    Description

    The open science movement produces vast quantities of openly published data connected to journal articles, creating an enormous resource for educators to engage students in current topics and analyses. However, educators face challenges using these materials to meet course objectives. I present a case study using open science (published articles and their corresponding datasets) and open educational practices in a capstone course. While engaging in current topics of conservation, students trace connections in the research process, learn statistical analyses, and recreate analyses using the programming language R. I assessed the presence of best practices in open articles and datasets, examined student selection in the open grading policy, surveyed students on their perceived learning gains, and conducted a thematic analysis on student reflections. First, articles and datasets met just over half of the assessed fairness practices, but this increased with the publication date. There was a...

    Article and dataset fairness: To assess the utility of open articles and their datasets as an educational tool in an undergraduate academic setting, I measured the congruence of each pair to a set of best practices and guiding principles. I assessed ten guiding principles and best practices (Table 1), where each category was scored ‘1’ or ‘0’ based on whether it met that criterion, with a total possible score of ten.

    Open grading policies: Students were allowed to specify the percentage weight for each assessment category in the course, including 1) six coding exercises (Exercises), 2) one lead exercise (Lead Exercise), 3) fourteen annotation assignments of readings (Annotations), 4) one final project (Final Project), 5) five discussion board posts and a statement of learning reflection (Discussion), and 6) attendance and participation (Participation). I examined if assessment categories (independent variable) were weighted (dependent variable) differently by students using an analysis of ...

    # Data for: Integrating open education practices with data analysis of open science in an undergraduate course

    Author: Marja H Bakermans Affiliation: Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA 01609 USA ORCID: https://orcid.org/0000-0002-4879-7771 Institutional IRB approval: IRB-24–0314

    Data and file overview

    The full dataset file called OEPandOSdata (.xlsx extension) contains 8 files. Below are descriptions of the name and contents of each file. NA = not applicable or no data available

    1. BestPracticesData.csv
      • Description: Data to assess the adherence of articles and datasets to open science best practices.
      • Column headers and descriptions:
        • Article: articles used in the study, numbered randomly
        • F1: Findable, Data are assigned a unique and persistent doi
        • F2: Findable, Metadata includes an identifier of data
        • F3: Findable, Data are registered in a searchable database
        • A1: ...
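
    The 0/1 scoring described for article and dataset fairness could be totalled per article roughly as follows. This is a hedged sketch, not the author's analysis: the sample rows are invented, only the column names Article, F1, F2 and F3 come from the listing above (the real file has further columns), and skipping "NA" cells is an assumption based on the note that NA means not applicable or no data available.

```python
import csv
import io

# Invented excerpt shaped like BestPracticesData.csv; only the column
# names Article, F1, F2, F3 are taken from the dataset description.
SAMPLE = """Article,F1,F2,F3
1,1,1,0
2,1,NA,0
3,1,1,1
"""


def fairness_scores(csv_text):
    """Sum the 0/1 best-practice columns for each article."""
    reader = csv.DictReader(io.StringIO(csv_text))
    scores = {}
    for row in reader:
        article = row.pop("Article")
        # Assumption: cells marked NA are skipped rather than counted as 0 or 1.
        scores[article] = sum(int(value) for value in row.values() if value != "NA")
    return scores


article_scores = fairness_scores(SAMPLE)
print(article_scores)
```

    With all ten criteria present as columns, the same sum would give the total possible score of ten mentioned in the description.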
  6. Dataset of books called A first course in statistical programming with R

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called A first course in statistical programming with R [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=A+first+course+in+statistical+programming+with+R
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 3 rows and is filtered where the book is A first course in statistical programming with R. It features 7 columns including author, publication date, language, and book publisher.

  7. Manipulating data using R

    • test.researchdata.tuwien.at
    bin, pdf, txt
    Updated Nov 27, 2024
    Cite
    Vseslav Levchenko (2024). Manipulating data using R [Dataset]. http://doi.org/10.70124/5rrjk-ey181
    Explore at:
    bin, pdf, txt
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    TU Wien
    Authors
    Vseslav Levchenko
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 30, 2023
    Description

    Data created during Computer Statistics assignment

    Context and methodology

    • This is used for the project in the context of the "Introduction to Research Data Management" course, 2024 winter semester. Originally it was made for a homework assignment in the "Computer Statistics" course, 2023 winter semester.
    • The dataset consists of the following: code (and comment) written in the R markdown language that is to be compiled and executed in order to generate the 2 datasets created in the project; .pdf file generated from compiling and executing the aforementioned R code using RStudio; .txt file generated as part of one of the exercises in the assignment, also by compiling and executing the R code.
    • The code was written by Vseslav Levchenko in R, using RStudio.

    Technical details

    • The code was written in RStudio, which is recommended but not strictly necessary for working with R; installing the R language itself is required. For the other files, standard software such as Microsoft Excel and any PDF reader is all that is needed.
    • The code also contains necessary comments, and a .pdf file with the assignment's tasks is provided separately.
  8. Code from: Beyond the classroom: Alicia’s multivariate journey

    • search.dataone.org
    Updated Nov 27, 2025
    Cite
    Allison Theobold (2025). Code from: Beyond the classroom: Alicia’s multivariate journey [Dataset]. http://doi.org/10.5061/dryad.c59zw3rg6
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Allison Theobold
    Description

    The importance of data science skills for modern scientific research cannot be overstated. Although policy documents increasingly recommend what skills should be included in undergraduate statistics and data science curricula, little is known about how students actually develop and apply these skills. This paper addresses this gap through an in-depth case study tracing one student’s learning progressions throughout her master’s program. Using a qualitative method to analyze student code, which has seen little use in statistics education research, I examined how Alicia transferred the data science skills from her applied statistics course into authentic research settings. The analysis shows that, while Alicia successfully navigated new challenges, she encountered persistent hurdles when extending bivariate techniques into multivariate contexts, particularly with visualizations and summary statistics. These findings highlight the obs...

    R Script files submitted by Alicia (pseudonym) over the course of the study. The files are named according to when they were submitted:

    December 2018

    R Script #1

    April 2019

    R Script #1 (revised) R Script #2

    September 2019

    R Script #1 (revised) R Script #2 (revised)

    Qualitative Data Analysis Files (Rich text files)

    December 2018 Script #1 April 2019 Script #1 April 2019 Script #2 September 2019 Script #1 September 2019 Script #2

    Quantitative Data Analysis Files

    r-code-themes.csv

    Comma-separated values file with separate sheets for each R script. Each sheet contains the qualitative code assigned to each line of code and whether the code contained errors.

    # Code from: Beyond the classroom: Alicia’s multivariate journey

    https://doi.org/10.5061/dryad.c59zw3rg6

    This repository contains the R script files submitted by Alicia (pseudonym) throughout this study, files associated with the qualitative analysis of the code, and files associated with visualizations of the qualitative themes included in Alicia's code.

    Description of the data and file structure

    As this is a qualitative analysis, the usage of these "data" files differs from a typical quantitative analysis.

    • The .R Files contain the scripts generated by Alicia at each time point (December 2018, April 2019, September 2019)
    • The -codes.rft Files contain the (qualitative) process codes for each R script
    • The r-code-themes.xlsx file contains information on every script and the qualitative code assigned to each line of code.

    Code/Software

    While the "data" for this analysis are R scripts, these scripts cannot be execu...,

  9. Geospatial Deep Learning Seminar Online Course

    • ckan.americaview.org
    Updated Nov 2, 2021
    Cite
    ckan.americaview.org (2021). Geospatial Deep Learning Seminar Online Course [Dataset]. https://ckan.americaview.org/dataset/geospatial-deep-learning-seminar-online-course
    Explore at:
    Dataset updated
    Nov 2, 2021
    Dataset provided by
    CKAN: https://ckan.org/
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This seminar is an applied study of deep learning methods for extracting information from geospatial data, such as aerial imagery, multispectral imagery, digital terrain data, and other digital cartographic representations. We first provide an introduction and conceptualization of artificial neural networks (ANNs). Next, we explore appropriate loss and assessment metrics for different use cases followed by the tensor data model, which is central to applying deep learning methods. Convolutional neural networks (CNNs) are then conceptualized with scene classification use cases. Lastly, we explore semantic segmentation, object detection, and instance segmentation. The primary focus of this course is semantic segmentation for pixel-level classification. The associated GitHub repo provides a series of applied examples. We hope to continue to add examples as methods and technologies further develop. These examples make use of a variety of datasets (e.g., SAT-6, topoDL, Inria, LandCover.ai, vfillDL, and wvlcDL). Please see the repo for links to the data and associated papers. All examples have associated videos that walk through the process, which are also linked to the repo. A variety of deep learning architectures are explored including UNet, UNet++, DeepLabv3+, and Mask R-CNN. Currently, two examples use ArcGIS Pro and require no coding. The remaining five examples require coding and make use of PyTorch, Python, and R within the RStudio IDE. It is assumed that you have prior knowledge of coding in the Python and R environments. If you do not have experience coding, please take a look at our Open-Source GIScience and Open-Source Spatial Analytics (R) courses, which explore coding in Python and R, respectively.

    After completing this seminar you will be able to:
    • explain how ANNs work, including weights, bias, activation, and optimization.
    • describe and explain different loss and assessment metrics and determine appropriate use cases.
    • use the tensor data model to represent data as input for deep learning.
    • explain how CNNs work, including convolutional operations/layers, kernel size, stride, padding, max pooling, activation, and batch normalization.
    • use PyTorch, Python, and R to prepare data, produce and assess scene classification models, and infer to new data.
    • explain common semantic segmentation architectures, how these methods allow for pixel-level classification, and how they are different from traditional CNNs.
    • use PyTorch, Python, and R (or ArcGIS Pro) to prepare data, produce and assess semantic segmentation models, and infer to new data.

  10. ALT2040 Data Science Program Requirement Scraping

    • figshare.com
    application/csv
    Updated May 5, 2025
    Cite
    Seth Akins; Daniel Krasnov; Irene Vrbik (2025). ALT2040 Data Science Program Requirement Scraping [Dataset]. http://doi.org/10.6084/m9.figshare.26058871.v2
    Explore at:
    application/csv
    Dataset updated
    May 5, 2025
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Seth Akins; Daniel Krasnov; Irene Vrbik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These CSVs were produced by R scripts that scrape the program requirements and related subject course calendars for the Data Science major or honours programs at various universities throughout Canada and the United States. The code to generate the data is available at https://github.com/Hedgemon4/course-scraping. There are two kinds of CSVs: course calendars and program requirements.

    Course Calendars:
    • These files contain the course calendars for either a specific subject (such as computer science) or for any courses required by the major (one CSV for all required courses).
    • The files are all laid out similarly, with the same column names.
    • Depending on the information available from the university, some will have more or fewer columns.
    • Any columns with a * are found in every course CSV file.

    Column Names:
    • Course Code (*)
    • Course Name (*)
    • Course Description (*)
    • Credit Amount (*): In all calendars, this has been standardized to the same scale that UBC uses
    • Antirequisite
    • Corequisite
    • Prerequisite
    • Equivalency: Courses which are equivalent to each other
    • Recommended/Preparation: Contains any courses which were recommended to take before the course, but are not a mandatory requirement
    • Hours: Components of course (lecture, lab, tutorial, seminar, etc.)
    • Lab, Lecture, Tutorial, Seminar, etc.: Logical vectors for each course component which are true if the course contains that component
    • Breadth Requirement, Distribution Requirement, Quantitative Requirement, etc.: Vectors which indicate if the course fulfills various other requirements imposed by the institution for graduation (not used by all institutions and names vary)
    • Delivery Format
    • Note/Other Information: Any other notes or information about the course not covered by the vectors above

    Program Requirements:
    • These files contain the requirements for the specified program at the university.
    • Depending on how the program requirements are displayed, some have the requirements for the whole degree, and others for the major or program only.
    • All the columns listed below are found in all the files, but are organized slightly differently depending on how the institution's academic calendar was formatted.

    Column Names:
    • Requirement Category: Contains the name for the category or subcategory (Example: Category G1A)
      • Starts with a letter, which differs depending on whether the institution lists the year with the requirements: G = general/no year listed, F = first year, S = second year, U = upper year, C = Co-Op course
      • Followed by a number, which indicates the requirement order
      • If followed by a letter, it is a subcategory of the overall category, so the category might be F4, and the subcategory F4A. Note that if a category required more subcategories than there are letters, a three-digit system is used (001 to 999) instead to indicate the subcategory number (Example: Category G1037)
    • Category Description: Lists the requirements for that category or subcategory
      • Depending on how the academic calendar was formatted, and if it is a category or subcategory, it will either have a category requirement description (one of, two of, all of, 7+ credits, etc.) and be potentially followed by a list of courses, or it will just have a list of courses
      • Examples: (All of)/(MATH 54 or STAT 89A or EECS 16A & EECS 16B or PHYSICS 89)/(ASTRON 128)/(Students complete all of: BUS 343, BUS 360W, BUS 439, BUS 445, BUS 345, BUS 336)
    • Category Minimum Credit Amount: Lists the minimum number of credits from the category or subcategory to fulfill this requirement, and is standardized to the same scale that UBC uses
    • Category Maximum Credit Amount: Lists the maximum number of credits from the category or subcategory to fulfill this requirement, and is standardized to the same scale that UBC uses
    • Core Requirement: Indicates if the course or requirement is needed to fulfill the program requirements (no alternate options)
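
    The Requirement Category naming scheme described above can be decoded with a small parser. This is a sketch under assumptions, not code from the linked repository: the names Category and parse_category are hypothetical, and the rule that a code with more than three digits and no trailing letter carries a three-digit subcategory in its last three digits is inferred from the single example G1037.

```python
import re
from typing import NamedTuple, Optional

# Year prefixes as listed in the dataset description.
YEAR_LABELS = {
    "G": "general/no year listed",
    "F": "first year",
    "S": "second year",
    "U": "upper year",
    "C": "Co-Op course",
}


class Category(NamedTuple):
    year: str
    order: int
    subcategory: Optional[str]  # letter, three-digit string, or None


_PATTERN = re.compile(r"^([GFSUC])(\d+)([A-Z])?$")


def parse_category(code):
    """Split a code such as 'G1A', 'F4' or 'G1037' into its parts."""
    match = _PATTERN.match(code)
    if match is None:
        raise ValueError(f"unrecognized requirement category: {code!r}")
    prefix, digits, letter = match.groups()
    if letter is None and len(digits) > 3:
        # Assumption: the trailing three digits are the subcategory number.
        return Category(YEAR_LABELS[prefix], int(digits[:-3]), digits[-3:])
    return Category(YEAR_LABELS[prefix], int(digits), letter)


for example in ("G1A", "F4", "G1037"):
    print(example, parse_category(example))
```

    A code such as G123 stays ambiguous under this rule and is read here as order 123 with no subcategory; the real files may need a different convention.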

  11. Calling Bull in an Age of Big Data with R

    • qubeshub.org
    Updated Jul 22, 2025
    Cite
    Carrie Diaz Eaton (2025). Calling Bull in an Age of Big Data with R [Dataset]. http://doi.org/10.25334/1NH1-J694
    Explore at:
    Dataset updated
    Jul 22, 2025
    Dataset provided by
    QUBES
    Authors
    Carrie Diaz Eaton
    Description

    Use the calling bull course to introduce students to data, ethics, visualization, and R.

  12. Data from: Designing data science workshops for data-intensive environmental...

    • data.niaid.nih.gov
    • datasetcatalog.nlm.nih.gov
    • +1more
    zip
    Updated Dec 8, 2020
    Cite
    Allison Theobold; Stacey Hancock; Sara Mannheimer (2020). Designing data science workshops for data-intensive environmental science research [Dataset]. http://doi.org/10.5061/dryad.7wm37pvp7
    Explore at:
    zip
    Dataset updated
    Dec 8, 2020
    Dataset provided by
    Montana State University
    California State Polytechnic University
    Authors
    Allison Theobold; Stacey Hancock; Sara Mannheimer
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Over the last 20 years, statistics preparation has become vital for a broad range of scientific fields, and statistics coursework has been readily incorporated into undergraduate and graduate programs. However, a gap remains between the computational skills taught in statistics service courses and those required for the use of statistics in scientific research. Ten years after the publication of "Computing in the Statistics Curriculum," the nature of statistics continues to change, and computing skills are more necessary than ever for modern scientific researchers. In this paper, we describe research on the design and implementation of a suite of data science workshops for environmental science graduate students, providing students with the skills necessary to retrieve, view, wrangle, visualize, and analyze their data using reproducible tools. These workshops help to bridge the gap between the computing skills necessary for scientific research and the computing skills with which students leave their statistics service courses. Moreover, though targeted to environmental science graduate students, these workshops are open to the larger academic community. As such, they promote the continued learning of the computational tools necessary for working with data, and provide resources for incorporating data science into the classroom.

    Methods: Surveys from Carpentries-style workshops, the results of which are presented in the accompanying manuscript.

    Pre- and post-workshop surveys for each workshop (Introduction to R, Intermediate R, Data Wrangling in R, Data Visualization in R) were collected via Google Form.

    The surveys administered for the fall 2018, spring 2019 academic year are included as pre_workshop_survey and post_workshop_assessment PDF files. 
    The raw versions of these data are included in the Excel files ending in survey_raw or assessment_raw.
    
      The data files whose name includes survey contain raw data from pre-workshop surveys and the data files whose name includes assessment contain raw data from the post-workshop assessment survey.
    
    
    The annotated RMarkdown files used to clean the pre-workshop surveys and post-workshop assessments are included as workshop_survey_cleaning and workshop_assessment_cleaning, respectively. 
    The cleaned pre- and post-workshop survey data are included in the Excel files ending in clean. 
    The summaries and visualizations presented in the manuscript are included in the analysis annotated RMarkdown file.
    
  13. Course Materials for Environmental Data Science in R: Introduction to Data...

    • portal.edirepository.org
    zip
    Updated Jul 21, 2025
    Cite
    Sparkle Malone (2025). Course Materials for Environmental Data Science in R: Introduction to Data Integration and Machine Learning (ENV 730) [Dataset]. http://doi.org/10.6073/pasta/5100e60582808c66095e767be806109f
    Explore at:
    zip (779253756 bytes)
    Dataset updated
    Jul 21, 2025
    Dataset provided by
    EDI
    Authors
    Sparkle Malone
    Time period covered
    2001 - 2024
    Area covered
    Earth
    Description

    In today's world, understanding environmental data and making informed decisions based on it is crucial for addressing complex environmental challenges. Yale School of the Environment's Environmental Data Science in R: Introduction to Data Integration and Machine Learning (ENV 730) course serves as an introduction to the integration of environmental data using R programming language, coupled with machine learning techniques. This dataset contains a zip file with all the data files used in this course, along with a README that has the metadata for those files.

  14. o

    Data Literacy Training Summer 2020

    • openicpsr.org
    Updated May 5, 2023
    + more versions
    Cite
    Janice Dias; Melody Goodman; Jinal Shah (2023). Data Literacy Training Summer 2020 [Dataset]. http://doi.org/10.3886/E191001V2
    Explore at:
    Dataset updated
    May 5, 2023
    Dataset provided by
    Grassroot Community Foundation
    New York University
    Authors
    Janice Dias; Melody Goodman; Jinal Shah
    Time period covered
    Jun 1, 2020 - Aug 31, 2020
    Description

    Because of the COVID-19 pandemic, presentation of public health data to the public has increased, while much of the public lacks the knowledge to understand what these statistics mean or why some populations are at higher risk of adverse outcomes. Recognizing that those most impacted by COVID-19 are from vulnerable populations, we developed a training program called "The quantitative public health data literacy training program", aimed at increasing the data literacy of high school and college students from such vulnerable groups. It introduces the basics of public health, data literacy, statistical software, descriptive statistics, and data ethics. The instructors taught eight synchronous sessions (five were also offered asynchronously), consisting of lectures and experiential group exercises. The program recruited, engaged, and retained a large cohort (n > 100) of underrepresented students in biostatistics and data science for a virtual data literacy training. The course provides a framework for developing and implementing similar public health training programs designed to increase diversity in the field. This project provides de-identified data for the program's baseline and final assessments and program feedback, as well as grades for certain portions of the program. The "Data-files" folder contains all the data collected during the program. Along with the de-identified data, R code is provided to analyze the data as presented in tables in potential publications.

  15. Open-Source Spatial Analytics (R) - Datasets - AmericaView - CKAN

    • ckan.americaview.org
    Updated Sep 10, 2022
    Cite
    ckan.americaview.org (2022). Open-Source Spatial Analytics (R) - Datasets - AmericaView - CKAN [Dataset]. https://ckan.americaview.org/dataset/open-source-spatial-analytics-r
    Explore at:
    Dataset updated
    Sep 10, 2022
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this course, you will learn to work within the free and open-source R environment with a specific focus on working with and analyzing geospatial data. We will cover a wide variety of data and spatial data analytics topics, and you will learn how to code in R along the way. The Introduction module provides more background info about the course and course set up. This course is designed for someone with some prior GIS knowledge. For example, you should know the basics of working with maps, map projections, and vector and raster data. You should be able to perform common spatial analysis tasks and make map layouts. If you do not have a GIS background, we would recommend checking out the West Virginia View GIScience class. We do not assume that you have any prior experience with R or with coding. So, don't worry if you haven't developed these skill sets yet. That is a major goal in this course. Background material will be provided using code examples, videos, and presentations. We have provided assignments to offer hands-on learning opportunities. Data links for the lecture modules are provided within each module while data for the assignments are linked to the assignment buttons below. Please see the sequencing document for our suggested order in which to work through the material.

    After completing this course you will be able to:

    • Prepare, manipulate, query, and generally work with data in R.
    • Perform data summarization, comparisons, and statistical tests.
    • Create quality graphs, map layouts, and interactive web maps to visualize data and findings.
    • Present your research, methods, results, and code as web pages to foster reproducible research.
    • Work with spatial data in R.
    • Analyze vector and raster geospatial data to answer a question with a spatial component.
    • Make spatial models and predictions using regression and machine learning.
    • Code in the R language at an intermediate level.

  16. US Dept of Education: College Scorecard

    • kaggle.com
    zip
    Updated Nov 9, 2017
    Cite
    Kaggle (2017). US Dept of Education: College Scorecard [Dataset]. https://www.kaggle.com/forums/f/810/us-dept-of-education-college-scorecard
    Explore at:
    Available download formats: zip (589617678 bytes)
    Dataset updated
    Nov 9, 2017
    Dataset authored and provided by
    Kaggle
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    It's no secret that US university students often graduate with debt repayment obligations that far outstrip their employment and income prospects. While it's understood that students from elite colleges tend to earn more than graduates from less prestigious universities, the finer relationships between future income and university attendance are quite murky. In an effort to make educational investments less speculative, the US Department of Education has matched information from the student financial aid system with federal tax returns to create the College Scorecard dataset.

    Kaggle is hosting the College Scorecard dataset in order to facilitate shared learning and collaboration. Insights from this dataset can help make the returns on higher education more transparent and, in turn, more fair.

    Data Description

    Here's a script showing an exploratory overview of some of the data.

    college-scorecard-release-*.zip contains a compressed version of the same data available through Kaggle Scripts.

    It consists of three components:

    • All the raw data files released in version 1.40 of the college scorecard data
    • Scorecard.csv, a single CSV file with all years' data combined. In it, we've converted categorical variables represented by integer keys in the original data to their labels and added a Year column
    • database.sqlite, a SQLite database containing a single Scorecard table that contains the same information as Scorecard.csv
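
    As a rough illustration of how the SQLite component might be queried, the sketch below builds a tiny in-memory stand-in for database.sqlite and counts rows per year. Only the table name Scorecard and the Year column come from the description above; the INSTNM column and all rows are hypothetical placeholders.

```python
import sqlite3

# Hypothetical miniature stand-in for database.sqlite. The description only
# confirms a single Scorecard table with a Year column; INSTNM and the rows
# below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scorecard (INSTNM TEXT, Year INTEGER)")
conn.executemany(
    "INSERT INTO Scorecard VALUES (?, ?)",
    [("College A", 2012), ("College B", 2012), ("College A", 2013)],
)

# Count rows per year, a typical first look at a combined multi-year table.
rows_per_year = conn.execute(
    "SELECT Year, COUNT(*) FROM Scorecard GROUP BY Year ORDER BY Year"
).fetchall()
conn.close()

print(rows_per_year)
```

    With the real file, the same query would presumably run after replacing ":memory:" with the path to database.sqlite and dropping the CREATE and INSERT statements.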

    New to data exploration in R? Take the free, interactive DataCamp course, "Data Exploration With Kaggle Scripts," to learn the basics of visualizing data with ggplot. You'll also create your first Kaggle Scripts along the way.

  17. d

    Hydroinformatics: Intro to Hydrologic Analysis in R (Bookdown and Code)

    • dataone.org
    • hydroshare.org
    • +2more
    Updated Dec 5, 2021
    Cite
    John P Gannon (2021). Hydroinformatics: Intro to Hydrologic Analysis in R (Bookdown and Code) [Dataset]. https://dataone.org/datasets/sha256%3A795062010105f8ea75e4004b225dab4178626535bb3e562151b8076e83f88a8b
    Explore at:
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Hydroshare
    Authors
    John P Gannon
    Description

    The linked bookdown contains the notes and most exercises for a course on data analysis techniques in hydrology using the programming language R. The material will be updated each time the course is taught. If new topics are added, the topics they replace will remain, in case they are useful to others.

    I hope these materials can be a resource to those teaching themselves R for hydrologic analysis and/or for instructors who may want to use a lesson or two or the entire course. At the top of each chapter there is a link to a github repository. In each repository is the code that produces each chapter and a version where the code chunks within it are blank. These repositories are all template repositories, so you can easily copy them to your own github space by clicking Use This Template on the repo page.

    In my class, I work through each document, live coding with students following along. Typically I ask students to watch as I code and explain the chunk and then replicate it on their computer. Depending on the lesson, I will ask students to try some of the chunks before I show them the code as an in-class activity. Some chunks are explicitly designed for this purpose and are typically labeled a “challenge.”

    Chapters called ACTIVITY are either homework or class-period-long in-class activities. The code chunks in these are therefore blank. If you would like a key for any of these, please just send me an email.

    If you have questions, suggestions, or would like activity answer keys, etc. please email me at jpgannon at vt.edu

    Finally, if you use this resource, please fill out the survey on the first page of the bookdown (https://forms.gle/6Zcntzvr1wZZUh6S7). This will help me get an idea of how people are using this resource, how I might improve it, and whether or not I should continue to update it.

  18. Data from: THE RELEVANCY OF MASSIVE HEALTH EDUCATION IN THE BRAZILIAN PRISON...

    • zenodo.org
    csv, pdf
    Updated Jul 16, 2024
    Cite
    Janaína L. R. da S. Valentim; Sara Dias-Trindade; Eloiza da S. G. Oliveira; José A. M. Moreira; Felipe Fernandes; Manoel Honorio Romão; Philippi S. G. de Morais; Alexandre R. Caitano; Aline P. Dias; Carlos A. P. Oliveira; Karilany D. Coutinho; Ricardo B. Ceccim; Ricardo A. de M. Valentim (2024). THE RELEVANCY OF MASSIVE HEALTH EDUCATION IN THE BRAZILIAN PRISON SYSTEM: THE COURSE "HEALTH CARE FOR PEOPLE DEPRIVED OF FREEDOM" AND ITS IMPACTS [Dataset]. http://doi.org/10.5281/zenodo.6499752
    Explore at:
    Available download formats: csv, pdf
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Janaína L. R. da S. Valentim; Sara Dias-Trindade; Eloiza da S. G. Oliveira; José A. M. Moreira; Felipe Fernandes; Manoel Honorio Romão; Philippi S. G. de Morais; Alexandre R. Caitano; Aline P. Dias; Carlos A. P. Oliveira; Karilany D. Coutinho; Ricardo B. Ceccim; Ricardo A. de M. Valentim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brazil
    Description

    Dataset name: asppl_dataset_v2.csv

    Version: 2.0

    Dataset period: 06/07/2018 - 01/14/2022

    Dataset Characteristics: Multivalued

    Number of Instances: 8118

    Number of Attributes: 9

    Missing Values: Yes

    Area(s): Health and education

    Sources:

    • Virtual Learning Environment of the Brazilian Health System (AVASUS) (Brasil, 2022a);

    • Brazilian Occupational Classification (CBO) (Brasil, 2022b);

    • National Registry of Health Establishments (CNES) (Brasil, 2022c);

    • Brazilian Institute of Geography and Statistics (IBGE) (Brasil, 2022e).

    Description: The data contained in the asppl_dataset_v2.csv dataset (see Table 1) originates from participants of the technology-based educational course “Health Care for People Deprived of Freedom.” The course is available on the AVASUS (Brasil, 2022a). This dataset provides elementary data for analyzing the course’s impact and reach and the profile of its participants. In addition, it brings an update of the data presented in work by Valentim et al. (2021).

    Table 1: Description of AVASUS dataset features.

    | Attribute | Description | Datatype | Value |
    | --- | --- | --- | --- |
    | gender | Gender of the course participant. | Categorical | Feminino / Masculino / Não Informado (in English: Female, Male, or Not informed). |
    | course_progress | Percentage of completion of the course. | Numerical | Range from 0 to 100. |
    | course_evaluation | A score given to the course by the participant. | Numerical | 0, 1, 2, 3, 4, 5 or NaN. |
    | evaluation_commentary | Comment made by the participant about the course. | Categorical | Free text or NaN. |
    | region | Brazilian region in which the participant resides. | Categorical | Brazilian region according to IBGE: Norte, Nordeste, Centro-Oeste, Sudeste or Sul (in English: North, Northeast, Midwest, Southeast or South). |
    | CNES | The CNES code refers to the health establishment where the participant works. | Numerical | CNES code or NaN. |
    | health_care_level | Identification of the health care network level for which the course participant works. | Categorical | “ATENCAO PRIMARIA”, “MEDIA COMPLEXIDADE”, “ALTA COMPLEXIDADE”, and their possible combinations (in English: "PRIMARY HEALTH CARE", "SECONDARY HEALTH CARE" and "TERTIARY HEALTH CARE"). |
    | year_enrollment | Year in which the course participant registered. | Numerical | Year (YYYY). |
    | CBO | Participant occupation. | Categorical | Text coded according to the Brazilian Classification of Occupations or “Indivíduo sem afiliação formal.” (in English: “Individual without formal affiliation.”) |
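
    To illustrate how the attributes in Table 1 might be used, the following sketch averages course_evaluation by region while skipping missing scores. The column names come from Table 1; the sample rows are invented, and treating missing scores as the literal string "NaN" is an assumption about how asppl_dataset_v2.csv encodes them.

```python
import csv
import io
import math
from collections import defaultdict

# Invented sample rows; only the column names are taken from Table 1.
sample = io.StringIO(
    "region,course_progress,course_evaluation\n"
    "Norte,100,5\n"
    "Norte,40,NaN\n"
    "Sul,80,4\n"
    "Sul,100,3\n"
)

scores = defaultdict(list)
for row in csv.DictReader(sample):
    value = float(row["course_evaluation"])  # "NaN" parses to float nan
    if not math.isnan(value):  # skip participants who gave no score
        scores[row["region"]].append(value)

# Mean evaluation per region over the non-missing scores only.
mean_evaluation = {region: sum(v) / len(v) for region, v in scores.items()}
print(mean_evaluation)
```

    If the real file leaves missing scores as empty fields instead, the float conversion would need a guard for empty strings.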

    Dataset name: prison_syphilis_and_population_brazil.csv

    Dataset period: 2017 - 2020

    Dataset Characteristics: Multivalued

    Number of Instances: 6

    Number of Attributes: 13

    Missing Values: No

    Source:

    • National Penitentiary Department (DEPEN) (Brasil, 2022d);

    Description: The data contained in the prison_syphilis_and_population_brazil.csv dataset (see Table 2) originate from the National Penitentiary Department Information System (SISDEPEN) (Brasil, 2022d). This dataset provides data on the population and prevalence of syphilis in the Brazilian prison system. In addition, it brings a rate that represents the normalized data for purposes of comparison between the populations of each region and Brazil.

    Table 2: Description of DEPEN dataset Features.

    | Attribute | Description | Datatype | Value |
    | --- | --- | --- | --- |
    | Region | Brazilian region in which the participant resides. In addition, the sum of the regions, which refers to Brazil. | Categorical | Brazil and Brazilian region according to IBGE: North, Northeast, Midwest, Southeast or South. |
    | syphilis_2017 | Number of syphilis cases in the prison system in 2017. | Numerical | Number of syphilis cases. |
    | syphilis_rate_2017 | Normalized rate of syphilis cases in 2017. | Numerical | Syphilis case rate. |
    | syphilis_2018 | Number of syphilis cases in the prison system in 2018. | Numerical | Number of syphilis cases. |
    | syphilis_rate_2018 | Normalized rate of syphilis cases in 2018. | Numerical | Syphilis case rate. |
    | syphilis_2019 | Number of syphilis cases in the prison system in 2019. | Numerical | Number of syphilis cases. |
    | syphilis_rate_2019 | Normalized rate of syphilis cases in 2019. | Numerical | Syphilis case rate. |
    | syphilis_2020 | Number of syphilis cases in the prison system in 2020. | Numerical | Number of syphilis cases. |
    | syphilis_rate_2020 | Normalized rate of syphilis cases in 2020. | Numerical | Syphilis case rate. |
    | pop_2017 | Prison population in 2017. | Numerical | Population number. |
    | pop_2018 | Prison population in 2018. | Numerical | Population number. |
    | pop_2019 | Prison population in 2019. | Numerical | Population number. |
    | pop_2020 | Prison population in 2020. | Numerical | Population number. |
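
    The description does not state how the normalized syphilis rate is computed. One plausible reading, shown here purely as an assumption, is cases divided by prison population and scaled to a fixed base. The function name, the per-1,000 base, and the example numbers are all hypothetical.

```python
def case_rate(cases: int, population: int, per: int = 1000) -> float:
    """Cases per `per` people. The scaling base is an assumption, not taken
    from the dataset documentation."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * per / population


# Invented figures, standing in for e.g. syphilis_2017 and pop_2017.
rate = case_rate(cases=50, population=20000)
print(rate)
```

    Normalizing this way is what makes a small region comparable with Brazil as a whole, which is the purpose the description gives for the rate columns.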

    Dataset name: students_cumulative_sum.csv

    Dataset period: 2018 - 2020

    Dataset Characteristics: Multivalued

    Number of Instances: 6

    Number of Attributes: 7

    Missing Values: No

    Source:

    • Virtual Learning Environment of the Brazilian Health System (AVASUS) (Brasil, 2022a);

    • Brazilian Institute of Geography and Statistics (IBGE) (Brasil, 2022e).

    Description: The data contained in the students_cumulative_sum.csv dataset (see Table 3) originate mainly from AVASUS (Brasil, 2022a). This dataset provides data on the number of students by region and year. In addition, it brings a rate that represents the normalized data for purposes of comparison between the populations of each region and Brazil. We used population data estimated by the IBGE (Brasil, 2022e) to calculate the rate.

    Table 3: Description of Students dataset Features.

  19. Data for tutotials from the bigsnpr extended documentation & my statistical...

    • figshare.com
    application/gzip
    Updated Jun 10, 2025
    Cite
    Florian Privé (2025). Data for tutotials from the bigsnpr extended documentation & my statistical genetics course in R [Dataset]. http://doi.org/10.6084/m9.figshare.20452377.v8
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jun 10, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    figshare
    Authors
    Florian Privé
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data for tutorial at https://privefl.github.io/bigsnpr-extdoc/ and course at https://privefl.github.io/statgen-course/. This data contains a zip with PLINK .bed/.bim/.fam files and a phenotype file. This was previously available at https://www.mtholyoke.edu/courses/afoulkes/Data/statsTeachR/. Described in https://doi.org/10.1002/sim.6605. Also a subset of the data from https://doi.org/10.6084/m9.figshare.16858534 to be used with the tutorial data above. Also GWAS summary statistics for testosterone levels in females and 1000 genomes European data, subsetted around two loci. And GWAS summary statistics for CAD computed from the UK Biobank.

  20. w

    Showing Life Opportunities 2020-2021, Data from Experiment 3: Coastal...

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1more
    Updated Jan 8, 2024
    Cite
    Igor Asanov (2024). Showing Life Opportunities 2020-2021, Data from Experiment 3: Coastal Educational Regime (Régimen Costa) - Ecuador [Dataset]. https://microdata.worldbank.org/index.php/catalog/6111
    Explore at:
    Dataset updated
    Jan 8, 2024
    Dataset provided by
    Francisco Flores
    Guido Buenstorf
    Igor Asanov
    Thomas Astebro
    Bruno Crepon
    Mona Mensmann
    David McKenzie
    Mathis Schulte
    Time period covered
    2020
    Area covered
    Ecuador
    Description

    Abstract

    Opportunity-focused, high-growth entrepreneurship and science-led innovation are crucial for continued economic growth and productivity. Working in these fields offers the opportunity for rewarding and high-paying careers. However, the majority of youth in developing countries do not consider either as job options, affecting their choices of what to study. Youth may not select these educational and career paths due to lack of knowledge, lack of appropriate skills, and lack of role models. We provide a scalable approach to overcoming these constraints through an online education course for secondary school students that covers entrepreneurial soft skills, scientific methods, and interviews with role models.

    The study comprises three experimental trials conducted before and during the COVID-19 pandemic in different regions of Ecuador. This catalog entry includes data from Experiment 3: Coastal Educational Regime (Régimen Costa) 2020/2021. The data from the other two experiments are also available in the catalog.

    Experiment 3: Coastal Educational Regime (Régimen Costa) 2020/2021

    A randomized experiment conducted in high schools in Ecuador as a rapid response to the hurdles of COVID-19 for schools in the Coastal Educational Regime (Régimen Costa); students finished the program in December 2020. The intervention is an online education course that covers entrepreneurial soft skills, scientific methods, and interviews with role models. Students took this course at home during the COVID-19 pandemic under teachers’ supervision. We work mostly with 14- to 22-year-old students (16,441 students) in 598 schools assigned to the program. We randomly assign schools either to treatment (receiving the entrepreneurship courses online) or to placebo-control (receiving a placebo treatment of online courses from standard curricula). We also cross-randomize the role models and evaluate a set of nimble interventions to increase take-up. The details of the intervention can be found in the AEA registry: Asanov, Igor and David McKenzie. 2021. Scaling up virtual learning of online learning in high schools. AEA RCT Registry. March 23. Merged datasets from the baseline, midline, and endline surveys for each experiment, administered through the online learning platform in school during normal educational hours before the COVID-19 pandemic or at the student’s home during the pandemic, are documented here. Detailed information about the questionnaire and each item can be found in the codebooks (Baseline 1, Baseline 2, Midline, Endline 1, Endline 2) for the corresponding experiments.
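
    The school-level assignment described above can be sketched as follows. The count of 598 schools comes from the description; the equal split between arms, the fixed seed, the function name, and the school identifiers are assumptions made for illustration, not details of the registered design.

```python
import random


def assign_schools(school_ids, seed=0):
    """Randomly split schools into two arms of (near-)equal size.

    Sketch only: the equal split and the seed are assumptions.
    """
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    shuffled = list(school_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    assignment = {s: "treatment" for s in shuffled[:half]}
    assignment.update({s: "placebo-control" for s in shuffled[half:]})
    return assignment


# Hypothetical identifiers for the 598 schools mentioned in the description.
schools = [f"school_{i:03d}" for i in range(598)]
arms = assign_schools(schools)
n_treatment = sum(1 for arm in arms.values() if arm == "treatment")
print(n_treatment, len(arms) - n_treatment)
```

    Randomizing whole schools rather than individual students means every student in a school receives the same course, so the school is the unit to account for when analyzing outcomes.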

    Geographic coverage

    Experiment 3: Coastal Educational Regime (Régimen Costa) 2020/2021. We cover students in the last year of K12 schooling with a technical specialization (Bachillerato técnico) who study in the Coastal Educational Regime (Régimen Costa) 2020/2021, were supposed to finish their school education in March 2021, and were able to register on the online platform. The schools in highlands educational regime covered in this experiment are scattered over the following educational zones: 1, 2, 3, 4, 5, 6, 7, 8, 9.
    Taken together, in Experiments 2 and 3 we offered the program across all of Ecuador to schools that have a technical specialization track.

    Analysis unit

    Student

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    All students in selected schools who were present in classes filled out the baseline questionnaire

    Mode of data collection

    Internet [int]

    Research instrument

    Questionnaires. We administer three main sets of questionnaires.

    A. Internet (online-based survey)

    The survey consists of a multi-topic questionnaire administered to the students through the online learning platform in school during normal educational hours before the COVID-19 pandemic or at home during the COVID-19 pandemic. We collect the following information:

    1. Subject-specific knowledge tests. Spanish, English, Statistics, Personal Initiative (only endline), Negotiations (only endline).
    2. Career intentions, preferences, beliefs, expectations, and attitudes. STEM and entrepreneurial intentions, preferences, beliefs, expectations, and attitudes.
    3. Psychological characteristics. Personal Initiative, Negotiations, General Cognitions (General Self-Efficacy, Youth Self-Efficacy, Perceived Subsidiary Self-Efficacy Scale, Self-Regulatory Focus, Short Grit Scale), Entrepreneurial Cognitions (Business Self-Efficacy, Identifying Opportunities, Business Attitudes, Social Entrepreneurship Standards).
    4. Behavior in (incentivized) games. Other-regarding preferences (dictator game), tendency to cooperate (Prisoner's Dilemma), perseverance (triangle game), preference for honesty, creativity (unscramble game).
    5. Other background information. Socioeconomic level, language spoken, risk and time preferences, trust level, parents' background, big-five personality traits of the student, cognitive abilities. Background information (5) is collected only at baseline.

    B. First follow-up phone-based survey, Zone 2, Summer (Phone Based)

    The survey replicates by phone a shorter version of the internet-based survey above. We collect the following information:

    1. Subject-specific knowledge tests.
    2. Career intentions, preferences, beliefs, expectations, and attitudes.
    3. Psychological characteristics.

    C. (Second) Follow-up Phone-Based Survey, Winter, Zone 2, Highlands Educational Regime.

    We administer a multi-topic questionnaire by phone to capture the first life outcomes of students who finished school. We collect the following information:

    1. Life Outcome 1- Education. The set of questions that aims to measure the learning success, career/study intentions, propensity to plan and approach others with studying tasks, entrepreneurial intentions.
    2. Life Outcome 2- Labor. The set of questions that aims to measure employment status and income, job searching behavior, time devoted for working/business, salary expectations and knowledge about the careers, self-initiated contribution to the family.
    3. Personal Initiative/Negotiations related and other measures. The set of questions that aim to measure level of personal initiative, negotiation strategies, pregnancy rate, gender stereotypes, math/STEM self-efficacy, gender attitudes, parent-student communication effects.

    Cleaning operations

    Data Editing

    A. Internet, online-based surveys. We extracted the raw data generated on the online platform from each experiment and prepared it for research purposes. We applied several pre-processing steps to the data:

    1. We transformed the raw data generated on the platform into a format readable by standard statistical software (R/STATA).
    2. We extracted the answer for each item for each student for each survey (Baseline, Midline, Endline).
    3. We removed duplicated students and duplicated answers for each item in each survey based on administrative data, performance, and information given by students on the platform.
    4. For the baseline survey, we standardized items/scales but also kept the raw items.

    B. Phone-based surveys. The phone-based surveys are collected with the help of an advanced CATI kit. The data contain all cases (call attempts) and an indication of whether the survey was effective. The data are cleaned to be ready for analysis. The data are anonymized but contain a unique anonymous student id for merging across datasets.
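
    As a minimal sketch of the deduplication and merging that the cleaning notes describe, the example below keeps one record per student and joins baseline and endline records on the anonymous id. The field names, the sample values, and the "keep the last record" rule are all hypothetical; the actual cleaning relied on administrative data, performance, and student-provided information.

```python
def dedupe(records):
    """Keep the last record seen for each student_id (an assumed rule)."""
    latest = {}
    for rec in records:
        latest[rec["student_id"]] = rec
    return latest


# Invented records; student_id stands in for the anonymous student id.
baseline = [
    {"student_id": "s1", "baseline_score": 10},
    {"student_id": "s1", "baseline_score": 12},  # duplicate submission
    {"student_id": "s2", "baseline_score": 8},
]
endline = [{"student_id": "s1", "endline_score": 15}]

base, end = dedupe(baseline), dedupe(endline)

# Left join: every baseline student is kept, with endline fields if present.
merged = {sid: {**base[sid], **end.get(sid, {})} for sid in base}
print(merged)
```

    Students who appear only at baseline stay in the merged result without endline fields, which keeps attrition visible instead of silently dropping those cases.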
