Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
R is a powerful language for statistical computing used in many research disciplines, but it has a steep learning curve. The software is open source, freely available, and has a thriving community. This crash course provides an overview of Base-R concepts for beginners and covers the topics 1) introduction to R, 2) reading, saving, and viewing data, 3) selecting and changing objects in R, and 4) descriptive statistics. This course was held by Lisa Spitzer on September 3, 2021, as a precursor to the R tidyverse Workshop by Aurélien Ginolhac and Roland Krause (September 8 - 10, 2021). This entry features the slides, exercises/results, and chat messages of the crash course. Related to this entry are the recordings of the course and the R tidyverse workshop materials. Click on "related PsychArchives objects" to view or download the recordings of the workshop.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
DataCamp is one of the leading online course providers for Data Science.
They have more than 300 courses using R, Python, and other technologies, which made me wonder whether it would be possible to identify current trends by looking at their course descriptions.
The course data was scraped from DataCamp's site, extracting course titles, descriptions, and the technology each course aims to teach.
The scraping was an absolute pleasure thanks to the excellent rvest R package!
Some ideas:
Building strong quantitative skills prepares undergraduate biology students for successful careers in science and medicine. While math and statistics anxiety can negatively impact student learning within biology classrooms, instructors may reduce this anxiety by steadily building student competency in quantitative reasoning through instructional scaffolding, application-based approaches, and simple computer program interfaces. However, few statistical programs exist that meet all needs of an inclusive, inquiry-based laboratory course. These needs include an open-source program, a simple interface, little required background knowledge in statistics for student users, and customizability to minimize cognitive load, align with course learning outcomes, and create desirable difficulty. To address these needs, we used the Shiny package in R to develop a custom statistical analysis application. Our “BioStats” app provides students with scaffolded learning experiences in applied statistics that promote student agency and are customizable by the instructor. It introduces students to the strengths of the R interface, while eliminating the need for complex coding in the R programming language. It also prioritizes practical implementation of statistical analyses over learning statistical theory. To our knowledge, this is the first statistics teaching tool in which students are presented with basic statistics initially and more complex analyses as they advance, and which includes an option to learn R statistical coding. The BioStats app interface yields a simplified introduction to applied statistics that is adaptable to many biology laboratory courses.
Primary Image: Singing Junco. A sketch of a junco singing on a pine tree branch, created by the lead author of this paper.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper explores a unique dataset of all the SET ratings provided by students of one university in Poland at the end of the winter semester of the 2020/2021 academic year. The SET questionnaire used by this university is presented in Appendix 1. The dataset is unique for several reasons. It covers all SET surveys completed by students in all fields and levels of study offered by the university. In the period analysed, the university was entirely in the online regime amid the Covid-19 pandemic. While the expected learning outcomes formally have not been changed, the online mode of study could have affected the grading policy and could have implications for some of the studied SET biases. This Covid-19 effect is captured by econometric models and discussed in the paper. The average SET scores were matched with the characteristics of the teacher for degree, seniority, gender, and SET scores in the past six semesters; the course characteristics for time of day, day of the week, course type, course breadth, class duration, and class size; the attributes of the SET survey responses as the percentage of students providing SET feedback; and the grades of the course for the mean, standard deviation, and percentage failed. Data on course grades are also available for the previous six semesters. This rich dataset allows many of the biases reported in the literature to be tested for and new hypotheses to be formulated, as presented in the introduction section. The unit of observation, or the single row in the data set, is identified by three parameters: teacher unique id (j), course unique id (k), and the question number in the SET questionnaire (n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9}). This means that for each pair (j,k), we have nine rows, one for each SET survey question, or sometimes fewer when students did not answer one of the SET questions at all.
For example, the dependent variable SET_score_avg(j,k,n) for the triplet (j=John Smith, k=Calculus, n=2) is calculated as the average of all Likert-scale answers to question 2 in the SET survey distributed to all students that took the Calculus course taught by John Smith. The data set has 8,015 such observations or rows. The full list of variables or columns in the data set included in the analysis is presented in the attached files section. Their description refers to the triplet (teacher id = j, course id = k, question number = n). When the last value of the triplet (n) is dropped, it means that the variable takes the same values for all n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9}.
Two attachments:
- Word file with the variables description
- Rdata file with the data set (for the R language)
Appendix 1. The SET questionnaire used for this paper.
Evaluation survey of the teaching staff of [university name]
Please complete the following evaluation form, which aims to assess the lecturer’s performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5 - I strongly agree; 4 - I agree; 3 - Neutral; 2 - I don’t agree; 1 - I strongly don’t agree.
Questions (each answered on the scale 1 2 3 4 5):
1. I learnt a lot during the course.
2. I think that the knowledge acquired during the course is very useful.
3. The professor used activities to make the class more engaging.
4. If it was possible, I would enroll for the course conducted by this lecturer again.
5. The classes started on time.
6. The lecturer always used time efficiently.
7. The lecturer delivered the class content in an understandable and efficient way.
8. The lecturer was available when we had doubts.
9. The lecturer treated all students equally regardless of their race, background and ethnicity.
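The aggregation behind SET_score_avg(j,k,n) can be sketched as follows. This is only an illustration under stated assumptions: the response rows and the helper name `set_score_avg` are hypothetical, the published data set ships already aggregated as an Rdata file, and Python is used here purely for demonstration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw responses, NOT taken from the data set:
# (teacher id j, course id k, question number n, Likert answer 1-5).
responses = [
    ("John Smith", "Calculus", 2, 5),
    ("John Smith", "Calculus", 2, 4),
    ("John Smith", "Calculus", 2, 3),
    ("John Smith", "Calculus", 1, 4),
    ("Jane Doe", "Algebra", 2, 2),
]


def set_score_avg(rows):
    """Average the Likert answers for each (teacher, course, question) triplet."""
    grouped = defaultdict(list)
    for teacher, course, question, answer in rows:
        grouped[(teacher, course, question)].append(answer)
    # One output entry per triplet, mirroring one row per (j, k, n) in the data set.
    return {key: mean(values) for key, values in grouped.items()}


scores = set_score_avg(responses)
print(scores[("John Smith", "Calculus", 2)])
```

A triplet with no answers simply never appears in the result, which matches the note that some (j,k) pairs have fewer than nine rows.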
The open science movement produces vast quantities of openly published data connected to journal articles, creating an enormous resource for educators to engage students in current topics and analyses. However, educators face challenges using these materials to meet course objectives. I present a case study using open science (published articles and their corresponding datasets) and open educational practices in a capstone course. While engaging in current topics of conservation, students trace connections in the research process, learn statistical analyses, and recreate analyses using the programming language R. I assessed the presence of best practices in open articles and datasets, examined student selection in the open grading policy, surveyed students on their perceived learning gains, and conducted a thematic analysis on student reflections. First, articles and datasets met just over half of the assessed fairness practices, but this increased with the publication date. There was a...
Article and dataset fairness
To assess the utility of open articles and their datasets as an educational tool in an undergraduate academic setting, I measured the congruence of each pair to a set of best practices and guiding principles. I assessed ten guiding principles and best practices (Table 1), where each category was scored ‘1’ or ‘0’ based on whether it met that criterion, with a total possible score of ten.
Open grading policies
Students were allowed to specify the percentage weight for each assessment category in the course, including 1) six coding exercises (Exercises), 2) one lead exercise (Lead Exercise), 3) fourteen annotation assignments of readings (Annotations), 4) one final project (Final Project), 5) five discussion board posts and a statement of learning reflection (Discussion), and 6) attendance and participation (Participation). I examined whether assessment categories (independent variable) were weighted (dependent variable) differently by students using an analysis of ...
# Data for: Integrating open education practices with data analysis of open science in an undergraduate course
Author: Marja H Bakermans
Affiliation: Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA 01609 USA
ORCID: https://orcid.org/0000-0002-4879-7771
Institutional IRB approval: IRB-24–0314
The full dataset file called OEPandOSdata (.xlsx extension) contains 8 files. Below are descriptions of the name and contents of each file. NA = not applicable or no data available
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 3 rows and is filtered where the book is A first course in statistical programming with R. It features 7 columns including author, publication date, language, and book publisher.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The importance of data science skills for modern scientific research cannot be overstated. Although policy documents increasingly recommend what skills should be included in undergraduate statistics and data science curricula, little is known about how students actually develop and apply these skills. This paper addresses this gap through an in-depth case study tracing one student’s learning progressions throughout her master’s program. Using a qualitative method to analyze student code, which has seen little use in statistics education research, I examined how Alicia transferred the data science skills from her applied statistics course into authentic research settings. The analysis shows that, while Alicia successfully navigated new challenges, she encountered persistent hurdles when extending bivariate techniques into multivariate contexts, particularly with visualizations and summary statistics. These findings highlight the obs...
R Script files submitted by Alicia (pseudonym) over the course of the study. The files are named according to when they were submitted:
December 2018: R Script #1
April 2019: R Script #1 (revised), R Script #2
September 2019: R Script #1 (revised), R Script #2 (revised)
Qualitative Data Analysis Files (Rich text files)
December 2018 Script #1, April 2019 Script #1, April 2019 Script #2, September 2019 Script #1, September 2019 Script #2
Quantitative Data Analysis Files
r-code-themes.csv
Comma-separated values file with separate sheets for each R script. Each sheet contains the qualitative code assigned to each line of code and whether the code contained errors.
# Code from: Beyond the classroom: Alicia’s multivariate journey
https://doi.org/10.5061/dryad.c59zw3rg6
This repository contains the R script files submitted by Alicia (pseudonym) throughout this study, files associated with the qualitative analysis of the code, and files associated with visualizations of the qualitative themes included in Alicia's code.
As this is a qualitative analysis, the usage of these "data" files differs from a typical quantitative analysis.
.R Files contain the scripts generated by Alicia at each time point (December 2018, April 2019, September 2019).
-codes.rft Files contain the (qualitative) process codes for each R script.
r-code-themes.xlsx The file contains information on every script and the qualitative code assigned to each line of code.
While the "data" for this analysis are R scripts, these scripts cannot be execu...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This seminar is an applied study of deep learning methods for extracting information from geospatial data, such as aerial imagery, multispectral imagery, digital terrain data, and other digital cartographic representations. We first provide an introduction and conceptualization of artificial neural networks (ANNs). Next, we explore appropriate loss and assessment metrics for different use cases followed by the tensor data model, which is central to applying deep learning methods. Convolutional neural networks (CNNs) are then conceptualized with scene classification use cases. Lastly, we explore semantic segmentation, object detection, and instance segmentation. The primary focus of this course is semantic segmentation for pixel-level classification. The associated GitHub repo provides a series of applied examples. We hope to continue to add examples as methods and technologies further develop. These examples make use of a variety of datasets (e.g., SAT-6, topoDL, Inria, LandCover.ai, vfillDL, and wvlcDL). Please see the repo for links to the data and associated papers. All examples have associated videos that walk through the process, which are also linked to the repo. A variety of deep learning architectures are explored including UNet, UNet++, DeepLabv3+, and Mask R-CNN. Currently, two examples use ArcGIS Pro and require no coding. The remaining five examples require coding and make use of PyTorch, Python, and R within the RStudio IDE. It is assumed that you have prior knowledge of coding in the Python and R environments. If you do not have experience coding, please take a look at our Open-Source GIScience and Open-Source Spatial Analytics (R) courses, which explore coding in Python and R, respectively. After completing this seminar you will be able to:
explain how ANNs work, including weights, bias, activation, and optimization.
describe and explain different loss and assessment metrics and determine appropriate use cases.
use the tensor data model to represent data as input for deep learning.
explain how CNNs work, including convolutional operations/layers, kernel size, stride, padding, max pooling, activation, and batch normalization.
use PyTorch, Python, and R to prepare data, produce and assess scene classification models, and infer to new data.
explain common semantic segmentation architectures, how these methods allow for pixel-level classification, and how they differ from traditional CNNs.
use PyTorch, Python, and R (or ArcGIS Pro) to prepare data, produce and assess semantic segmentation models, and infer to new data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of CSVs produced by R scripts that scrape the program requirements and related subject course calendars for the Data Science major or honours programs at various universities throughout Canada and the United States. The code to generate the data is available at https://github.com/Hedgemon4/course-scraping. There are two kinds of CSVs: course calendars and program requirements.
Course Calendars:
These files contain the course calendars for either a specific subject (such as computer science) or for all courses required by the major (one CSV for all required courses).
The files are all laid out similarly, with the same column names.
Depending on the information available from the university, some will have more or fewer columns.
Any column marked with a * is found in every course CSV file.
Column Names:
Course Code (*)
Course Name (*)
Course Description (*)
Credit Amount (*): In all calendars, this has been standardized to the same scale that UBC uses
Antirequisite
Corequisite
Prerequisite
Equivalency: Courses which are equivalent to each other
Recommended/Preparation: Contains any courses which were recommended before taking the course but are not a mandatory requirement
Hours: Components of the course (lecture, lab, tutorial, seminar, etc.)
Lab, Lecture, Tutorial, Seminar, etc.: Logical vectors for each course component, which are true if the course contains that component
Breadth Requirement, Distribution Requirement, Quantitative Requirement, etc.: Vectors which indicate if the course fulfills various other requirements imposed by the institution for graduation (not used by all institutions, and names vary)
Delivery Format
Note/Other Information: Any other notes or information about the course not covered by the vectors above
Program Requirements:
These files contain the requirements for the specified program at the university.
Depending on how the program requirements are displayed, some have the requirements for the whole degree, and others for the major or program only.
All the columns listed below are found in all the files, but are organized slightly differently depending on how the institution's academic calendar was formatted.
Column Names:
Requirement Category:
Contains the name for the category or subcategory (Example: Category G1A).
Starts with a letter, which differs depending on whether the institution lists the year with the requirements: G = general/no year listed, F = first year, S = second year, U = upper year, C = Co-Op course.
Followed by a number, which indicates the requirement order.
If followed by a letter, it is a subcategory of the overall category, so the category might be F4 and the subcategory F4A. Note that if a category required more subcategories than there are letters, a three-digit system (001 to 999) is used instead to indicate the subcategory number (Example: Category G1037).
Category Description:
Lists the requirements for that category or subcategory.
Depending on how the academic calendar was formatted, and whether it is a category or subcategory, it will either have a category requirement description (one of, two of, all of, 7+ credits, etc.), potentially followed by a list of courses, or it will just have a list of courses.
Examples: (All of)/(MATH 54 or STAT 89A or EECS 16A& EECS 16B or PHYSICS 89)/(ASTRON 128)/(Students complete all of: BUS 343, BUS 360W, BUS 439, BUS 445, BUS 345, BUS 336)
Category Minimum Credit Amount:
Lists the minimum number of credits from the category or subcategory to fulfill this requirement, standardized to the same scale that UBC uses.
Category Maximum Credit Amount:
Lists the maximum number of credits from the category or subcategory to fulfill this requirement, standardized to the same scale that UBC uses.
Core Requirement:
Indicates if the course or requirement is needed to fulfill the program requirements (no alternate options)
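The Requirement Category naming scheme described above can be decoded mechanically. The sketch below is one possible reading, not the scraping project's own code: the function name `parse_category` and the returned field names are hypothetical, and the rule used to split codes such as G1037 (four or more digits means the last three are the subcategory number) is an assumption, since the description does not say how such codes are disambiguated.

```python
import re

# Year prefixes as listed in the dataset description.
YEAR_LABELS = {
    "G": "general/no year listed",
    "F": "first year",
    "S": "second year",
    "U": "upper year",
    "C": "co-op course",
}

# One prefix letter, a run of digits, and an optional subcategory letter.
CODE_PATTERN = re.compile(r"^([GFSUC])(\d+)([A-Z]?)$")


def parse_category(code):
    """Split a requirement category code into year, order, and subcategory."""
    match = CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"unrecognized category code: {code!r}")
    prefix, digits, letter = match.groups()
    subcategory = letter or None
    # ASSUMPTION: with no trailing letter and at least four digits, the last
    # three digits are the numeric subcategory (001 to 999).
    if not letter and len(digits) >= 4:
        digits, subcategory = digits[:-3], digits[-3:]
    return {
        "year": YEAR_LABELS[prefix],
        "order": int(digits),
        "subcategory": subcategory,
    }


for example in ("G1A", "F4", "G1037"):
    print(example, parse_category(example))
```

Under this assumption a plain category with a four-digit order number would be misread, so the rule should be checked against the actual files before relying on it.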
Use the calling bull course to introduce students to data, ethics, visualization, and R.
https://spdx.org/licenses/CC0-1.0.html
Over the last 20 years, statistics preparation has become vital for a broad range of scientific fields, and statistics coursework has been readily incorporated into undergraduate and graduate programs. However, a gap remains between the computational skills taught in statistics service courses and those required for the use of statistics in scientific research. Ten years after the publication of "Computing in the Statistics Curriculum," the nature of statistics continues to change, and computing skills are more necessary than ever for modern scientific researchers. In this paper, we describe research on the design and implementation of a suite of data science workshops for environmental science graduate students, providing students with the skills necessary to retrieve, view, wrangle, visualize, and analyze their data using reproducible tools. These workshops help to bridge the gap between the computing skills necessary for scientific research and the computing skills with which students leave their statistics service courses. Moreover, though targeted to environmental science graduate students, these workshops are open to the larger academic community. As such, they promote the continued learning of the computational tools necessary for working with data, and provide resources for incorporating data science into the classroom.
Methods: Surveys from Carpentries-style workshops, the results of which are presented in the accompanying manuscript.
Pre- and post-workshop surveys for each workshop (Introduction to R, Intermediate R, Data Wrangling in R, Data Visualization in R) were collected via Google Form.
The surveys administered for the fall 2018/spring 2019 academic year are included as pre_workshop_survey and post_workshop_assessment PDF files.
The raw versions of these data are included in the Excel files ending in survey_raw or assessment_raw.
The data files whose name includes survey contain raw data from pre-workshop surveys and the data files whose name includes assessment contain raw data from the post-workshop assessment survey.
The annotated RMarkdown files used to clean the pre-workshop surveys and post-workshop assessments are included as workshop_survey_cleaning and workshop_assessment_cleaning, respectively.
The cleaned pre- and post-workshop survey data are included in the Excel files ending in clean.
The summaries and visualizations presented in the manuscript are included in the analysis annotated RMarkdown file.
In today's world, understanding environmental data and making informed decisions based on it is crucial for addressing complex environmental challenges. Yale School of the Environment's Environmental Data Science in R: Introduction to Data Integration and Machine Learning (ENV 730) course serves as an introduction to the integration of environmental data using the R programming language, coupled with machine learning techniques. This dataset contains a zip file with all the data files used in this course, along with a README that has the metadata for those files.
Because of the COVID-19 pandemic, presentation of public health data to the public has increased without much of the public having the knowledge to understand what these statistics mean or why some populations are at higher risk of adverse outcomes. Recognizing that those most impacted by COVID-19 are from vulnerable populations, we developed a training program called "The quantitative public health data literacy training program", aimed at increasing the data literacy of high school and college students from such vulnerable groups. The program introduces the basics of public health, data literacy, statistical software, descriptive statistics, and data ethics. The instructors taught eight synchronous sessions (five were also offered asynchronously), consisting of lectures and experiential group exercises. The program recruited, engaged, and retained a large cohort (n > 100) of underrepresented students in biostatistics and data science for a virtual data literacy training. The course provides a framework for developing and implementing similar public health training programs designed to increase diversity in the field. This project provides de-identified data for the program's baseline/final assessment and program feedback, as well as grades for certain portions of the program. The "Data-files" folder contains all the data collected during the program. Along with the de-identified data, code is also provided (in the R language) to analyze the data as presented in tables in potential publications.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this course, you will learn to work within the free and open-source R environment with a specific focus on working with and analyzing geospatial data. We will cover a wide variety of data and spatial data analytics topics, and you will learn how to code in R along the way. The Introduction module provides more background info about the course and course set up. This course is designed for someone with some prior GIS knowledge. For example, you should know the basics of working with maps, map projections, and vector and raster data. You should be able to perform common spatial analysis tasks and make map layouts. If you do not have a GIS background, we would recommend checking out the West Virginia View GIScience class. We do not assume that you have any prior experience with R or with coding. So, don't worry if you haven't developed these skill sets yet. That is a major goal in this course. Background material will be provided using code examples, videos, and presentations. We have provided assignments to offer hands-on learning opportunities. Data links for the lecture modules are provided within each module while data for the assignments are linked to the assignment buttons below. Please see the sequencing document for our suggested order in which to work through the material. After completing this course you will be able to:
prepare, manipulate, query, and generally work with data in R.
perform data summarization, comparisons, and statistical tests.
create quality graphs, map layouts, and interactive web maps to visualize data and findings.
present your research, methods, results, and code as web pages to foster reproducible research.
work with spatial data in R.
analyze vector and raster geospatial data to answer a question with a spatial component.
make spatial models and predictions using regression and machine learning.
code in the R language at an intermediate level.
https://creativecommons.org/publicdomain/zero/1.0/
It's no secret that US university students often graduate with debt repayment obligations that far outstrip their employment and income prospects. While it's understood that students from elite colleges tend to earn more than graduates from less prestigious universities, the finer relationships between future income and university attendance are quite murky. In an effort to make educational investments less speculative, the US Department of Education has matched information from the student financial aid system with federal tax returns to create the College Scorecard dataset.
Kaggle is hosting the College Scorecard dataset in order to facilitate shared learning and collaboration. Insights from this dataset can help make the returns on higher education more transparent and, in turn, more fair.
Here's a script showing an exploratory overview of some of the data.
college-scorecard-release-*.zip contains a compressed version of the same data available through Kaggle Scripts.
It consists of three components:
New to data exploration in R? Take the free, interactive DataCamp course, "Data Exploration With Kaggle Scripts," to learn the basics of visualizing data with ggplot. You'll also create your first Kaggle Scripts along the way.
The linked bookdown contains the notes and most exercises for a course on data analysis techniques in hydrology using the programming language R. The material will be updated each time the course is taught. If new topics are added, the topics they replace will remain, in case they are useful to others.
I hope these materials can be a resource to those teaching themselves R for hydrologic analysis and/or for instructors who may want to use a lesson or two or the entire course. At the top of each chapter there is a link to a github repository. In each repository is the code that produces each chapter and a version where the code chunks within it are blank. These repositories are all template repositories, so you can easily copy them to your own github space by clicking Use This Template on the repo page.
In my class, I work through each document, live coding with students following along. Typically I ask students to watch as I code and explain the chunk and then replicate it on their computer. Depending on the lesson, I will ask students to try some of the chunks before I show them the code as an in-class activity. Some chunks are explicitly designed for this purpose and are typically labeled a “challenge.”
Chapters called ACTIVITY are either homework or class-period-long in-class activities. The code chunks in these are therefore blank. If you would like a key for any of these, please just send me an email.
If you have questions, suggestions, or would like activity answer keys, etc. please email me at jpgannon at vt.edu
Finally, if you use this resource, please fill out the survey on the first page of the bookdown (https://forms.gle/6Zcntzvr1wZZUh6S7). This will help me get an idea of how people are using this resource, how I might improve it, and whether or not I should continue to update it.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset name: asppl_dataset_v2.csv
Version: 2.0
Dataset period: 06/07/2018 - 01/14/2022
Dataset Characteristics: Multivalued
Number of Instances: 8118
Number of Attributes: 9
Missing Values: Yes
Area(s): Health and education
Sources:
Virtual Learning Environment of the Brazilian Health System (AVASUS) (Brasil, 2022a);
Brazilian Occupational Classification (CBO) (Brasil, 2022b);
National Registry of Health Establishments (CNES) (Brasil, 2022c);
Brazilian Institute of Geography and Statistics (IBGE) (Brasil, 2022e).
Description: The data contained in the asppl_dataset_v2.csv dataset (see Table 1) originates from participants of the technology-based educational course “Health Care for People Deprived of Freedom.” The course is available on the AVASUS (Brasil, 2022a). This dataset provides elementary data for analyzing the course’s impact and reach and the profile of its participants. In addition, it brings an update of the data presented in work by Valentim et al. (2021).
Table 1: Description of AVASUS dataset features.
| Attributes | Description | datatype | Value |
| --- | --- | --- | --- |
| gender | Gender of the course participant. | Categorical. | Feminino / Masculino / Não Informado. (In English, Female, Male or Uninformed) |
| course_progress | Percentage of completion of the course. | Numerical. | Range from 0 to 100. |
| course_evaluation | A score given to the course by the participant. | Numerical. | 0, 1, 2, 3, 4, 5 or NaN. |
| evaluation_commentary | Comment made by the participant about the course. | Categorical. | Free text or NaN. |
| region | Brazilian region in which the participant resides. | Categorical. | Brazilian region according to IBGE: Norte, Nordeste, Centro-Oeste, Sudeste or Sul (In English North, Northeast, Midwest, Southeast or South). |
| CNES | The CNES code refers to the health establishment where the participant works. | Numerical. | CNES Code or NaN. |
| health_care_level | Identification of the health care network level for which the course participant works. | Categorical. | “ATENCAO PRIMARIA”, “MEDIA COMPLEXIDADE”, “ALTA COMPLEXIDADE”, and their possible combinations. |
| year_enrollment | Year in which the course participant registered. | Numerical. | Year (YYYY). |
| CBO | Participant occupation. | Categorical. | Text coded according to the Brazilian Classification of Occupations or “Indivíduo sem afiliação formal.” (In English “Individual without formal affiliation.”) |
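As a rough illustration of how the attributes in Table 1 could be used, the sketch below summarizes course_progress and course_evaluation by region. The column names come from Table 1, but the sample rows are invented for demonstration, the helper name `summarize_by_region` is hypothetical, and the assumption that missing evaluations appear as the literal text NaN (or an empty cell) has not been checked against asppl_dataset_v2.csv.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Invented sample rows using a subset of the Table 1 columns (NOT real data).
SAMPLE_CSV = """gender,course_progress,course_evaluation,region,year_enrollment
Feminino,100,5,Nordeste,2019
Masculino,40,NaN,Nordeste,2020
Feminino,75,4,Sul,2021
Não Informado,0,NaN,Sul,2021
"""


def summarize_by_region(csv_text):
    """Return mean course progress and mean evaluation score per region."""
    progress = defaultdict(list)
    ratings = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        region = row["region"]
        progress[region].append(float(row["course_progress"]))
        raw_score = row["course_evaluation"].strip()
        # ASSUMPTION: missing evaluations are stored as "NaN" or left empty.
        if raw_score and raw_score.lower() != "nan":
            ratings[region].append(float(raw_score))
    return {
        region: {
            "mean_progress": mean(values),
            # None marks a region where nobody rated the course.
            "mean_evaluation": mean(ratings[region]) if ratings[region] else None,
        }
        for region, values in progress.items()
    }


summary = summarize_by_region(SAMPLE_CSV)
for region, stats in summary.items():
    print(region, stats)
```

Skipping missing evaluations rather than counting them as zero matters here, because 0 is itself a valid score in the value list for course_evaluation.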
Dataset name: prison_syphilis_and_population_brazil.csv
Dataset period: 2017 - 2020
Dataset Characteristics: Multivalued
Number of Instances: 6
Number of Attributes: 13
Missing Values: No
Source:
National Penitentiary Department (DEPEN) (Brasil, 2022d);
Description: The data contained in the prison_syphilis_and_population_brazil.csv dataset (see Table 2) originate from the National Penitentiary Department Information System (SISDEPEN) (Brasil, 2022d). This dataset provides data on the population and prevalence of syphilis in the Brazilian prison system. In addition, it includes a rate that normalizes the data so that the populations of each region and of Brazil can be compared.
Table 2: Description of DEPEN dataset Features.
| Attributes | Description | Data type | Value |
| --- | --- | --- | --- |
| Region | Brazilian region in which the participant resides. In addition, the sum of the regions, which refers to Brazil. | Categorical. | Brazil and Brazilian region according to IBGE: North, Northeast, Midwest, Southeast or South. |
| syphilis_2017 | Number of syphilis cases in the prison system in 2017. | Numerical. | Number of syphilis cases. |
| syphilis_rate_2017 | Normalized rate of syphilis cases in 2017. | Numerical. | Syphilis case rate. |
| syphilis_2018 | Number of syphilis cases in the prison system in 2018. | Numerical. | Number of syphilis cases. |
| syphilis_rate_2018 | Normalized rate of syphilis cases in 2018. | Numerical. | Syphilis case rate. |
| syphilis_2019 | Number of syphilis cases in the prison system in 2019. | Numerical. | Number of syphilis cases. |
| syphilis_rate_2019 | Normalized rate of syphilis cases in 2019. | Numerical. | Syphilis case rate. |
| syphilis_2020 | Number of syphilis cases in the prison system in 2020. | Numerical. | Number of syphilis cases. |
| syphilis_rate_2020 | Normalized rate of syphilis cases in 2020. | Numerical. | Syphilis case rate. |
| pop_2017 | Prison population in 2017. | Numerical. | Population number. |
| pop_2018 | Prison population in 2018. | Numerical. | Population number. |
| pop_2019 | Prison population in 2019. | Numerical. | Population number. |
| pop_2020 | Prison population in 2020. | Numerical. | Population number. |
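The description does not state how the normalized rate is computed. One common way to make regions of different sizes comparable is to express cases per fixed number of people. The sketch below assumes a rate per 1,000 people; the scale, the function name `normalized_rate`, and the counts are all hypothetical and may differ from what the dataset authors used.

```python
def normalized_rate(cases, population, per=1000):
    """Cases per `per` people. The scale of 1,000 is an assumption,
    not something the dataset description specifies."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * per / population


# Hypothetical counts, NOT taken from prison_syphilis_and_population_brazil.csv.
example = {
    "Region A": (50, 10_000),  # (syphilis cases, prison population)
    "Region B": (30, 4_000),
}

rates = {
    region: normalized_rate(cases, population)
    for region, (cases, population) in example.items()
}
print(rates)
```

With these invented numbers, Region B has fewer cases than Region A but a higher rate, which is the kind of comparison a normalized column makes possible.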
Dataset name: students_cumulative_sum.csv
Dataset period: 2018 - 2020
Dataset Characteristics: Multivalued
Number of Instances: 6
Number of Attributes: 7
Missing Values: No
Source:
Virtual Learning Environment of the Brazilian Health System (AVASUS) (Brasil, 2022a);
Brazilian Institute of Geography and Statistics (IBGE) (Brasil, 2022e).
Description: The data contained in the students_cumulative_sum.csv dataset (see Table 3) originate mainly from AVASUS (Brasil, 2022a). This dataset provides data on the number of students by region and year. In addition, it includes a rate that normalizes the data so that the populations of each region and of Brazil can be compared. We used population data estimated by the IBGE (Brasil, 2022e) to calculate the rate.
Table 3: Description of Students dataset Features.
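As the file name suggests, the student counts appear to be cumulative over the years. A minimal sketch of how such a running total and a population-normalized rate could be derived is given below. The yearly counts, the population figure, and the scale of 100,000 people are hypothetical and do not come from the dataset or from IBGE.

```python
from itertools import accumulate

# Hypothetical new enrollments per year for a single region.
yearly_new = {2018: 120, 2019: 300, 2020: 80}

# Hypothetical population estimate for the same region.
population = 1_000_000

years = sorted(yearly_new)

# Running total of students up to and including each year.
cumulative = dict(zip(years, accumulate(yearly_new[year] for year in years)))

# Students per 100,000 inhabitants; the scale is an assumption.
rate_per_100k = {
    year: total * 100_000 / population for year, total in cumulative.items()
}

print(cumulative)
print(rate_per_100k)
```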
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data for tutorial at https://privefl.github.io/bigsnpr-extdoc/ and course at https://privefl.github.io/statgen-course/. This data contains a zip with PLINK .bed/.bim/.fam files and a phenotype file. This was previously available at https://www.mtholyoke.edu/courses/afoulkes/Data/statsTeachR/. Described in https://doi.org/10.1002/sim.6605. Also a subset of the data from https://doi.org/10.6084/m9.figshare.16858534 to be used with the tutorial data above. Also GWAS summary statistics for testosterone levels in females and 1000 genomes European data, subsetted around two loci. And GWAS summary statistics for CAD computed from the UK Biobank.
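Of the three PLINK files mentioned above, the .fam file is plain text with one sample per line and six whitespace-separated fields (family ID, sample ID, father ID, mother ID, sex, phenotype). The sketch below parses that layout with the standard library only. The two sample lines and the function name `read_fam` are hypothetical; the tutorial itself uses the bigsnpr R package, which this sketch does not reproduce.

```python
import io

# Field order of a PLINK .fam file.
FAM_COLUMNS = (
    "family_id", "sample_id", "father_id", "mother_id", "sex", "phenotype",
)

# Hypothetical content, NOT taken from the tutorial data.
# Sex is coded 1 = male, 2 = female; phenotype -9 conventionally means missing.
SAMPLE_FAM = "FAM1 IND1 0 0 1 -9\nFAM2 IND2 0 0 2 -9\n"


def read_fam(handle):
    """Parse .fam lines from an open text handle into a list of dicts."""
    records = []
    for line_number, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(FAM_COLUMNS):
            raise ValueError(f"line {line_number}: expected 6 fields, got {len(fields)}")
        records.append(dict(zip(FAM_COLUMNS, fields)))
    return records


samples = read_fam(io.StringIO(SAMPLE_FAM))
print([sample["sample_id"] for sample in samples])
```

For a file on disk, the same function could be called as `read_fam(open(path))`, where `path` points to the extracted .fam file.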
Opportunity-focused, high-growth entrepreneurship and science-led innovation are crucial for continued economic growth and productivity. Working in these fields offers the opportunity for rewarding and high-paying careers. However, the majority of youth in developing countries do not consider either as job options, affecting their choices of what to study. Youth may not select these educational and career paths due to lack of knowledge, lack of appropriate skills, and lack of role models. We provide a scalable approach to overcoming these constraints through an online education course for secondary school students that covers entrepreneurial soft skills, scientific methods, and interviews with role models.
The study comprises three experimental trials conducted before and during the COVID-19 pandemic in different regions of Ecuador. This catalog entry includes data from Experiment 3: Coastal Educational Regime (Régimen Costa) 2020/2021. The data from the other two experiments are also available in the catalog.
Experiment 3: Coastal Educational Regime (Régimen Costa) 2020/2021
A randomized experiment conducted in high schools in Ecuador as a rapid response to the hurdles of COVID-19 for schools in the Coastal Educational Regime (Régimen Costa); students finish the program in December 2020. The intervention is an online education course that covers entrepreneurial soft skills, scientific methods, and interviews with role models. Students take this course at home during the COVID-19 pandemic under teachers’ supervision. We work mostly with 14-22-year-old students (16,441 students) in 598 schools assigned to the program. We randomly assign schools either to treatment (receiving the entrepreneurship courses online) or to placebo-control (receiving a placebo treatment of online courses from standard curricula) groups. We also cross-randomize the role models and evaluate a set of nimble interventions to increase take-up. The details of the intervention can be found in the AEA registry: Asanov, Igor and David McKenzie. 2021. Scaling up virtual learning of online learning in high schools. AEA RCT Registry. March 23.
Merged datasets from the baseline, midline, and endline surveys for each experiment are documented here. The surveys were administered through the online learning platform, either in school during normal educational hours before the COVID-19 pandemic or at the student’s home during the pandemic. Detailed information about the questionnaire and each item can be found in the codebooks (Baseline 1, Baseline 2, Midline, Endline 1, Endline 2) for the corresponding experiments.
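Because assignment happens at the school level, every student in a school shares that school's arm. The sketch below illustrates one simple way to do such a cluster assignment with a fixed seed. The equal split, the absence of stratification, the seed, and the school identifiers are all assumptions for illustration; the actual procedure is described in the AEA registry entry.

```python
import random


def assign_schools(school_ids, seed=0):
    """Randomly split schools into two equally sized arms (assumed 50/50).

    Sorting first makes the result depend only on the set of schools
    and the seed, not on the order in which the schools were listed.
    """
    rng = random.Random(seed)
    shuffled = sorted(school_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        school: ("treatment" if index < half else "placebo-control")
        for index, school in enumerate(shuffled)
    }


# Hypothetical school identifiers.
schools = [f"school_{number:03d}" for number in range(10)]
assignment = assign_schools(schools, seed=2021)

treated = sorted(s for s, arm in assignment.items() if arm == "treatment")
print(len(treated), "schools in treatment")
```

Fixing the seed makes the assignment reproducible, which helps when the allocation needs to be audited later.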
Experiment 3: Coastal Educational Regime (Régimen Costa) 2020/2021
We cover students in the last year of education in K12 schools with a technical specialization (Bachillerato técnico) who study in the Coastal Educational Regime (Régimen Costa) 2020/2021, were supposed to finish their schooling in March 2021, and were able to register on the online platform. The schools in the highlands educational regime covered in this experiment are spread across the following educational zones: 1, 2, 3, 4, 5, 6, 7, 8, 9.
Taken together, in Experiments 2 and 3 we offered the program across all of Ecuador to schools that have a technical specialization track.
Student
Sample survey data [ssd]
All students in selected schools who were present in classes filled out the baseline questionnaire
Internet [int]
Questionnaires: We administer three main sets of questionnaires.
A. Internet (online-based survey)
The survey consists of a multi-topic questionnaire administered to the students through the online learning platform in school during normal educational hours before the COVID-19 pandemic or at home during the COVID-19 pandemic. We collect the following information:
1. Subject specific knowledge tests. Spanish, English, Statistics, Personal Initiative (only endline), Negotiations (only endline).
2. Career intentions, preferences, beliefs, expectations, and attitudes. STEM and entrepreneurial intentions, preferences, beliefs, expectations, and attitudes.
3. Psychological characteristics. Personal Initiative, Negotiations, General Cognitions (General Self-Efficacy, Youth Self-Efficacy, Perceived Subsidiary Self-Efficacy Scale, Self-Regulatory Focus, Short Grit Scale), Entrepreneurial Cognitions (Business Self-Efficacy, Identifying Opportunities, Business Attitudes, Social Entrepreneurship Standards).
4. Behavior in (incentivized) games: Other-regarding preferences (dictator game), tendency to cooperate (Prisoner’s Dilemma), perseverance (triangle game), preference for honesty, creativity (unscramble game).
5. Other background information. Socioeconomic level, language spoken, risk and time preferences, trust level, parents’ background, big-five personality traits of the student, cognitive abilities.
Background information (5) was collected only at baseline.
B. First follow-up Phone-based Survey Zone 2, Summer (Phone Based).
The survey replicates, by phone, a shorter version of the internet-based survey above. We collect the following information:
1. Subject specific knowledge tests.
2. Career intentions, preferences, beliefs, expectations, and attitudes.
3. Psychological characteristics
C. (Second) Follow-up Phone-Based Survey, Winter, Zone 2, Highlands Educational Regime.
We administer a multi-topic questionnaire by phone to capture the first life outcomes of students who have finished school. We collect the following information:
Data Editing
A. Internet, online-based surveys. We extracted the raw data generated on the online platform from each experiment and prepared it for research purposes. We applied several pre-processing steps to the data:
1. We transformed the raw data generated on the platform into a format readable by standard statistical software (R/STATA).
2. We extracted the answer for each item for each student for each survey (Baseline, Midline, Endline).
3. We removed duplicated students and duplicated answers for each item in each survey based on administrative data, performance, and information given by students on the platform.
4. For the baseline survey, we standardized items/scales but also kept the raw items.
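Steps 3 and 4 above can be illustrated with a small sketch. It assumes that a duplicate is any repeated (student, item) pair and that the first answer is kept, whereas the authors resolved duplicates using administrative data, performance, and student-provided information. It also assumes that standardization means a z-score using the population standard deviation, which the text does not specify. All records and names are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical (student_id, item, score) answers with one duplicate.
raw_answers = [
    ("s1", "grit", 3.0),
    ("s2", "grit", 4.0),
    ("s2", "grit", 4.0),  # duplicated answer
    ("s3", "grit", 5.0),
]


def drop_duplicates(answers):
    """Keep the first answer per (student, item) pair; an assumed rule."""
    seen = set()
    unique = []
    for student_id, item, score in answers:
        key = (student_id, item)
        if key in seen:
            continue
        seen.add(key)
        unique.append((student_id, item, score))
    return unique


def standardize(values):
    """Return z-scores; a constant column maps to zeros."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return [0.0 for _ in values]
    return [(value - center) / spread for value in values]


unique_answers = drop_duplicates(raw_answers)
z_scores = standardize([score for _, _, score in unique_answers])

# Keep the raw score next to the standardized one, as in step 4.
cleaned = [
    {"student_id": student_id, "item": item, "raw": score, "z": z}
    for (student_id, item, score), z in zip(unique_answers, z_scores)
]
print(cleaned)
```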
B. Phone-based surveys. The phone-based surveys were collected with the help of an advanced CATI kit. The data contain all cases (attempts to call) and an indication of whether the survey was effective. The data are cleaned and ready for analysis. The data are anonymized but contain a unique anonymous student ID for merging across datasets.
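The anonymous student ID is what links the survey waves. A minimal sketch of such a merge is shown below, using plain dictionaries keyed by ID and keeping students who appear in any wave (an outer join). The IDs, variable names, and values are hypothetical and do not reflect the actual codebooks.

```python
# Hypothetical per-wave records keyed by anonymous student ID.
baseline = {
    "a1": {"grit_baseline": 3.5},
    "b2": {"grit_baseline": 4.0},
}
endline = {
    "a1": {"grit_endline": 3.8},
    "c3": {"grit_endline": 2.9},
}


def merge_by_id(*waves):
    """Combine waves on student ID, keeping students present in any wave."""
    merged = {}
    for wave in waves:
        for student_id, fields in wave.items():
            merged.setdefault(student_id, {}).update(fields)
    return merged


panel = merge_by_id(baseline, endline)

# Students observed in both waves have both variables.
complete = sorted(
    student_id
    for student_id, fields in panel.items()
    if "grit_baseline" in fields and "grit_endline" in fields
)
print(complete)
```

Keeping unmatched students in the merged result makes attrition between waves visible instead of silently dropping those cases.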
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
R is a very powerful language for statistical computing in many disciplines of research, but it has a steep learning curve. The software is open source, freely available, and has a thriving community. This crash course provides an overview of Base-R concepts for beginners and covers the topics 1) introduction to R, 2) reading, saving, and viewing data, 3) selecting and changing objects in R, and 4) descriptive statistics. This course was held by Lisa Spitzer on September 3, 2021, as a precursor to the R tidyverse Workshop by Aurélien Ginolhac and Roland Krause (September 8 - 10, 2021). This entry features the slides, exercises/results, and chat messages of the crash course. Related to this entry are the recordings of the course and the R tidyverse workshop materials. Click on "related PsychArchives objects" to view or download the recordings of the workshop.