Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The dataset includes the overall and MMR-specific vaccination rates for 46,410 schools in 32 states.
|variable |class |description |
|:--------|:---------|:-----------|
|index |double | Index ID |
|state |character | School's state |
|year |character | School academic year|
|name |character | School name|
|type |character | Whether a school is public, private, or charter |
|city |character | City |
|county |character | County |
|district |character | School district |
|enroll |double | Enrollment |
|mmr |double | School's Measles, Mumps, and Rubella (MMR) vaccination rate |
|overall |double | School's overall vaccination rate|
|xrel |double | Percentage of students exempted from vaccination for religious reasons |
|xmed |double | Percentage of students exempted from vaccination for medical reasons |
|xper |double | Percentage of students exempted from vaccination for personal reasons |
|lat |double | Latitude |
|lng |double | Longitude |
This data comes from #tidytuesday and was originally compiled by The Wall Street Journal, which recently published an article covering 46,412 schools across 32 US states.
"This repository contains immunization rate data for schools across the U.S., as compiled by The Wall Street Journal. The dataset includes the overall and MMR-specific vaccination rates for 46,412 schools in 32 states. As used in "What's the Measles Vaccination Rate at Your Child's School?".
Vaccination rates are for the 2017-18 school year for Colorado, Connecticut, Minnesota, Montana, New Jersey, New York, North Dakota, Pennsylvania, South Dakota, Utah and Washington. Rates for other states are 2018-19."
(The #tidytuesday page mentions 46,412 records, but the file loads one fewer, and there is one duplicated record:
283 New York 2017-18 Jackson Main Public Hempstead Nassau NA NA 100 -1 NA NA NA
284 New York 2017-18 Jackson Main Public Hempstead Nassau NA NA 100 -1 NA NA NA
Hence, there are 46,410 records in total once the duplicate is removed.)
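The duplicate check described above can be sketched in base R. This is a minimal illustration, not the code used to build the dataset: the toy data frame `schools` carries only a subset of the columns, its first two rows are modelled on the duplicated record quoted above, and the third row is a made-up placeholder. The comparison ignores `index`, since that is the one column that differs between the two copies.

```r
# Toy data: rows 1 and 2 mirror the duplicated record quoted above;
# row 3 ("Example School") is a hypothetical placeholder, not a real record.
schools <- data.frame(
  index   = c(283, 284, 285),
  state   = "New York",
  year    = "2017-18",
  name    = c("Jackson Main", "Jackson Main", "Example School"),
  type    = "Public",
  city    = "Hempstead",
  county  = "Nassau",
  mmr     = c(100, 100, 98.5),
  overall = c(-1, -1, -1),
  stringsAsFactors = FALSE
)

# Compare every column except the row index, which differs between the copies.
key_cols <- setdiff(names(schools), "index")

# duplicated() flags the second and later occurrences of an identical row.
is_dup <- duplicated(schools[, key_cols])

# Keep only the first occurrence of each record.
deduped <- schools[!is_dup, ]

# Number of rows dropped as duplicates.
n_removed <- nrow(schools) - nrow(deduped)
```

Applied to the full file, the same pattern should drop exactly one row if the record shown above is the only duplicate.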
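The initial cleaning code from #tidytuesday had to be modified for two reasons. First, it was raising an error, possibly because the page that lists the URLs for the individual states has changed since the code was published. Second, as the comments in the code explain, matching on only state and school name cross-matched schools that share a name within a state.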
# Packages used below: tidyverse (readr, dplyr, stringr, purrr) and rvest (read_html, html_table)
library(tidyverse)
library(rvest)
url_wsj <- "https://raw.githubusercontent.com/WSJ/measles-data/master/all-measles-rates.csv"
wsj <- read_csv(url_wsj)
list_of_urls <- "https://github.com/WSJ/measles-data/tree/master/individual-states"
raw_states <- list_of_urls %>%
  read_html() %>%
  html_table() %>%
  .[[1]] %>%
  select(1) %>% # changed select(Name) to select(1) because there were three columns with the header 'Name'
  mutate(Name = str_remove(Name, "\\.csv")) %>%
  filter(str_length(Name) > 3, str_length(Name) < 20) %>%
  pull(Name)
# Added because the first element of the list was "parent directory.." and the last (33rd) element was "View all files"
raw_states <- raw_states[2:32]
all_states <- glue::glue("https://raw.githubusercontent.com/WSJ/measles-data/master/individual-states/{raw_states}.csv") %>%
  map(read_csv)
# As it turns out, not every state file has all of the state, city, county, and district columns,
# so the original code limited the identifier columns to just state and name.
# Having only state and school name led to cross-matching in states where multiple schools share the same name.
# clean_states <- all_states %>%
#   map(~select(., state, name, lat, lng)) %>%
#   map(~mutate_at(., vars(lat, lng), as.numeric)) %>%
#   bind_rows() %>%
#   filter(!is.na(lat))
# Hence, keep as many of "state", "name", "district", "county", "city" as are available for each state.
clean_states <- all_states %>%
  map(~select(., tidyselect::any_of(c("state", "name", "district", "county", "city", "lat","lng")))) %>%
  map(~mutate_at(., v...
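The cross-matching problem mentioned in the comments above can be reproduced on a small scale. The sketch below uses base R `merge()` rather than the tidyverse pipeline, and every value in it is invented for illustration: the two data frames `rates` and `coords`, the state, the school names, the cities, and the coordinates are all hypothetical. It only shows why joining on `state` and `name` alone multiplies rows when two schools share a name.

```r
# Hypothetical data: two different schools in one state share the same name.
rates <- data.frame(
  state = "Example State",
  name  = c("Lincoln Elementary", "Lincoln Elementary"),
  city  = c("Town A", "Town B"),
  mmr   = c(95, 97),
  stringsAsFactors = FALSE
)
coords <- data.frame(
  state = "Example State",
  name  = c("Lincoln Elementary", "Lincoln Elementary"),
  city  = c("Town A", "Town B"),
  lat   = c(40.1, 41.2),   # made-up coordinates
  lng   = c(-80.1, -81.2),
  stringsAsFactors = FALSE
)

# Loose key: state + name only. Each rate row matches both coordinate rows.
loose <- merge(rates, coords[, c("state", "name", "lat", "lng")],
               by = c("state", "name"))

# Stricter key: adding city separates the two schools again.
strict <- merge(rates, coords, by = c("state", "name", "city"))

n_loose  <- nrow(loose)   # rows after the loose join
n_strict <- nrow(strict)  # rows after the stricter join
```

With the loose key each of the two schools picks up both sets of coordinates, giving four rows instead of two; including `city` in the key restores a one-to-one match. Selecting the identifier columns with `tidyselect::any_of()`, as the code above does, lets each state contribute whichever of these columns it actually has.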