  1. Measles Immunization Rates in US Schools

    • kaggle.com
    Updated May 14, 2024
    Cite
    PallaviSRane (2024). Measles Immunization Rates in US Schools [Dataset]. https://www.kaggle.com/datasets/pallavisrane/measles-immunization-rated-in-us-schools
    Dataset updated
    May 14, 2024
    Dataset provided by
    Kaggle
    Authors
    PallaviSRane
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    The dataset includes the overall and MMR-specific vaccination rates for 46,410 schools in 32 states

    The table contains the following columns:

    |variable |class   |description |
    |:--------|:---------|:-----------|
    |index  |double  | Index ID |
    |state  |character | School's state |
    |year   |character | School academic year|
    |name   |character | School name|
    |type   |character | Whether a school is public, private, charter |
    |city   |character | City |
    |county  |character | County |
    |district |character | School district |
    |enroll  |double  | Enrollment |
    |mmr   |double  | School's Measles, Mumps, and Rubella (MMR) vaccination rate |
    |overall |double  | School's overall vaccination rate|
    |xrel   |double | Percentage of students exempted from vaccination for religious reasons |
    |xmed   |double  | Percentage of students exempted from vaccination for medical reasons |
    |xper   |double  | Percentage of students exempted from vaccination for personal reasons |
    |lat   |double  | Latitude |
    |lng   |double  | Longitude |
    

    Acknowledgements:

    This data comes from #tidytuesday and was originally compiled by The Wall Street Journal, which recently published an article covering 46,412 schools across 32 US states.

    "This repository contains immunization rate data for schools across the U.S., as compiled by The Wall Street Journal. The dataset includes the overall and MMR-specific vaccination rates for 46,412 schools in 32 states. As used in "What's the Measles Vaccination Rate at Your Child's School?".

    Vaccination rates are for the 2017-18 school year for Colorado, Connecticut, Minnesota, Montana, New Jersey, New York, North Dakota, Pennsylvania, South Dakota, Utah and Washington. Rates for other states are 2018-19."
    (The #tidytuesday page mentions 46,412 records, but the file loads one fewer, and there is one duplicated record:

    283 New York 2017-18 Jackson Main Public Hempstead Nassau NA NA 100 -1 NA NA NA
    284 New York 2017-18 Jackson Main Public Hempstead Nassau NA NA 100 -1 NA NA NA

    Hence a total of 46,410 records once the duplicate is removed.)
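The duplicate check described above can be sketched in base R. This is a minimal illustration, not the author's code: the data frame is a toy, the "Example Elementary" row is invented, and it assumes the leading 283 and 284 are values of the index column, so that column is left out of the comparison.

```r
# Toy data, invented for illustration. Only the two "Jackson Main" rows
# mirror the duplicated record quoted above.
schools <- data.frame(
  index   = c(283, 284, 285),
  state   = c("New York", "New York", "New York"),
  year    = c("2017-18", "2017-18", "2017-18"),
  name    = c("Jackson Main", "Jackson Main", "Example Elementary"),
  city    = c("Hempstead", "Hempstead", "Example City"),
  mmr     = c(100, 100, 97.5),
  overall = c(-1, -1, 96),
  stringsAsFactors = FALSE
)

# Compare every column except the index, which differs between the copies.
compare_cols <- setdiff(names(schools), "index")

# duplicated() flags a row when an earlier row has identical values.
is_dup  <- duplicated(schools[, compare_cols])
deduped <- schools[!is_dup, ]

n_before <- nrow(schools)
n_after  <- nrow(deduped)
```

Here `n_before` is 3 and `n_after` is 2: the second "Jackson Main" row is dropped and the first is kept.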

    Data cleaning:

    The initial cleaning code from #tidytuesday had to be modified for two reasons:

    1. It was resulting in an error, possibly because the page that lists the URLs for the individual states has changed since the code was published.

    2. Latitude and longitude from the state files were joined to the original vaccination file on school name only. When a state had multiple schools with the same name, this produced a many-to-many match, resulting in a Cartesian product and duplicated rows.
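The many-to-many problem can be reproduced on a toy example. The sketch below uses base R `merge()` rather than the tidyverse joins of the cleaning code, and every school name, city, and coordinate in it is invented for illustration.

```r
# Hypothetical vaccination table: two schools in one state share a name.
rates <- data.frame(
  state = c("New York", "New York"),
  name  = c("Example Elementary", "Example Elementary"),
  city  = c("Town A", "Town B"),
  mmr   = c(95, 98),
  stringsAsFactors = FALSE
)

# Hypothetical coordinates table for the same two schools.
coords <- data.frame(
  state = c("New York", "New York"),
  name  = c("Example Elementary", "Example Elementary"),
  city  = c("Town A", "Town B"),
  lat   = c(40.1, 42.3),
  lng   = c(-73.5, -75.2),
  stringsAsFactors = FALSE
)

# Key of state and name only: each rate row matches both coordinate rows.
loose <- merge(rates, coords[, c("state", "name", "lat", "lng")],
               by = c("state", "name"), all.x = TRUE)

# Adding city to the key makes every match one-to-one.
strict <- merge(rates, coords, by = c("state", "name", "city"), all.x = TRUE)

n_loose  <- nrow(loose)
n_strict <- nrow(strict)
```

The loose join returns 4 rows for 2 schools, while the stricter key returns 2. This is the reason for adding district, county, and city to the join key wherever a state file provides them.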

    Code:

    The following code adds latitude and longitude to the original dataset and removes any duplication, giving 46,410 records.

    Modifications are mentioned in comments.

    library(tidyverse) # read_csv, dplyr verbs, stringr, purrr (assumed; the original listing shows no library calls)
    library(rvest)     # read_html, html_table

    url_wsj <- "https://raw.githubusercontent.com/WSJ/measles-data/master/all-measles-rates.csv"
    
    wsj <- read_csv(url_wsj)
    
    list_of_urls <- "https://github.com/WSJ/measles-data/tree/master/individual-states"
    
    raw_states <- list_of_urls %>% 
     read_html() %>% 
     html_table() %>% 
     .[[1]] %>% 
     select(1) %>% # changed select(Name) to select(1) because there were three columns with the header 'Name'
     mutate(Name = str_remove(Name, "\\.csv")) %>% # the dot must be escaped as "\\."; "\." is not a valid R string escape
     filter(str_length(Name) > 3, str_length(Name) < 20) %>% 
     pull(Name)
    
    # Added this line because the first element of the list was "parent directory.." and the last (33rd) element was "View all files"
    raw_states <- raw_states[2:32]
    
    all_states <- glue::glue("https://raw.githubusercontent.com/WSJ/measles-data/master/individual-states/{raw_states}.csv") %>% 
     map(read_csv)
    
    # As it turns out, not every state file has all of the state, city, county and district columns, so the original code limited the identifier columns to just state (plus school name).
    # Having only state and school name led to cross matching in states where multiple schools share the same name.
    # clean_states <- all_states %>% 
    #  map(~select(., state, name, lat, lng)) %>%  
    #  map(~mutate_at(., vars(lat, lng), as.numeric)) %>% 
    #  bind_rows() %>% 
    #  filter(!is.na(lat))
    
    # Hence, for each state, keep as many of "state", "name", "district", "county", "city" as that state's file provides
    clean_states <- all_states %>% 
     map(~select(., tidyselect::any_of(c("state", "name", "district", "county", "city", "lat","lng")))) %>% 
     map(~mutate_at(., v...
    