Which countries have the most social contacts in the world? In particular, do countries with more social contacts among the elderly report more deaths caused by a pandemic caused by a respiratory virus?
With the emergence of the COVID-19 pandemic, reports have shown that the elderly are at a higher risk of dying than any other age groups. 8 out of 10 deaths reported in the U.S. have been in adults 65 years old and older. Countries have also began to enforce 2km social distancing to contain the pandemic.
To this end, I wanted to explore the relationship between social contacts among the elderly and its relationship with the number of COVID-19 deaths across countries.
This dataset includes a subset of the projected social contact matrices in 152 countries from surveys Prem et al. 2020. It was based on the POLYMOD study where information on social contacts was obtained using cross-sectional surveys in Belgium (BE), Germany (DE), Finland (FI), Great Britain (GB), Italy (IT), Luxembourg (LU), The Netherlands (NL), and Poland (PL) between May 2005 and September 2006.
This dataset includes contact rates from study participants ages 65+ for all countries from all sources of contact (work, home, school and others).
I used this R code to extract this data:
load('../input/contacts.Rdata') # https://github.com/kieshaprem/covid19-agestructureSEIR-wuhan-social-distancing/blob/master/data/contacts.Rdata
View(contacts)
contacts[["ALB"]][["home"]]
contacts[["ITA"]][["all"]]
rowSums(contacts[["ALB"]][["all"]])
out1 = data.frame(); for (n in names(contacts)) { x = (contacts[[n]][["all"]])[16,]; out <- rbind(out, data.frame(x)) }
out2 = data.frame(); for (n in names(contacts)) { x = (contacts[[n]][["all"]])[15,]; out <- rbind(out, data.frame(x)) }
out3 = data.frame(); for (n in names(contacts)) { x = (contacts[[n]][["all"]])[14,]; out <- rbind(out, data.frame(x)) }
m1 = data.frame(t(matrix(unlist(out1), nrow=16)))
m2 = data.frame(t(matrix(unlist(out2), nrow=16)))
m3 = data.frame(t(matrix(unlist(out3), nrow=16)))
rownames(m1) = names(contacts)
colnames(m1) = c("00_04", "05_09", "10_14", "15_19", "20_24", "25_29", "30_34", "35_39", "40_44", "45_49", "50_54", "55_59", "60_64", "65_69", "70_74", "75_79")
rownames(m2) = rownames(m1)
rownames(m3) = rownames(m1)
colnames(m2) = colnames(m1)
colnames(m3) = colnames(m1)
write.csv(zapsmall(m1),"contacts_75_79.csv", row.names = TRUE)
write.csv(zapsmall(m2),"contacts_70_74.csv", row.names = TRUE)
write.csv(zapsmall(m3),"contacts_65_69.csv", row.names = TRUE)
Rows names correspond to the 3 letter country ISO code, e.g. ITA represents Italy. Column names are the age groups of the individuals contacted in 5 year intervals from 0 to 80 years old. Cell values are the projected mean social contact rate.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F1139998%2Ffa3ddc065ea46009e345f24ab0d905d2%2Fcontact_distribution.png?generation=1588258740223812&alt=media" alt="">
Thanks goes to Dr. Kiesha Prem for her correspondence and her team for publishing their work on social contact matrices.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the supplementary materials (Supplementary_figures.docx, Supplementary_tables.docx) of the manuscript: "Spatio-temporal dynamics of attacks around deaths of wolves: A statistical assessment of lethal control efficiency in France". This repository also provides the R codes and datasets necessary to run the analyses described in the manuscript.
The R datasets with suffix "_a" have anonymous spatial coordinates to respect confidentiality. Therefore, the preliminary preparation of the data is not provided in the public codes. These datasets, all geolocated and necessary to the analyses, are:
These datasets were used to run the following analyses codes:
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the content of the subset of all files with a correct publication date from the 2017 release of files related to the JFK case (retrieved from https://www.archives.gov/research/jfk/2017-release). This content was extracted from the source PDF files using the R OCR libraries tesseract and pdftools.
The code to derive the dataset is given as follows:
library(tesseract) library(pdftools)
pdfs <- list.files("[path to your output directory containing all PDF files]")
meta <- read.csv2("[path to your input directory]/jfkrelease-2017-dce65d0ec70a54d5744de17d280f3ad2.csv",header = T,sep = ',') #the meta file containing all metadata for the PDF files (e.g. publication date)
meta$Doc.Date <- as.character(meta$Doc.Date)
meta.clean <- meta[-which(meta$Doc.Date=="" | grepl("/0000",meta$Doc.Date)),] for(i in 1:nrow(meta.clean)){ meta.clean$Doc.Date[i] <- gsub("00","01",meta.clean$Doc.Date[i])
if(nchar(meta.clean$Doc.Date[i])<10){ meta.clean$Doc.Date[i]<-format(strptime(meta.clean$Doc.Date[i],format = "%d/%m/%y"),"%m/%d/%Y") }
}
meta.clean$Doc.Date <- strptime(meta.clean$Doc.Date,format = "%m/%d/%Y")
meta.clean <- meta.clean[order(meta.clean$Doc.Date),]
docs <- data.frame(content=character(0),dpub=character(0),stringsAsFactors = F) for(i in 1:nrow(meta.clean)){
pdf_prop <- pdftools::pdf_info(paste0("[path to your output directory]/",tolower(meta.clean$File.Name[i]))) tmp_files <- c() for(k in 1:pdf_prop$pages){ tmp_files <- c(tmp_files,paste0("/home/STAFF/luczakma/RProjects/JFK/data/tmp/",k)) }
img_file <- pdftools::pdf_convert(paste0("[path to your output directory]/",tolower(meta.clean$File.Name[i])), format = 'tiff', pages = NULL, dpi = 700,filenames = tmp_files)
txt <- ""
for(j in 1:length(img_file)){ extract <- ocr(img_file[j], engine = tesseract("eng")) #unlink(img_file) txt <- paste(txt,extract,collapse = " ") }
docs <- rbind(docs,data.frame(content=iconv(tolower(gsub("\s+"," ",gsub("[[:punct:]]|[ ]"," ",txt))),to="UTF-8"),dpub=format(meta.clean$Doc.Date[i],"%Y/%m/%d"),stringsAsFactors = F),stringsAsFactors = F) }
write.table(docs,"[path to your output directory]/documents.csv", row.names = F)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Possum Regression’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/abrambeyer/openintro-possum on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Can you use your regression skills to predict the age of a possum, its head length, whether it is male or female? This classic practice regression dataset comes originally from the DAAG R package (datasets used in examples and exercises in the book Maindonald, J.H. and Braun, W.J. (2003, 2007, 2010) "Data Analysis and Graphics Using R"). This dataset is also used in the OpenIntro Statistics book chapter 8 Introduction to linear regression.
From the DAAG R package: "*The possum data frame consists of nine morphometric measurements on each of 104 mountain brushtail possums, trapped at seven sites from Southern Victoria to central Queensland*."
Data originally found in the DAAG R package and used in the book Maindonald, J.H. and Braun, W.J. (2003, 2007, 2010) "Data Analysis and Graphics Using R").
A subset of the data was also put together for the OpenIntro Statistics book chapter 8 Introduction to linear regression.
Original Source of dataset: Lindenmayer, D. B., Viggers, K. L., Cunningham, R. B., and Donnelly, C. F. 1995. Morphological variation among columns of the mountain brushtail possum, Trichosurus caninus Ogilby (Phalangeridae: Marsupiala). Australian Journal of Zoology 43: 449-458.
Get your feet wet with regression techniques here on Kaggle by using this dataset. Perfect for beginners since the OpenIntro Statistics book does a good explanation in Chapter 8.
--- Original source retains full ownership of the source dataset ---
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Which countries have the most social contacts in the world? In particular, do countries with more social contacts among the elderly report more deaths caused by a pandemic caused by a respiratory virus?
With the emergence of the COVID-19 pandemic, reports have shown that the elderly are at a higher risk of dying than any other age groups. 8 out of 10 deaths reported in the U.S. have been in adults 65 years old and older. Countries have also began to enforce 2km social distancing to contain the pandemic.
To this end, I wanted to explore the relationship between social contacts among the elderly and its relationship with the number of COVID-19 deaths across countries.
This dataset includes a subset of the projected social contact matrices in 152 countries from surveys Prem et al. 2020. It was based on the POLYMOD study where information on social contacts was obtained using cross-sectional surveys in Belgium (BE), Germany (DE), Finland (FI), Great Britain (GB), Italy (IT), Luxembourg (LU), The Netherlands (NL), and Poland (PL) between May 2005 and September 2006.
This dataset includes contact rates from study participants ages 65+ for all countries from all sources of contact (work, home, school and others).
I used this R code to extract this data:
load('../input/contacts.Rdata') # https://github.com/kieshaprem/covid19-agestructureSEIR-wuhan-social-distancing/blob/master/data/contacts.Rdata
View(contacts)
contacts[["ALB"]][["home"]]
contacts[["ITA"]][["all"]]
rowSums(contacts[["ALB"]][["all"]])
out1 = data.frame(); for (n in names(contacts)) { x = (contacts[[n]][["all"]])[16,]; out <- rbind(out, data.frame(x)) }
out2 = data.frame(); for (n in names(contacts)) { x = (contacts[[n]][["all"]])[15,]; out <- rbind(out, data.frame(x)) }
out3 = data.frame(); for (n in names(contacts)) { x = (contacts[[n]][["all"]])[14,]; out <- rbind(out, data.frame(x)) }
m1 = data.frame(t(matrix(unlist(out1), nrow=16)))
m2 = data.frame(t(matrix(unlist(out2), nrow=16)))
m3 = data.frame(t(matrix(unlist(out3), nrow=16)))
rownames(m1) = names(contacts)
colnames(m1) = c("00_04", "05_09", "10_14", "15_19", "20_24", "25_29", "30_34", "35_39", "40_44", "45_49", "50_54", "55_59", "60_64", "65_69", "70_74", "75_79")
rownames(m2) = rownames(m1)
rownames(m3) = rownames(m1)
colnames(m2) = colnames(m1)
colnames(m3) = colnames(m1)
write.csv(zapsmall(m1),"contacts_75_79.csv", row.names = TRUE)
write.csv(zapsmall(m2),"contacts_70_74.csv", row.names = TRUE)
write.csv(zapsmall(m3),"contacts_65_69.csv", row.names = TRUE)
Rows names correspond to the 3 letter country ISO code, e.g. ITA represents Italy. Column names are the age groups of the individuals contacted in 5 year intervals from 0 to 80 years old. Cell values are the projected mean social contact rate.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F1139998%2Ffa3ddc065ea46009e345f24ab0d905d2%2Fcontact_distribution.png?generation=1588258740223812&alt=media" alt="">
Thanks goes to Dr. Kiesha Prem for her correspondence and her team for publishing their work on social contact matrices.