4 datasets found

Social Contacts
Projected age and location specific mean contact rates across 152 countries
kaggle.com
Updated Apr 30, 2020
Cite
Patrick (2020). Social Contacts [Dataset]. https://www.kaggle.com/datasets/bitsnpieces/social-contacts/discussion
Dataset updated
Apr 30, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Patrick
Description
Inspiration

Which countries have the most social contacts in the world? In particular, do countries where the elderly have more social contacts report more deaths during a pandemic of a respiratory virus?

Context

With the emergence of the COVID-19 pandemic, reports have shown that the elderly are at a higher risk of dying than any other age group: 8 out of 10 deaths reported in the U.S. have been among adults aged 65 and older. Countries have also begun to enforce 2 m social distancing to contain the pandemic.

To this end, I wanted to explore the relationship between social contacts among the elderly and the number of COVID-19 deaths across countries.

Content

This dataset includes a subset of the projected social contact matrices for 152 countries from Prem et al. 2020. The projections are based on the POLYMOD study, in which social contact data were collected through cross-sectional surveys in Belgium (BE), Germany (DE), Finland (FI), Great Britain (GB), Italy (IT), Luxembourg (LU), the Netherlands (NL), and Poland (PL) between May 2005 and September 2006.

This dataset includes contact rates for study participants aged 65 and over, for all countries and all settings of contact (work, home, school, and other).

I used this R code to extract this data:

# Source: https://github.com/kieshaprem/covid19-agestructureSEIR-wuhan-social-distancing/blob/master/data/contacts.Rdata
load('../input/contacts.Rdata')
# Quick inspection of the loaded list (View() opens a spreadsheet-style viewer)
View(contacts)
contacts[["ALB"]][["home"]]
contacts[["ITA"]][["all"]]
rowSums(contacts[["ALB"]][["all"]])
# For every country, collect one participant age row of the "all" matrix:
# row 16 = 75-79, row 15 = 70-74, row 14 = 65-69
out1 = data.frame(); for (n in names(contacts)) { x = (contacts[[n]][["all"]])[16,]; out1 <- rbind(out1, data.frame(x)) }
out2 = data.frame(); for (n in names(contacts)) { x = (contacts[[n]][["all"]])[15,]; out2 <- rbind(out2, data.frame(x)) }
out3 = data.frame(); for (n in names(contacts)) { x = (contacts[[n]][["all"]])[14,]; out3 <- rbind(out3, data.frame(x)) }
# Reshape each stacked column into a country x contact-age table (16 values per country)
m1 = data.frame(t(matrix(unlist(out1), nrow=16)))
m2 = data.frame(t(matrix(unlist(out2), nrow=16)))
m3 = data.frame(t(matrix(unlist(out3), nrow=16)))
rownames(m1) = names(contacts)
colnames(m1) = c("00_04", "05_09", "10_14", "15_19", "20_24", "25_29", "30_34", "35_39", "40_44", "45_49", "50_54", "55_59", "60_64", "65_69", "70_74", "75_79")
rownames(m2) = rownames(m1)
rownames(m3) = rownames(m1)
colnames(m2) = colnames(m1)
colnames(m3) = colnames(m1)
write.csv(zapsmall(m1),"contacts_75_79.csv", row.names = TRUE)
write.csv(zapsmall(m2),"contacts_70_74.csv", row.names = TRUE)
write.csv(zapsmall(m3),"contacts_65_69.csv", row.names = TRUE)
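The three near-identical loops above can also be written with sapply(), which returns one row per country directly. This is a minimal sketch, assuming (as the extraction script's indexing implies) that each contacts[[country]][["all"]] is a 16 x 16 matrix with participant age bands as rows; the toy contacts list and the helper extract_band() are hypothetical.

```r
# Toy stand-in for the `contacts` list in contacts.Rdata (random values, illustration only).
# Each country holds a 16 x 16 "all" matrix: rows = participant age band, columns = contact age band.
set.seed(1)
contacts <- list(
  ALB = list(all = matrix(runif(256), nrow = 16)),
  ITA = list(all = matrix(runif(256), nrow = 16))
)

age_bands <- c("00_04", "05_09", "10_14", "15_19", "20_24", "25_29", "30_34", "35_39",
               "40_44", "45_49", "50_54", "55_59", "60_64", "65_69", "70_74", "75_79")

# extract_band(): hypothetical helper returning one participant-age row per country
extract_band <- function(contacts, row) {
  # sapply gives a 16 x n_countries matrix; t() turns it into countries x contact ages
  m <- t(sapply(contacts, function(country) country[["all"]][row, ]))
  colnames(m) <- age_bands
  as.data.frame(m)
}

m1 <- extract_band(contacts, 16)  # participants aged 75-79
m2 <- extract_band(contacts, 15)  # participants aged 70-74
m3 <- extract_band(contacts, 14)  # participants aged 65-69
```

The country codes become row names automatically because sapply() keeps the list names, so no separate rownames() step is needed before write.csv().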

Row names are three-letter ISO country codes (e.g. ITA for Italy). Column names are the age groups of the contacted individuals, in 5-year bands from 00_04 to 75_79. Cell values are the projected mean social contact rates. Each CSV file covers one participant age group (65-69, 70-74, or 75-79).
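To approach the question in Inspiration, the exported CSVs can be ranked by each country's total contact rate, i.e. the row sum across the 16 contact age bands (the same rowSums() the extraction script uses for inspection). A minimal sketch; rank_by_total_contacts() is a hypothetical helper and the toy rates are made up. When reading the real files, check.names = FALSE keeps column names such as 00_04 instead of letting read.csv rename them X00_04.

```r
# rank_by_total_contacts(): hypothetical helper; sums the contact-age columns per country
rank_by_total_contacts <- function(df) {
  totals <- rowSums(df)
  sort(totals, decreasing = TRUE)
}

# With a real file (assumed to be in the working directory):
# df <- read.csv("contacts_65_69.csv", row.names = 1, check.names = FALSE)
# Toy example: made-up rates for three countries and two contact age bands
df <- data.frame(`00_04` = c(0.2, 0.5, 0.1), `05_09` = c(0.3, 0.4, 0.1),
                 row.names = c("ALB", "ITA", "JPN"), check.names = FALSE)
ranked <- rank_by_total_contacts(df)
print(head(ranked, 10))
```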

[Figure: contact_distribution.png]

Acknowledgements

Thanks go to Dr. Kiesha Prem for her correspondence, and to her team for publishing their work on social contact matrices.

References

The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study

Projecting social contact matrices in 152 countries using contact surveys and demographic data

Social Contacts and Mixing Patterns Relevant to the Spread of Infectious Diseases (POLYMOD study)

Related resources

My starter notebook

http://www.socialcontactdata.org/

https://www.kaggle.com/tsubasatwi/close-contact-status-of-corona-in-japan

Facebook Data for Good Mobility Dashboard
Data from: Spatio-temporal dynamics of attacks around deaths of wolves: A...
zenodo.org
data.niaid.nih.gov
bin, csv
Updated Feb 19, 2025
Cite
Oksana Grente; Thomas Opitz; Christophe Duchamp; Nolwenn Drouet-Hoguet; Simon Chamaillé-Jammes; Olivier Gimenez (2025). Spatio-temporal dynamics of attacks around deaths of wolves: A statistical assessment of lethal control efficiency in France [Dataset]. http://doi.org/10.5281/zenodo.14893823
Available download formats: bin, csv
Unique identifier
https://doi.org/10.5281/zenodo.14893823
Dataset updated
Feb 19, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Oksana Grente; Thomas Opitz; Christophe Duchamp; Nolwenn Drouet-Hoguet; Simon Chamaillé-Jammes; Olivier Gimenez
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
France
Description
This repository contains the supplementary materials (Supplementary_figures.docx, Supplementary_tables.docx) of the manuscript "Spatio-temporal dynamics of attacks around deaths of wolves: A statistical assessment of lethal control efficiency in France". It also provides the R code and datasets needed to run the analyses described in the manuscript.

The R datasets with the suffix "_a" have anonymized spatial coordinates to preserve confidentiality. For this reason, the preliminary data preparation is not included in the public code. These datasets, all geolocated and required for the analyses, are:

Attack_sf_a.RData: 19,302 analyzed wolf attacks on sheep

ID: unique ID of the attack

DATE: date of the attack

PASTURE: the related pasture ID from "Pasture_sf_a" where the attack is located

STATUS: column resulting from the preparation and the attribution of attacks to pastures (part 2.2.4 of the manuscript); not shown here to respect confidentiality

Pasture_sf_a.RData: 4,987 analyzed pastures grazed by sheep

ID: unique ID of the pasture

CODE: official code in the pastoral census

FLOCK_SIZE: maximum annual number of sheep grazing in the pasture

USED_MONTHS: months for which the pasture is grazed by sheep

Removal_sf_a.RData: 232 analyzed single wolf removals or groups of wolf removals

ID: unique ID of the removal

OVERLAP: whether the removal is single ("non-interacting" in the manuscript, "NO" here) or not ("interacting" in the manuscript; here "SIMULTANEOUS" for removals that occurred during the same operation, or "NON-SIMULTANEOUS" otherwise).

DATE_MIN: date of the single removal or date of the first removal of a group

DATE_MAX: date of the single removal or date of the last removal of a group

CLASS: administrative type of the removal, according to the definitions in part 2.1 of the manuscript

SEX: sex or sexes of the removed wolves if known

AGE: age class of the removed wolves, if known

BREEDER: breeding status of the removed female wolves: "Yes" for a breeding female, "No" for a non-breeding female. Males are "No" by default when necropsied; NA marks dead individuals that were not found.

SEASON: season of the removal, as defined in part 2.3.4 of the manuscript

MASSIF: mountain range attributed to the removal, as defined in part 2.3.4 of the manuscript

Area_to_exclude_sf_a.RData: one row for each mountain range, corresponding to the area where removal controls of the mountain range could not be sampled, as defined in part 2.3.6 of the manuscript
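The OVERLAP codes map onto the manuscript's terminology as described in the field list. A minimal sketch; overlap_to_manuscript() is hypothetical, and the commented usage assumes the object stored in Removal_sf_a.RData is named Removal_sf_a, which this description does not state.

```r
# overlap_to_manuscript(): hypothetical helper translating OVERLAP codes into manuscript terms.
# "NO" -> non-interacting; "SIMULTANEOUS" / "NON-SIMULTANEOUS" -> interacting; anything else -> NA
overlap_to_manuscript <- function(overlap) {
  ifelse(overlap == "NO", "non-interacting",
         ifelse(overlap %in% c("SIMULTANEOUS", "NON-SIMULTANEOUS"),
                "interacting", NA_character_))
}

# Usage on the real data (object name assumed):
# load("Removal_sf_a.RData")
# Removal_sf_a$INTERACTION <- overlap_to_manuscript(Removal_sf_a$OVERLAP)
overlap_to_manuscript(c("NO", "SIMULTANEOUS", "NON-SIMULTANEOUS"))
```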

These datasets are used by the following analysis scripts:

Code 1: The file Kernel_wolf_culling_attacks_p.R contains the before-after analyses.

We start by delimiting the spatio-temporal buffer for each row of the "Removal_sf_a.RData" dataset.

We identify the attacks from "Attack_sf_a.RData" within each buffer, giving the data frame "Buffer_df" (one row per attack)

We select the pastures from "Pasture_sf_a.RData" within each buffer, giving the data frame "Buffer_sf" (one row per removal)
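The buffer step can be pictured as a distance-and-date filter around each removal. This is a minimal sketch on plain projected coordinates, not the script's own code: in_buffer() and its radius_m parameter are hypothetical (the buffer radius is defined in the manuscript, not in this description), and the half-open window of 90 days before to 90 days after the removal is an assumption chosen to match the 180 daily columns of the pastoral matrix. With the sf objects, sf::st_distance() would replace the Euclidean formula.

```r
# in_buffer(): hypothetical helper, not taken from Kernel_wolf_culling_attacks_p.R.
# radius_m is an assumed parameter; the window [-90, 90) days gives 180 daily columns.
in_buffer <- function(att_x, att_y, att_date, rem_x, rem_y, rem_date,
                      radius_m, half_window_days = 90) {
  dist <- sqrt((att_x - rem_x)^2 + (att_y - rem_y)^2)       # Euclidean distance, projected units
  lag  <- as.numeric(att_date - rem_date, units = "days")   # days relative to the removal
  dist <= radius_m & lag >= -half_window_days & lag < half_window_days
}

# Toy attacks around one removal at (0, 0) on 2015-06-01 (made-up values)
rem_date <- as.Date("2015-06-01")
attacks <- data.frame(x = c(500, 5000, 100),
                      y = c(0, 0, 200),
                      date = rem_date + c(10, 10, -100))
keep <- in_buffer(attacks$x, attacks$y, attacks$date, 0, 0, rem_date, radius_m = 3000)
keep
```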

We calculate the spatial correction

We spatially slice each buffer into 200 rings, giving the data frame "Ring_sf" (one row per ring)

For each attack in each buffer, we add to Buffer_df the total pastoral area of the ring containing the attack ("SPATIAL_WEIGHT"), giving "Buffer_df.RData"
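Slicing a buffer into 200 rings and attaching the ring's pastoral area to each attack amounts to a lookup by ring index. A minimal sketch, assuming rings of equal width (the manuscript may define them differently); ring_index() and the per-ring pastoral areas are hypothetical.

```r
# ring_index(): hypothetical; assumes 200 rings of equal width from the removal out to radius_m
ring_index <- function(dist, radius_m, n_rings = 200) {
  width <- radius_m / n_rings
  # ring 1 is innermost; a distance of exactly 0 is placed in ring 1
  pmin(n_rings, pmax(1, ceiling(dist / width)))
}

# Made-up pastoral area per ring; SPATIAL_WEIGHT is the area of the ring holding each attack
ring_pastoral_area <- seq(10, 2000, length.out = 200)
dist <- c(0, 16, 3000)
ring <- ring_index(dist, radius_m = 3000)
spatial_weight <- ring_pastoral_area[ring]
```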

We calculate the pastoral correction

We create the pastoral matrix for each removal: a matrix of 200 rows (one per ring) and 180 columns (one per day, from 90 days before to 90 days after the removal date), where each cell holds the total pastoral area in use by sheep (one element per removal, "Pastoral_matrix_lt.RData")

We simulate, for each removal, the random distribution of the attacks from "Buffer_df.RData" according to "Pastoral_matrix_lt.RData". The process is done 100 times (one element per simulation, "Buffer_simulation_lt.RData").
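The simulation step redistributes a removal's attacks at random over the (ring, day) cells of its pastoral matrix. A minimal sketch, assuming each simulated attack lands in a cell with probability proportional to the pastoral area in use there; simulate_attacks() and the toy matrix are hypothetical.

```r
# simulate_attacks(): hypothetical helper. Each simulated attack falls in a (ring, day) cell
# drawn with probability proportional to the pastoral area in use in that cell.
simulate_attacks <- function(pastoral_matrix, n_attacks, n_sim = 100) {
  lapply(seq_len(n_sim), function(s) {
    cell <- sample(length(pastoral_matrix), n_attacks, replace = TRUE,
                   prob = as.vector(pastoral_matrix))
    idx <- arrayInd(cell, dim(pastoral_matrix))   # linear index -> (row, column)
    data.frame(RING = idx[, 1], DAY = idx[, 2])
  })
}

# Toy 200 x 180 pastoral matrix (made-up): sheep use only two cells, one with twice the area
pastoral <- matrix(0, nrow = 200, ncol = 180)
pastoral[5, 120] <- 2
pastoral[50, 30] <- 1
set.seed(1)
sims <- simulate_attacks(pastoral, n_attacks = 30)
```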

We estimate the attack intensities

We classify the removals into 20 subsets, according to part 2.3.4 of the manuscript ("Variables_lt.RData") (one element per subset)

We perform, for each subset, the kernel estimations with the observed attacks ("Kernel_lt.RData"), with the simulated attacks ("Kernel_simulation_lt.RData") and we correct the first kernel computations with the second ("Kernel_controlled_lt.RData") (one element per subset).

We calculate the trend of attack intensities, for each subset, that compares the total attack intensity before and after the removals (part 2.3.5 of the manuscript), giving "Trends_intensities_df.RData". (one row per subset)

For each subset, we also calculate the trend of attack intensities along the spatial axis, three times, once for each time analysis scale. This gives "Shift_df" (one row per ring and per time analysis scale).
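A before-after trend compares total intensity in the 90 days before a removal with the 90 days after. A minimal sketch, assuming an intensity matrix laid out like the pastoral matrix (rings x 180 days) and a relative-change trend; trend_before_after() is hypothetical and the manuscript's own trend definition (part 2.3.5) may differ.

```r
# trend_before_after(): hypothetical; the manuscript's trend definition (part 2.3.5) may differ.
# intensity: rings x 180 days matrix, columns 1-90 before the removal, 91-180 after.
trend_before_after <- function(intensity, n_before = 90) {
  before <- sum(intensity[, seq_len(n_before)])
  after  <- sum(intensity[, -seq_len(n_before)])
  (after - before) / before   # relative change in total attack intensity
}

toy <- cbind(matrix(1, 200, 90), matrix(0.5, 200, 90))   # made-up intensities
trend_before_after(toy)
```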

Code 2: The file Control_removals_p.R contains the control-impact analyses.

It starts with the simulation of 100 removal control sets ("Control_sf_lt_a.RData") from the real set of removals ("Removal_sf_a.RData"), that is done with the function "Control_fn" (l. 92).

The rest of the analyses follows the same process as in the first code "Kernel_wolf_culling_attacks_p.R", in order to apply the before-after analyses to each control set. All objects have the same structure as before, except that they are now a list, with one resulting element per control set. These objects have "control" in their names (not to be confused with "controlled" which refers to the pastoral correction already applied in the first code).

The code from l. 92 to l. 433 is then applied again, this time to the real set of removals (l. 121), with "Simulated = FALSE" (l. 119). The results from the first code could not simply be reused because here the set of removals is restricted to removals attributed to mountain ranges. This gives two objects: "Kernel_real_lt.RData" (observed real trends) and "Kernel_controlled_real_lt.RData" (real trends corrected for pastoral use).

Lines 439 to 524 calculate the trends (for the real set and the control sets), as in the first code, giving "Trends_intensities_real_df.RData" and "Trends_intensities_control_lt.RData".

Lines 530 to 588 calculate the means and 95% confidence intervals of the intensity trends for each subset, based on the results of the 100 control sets ("Trends_intensities_mean_control_df.RData", "Trends_intensities_CImin_control_df.RData" and "Trends_intensities_CImax_control_df.RData"). These are used to test the significance of the real trends. This comparison is done immediately afterwards (l. 595-627) and gives the data frame "Trends_comparison_df.RData".
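The control-impact test can be read as checking whether the real trend falls outside the spread of trends obtained from the 100 control sets. A minimal sketch, assuming empirical percentile intervals (2.5% and 97.5% quantiles), which this description does not specify; compare_to_controls() and its output column names are hypothetical and only loosely mirror Trends_comparison_df.

```r
# compare_to_controls(): hypothetical; 95% interval taken as the 2.5% and 97.5% empirical
# quantiles of the control trends, and the real trend is flagged when it lies outside it
compare_to_controls <- function(real_trend, control_trends) {
  ci <- quantile(control_trends, c(0.025, 0.975), names = FALSE)
  data.frame(REAL = real_trend,
             MEAN_CONTROL = mean(control_trends),
             CI_MIN = ci[1],
             CI_MAX = ci[2],
             SIGNIFICANT = real_trend < ci[1] | real_trend > ci[2])
}

# Made-up trends for 100 control sets and two candidate real trends
ctrl <- seq(-0.1, 0.1, length.out = 100)
compare_to_controls(-0.5, ctrl)
compare_to_controls(0, ctrl)
```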

Code 3: The file Figures.R produces some of the figures in the manuscript:

"Dataset map": figure 1

"Buffer": figure 2 (then pasted into PowerPoint)

"Kernel construction": figure 5 (then pasted into PowerPoint)

"Trend distributions": figure 7

"Kernels": part of figures 10 and S2

"Attack shifts": figures 9 and S1

"Significant": figure 8
A dataset for temporal analysis of files related to the JFK case
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Cite
Luczak-Roesch, Markus (2020). A dataset for temporal analysis of files related to the JFK case [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1042153
Dataset updated
Jan 24, 2020
Dataset authored and provided by
Luczak-Roesch, Markus
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the text content of the subset of files from the 2017 release of records related to the JFK case (retrieved from https://www.archives.gov/research/jfk/2017-release) that have a valid publication date. The text was extracted from the source PDF files with the R packages pdftools (to render each page as an image) and tesseract (for OCR).

The code to derive the dataset is given as follows:

BEGIN R DATA PROCESSING SCRIPT

library(tesseract)
library(pdftools)

pdfs <- list.files("[path to your output directory containing all PDF files]")

# the meta file containing all metadata for the PDF files (e.g. publication date)
meta <- read.csv("[path to your input directory]/jfkrelease-2017-dce65d0ec70a54d5744de17d280f3ad2.csv", header = TRUE, stringsAsFactors = FALSE)

meta$Doc.Date <- as.character(meta$Doc.Date)

# drop documents with an empty date or an unknown year ("/0000")
meta.clean <- meta[!(meta$Doc.Date %in% "" | grepl("/0000", meta$Doc.Date)), ]
for(i in 1:nrow(meta.clean)){
  # replace an unknown day or month ("00") with "01"
  meta.clean$Doc.Date[i] <- gsub("00", "01", meta.clean$Doc.Date[i])
  # short dates are parsed as day/month/two-digit year and rewritten as mm/dd/yyyy
  if(nchar(meta.clean$Doc.Date[i]) < 10){
    meta.clean$Doc.Date[i] <- format(strptime(meta.clean$Doc.Date[i], format = "%d/%m/%y"), "%m/%d/%Y")
  }
}

meta.clean$Doc.Date <- as.Date(meta.clean$Doc.Date, format = "%m/%d/%Y")

meta.clean <- meta.clean[order(meta.clean$Doc.Date), ]

docs <- data.frame(content = character(0), dpub = character(0), stringsAsFactors = FALSE)
for(i in 1:nrow(meta.clean)){
  pdf_path <- paste0("[path to your output directory]/", tolower(meta.clean$File.Name[i]))
  pdf_prop <- pdftools::pdf_info(pdf_path)
  # one temporary image file per page
  tmp_files <- paste0("[path to a temporary directory]/", seq_len(pdf_prop$pages))

  img_file <- pdftools::pdf_convert(pdf_path, format = 'tiff', pages = NULL, dpi = 700, filenames = tmp_files)

  txt <- ""
  for(j in 1:length(img_file)){
    extract <- ocr(img_file[j], engine = tesseract("eng"))
    #unlink(img_file)
    txt <- paste(txt, extract, collapse = " ")
  }

  # replace punctuation with spaces, collapse whitespace, lowercase, and store with the publication date
  docs <- rbind(docs, data.frame(content = iconv(tolower(gsub("\\s+", " ", gsub("[[:punct:]]|[ ]", " ", txt))), to = "UTF-8"), dpub = format(meta.clean$Doc.Date[i], "%Y/%m/%d"), stringsAsFactors = FALSE), stringsAsFactors = FALSE)
}

write.table(docs, "[path to your output directory]/documents.csv", row.names = FALSE)

END R DATA PROCESSING SCRIPT
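The text cleaning applied to each OCR'd document can be isolated and checked on its own. A minimal sketch that mirrors the gsub/tolower/iconv chain of the processing script; normalize_text() is a hypothetical name, and the sample string is made up.

```r
# normalize_text(): hypothetical helper mirroring the cleaning chain in the processing script
normalize_text <- function(txt) {
  txt <- gsub("[[:punct:]]|[ ]", " ", txt)   # punctuation (and spaces) become spaces
  txt <- gsub("\\s+", " ", txt)              # collapse runs of whitespace, including newlines
  iconv(tolower(txt), to = "UTF-8")
}

normalize_text("MEMO: Oswald,  Lee H.\nDallas")
```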
‘Possum Regression’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Possum Regression’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-possum-regression-0ed3/6b42ebdb/?iid=005-529&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Possum Regression’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/abrambeyer/openintro-possum on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

Can you use your regression skills to predict a possum's age, its head length, or whether it is male or female? This classic practice regression dataset comes originally from the DAAG R package (datasets used in examples and exercises in the book Maindonald, J.H. and Braun, W.J. (2003, 2007, 2010) "Data Analysis and Graphics Using R"). It is also used in Chapter 8, "Introduction to linear regression", of the OpenIntro Statistics book.

Content

From the DAAG R package: "The possum data frame consists of nine morphometric measurements on each of 104 mountain brushtail possums, trapped at seven sites from Southern Victoria to central Queensland."

Acknowledgements

Data originally found in the DAAG R package and used in the book Maindonald, J.H. and Braun, W.J. (2003, 2007, 2010) "Data Analysis and Graphics Using R".

A subset of the data was also prepared for Chapter 8, "Introduction to linear regression", of the OpenIntro Statistics book.

Original source of the dataset: Lindenmayer, D. B., Viggers, K. L., Cunningham, R. B., and Donnelly, C. F. 1995. Morphological variation among populations of the mountain brushtail possum, Trichosurus caninus Ogilby (Phalangeridae: Marsupialia). Australian Journal of Zoology 43: 449-458.

Inspiration

Get your feet wet with regression techniques here on Kaggle by using this dataset. It is well suited to beginners, since Chapter 8 of the OpenIntro Statistics book explains the techniques clearly.

Can we use total length to predict a possum's head length?

Which possum body dimensions are most correlated with age and sex?

Can we classify a possum's sex by its body dimensions and location?

Can we predict a possum's trapping location from its body dimensions?
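The first question is a simple linear regression. A minimal sketch on synthetic data: the 104 rows only mimic the dataset's size, the coefficients 42 and 0.57 are arbitrary, and the column names hdlngth and totlngth are assumed from the DAAG package.

```r
# Synthetic stand-in for the possum data: values are made up, not real measurements.
# With the real data, replace `possum` with DAAG::possum or the Kaggle CSV
# (column names hdlngth and totlngth are assumed from the DAAG package).
set.seed(42)
possum <- data.frame(totlngth = runif(104, 75, 97))
possum$hdlngth <- 42 + 0.57 * possum$totlngth + rnorm(104, sd = 2.5)

# Simple linear regression of head length on total length
fit <- lm(hdlngth ~ totlngth, data = possum)
summary(fit)$coefficients   # intercept and slope with standard errors
summary(fit)$r.squared      # share of head-length variance explained by total length
```

The same lm() call extends to the other questions by adding predictors (e.g. site or sex), while classifying sex would call for a logistic model via glm(..., family = binomial).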

--- Original source retains full ownership of the source dataset ---