4 datasets found
  1. Social Contacts

    Projected age and location specific mean contact rates across 152 countries

    • kaggle.com
    Updated Apr 30, 2020
    Cite
    Patrick (2020). Social Contacts [Dataset]. https://www.kaggle.com/datasets/bitsnpieces/social-contacts/discussion
    Dataset updated
    Apr 30, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Patrick
    Description

    Inspiration

    Which countries have the most social contacts in the world? In particular, do countries with more social contacts among the elderly report more deaths in a pandemic caused by a respiratory virus?

    Context

    With the emergence of the COVID-19 pandemic, reports have shown that the elderly are at a higher risk of dying than other age groups: 8 out of 10 deaths reported in the U.S. have been in adults 65 years old and older. Countries have also begun to enforce 2 km social-distancing rules to contain the pandemic.

    To this end, I wanted to explore the relationship between social contacts among the elderly and the number of COVID-19 deaths across countries.

    Content

    This dataset includes a subset of the projected social contact matrices for 152 countries from Prem et al. (2020). The projections are based on the POLYMOD study, in which information on social contacts was obtained using cross-sectional surveys in Belgium (BE), Germany (DE), Finland (FI), Great Britain (GB), Italy (IT), Luxembourg (LU), the Netherlands (NL), and Poland (PL) between May 2005 and September 2006.

    This dataset includes contact rates for study participants aged 65+ in all countries, across all sources of contact (work, home, school, and others).

    I used the following R code to extract the data:

    load('../input/contacts.Rdata') # https://github.com/kieshaprem/covid19-agestructureSEIR-wuhan-social-distancing/blob/master/data/contacts.Rdata
    View(contacts)

    # Exploratory checks: per-country, per-setting 16x16 contact matrices
    contacts[["ALB"]][["home"]]
    contacts[["ITA"]][["all"]]
    rowSums(contacts[["ALB"]][["all"]])

    # Collect the contact rates of the three oldest age groups (matrix rows 14-16)
    # across all countries; each loop accumulates into its own data frame
    out1 = data.frame(); for (n in names(contacts)) { x = (contacts[[n]][["all"]])[16,]; out1 <- rbind(out1, data.frame(x)) }
    out2 = data.frame(); for (n in names(contacts)) { x = (contacts[[n]][["all"]])[15,]; out2 <- rbind(out2, data.frame(x)) }
    out3 = data.frame(); for (n in names(contacts)) { x = (contacts[[n]][["all"]])[14,]; out3 <- rbind(out3, data.frame(x)) }

    # Reshape to one row per country, one column per contacted age group
    m1 = data.frame(t(matrix(unlist(out1), nrow=16)))
    m2 = data.frame(t(matrix(unlist(out2), nrow=16)))
    m3 = data.frame(t(matrix(unlist(out3), nrow=16)))
    rownames(m1) = names(contacts)
    colnames(m1) = c("00_04", "05_09", "10_14", "15_19", "20_24", "25_29", "30_34", "35_39", "40_44", "45_49", "50_54", "55_59", "60_64", "65_69", "70_74", "75_79")
    rownames(m2) = rownames(m1)
    rownames(m3) = rownames(m1)
    colnames(m2) = colnames(m1)
    colnames(m3) = colnames(m1)

    # zapsmall() rounds near-zero values before export
    write.csv(zapsmall(m1),"contacts_75_79.csv", row.names = TRUE)
    write.csv(zapsmall(m2),"contacts_70_74.csv", row.names = TRUE)
    write.csv(zapsmall(m3),"contacts_65_69.csv", row.names = TRUE)
    

    Row names correspond to the three-letter ISO country code (e.g., ITA for Italy). Column names are the age groups of the contacted individuals, in five-year intervals from 0 to 80 years old. Cell values are the projected mean social contact rate.
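    The exported CSVs can then be read back with the country codes as row names. A minimal sketch (assuming the file produced by the extraction code above is in the working directory):

    ```r
    # Read one of the exported matrices; the first column holds the ISO country codes
    m <- read.csv("contacts_65_69.csv", row.names = 1)

    # Mean contact rates of Italian 65-69 year-olds with each contacted age group
    m["ITA", ]

    # Countries ranked by the total contacts of the 65-69 age group
    sort(rowSums(m), decreasing = TRUE)
    ```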

    [Figure: contact_distribution.png, distribution of contact rates]

    Acknowledgements

    Thanks to Dr. Kiesha Prem for her correspondence, and to her team for publishing their work on social contact matrices.


  2. Data from: Spatio-temporal dynamics of attacks around deaths of wolves: A statistical assessment of lethal control efficiency in France

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv
    Updated Feb 19, 2025
    + more versions
    Cite
    Oksana Grente; Thomas Opitz; Christophe Duchamp; Nolwenn Drouet-Hoguet; Simon Chamaillé-Jammes; Olivier Gimenez (2025). Spatio-temporal dynamics of attacks around deaths of wolves: A statistical assessment of lethal control efficiency in France [Dataset]. http://doi.org/10.5281/zenodo.14893823
    Dataset updated
    Feb 19, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Oksana Grente; Thomas Opitz; Christophe Duchamp; Nolwenn Drouet-Hoguet; Simon Chamaillé-Jammes; Olivier Gimenez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/

    Area covered
    France
    Description

    This repository contains the supplementary materials (Supplementary_figures.docx, Supplementary_tables.docx) of the manuscript: "Spatio-temporal dynamics of attacks around deaths of wolves: A statistical assessment of lethal control efficiency in France". It also provides the R code and datasets necessary to run the analyses described in the manuscript.

    The R datasets with the suffix "_a" have anonymised spatial coordinates to preserve confidentiality; for this reason, the preliminary data preparation is not included in the public code. These datasets, all geolocated and required for the analyses, are:

    • Attack_sf_a.RData: 19,302 analyzed wolf attacks on sheep
      • ID: unique ID of the attack
      • DATE: date of the attack
      • PASTURE: the related pasture ID from "Pasture_sf_a" where the attack is located
      • STATUS: column resulting from the preparation and the attribution of attacks to pastures (part 2.2.4 of the manuscript); not shown here to respect confidentiality
    • Pasture_sf_a.RData: 4,987 analyzed pastures grazed by sheep
      • ID: unique ID of the pasture
      • CODE: Official code in the pastoral census
      • FLOCK_SIZE: maximum annual number of sheep grazing in the pasture
      • USED_MONTHS: months for which the pasture is grazed by sheep
    • Removal_sf_a.RData: 232 analyzed single wolf removals or groups of wolf removals
      • ID: unique ID of the removal
      • OVERLAP: whether the removal is single ("non-interacting" in the manuscript, "NO" here) or part of a group ("interacting" in the manuscript; "SIMULTANEOUS" for removals occurring during the same operation, "NON-SIMULTANEOUS" otherwise)
      • DATE_MIN: date of the single removal or date of the first removal of a group
      • DATE_MAX: date of the single removal or date of the last removal of a group
      • CLASS: administrative type of the removal according to definitions from 2.1 part of the manuscript
      • SEX: sex or sexes of the removed wolves if known
      • AGE: class age of the removed wolves if known
      • BREEDER: breeding status of the removed female wolves, "Yes" for breeder, "No" for non-breeder. Necropsied males are "No" by default; NA indicates dead individuals that were not recovered.
      • SEASON: season of the removal, as defined in part 2.3.4 of the manuscript
      • MASSIF: mountain range attributed to the removal, as defined in part 2.3.4 of the manuscript
    • Area_to_exclude_sf_a.RData: one row for each mountain range, corresponding to the area where removal controls of the mountain range could not be sampled, as defined in part 2.3.6 of the manuscript
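
    As a minimal sketch of getting started with these files (assuming each .RData file restores a spatial object into the workspace, and that the sf package, suggested by the *_sf_* naming, is installed; both are assumptions here):

    ```r
    library(sf)  # assumption: the *_sf_* objects are sf (simple features) objects

    # Restore the anonymised datasets into the workspace
    load("Attack_sf_a.RData")
    load("Pasture_sf_a.RData")
    load("Removal_sf_a.RData")
    load("Area_to_exclude_sf_a.RData")

    ls()  # list the restored objects and check their names
    ```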

    These datasets were used to run the following analysis code:

    • Code 1: The file Kernel_wolf_culling_attacks_p.R contains the before-after analyses.
      • We start by delimiting the spatio-temporal buffer for each row of the "Removal_sf_a.RData" dataset.
        • We identify the attacks from "Attack_sf_a.RData" within each buffer, giving the data frame "Buffer_df" (one row per attack)
        • We select the pastures from "Pasture_sf_a.RData" within each buffer, giving the data frame "Buffer_sf" (one row per removal)
      • We calculate the spatial correction
        • We spatially slice each buffer into 200 rings, giving the data frame "Ring_sf" (one row per ring)
        • We add the total pastoral area of the ring of the attack ("SPATIAL_WEIGHT"), for each attack of each buffer, within Buffer_df ("Buffer_df.RData")
      • We calculate the pastoral correction
        • We create the pastoral matrix for each removal: a matrix of 200 rows (one per ring) and 180 columns (one per day, 90 days before and 90 days after the removal date), with the total pastoral area in use by sheep in each corresponding cell (one element per removal, "Pastoral_matrix_lt.RData")
        • We simulate, for each removal, the random distribution of the attacks from "Buffer_df.RData" according to "Pastoral_matrix_lt.RData". The process is done 100 times (one element per simulation, "Buffer_simulation_lt.RData").
      • We estimate the attack intensities
        • We classify the removals into 20 subsets, according to part 2.3.4 of the manuscript ("Variables_lt.RData") (one element per subset)
        • We perform, for each subset, the kernel estimations with the observed attacks ("Kernel_lt.RData"), with the simulated attacks ("Kernel_simulation_lt.RData") and we correct the first kernel computations with the second ("Kernel_controlled_lt.RData") (one element per subset).
        • We calculate the trend of attack intensities, for each subset, that compares the total attack intensity before and after the removals (part 2.3.5 of the manuscript), giving "Trends_intensities_df.RData". (one row per subset)
        • We calculate the trend of attack intensities, for each subset, along the spatial axis, three times, once for each time analysis scale. This gives "Shift_df" (one row per ring and per time analysis scale).
    • Code 2: The file Control_removals_p.R contains the control-impact analyses.
      • It starts with the simulation of 100 removal control sets ("Control_sf_lt_a.RData") from the real set of removals ("Removal_sf_a.RData"), that is done with the function "Control_fn" (l. 92).
      • The rest of the analyses follows the same process as in the first code "Kernel_wolf_culling_attacks_p.R", in order to apply the before-after analyses to each control set. All objects have the same structure as before, except that they are now a list, with one resulting element per control set. These objects have "control" in their names (not to be confused with "controlled" which refers to the pastoral correction already applied in the first code).
      • The code is also applied again, from l. 92 to l. 433, this time for the real set of removals (l. 121) - with "Simulated = FALSE" (l. 119). We could not simply use the results from the first code because the set of removals is restricted to removals attributed to mountain ranges only. There are 2 resulting objects: "Kernel_real_lt.RData" (observed real trends) and "Kernel_controlled_real_lt.RData" (real trends corrected for pastoral use).
      • The part of the code from line 439 to 524 relates to the calculations of the trends (for the real set and the control sets), as in the first code, giving "Trends_intensities_real_df.RData" and "Trends_intensities_control_lt.RData".
      • The part of the code from line 530 to 588 relates to the calculation of the 95% confidence intervals and the means of the intensity trends for each subset based on the results of the 100 control sets (Trends_intensities_mean_control_df.RData, Trends_intensities_CImin_control_df.RData and Trends_intensities_CImax_control_df.RData). This will be used to test the significance of the real trends. This comparison is done right after, l. 595-627, and gives the data frame "Trends_comparison_df.RData".
    • Code 3: The file Figures.R produces part of the figures from the manuscript:
      • "Dataset map": figure 1
      • "Buffer": figure 2 (then pasted into PowerPoint)
      • "Kernel construction": figure 5 (then pasted into PowerPoint)
      • "Trend distributions": figure 7
      • "Kernels": part of figures 10 and S2
      • "Attack shifts": figures 9 and S1
      • "Significant": figure 8
  3. A dataset for temporal analysis of files related to the JFK case

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Luczak-Roesch, Markus (2020). A dataset for temporal analysis of files related to the JFK case [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1042153
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Luczak-Roesch, Markus
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/

    Description

    This dataset contains the content of the subset of all files with a correct publication date from the 2017 release of files related to the JFK case (retrieved from https://www.archives.gov/research/jfk/2017-release). This content was extracted from the source PDF files using the R OCR libraries tesseract and pdftools.

    The R code used to derive the dataset is as follows:

    BEGIN R DATA PROCESSING SCRIPT

    library(tesseract)
    library(pdftools)

    pdfs <- list.files("[path to your output directory containing all PDF files]")

    # the meta file containing all metadata for the PDF files (e.g. publication date)
    meta <- read.csv2("[path to your input directory]/jfkrelease-2017-dce65d0ec70a54d5744de17d280f3ad2.csv", header = TRUE, sep = ',')

    meta$Doc.Date <- as.character(meta$Doc.Date)

    # drop rows with empty or invalid ("/0000") publication dates
    meta.clean <- meta[-which(meta$Doc.Date == "" | grepl("/0000", meta$Doc.Date)), ]

    for (i in 1:nrow(meta.clean)) {
      meta.clean$Doc.Date[i] <- gsub("00", "01", meta.clean$Doc.Date[i])
      # expand two-digit years to the full %m/%d/%Y format
      if (nchar(meta.clean$Doc.Date[i]) < 10) {
        meta.clean$Doc.Date[i] <- format(strptime(meta.clean$Doc.Date[i], format = "%d/%m/%y"), "%m/%d/%Y")
      }
    }

    meta.clean$Doc.Date <- strptime(meta.clean$Doc.Date, format = "%m/%d/%Y")
    meta.clean <- meta.clean[order(meta.clean$Doc.Date), ]

    docs <- data.frame(content = character(0), dpub = character(0), stringsAsFactors = FALSE)

    for (i in 1:nrow(meta.clean)) {
      pdf_prop <- pdftools::pdf_info(paste0("[path to your output directory]/", tolower(meta.clean$File.Name[i])))

      # one temporary image file per PDF page
      tmp_files <- c()
      for (k in 1:pdf_prop$pages) {
        tmp_files <- c(tmp_files, paste0("/home/STAFF/luczakma/RProjects/JFK/data/tmp/", k))
      }

      img_file <- pdftools::pdf_convert(paste0("[path to your output directory]/", tolower(meta.clean$File.Name[i])), format = 'tiff', pages = NULL, dpi = 700, filenames = tmp_files)

      # OCR every page image and concatenate the extracted text
      txt <- ""
      for (j in 1:length(img_file)) {
        extract <- ocr(img_file[j], engine = tesseract("eng"))
        txt <- paste(txt, extract, collapse = " ")
      }

      # strip punctuation, collapse whitespace, lower-case, and store with the publication date
      docs <- rbind(docs,
                    data.frame(content = iconv(tolower(gsub("\\s+", " ", gsub("[[:punct:]]|[ ]", " ", txt))), to = "UTF-8"),
                               dpub = format(meta.clean$Doc.Date[i], "%Y/%m/%d"),
                               stringsAsFactors = FALSE),
                    stringsAsFactors = FALSE)
    }

    write.table(docs, "[path to your output directory]/documents.csv", row.names = FALSE)

    END R DATA PROCESSING SCRIPT

  4. ‘Possum Regression’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Possum Regression’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-possum-regression-0ed3/6b42ebdb/?iid=005-529&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/

    Description

    Analysis of ‘Possum Regression’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/abrambeyer/openintro-possum on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Can you use your regression skills to predict the age of a possum, its head length, or whether it is male or female? This classic practice regression dataset comes originally from the DAAG R package (datasets used in examples and exercises in the book Maindonald, J.H. and Braun, W.J. (2003, 2007, 2010), "Data Analysis and Graphics Using R"). This dataset is also used in the OpenIntro Statistics book, Chapter 8, "Introduction to linear regression".

    Content

    From the DAAG R package: "*The possum data frame consists of nine morphometric measurements on each of 104 mountain brushtail possums, trapped at seven sites from Southern Victoria to central Queensland*."

    Acknowledgements

    Data originally found in the DAAG R package and used in the book Maindonald, J.H. and Braun, W.J. (2003, 2007, 2010), "Data Analysis and Graphics Using R".

    A subset of the data was also put together for the OpenIntro Statistics book chapter 8 Introduction to linear regression.

    Original source of the dataset: Lindenmayer, D. B., Viggers, K. L., Cunningham, R. B., and Donnelly, C. F. 1995. Morphological variation among columns of the mountain brushtail possum, Trichosurus caninus Ogilby (Phalangeridae: Marsupialia). Australian Journal of Zoology 43: 449-458.

    Inspiration

    Get your feet wet with regression techniques here on Kaggle by using this dataset. It is perfect for beginners, since the OpenIntro Statistics book gives a good explanation of it in Chapter 8.

    • Can we use total length to predict a possum's head length?
    • Which possum body dimensions are most correlated with age and sex?
    • Can we classify a possum's sex by its body dimensions and location?
    • Can we predict a possum's trapping location from its body dimensions?
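
    As a starting point for the first question, a minimal sketch using the original DAAG package (hdlngth and totlngth are the head-length and total-length columns of DAAG's possum data frame):

    ```r
    # install.packages("DAAG")  # if not already installed
    library(DAAG)
    data(possum)

    # Simple linear regression: head length (mm) as a function of total length (cm)
    fit <- lm(hdlngth ~ totlngth, data = possum)
    summary(fit)

    # Visual check of the fit
    plot(hdlngth ~ totlngth, data = possum,
         xlab = "Total length (cm)", ylab = "Head length (mm)")
    abline(fit)
    ```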

    --- Original source retains full ownership of the source dataset ---

