10 datasets found
  1. FacialRecognition

    • kaggle.com
    zip
    Updated Dec 1, 2016
    Cite
    TheNicelander (2016). FacialRecognition [Dataset]. https://www.kaggle.com/petein/facialrecognition
    Explore at:
    Available download formats: zip (121674455 bytes)
    Dataset updated
    Dec 1, 2016
    Authors
    TheNicelander
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    # https://www.kaggle.com/c/facial-keypoints-detection/details/getting-started-with-r
    #################################

    ### Variables for the downloaded files
    data.dir   <- ' '
    train.file <- paste0(data.dir, 'training.csv')
    test.file  <- paste0(data.dir, 'test.csv')
    #################################

    ### Load the CSVs -- read.csv creates a data.frame, where each column can have a different type.
    d.train <- read.csv(train.file, stringsAsFactors = FALSE)
    d.test  <- read.csv(test.file,  stringsAsFactors = FALSE)

    ### training.csv has 7049 rows, each with 31 columns.
    ### The first 30 columns are keypoint locations, which R correctly reads as numbers.
    ### The last column is a string representation of the image.

    ### To look at a sample of the data, run:

    head(d.train)

    ### Save the Image column as a separate variable, then remove it from d.train:
    ### d.train is our data frame, and we want the column called Image.
    ### Assigning NULL to a column removes it from the data frame.

    im.train <- d.train$Image
    d.train$Image <- NULL  # removes 'Image' from the data frame

    im.test <- d.test$Image
    d.test$Image <- NULL   # removes 'Image' from the data frame

    #################################
    # Each image is represented as a series of numbers stored in a single string.
    # Convert these strings to integers by splitting them and coercing the result.

    # strsplit splits the string, unlist simplifies its output to a vector of strings,
    # and as.integer converts that to a vector of integers.
    as.integer(unlist(strsplit(im.train[1], " ")))
    as.integer(unlist(strsplit(im.test[1], " ")))

    ### Install and load the required libraries.
    ### The original tutorial targets Linux and OS X, where a parallel backend library is used, so:
    ### replace all instances of %dopar% with %do%.

    install.packages('foreach')

    library("foreach", lib.loc="~/R/win-library/3.3")

    ### Parse every image string into a row of integers.
    im.train <- foreach(im = im.train, .combine = rbind) %do% {
      as.integer(unlist(strsplit(im, " ")))
    }
    im.test <- foreach(im = im.test, .combine = rbind) %do% {
      as.integer(unlist(strsplit(im, " ")))
    }
    # The foreach loop evaluates the inner expression for each element of im.train
    # and combines the results with rbind (combine by rows).
    # %do% evaluates sequentially; the original %dopar% would run the evaluations in parallel.
    # im.train is now a matrix with 7049 rows (one for each image) and 9216 columns (one for each pixel):

    ### Save all four variables in a data.Rd file.
    ### They can be reloaded at any time with load('data.Rd').

    save(d.train, im.train, d.test, im.test, file = 'data.Rd')

    load('data.Rd')

    # Each image is a vector of 96*96 = 9216 pixels.
    # Convert these 9216 integers into a 96x96 matrix:
    im <- matrix(data = rev(im.train[1, ]), nrow = 96, ncol = 96)

    # im.train[1, ] returns the first row of im.train, which corresponds to the first training image.
    # rev reverses the resulting vector to match the interpretation of R's image function
    # (which expects the origin to be in the lower-left corner).

    # To visualize the image we use R's image function:
    image(1:96, 1:96, im, col = gray((0:255)/255))

    # Colour the coordinates of the eyes and nose:
    points(96 - d.train$nose_tip_x[1],         96 - d.train$nose_tip_y[1],         col = "red")
    points(96 - d.train$left_eye_center_x[1],  96 - d.train$left_eye_center_y[1],  col = "blue")
    points(96 - d.train$right_eye_center_x[1], 96 - d.train$right_eye_center_y[1], col = "green")

    # Another good check is to see how variable the data are.
    # For example, where are the nose centres in the 7049 images? (this takes a while to run):
    for (i in 1:nrow(d.train)) {
      points(96 - d.train$nose_tip_x[i], 96 - d.train$nose_tip_y[i], col = "red")
    }

    # There are quite a few outliers -- they could be labelling errors.
    # Looking at one extreme example, there is no labelling error in this case,
    # but it shows that not all faces are centred.
    idx <- which.max(d.train$nose_tip_x)
    im  <- matrix(data = rev(im.train[idx, ]), nrow = 96, ncol = 96)
    image(1:96, 1:96, im, col = gray((0:255)/255))
    points(96 - d.train$nose_tip_x[idx], 96 - d.train$nose_tip_y[idx], col = "red")

    # One of the simplest baselines is to compute the mean of each keypoint's coordinates
    # in the training set and use that as the prediction for all images:
    colMeans(d.train, na.rm = TRUE)

    # To build a submission file we apply these mean coordinates to the test instances:
    p <- matrix(data = colMeans(d.train, na.rm = TRUE),
                nrow = nrow(d.test), ncol = ncol(d.train), byrow = TRUE)
    colnames(p) <- names(d.train)
    predictions <- data.frame(ImageId = 1:nrow(d.test), p)
    head(predictions)

    # The expected submission format has one keypoint per row; the reshape2 library makes this easy:

    install.packages('reshape2')

    library(...
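
    The description is truncated here. A minimal sketch of the reshaping step it leads into, assuming reshape2::melt; the column names FeatureName and Location are illustrative assumptions, not taken from the truncated text:

    library(reshape2)
    # Melt the wide predictions into one keypoint per row (column names are assumptions).
    submission <- melt(predictions, id.vars = "ImageId",
                       variable.name = "FeatureName", value.name = "Location")
    head(submission)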

  2. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at:
    Available download formats: zip (23875170 bytes)
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all transactions that occurred over a period of time. The retailer will use the results to grow its business and to offer customers suggestions on itemsets, so that we can increase customer engagement, improve the customer experience, and identify customer behaviour. I will solve this problem with Association Rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association rule mining is most useful when you want to find associations between different objects in a set, for example frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows a retailer to identify relationships between items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both.
    - Rule: bought computer mouse => bought mouse mat
    - support = P(mouse & mat) = 8/100 = 0.08
    - confidence = support / P(mouse) = 0.08/0.10 = 0.8
    - lift = confidence / P(mat) = 0.8/0.09 ≈ 8.9
    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
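
    A quick sanity check of these numbers in R (plain arithmetic, no packages needed):

    n       <- 100
    p_mouse <- 10 / n
    p_mat   <-  9 / n
    p_both  <-  8 / n
    support    <- p_both              # 0.08
    confidence <- p_both / p_mouse    # 0.80
    lift       <- confidence / p_mat  # ~8.9
    c(support = support, confidence = confidence, lift = lift)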

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of the rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of Rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    [Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png]

    Libraries in R

    First, we need to load the required libraries. Each library is briefly described below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - An opinionated collection of R packages designed for data science; it also makes it easy to install and load the core 'tidyverse' packages in a single step.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr- Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

    [Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png]
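
    The screenshot above shows the loading step; a minimal equivalent in code, assuming the packages listed above are installed:

    # Load the libraries described above (run install.packages() first for any that are missing).
    library(arules)
    library(arulesViz)
    library(tidyverse)
    library(readxl)
    library(plyr)
    library(ggplot2)
    library(knitr)
    library(magrittr)
    library(dplyr)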

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    [Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png]
    [Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png]
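
    The screenshots above show this step and its output; a minimal equivalent, assuming the file sits in the working directory:

    library(readxl)
    # Read the raw retail data from the Excel file (the path is an assumption).
    retail <- read_excel("Assignment-1_Data.xlsx")
    head(retail)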

    Next we clean the data frame by removing rows with missing values.

    [Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png]

    To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together on one invoice will be in ...
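
    The description is cut off here. Continuing from the loading sketch above, a hedged sketch of one common way to do this conversion with arules, grouping items by invoice (BillNo); the thresholds passed to apriori() are illustrative, not the values used in the original analysis:

    # Drop incomplete rows, then build one transaction (a basket of item names) per invoice.
    retail <- retail[complete.cases(retail), ]
    library(arules)
    trans <- as(split(retail$Itemname, retail$BillNo), "transactions")
    summary(trans)

    # Association rules can then be mined with apriori(), e.g.:
    rules <- apriori(trans, parameter = list(supp = 0.01, conf = 0.8))
    inspect(head(sort(rules, by = "lift"), 5))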

  3. DONALD

    • search.dataone.org
    Updated Sep 24, 2024
    Cite
    Miani, Alessandro (2024). DONALD [Dataset]. http://doi.org/10.7910/DVN/VDQL8A
    Explore at:
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Miani, Alessandro
    Description

    DONALD’s raw texts are stored in the R data frame “DONALD.txt.rdata” (size = 12.5 GB, 4.87 GB compressed) consisting of 2,173,172 rows (one per document) × 2 columns, namely document ID and text.

  4. 120 years of Olympic history: athletes and results

    • kaggle.com
    zip
    Updated Jun 15, 2018
    Cite
    rgriffin (2018). 120 years of Olympic history: athletes and results [Dataset]. https://www.kaggle.com/datasets/heesoo37/120-years-of-olympic-history-athletes-and-results
    Explore at:
    Available download formats: zip (5690772 bytes)
    Dataset updated
    Jun 15, 2018
    Authors
    rgriffin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This is a historical dataset on the modern Olympic Games, including all the Games from Athens 1896 to Rio 2016. I scraped this data from www.sports-reference.com in May 2018. The R code I used to scrape and wrangle the data is on GitHub. I recommend checking my kernel before starting your own analysis.

    Note that the Winter and Summer Games were held in the same year up until 1992. After that, they staggered them such that Winter Games occur on a four year cycle starting with 1994, then Summer in 1996, then Winter in 1998, and so on. A common mistake people make when analyzing this data is to assume that the Summer and Winter Games have always been staggered.

    Content

    The file athlete_events.csv contains 271116 rows and 15 columns. Each row corresponds to an individual athlete competing in an individual Olympic event (athlete-events). The columns are:

    1. ID - Unique number for each athlete
    2. Name - Athlete's name
    3. Sex - M or F
    4. Age - Integer
    5. Height - In centimeters
    6. Weight - In kilograms
    7. Team - Team name
    8. NOC - National Olympic Committee 3-letter code
    9. Games - Year and season
    10. Year - Integer
    11. Season - Summer or Winter
    12. City - Host city
    13. Sport - Sport
    14. Event - Event
    15. Medal - Gold, Silver, Bronze, or NA
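
    A minimal sketch for loading the file in R and confirming its shape (assuming athlete_events.csv is in the working directory):

    athletes <- read.csv("athlete_events.csv", stringsAsFactors = FALSE)
    dim(athletes)                          # expect 271116 rows and 15 columns
    table(athletes$Medal, useNA = "ifany") # Gold / Silver / Bronze / NA counts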

    Acknowledgements

    The Olympic data on www.sports-reference.com is the result of an incredible amount of research by a group of Olympic history enthusiasts and self-proclaimed 'statistorians'. Check out their blog for more information. All I did was consolidate their decades of work into a convenient format for data analysis.

    Inspiration

    This dataset provides an opportunity to ask questions about how the Olympics have evolved over time, including questions about the participation and performance of women, different nations, and different sports and events.

  5. WoSIS snapshot - December 2023

    • repository.soilwise-he.eu
    • data.isric.org
    + more versions
    Cite
    WoSIS snapshot - December 2023 [Dataset]. https://repository.soilwise-he.eu/cat/collections/metadata:main/items/e50f84e1-aa5b-49cb-bd6b-cd581232a2ec
    Explore at:
    Description

    ABSTRACT:

    The World Soil Information Service (WoSIS) provides quality-assessed and standardized soil profile data to support digital soil mapping and environmental applications at broad scale levels. Since the release of the ‘WoSIS snapshot 2019’ many new soil data were shared with us, registered in the ISRIC data repository, and subsequently standardized in accordance with the licenses specified by the data providers. The source data were contributed by a wide range of data providers, therefore special attention was paid to the standardization of soil property definitions, soil analytical procedures and soil property values (and units of measurement).

    We presently consider the following soil chemical properties (organic carbon, total carbon, total carbonate equivalent, total Nitrogen, Phosphorus (extractable-P, total-P, and P-retention), soil pH, cation exchange capacity, and electrical conductivity) and physical properties (soil texture (sand, silt, and clay), bulk density, coarse fragments, and water retention), grouped according to analytical procedures (aggregates) that are operationally comparable.

    For each profile we provide the original soil classification (FAO, WRB, USDA, and version) and horizon designations as far as these have been specified in the source databases.

    Three measures for 'fitness-for-intended-use' are provided: positional uncertainty (for site locations), time of sampling/description, and a first approximation for the uncertainty associated with the operationally defined analytical methods. These measures should be considered during digital soil mapping and subsequent earth system modelling that use the present set of soil data.

    DATA SET DESCRIPTION:

    The 'WoSIS 2023 snapshot' comprises data for 228k profiles from 217k geo-referenced sites that originate from 174 countries. The profiles represent over 900k soil layers (or horizons) and over 6 million records. The actual number of measurements for each property varies greatly between profiles and with depth, largely depending on the objectives of the initial soil sampling programmes.

    The data are provided in TSV (tab separated values) format and as GeoPackage. The zip-file (446 MB) contains the following files:

    • Readme_WoSIS_202312_v2.pdf: Provides a short description of the dataset, file structure, column names, units and category values (this file is also available directly under 'online resources'). The pdf includes links to tutorials for downloading the TSV files into R respectively Excel. See also 'HOW TO READ TSV FILES INTO R AND PYTHON' in the next section.

    • wosis_202312_observations.tsv: This file lists the four to six letter codes for each observation, whether the observation is for a site/profile or layer (horizon), the unit of measurement and the number of profiles respectively layers represented in the snapshot. It also provides an estimate for the inferred accuracy for the laboratory measurements.

    • wosis_202312_sites.tsv: This file characterizes the site location where profiles were sampled.

    • wosis_202312_profiles.tsv: Presents the unique profile ID (i.e. primary key), site_id, source of the data, country ISO code and name, positional uncertainty, latitude and longitude (WGS 1984), maximum depth of soil described and sampled, as well as information on the soil classification system and edition. Depending on the soil classification system used, the number of fields will vary.

    • wosis_202312_layers.tsv: This file characterises the layers (or horizons) per profile, and lists their upper and lower depths (cm).

    • wosis_202312_xxxx.tsv: This type of file presents results for each observation (e.g. “xxxx” = “BDFIOD”), as defined under “code” in file wosis_202312_observations.tsv (e.g. wosis_202312_bdfiod.tsv).

    • wosis_202312.gpkg: Contains the above datafiles in GeoPackage format (which stores the files within an SQLite database).

    HOW TO READ TSV FILES INTO R AND PYTHON:

    A) To read the data in R, please uncompress the ZIP file and specify the uncompressed folder.

    setwd("/YourFolder/WoSIS_2023_December/") ## For example: setwd('D:/WoSIS_2023_December/')

    Then use read_tsv to read the TSV files, specifying the data types for each column (c = character, i = integer, n = number, d = double, l = logical, f = factor, D = date, T = date time, t = time).

    observations = readr::read_tsv('wosis_202312_observations.tsv', col_types='cccciid')
    observations ## show columns and first 10 rows

    sites = readr::read_tsv('wosis_202312_sites.tsv', col_types='iddcccc')
    sites

    profiles = readr::read_tsv('wosis_202312_profiles.tsv', col_types='icciccddcccccciccccicccci')
    profiles

    layers = readr::read_tsv('wosis_202312_layers.tsv', col_types='iiciciiilcc')
    layers

    Do this for each observation 'XXXX', e.g. file 'wosis_202312_orgc.tsv':

    orgc = readr::read_tsv('wosis_202312_orgc.tsv', col_types='iicciilccdccddccccc')
    orgc

    Note: one may also use the following R code (example is for the observations file):

    observations <- read.table("wosis_202312_observations.tsv", sep = "\t", header = TRUE,
                               quote = "", comment.char = "", stringsAsFactors = FALSE)

    B) To read the files into Python, first decompress the files to your selected folder. Then in Python:

    # Import the required library
    import pandas as pd

    # Read the observations data
    observations = pd.read_csv("wosis_202312_observations.tsv", sep="\t")
    # Print the data frame header and some rows
    observations.head()

    # Read the sites data
    sites = pd.read_csv("wosis_202312_sites.tsv", sep="\t")

    # Read the profiles data
    profiles = pd.read_csv("wosis_202312_profiles.tsv", sep="\t")

    # Read the layers data
    layers = pd.read_csv("wosis_202312_layers.tsv", sep="\t")

    # Read the soil property data, e.g. 'cfvo' (do this for each observation)
    cfvo = pd.read_csv("wosis_202312_cfvo.tsv", sep="\t")
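
    For the GeoPackage variant (wosis_202312.gpkg), a hedged sketch using the sf package in R; the layer names are assumed to mirror the TSV file names, so check them first with st_layers:

    library(sf)
    # List the layers available in the GeoPackage.
    st_layers("wosis_202312.gpkg")
    # Read one layer (the layer name below is an assumption).
    profiles_gpkg <- st_read("wosis_202312.gpkg", layer = "wosis_202312_profiles")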

    CITATION: Calisto, L., de Sousa, L.M., Batjes, N.H., 2023. Standardised soil profile data for the world (WoSIS snapshot – December 2023), https://doi.org/10.17027/isric-wdcsoils-20231130

    Supplement to: Batjes N.H., Calisto, L. and de Sousa L.M., 2023. Providing quality-assessed and standardised soil data to support global mapping and modelling (WoSIS snapshot 2023). Earth System Science Data, https://doi.org/10.5194/essd-16-4735-2024.

  6. Data from: An assessment of wheat yield sensitivity and breeding gains in...

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Mar 5, 2013
    Cite
    Sharon M. Gourdji; Ky L. Mathews; Matthew Reynolds; Jose Crossa; David B. Lobell (2013). An assessment of wheat yield sensitivity and breeding gains in hot environments [Dataset]. http://doi.org/10.5061/dryad.525vm
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 5, 2013
    Dataset provided by
    Dryad
    Authors
    Sharon M. Gourdji; Ky L. Mathews; Matthew Reynolds; Jose Crossa; David B. Lobell
    Time period covered
    Nov 8, 2012
    Description

    regression.dat_nurseries_ADW2.Rdat: This R data frame contains 1353 rows corresponding to the international trials in the CIMMYT database used in this study. The column names should be self-descriptive, and contain all the predictors used in this regression analysis.

  7. The first two rows of a pandas DataFrame ready to be used with GLAM.

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Felix Molter; Armin W. Thomas; Hauke R. Heekeren; Peter N. C. Mohr (2023). The first two rows of a pandas DataFrame ready to be used with GLAM. [Dataset]. http://doi.org/10.1371/journal.pone.0226428.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Felix Molter; Armin W. Thomas; Hauke R. Heekeren; Peter N. C. Mohr
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The first two rows of a pandas DataFrame ready to be used with GLAM.

  8. Data.Rda for How uncertain is the survival extrapolation? A study of the...

    • orda.shef.ac.uk
    application/gzip
    Updated Sep 10, 2019
    Cite
    Benjamin Kearns (2019). Data.Rda for How uncertain is the survival extrapolation? A study of the impact of different parametric survival models on extrapolated uncertainty about hazard functions, lifetime mean survival and cost-effectiveness [Dataset]. http://doi.org/10.15131/shef.data.9751907.v2
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Sep 10, 2019
    Dataset provided by
    The University of Sheffield
    Authors
    Benjamin Kearns
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An R Rda-file containing the four hypothetical datasets used in the analysis (Flat, increasing, decreasing, unimodal). These are stored in a single data frame, where the 400 rows correspond to observations. For each dataset there are three variables: an event indicator, = 1 if death occurred during follow-up and 0 otherwise (suffix "_dead"); the true time of death (suffix "_time"); and the observed follow-up time, which equals 1 if the true time of death is > 1 (suffix "_obs"). Hence there are twelve columns. A script is provided separately within this project (Analysis.R) which includes the code used to analyse this dataset in order to obtain the results reported in the manuscript "How uncertain is the survival extrapolation? A study of the impact of different parametric survival models on extrapolated uncertainty about hazard functions, lifetime mean survival and cost-effectiveness."
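
    A hedged sketch for loading and exploring the data (this is not the project's Analysis.R script; the object and column names below are assumptions based on the suffixes described above):

    load("Data.Rda")   # restores the single data frame described above
    ls()               # check the name of the restored object; 'dat' below is an assumption
    library(survival)
    # Fit one parametric model to the "flat" dataset via its observed time and event indicator
    # (the column names flat_obs / flat_dead are assumptions based on the suffixes above).
    fit <- survreg(Surv(flat_obs, flat_dead) ~ 1, data = dat, dist = "weibull")
    summary(fit)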

  9. Data from: Streptococcus pyogenes pharyngitis elicits diverse antibody...

    • zenodo.org
    bin, csv, html
    Updated Oct 4, 2024
    Cite
    Danika Hill; Danika Hill (2024). Streptococcus pyogenes pharyngitis elicits diverse antibody responses to key vaccine antigens influenced by the imprint of past infections. [Dataset]. http://doi.org/10.5281/zenodo.13347362
    Explore at:
    Available download formats: bin, csv, html
    Dataset updated
    Oct 4, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Danika Hill; Danika Hill
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Here you will find the raw data (RawData.RData) and code (CHIVAS_SEROLOGY_Code.Rmd, an R Markdown file) for generating the analysis and figures for the following publication:

    Streptococcus pyogenes pharyngitis elicits diverse antibody responses to key vaccine antigens influenced by the imprint of past infections.

    Joshua Osowicki, Hannah R Frost, Kristy I Azzopardi, Alana L Whitcombe, Reuben McGregor, Lauren H. Carlton, Ciara Baker, Loraine Fabri, Manisha Pandey, Michael F Good, Jonathan R Carapetis, Mark J Walker, Pierre R Smeesters, Paul V Licciardi, Nicole J Moreland, Danika L Hill, Andrew C Steer

    Provided in the RData file are the following items:

    Dataframes:

    "outcome" : clinical variables associated with human challenge for each participant

    "data" : ELISA and functional antibody responses for human challenge participants. Each timepoint and isotype for each antigen as seperate column)

    "data_long": Data equivalent to "data" file but in long format, i.e. One column for each antigen, timepoint and isotype as factors.

    "data.melt" : Data equivalent to "data" file but in longer format , i.e. timepoint, isotype and antigen as factors, 'value' as ELISA AU.

    "luminex" : IgG responses to 6 antigens analysed by luminex bead-based assay in human challenge participants.

    "luminex.children" : IgG responses to 6 antigen analysed by luminex bead-based assay in children

    Vectors:

    "pharyngitis" : participant "id" for the 19 individuals that developed pharyngitis.

    "Antigen.Order" : relates to "Main" antigen classification used in Figure 2

    'additional" : relates to "Additional

    Function:

    "custom_theme" : used as a theme when using ggplot to graph.

    Adobe Illustrator or Inkscape were used to generate the final image files for publication, with some editing of the graphs (axis labels, font size, added p-values, etc.).

    Additional files:

    Three .csv files have been included for download:

    "ELISA_data_wide_format.csv", a wide format data table of 25 human challenge individuals and 219 variables. Equivalent to the 'data' dataframe in the RData file

    "CHIVAS_luminex.csv", a long format data table of 25 human challenge participants at 1 week, 1 month, and 3 months. Equivalent to the 'luminex' dataframe in the RData file.

    "Luminex.children.csv", a datatable of 6 luminex variables for 39 children (healthy and post pharyngitis). Equivalent to the 'luminex.children' dataframe in the RData file.

  10. Data and analysis from: Body mass, temperature, and depth shape the maximum...

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Oct 2, 2022
    Cite
    Sebastián A. Pardo; Nicholas K. Dulvy (2022). Data and analysis from: Body mass, temperature, and depth shape the maximum intrinsic rate of population increase in sharks and rays [Dataset]. http://doi.org/10.5061/dryad.wh70rxwrb
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 2, 2022
    Dataset provided by
    Dryad
    Authors
    Sebastián A. Pardo; Nicholas K. Dulvy
    Time period covered
    Sep 19, 2022
    Description

    Analyses are reproducible using R version 3.3.2 or above (R Core Team 2016). Files needed for reproducing the analyses are:

    • chond-data.csv: Data frame with 63 rows (species) and 11 variables. Some of these variables are based on the same life history trait but are transformed for ease of interpretation and analysis.
    • stein-et-al-single.tree: Phylogenetic tree with scaled branch lengths from Stein et al. (2018) used in the analyses. These are freely downloadable from http://vertlife.org/sharktree/.
    • rmax-scaling-analysis.R: R code with a minimum working example of how to load the data files, fit phylogenetic linear models using the pgls function in the caper package, run information-theoretic comparisons, and check diagnostics.
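
    A hedged sketch of the kind of model fit described for rmax-scaling-analysis.R (the response and predictor names, e.g. rmax and mass, and the species column are illustrative assumptions; see the script itself for the actual models):

    library(ape)
    library(caper)
    dat  <- read.csv("chond-data.csv", stringsAsFactors = FALSE)
    tree <- read.tree("stein-et-al-single.tree")
    # Combine trait data and phylogeny; 'species' as the matching column is an assumption.
    cdat <- comparative.data(phy = tree, data = dat, names.col = species, vcv = TRUE)
    # Phylogenetic linear model via caper::pgls; the variable names are illustrative only.
    mod <- pgls(log(rmax) ~ log(mass), data = cdat, lambda = "ML")
    summary(mod)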
