License: http://opendatacommons.org/licenses/dbcl/1.0/
# https://www.kaggle.com/c/facial-keypoints-detection/details/getting-started-with-r
#################################
### Variables for downloaded files
data.dir   <- ' '
train.file <- paste0(data.dir, 'training.csv')
test.file  <- paste0(data.dir, 'test.csv')
#################################
### Load the csv files -- read.csv creates a data.frame, where each column can have a different type.
d.train <- read.csv(train.file, stringsAsFactors = F)
d.test  <- read.csv(test.file, stringsAsFactors = F)
### In training.csv, we have 7049 rows, each one with 31 columns.
### The first 30 columns are keypoint locations, which R correctly identified as numbers.
### The last one is a string representation of the image, identified as a string.
###To look at samples of the data, uncomment this line:
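### (The commented-out line is not included in this copy; a typical check, as an assumed example:)
### head(d.train)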
### Let's save the Image column as another variable and remove it from d.train:
### d.train is our data frame, and we want the column called Image.
### Assigning NULL to a column removes it from the data frame.
im.train <- d.train$Image
d.train$Image <- NULL   # removes 'Image' from the data frame
im.test <- d.test$Image
d.test$Image <- NULL    # removes 'Image' from the data frame
#################################
# The image is represented as a series of numbers, stored as a single string.
# Convert these strings to integers by splitting them and converting the result to integer.
# strsplit splits the string,
# unlist simplifies its output to a vector of strings,
# as.integer converts it to a vector of integers.
as.integer(unlist(strsplit(im.train[1], " ")))
as.integer(unlist(strsplit(im.test[1], " ")))
### Install and activate the appropriate libraries.
### The tutorial is written for Linux and OS X, where a different (parallel backend) library is used, so on Windows:
### replace all instances of %dopar% with %do%.
library("foreach", lib.loc="~/R/win-library/3.3")
### Convert the image strings (the original tutorial runs this in parallel with %dopar%;
### with %do% the iterations run sequentially, which works fine on Windows).
im.train <- foreach(im = im.train, .combine=rbind) %do% {
  as.integer(unlist(strsplit(im, " ")))
}
im.test <- foreach(im = im.test, .combine=rbind) %do% {
  as.integer(unlist(strsplit(im, " ")))
}
# The foreach loop evaluates the inner expression for each element of im.train and
# combines the results with rbind (combine by rows).
# im.train is now a matrix with 7049 rows (one for each image) and 9216 columns (one for each pixel).
### Save all four variables in a data.Rd file.
### They can be reloaded at any time with load('data.Rd').
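### A sketch of the save call itself (not shown in this copy), using the four variables above:
save(d.train, im.train, d.test, im.test, file = 'data.Rd')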
# Each image is a vector of 96*96 pixels (96*96 = 9216).
# Convert these 9216 integers into a 96x96 matrix:
im <- matrix(data=rev(im.train[1,]), nrow=96, ncol=96)
# im.train[1,] returns the first row of im.train, which corresponds to the first training image.
# rev reverses the resulting vector to match the interpretation of R's image function
# (which expects the origin to be in the lower left corner).
# To visualize the image we use R's image function:
image(1:96, 1:96, im, col=gray((0:255)/255))
# Let's color the coordinates for the eyes and nose:
points(96-d.train$nose_tip_x[1], 96-d.train$nose_tip_y[1], col="red")
points(96-d.train$left_eye_center_x[1], 96-d.train$left_eye_center_y[1], col="blue")
points(96-d.train$right_eye_center_x[1], 96-d.train$right_eye_center_y[1], col="green")
# Another good check is to see how variable our data is.
# For example, where are the centers of the noses in the 7049 images? (this takes a while to run):
for(i in 1:nrow(d.train)) {
  points(96-d.train$nose_tip_x[i], 96-d.train$nose_tip_y[i], col="red")
}
# There are quite a few outliers -- they could be labeling errors. Looking at one extreme example:
# in this case there is no labeling error, but it shows that not all faces are centered in the image.
idx <- which.max(d.train$nose_tip_x)
im <- matrix(data=rev(im.train[idx,]), nrow=96, ncol=96)
image(1:96, 1:96, im, col=gray((0:255)/255))
points(96-d.train$nose_tip_x[idx], 96-d.train$nose_tip_y[idx], col="red")
# One of the simplest things to try is to compute the mean of the coordinates of each keypoint
# in the training set and use that as the prediction for all images.
colMeans(d.train, na.rm=T)
# To build a submission file we need to apply these computed coordinates to the test instances:
p <- matrix(data=colMeans(d.train, na.rm=T), nrow=nrow(d.test), ncol=ncol(d.train), byrow=T)
colnames(p) <- names(d.train)
predictions <- data.frame(ImageId = 1:nrow(d.test), p)
head(predictions)
# The expected submission format has one keypoint per row, but we can easily get that with the help of the reshape2 library:
library(reshape2)
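# A sketch of the reshaping step (melt() converts to long format; the submission column
# names below are assumptions based on the expected format):
submission <- melt(predictions, id.vars = "ImageId",
                   variable.name = "FeatureName", value.name = "Location")
head(submission)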
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions on the itemsets they are most likely to purchase. I was given a dataset containing the data of a retailer; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow the business and provide customers with suggestions on itemsets, so we can increase customer engagement, improve the customer experience, and identify customer behaviour. I will solve this problem using Association Rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.
Association Rules are most often used when you are planning to build associations between different objects in a set, or to find frequent patterns in a transaction database. They can tell you which items customers frequently buy together, which allows the retailer to identify relationships between items.
Assume there are 100 customers; 10 of them bought a Computer Mouse, 9 bought a Mat for Mouse, and 8 bought both. For the rule "bought Computer Mouse => bought Mat for Mouse":
- support = P(Mouse & Mat) = 8/100 = 0.08
- confidence = support / P(Computer Mouse) = 0.08/0.10 = 0.8
- lift = confidence / P(Mat for Mouse) = 0.8/0.09 ~ 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
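In R, the toy arithmetic above can be written out directly (this is just the calculation, not tied to the dataset):
support    <- 8 / 100                 # P(Mouse & Mat) = 0.08
confidence <- support / (10 / 100)    # support / P(Computer Mouse) = 0.8
lift       <- confidence / (9 / 100)  # confidence / P(Mat for Mouse) ~ 8.9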
Number of Attributes: 7
https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png
First, we need to load the required libraries. I briefly describe each library below.
https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png
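The screenshot above is not reproduced as text here; a hedged sketch of what the library-loading step typically looks like for this kind of analysis (the exact package list is an assumption):
library(arules)     # apriori() and transaction objects
library(arulesViz)  # visualising association rules
library(readxl)     # reading the .xlsx source file
library(dplyr)      # general data manipulation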
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset (a sketch of this step follows the screenshots below). Now we can see our data in R.
https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png
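A hedged sketch of the loading step (the object name and sheet layout are assumptions):
library(readxl)
retail <- read_excel("Assignment-1_Data.xlsx")
head(retail)   # first rows of the transaction data
str(retail)    # column names and types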
Next, we will clean our data frame by removing missing values (a sketch follows the screenshot below).
https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png
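A minimal sketch of this cleaning step, assuming the data frame is called retail:
retail <- retail[complete.cases(retail), ]   # keep only rows without missing values
# equivalently: retail <- na.omit(retail)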
To apply Association Rule mining, we need to convert the data frame into transaction data, so that all items bought together in one invoice will be in ...
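The sentence is cut off above, but with the arules package the usual approach is to group all items of one invoice into a single transaction, roughly as follows (the column names Itemname and BillNo are assumptions):
library(arules)
trans <- as(split(retail$Itemname, retail$BillNo), "transactions")  # one transaction per invoice
summary(trans)   # number of transactions, most frequent items, itemset sizes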
DONALD's raw texts are stored in an R data frame saved as "DONALD.txt.rdata" (size = 12.5 Gb, 4.87 Gb compressed), consisting of 2,173,172 rows (one per document) × 2 columns, namely document ID and text.
License: https://creativecommons.org/publicdomain/zero/1.0/
This is a historical dataset on the modern Olympic Games, including all the Games from Athens 1896 to Rio 2016. I scraped this data from www.sports-reference.com in May 2018. The R code I used to scrape and wrangle the data is on GitHub. I recommend checking my kernel before starting your own analysis.
Note that the Winter and Summer Games were held in the same year up until 1992. After that, they staggered them such that Winter Games occur on a four year cycle starting with 1994, then Summer in 1996, then Winter in 1998, and so on. A common mistake people make when analyzing this data is to assume that the Summer and Winter Games have always been staggered.
The file athlete_events.csv contains 271116 rows and 15 columns. Each row corresponds to an individual athlete competing in an individual Olympic event (athlete-events). The columns are:
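The column list itself is not reproduced in this copy; a quick way to inspect it in R (the file path is an assumption) is:
athletes <- read.csv("athlete_events.csv", stringsAsFactors = FALSE)
str(athletes)   # shows the 15 columns and their types
dim(athletes)   # 271116 rows, 15 columns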
The Olympic data on www.sports-reference.com is the result of an incredible amount of research by a group of Olympic history enthusiasts and self-proclaimed 'statistorians'. Check out their blog for more information. All I did was consolidate their decades of work into a convenient format for data analysis.
This dataset provides an opportunity to ask questions about how the Olympics have evolved over time, including questions about the participation and performance of women, different nations, and different sports and events.
ABSTRACT:
The World Soil Information Service (WoSIS) provides quality-assessed and standardized soil profile data to support digital soil mapping and environmental applications at broad scale levels. Since the release of the ‘WoSIS snapshot 2019’ many new soil data were shared with us, registered in the ISRIC data repository, and subsequently standardized in accordance with the licenses specified by the data providers. The source data were contributed by a wide range of data providers, therefore special attention was paid to the standardization of soil property definitions, soil analytical procedures and soil property values (and units of measurement).
We presently consider the following soil chemical properties (organic carbon, total carbon, total carbonate equivalent, total Nitrogen, Phosphorus (extractable-P, total-P, and P-retention), soil pH, cation exchange capacity, and electrical conductivity) and physical properties (soil texture (sand, silt, and clay), bulk density, coarse fragments, and water retention), grouped according to analytical procedures (aggregates) that are operationally comparable.
For each profile we provide the original soil classification (FAO, WRB, USDA, and version) and horizon designations as far as these have been specified in the source databases.
Three measures for 'fitness-for-intended-use' are provided: positional uncertainty (for site locations), time of sampling/description, and a first approximation for the uncertainty associated with the operationally defined analytical methods. These measures should be considered during digital soil mapping and subsequent earth system modelling that use the present set of soil data.
DATA SET DESCRIPTION:
The 'WoSIS 2023 snapshot' comprises data for 228k profiles from 217k geo-referenced sites that originate from 174 countries. The profiles represent over 900k soil layers (or horizons) and over 6 million records. The actual number of measurements for each property varies greatly between profiles and with depth, generally depending on the objectives of the initial soil sampling programmes.
The data are provided in TSV (tab separated values) format and as GeoPackage. The zip-file (446 Mb) contains the following files:
Readme_WoSIS_202312_v2.pdf: Provides a short description of the dataset, file structure, column names, units and category values (this file is also available directly under 'online resources'). The pdf includes links to tutorials for reading the TSV files into R and Excel, respectively. See also 'HOW TO READ TSV FILES INTO R AND PYTHON' in the next section.
wosis_202312_observations.tsv: This file lists the four to six letter codes for each observation, whether the observation is for a site/profile or a layer (horizon), the unit of measurement, and the number of profiles and layers, respectively, represented in the snapshot. It also provides an estimate of the inferred accuracy of the laboratory measurements.
wosis_202312_sites.tsv: This file characterizes the site location where profiles were sampled.
wosis_202312_profiles.tsv: Presents the unique profile ID (i.e. primary key), site_id, source of the data, country ISO code and name, positional uncertainty, latitude and longitude (WGS 1984), maximum depth of soil described and sampled, as well as information on the soil classification system and edition. Depending on the soil classification system used, the number of fields will vary.
wosis_202312_layers.tsv: This file characterises the layers (or horizons) per profile, and lists their upper and lower depths (cm).
wosis_202312_xxxx.tsv: This type of file presents the results for each observation (e.g. "xxxx" = "BDFIOD"), as defined under "code" in the file wosis_202312_observations.tsv (e.g. wosis_202312_bdfiod.tsv).
wosis_202312.gpkg: Contains the above datafiles in GeoPackage format (which stores the files within an SQLite database).
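To read the GeoPackage directly in R, the sf package is one option (a sketch; the layer name passed to st_read is an assumption, so list the layers first):
library(sf)
st_layers("wosis_202312.gpkg")   # list the layers stored in the GeoPackage
profiles_gpkg <- st_read("wosis_202312.gpkg", layer = "wosis_202312_profiles")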
HOW TO READ TSV FILES INTO R AND PYTHON:
A) To read the data in R, please uncompress the ZIP file and specify the uncompressed folder.
setwd("/YourFolder/WoSIS_2023_December/") ## For example: setwd('D:/WoSIS_2023_December/')
Then use read_tsv to read the TSV files, specifying the data types for each column (c = character, i = integer, n = number, d = double, l = logical, f = factor, D = date, T = date time, t = time).
observations = readr::read_tsv('wosis_202312_observations.tsv', col_types='cccciid')
observations ## show columns and first 10 rows
sites = readr::read_tsv('wosis_202312_sites.tsv', col_types='iddcccc')
sites
profiles = readr::read_tsv('wosis_202312_profiles.tsv', col_types='icciccddcccccciccccicccci')
profiles
layers = readr::read_tsv('wosis_202312_layers.tsv', col_types='iiciciiilcc')
layers
orgc = readr::read_tsv('wosis_202312_orgc.tsv', col_types='iicciilccdccddccccc')
orgc
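Once the tables are loaded they can be joined on their shared keys, for example to attach profile information to the organic carbon records (a sketch; the key column names profile_id and profile_layer_id are assumptions and should be checked against the TSV headers):
library(dplyr)
orgc_profiles <- inner_join(orgc, profiles, by = "profile_id")       # assumed key
orgc_layers   <- inner_join(orgc, layers, by = "profile_layer_id")   # assumed key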
Note: One may also use the following R code (example is for file 'observations.tsv'):
observations <- read.table("wosis_202312_observations.tsv",
  sep = "\t", header = TRUE, quote = "", comment.char = "", stringsAsFactors = FALSE)
B) To read the files into Python, first decompress the files to your selected folder. Then in Python:
import pandas as pd
observations = pd.read_csv("wosis_202312_observations.tsv", sep="\t")
# print the data frame header and some rows
observations.head()
sites = pd.read_csv("wosis_202312_sites.tsv", sep="\t")
profiles = pd.read_csv("wosis_202312_profiles.tsv", sep="\t")
layers = pd.read_csv("wosis_202312_layers.tsv", sep="\t")
cfvo = pd.read_csv("wosis_202312_cfvo.tsv", sep="\t")
CITATION: Calisto, L., de Sousa, L.M., Batjes, N.H., 2023. Standardised soil profile data for the world (WoSIS snapshot – December 2023), https://doi.org/10.17027/isric-wdcsoils-20231130
Supplement to: Batjes N.H., Calisto, L. and de Sousa L.M., 2023. Providing quality-assessed and standardised soil data to support global mapping and modelling (WoSIS snapshot 2023). Earth System Science Data, https://doi.org/10.5194/essd-16-4735-2024.
regression.dat_nurseries_ADW2.Rdat: This R data frame contains 1353 rows corresponding to the international trials in the CIMMYT database used in this study. The column names should be self-descriptive and contain all the predictors used for this regression analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The first two rows of a pandas DataFrame ready to be used with GLAM.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An R Rda-file containing the four hypothetical datasets used in the analysis (flat, increasing, decreasing, unimodal). These are stored in a single data frame, where the 400 rows correspond to observations. For each dataset there are three variables: an event indicator (suffix "_dead"; = 1 if death occurred during follow-up, else 0), the true time of death (suffix "_time"), and the observed follow-up time, which equals 1 if the true time of death is greater than 1 (suffix "_obs"). Hence there are twelve columns. A script is provided separately within this project (Analysis.R) which includes the code used to analyse this dataset in order to obtain the results reported in the manuscript "How uncertain is the survival extrapolation? A study of the impact of different parametric survival models on extrapolated uncertainty about hazard functions, lifetime mean survival and cost-effectiveness."
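A minimal sketch of loading and inspecting the file in R (the .Rda file name is an assumption; the full analysis is in Analysis.R):
load("hypothetical_datasets.Rda")   # assumed file name
ls()                                # see which object(s) were loaded
# str() on the loaded data frame should show 400 rows and the twelve *_dead, *_time, *_obs columns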
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here you will find the raw data (RawData.RData) and code (CHIVAS_SEROLOGY_Code.Rmd, an R Markdown file) for generating the analysis and figures for the following publication:
Streptococcus pyogenes pharyngitis elicits diverse antibody responses to key vaccine antigens influenced by the imprint of past infections.
Joshua Osowicki1,2,3 #, Hannah R Frost1 #, Kristy I Azzopardi1, Alana L Whitcombe4, Reuben McGregor4, Lauren H. Carlton4, Ciara Baker1, Loraine Fabri1,5,6, Manisha Pandey7, Michael F Good7, Jonathan R. Carapetis8,9,10, Mark J Walker11,12,13, Pierre R Smeesters1,2,5,6, Paul V Licciardi2,14, Nicole J Moreland4 *, Danika L Hill15 *, Andrew C Steer1,2,3 *
Provided in the RData file are the following items:
Dataframes:
"outcome" : clinical variables associated with human challenge for each participant
"data" : ELISA and functional antibody responses for human challenge participants. Each timepoint and isotype for each antigen as seperate column)
"data_long": Data equivalent to "data" file but in long format, i.e. One column for each antigen, timepoint and isotype as factors.
"data.melt" : Data equivalent to "data" file but in longer format , i.e. timepoint, isotype and antigen as factors, 'value' as ELISA AU.
"luminex" : IgG responses to 6 antigens analysed by luminex bead-based assay in human challenge participants.
"luminex.children" : IgG responses to 6 antigen analysed by luminex bead-based assay in children
Vectors:
"pharyngitis" : participant "id" for the 19 individuals that developed pharyngitis.
"Antigen.Order" : relates to "Main" antigen classification used in Figure 2
'additional" : relates to "Additional
Function:
"custom_theme" : used as a theme when using ggplot to graph.
Adobe Illustrator or Inkscape were used to generate the final image files for publication, with some editing of the graphs (axis labels, font size, adding p-values, etc.).
Additional files:
Three .csv files have been included for download:
"ELISA_data_wide_format.csv", a wide format data table of 25 human challenge individuals and 219 variables. Equivalent to the 'data' dataframe in the RData file.
"CHIVAS_luminex.csv", a long format data table of 25 human challenge participants at 1 week, 1 month, and 3 months. Equivalent to the 'luminex' dataframe in the RData file.
"Luminex.children.csv", a datatable of 6 luminex variables for 39 children (healthy and post pharyngitis). Equivalent to the 'luminex.children' dataframe in the RData file.
Analyses are reproducible using R version 3.3.2 or above (R Core Team 2016).
Files needed for reproducing the analyses are:
chond-data.csv: Data frame with 63 rows (species) and 11 variables. Some of these variables are based on the same life history trait but are transformed for ease of interpretation and analysis.
stein-et-al-single.tree: Phylogenetic tree with scaled branch lengths from Stein et al. (2018) used in analyses. These are freely downloadable from http://vertlife.org/sharktree/.
rmax-scaling-analysis.R: R code with a minimum working example of how to load the data files, fit phylogenetic linear models using the pgls function in the caper package, run information-theoretic comparisons, and check diagnostics. A rough sketch of these steps is given below.
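A hedged sketch of these steps (the column names species, rmax and mass, and the model formula, are assumptions; the actual code is in rmax-scaling-analysis.R):
library(ape)     # read.tree
library(caper)   # comparative.data, pgls
dat  <- read.csv("chond-data.csv", stringsAsFactors = FALSE)
tree <- read.tree("stein-et-al-single.tree")
cdat <- comparative.data(phy = tree, data = dat, names.col = species, vcv = TRUE)  # assumed name column
m1   <- pgls(log(rmax) ~ log(mass), data = cdat, lambda = "ML")                    # hypothetical model
summary(m1)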