5 datasets found
  1. GenBank network assortative mixing R data frames

    • dataverse.harvard.edu
    Updated May 12, 2021
    Cite
    Jian Qin; Jeff Hemsley; Sarah Bratt (2021). GenBank network assortative mixing R data frames [Dataset]. http://doi.org/10.7910/DVN/ZRVK1L
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 12, 2021
    Dataset provided by
    Harvard Dataverse
    Authors
    Jian Qin; Jeff Hemsley; Sarah Bratt
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    GenBank collaboration networks assortative mixing R data frame files for 2002 and 2012.
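    The description does not define assortative mixing. As a minimal sketch, the function below computes Newman's assortativity coefficient r for a categorical node attribute of an undirected network: r = 1 when every edge joins two nodes of the same category and r = -1 when every edge joins different categories. Which node attributes the GenBank data frames actually record is not stated here, so the function name, toy edge list and labels are hypothetical.

```python
from collections import defaultdict


def attribute_assortativity(edges, attr):
    """Newman's assortativity coefficient r for a categorical node attribute
    on an undirected edge list: r = (sum_i e_ii - sum_i a_i b_i) / (1 - sum_i a_i b_i)."""
    e = defaultdict(float)  # mixing counts e[(category_u, category_v)]
    total = 0
    for u, v in edges:
        cu, cv = attr[u], attr[v]
        # Undirected network: count each edge in both directions so e is symmetric.
        e[(cu, cv)] += 1
        e[(cv, cu)] += 1
        total += 2
    cats = {attr[node] for edge in edges for node in edge}
    trace = sum(e[(c, c)] for c in cats) / total                   # sum_i e_ii
    a = {c: sum(e[(c, d)] for d in cats) / total for c in cats}  # row sums a_i
    b = {c: sum(e[(d, c)] for d in cats) / total for c in cats}  # column sums b_i
    expected = sum(a[c] * b[c] for c in cats)
    if expected == 1:
        raise ValueError("r is undefined when all nodes share one category")
    return (trace - expected) / (1 - expected)


# Hypothetical toy network: two within-group collaborations.
edges = [("a", "b"), ("c", "d")]
labels = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
print(attribute_assortativity(edges, labels))  # 1.0
```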

  2. Social Contacts

    • kaggle.com
    Updated Apr 30, 2020
    Cite
    Patrick (2020). Social Contacts [Dataset]. https://www.kaggle.com/datasets/bitsnpieces/social-contacts/discussion
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 30, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Patrick
    Description

    Inspiration

    Which countries have the most social contacts in the world? In particular, do countries with more social contacts among the elderly report more deaths during a pandemic caused by a respiratory virus?

    Context

    With the emergence of the COVID-19 pandemic, reports have shown that the elderly are at a higher risk of dying than other age groups: 8 out of 10 deaths reported in the U.S. have been in adults aged 65 and older. Countries have also begun to enforce 2 m social distancing to contain the pandemic.

    To this end, I wanted to explore the relationship between social contacts among the elderly and the number of COVID-19 deaths across countries.

    Content

    This dataset includes a subset of the projected social contact matrices for 152 countries from Prem et al. (2020). The projections were based on the POLYMOD study, in which information on social contacts was obtained using cross-sectional surveys in Belgium (BE), Germany (DE), Finland (FI), Great Britain (GB), Italy (IT), Luxembourg (LU), the Netherlands (NL), and Poland (PL) between May 2005 and September 2006.

    This dataset includes contact rates from study participants aged 65+ for all countries, from all sources of contact (work, home, school and others).

    I used this R code to extract this data:

    # contacts.Rdata: https://github.com/kieshaprem/covid19-agestructureSEIR-wuhan-social-distancing/blob/master/data/contacts.Rdata
    load('../input/contacts.Rdata')  # creates `contacts`, a list with one entry per ISO country code
    View(contacts)
    contacts[["ALB"]][["home"]]
    contacts[["ITA"]][["all"]]
    rowSums(contacts[["ALB"]][["all"]])

    age_groups <- c("00_04", "05_09", "10_14", "15_19", "20_24", "25_29", "30_34", "35_39",
                    "40_44", "45_49", "50_54", "55_59", "60_64", "65_69", "70_74", "75_79")

    # Row i of each country's 16 x 16 "all" matrix holds the mean contacts made by
    # participants in age group i; stack that row across countries (one row per country).
    extract_age_row <- function(i) {
      m <- data.frame(t(sapply(contacts, function(cm) cm[["all"]][i, ])))
      colnames(m) <- age_groups
      m
    }

    m1 <- extract_age_row(16)  # participants aged 75-79
    m2 <- extract_age_row(15)  # participants aged 70-74
    m3 <- extract_age_row(14)  # participants aged 65-69

    write.csv(zapsmall(m1), "contacts_75_79.csv", row.names = TRUE)
    write.csv(zapsmall(m2), "contacts_70_74.csv", row.names = TRUE)
    write.csv(zapsmall(m3), "contacts_65_69.csv", row.names = TRUE)
    

    Row names are the three-letter ISO country codes (e.g., ITA represents Italy). Column names are the age groups of the individuals contacted, in 5-year intervals from 0 to 80 years old. Cell values are the projected mean social contact rates. Each file covers one participant age group (65-69, 70-74 or 75-79), as indicated by its file name.
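    The exported CSVs can also be read outside R. The sketch below assumes the layout described above: an empty first header cell (which R's write.csv writes for the row-name column), ISO codes in the first column, and one column per age group. It loads the rates and sums them per country, analogous to the rowSums call in the R code. The function names and the sample values are made up for illustration.

```python
import csv
import io


def load_contact_rates(text):
    """Parse one contacts_*.csv file into {iso_code: {age_group: rate}}."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    age_groups = header[1:]  # first header cell is R's empty row-name column
    rates = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        rates[row[0]] = dict(zip(age_groups, map(float, row[1:])))
    return rates


def total_contacts(rates, country):
    """Sum of mean contact rates across all contacted age groups."""
    return sum(rates[country].values())


# Made-up sample, truncated to two age-group columns.
sample = '"","00_04","05_09"\n"ALB",0.5,1.25\n'
print(total_contacts(load_contact_rates(sample), "ALB"))  # 1.75
```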

    Acknowledgements

    Thanks go to Dr. Kiesha Prem for her correspondence and to her team for publishing their work on social contact matrices.

  3. Market Basket Analysis

    • kaggle.com
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 9, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with the Apriori algorithm

    The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a dataset containing the retailer's transaction data, covering all transactions that happened over a period of time. The retailer will use the results to grow in its industry and to suggest itemsets to customers, with the aim of increasing customer engagement, improving the customer experience, and identifying customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.

    Introduction

    Association rules are most useful when you want to find associations between different objects in a set, i.e., frequent patterns in a transaction database. They can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.
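    The description names the Apriori algorithm but does not show how it finds frequent patterns. Below is a minimal, unoptimized pure-Python sketch of Apriori's level-wise search: count the support of each candidate, keep the frequent ones, join them into larger candidates, and prune any candidate with an infrequent subset. It is not the arules implementation used in R; the min_support threshold and the toy baskets are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: the candidates are the single items.
    candidates = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while candidates:
        # Count support and keep only the frequent k-itemsets.
        level = {c: support(c) for c in candidates}
        current = {c for c, s in level.items() if s >= min_support}
        frequent.update({c: level[c] for c in current})
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
    return frequent


baskets = [{"mouse", "mat"}, {"mouse", "mat", "cable"}, {"mouse"}, {"cable"}]
frequent = apriori(baskets, min_support=0.5)
```

With these baskets, {mouse, mat} survives (support 0.5) while {mouse, cable} is dropped (support 0.25).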

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule {computer mouse} => {mouse mat}:
    • support = P(mouse & mat) = 8/100 = 0.08
    • confidence = support / P(computer mouse) = 0.08/0.10 = 0.80
    • lift = confidence / P(mouse mat) = 0.80/0.09 ≈ 8.9

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
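    The arithmetic in this example can be checked with a short function. The helper name rule_metrics is hypothetical; the counts are the ones from the example.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence and lift for the rule antecedent => consequent."""
    support = n_both / n_total                    # P(antecedent & consequent)
    confidence = n_both / n_antecedent            # P(consequent | antecedent)
    lift = confidence / (n_consequent / n_total)  # confidence / P(consequent)
    return support, confidence, lift


# 100 customers, 10 bought a mouse, 9 a mat, 8 both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# support=0.08 confidence=0.80 lift=8.89
```

A lift well above 1 means the two items are bought together far more often than they would be if purchases were independent.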

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data, so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of the rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    Libraries in R

    First, we load the required libraries, each briefly described below:

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - An opinionated collection of R packages designed for data science; the package makes it easy to install and load multiple 'tidyverse' packages in a single step.
    • readxl - Reads Excel files into R.
    • plyr - Tools for splitting, applying and combining data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a forward-pipe operator, %>%, which forwards a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data-frame-like objects, both in memory and out of memory.

    Data Pre-processing

    Next, we read Assignment-1_Data.xlsx into R, after which the data can be viewed in R.

    Next, we clean the data frame by removing rows with missing values.

    To apply association rule mining, we need to convert the data frame into transaction data so that all items bought together in one invoice will be in ...
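    The conversion step is cut off above. As a rough sketch of the idea only (not the author's R code), rows can be grouped by BillNo so that each invoice becomes one set of items, skipping rows whose BillNo or Itemname is missing. The to_baskets name and the sample rows are hypothetical; quantities and prices are ignored because a transaction here is just a set of items.

```python
from collections import defaultdict


def to_baskets(rows):
    """Group (BillNo, Itemname) pairs into {BillNo: set_of_items}.

    Rows with a missing BillNo or Itemname (None) are skipped, and an item
    appearing twice on the same invoice is stored once.
    """
    baskets = defaultdict(set)
    for bill_no, item in rows:
        if bill_no is None or item is None:
            continue
        baskets[bill_no].add(item)
    return dict(baskets)


# Hypothetical rows: (BillNo, Itemname)
rows = [("100001", "mouse"), ("100001", "mat"), ("100002", "mouse"),
        (None, "cable"), ("100003", None)]
baskets = to_baskets(rows)
```

The resulting sets are exactly the input an itemset miner such as Apriori expects.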

  4. Data for analysis in Barrie et al. (2025)

    • figshare.com
    csv
    Updated May 21, 2025
    Cite
    Eleanor Barrie; Luke L. Powell; Billi Krochuk; Patricia F Rodrigues; Jared D Wolfe; Crinan Jarrett; Diogo F Ferreira; Kristin E Brzeski; Jacob C Cooper; Susana Lin Mufumu; Silvestre Esteban Malanza; Agustin Ebana Nsue Akele; Cayetano Ebana Ebana Alene (2025). Data for analysis in Barrie et al. (2025) [Dataset]. http://doi.org/10.6084/m9.figshare.29114960.v1
    Available download formats: csv
    Dataset updated
    May 21, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Eleanor Barrie; Luke L. Powell; Billi Krochuk; Patricia F Rodrigues; Jared D Wolfe; Crinan Jarrett; Diogo F Ferreira; Kristin E Brzeski; Jacob C Cooper; Susana Lin Mufumu; Silvestre Esteban Malanza; Agustin Ebana Nsue Akele; Cayetano Ebana Ebana Alene
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Barrie
    Description

    These files contain the data used for analysis in: Barrie EM, Krochuk BA, Jarrett C, Ferreira DF, Rodrigues P, Mufumu SL, Malanza SE, Akele AEN, Alene CEE, Brzeski KE, Cooper JC, Wolfe JD and Powell LL (2025) Specialized insectivores drive differences in avian community composition between primary and secondary forest in Central Africa. Front. Conserv. Sci. 6:1504350. doi: 10.3389/fcosc.2025.1504350

    At a long-term bird banding station on mainland Equatorial Guinea, we captured over 3200 birds across 6 field seasons in selectively logged secondary forest and in largely undisturbed primary forest. Our objective was to understand how community composition changed with human disturbance, with particular interest in the guilds and species that indicate primary rainforest.

    • banding_data.csv: the raw banding/capture data from mist-netting and ringing in the field, including time and date of capture, net lane and net number, species, ring number, and recaptures.
    • buffers.csv: for each net lane, the amount of overlap with other nearby net lanes and the proportion used for the offset in the statistical analysis. See Barrie et al. (2025) for methodology.
    • days.csv: all combinations of net lanes and dates run, and whether each was "Day 1" or "Day 2" (all net lanes were run for two consecutive days per year).
    • effort.csv: effort in terms of mist-net hours, with the opening and closing times and duration open for every net run.
    • forest_type.csv: each net lane and whether it was in primary or secondary forest.
    • guilds.csv: the dietary guild classifications of all focal species analysed in Barrie et al. (2025), which are needed to merge with banding_data.csv in R to create the data frame for analysis.
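    To illustrate how these files fit together, the sketch below joins captures to dietary guilds and forest type and divides by mist-net hours, giving a simple capture rate per 100 net hours for each (forest type, guild) pair. This is not the statistical model of Barrie et al. (2025), which uses an offset (see buffers.csv). The column names (species, net_lane, net_hours), the function name and the sample records are assumptions, since the files' headers are not listed here.

```python
from collections import defaultdict


def capture_rates(banding, guilds, forest_type, effort):
    """Captures per 100 mist-net hours for each (forest type, guild) pair.

    banding:     list of dicts with hypothetical keys "species", "net_lane"
    guilds:      dict species -> dietary guild (stands in for guilds.csv)
    forest_type: dict net_lane -> "primary" / "secondary" (forest_type.csv)
    effort:      list of dicts with hypothetical keys "net_lane", "net_hours"
    """
    # Total net hours per forest type.
    hours = defaultdict(float)
    for row in effort:
        hours[forest_type[row["net_lane"]]] += row["net_hours"]

    # Count captures; non-focal species (absent from guilds) are dropped,
    # as an inner join would do.
    counts = defaultdict(int)
    for row in banding:
        guild = guilds.get(row["species"])
        if guild is None:
            continue
        counts[(forest_type[row["net_lane"]], guild)] += 1

    return {key: 100 * n / hours[key[0]] for key, n in counts.items()}
```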

  5. DQN Replay Dataset

    • paperswithcode.com
    • library.toponeai.link
    Updated Jul 23, 2021
    Cite
    Rishabh Agarwal; Dale Schuurmans; Mohammad Norouzi (2021). DQN Replay Dataset [Dataset]. https://paperswithcode.com/dataset/dqn-replay-dataset
    Dataset updated
    Jul 23, 2021
    Authors
    Rishabh Agarwal; Dale Schuurmans; Mohammad Norouzi
    Description

    The DQN Replay Dataset was collected as follows: we first train a DQN agent on all 60 Atari 2600 games with sticky actions enabled for 200 million frames (standard protocol) and save all of the experience tuples of (observation, action, reward, next observation) (approximately 50 million) encountered during training.

    This logged DQN data can be found in the public GCP bucket gs://atari-replay-datasets, which can be downloaded using gsutil. To install gsutil, follow Google Cloud's gsutil installation instructions.

    After installing gsutil, run the following command to copy the entire dataset into the current directory:

    gsutil -m cp -R gs://atari-replay-datasets/dqn ./

    To download the dataset for only a specific Atari 2600 game (e.g., replace [GAME_NAME] with Pong to download the logged DQN replay data for the game of Pong), run:

    gsutil -m cp -R gs://atari-replay-datasets/dqn/[GAME_NAME] ./

    This data can be generated by running the online agents using batch_rl/baselines/train.py for 200 million frames (standard protocol). Note that the dataset consists of approximately 50 million experience tuples rather than 200 million because of a frame skip of 4 (i.e., each selected action is repeated for 4 consecutive frames, and 200 million / 4 = 50 million). The stickiness parameter is set to 0.25, i.e., at every time step there is a 25% chance that the environment will execute the agent's previous action again instead of the agent's new action.
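    The link between frames, frame skip and logged transitions, and the sticky-action rule, can be sketched in a few lines. This is an illustration using the parameters stated above, not code from batch_rl; the helper names are hypothetical, and a real emulator may apply the stickiness check at a finer granularity than once per call.

```python
import random

FRAMES = 200_000_000  # training length under the standard protocol
FRAME_SKIP = 4        # each selected action is repeated for 4 consecutive frames
STICKINESS = 0.25     # chance the environment repeats the previous action


def num_transitions(frames, frame_skip):
    # One (observation, action, reward, next observation) tuple is logged per
    # agent decision, and each decision spans `frame_skip` frames.
    return frames // frame_skip


def executed_action(agent_action, previous_action, u, stickiness=STICKINESS):
    # `u` is a uniform draw in [0, 1). With probability `stickiness` the
    # environment ignores the agent's new action and repeats the previous one.
    if previous_action is not None and u < stickiness:
        return previous_action
    return agent_action


print(num_transitions(FRAMES, FRAME_SKIP))  # 50000000

rng = random.Random(0)
action = executed_action(agent_action=3, previous_action=7, u=rng.random())
```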
