100+ datasets found
  1. Friends - R Package Dataset

    • kaggle.com
    zip
    Updated Nov 11, 2024
    Cite
    Lucas Yukio Imafuko (2024). Friends - R Package Dataset [Dataset]. https://www.kaggle.com/datasets/lucasyukioimafuko/friends-r-package-dataset
    Explore at:
    zip (2018791 bytes)
    Dataset updated
    Nov 11, 2024
    Authors
    Lucas Yukio Imafuko
    Description

    The whole data and source can be found at https://emilhvitfeldt.github.io/friends/

    "The goal of friends is to provide the complete script transcription of the Friends sitcom. The data originates from the Character Mining repository, which includes references to scientific explorations using this data. This package simply provides the data in tibble format instead of json files."

    Content

    • friends.csv - Contains the scenes and lines for each character, including season and episodes.
    • friends_emotions.csv - Contains sentiments for each scene - for the first four seasons only.
    • friends_info.csv - Contains information regarding each episode, such as imdb_rating, views, episode title and directors.

    Uses

    • Text mining, sentiment analysis and word statistics.
    • Data visualizations.
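    As a quick sketch of the text-mining use case, the lines-per-character tally below uses toy rows standing in for friends.csv; the speaker/season/text column names are assumed from the friends package documentation, not verified here:

    ```r
    # Toy rows standing in for friends.csv (column names assumed)
    lines <- data.frame(
      speaker = c("Ross Geller", "Rachel Green", "Ross Geller"),
      season  = c(1, 1, 2),
      text    = c("Hi.", "Hey!", "We were on a break!")
    )

    # Lines per character, most talkative first
    line_counts <- sort(table(lines$speaker), decreasing = TRUE)
    print(line_counts)
    ```

    With the real CSV, replacing the toy data frame with read.csv("friends.csv") gives the same word-statistics starting point.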
  2. Political Analysis Using R: Example Code and Data, Plus Data for Practice...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Apr 28, 2020
    Cite
    Jamie Monogan (2020). Political Analysis Using R: Example Code and Data, Plus Data for Practice Problems [Dataset]. http://doi.org/10.7910/DVN/ARKOTI
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 28, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Jamie Monogan
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Each R script replicates all of the example code from one chapter from the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.

  3. R-script to Analyse Data

    • uvaauas.figshare.com
    txt
    Updated Apr 4, 2022
    Cite
    T. Blanke (2022). R-script to Analyse Data [Dataset]. http://doi.org/10.21942/uva.14346842.v1
    Explore at:
    txt
    Dataset updated
    Apr 4, 2022
    Dataset provided by
    University of Amsterdam / Amsterdam University of Applied Sciences
    Authors
    T. Blanke
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Exploratory data analysis and visualisation of datasets

  4. R codes and dataset for Visualisation of Diachronic Constructional Change...

    • researchdata.edu.au
    • bridges.monash.edu
    Updated Apr 1, 2019
    Cite
    Gede Primahadi Wijaya Rajeg; Gede Primahadi Wijaya Rajeg (2019). R codes and dataset for Visualisation of Diachronic Constructional Change using Motion Chart [Dataset]. http://doi.org/10.26180/5c844c7a81768
    Explore at:
    Dataset updated
    Apr 1, 2019
    Dataset provided by
    Monash University
    Authors
    Gede Primahadi Wijaya Rajeg; Gede Primahadi Wijaya Rajeg
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Publication


    Primahadi Wijaya R., Gede. 2014. Visualisation of diachronic constructional change using Motion Chart. In Zane Goebel, J. Herudjati Purwoko, Suharno, M. Suryadi & Yusuf Al Aried (eds.). Proceedings: International Seminar on Language Maintenance and Shift IV (LAMAS IV), 267-270. Semarang: Universitas Diponegoro. doi: https://doi.org/10.4225/03/58f5c23dd8387

    Description of R codes and data files in the repository

    This repository is imported from its GitHub repo. Versioning of this figshare repository follows the GitHub repo's Releases, so check the Releases page for updates (the next version will include the unified, tidyverse-based version of the codes from the first release).

    The raw input data consist of two files, will_INF.txt and go_INF.txt. They contain the co-occurrence frequencies of the top-200 infinitival collocates of will and be going to, respectively, across the twenty decades of the Corpus of Historical American English (from the 1810s to the 2000s).

    These two input files are read by the R script 1-script-create-input-data-raw.r, which preprocesses and combines them into a long-format data frame with the following columns: (i) decade, (ii) coll (for "collocate"), (iii) BE going to (frequency of the collocate with be going to) and (iv) will (frequency of the collocate with will); the result is saved as input_data_raw.txt.

    Then, the script 2-script-create-motion-chart-input-data.R processes input_data_raw.txt, normalising the co-occurrence frequency of the collocates per million words (the COHA sizes and normalising base frequencies are available in coha_size.txt). The output of this second script is input_data_futurate.txt.

    Next, input_data_futurate.txt contains the relevant input data for generating (i) the static motion chart as an image plot in the publication (using the script 3-script-create-motion-chart-plot.R), and (ii) the dynamic motion chart (using the script 4-script-motion-chart-dynamic.R).

    The repository adopts the project-oriented workflow in RStudio; double-click on the Future Constructions.Rproj file to open an RStudio session whose working directory is associated with the contents of this repository.
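    The per-million-words normalisation performed by 2-script-create-motion-chart-input-data.R can be sketched as follows; the counts and decade size here are made up for illustration (the real sizes live in coha_size.txt):

    ```r
    # Hypothetical raw co-occurrence counts of two collocates in one decade
    freq <- c(`BE going to` = 120, will = 450)

    # Hypothetical decade size in words (real values come from coha_size.txt)
    decade_size <- 1181022

    # Normalised frequency per million words
    freq_pmw <- freq / decade_size * 1e6
    print(freq_pmw)
    ```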

  5. Basic R for Data Analysis

    • kaggle.com
    zip
    Updated Dec 8, 2024
    Cite
    Kebba Ndure (2024). Basic R for Data Analysis [Dataset]. https://www.kaggle.com/datasets/kebbandure/basic-r-for-data-analysis/code
    Explore at:
    zip (279031 bytes)
    Dataset updated
    Dec 8, 2024
    Authors
    Kebba Ndure
    Description

    ABOUT DATASET

    This is an R Markdown notebook. It contains a step-by-step guide to data analysis with R: it helps you install the relevant packages, shows how to load them, and provides a detailed summary of the "dplyr" commands you can use to manipulate your data in the R environment.

    Anyone new to R who wishes to carry out some data analysis can check it out!

  6. MICRON Data (2015-2016) with associated R Markdown code

    • catalog.data.gov
    • datasets.ai
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). MICRON Data (2015-2016) with associated R Markdown code [Dataset]. https://catalog.data.gov/dataset/micron-data-2015-2016-with-associated-r-markdown-code
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    This data set includes water quality data and microbial community abundance tables for periphyton samples from this project. The data set also includes extensive R markdown code used to process the data and generate the results included in the report. This dataset is associated with the following publication: Hagy, J., R. Devereux, K. Houghton, D. Beddick, T. Pierce, and S. Friedman. Developing Microbial Community Indicators of Nutrient Exposure in Southeast Coastal Plain Streams using a Molecular Approach. US EPA Office of Research and Development, Washington, DC, USA, 2018.

  7. Reddit: /r/stocks

    • kaggle.com
    zip
    Updated Dec 19, 2022
    Cite
    The Devastator (2022). Reddit: /r/stocks [Dataset]. https://www.kaggle.com/datasets/thedevastator/unlocking-stock-market-insights-with-reddit-user
    Explore at:
    zip (622416 bytes)
    Dataset updated
    Dec 19, 2022
    Authors
    The Devastator
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Reddit: /r/stocks

    Analyzing User Engagement to Identify Market Trends

    By Reddit [source]

    About this dataset

    This dataset lets researchers explore stock-market discussion through the eyes of its participants on Reddit. We compiled posts from the r/stocks subreddit to provide a source of information on how stock-market trends may relate to user sentiment. With columns such as post title, score, id, URL, comment count and creation time for each post, it offers a vantage point for understanding how stock-market discussions can inform our understanding of market dynamics. By digging into user sentiment and engagement with stock topics, investigators can assemble an evidence-based picture of the market grounded in real people's experiences and opinions.

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨

    Research Ideas
    • Using the score and comments data, researchers can determine which stocks are being discussed and tracked the most, indicating potential areas of interest in the stock market.
    • Analyzing the body text of posts to identify common topics of conversation related to various stocks assists in providing a better understanding of users' feelings towards different stock investments.
    • Through analyzing fluctuations in user engagement over time, researchers can observe which stocks have experienced an increase or decrease in user interest and reaction to new developments within different markets

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: stocks.csv

    | Column name | Description |
    |:------------|:------------|
    | title | The title of the post. (String) |
    | score | The score of the post, based on the Reddit voting system. (Integer) |
    | url | The URL of the post. (String) |
    | comms_num | The number of comments on the post. (Integer) |
    | created | The date and time the post was created. (Timestamp) |
    | body | The body text of the post. (String) |
    | timestamp | The date and time the post was last updated. (Timestamp) |
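    A minimal sketch of working with these columns, using toy rows in place of stocks.csv; the engagement ratio is an illustrative derived metric, not part of the dataset:

    ```r
    # Toy rows shaped like stocks.csv
    posts <- data.frame(
      title     = c("GME earnings thread", "Long-term index investing"),
      score     = c(512L, 48L),
      comms_num = c(130L, 12L),
      created   = c("2022-01-28 14:00:00", "2022-02-01 09:30:00")
    )
    posts$created <- as.POSIXct(posts$created, tz = "UTC")

    # Illustrative engagement proxy: comments per upvote
    posts$engagement <- posts$comms_num / posts$score
    print(posts[, c("title", "engagement")])
    ```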

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Reddit.

  8. Protocol data (R version)

    • figshare.com
    application/gzip
    Updated Oct 16, 2020
    + more versions
    Cite
    Jesse Gillis (2020). Protocol data (R version) [Dataset]. http://doi.org/10.6084/m9.figshare.13020569.v2
    Explore at:
    application/gzip
    Dataset updated
    Oct 16, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jesse Gillis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We published 3 protocols illustrating how MetaNeighbor can be used to quantify cell type replicability across single-cell transcriptomic datasets. The data files included here are needed to run the R version of the protocols available on GitHub (https://github.com/gillislab/MetaNeighbor-Protocol) in RMarkdown (.Rmd) and Jupyter (.ipynb) notebook format. To run the protocols, download the protocols from GitHub, download the data from Figshare, place the data and protocol files in the same directory, then run the notebooks in RStudio or Jupyter. The scripts used to generate the data are included in the GitHub directory. Briefly:

    • full_biccn_hvg.rds: a single-cell transcriptomic dataset published by the Brain Initiative Cell Census Network, in SingleCellExperiment format. It combines data from 7 datasets obtained in the mouse primary motor cortex (https://www.biorxiv.org/content/10.1101/2020.02.29.970558v2). Note that this dataset only contains highly variable genes.
    • biccn_hvgs.txt: highly variable genes from the BICCN dataset described above (computed with the MetaNeighbor library).
    • biccn_gaba.rds: the same dataset as full_biccn_hvg.rds, but restricted to GABAergic neurons. This dataset contains all genes common to the 7 BICCN datasets (not just highly variable genes).
    • go_mouse.rds: gene ontology annotations, stored as a list of gene symbols (one element per gene set).
    • functional_aurocs.txt: results of the MetaNeighbor functional analysis in protocol 3.
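    The .rds files above are loaded with base R's readRDS(). A self-contained round-trip looks like this; the object here is a made-up stand-in, since the actual Figshare files are not bundled with this listing:

    ```r
    # saveRDS/readRDS round-trip, the mechanism used for the .rds data files
    # (the object is a stand-in, not the actual BICCN data)
    tmp <- tempfile(fileext = ".rds")
    saveRDS(list(genes = c("Gad1", "Gad2"), n_cells = 100L), tmp)

    obj <- readRDS(tmp)
    str(obj)
    ```

    With the downloaded data in the working directory, readRDS("full_biccn_hvg.rds") loads the SingleCellExperiment object the same way.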

  9. Simulation Data Set

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

    This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

    File format: R workspace file, "Simulated_Dataset.RData".

    Metadata (including data dictionary)

    • y: Vector of binary responses (1: adverse outcome, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of "true" critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    Code

    We provide R statistical software code ("CWVS_LMC.txt") to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code ("Results_Summary.txt") to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

    • "CWVS_LMC.txt": a .txt file containing R code. Once the "Simulated_Dataset.RData" workspace has been loaded into R, this code can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
    • "Results_Summary.txt": also a .txt file containing R code. Once the "CWVS_LMC.txt" code has been run on the simulated dataset, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

    Required R packages

    • For running "CWVS_LMC.txt": msm (sampling from the truncated normal distribution), mnormt (sampling from the multivariate normal distribution), BayesLogit (sampling from the Polya-Gamma distribution)
    • For running "Results_Summary.txt": plotrix (plotting the posterior means and credible intervals)

    Instructions for Use

    What can be reproduced: the data and code can be used to identify/estimate critical windows from one of the simulated datasets generated under setting E4 of the presented simulation study. How to use the information:

    • Load the "Simulated_Dataset.RData" workspace.
    • Run the code contained in "CWVS_LMC.txt".
    • Once the "CWVS_LMC.txt" code is complete, run "Results_Summary.txt".

    Data

    The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

    Availability

    Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This also allows the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
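    The weekly standardisation described above, (exposure - weekly median) / weekly IQR, can be sketched with toy numbers:

    ```r
    # Toy exposure values for one week across simulated individuals
    x <- c(3.1, 4.7, 2.8, 5.2, 4.0)

    # Standardise as described: subtract the weekly median, divide by the IQR
    z <- (x - median(x)) / IQR(x)
    print(z)
    ```

    In the provided dataset the z matrix already holds values standardised this way, with the medians and IQRs withheld.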

  10. Data and R code used in: Plant geographic distribution influences chemical...

    • repository.soilwise-he.eu
    • search.dataone.org
    • +1more
    Updated Jan 22, 2024
    Cite
    (2024). Data and R code used in: Plant geographic distribution influences chemical defenses in native and introduced Plantago lanceolata populations [Dataset]. http://doi.org/10.5061/dryad.5dv41nsd1
    Explore at:
    Dataset updated
    Jan 22, 2024
    Description

    Open Access

    Description of the data and file structure

    • 00_ReadMe_DescriptonVariables.csv: A list with the description of variables from each file used.
    • 00_Metadata_Coordinates.csv: A dataset that includes the coordinates of each Plantago lanceolata population used.
    • 00_Metadata_Climate.csv: A dataset that includes coordinates, bioclimatic parameters, and the results of PCA. The dataset was created based on the script '1_Environmental variables.qmd'.
    • 00_Metadata_Individuals.csv: A dataset that includes general information about each plant individual. Information about root traits and chemistry is missing in four samples since we lost the samples.
    • 01_Datset_PlantTraits.csv: Size-related and resource allocation traits measured for Plantago lanceolata, and herbivore damage.
    • 02_Dataset_TargetedCompounds.csv: Quantification of phytohormones, iridoid glycosides, verbascoside and flavonoids in the leaves and roots of Plantago lanceolata. Data generated from HPLC.
    • 03_Dataset_Volatiles_Area.csv: Area of identified volatile compounds. Data generated from GC-FID.
    • 03_Dataset_Volatiles_Compounds.csv: Information on identified volatile compounds. Data generated from GC-MS.
    • 04_Dataset_Metabolome_Negative_Metadata.txt: Metadata for files in negative mode.
    • 04_Dataset_Metabolome_Negative_Intensity.xlsx: Intensity of the metabolite features in negative mode. The file was generated from Metaboscape and adapted as required for the Notame package.
    • 04_Dataset_Metabolome_Negative_Intensity_filtered.xlsx: File generated after preprocessing of features in negative mode. During the Notame package preprocessing, zeros were converted to NA.
    • 04_Dataset_Metabolome_Negative.msmsonly.csv: Intensity of the metabolite features in negative mode with MS/MS data. File generated from Metaboscape.
    • 04_Results_Metabolome_Negative_canopus_compound_summary.tsv: Feature classification. Results generated from Sirius software.
    • 04_Results_Metabolome_Negative_compound_identifications.tsv: Feature identification. Results generated from Sirius software.
    • 05_Dataset_Metabolome_Positive_Metadata.txt: Metadata for files in positive mode.
    • 05_DatasetMetabolome_Positive_Intensity.xlsx: Intensity of the metabolite features in positive mode. File generated from Metaboscape and adapted as required for the Notame package.
    • 05_Dataset_Metabolome_Positive_Intensity_filtered: File generated after preprocessing of features in positive mode. During the Notame package preprocessing, zeros were converted to NA.

    Code/Software

    • 1_Environmental variables.qmd: R script to retrieve bioclimatic variables based on the coordinates of each population, then perform a principal components analysis to reduce the axes of variation; the first principal component is included as an explanatory variable in our model to estimate trait differences between native and introduced populations. Figures 1b and 1d.
    • 2_PlantTraits_and_Herbivory: R script for statistical analysis of size-related traits, resource allocation traits and herbivore damage. Figure 2. It needs to source Model_1_Function.R, Model_2_Function.R and Plot_Function.R.
    • 3_Metabolome: R script for statistical analysis of the Plantago lanceolata metabolome. Figure 3. It needs to source Metabolome_preprocessing.R, Model_1_Function.R, Model_2_Function.R and Plot_Function.R.
    • 4_TargetedCompounds: R script for statistical analysis of Plantago lanceolata targeted compounds. Figure 4. It needs to source Model_1_Function.R, Model_2_Function.R and Plot_Function.R.
    • 5_Volatilome: R script for statistical analysis of the Plantago lanceolata volatilome. Figure 5. It needs to source Model_1_Function.R, Model_2_Function.R and Plot_Function.R.
    • Model_1_Function.R: Function to run statistical models.
    • Model_2_Function.R: Function to run statistical models.
    • Plots_Function.R: Function to plot graphs.
    • Metabolome_prepocessing.R: Script to preprocess features.

  11. Reddit: /r/travel

    • kaggle.com
    zip
    Updated Dec 18, 2022
    Cite
    The Devastator (2022). Reddit: /r/travel [Dataset]. https://www.kaggle.com/datasets/thedevastator/uncovering-travel-experiences-desires-and-opinio
    Explore at:
    zip (369897 bytes)
    Dataset updated
    Dec 18, 2022
    Authors
    The Devastator
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Reddit: /r/travel

    An Exploration of Users & Posts

    By Reddit [source]

    About this dataset

    Traveling can be an incredibly exciting and rewarding experience; it is the perfect way to break away from the everyday routine and explore new cultures, sights, and sounds. For those planning a travel-related adventure – whether international or local – having access to real-user experiences in the form of advice and recommendations can mean the difference between a fantastic journey and a costly mistake. That's why this dataset of Reddit posts history on 'travel' is particularly useful for exploring Reddit users' opinions, desires, and experiences with their travel endeavors.

    This dataset contains information on over 750 Reddit posts about traveling, as well as thousands of related comments collected over an extended period of time. For every post listed, the data include the title, score (number of upvotes), URL, number of comments, and the creation date/time stamp for both the post and its comment threads.

    Together, these attributes provide detailed insight into user sentiment about traveling: What topics are users most interested in? What do they think are the best (or worst) destinations? Are there tips or pitfalls that could inform our own decisions on our next journey? All this information can give us better guidance for making smarter decisions during our own planning process!

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨

    How to use the dataset

    This dataset provides valuable insight into the opinions, desires and experiences of Redditors around travel. The data consist of posts and comments collected from the 'travel' subreddit. To get started, note that each post includes data such as title, score, id, url, number of comments and creation timestamp. These fields can be used to understand the kinds of conversations happening in these forums around travel-related topics.

    Research Ideas

    • Analyzing user sentiment around various topics in the travel industry such as airlines, hotels, attractions and experiences.
    • Comparing time of year to the frequency of posts related to summer vacation or other holiday specific activities.
    • Examining which geographical locations generate the most interest among Redditors, and applying this data to marketing campaigns for those areas

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: travel.csv

    | Column name | Description |
    |:------------|:------------|
    | title | The title of the post. (String) |
    | score | The number of upvotes the post has received. (Integer) |
    | url | The URL of the post. (String) |
    | comms_num | The number of comments the post has received. (Integer) |
    | created | The date and time the post was created. (DateTime) |
    | body | The body of the post. (String) |
    | timestamp | The date and time the post was last updated. (DateTime) |
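    For example, the created column can be bucketed by month to study posting seasonality; the rows below are toy stand-ins for travel.csv:

    ```r
    # Toy rows shaped like travel.csv
    posts <- data.frame(
      title   = c("Solo trip to Japan", "Best travel insurance?", "Packing tips"),
      created = as.POSIXct(c("2022-06-01 10:00:00", "2022-06-15 18:30:00",
                             "2022-07-02 08:00:00"), tz = "UTC")
    )

    # Posts per calendar month
    per_month <- table(format(posts$created, "%Y-%m"))
    print(per_month)
    ```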

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Reddit.

  12. Reddit r/developersindia Community Posts Dataset

    • kaggle.com
    zip
    Updated Oct 29, 2023
    Cite
    Utkarsh Gupta (aka AvGeekGupta) (2023). Reddit r/developersindia Community Posts Dataset [Dataset]. https://www.kaggle.com/datasets/avgeekgupta/reddit-rdevelopersindia-community-posts-dataset
    Explore at:
    zip (837396 bytes)
    Dataset updated
    Oct 29, 2023
    Authors
    Utkarsh Gupta (aka AvGeekGupta)
    License

    https://www.reddit.com/wiki/api

    Description

    Reddit r/developersindia Community Posts Dataset


    Overview

    This dataset contains information about posts from the r/developersindia subreddit on Reddit. It covers posts starting from July 7, 2023, and goes back in time. The dataset aims to provide insights into the discussions and activities within the r/developersindia community.

    Columns

    • title: The title of the Reddit post.
    • selftext: The content of the post if it's a text-based post; otherwise, it may be empty or contain additional information.
    • subreddit: The name of the subreddit (r/developersindia).
    • author_flair_text: The flair text associated with the author of the post, if any.
    • num_comments: The number of comments on the post.
    • downs: The number of downvotes the post has received.
    • is_crosspostable: Indicates whether the post is crosspostable (True/False).
    • view_count: The number of views on the post.
    • ups: The number of upvotes the post has received.
    • url: The URL associated with the post.
    • is_video: Indicates whether the post contains a video (True/False).
    • num_crossposts: The number of times the post has been crossposted.
    • subreddit_subscribers: The number of subscribers to the r/developersindia subreddit.
    • author: The username of the author of the post.
    • treatment_tags: Tags or labels applied to the post for special treatment.
    • all_awardings: Information about any awards received by the post.
    • media: Information about media content associated with the post.

    Potential Use Cases

    This dataset can be valuable for various data analysis and machine learning tasks, including:

    • Sentiment analysis of posts within the r/developersindia community.
    • Topic modeling to identify common discussion topics.
    • User behavior analysis, such as post frequency and engagement.
    • Exploring trends and patterns in posts and comments.
    • Investigating the impact of upvotes, downvotes, and crossposts.
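    The engagement columns above (`ups`, `downs`, `num_comments`) support quick exploratory analysis. A minimal pandas sketch, using invented toy rows in place of the real CSV; the `engagement` metric is an illustrative assumption, not a field in the dataset:

```python
import pandas as pd

# Toy rows mirroring the dataset's columns; a real analysis would load
# the downloaded CSV instead (the filename is not specified here).
posts = pd.DataFrame({
    "title": ["Switching to Go?", "Salary thread", "Weekend project"],
    "ups": [120, 450, 30],
    "downs": [5, 20, 2],
    "num_comments": [40, 210, 3],
})

# Illustrative engagement metric: comments per upvote.
posts["engagement"] = posts["num_comments"] / posts["ups"]

# Post with the highest comments-per-upvote ratio.
top = posts.sort_values("engagement", ascending=False).iloc[0]
print(top["title"], round(top["engagement"], 2))
```

    The same frame extends naturally to the sentiment and topic-modeling use cases listed above, with `selftext` as the text column.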
  13. Fiber object

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Dec 22, 2024
    Mark Boltengagen; Mark Boltengagen (2024). Fiber object [Dataset]. http://doi.org/10.5281/zenodo.14545007
    Explore at:
    bin. Available download formats
    Dataset updated
    Dec 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mark Boltengagen; Mark Boltengagen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains a fiber R object, stored as an RDS file for use in an R package for the analysis of long-read sequencing data.

  14. Data_Sheet_1_NeuroDecodeR: a package for neural decoding in R.docx

    • frontiersin.figshare.com
    docx
    Updated Jan 3, 2024
    Ethan M. Meyers (2024). Data_Sheet_1_NeuroDecodeR: a package for neural decoding in R.docx [Dataset]. http://doi.org/10.3389/fninf.2023.1275903.s001
    Explore at:
    docx. Available download formats
    Dataset updated
    Jan 3, 2024
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Ethan M. Meyers
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Neural decoding is a powerful method to analyze neural activity. However, the code needed to run a decoding analysis can be complex, which can present a barrier to using the method. In this paper we introduce a package that makes it easy to perform decoding analyses in the R programming language. We describe how the package is designed in a modular fashion which allows researchers to easily implement a range of different analyses. We also discuss how to format data to be able to use the package, and we give two examples of how to use the package to analyze real data. We believe that this package, combined with the rich data analysis ecosystem in R, will make it significantly easier for researchers to create reproducible decoding analyses, which should help increase the pace of neuroscience discoveries.
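    NeuroDecodeR itself is an R package, but the core idea of a decoding analysis described above — train a classifier on neural activity from a subset of trials, then measure how well it predicts the stimulus on held-out trials — can be sketched in a few lines. A minimal Python illustration with a nearest-centroid decoder on synthetic firing rates; all data and names here are invented and are not part of the package:

```python
import random
import statistics

random.seed(0)

# Synthetic "firing rates": 2 neurons, 2 stimulus conditions, 20 trials each.
def make_trials(means, n=20):
    return [[random.gauss(m, 1.0) for m in means] for _ in range(n)]

trials = {"A": make_trials([5.0, 1.0]), "B": make_trials([1.0, 5.0])}

# Train: per-condition centroid from the first 15 trials.
centroids = {
    label: [statistics.mean(col) for col in zip(*data[:15])]
    for label, data in trials.items()
}

# Test: decode each of the held-out 5 trials per condition by nearest centroid.
def decode(trial):
    return min(centroids,
               key=lambda c: sum((x - m) ** 2
                                 for x, m in zip(trial, centroids[c])))

correct = sum(decode(t) == label
              for label, data in trials.items() for t in data[15:])
accuracy = correct / 10
print(f"decoding accuracy: {accuracy:.0%}")
```

    Real packages replace the centroid rule with stronger classifiers and proper cross-validation, but the train/test split on trials is the same principle.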

  15. sptotal R package data

    • catalog.data.gov
    Updated Aug 28, 2023
    U.S. EPA Office of Research and Development (ORD) (2023). sptotal R package data [Dataset]. https://catalog.data.gov/dataset/sptotal-r-package-data
    Explore at:
    Dataset updated
    Aug 28, 2023
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The data are just illustrative tools to help users understand how the sptotal R package works. As a result, those interested in sptotal, which may include EPA and the general public, may be interested in the data. This dataset is associated with the following publication: Higham, M., J. Ver Hoef, B. Frank, and M. Dumelle. sptotal: an R package for predicting totals and weighted sums from spatial data. Journal of Open Source Software, 8(85): 05363 (2023).

  16. Synthetic Data for an Imaginary Country, Sample, 2023 - World

    • microdata.worldbank.org
    • nada-demo.ihsn.org
    Updated Jul 7, 2023
    Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
    Explore at:
    Dataset updated
    Jul 7, 2023
    Dataset authored and provided by
    Development Data Group, Data Analytics Unit
    Time period covered
    2023
    Area covered
    World
    Description

    Abstract

    The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, asset ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

    The full-population dataset (with about 10 million individuals) is also distributed as open data.

    Geographic coverage

    The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

    Analysis unit

    Household, Individual

    Universe

    The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
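    The two-stage design described above — allocate enumeration areas to strata proportionally to stratum size, then draw a fixed 25 households within each selected EA — can be sketched as follows. The stratum sizes and the 200-household EA size are invented for illustration; the actual sampling script distributed with the dataset is in R:

```python
import random

random.seed(42)

# Hypothetical strata (geo_1 x urban/rural) with their EA counts.
strata = {("prov1", "urban"): 120, ("prov1", "rural"): 80,
          ("prov2", "urban"): 60, ("prov2", "rural"): 140}

HH_PER_EA = 25
TARGET_HH = 8_000
n_eas_total = TARGET_HH // HH_PER_EA  # 320 EAs in total

# Stage 1: allocate EAs to strata proportionally to stratum size.
total_eas = sum(strata.values())
alloc = {s: round(n_eas_total * n / total_eas) for s, n in strata.items()}

# Stage 2: within each selected EA, draw 25 households at random.
sample = []
for stratum, n_sel in alloc.items():
    for ea in random.sample(range(strata[stratum]), n_sel):
        # Assume each EA holds 200 households, numbered 0..199.
        households = random.sample(range(200), HH_PER_EA)
        sample.append((stratum, ea, households))

print(sum(len(h) for _, _, h in sample))  # total sampled households
```

    With these stratum sizes the proportional allocation is exact (96 + 64 + 48 + 112 = 320 EAs); in general the rounded allocations need a final adjustment to hit the target exactly.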

    Mode of data collection

    other

    Research instrument

    The dataset is a synthetic dataset. Although the variables it contains are typically collected through sample surveys or population censuses, no questionnaire is available for this dataset. However, a "fake" questionnaire was created for the sample dataset extracted from this dataset, to be used as training material.

    Cleaning operations

    The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observations were assessed and rejected/replaced when needed). Some post-processing was also applied to the data to produce the distributed data files.
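    A validator in this sense is simply a predicate applied to each synthetic record, with failing records rejected and regenerated. A minimal Python sketch of this rejection loop; the field names and checks are illustrative, not the ones actually used:

```python
import random

random.seed(1)

# Illustrative consistency checks on a synthetic individual record.
VALIDATORS = [
    lambda r: 0 <= r["age"] <= 110,
    lambda r: not (r["age"] < 15 and r["marital_status"] == "married"),
    lambda r: r["years_of_schooling"] <= max(r["age"] - 5, 0),
]

def generate_record():
    """Stand-in for a draw from the generative model."""
    return {
        "age": random.randint(0, 110),
        "marital_status": random.choice(["single", "married"]),
        "years_of_schooling": random.randint(0, 20),
    }

def generate_valid(max_tries=1000):
    """Rejection sampling: regenerate until every validator passes."""
    for _ in range(max_tries):
        record = generate_record()
        if all(check(record) for check in VALIDATORS):
            return record
    raise RuntimeError("no valid record within max_tries")

records = [generate_valid() for _ in range(100)]
assert all(all(check(r) for r in [r] for check in VALIDATORS) for r in records)
```

    In a real pipeline the generator is the trained model rather than uniform draws, but the accept/reject structure is the same.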

    Response rate

    This is a synthetic dataset; the "response rate" is 100%.

  17. R package WallomicsData

    • entrepot.recherche.data.gouv.fr
    Updated May 15, 2023
    Harold Durufle; Harold Durufle (2023). R package WallomicsData [Dataset]. http://doi.org/10.57745/3J1XPO
    Explore at:
    Dataset updated
    May 15, 2023
    Dataset provided by
    Recherche Data Gouv
    Authors
    Harold Durufle; Harold Durufle
    License

    https://spdx.org/licenses/etalab-2.0.html

    Description

    Datasets from the WallOmics project. Contains phenomics, metabolomics, proteomics and transcriptomics data collected from two organs of five ecotypes of the model plant Arabidopsis thaliana exposed to two temperature growth conditions. Exploratory and integrative analyses of these data are presented in Durufle et al (2020) (doi:10.1093/bib/bbaa166) and Durufle et al (2020) (doi:10.3390/cells9102249).

  18. Income data

    • kaggle.com
    zip
    Updated Apr 25, 2022
    Fatemeh Khosravi (2022). Income data [Dataset]. https://www.kaggle.com/datasets/fatimaarefi/adult-data
    Explore at:
    zip (41366 bytes). Available download formats
    Dataset updated
    Apr 25, 2022
    Authors
    Fatemeh Khosravi
    Description

    Dataset

    This dataset was created by Fatemeh Khosravi

    Contents

    Predict income with neural networks in R.

  19. Data and R code for analysis.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Apr 24, 2024
    Phillips, Gareth L.; Bray, Peran; Fernandes, Leanne; Mosquera, Enrique; Bonin, Mary; Beard, Daniel; Morris, Sheriden; Godoy, Dan; Abom, Rickard T. M.; Matthews, Samuel A.; Campili, Adriana R.; Tracey, Dieter; Taylor, Sascha; Jonker, Michelle J.; Lang, Bethan J.; Sinclair-Taylor, Tane H.; Emslie, Michael J.; Hemingson, Christopher R.; Wilmes, Jennifer C.; Ceccarelli, Daniela M.; Williamson, David H.; Quincey, Richard; Beeden, Roger; Fletcher, Cameron S. (2024). Data and R code for analysis. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001394297
    Explore at:
    Dataset updated
    Apr 24, 2024
    Authors
    Phillips, Gareth L.; Bray, Peran; Fernandes, Leanne; Mosquera, Enrique; Bonin, Mary; Beard, Daniel; Morris, Sheriden; Godoy, Dan; Abom, Rickard T. M.; Matthews, Samuel A.; Campili, Adriana R.; Tracey, Dieter; Taylor, Sascha; Jonker, Michelle J.; Lang, Bethan J.; Sinclair-Taylor, Tane H.; Emslie, Michael J.; Hemingson, Christopher R.; Wilmes, Jennifer C.; Ceccarelli, Daniela M.; Williamson, David H.; Quincey, Richard; Beeden, Roger; Fletcher, Cameron S.
    Description

    Detailed code for data wrangling (“Coral Protection Outcomes_Wrangle.Rmd”) as well as analysis and figure generation (“Coral Protection Outcomes_FinguresAnalysis.Rmd”). Outputs from the data wrangling step to be used in the analysis script are included in the “CoralProtection.Rdata” file. (ZIP)

  20. Detect R Dataset

    • universe.roboflow.com
    zip
    Updated Nov 29, 2023
    suranaree university of technology (2023). Detect R Dataset [Dataset]. https://universe.roboflow.com/suranaree-university-of-technology-khvje/detect-r/model/1
    Explore at:
    zip. Available download formats
    Dataset updated
    Nov 29, 2023
    Dataset authored and provided by
    suranaree university of technology
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    R Bounding Boxes
    Description

    Detect R

    ## Overview
    
    Detect R is a dataset for object detection tasks - it contains R annotations for 1,227 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    