100+ datasets found

Friends - R Package Dataset
kaggle.com
zip
Updated Nov 11, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lucas Yukio Imafuko (2024). Friends - R Package Dataset [Dataset]. https://www.kaggle.com/datasets/lucasyukioimafuko/friends-r-package-dataset
Explore at:
zip(2018791 bytes)Available download formats
Dataset updated
Nov 11, 2024
Authors
Lucas Yukio Imafuko
Description
The whole data and source can be found at https://emilhvitfeldt.github.io/friends/

"The goal of friends to provide the complete script transcription of the Friends sitcom. The data originates from the Character Mining repository which includes references to scientific explorations using this data. This package simply provides the data in tibble format instead of json files."

Content

friends.csv - Contains the scenes and lines for each character, including season and episodes.

friends_emotions.csv - Contains sentiments for each scene - for the first four seasons only.

friends_info.csv - Contains information regarding each episode, such as imdb_rating, views, episode title and directors.

Uses

Text mining, sentiment analysis and word statistics.

Data visualizations.
H
Political Analysis Using R: Example Code and Data, Plus Data for Practice...
dataverse.harvard.edu
search.dataone.org
Updated Apr 28, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jamie Monogan (2020). Political Analysis Using R: Example Code and Data, Plus Data for Practice Problems [Dataset]. http://doi.org/10.7910/DVN/ARKOTI
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/ARKOTI
Dataset updated
Apr 28, 2020
Dataset provided by
Harvard Dataverse
Authors
Jamie Monogan
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Each R script replicates all of the example code from one chapter from the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.
f
R-script to Analyse Data
uvaauas.figshare.com
txt
Updated Apr 4, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
T. Blanke (2022). R-script to Analyse Data [Dataset]. http://doi.org/10.21942/uva.14346842.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.21942/uva.14346842.v1
Dataset updated
Apr 4, 2022
Dataset provided by
University of Amsterdam / Amsterdam University of Applied Sciences
Authors
T. Blanke
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Exploratory data analysis and visualisation of datasets
r
R codes and dataset for Visualisation of Diachronic Constructional Change...
researchdata.edu.au
bridges.monash.edu
Updated Apr 1, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gede Primahadi Wijaya Rajeg; Gede Primahadi Wijaya Rajeg (2019). R codes and dataset for Visualisation of Diachronic Constructional Change using Motion Chart [Dataset]. http://doi.org/10.26180/5c844c7a81768
Explore at:
Unique identifier
https://doi.org/10.26180/5c844c7a81768
Dataset updated
Apr 1, 2019
Dataset provided by
Monash University
Authors
Gede Primahadi Wijaya Rajeg; Gede Primahadi Wijaya Rajeg
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Publication

Primahadi Wijaya R., Gede. 2014. Visualisation of diachronic constructional change using Motion Chart. In Zane Goebel, J. Herudjati Purwoko, Suharno, M. Suryadi & Yusuf Al Aried (eds.). Proceedings: International Seminar on Language Maintenance and Shift IV (LAMAS IV), 267-270. Semarang: Universitas Diponegoro. doi: https://doi.org/10.4225/03/58f5c23dd8387

Description of R codes and data files in the repository

This repository is imported from its GitHub repo. Versioning of this figshare repository is associated with the GitHub repo's Release. So, check the Releases page for updates (the next version is to include the unified version of the codes in the first release with the tidyverse).

The raw input data consists of two files (i.e. will_INF.txt and go_INF.txt). They represent the co-occurrence frequency of top-200 infinitival collocates for will and be going to respectively across the twenty decades of Corpus of Historical American English (from the 1810s to the 2000s).

These two input files are used in the R code file 1-script-create-input-data-raw.r. The codes preprocess and combine the two files into a long format data frame consisting of the following columns: (i) decade, (ii) coll (for "collocate"), (iii) BE going to (for frequency of the collocates with be going to) and (iv) will (for frequency of the collocates with will); it is available in the input_data_raw.txt.

Then, the script 2-script-create-motion-chart-input-data.R processes the input_data_raw.txt for normalising the co-occurrence frequency of the collocates per million words (the COHA size and normalising base frequency are available in coha_size.txt). The output from the second script is input_data_futurate.txt.

Next, input_data_futurate.txt contains the relevant input data for generating (i) the static motion chart as an image plot in the publication (using the script 3-script-create-motion-chart-plot.R), and (ii) the dynamic motion chart (using the script 4-script-motion-chart-dynamic.R).

The repository adopts the project-oriented workflow in RStudio; double-click on the Future Constructions.Rproj file to open an RStudio session whose working directory is associated with the contents of this repository.
Basic R for Data Analysis
kaggle.com
zip
Updated Dec 8, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kebba Ndure (2024). Basic R for Data Analysis [Dataset]. https://www.kaggle.com/datasets/kebbandure/basic-r-for-data-analysis/code
Explore at:
zip(279031 bytes)Available download formats
Dataset updated
Dec 8, 2024
Authors
Kebba Ndure
Description
ABOUT DATASET

This is the R markdown notebook. It contains step by step guide for working on Data Analysis with R. It helps you with installing the relevant packages and how to load them. it also provides a detailed summary of the "dplyr" commands that you can use to manipulate your data in the R environment.

Anyone new to R and wish to carry out some data analysis on R can check it out!
MICRON Data (2015-2016) with associated R Markdown code
catalog.data.gov
datasets.ai
Updated Nov 12, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2020). MICRON Data (2015-2016) with associated R Markdown code [Dataset]. https://catalog.data.gov/dataset/micron-data-2015-2016-with-associated-r-markdown-code
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
This data set includes water quality data and microbial community abundance tables for periphyton samples from this project. The data set also includes extensive R markdown code used to process the data and generate the results included in the report. This dataset is associated with the following publication: Hagy, J., R. Devereux, K. Houghton, D. Beddick, T. Pierce, and S. Friedman. Developing Microbial Community Indicators of Nutrient Exposure in Southeast Coastal Plain Streams using a Molecular Approach. US EPA Office of Research and Development, Washington, DC, USA, 2018.
Reddit: /r/stocks
kaggle.com
zip
Updated Dec 19, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2022). Reddit: /r/stocks [Dataset]. https://www.kaggle.com/datasets/thedevastator/unlocking-stock-market-insights-with-reddit-user
Explore at:
zip(622416 bytes)Available download formats
Dataset updated
Dec 19, 2022
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Reddit: /r/stocks

Analyzing User Engagement to Identify Market Trends

By Reddit [source]

About this dataset

This dataset provides a valuable opportunity for researchers to explore the fascinating world of stock exchange markets through the eyes of those participating in discussions on Reddit. We have compiled posts from the subredditstocks subreddit to provide researchers with an invaluable source of information on how stock market trends may be impacted by user sentiment. With detailed data columns such as post titles, scores, id's, URLs, comments counts and created times for each post we are offering a unique vantage point into understanding how stocks market discussions may inform our better understanding of these dynamics. By delving further into user sentiment and engagement with stock topics, investigators can put together meaningful pieces in assembling full-fledged investments picture that is based off sound evidence gained from real people’s experiences and opinion. Discovering new insights has never been made easier – let’s venture out on this journey together!

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨! ### Research Ideas

Using the score and comments data, researchers can determine which stocks are being discussed and tracked the most, indicating potential areas of interest in the stock market.

Analyzing the body text of posts to identify common topics of conversation related to various stocks assists in providing a better understanding of users' feelings towards different stock investments.

Through analyzing fluctuations in user engagement over time, researchers can observe which stocks have experienced an increase or decrease in user interest and reaction to new developments within different markets

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: stocks.csv | Column name | Description | |:--------------|:--------------------------------------------------------------------| | title | The title of the post. (String) | | score | The score of the post, based on the Reddit voting system. (Integer) | | url | The URL of the post. (String) | | comms_num | The number of comments on the post. (Integer) | | created | The date and time the post was created. (Timestamp) | | body | The body text of the post. (String) | | timestamp | The date and time the post was last updated. (Timestamp) |

Acknowledgements

If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit Reddit.
Protocol data (R version)
figshare.com
application/gzip
Updated Oct 16, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jesse Gillis (2020). Protocol data (R version) [Dataset]. http://doi.org/10.6084/m9.figshare.13020569.v2
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.13020569.v2
Dataset updated
Oct 16, 2020
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Jesse Gillis
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We published 3 protocols illustrating how MetaNeighbor can be used to quantify cell type replicability across single cell transcriptomic datasets.The data files included here are needed to run the R version of the protocols available on Github (https://github.com/gillislab/MetaNeighbor-Protocol) in RMarkdown (.Rmd) and Jupyter (.ipynb) notebook format. To run the protocols, download the protocols on Github, download the data on Figshare, place the data and protocol files in the same directory, then run the notebooks in Rstudio or Jupyter.The scripts used to generate the data are included in the Github directory. Briefly: - full_biccn_hvg.rds contains a single cell transcriptomic dataset published by the Brain Initiative Cell Census Network (in SingleCellExperiment format). It combines data from 7 datasets obtained in the mouse primary motor cortex (https://www.biorxiv.org/content/10.1101/2020.02.29.970558v2). Note that this dataset only contains highly variable genes. - biccn_hvgs.txt: highly variable genes from the BICCN dataset described above (computed with the MetaNeighbor library). - biccn_gaba.rds: same dataset as full_biccn_hvg.rds, but restricted to GABAergic neurons. The dataset contains all genes common to the 7 BICCN datasets (not just highly variable genes). - go_mouse.rds: gene ontology annotations, stored as a list of gene symbols (one element per gene set).- functional_aurocs.txt: results of the MetaNeighbor functional analysis in protocol 3.
Simulation Data Set
catalog.data.gov
s.cnmilf.com
Updated Nov 12, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”. Metadata (including data dictionary) • y: Vector of binary responses (1: adverse outcome, 0: control) • x: Matrix of covariates; one row for each simulated individual • z: Matrix of standardized pollution exposures • n: Number of simulated individuals • m: Number of exposure time periods (e.g., weeks of pregnancy) • p: Number of columns in the covariate design matrix • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate) Code Abstract We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities. Description “CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities. “Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript). Optional Information (complete as necessary) Required R packages: • For running “CWVS_LMC.txt”: • msm: Sampling from the truncated normal distribution • mnormt: Sampling from the multivariate normal distribution • BayesLogit: Sampling from the Polya-Gamma distribution • For running “Results_Summary.txt”: • plotrix: Plotting the posterior means and credible intervals Instructions for Use Reproducibility (Mandatory) What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study. How to use the information: • Load the “Simulated_Dataset.RData” workspace • Run the code contained in “CWVS_LMC.txt” • Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”. Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set: Data The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women. Availability Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publically available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement. Description Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
s
Data and R code used in: Plant geographic distribution influences chemical...
repository.soilwise-he.eu
search.dataone.org
+1more
Updated Jan 22, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Data and R code used in: Plant geographic distribution influences chemical defenses in native and introduced Plantago lanceolata populations [Dataset]. http://doi.org/10.5061/dryad.5dv41nsd1
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.5dv41nsd1
Dataset updated
Jan 22, 2024
Description
Open Access# Data and R code used in: Plant geographic distribution influences chemical defenses in native and introduced Plantago lanceolata populations ## Description of the data and file structure * 00_ReadMe_DescriptonVariables.csv: A list with the description of variables from each file used. * 00_Metadata_Coordinates.csv : A dataset that includes the coordinates of each Plantago lanceolata population used. * 00_Metadata_Climate.csv : A dataset that includes coordinates, bioclimatic parameters, and the results of PCA. The dataset was created based on the script '1_Environmental variables.qmd' * 00_Metadata_Individuals.csv: A dataset that includes general information about each plant individual. Information about root traits and chemistry is missing in four samples since we lost the samples. * 01_Datset_PlantTraits.csv: Size-related and resource allocation traits measured of Plantago lanceolata and herbivore damage. * 02_Dataset_TargetedCompounds.csv: Phytohormones, Iridoid glycosides, Verbascoside and Flavonoids quantification of the leaves and roots of Plantago lanceolata. Data generated from HPLC * 03_Dataset_Volatiles_Area.csv: Area of identified volatile compounds. Data generated from GC-FID * 03_Dataset_Volatiles_Compounds.csv: Information on identified volatile compounds. Data generated from GC-MS. * 04_Dataset_Metabolome_Negative_Metadata.txt: Metadata for files in negative mode * 04_Dataset_Metabolome_Negative_Intensity.xlsx : File with the intensity of the metabolite features in negative mode. The file was generated from Metaboscape and adapted as required for the Notame package. * 04_Dataset_Metabolome_Negative_Intensity_filtered.xlsx: File generated after preprocessing of features in negative mode. During the notadame pacakged preprossesing 0 were converted to na * 04_Dataset_Metabolome_Negative.msmsonly.csv: File with a intensity of the the metabolite features in negative mode with ms/ms data. File generated from Metaboscape. * 04_Results_Metabolome_Negative_canopus_compound_summary.tsv: Feature classification. Results generated from Sirius software. * 04_Results_Metabolome_Negative_compound_identifications.tsv: Feature identification. Results generated from Sirius software. * 05_Dataset_Metabolome_Positive_Metadata.txt: Metadata for files in positive mode * 05_DatasetMetabolome_Positive_Intensity.xlsx : File with a intensity of the the metabolite features in positive mode. File generated from Metaboscape and adapted as required for the Notame package. * 05_Dataset_Metabolome_Positive_Intensity_filtered: File generated after preprocessing of features in positive mode.During the notadame pacakged preprossesing 0 were converted to na ## ## Code/Software * 1_Environmental vairables.qmd: Rscript to Retrieve bioclimatic variables from based on the coordinates of each population and then perform a principal components analysis to reduce the axes variation and included the first principal component as an explanatory variable in our model to estimate trait differences between native and introduced populations. Figure 1b and 1d * 2_PlantTraits_and_Herbivory: Rscript for statistical anaylsis of size-related traits, resource allocation traits and herbivore damage. Figure 2. It needs to source: Model_1_Fucntion.R, Model_2_Fucntion.R, Plot_Function.R * 3_Metabolome: Rscript for statistical anaylsis of Plantago lanceolata metabolome. Figure 3. It needs to source: Metabolome_preprocessing_R, Model_1_Fucntion.R, Model_2_Fucntion.R, Plot_Function.R. * 4_TargetedCompounds: Rscript for statistical anaylsis of Plantago lanceolata targeted compounds. Figure 4. It needs to source: Model_1_Fucntion.R, Model_2_Fucntion.R, Plot_Function.R * 5_Volatilome: Rscript for statistical anaylsis of Plantago lanceolata metabolome. Figure 5. It needs to source: Model_1_Fucntion.R, Model_2_Fucntion.R, Plot_Function.R * Model_1_Function.R : Function to run statistical models * Model_2_Function.R : Function to run statistical models * Plots_Function.R : Function to run plot graphs * Metabolome_prepocessing.R: Script to preprocess features
Reddit: /r/travel
kaggle.com
zip
Updated Dec 18, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2022). Reddit: /r/travel [Dataset]. https://www.kaggle.com/datasets/thedevastator/uncovering-travel-experiences-desires-and-opinio
Explore at:
zip(369897 bytes)Available download formats
Dataset updated
Dec 18, 2022
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Reddit: /r/travel

An Exploration of Users & Posts

By Reddit [source]

About this dataset

Traveling can be an incredibly exciting and rewarding experience; it is the perfect way to break away from the everyday routine and explore new cultures, sights, and sounds. For those planning a travel-related adventure – whether international or local – having access to real-user experiences in the form of advice and recommendations can mean the difference between a fantastic journey and a costly mistake. That's why this dataset of Reddit posts history on 'travel' is particularly useful for exploring Reddit users' opinions, desires, and experiences with their travel endeavors.

This dataset contains information on over 750+ Reddit posts regarding traveling as well as thousands of related comments over an extended period of time. For every post listed, data such as title, score (number of upvotes), URL link to page, number of comments given per post/comment thread, creation date/time stamp for both post/comment threads can be found.

All together these attributes provide detailed insights into user sentiments towards various aspects regarding traveling: What topics are they most interested in? What do they think are the best (or worst) destinations? Are there any tips or pitfalls that could inform our own decisions when embarking on our next journey? All this information resulting from our analysis will give us better guidance when helping us make smarter decisions during our planning process!

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

This dataset provides valuable insights into the various opinions, desires and experiences of Redditors about travel-related activities. The data consists of posts and comments collected from the 'travel' sub reddit page on Reddit. To get started with this dataset, you need to first understand that each post includes data such as title, score, ID, url, number of comments created at the timestamp etc. This can be used to understand the kind of conversations that are happening in these forums regarding travel related topics.

Research Ideas

Analyzing user sentiment around various topics in the travel industry such as airlines, hotels, attractions and experiences.

Comparing time of year to the frequency of posts related to summer vacation or other holiday specific activities.

Examining which geographical locations generate the most interest among Redditors, and applying this data to marketing campaigns for those areas

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: travel.csv | Column name | Description | |:--------------|:--------------------------------------------------------| | title | The title of the post. (String) | | score | The number of upvotes the post has received. (Integer) | | url | The URL of the post. (String) | | comms_num | The number of comments the post has received. (Integer) | | created | The date and time the post was created. (DateTime) | | body | The body of the post. (String) | | timestamp | The date and time the post was last updated. (DateTime) |

Acknowledgements

If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit Reddit.
Reddit r/developersindia Community Posts Dataset
kaggle.com
zip
Updated Oct 29, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Utkarsh Gupta (aka AvGeekGupta) (2023). Reddit r/developersindia Community Posts Dataset [Dataset]. https://www.kaggle.com/datasets/avgeekgupta/reddit-rdevelopersindia-community-posts-dataset
Explore at:
zip(837396 bytes)Available download formats
Dataset updated
Oct 29, 2023
Authors
Utkarsh Gupta (aka AvGeekGupta)
License
https://www.reddit.com/wiki/apihttps://www.reddit.com/wiki/api
Description
Reddit r/developersindia Community Posts Dataset

https://www.redditstatic.com/desktop2x/img/favicon/android-icon-192x192.png"> https://styles.redditmedia.com/t5_2dfnk0/styles/communityIcon_uli9r9wy5lba1.png?width=256&s=7d293bdc1f306bce5ec9a495652c2a60706c4e1c">

Overview

This dataset contains information about posts from the r/developersindia subreddit on Reddit. It covers posts starting from July 7, 2023, and goes back in time. The dataset aims to provide insights into the discussions and activities within the r/developersindia community.

Columns

title: The title of the Reddit post.

selftext: The content of the post if it's a text-based post; otherwise, it may be empty or contain additional information.

subreddit: The name of the subreddit (r/developersindia).

author_flair_text: The flair text associated with the author of the post, if any.

num_comments: The number of comments on the post.

downs: The number of downvotes the post has received.

is_crosspostable: Indicates whether the post is crosspostable (True/False).

view_count: The number of views on the post.

ups: The number of upvotes the post has received.

url: The URL associated with the post.

is_video: Indicates whether the post contains a video (True/False).

num_crossposts: The number of times the post has been crossposted.

subreddit_subscribers: The number of subscribers to the r/developersindia subreddit.

author: The username of the author of the post.

treatment_tags: Tags or labels applied to the post for special treatment.

all_awardings: Information about any awards received by the post.

media: Information about media content associated with the post.

Potential Use Cases

This dataset can be valuable for various data analysis and machine learning tasks, including:

Sentiment analysis of posts within the r/developersindia community.

Topic modeling to identify common discussion topics.

User behavior analysis, such as post frequency and engagement.

Exploring trends and patterns in posts and comments.

Investigating the impact of upvotes, downvotes, and crossposts.
Fiber object
zenodo.org
data.niaid.nih.gov
bin
Updated Dec 22, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mark Boltengagen; Mark Boltengagen (2024). Fiber object [Dataset]. http://doi.org/10.5281/zenodo.14545007
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.14545007
Dataset updated
Dec 22, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Mark Boltengagen; Mark Boltengagen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains fiber R object, stored as an RDS file for use in R package for analysis of long-read sequencing.
Data_Sheet_1_NeuroDecodeR: a package for neural decoding in R.docx
frontiersin.figshare.com
docx
Updated Jan 3, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ethan M. Meyers (2024). Data_Sheet_1_NeuroDecodeR: a package for neural decoding in R.docx [Dataset]. http://doi.org/10.3389/fninf.2023.1275903.s001
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.3389/fninf.2023.1275903.s001
Dataset updated
Jan 3, 2024
Dataset provided by
Frontiers Mediahttp://www.frontiersin.org/
Authors
Ethan M. Meyers
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Neural decoding is a powerful method to analyze neural activity. However, the code needed to run a decoding analysis can be complex, which can present a barrier to using the method. In this paper we introduce a package that makes it easy to perform decoding analyses in the R programing language. We describe how the package is designed in a modular fashion which allows researchers to easily implement a range of different analyses. We also discuss how to format data to be able to use the package, and we give two examples of how to use the package to analyze real data. We believe that this package, combined with the rich data analysis ecosystem in R, will make it significantly easier for researchers to create reproducible decoding analyses, which should help increase the pace of neuroscience discoveries.
sptotal R package data
catalog.data.gov
Updated Aug 28, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2023). sptotal R package data [Dataset]. https://catalog.data.gov/dataset/sptotal-r-package-data
Explore at:
Dataset updated
Aug 28, 2023
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
The data are just illustrative tools to help users understand how the sptotal R package works. As a result, those interested in sptotal, which may include EPA and the general public, may be interested in the data. This dataset is associated with the following publication: Higham, M., J. Ver Hoef, B. Frank, and M. Dumelle. sptotal: an R package for predicting totals and weighted sums from spatial data. Journal of Open Source Software. Journal of Open Source Software, 8(85): 05363, (2023).
w
Synthetic Data for an Imaginary Country, Sample, 2023 - World
microdata.worldbank.org
nada-demo.ihsn.org
Updated Jul 7, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
Explore at:
Dataset updated
Jul 7, 2023
Dataset authored and provided by
Development Data Group, Data Analytics Unit
Time period covered
2023
Area covered
World
Description
Abstract

The dataset is a relational dataset of 8,000 households households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

The full-population dataset (with about 10 million individuals) is also distributed as open data.

Geographic coverage

The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

Analysis unit

Household, Individual

Universe

The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

Kind of data

ssd

Sampling procedure

The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.

Mode of data collection

other

Research instrument

The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

Cleaning operations

The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observation were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to result in the distributed data files.

Response rate

This is a synthetic dataset; the "response rate" is 100%.
R
R package WallomicsData
entrepot.recherche.data.gouv.fr
Updated May 15, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Harold Durufle; Harold Durufle (2023). R package WallomicsData [Dataset]. http://doi.org/10.57745/3J1XPO
Explore at:
Unique identifier
https://doi.org/10.57745/3J1XPO
Dataset updated
May 15, 2023
Dataset provided by
Recherche Data Gouv
Authors
Harold Durufle; Harold Durufle
License
https://spdx.org/licenses/etalab-2.0.htmlhttps://spdx.org/licenses/etalab-2.0.html
Description
Datasets from the WallOmics project. Contains phenomics, metabolomics, proteomics and transcriptomics data collected from two organs of five ecotypes of the model plant Arabidopsis thaliana exposed to two temperature growth conditions. Exploratory and integrative analyses of these data are presented in Durufle et al (2020) (doi:10.1093/bib/bbaa166) and Durufle et al (2020) (doi:10.3390/cells9102249).
Income data
kaggle.com
zip
Updated Apr 25, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fatemeh Khosravi (2022). Income data [Dataset]. https://www.kaggle.com/datasets/fatimaarefi/adult-data
Explore at:
zip(41366 bytes)Available download formats
Dataset updated
Apr 25, 2022
Authors
Fatemeh Khosravi
Description
Dataset

This dataset was created by Fatemeh Khosravi

Contents

Predict income with neural networks in R.
f
Data and R code for analysis.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Apr 24, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Phillips, Gareth L.; Bray, Peran; Fernandes, Leanne; Mosquera, Enrique; Bonin, Mary; Beard, Daniel; Morris, Sheriden; Godoy, Dan; Abom, Rickard T. M.; Matthews, Samuel A.; Campili, Adriana R.; Tracey, Dieter; Taylor, Sascha; Jonker, Michelle J.; Lang, Bethan J.; Sinclair-Taylor, Tane H.; Emslie, Michael J.; Hemingson, Christopher R.; Wilmes, Jennifer C.; Ceccarelli, Daniela M.; Williamson, David H.; Quincey, Richard; Beeden, Roger; Fletcher, Cameron S. (2024). Data and R code for analysis. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001394297
Explore at:
Dataset updated
Apr 24, 2024
Authors
Phillips, Gareth L.; Bray, Peran; Fernandes, Leanne; Mosquera, Enrique; Bonin, Mary; Beard, Daniel; Morris, Sheriden; Godoy, Dan; Abom, Rickard T. M.; Matthews, Samuel A.; Campili, Adriana R.; Tracey, Dieter; Taylor, Sascha; Jonker, Michelle J.; Lang, Bethan J.; Sinclair-Taylor, Tane H.; Emslie, Michael J.; Hemingson, Christopher R.; Wilmes, Jennifer C.; Ceccarelli, Daniela M.; Williamson, David H.; Quincey, Richard; Beeden, Roger; Fletcher, Cameron S.
Description
Detailed code for data wrangling (“Coral Protection Outcomes_Wrangle.Rmd”) as well as analysis and figure generation (“Coral Protection Outcomes_FinguresAnalysis.Rmd”). Outputs from the data wrangling step to be used in the analysis script are included in the “CoralProtection.Rdata” file. (ZIP)
R
Detect R Dataset
universe.roboflow.com
zip
Updated Nov 29, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
suranaree university of technology (2023). Detect R Dataset [Dataset]. https://universe.roboflow.com/suranaree-university-of-technology-khvje/detect-r/model/1
Explore at:
zipAvailable download formats
Dataset updated
Nov 29, 2023
Dataset authored and provided by
suranaree university of technology
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
R Bounding Boxes
Description
Detect R

## Overview Detect R is a dataset for object detection tasks - it contains R annotations for 1,227 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).

Facebook

Twitter

Click to copy link

Link copied

Cite

Lucas Yukio Imafuko (2024). Friends - R Package Dataset [Dataset]. https://www.kaggle.com/datasets/lucasyukioimafuko/friends-r-package-dataset

Friends - R Package Dataset

The One with the Transcript

Explore at:

zip(2018791 bytes)Available download formats

Dataset updated

Nov 11, 2024

Authors

Lucas Yukio Imafuko

Description

The whole data and source can be found at https://emilhvitfeldt.github.io/friends/

"The goal of friends to provide the complete script transcription of the Friends sitcom. The data originates from the Character Mining repository which includes references to scientific explorations using this data. This package simply provides the data in tibble format instead of json files."

Content

friends.csv - Contains the scenes and lines for each character, including season and episodes.
friends_emotions.csv - Contains sentiments for each scene - for the first four seasons only.
friends_info.csv - Contains information regarding each episode, such as imdb_rating, views, episode title and directors.

Uses

Text mining, sentiment analysis and word statistics.
Data visualizations.

Clear search

Close search

Google apps

Main menu

Friends - R Package Dataset

Content

Uses

Political Analysis Using R: Example Code and Data, Plus Data for Practice...

R-script to Analyse Data

R codes and dataset for Visualisation of Diachronic Constructional Change...

Basic R for Data Analysis

MICRON Data (2015-2016) with associated R Markdown code

Reddit: /r/stocks

Reddit: /r/stocks

Analyzing User Engagement to Identify Market Trends

About this dataset

More Datasets

Featured Notebooks

Acknowledgements

License

Columns

Acknowledgements

Protocol data (R version)

Simulation Data Set

Data and R code used in: Plant geographic distribution influences chemical...

Reddit: /r/travel

Reddit: /r/travel

An Exploration of Users & Posts

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Research Ideas

Acknowledgements

License

Columns

Acknowledgements

Reddit r/developersindia Community Posts Dataset

Reddit r/developersindia Community Posts Dataset

Overview

Columns

Potential Use Cases

Fiber object

Data_Sheet_1_NeuroDecodeR: a package for neural decoding in R.docx

sptotal R package data

Synthetic Data for an Imaginary Country, Sample, 2023 - World

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Mode of data collection

Research instrument

Cleaning operations

Response rate

R package WallomicsData

Income data

Dataset

Contents

Data and R code for analysis.

Detect R Dataset

Detect R

Friends - R Package Dataset

The One with the Transcript

Content

Uses