Packages for the R programming language often include datasets. This dataset collects information on those datasets to make them easier to find.
Rdatasets is a collection of 1072 datasets that were originally distributed alongside the statistical software environment R and some of its add-on packages. The goal is to make these data more broadly accessible for teaching and statistical software development.
This data was collected by Vincent Arel-Bundock, @vincentarelbundock on GitHub. The version here was taken from GitHub on July 11, 2017 and is not actively maintained.
In addition to helping find a specific dataset, this dataset can help answer questions about what data is included in R packages. Are specific topics very popular or unpopular? How big are datasets included in R packages? What are the naming conventions/trends for packages that include data? What are the naming conventions/trends for datasets included in packages?
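As a quick sketch of how one might probe such questions, here is a standard-library-only example computing the median row count over a small excerpt of the index. The column names (Package, Item, Rows, Cols) and the values are assumptions for illustration, not confirmed from the source.

```python
import csv
import io
import statistics

# Hypothetical excerpt of the Rdatasets index; the real column
# names and values may differ.
sample = io.StringIO(
    "Package,Item,Rows,Cols\n"
    "datasets,mtcars,32,11\n"
    "datasets,iris,150,5\n"
    "carData,Wells,3020,5\n"
)

row_counts = [int(r["Rows"]) for r in csv.DictReader(sample)]
print(statistics.median(row_counts))  # median dataset size, in rows
```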
This dataset is licensed under the GNU General Public License.
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
By SocialGrep [source]
A subreddit dataset is a collection of posts and comments made on a Reddit board. This dataset contains all the posts and comments made on the /r/datasets subreddit from its inception to March 1, 2022. The dataset was procured using SocialGrep. The data does not include usernames, to preserve users' anonymity and to prevent targeted harassment.
In order to use this dataset, you will need software that can open CSV files, such as a spreadsheet application (LibreOffice Calc, Microsoft Excel) or a plain-text editor. You will also need a web browser such as Google Chrome or Mozilla Firefox.
Once you have the necessary software installed, open the Reddit Dataset folder and open the the-reddit-dataset-dataset-posts.csv file in your preferred application.
In the document, you will see a list of posts with the following information for each one: title, sentiment, score, URL, created UTC, permalink, subreddit NSFW status, and subreddit name.
You can use this information to analyze trends in datasets posted on /r/datasets over time. For example, you could calculate the average score for all posts and compare it to the average score for posts in specific subreddits. Additionally, sentiment analysis could be performed on the titles of posts to see whether there is a correlation between positive/negative sentiment and upvotes/downvotes.
- Finding correlations between different types of datasets
- Determining which datasets are most popular on Reddit
- Analyzing the sentiment of posts and comments on Reddit's /r/datasets board
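A minimal sketch of the average-score idea, using only the standard library; the inline rows are made-up placeholders standing in for the real the-reddit-dataset-dataset-posts.csv.

```python
import csv
import io
import statistics
from collections import defaultdict

# Made-up sample rows standing in for the-reddit-dataset-dataset-posts.csv.
sample = io.StringIO(
    "type,subreddit.name,score,title\n"
    "post,datasets,42,Large CSV of city temperatures\n"
    "post,datasets,10,Looking for sports data\n"
    "post,opendata,7,Open budget data\n"
)

scores = defaultdict(list)
for row in csv.DictReader(sample):
    scores[row["subreddit.name"]].append(int(row["score"]))

averages = {name: statistics.mean(vals) for name, vals in scores.items()}
print(averages)  # average score per subreddit
```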
If you use this dataset in your research, please credit the original authors.
License
License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: the-reddit-dataset-dataset-comments.csv

| Column name | Description |
|:---|:---|
| type | The type of post. (String) |
| subreddit.name | The name of the subreddit. (String) |
| subreddit.nsfw | Whether or not the subreddit is NSFW. (Boolean) |
| created_utc | The time the post was created, in UTC. (Timestamp) |
| permalink | The permalink for the post. (String) |
| body | The body of the post. (String) |
| sentiment | The sentiment of the post. (String) |
| score | The score of the post. (Integer) |
File: the-reddit-dataset-dataset-posts.csv

| Column name | Description |
|:---|:---|
| type | The type of post. (String) |
| subreddit.name | The name of the subreddit. (String) |
| subreddit.nsfw | Whether or not the subreddit is NSFW. (Boolean) |
| created_utc | The time the post was created, in UTC. (Timestamp) |
| permalink | The permalink for the post. (String) |
| score | The score of the post. (Integer) |
| domain | The domain of the post. (String) |
| url | The URL of the post. (String) |
| selftext | The self-text of the post. (String) |
| title | The title of the post. (String) |
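Building on the schema above, here is a hedged sketch of bucketing posts by calendar month via created_utc, which is assumed here to be a Unix epoch timestamp; the rows are invented for illustration.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

# Invented rows standing in for the-reddit-dataset-dataset-posts.csv;
# created_utc is assumed to be a Unix epoch timestamp.
sample = io.StringIO(
    "type,subreddit.name,created_utc,score\n"
    "post,datasets,1583020800,5\n"
    "post,datasets,1583107200,9\n"
    "post,datasets,1585699200,3\n"
)

posts_per_month = Counter()
for row in csv.DictReader(sample):
    ts = datetime.fromtimestamp(int(row["created_utc"]), tz=timezone.utc)
    posts_per_month[ts.strftime("%Y-%m")] += 1

print(posts_per_month)  # posts per calendar month
```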
If you use this dataset in your research, please credit SocialGrep.
Wells: https://vincentarelbundock.github.io/Rdatasets/doc/carData/Wells.html
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
For the preprint https://psyarxiv.com/8ryue/

Data frame 'p-value_adjustment' has p-values from all analyses included in the paper (named by test), plus their adjusted values after Bonferroni-Holm. Data frame 'flow_LC_sEBR1-9' has participant-wise values for mean flow, learning curve slope, and spontaneous blink rate, to replicate the analyses for RQ3.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Datasets... In a way, the Kaggle community is built around them. You can't analyze data without having it. Here, we aim to create a meta-corpus of datasets posted to Reddit. A dataset dataset, if you will.
The following dataset is the comprehensive corpus of all the posts and comments made on Reddit's /r/datasets board, from its inception all the way to the first of March, 2022.
The dataset was procured using SocialGrep.
To preserve users' anonymity and to prevent targeted harassment, the data does not include usernames.
We would like to thank Chris Liverani for generously providing the cover image for this dataset.
Datasets are nice - we like our data.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
In this course, you will learn to work within the free and open-source R environment with a specific focus on working with and analyzing geospatial data. We will cover a wide variety of data and spatial data analytics topics, and you will learn how to code in R along the way. The Introduction module provides more background info about the course and course setup.

This course is designed for someone with some prior GIS knowledge. For example, you should know the basics of working with maps, map projections, and vector and raster data. You should be able to perform common spatial analysis tasks and make map layouts. If you do not have a GIS background, we would recommend checking out the West Virginia View GIScience class. We do not assume that you have any prior experience with R or with coding. So, don't worry if you haven't developed these skill sets yet. That is a major goal in this course.

Background material will be provided using code examples, videos, and presentations. We have provided assignments to offer hands-on learning opportunities. Data links for the lecture modules are provided within each module, while data for the assignments are linked to the assignment buttons below. Please see the sequencing document for our suggested order in which to work through the material.

After completing this course you will be able to:
- prepare, manipulate, query, and generally work with data in R
- perform data summarization, comparisons, and statistical tests
- create quality graphs, map layouts, and interactive web maps to visualize data and findings
- present your research, methods, results, and code as web pages to foster reproducible research
- work with spatial data in R
- analyze vector and raster geospatial data to answer a question with a spatial component
- make spatial models and predictions using regression and machine learning
- code in the R language at an intermediate level
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The file 'Code_Object-centered_sensorimotor_bias.R' contains all analyses conducted in the R environment (R version 4.1.1). The R-data file 'Dataset_Schneider&Hermsdoerfer.RData' contains the analyzed data sets.
License: GNU GPL v2 (http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)
I found this toy dataset online and wanted to have an easy way to use it on Kaggle.
It's a very small dataset, with 154 rows and 6 columns.
The data comes from this page: https://vincentarelbundock.github.io/Rdatasets/
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset was created during the research carried out for the PhD of Negin Afsharzadeh and the subsequent manuscript arising from this research. The main purpose of this dataset is to create a record of the raw data that was used in the analyses in the manuscript.
This dataset includes:
In this study, we aimed to optimize approaches to improve the biotechnological production of important metabolites in G. glabra. The study is made up of four experiments that correspond to particular figures/tables in the manuscript and data, as described below.
We tested approaches for the cultivation of G. glabra, specifically the breaking of seed dormancy, to ensure timely and efficient seed germination. To do this, we tested the effect of different pretreatments, sterilization treatments and growth media on the germination success of G. glabra.
This experiment corresponds to:
We aimed to optimize the induction of hairy roots in G. glabra. Four strains of R. rhizogenes were tested to identify the most effective strain for inducing hairy root formation and we tested different tissue explants (cotyledons/hypocotyls) and methods of R. rhizogenes infection (injection or soaking for different durations) in these tissues.
This experiment corresponds to:
Eight distinct hairy root lines were established and the growth rate of these lines was measured over 40 days.
This experiment corresponds to:
We aimed to test different qualities of light on hairy root cultures in order to induce higher growth and possible enhanced metabolite production. A line with a high growth rate from experiment 3, line S, was selected for growth under different light treatments: red light, blue light, and a combination of blue and red light. To assess the overall impact of these treatments, the growth of line S, as well as the increase in antioxidant capacity and total phenolic content, were tracked over this induction period.
This experiment corresponds to:
To work with the .R file and the R datasets, it is necessary to use R: A Language and Environment for Statistical Computing and a package within R, DHARMa. The versions used for the analyses are R version 4.4.1 and DHARMa version 0.4.6.
The references for these are:
R Core Team, R: A Language and Environment for Statistical Computing 2024. https://www.R-project.org/
Hartig F, DHARMa: Residual Diagnostics for Hierarchical (Multi-Level/Mixed) Regression Models 2022. https://CRAN.R-project.org/package=DHARMa
This data set contains a subset of the fuel economy data. It contains only models which had a new release every year between 1999 and 2008.
Format of the data set: data frame with 234 rows and 11 variables:
1. manufacturer
2. model: model name
3. displ: engine displacement, in litres (size of engine)
4. year: year of manufacture
5. cyl: number of cylinders
6. trans: type of transmission
7. drv: f = front-wheel drive, r = rear-wheel drive, 4 = four-wheel drive
8. cty: city miles per gallon
9. hwy: highway miles per gallon (efficiency)
10. fl: fuel type
11. class: "type" of car
You can download or browse the mpg data set at the link below: https://vincentarelbundock.github.io/Rdatasets/datasets.html
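As a sketch of working with the mpg layout above, here is a standard-library example computing mean highway mileage per vehicle class; the three inline rows are illustrative stand-ins and are not guaranteed to match the real file.

```python
import csv
import io
import statistics
from collections import defaultdict

# Illustrative rows in the mpg column layout described above;
# the real file has 234 rows.
sample = io.StringIO(
    "manufacturer,model,displ,year,cyl,trans,drv,cty,hwy,fl,class\n"
    "audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact\n"
    "audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact\n"
    "chevrolet,corvette,5.7,1999,8,manual(m6),r,16,26,p,2seater\n"
)

hwy_by_class = defaultdict(list)
for row in csv.DictReader(sample):
    hwy_by_class[row["class"]].append(int(row["hwy"]))

mean_hwy = {c: statistics.mean(v) for c, v in hwy_by_class.items()}
print(mean_hwy)  # mean highway mpg per vehicle class
```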
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
- Mortality.Rds: Monthly suicide deaths observed in each state during the study period.
- Nowcasts.Rds: Model hindcast estimates.
- Forecasts.Rds: Model forecast estimates. (ZIP)
License: GNU LGPL v3 (http://www.gnu.org/licenses/lgpl-3.0.html)
The dataset contains measurements of nearly 900 birds from three different species: Cooper's Hawks, Red-tailed Hawks and Sharp-shinned Hawks.
"Students and faculty at Cornell College in Mount Vernon, Iowa, collected data over many years at the hawk blind at Lake MacBride near Iowa City, Iowa." (From Rdatasets.)
The data was included in the Stat2Data package to accompany the book Stat2: Building Models for a World of Data.
License: GPL-3
The original dataset has been simplified somewhat for teaching purposes.
* Six features are retained: Year, Species, Weight, Wing, Tail and Hallux.
* Rows with missing values on these features have been dropped.
* Three new binary features have been added: Red-tailed?, Coopers? and Sharp-shinned?
* Each observation has then been randomly assigned to one of two files, hawks_main.csv and hawks_new.csv, for training and testing.
| Feature | Description |
|---|---|
| Year | Year: 1992-2003 |
| Species | CH=Cooper's, RT=Red-tailed, SS=Sharp-shinned |
| Wing | Length (in mm) of primary wing feather from tip to wrist it attaches to |
| Weight | Body weight (in grams) |
| Tail | Measurement (in mm) related to the length of the tail (invented at the MacBride Raptor Center) |
| Hallux | Length (in mm) of the killing talon |
| Coopers? | 1 if Species = CH, 0 otherwise |
| Red-tailed? | 1 if Species = RT, 0 otherwise |
| Sharp-shinned? | 1 if Species = SS, 0 otherwise |
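The simplification steps above can be sketched as follows. The three rows are made-up examples, and the 50/50 random split is an assumption about how the main/new assignment was done.

```python
import random

# Made-up example rows, with the retained features abbreviated.
rows = [
    {"Species": "RT", "Weight": 1100},
    {"Species": "CH", "Weight": 420},
    {"Species": "SS", "Weight": 150},
]

# Derive the three binary species flags described above.
for r in rows:
    r["Red-tailed?"] = int(r["Species"] == "RT")
    r["Coopers?"] = int(r["Species"] == "CH")
    r["Sharp-shinned?"] = int(r["Species"] == "SS")

# Randomly assign each observation to one of two files
# (hawks_main.csv / hawks_new.csv); a 50/50 split is assumed.
rng = random.Random(0)
main_rows, new_rows = [], []
for r in rows:
    (main_rows if rng.random() < 0.5 else new_rows).append(r)

print(len(main_rows), len(new_rows))
```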
Cover image by Deborah Freeman (CC BY-SA 2.0)
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset represents sales or some measurement (labeled "BJ sales") recorded over a period of 150 time intervals (likely days, but this is not explicitly stated).

Source: https://www.stat.auckland.ac.nz/~wild/data/Rdatasets/

Time periods: the data spans from time period 1 to time period 150. This suggests it could represent daily or weekly sales (or measurements) for a certain product or service.

Sales data (BJ sales): this column contains values that likely represent the sales or some performance metric at each time point. The series fluctuates over time, showing an increasing trend from roughly time 1 to time 96, followed by fluctuations from around time 100 to time 150.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset is described here: https://www.reddit.com/r/datasets/comments/63spoc/19gb_of_urban_dictionary_definitions_1999_may_2016/
These data come from the 2016 CCES and allow interested students to model the individual correlates of the Trump vote in 2016. Code/analysis heavily indebted to a 2017 analysis I did on my blog (see references).
Cooperative Congressional Election Study, 2016
http://svmiller.com/blog/2017/04/age-income-racism-partisanship-trump-vote-2016/
https://github.com/svmiller/2016-cces-trump-vote/blob/master/1-2016-cces-trump.R
https://www.reddit.com/wiki/api
Analyse the popularity of public subreddits
The CSV contains a long list of every subreddit on Reddit. There are a total of 1,067,472 subreddits, and the columns in the dataset are:
This dataset was originally published on /r/datasets by /u/Stuck_In_the_Matrix
The counts of insects found in plots in agricultural experimental units treated with different insecticides.
These are results from insecticide experiments arranged by Geoffrey Beall at Chatham, Ontario. The work was carried out with replicated blocks containing plots subjected to treatments whose assignment was random. The counts are not complete counts but come from random sampling.
These are the results of Experiment VII in the paper, which counts the tobacco hornworm, Phlegethontius quinquemaculata.
This is the same dataset as the one included in R datasets: https://rdrr.io/r/datasets/InsectSprays.html
For the complete dataset (including the 7 experiments of the paper) see https://www.kaggle.com/datasets/cpitrat/insectsprays-complete
A data frame with 72 observations on 2 variables:
- count: Insect count
- spray: The type of spray used
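A minimal sketch of summarizing counts per spray; the five pairs below are invented miniatures of the 72-row frame, not real values from the experiment.

```python
import statistics
from collections import defaultdict

# Invented (count, spray) pairs in the InsectSprays layout.
data = [(10, "A"), (14, "A"), (2, "C"), (4, "C"), (16, "B")]

counts_by_spray = defaultdict(list)
for count, spray in data:
    counts_by_spray[spray].append(count)

mean_counts = {s: statistics.mean(c) for s, c in sorted(counts_by_spray.items())}
print(mean_counts)  # mean insect count per spray type
```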
This dataset was obtained from Reddit user u/jwolle1 on https://www.reddit.com/r/datasets/comments/cj3ipd/jeopardy_dataset_with_349000_clues/
Notes:
- 349,641 clues in TSV format. Source: They prefer not to be named. DM for info.
- I made one large complete dataset and also individual datasets for each season. The season files are small enough to open with Excel.
- I tried to clean up all the formatting and encoding issues, so there are minimal stray escape sequences (\u201c, etc.).
- I tried to filter out all the impossible audio and video clues.
- I included Alex's comments when he reads the categories at the beginning of each round.
- I included a column that specifies whether a clue was a Daily Double or not (yes or no).
- I made a note when clues come from special episodes (Teen Tournament, Celebrity Jeopardy, etc.). I was on the fence about including this, but I decided it was the best way to find relatively easy or difficult clues.
- I organized the data into chronological order from 1984 to present (July 2019, end of Season 35), and each category is grouped together so you can read it from top to bottom.
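A hedged sketch of filtering Daily Doubles from the TSV. The header names used here (round, category, value, clue, answer, daily_double) are assumptions, since the actual column names are not given above, and the two rows are invented.

```python
import csv
import io

# Hypothetical header and rows; the real TSV's column names may differ.
sample = io.StringIO(
    "round\tcategory\tvalue\tclue\tanswer\tdaily_double\n"
    "J!\tHISTORY\t400\tThis wall fell in 1989\tthe Berlin Wall\tno\n"
    "DJ!\tSCIENCE\t1600\tIts chemical symbol is Fe\tiron\tyes\n"
)

daily_doubles = [
    row for row in csv.DictReader(sample, delimiter="\t")
    if row["daily_double"] == "yes"
]
print(len(daily_doubles))  # number of Daily Double clues
```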
I found this at https://www.reddit.com/r/datasets/comments/47a7wh/ufc_fights_and_fighter_data/
All credit goes to Reddit user geyges and Sherdog.
I do not own the data.
This data has multiple categorical variables from every UFC fight, from UFC 1 in 1993 through 2/23/2016.
So much information can be gained from this relevant to understanding how the sport has evolved over the years.
This data set contains the value of the Dow Jones Industrial Average at daily close for all available dates (to the best of my knowledge) from 1885 to the most recently concluded calendar year. Extensions shouldn't be too difficult with existing packages.
Observations before October 7, 1896 are from the single Dow Jones Average. Observations from October 7, 1896 to July 30, 1914 are from the first DJIA. Observations before the July 1914 closure of the first DJIA come from MeasuringWorth. Observations from its reopening on December 12, 1914 to January 28, 1985 come from Pinnacle Systems. Observations from January 29, 1985 to the most recent observation come from a quantmod call.
Samuel H. Williamson, 'Daily Closing Value of the Dow Jones Average, 1885 to Present,' MeasuringWorth, 2019.
Jeffrey A. Ryan and Joshua M. Ulrich, 'quantmod: Quantitative Financial Modelling Framework,' 2018.
Photo by Aditya Vyas on Unsplash.