21 datasets found
  1. Rdatasets

    • kaggle.com
    zip
    Updated Jul 11, 2017
    Cite
    Rachael Tatman (2017). Rdatasets [Dataset]. https://www.kaggle.com/rtatman/rdatasets
    Available download formats: zip (35365 bytes)
    Dataset updated
    Jul 11, 2017
    Authors
    Rachael Tatman
    Description

    Context:

    Packages for the R programming language often include datasets. This dataset collects information on those datasets to make them easier to find.

    Content:

    Rdatasets is a collection of 1072 datasets that were originally distributed alongside the statistical software environment R and some of its add-on packages. The goal is to make these data more broadly accessible for teaching and statistical software development.

    Acknowledgements:

    This data was collected by Vincent Arel-Bundock, @vincentarelbundock on Github. The version here was taken from Github on July 11, 2017 and is not actively maintained.

    Inspiration:

    In addition to helping find a specific dataset, this dataset can help answer questions about what data is included in R packages. Are specific topics very popular or unpopular? How big are datasets included in R packages? What are the naming conventions/trends for packages that include data? What are the naming conventions/trends for datasets included in packages?

    License:

    This dataset is licensed under the GNU General Public License.

  2. Reddit /r/datasets Dataset

    • kaggle.com
    zip
    Updated Nov 28, 2022
    Cite
    The Devastator (2022). Reddit /r/datasets Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/the-meta-corpus-of-datasets-the-reddit-dataset
    Available download formats: zip (9619636 bytes)
    Dataset updated
    Nov 28, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Meta-Corpus of Datasets: The Reddit Dataset

    The Complete Collection of Datasets Posted on Reddit

    By SocialGrep [source]

    About this dataset

    A subreddit dataset is a collection of posts and comments made on a Reddit board, in this case /r/datasets. This dataset contains all the posts and comments made on the /r/datasets subreddit from its inception to March 1, 2022. The dataset was procured using SocialGrep. The data does not include usernames, to preserve users' anonymity and to prevent targeted harassment.

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    In order to use this dataset, you will need to have a text editor such as Microsoft Word or LibreOffice installed on your computer. You will also need a web browser such as Google Chrome or Mozilla Firefox.

    Once you have the necessary software installed, open The Reddit Dataset folder and double-click the the-reddit-dataset-dataset-posts.csv file to open it in your preferred text editor.

    In the document, you will see a list of posts with the following information for each one: title, sentiment, score, URL, created UTC, permalink, subreddit NSFW status, and subreddit name.

    You can use this information to analyze trends in datasets posted on /r/datasets over time. For example, you could calculate the average score for all posts and compare it to the average score for posts in specific subreddits. Additionally, sentiment analysis could be performed on the titles of posts to see if there is a correlation between positive/negative sentiment and upvotes/downvotes.
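
As a rough illustration of the score comparison described above, the following Python sketch computes the average score overall and per subreddit. It runs on a tiny in-memory sample rather than the real file: the sample rows and the helper name `average_scores` are made up for illustration, and only the column names `subreddit.name` and `score` are taken from the dataset's column description.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for the-reddit-dataset-dataset-posts.csv.
SAMPLE_CSV = """subreddit.name,score,title
datasets,10,First post
datasets,4,Second post
other,1,Third post
"""

def average_scores(csv_text):
    """Return (overall_mean, {subreddit: mean}) for the 'score' column."""
    totals = defaultdict(lambda: [0, 0])  # subreddit -> [sum, count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = totals[row["subreddit.name"]]
        entry[0] += int(row["score"])
        entry[1] += 1
    grand_sum = sum(s for s, _ in totals.values())
    grand_count = sum(c for _, c in totals.values())
    per_group = {name: s / c for name, (s, c) in totals.items()}
    return grand_sum / grand_count, per_group

overall, by_subreddit = average_scores(SAMPLE_CSV)
print(overall, by_subreddit)
```

To run this against the downloaded file, the sample string could be replaced by the file's contents; the sketch assumes every row has an integer `score`.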

    Research Ideas

    • Finding correlations between different types of datasets
    • Determining which datasets are most popular on Reddit
    • Analyzing the sentiments of post and comments on Reddit's /r/datasets board

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: the-reddit-dataset-dataset-comments.csv

    | Column name        | Description                                        |
    |:-------------------|:---------------------------------------------------|
    | type               | The type of post. (String)                         |
    | subreddit.name     | The name of the subreddit. (String)                |
    | subreddit.nsfw     | Whether or not the subreddit is NSFW. (Boolean)    |
    | created_utc        | The time the post was created, in UTC. (Timestamp) |
    | permalink          | The permalink for the post. (String)               |
    | body               | The body of the post. (String)                     |
    | sentiment          | The sentiment of the post. (String)                |
    | score              | The score of the post. (Integer)                   |

    File: the-reddit-dataset-dataset-posts.csv

    | Column name        | Description                                        |
    |:-------------------|:---------------------------------------------------|
    | type               | The type of post. (String)                         |
    | subreddit.name     | The name of the subreddit. (String)                |
    | subreddit.nsfw     | Whether or not the subreddit is NSFW. (Boolean)    |
    | created_utc        | The time the post was created, in UTC. (Timestamp) |
    | permalink          | The permalink for the post. (String)               |
    | score              | The score of the post. (Integer)                   |
    | domain             | The domain of the post. (String)                   |
    | url                | The URL of the post. (String)                      |
    | selftext           | The self-text of the post. (String)                |
    | title              | The title of the post. (String)                    |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and SocialGrep.

  3. Demo tables

    • redivis.com
    application/jsonl +7
    Updated Mar 9, 2022
    Cite
    Redivis Demo Organization (2022). Demo tables [Dataset]. https://redivis.com/datasets/pt3b-a3xsg3h7x
    Available download formats: application/jsonl, arrow, parquet, avro, sas, spss, csv, stata
    Dataset updated
    Mar 9, 2022
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Time period covered
    Jan 1, 2015 - Mar 8, 2022
    Description
  4. R datasets to replicate 'Global Learning, Local Flow' 2018 preprint analyses...

    • figshare.com
    application/gzip
    Updated Jan 13, 2021
    Cite
    Benjamin Cowley (2021). R datasets to replicate 'Global Learning, Local Flow' 2018 preprint analyses [Dataset]. http://doi.org/10.6084/m9.figshare.7268387.v2
    Available download formats: application/gzip
    Dataset updated
    Jan 13, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Benjamin Cowley
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    For the preprint https://psyarxiv.com/8ryue/

    Data-frame 'p-value_adjustment' has p-values from all analyses included in the paper (named by test), plus their adjusted values after Bonferroni-Holm.

    Data-frame 'flow_LC_sEBR1-9' has participant-wise values for mean Flow, learning curve slope, and spontaneous blink rate, to replicate analyses for RQ3.
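
For readers unfamiliar with the Bonferroni-Holm adjustment mentioned above, here is a minimal Python sketch of the standard Holm step-down procedure. It is not the authors' code (their analyses were run in R); the function name `holm_adjust` and the example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease along the order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical p-values, not taken from the dataset.
print(holm_adjust([0.01, 0.04, 0.03]))
```

With three tests, the smallest p-value is multiplied by 3, the next by 2, and the largest by 1, and each adjusted value is at least as large as the one before it in sorted order.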

  5. The Reddit Dataset Dataset

    • kaggle.com
    zip
    Updated Apr 4, 2022
    Cite
    Lexyr (2022). The Reddit Dataset Dataset [Dataset]. https://www.kaggle.com/datasets/pavellexyr/the-reddit-dataset-dataset/data
    Available download formats: zip (9381796 bytes)
    Dataset updated
    Apr 4, 2022
    Authors
    Lexyr
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    Datasets... In a way, the Kaggle community is built around them. You can't analyze data without having it. Here, we aim to create a meta-corpus of datasets posted to Reddit. A dataset dataset, if you will.

    Content

    The following dataset is the comprehensive corpus of all the posts and comments made on Reddit's /r/datasets board, from its inception all the way to the first of March, 2022.

    The dataset was procured using SocialGrep.

    To preserve users' anonymity and to prevent targeted harassment, the data does not include usernames.

    Acknowledgements

    We would like to thank Chris Liverani for generously providing the cover image for this dataset.

    Inspiration

    Datasets are nice - we like our data.

  6. Open-Source Spatial Analytics (R) - Datasets - AmericaView - CKAN

    • ckan.americaview.org
    Updated Sep 10, 2022
    Cite
    ckan.americaview.org (2022). Open-Source Spatial Analytics (R) - Datasets - AmericaView - CKAN [Dataset]. https://ckan.americaview.org/dataset/open-source-spatial-analytics-r
    Dataset updated
    Sep 10, 2022
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this course, you will learn to work within the free and open-source R environment with a specific focus on working with and analyzing geospatial data. We will cover a wide variety of data and spatial data analytics topics, and you will learn how to code in R along the way. The Introduction module provides more background info about the course and course set up.

    This course is designed for someone with some prior GIS knowledge. For example, you should know the basics of working with maps, map projections, and vector and raster data. You should be able to perform common spatial analysis tasks and make map layouts. If you do not have a GIS background, we would recommend checking out the West Virginia View GIScience class. We do not assume that you have any prior experience with R or with coding. So, don't worry if you haven't developed these skill sets yet. That is a major goal in this course.

    Background material will be provided using code examples, videos, and presentations. We have provided assignments to offer hands-on learning opportunities. Data links for the lecture modules are provided within each module while data for the assignments are linked to the assignment buttons below. Please see the sequencing document for our suggested order in which to work through the material.

    After completing this course you will be able to:

    • prepare, manipulate, query, and generally work with data in R.
    • perform data summarization, comparisons, and statistical tests.
    • create quality graphs, map layouts, and interactive web maps to visualize data and findings.
    • present your research, methods, results, and code as web pages to foster reproducible research.
    • work with spatial data in R.
    • analyze vector and raster geospatial data to answer a question with a spatial component.
    • make spatial models and predictions using regression and machine learning.
    • code in the R language at an intermediate level.

  7. Data from: Object-centered sensorimotor bias of torque control in the...

    • figshare.com
    txt
    Updated Aug 1, 2022
    Cite
    Thomas Schneider; Joachim Hermsdörfer (2022). Object-centered sensorimotor bias of torque control in the chronic stage following stroke [Dataset]. http://doi.org/10.6084/m9.figshare.17057675.v2
    Available download formats: txt
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Thomas Schneider; Joachim Hermsdörfer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The file 'Code_Object-centered_sensorimotor_bias.R' contains all analyses conducted in the R environment (R version 4.1.1). The R-data file 'Dataset_Schneider&Hermsdoerfer.RData' contains the analyzed data sets.

  8. Days not Spent at School

    • kaggle.com
    zip
    Updated Oct 3, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    DavidS (2019). Days not Spent at School [Dataset]. https://www.kaggle.com/davids1992/days-not-spent-at-school
    Available download formats: zip (1006 bytes)
    Dataset updated
    Oct 3, 2019
    Authors
    DavidS
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Context

    I found this toy dataset online and wanted to have an easy way to use it on Kaggle.

    Content

    It's a very small dataset with 154 rows and 6 columns.

    Acknowledgements

    The data comes from this page: https://vincentarelbundock.github.io/Rdatasets/

  9. Raw data, R scripts and R datasets for statistical analyses from the...

    • researchdata.tuwien.ac.at
    bin, txt
    Updated Oct 31, 2024
    Cite
    Hester Sheehan; Negin Afsharzadeh (2024). Raw data, R scripts and R datasets for statistical analyses from the research article 'Advancing Glycyrrhiza glabra L. cultivation and hairy root transformation and elicitation for future metabolite overexpression' [Dataset]. http://doi.org/10.48436/jczhc-srh29
    Available download formats: txt, bin
    Dataset updated
    Oct 31, 2024
    Dataset provided by
    TU Wien
    Authors
    Hester Sheehan; Negin Afsharzadeh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset description

    This dataset was created during the research carried out for the PhD of Negin Afsharzadeh and the subsequent manuscript arising from this research. The main purpose of this dataset is to create a record of the raw data that was used in the analyses in the manuscript.

    This dataset includes:

    • raw data generated from experiments stored in an Excel spreadsheet with each sheet corresponding to a specific experiment or part of an experiment (Afsharzadeh_et_al_2024.xlsx)
    • R script used to analyse the raw data in the software, R (Afsharzadeh_et_al.R)
    • datasets that were used to analyse the data in the statistical software, R (germindata.txt, light.txt)

    Context and methodology

    Brief description of experiments:

    In this study, we aimed to optimize approaches to improve the biotechnological production of important metabolites in G. glabra. The study is made up of four experiments that correspond to particular figures/tables in the manuscript and data, as described below.

    Experiment 1:

    We tested approaches for the cultivation of G. glabra, specifically the breaking of seed dormancy, to ensure timely and efficient seed germination. To do this, we tested the effect of different pretreatments, sterilization treatments and growth media on the germination success of G. glabra.

    This experiment corresponds to:

    • Manuscript: Table 1 and Figure 1
    • Data: Afsharzadeh_et_al_2024.xlsx (Sheet 'Table_1'); Afsharzadeh_et_al.R; germindata.txt

    Experiment 2 (Table 2):

    We aimed to optimize the induction of hairy roots in G. glabra. Four strains of R. rhizogenes were tested to identify the most effective strain for inducing hairy root formation and we tested different tissue explants (cotyledons/hypocotyls) and methods of R. rhizogenes infection (injection or soaking for different durations) in these tissues.

    This experiment corresponds to:

    • Manuscript: Table 2
    • Data: Afsharzadeh_et_al_2024.xlsx (Sheet 'Table_2')

    Experiment 3 (Figure 2):

    Eight distinct hairy root lines were established and the growth rate of these lines was measured over 40 days.

    This experiment corresponds to:

    • Manuscript: Figure 2, Table S2
    • Data: Afsharzadeh_et_al_2024.xlsx (Sheet 'Figure_2')

    Experiment 4 (Figure 3):

    We aimed to test different qualities of light on hairy root cultures in order to induce higher growth and possible enhanced metabolite production. A line with a high growth rate from experiment 3, line S, was selected for growth under different light treatments: red light, blue light, and a combination of blue and red light. To assess the overall impact of these treatments, the growth of line S, as well as the increase in antioxidant capacity and total phenolic content, were tracked over this induction period.

    This experiment corresponds to:

    • Manuscript: Figure 3, Figure S4
    • Data: Afsharzadeh_et_al_2024.xlsx (Sheets 'Figure_3_FW', 'Figure_3_FRAP', 'Figure_3_Phenol'); Afsharzadeh_et_al.R; light.txt

    Technical details

    To work with the .R file and the R datasets, it is necessary to use R: A Language and Environment for Statistical Computing and a package within R, DHARMa. The versions used for the analyses are R version 4.4.1 and DHARMa version 0.4.6.

    The references for these are:

    R Core Team, R: A Language and Environment for Statistical Computing 2024. https://www.R-project.org/

    Hartig F, DHARMa: Residual Diagnostics for Hierarchical (Multi-Level/Mixed) Regression Models 2022. https://CRAN.R-project.org/package=DHARMa

  10. mpgdata

    • kaggle.com
    zip
    Updated Jun 26, 2022
    Cite
    yscho (2022). mpgdata [Dataset]. https://www.kaggle.com/datasets/yeonseokcho/mpgdataset/discussion
    Available download formats: zip (2164 bytes)
    Dataset updated
    Jun 26, 2022
    Authors
    yscho
    Description

    This data set contains a subset of the fuel economy data. It contains only models which had a new release every year between 1999 and 2008.

    Format of the data set: a data frame with 234 rows and 11 variables.

    1. manufacturer
    2. model: model name
    3. displ: engine displacement, in litres (size of engine)
    4. year: year of manufacture
    5. cyl: number of cylinders
    6. trans: type of transmission
    7. drv: f = front-wheel drive, r = rear-wheel drive, 4 = 4-wheel drive
    8. cty: city miles per gallon
    9. hwy: highway miles per gallon (efficiency)
    10. fl: fuel type
    11. class: "type" of car

    The mpg data set can be downloaded or viewed at: https://vincentarelbundock.github.io/Rdatasets/datasets.html

  11. All files are R datasets.

    • plos.figshare.com
    zip
    Updated Jun 8, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sasikiran Kandula; Mark Olfson; Madelyn S. Gould; Katherine M. Keyes; Jeffrey Shaman (2023). All files are R datasets. [Dataset]. http://doi.org/10.1371/journal.pcbi.1010945.s003
    Available download formats: zip
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Sasikiran Kandula; Mark Olfson; Madelyn S. Gould; Katherine M. Keyes; Jeffrey Shaman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mortality.Rds: Monthly suicide deaths observed in each state during the study period. Nowcasts.Rds: Model hindcast estimates. Forecasts.Rds: Model forecast estimates. (ZIP)

  12. MacBride Hawks (MEI version)

    • kaggle.com
    zip
    Updated Oct 18, 2023
    Cite
    Ian Dickerson (2023). MacBride Hawks (MEI version) [Dataset]. https://www.kaggle.com/datasets/mathsian/three-hawk-species
    Available download formats: zip (8750 bytes)
    Dataset updated
    Oct 18, 2023
    Authors
    Ian Dickerson
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    The dataset contains measurements of nearly 900 birds from three different species: Cooper's Hawks, Red-tailed Hawks and Sharp-shinned Hawks.

    Source data

    "Students and faculty at Cornell College in Mount Vernon, Iowa, collected data over many years at the hawk blind at Lake MacBride near Iowa City, Iowa." (From Rdatasets.)

    The data was included in the Stats2Data package to accompany the book Stat2: Building Models for a World of Data.

    License: GPL-3

    Data processing

    The original dataset has been simplified somewhat for teaching purposes.

    • Six features are retained: Year, Species, Weight, Wing, Tail and Hallux.
    • Rows with missing values on these features have been dropped.
    • Three new binary features have been added: Red-tailed?, Coopers? and Sharp-shinned?
    • Each observation has then been randomly assigned to one of two files, hawks_main.csv and hawks_new.csv, for training and testing.
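
The processing steps described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the uploader's actual script: the function name `process`, the sample rows, the fixed seed, and the 50/50 split probability are all hypothetical (the description does not give the split proportion).

```python
import random

FEATURES = ["Year", "Species", "Weight", "Wing", "Tail", "Hallux"]
SPECIES_FLAGS = {"Red-tailed?": "RT", "Coopers?": "CH", "Sharp-shinned?": "SS"}

def process(rows, seed=0):
    """Keep FEATURES, drop incomplete rows, add species flags, split in two."""
    rng = random.Random(seed)
    main, new = [], []
    for row in rows:
        kept = {name: row.get(name) for name in FEATURES}
        if any(value in (None, "") for value in kept.values()):
            continue  # drop rows with a missing value on a retained feature
        for flag, code in SPECIES_FLAGS.items():
            kept[flag] = 1 if kept["Species"] == code else 0
        # Assumed 50/50 random assignment to the two output sets.
        (main if rng.random() < 0.5 else new).append(kept)
    return main, new

# Hypothetical rows; the second has a missing Weight and is dropped.
rows = [
    {"Year": 1992, "Species": "RT", "Weight": 920, "Wing": 385,
     "Tail": 219, "Hallux": 25.7, "Extra": "x"},
    {"Year": 1993, "Species": "CH", "Weight": None, "Wing": 240,
     "Tail": 200, "Hallux": 20.0},
    {"Year": 1994, "Species": "SS", "Weight": 100, "Wing": 170,
     "Tail": 130, "Hallux": 11.0},
]
main, new = process(rows)
print(len(main) + len(new))
```

The two returned lists correspond to what the description calls hawks_main.csv and hawks_new.csv; writing them to disk is left out of the sketch.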

    Features

    | Feature        | Description |
    |:---------------|:------------|
    | Year           | Year: 1992-2003 |
    | Species        | CH=Cooper's, RT=Red-tailed, SS=Sharp-shinned |
    | Wing           | Length (in mm) of primary wing feather from tip to wrist it attaches to |
    | Weight         | Body weight (in grams) |
    | Tail           | Measurement (in mm) related to the length of the tail (invented at the MacBride Raptor Center) |
    | Hallux         | Length (in mm) of the killing talon |
    | Coopers?       | 1 if Species = CH, 0 otherwise |
    | Red-tailed?    | 1 if Species = RT, 0 otherwise |
    | Sharp-shinned? | 1 if Species = SS, 0 otherwise |

    Cover image by Deborah Freeman (CC BY-SA 2.0)

  13. Sales Data with Leading Indicator

    • kaggle.com
    zip
    Updated Mar 8, 2025
    Cite
    Abid_Hussain (2025). Sales Data with Leading Indicator [Dataset]. https://www.kaggle.com/datasets/abidhussai512/sales-data-with-leading-indicator
    Available download formats: zip (771 bytes)
    Dataset updated
    Mar 8, 2025
    Authors
    Abid_Hussain
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The data appear to represent sales or some other measurement (labeled "BJ sales") recorded over 150 time intervals (likely days, though this is not stated explicitly). A detailed analysis of the data follows:

    https://www.stat.auckland.ac.nz/~wild/data/Rdatasets/?utm_source=chatgpt.com

    Key Characteristics:

    1. Time Periods: The data spans from time period 1 to time period 150. This suggests it could represent daily or weekly sales (or measurements) for a certain product or service.

    2. Sales Data (BJ sales): This column contains values that likely represent the sales or some performance metric at each time point. It fluctuates over time, showing trends that we can analyze.

    General Observations:

    • The values begin at 200.1 at time 1 and generally show a steady upward trend for the first 80 time periods.
    • There are fluctuations, but the overall pattern seems to be gradually increasing during the first 90 intervals.
    • After around time 90, the values seem to peak, reaching values around 246 in time period 96 and fluctuating around those levels, gradually increasing further around time 120.
    • The data appears to stabilize after around time period 130 and remains mostly in the 257 range, with some small variations.

    Trends:

    1. Increasing Trend (Time 1 to Time 96):

      • Sales grow from around 200 to a peak around 247.8.
      • There is a noticeable upward slope, indicating either seasonal growth, promotions, or other factors contributing to higher sales.
    2. Fluctuations Around Time 100 to Time 150:

      • After hitting the peak in the early 240s, the values seem to fluctuate between 247.6 and 262.7.
      • While there is a general high point around 260 starting from time 117 onward, there is some instability, suggesting potential changes in the market, product demand, or other factors influencing sales.

    Statistical Insights:

    • Maximum Value: The highest value is 263.3 (at time 146).
    • Minimum Value: The lowest value is 198.6 (at time 7).
    • Range: The difference between the maximum and minimum value is approximately 64.7 (263.3 - 198.6).
    • Average Sales: To compute an average (mean) sales figure, we would calculate the sum of all sales divided by 150 time periods.
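
The summary statistics above (minimum, maximum, range, and the mean as the sum divided by the number of periods) can be computed with a few lines of Python. This is a minimal sketch: the function name `summarize` is hypothetical, and the five sample values are made up for illustration; they are not the actual 150 BJ sales values.

```python
from statistics import mean

def summarize(values):
    """Return the minimum, maximum, range and mean of a numeric series."""
    lo, hi = min(values), max(values)
    return {"min": lo, "max": hi, "range": hi - lo, "mean": mean(values)}

# Hypothetical five-point series, NOT the actual BJ sales data.
sample = [200.0, 198.0, 210.0, 250.0, 262.0]
print(summarize(sample))
```

Applied to the full series, the same function would yield the average that the description leaves uncomputed.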
  14. UrbanDictionary 1999-May2016 Definitions Corpus

    • figshare.com
    7z
    Updated May 31, 2023
    Cite
    Mita Lu (2023). UrbanDictionary 1999-May2016 Definitions Corpus [Dataset]. http://doi.org/10.6084/m9.figshare.4828954.v1
    Available download formats: 7z
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mita Lu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
  15. 🇺🇸 Correlates of the Trump Vote in 2016

    • kaggle.com
    zip
    Updated Aug 14, 2023
    Cite
    mexwell (2023). 🇺🇸 Correlates of the Trump Vote in 2016 [Dataset]. https://www.kaggle.com/datasets/mexwell/correlates-of-the-trump-vote-in-2016
    Available download formats: zip (1524554 bytes)
    Dataset updated
    Aug 14, 2023
    Authors
    mexwell
    Description

    These data come from the 2016 CCES and allow interested students to model the individual correlates of the Trump vote in 2016. Code/analysis heavily indebted to a 2017 analysis I did on my blog (see references).

    Cooperative Congressional Election Study, 2016

    References

    Original Data

    http://svmiller.com/blog/2017/04/age-income-racism-partisanship-trump-vote-2016/

    https://github.com/svmiller/2016-cces-trump-vote/blob/master/1-2016-cces-trump.R

  16. Subreddits

    • kaggle.com
    zip
    Updated May 12, 2018
    Cite
    Severan (2018). Subreddits [Dataset]. https://www.kaggle.com/datasets/rayraegah/subreddits
    Available download formats: zip (0 bytes)
    Dataset updated
    May 12, 2018
    Authors
    Severan
    License

    https://www.reddit.com/wiki/api

    Description

    Context

    Analyse the popularity of public subreddits

    Content

    The CSV contains a long list of every subreddit on Reddit. There are a total of 1067472 subreddits and the columns in the dataset are:

    • base10 id,
    • base36 reddit id,
    • creation epoch,
    • subreddit name,
    • number of subscribers
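
The relationship between the base10 and base36 id columns, and the meaning of the creation epoch, can be illustrated with a short Python sketch. The helper names are hypothetical, the example id `"zz"` is made up, and the sketch assumes the creation epoch is a Unix timestamp in seconds, which the description does not state explicitly.

```python
from datetime import datetime, timezone

def base36_to_int(reddit_id):
    """Convert a base-36 id string (digits 0-9, letters a-z) to an integer."""
    return int(reddit_id, 36)

def epoch_to_utc(seconds):
    """Convert a Unix epoch in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

print(base36_to_int("zz"))          # 35 * 36 + 35
print(epoch_to_utc(0).isoformat())  # start of the Unix epoch
```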

    Acknowledgements

    This dataset was originally published on /r/datasets by /u/Stuck_In_the_Matrix

    Inspiration

  17. InsectSprays

    • kaggle.com
    zip
    Updated Jun 4, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Colin Pitrat (2024). InsectSprays [Dataset]. https://www.kaggle.com/datasets/cpitrat/insectsprays
    Available download formats: zip (441 bytes)
    Dataset updated
    Jun 4, 2024
    Authors
    Colin Pitrat
    Description

    The counts of insects found in plots in agricultural experimental units treated with different insecticides.

    These are results from insecticidal experiments arranged by Geoffrey Beall at Chatham, Ontario. The work was carried out with replicated blocks containing plots subjected to treatments of which the assignment was random. The counts are not complete counts but random samples.

    These are the results of Experiment VII in the paper, which counts the tobacco hornworm, Phlegethontius quinquemaculata.

    This is the same dataset as the one included in R datasets: https://rdrr.io/r/datasets/InsectSprays.html

    For the complete dataset (including the 7 experiments of the paper) see https://www.kaggle.com/datasets/cpitrat/insectsprays-complete

    A data frame with 72 observations on 2 variables.

    • count: Insect count
    • spray: The type of spray used

  18. 350,000+ Jeopardy Questions

    • kaggle.com
    zip
    Updated Feb 4, 2020
    Cite
    Paul (2020). 350,000+ Jeopardy Questions [Dataset]. https://www.kaggle.com/prondeau/350000-jeopardy-questions
    Available download formats: zip (19977471 bytes)
    Dataset updated
    Feb 4, 2020
    Authors
    Paul
    Description

    Context

    This dataset was obtained from Reddit user u/jwolle1 on https://www.reddit.com/r/datasets/comments/cj3ipd/jeopardy_dataset_with_349000_clues/

    Content

    Notes:

    • 349,641 clues in TSV format. Source: They prefer not to be named. DM for info.
    • I made one large complete dataset and also individual datasets for each season. The season files are small enough to open with Excel.
    • I tried to clean up all the formatting and encoding issues so there is minimal , \u201c, etc.
    • I tried to filter out all the impossible audio and video clues.
    • I included Alex's comments when he reads the categories at the beginning of each round.
    • I included a column that specifies whether a clue was a Daily Double or not (yes or no).
    • I made a note when clues come from special episodes (Teen Tournament, Celebrity Jeopardy, etc.). I was on the fence about including this but I decided it was the best way to find relatively easy or difficult clues.
    • I organized the data into chronological order from 1984 to present (July 2019, end of Season 35). And each category is grouped together so you can read it from top to bottom.

  19. UFC Fights Data 1993 - 2/23/2016

    • kaggle.com
    zip
    Updated Sep 13, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Chris Formey (2017). UFC Fights Data 1993 - 2/23/2016 [Dataset]. https://www.kaggle.com/cformey24/ufc-fights-data-1993-2232016
    Available download formats: zip (370479 bytes)
    Dataset updated
    Sep 13, 2017
    Authors
    Chris Formey
    Description

    Context

    I found this at https://www.reddit.com/r/datasets/comments/47a7wh/ufc_fights_and_fighter_data/

    All credit goes to reddit user geyges and Sherdog.

    I do not own the data.

    Content

    This data has multiple categorical variables from every UFC fight from UFC 1 in 1993 - 2/23/2016.

    Acknowledgements

    Reddit u/geyges Sherdog UFC

    Inspiration

    So much information can be gained from this relevant to understanding how the sport has evolved over the years.

  20. 📈 Dow Jones Industrial Average, 1885 - 2021

    • kaggle.com
    zip
    Updated Aug 14, 2023
    Cite
    mexwell (2023). 📈 Dow Jones Industrial Average, 1885 - 2021 [Dataset]. https://www.kaggle.com/datasets/mexwell/dow-jones-industrial-average-1885-2021
    Available download formats: zip (287782 bytes)
    Dataset updated
    Aug 14, 2023
    Authors
    mexwell
    Description

    This data set contains the value of the Dow Jones Industrial Average on daily close for all available dates (to the best of my knowledge) from 1885 to the most recently concluded calendar year. Extensions shouldn't be too difficult with existing packages.

    Observations before October 7, 1896 are from the single Dow Jones Average. Observations from October 7, 1896 to July 30, 1914 are from the first DJIA. Observations before the closure of the first DJIA in July 1914 come from MeasuringWorth. Observations from its reopening on Dec. 12, 1914 to January 28, 1985 come from Pinnacle Systems. Observations from January 29, 1985 to the most recent observation come from a quantmod call.
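
The provenance rules described above amount to a date lookup, which the following Python sketch makes explicit. The function name `data_source` is hypothetical; the July 30, 1914 cut-off is taken from the end date given for the first DJIA, and treating the period between the closure and the December 12, 1914 reopening as a gap with no observations is an assumption.

```python
from datetime import date

def data_source(day):
    """Return the provenance the description gives for an observation date."""
    if day <= date(1914, 7, 30):
        return "MeasuringWorth"
    if day < date(1914, 12, 12):
        return None  # assumed gap: no observations while the index was closed
    if day <= date(1985, 1, 28):
        return "Pinnacle Systems"
    return "quantmod"

print(data_source(date(1950, 1, 3)))
```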

    References

    Samuel H. Williamson, 'Daily Closing Value of the Dow Jones Average, 1885 to Present,' MeasuringWorth, 2019.

    Jeffrey A. Ryan and Joshua M. Ulrich, 'quantmod: Quantitative Financial Modelling Framework,' 2018.

    Acknowledgement

    Original Data

    Photo by Aditya Vyas on Unsplash
