13 datasets found
  1. Reddit: /r/Art

    • kaggle.com
    zip
    Updated Dec 17, 2022
    Cite
    The Devastator (2022). Reddit: /r/Art [Dataset]. https://www.kaggle.com/datasets/thedevastator/uncovering-online-art-trends-with-reddit-posting/discussion?sort=undefined
    Explore at:
    zip (84621 bytes)
    Dataset updated
    Dec 17, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Reddit: /r/Art

    Examining Content by Title, Score, ID, URL, Comments, Create Date, and Timestamp

    By Reddit [source]

    About this dataset

    This dataset offers an in-depth exploration of the artistic world of Reddit, with a focus on the posts available on the website. By examining the titles, scores, IDs, URLs, comments, creation dates and timestamps associated with each post about art on Reddit, researchers can gain invaluable insight into how art enthusiasts share their work and build networks within this platform. By analyzing this data we can understand what sorts of topics attract more attention from viewers and how members interact with one another in online discussions. Moreover, this dataset can also be used to explore some of the larger underlying issues that shape art communities today - from examining production trends to better understanding consumption patterns. Overall, this comprehensive dataset is an essential resource for those aiming to analyze and comprehend digital spaces where art is circulated and discussed - giving unique insight into how ideas are created and promoted throughout creative networks.


    How to use the dataset

    This dataset is an excellent source of information related to online art trends, providing comprehensive analysis of Reddit posts related to art. In this guide, we’ll discuss how you can use this dataset to gather valuable insights about the way in which art is produced and shared on the web.
    First and foremost, you should start by familiarizing yourself with the columns included in the dataset. Each post contains a title, score (number of upvotes), URL, comments (number of comments), created date and timestamp. When interpreting each column individually or comparing different posts/threads, these values will provide invaluable insight into topics such as most discussed or favored content within the Reddit community.
    After exploring the general features of each post/thread in your analysis, it's time to move on to more specific components such as body content (including images) and creation dates (when users began responding to and interacting with content posted about a specific topic). Using these variables will help researchers uncover meaningful patterns in how communities interact with certain types of content over longer periods of time, and also give context on what topics are trending at any given moment when analyzing shorter intervals.
    Finally, one last creative use of this dataset is to examine titles for common words and phrases that appear frequently among posts discussing similar types of artwork or other forms of media production. Identifying keywords and symbols shared across several different groups can paint a holistic picture of the kind of engagement each group looks for, helped along by the score values, which measure the overall reception of each submission, and by the individual thoughts presented in the comment threads. Hopefully these techniques will bring to light conclusions that were previously hidden from view - good luck!

    Research Ideas

    • Analyzing topics and themes within art posts to determine what content is most popular.
    • Examining the score of art posts to determine how the responding audience engages with each piece.
    • Comparing across different subreddits to explore the ‘meta-discourse’ of topics that appear in multiple forums or platforms

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: Art.csv

    | Column name | Description |
    |:------------|:------------|
    | title | The title of the post. (String) |
    | score | The number of upvotes the post has received. (Integer) |
    | url | The URL of the post. (String) |
    | comms_num | ... |
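
    As a quick starting point, the following minimal R sketch (assuming Art.csv has been downloaded from Kaggle into the working directory and uses the column names listed above) loads the posts, ranks them by score, and counts the most frequent title words:

        # Load the /r/Art posts and look at basic engagement.
        art <- read.csv("Art.csv", stringsAsFactors = FALSE)

        # Top 10 posts by upvote score
        top_posts <- art[order(-art$score), c("title", "score", "comms_num")]
        head(top_posts, 10)

        # Most frequent words in post titles (very rough tokenisation)
        words <- unlist(strsplit(tolower(art$title), "[^a-z]+"))
        words <- words[nchar(words) > 3]   # drop very short tokens
        head(sort(table(words), decreasing = TRUE), 20)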

  2. Data and tools for studying isograms

    • figshare.com
    Updated Jul 31, 2017
    Cite
    Florian Breit (2017). Data and tools for studying isograms [Dataset]. http://doi.org/10.6084/m9.figshare.5245810.v1
    Explore at:
    application/x-sqlite3
    Dataset updated
    Jul 31, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Florian Breit
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A collection of datasets and python scripts for extraction and analysis of isograms (and some palindromes and tautonyms) from corpus-based word-lists, specifically Google Ngram and the British National Corpus (BNC). Below follows a brief description, first, of the included datasets and, second, of the included scripts.

    1. Datasets

    The data from English Google Ngrams and the BNC is available in two formats: as a plain text CSV file and as a SQLite3 database.

    1.1 CSV format

    The CSV files for each dataset actually come in two parts: one labelled ".csv" and one ".totals". The ".csv" file contains the actual extracted data, and the ".totals" file contains some basic summary statistics about the ".csv" dataset with the same name. The CSV files contain one row per data point, with the columns separated by a single tab stop. There are no labels at the top of the files. Each line has the following columns, in this order (the labels below are what I use in the database, which has an identical structure, see section below):

    Label Data type Description

    isogramy int The order of isogramy, e.g. "2" is a second order isogram

    length int The length of the word in letters

    word text The actual word/isogram in ASCII

    source_pos text The Part of Speech tag from the original corpus

    count int Token count (total number of occurences)

    vol_count int Volume count (number of different sources which contain the word)

    count_per_million int Token count per million words

    vol_count_as_percent int Volume count as percentage of the total number of volumes

    is_palindrome bool Whether the word is a palindrome (1) or not (0)

    is_tautonym bool Whether the word is a tautonym (1) or not (0)

    The ".totals" files have a slightly different format, with one row per data point, where the first column is the label and the second column is the associated value. The ".totals" files contain the following data:

    | Label | Data type | Description |
    |:------|:----------|:------------|
    | !total_1grams | int | The total number of words in the corpus |
    | !total_volumes | int | The total number of volumes (individual sources) in the corpus |
    | !total_isograms | int | The total number of isograms found in the corpus (before compacting) |
    | !total_palindromes | int | How many of the isograms found are palindromes |
    | !total_tautonyms | int | How many of the isograms found are tautonyms |

    The CSV files are mainly useful for further automated data processing. For working with the data set directly (e.g. to do statistics or cross-check entries), I would recommend using the database format described below.

    1.2 SQLite database format

    On the other hand, the SQLite database combines the data from all four of the plain text files, and adds various useful combinations of the two datasets, namely:

    • Compacted versions of each dataset, where identical headwords are combined into a single entry.
    • A combined compacted dataset, combining and compacting the data from both Ngrams and the BNC.
    • An intersected dataset, which contains only those words which are found in both the Ngrams and the BNC dataset.

    The intersected dataset is by far the least noisy, but is missing some real isograms, too. The columns/layout of each of the tables in the database is identical to that described for the CSV/.totals files above. To get an idea of the various ways the database can be queried for various bits of data, see the R script described below, which computes statistics based on the SQLite database.

    2. Scripts

    There are three scripts: one for tidying Ngram and BNC word lists and extracting isograms, one to create a neat SQLite database from the output, and one to compute some basic statistics from the data. The first script can be run using Python 3, the second script can be run using SQLite 3 from the command line, and the third script can be run in R/RStudio (R version 3).

    2.1 Source data

    The scripts were written to work with word lists from Google Ngram and the BNC, which can be obtained from http://storage.googleapis.com/books/ngrams/books/datasetsv2.html and https://www.kilgarriff.co.uk/bnc-readme.html (download all.al.gz). For Ngram the script expects the path to the directory containing the various files, for BNC the direct path to the *.gz file.

    2.2 Data preparation

    Before processing proper, the word lists need to be tidied to exclude superfluous material and some of the most obvious noise. This will also bring them into a uniform format. Tidying and reformatting can be done by running one of the following commands:

        python isograms.py --ngrams --indir=INDIR --outfile=OUTFILE
        python isograms.py --bnc --indir=INFILE --outfile=OUTFILE

    Replace INDIR/INFILE with the input directory or filename and OUTFILE with the filename for the tidied and reformatted output.

    2.3 Isogram Extraction

    After preparing the data as above, isograms can be extracted from the reformatted and tidied files by running the following command:

        python isograms.py --batch --infile=INFILE --outfile=OUTFILE

    Here INFILE should refer to the output from the previous data cleaning process. Please note that the script will actually write two output files, one named OUTFILE with a word list of all the isograms and their associated frequency data, and one named "OUTFILE.totals" with very basic summary statistics.

    2.4 Creating a SQLite3 database

    The output data from the above step can be easily collated into a SQLite3 database which allows for easy querying of the data directly for specific properties. The database can be created by following these steps:

    1. Make sure the files with the Ngrams and BNC data are named "ngrams-isograms.csv" and "bnc-isograms.csv" respectively. (The script assumes you have both of them; if you only want to load one, just create an empty file for the other one.)
    2. Copy the "create-database.sql" script into the same directory as the two data files.
    3. On the command line, go to the directory where the files and the SQL script are.
    4. Type: sqlite3 isograms.db
    5. This will create a database called "isograms.db".

    See section 1 for a basic description of the output data and how to work with the database.

    2.5 Statistical processing

    The repository includes an R script (R version 3) named "statistics.r" that computes a number of statistics about the distribution of isograms by length, frequency, contextual diversity, etc. This can be used as a starting point for running your own stats. It uses RSQLite to access the SQLite database version of the data described above.
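
    For example, once the database has been built it can be queried from R with RSQLite. The table name below is a placeholder (the exact table names created by create-database.sql are not listed here); substitute the compacted or intersected table you want to work with:

        # Sketch: query the isograms database from R.
        # "ngrams_isograms" is a placeholder table name; replace it with one of
        # the tables actually created by create-database.sql (see section 1.2).
        library(DBI)
        library(RSQLite)

        con <- dbConnect(SQLite(), "isograms.db")

        # Longest second-order isograms that are also palindromes
        res <- dbGetQuery(con, "
          SELECT word, length, count_per_million
          FROM ngrams_isograms
          WHERE isogramy = 2 AND is_palindrome = 1
          ORDER BY length DESC
          LIMIT 20")
        print(res)

        dbDisconnect(con)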

  3. R Package History on CRAN

    • kaggle.com
    zip
    Updated Jul 18, 2022
    Cite
    Heads or Tails (2022). R Package History on CRAN [Dataset]. https://www.kaggle.com/datasets/headsortails/r-package-history-on-cran/code
    Explore at:
    zip (5637913 bytes)
    Dataset updated
    Jul 18, 2022
    Authors
    Heads or Tails
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Comprehensive R Archive Network (CRAN) is the central repository for software packages in the powerful R programming language for statistical computing. It describes itself as "a network of ftp and web servers around the world that store identical, up-to-date, versions of code and documentation for R." If you're installing an R package in the standard way then it is provided by one of the CRAN mirrors.

    The ecosystem of R packages continues to grow at an accelerated pace, covering a multitude of aspects of statistics, machine learning, data visualisation, and many other areas. This dataset provides monthly updates of all the packages available through CRAN, as well as their release histories. Explore the evolution of the R multiverse and all of its facets through this comprehensive data.

    Content

    I'm providing 2 csv tables that describe the current set of R packages on CRAN, as well as the version history of these packages. To derive the data, I made use of the fantastic functionality of the tools package, via the CRAN_package_db function, and the equally wonderful packageRank package and its packageHistory function. The results from those functions were slightly adjusted and formatted. I might add further related tables over time.

    See the associated blog post for how the data was derived, and for some ideas on how to explore this dataset.

    These are the tables contained in this dataset:

    • cran_package_overview.csv: all R packages currently available through CRAN, with (usually) 1 row per package. (At the time of the creation of this Kaggle dataset there were a few packages with 2 entries and different dependencies. Feel free to contribute some EDA investigating those.) Packages are listed in alphabetical order according to their names.

    • cran_package_history.csv: version history of virtually all packages in the previous table. This table has one row for each combination of package name and version number, which in most cases leads to multiple rows per package. Packages are listed in alphabetical order according to their names.

    I will update this dataset on a roughly monthly cadence by checking which packages have a newer version than the one in the overview table, and then replacing the outdated entries.

    Column Description

    Table cran_package_overview.csv: I decided to simplify the large number of columns provided by CRAN and tools::CRAN_package_db into a smaller set of more focused features. All columns are formatted as strings, except for the boolean feature needs_compilation; note that date_published can be read as a ymd date:

    • package: package name following the official spelling and capitalisation. Table is sorted alphabetically according to this column.
    • version: current version.
    • depends: package depends on which other packages.
    • imports: package imports which other packages.
    • licence: the licence under which the package is distributed (e.g. GPL versions)
    • needs_compilation: boolean feature describing whether the package needs to be compiled.
    • author: package author.
    • bug_reports: where to send bugs.
    • url: where to read more.
    • date_published: when the current version of the package was published. Note: this is not the date of the initial package release. See the package history table for that.
    • description: relatively detailed description of what the package is doing.
    • title: the title and tagline of the package.

    Table cran_package_history.csv: The output of packageRank::packageHistory for each package from the overview table. Almost all of them have a match in this table, and can be matched by package and version. All columns are strings, and the date can again be parsed as a ymd date:

    • package: package name. Joins to the feature of the same name in the overview table. Table is sorted alphabetically according to this column.
    • version: historical or current package version. Also joins. Secondary sorting column within each package name.
    • date: when this version was published. Should sort in the same way as the version does.
    • repository: on CRAN or in the Archive.
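
    As an illustration of how the two tables fit together, the following R sketch (assuming both CSV files have been downloaded into the working directory) joins the release history to the current package metadata and parses the date columns:

        # Combine the overview and history tables described above.
        overview <- read.csv("cran_package_overview.csv", stringsAsFactors = FALSE)
        history  <- read.csv("cran_package_history.csv",  stringsAsFactors = FALSE)

        # Parse the ymd date columns
        overview$date_published <- as.Date(overview$date_published)
        history$date            <- as.Date(history$date)

        # Join current metadata onto the full release history by package and version
        joined <- merge(history,
                        overview[, c("package", "version", "title", "licence")],
                        by = c("package", "version"), all.x = TRUE)

        # Number of releases per package, most actively released first
        releases <- aggregate(version ~ package, data = history, FUN = length)
        head(releases[order(-releases$version), ], 10)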

    Acknowledgements

    All data is being made publicly available by the Comprehensive R Archive Network (CRAN). I'm grateful to the authors and maintainers of the packages tools and packageRank for providing the functionality to query CRAN packages smoothly and easily.

    The vignette photo is the official logo for the R language © 2016 The R Foundation. You can distribute the logo under the terms of the Creative Commons Attribution-ShareAlike 4.0 International license...

  4. myview

    • data.wu.ac.at
    Updated Dec 16, 2015
    Cite
    Sindhu (2015). myview [Dataset]. https://data.wu.ac.at/schema/data_kcmo_org/aG11ay1qdGk3
    Explore at:
    Dataset updated
    Dec 16, 2015
    Dataset provided by
    Sindhu
    Description

    This dataset contains basic data for each page on kcmo.gov. The data is monthly aggregate data and contains every page on the kcmo.gov domain.

    This data is pulled directly from Google Analytics into R via the RGoogleAnalytics package (https://github.com/Tatvic/RGoogleAnalytics). The data is then manipulated to change variable names (column headers) and to assign a row ID and sort them in the order page title > Year Month.
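
    The shaping step described above can be sketched in a few lines of base R; the Google Analytics field names used here are illustrative placeholders, since the exact names returned by RGoogleAnalytics are not given in the description:

        # Illustrative sketch of the post-pull shaping described above.
        # 'ga' stands in for the data frame returned by RGoogleAnalytics;
        # the column names are placeholders.
        ga <- data.frame(ga.pageTitle = c("Home", "Parks", "Home"),
                         ga.yearMonth = c("201511", "201511", "201512"),
                         ga.pageviews = c(1200, 340, 1150))

        # Change the variable names (column headers)
        names(ga) <- c("page_title", "year_month", "pageviews")

        # Sort in the order page title > Year Month, then assign a row ID
        ga <- ga[order(ga$page_title, ga$year_month), ]
        ga$row_id <- seq_len(nrow(ga))
        ga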

  5. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • gimi9.com
    Updated Jun 25, 2024
    Cite
    (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - PMVa | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_integration-of-slurry-separation-technology-refrigeration-units-air-quality-pmva-87359/
    Explore at:
    Dataset updated
    Jun 25, 2024
    Description

    This is the gravimetric data used to calibrate the real time readings. Each sheet (tab) is formatted to be exported as a .csv for use with the R-code (AQ-June20.R). In order for this code to work properly, it is important that this file remain intact. Do not change the column names or codes for data, for example. And to be safe, don’t even sort. One simple change in the excel file could make the code full of bugs.
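
    Because the analysis script is this sensitive to the column layout, a small defensive check in R before sourcing it can save debugging time. The filename and expected column names below are placeholders; copy the real ones from the intact Excel file:

        # Sanity-check an exported sheet before running AQ-June20.R.
        # Placeholder names; replace them with the headers from the original file.
        expected_cols <- c("sample_id", "date", "filter_mass_mg", "pm_ug_m3")

        pmva <- read.csv("gravimetric_pmva.csv", stringsAsFactors = FALSE)

        if (!identical(names(pmva), expected_cols)) {
          stop("Column names or order differ from the original file; ",
               "AQ-June20.R expects the sheets to remain unchanged.")
        }

        source("AQ-June20.R")   # only run the analysis once the layout checks out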

  6. Data from: Projections of Definitive Screening Designs by Dropping Columns:...

    • tandf.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Alan R. Vazquez; Peter Goos; Eric D. Schoen (2023). Projections of Definitive Screening Designs by Dropping Columns: Selection and Evaluation [Dataset]. http://doi.org/10.6084/m9.figshare.7624412.v2
    Explore at:
    txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Alan R. Vazquez; Peter Goos; Eric D. Schoen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract–Definitive screening designs permit the study of many quantitative factors in a few runs more than twice the number of factors. In practical applications, researchers often require a design for m quantitative factors, construct a definitive screening design for more than m factors and drop the superfluous columns. This is done when the number of runs in the standard m-factor definitive screening design is considered too limited or when no standard definitive screening design (sDSD) exists for m factors. In these cases, it is common practice to arbitrarily drop the last columns of the larger design. In this article, we show that certain statistical properties of the resulting experimental design depend on the exact columns dropped and that other properties are insensitive to these columns. We perform a complete search for the best sets of 1–8 columns to drop from sDSDs with up to 24 factors. We observed the largest differences in statistical properties when dropping four columns from 8- and 10-factor definitive screening designs. In other cases, the differences are small, or even nonexistent.

  7. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • gimi9.com
    Updated Jun 25, 2024
    Cite
    (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - CH4 | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_integration-of-slurry-separation-technology-refrigeration-units-air-quality-ch4-8abb6/
    Explore at:
    Dataset updated
    Jun 25, 2024
    Description

    Methane concentration of biogas. Each sheet (tab) is formatted to be exported as a .csv for use with the R-code (AQ-June20.R). In order for this code to work properly, it is important that this file remain intact. Do not change the column names or codes for data, for example. And to be safe, don’t even sort. Just in case. One simple change in the excel file could make the code full of bugs.

  8. Supplement 1. R code and data files used to train and evaluate species...

    • wiley.figshare.com
    html
    Updated Jun 2, 2023
    Cite
    Stephen J. Tulowiecki; Chris P. S. Larsen (2023). Supplement 1. R code and data files used to train and evaluate species distribution models (SDMs). [Dataset]. http://doi.org/10.6084/m9.figshare.3569064.v1
    Explore at:
    html
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Wiley (https://www.wiley.com/)
    Authors
    Stephen J. Tulowiecki; Chris P. S. Larsen
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    File List

    • Ecol_Monograph_supplement_code_biomod2.txt (md5: 1468e75dbf74ed624a8dce871743f924)
    • Ecol_Monograph_supplement_code_dismo_1.txt (md5: 55b20fbe747f7601c53d5b56a93459ea)
    • Ecol_Monograph_supplement_code_dismo_2.txt (md5: a33a1745062f1bf816c3d9ec797cdd46)
    • Ecol_Monograph_supplement_code_dismo_3.txt (md5: aff301c5ba52f04eff85e561122964c4)
    • Ecol_Monograph_supplement_code_dismo_4.txt (md5: 244ff730dbd9da02a5439cfd95a439ca)
    • Ecol_Monograph_supplement_code_dismo_5.txt (md5: bec6a05bf1d737b941d0a7a00bde3658)
    • lot_line_section_with_predictors.csv (md5: 48dc1b92e2d3d3b3e4875ef0dc3b87a7)
    • township_bt_post_with_predictors.csv (md5: 86f08554a0a65fec8065f85335aa8ec5)
    • township_line_section_with_predictors.csv (md5: d028af68dcd8f7bca5b28e969cc5c796)
    • biomod2_predictors.zip (md5: 7ab5a1d2ef1847fe64a47483e8220d70)

      Description
    
       This supplement contains the data and code that were used to train and evaluate species distribution models (SDMs). Included are six (6) .txt files that contain code to be run in R, and three (3) .csv files that contain the training data and evaluation data. For all files that contain code, comments are included (“#...”) to describe its functioning.
    
        There are two notes regarding the code files in this supplement. First, users seeking to recreate the results should be aware that minor edits to the code are necessary, in order to make sure all pathnames that are referenced in the code will match the locations where the user is storing the data files. Second, the presented code is for training SDMs that include Native American variables (NAVs). A few minor edits to the code would need to be made, in order to run SDMs that exclude NAVs; these edits are documented in the comments of the code files. Both edits are minor and should take little time to make.
    
        Also worth noting is the considerable processing time required to train and evaluate the models. While the “biomod2” code is highly-automated, it could still require several hours to a few days to run, on a personal computer. The “dismo” codes could take several days to one week to run properly; these codes also involve much more “manual” inputting of blocks of code into R. Alternatively, more advanced users of R could edit the code to function as a script and/or be more automated.
    
       The following is a description of each individual file.
    
    
         Ecol_Monograph_supplement_code_biomod2.txt – this file contains the code for training SDMs from the Holland Land Company (HLC) line-description (or “line section”) data, using three SDM algorithms from the “biomod2” package in R: Generalized Additive Models (GAMs), Generalized Linear Models (GLMs), and Multivariate Adaptive Regression Splines (MARS).
         Five .txt files contain additional code for training and evaluating boosted regression tree (BRT) models, using the “dismo” package in R. The code for BRT model development was broken down into five files, which must be run in succession. Note that due to the “stochastic” nature of BRT models, slightly different model results may result, in comparison to the results reported in the article.
         Ecol_Monograph_supplement_code_dismo_1.txt – this code loads the training data, and trains an initial set of BRT models. 
         Ecol_Monograph_supplement_code_dismo_2.txt – this code runs a procedure that suggests the number of variables that can be dropped from the initial set of BRT models.
         Ecol_Monograph_supplement_code_dismo_3.txt – this code creates a set of simplified BRT models with fewer variables, as determined by the previous step.
         Ecol_Monograph_supplement_code_dismo_4.txt – this code loads evaluation data, loads raster versions of predictor variables, projects models into geographic space, calculates variable importance, plots response curves, and evaluates models upon training data and evaluation data.
         Ecol_Monograph_supplement_code_dismo_5.txt – this code saves false positive rates and false negative rates for each model, when evaluated upon the training data and evaluation data.
        .csv files – these files contain the training data and evaluation data:
         lot_line_section_with_predictors.csv – this file contains the line-description data that was used to train SDMs.
         township_bt_post_with_predictors.csv – this file contains the township bearing-tree data, which was used to evaluate SDMs.
         township_line_section_with_predictors.csv – this file contains the township line-description data, which was used to evaluate SDMs.
         The township data above were used with the permission of Dr. Yi-Chen Wang. For more information regarding these datasets, see:
    
          Wang, Y.-C. 2007. Spatial patterns and vegetation-site relationships of the presettlement forests in western New York, USA. Journal of Biogeography 34:500–513.
          Tulowiecki, S. J., C. P. S. Larsen, and Y.-C. Wang. 2014. Effects of positional error on modeling species distributions: a perspective using presettlement land survey records. Plant Ecology 216:67–85. 
    
    
       The following table contains descriptions of the columns, and checksum values, for the .csv files (sorted alphabetically by column name). With the exception of the “weights” columns, the three .csv files share the same column names (but obviously with different values). The evaluation data (“township_bt_post_with_predictors.csv” and “township_line_section_with_predictors.csv”) do not contain case weight columns, because case weights were only used when training models using the training data (“lot_line_section_with_predictors.csv”). There are no blank cell values in these .csv files.
        -- TABLE: Please see in attached file. --
    
        biomod2_predictors.zip – this zipped file contains the predictor variables in raster format (coordinate system: UTM Zone 17N) that were used to project SDMs into geographic space, in order to train SDMs and create prediction surfaces.
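
    To give a sense of the dismo workflow used by the five BRT code files, here is a minimal hedged sketch of fitting one boosted regression tree with dismo::gbm.step; the response and predictor column positions and the tuning values are placeholders, not the settings used in the article:

        # Sketch: fit one BRT species distribution model with the dismo package.
        # Column positions and tuning parameters are illustrative only; see
        # Ecol_Monograph_supplement_code_dismo_1.txt for the settings actually used.
        library(dismo)

        train <- read.csv("lot_line_section_with_predictors.csv")

        # Suppose column 2 holds presence/absence and columns 5:20 hold predictors
        brt <- gbm.step(data            = train,
                        gbm.x           = 5:20,
                        gbm.y           = 2,
                        family          = "bernoulli",
                        tree.complexity = 3,
                        learning.rate   = 0.005,
                        bag.fraction    = 0.75)

        summary(brt)   # relative influence of each predictor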
    
  9. Data from: Global Superstore Dataset

    • kaggle.com
    zip
    Updated Nov 16, 2023
    Cite
    Fatih İlhan (2023). Global Superstore Dataset [Dataset]. https://www.kaggle.com/datasets/fatihilhan/global-superstore-dataset
    Explore at:
    zip (3349507 bytes)
    Dataset updated
    Nov 16, 2023
    Authors
    Fatih İlhan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    About this file The Kaggle Global Superstore dataset is a comprehensive dataset containing information about sales and orders in a global superstore. It is a valuable resource for data analysis and visualization tasks. This dataset has been processed and transformed from its original format (txt) to CSV using the R programming language. The original dataset is available here, and the transformed CSV file used in this analysis can be found here.

    Here is a description of the columns in the dataset:

    category: The category of products sold in the superstore.

    city: The city where the order was placed.

    country: The country in which the superstore is located.

    customer_id: A unique identifier for each customer.

    customer_name: The name of the customer who placed the order.

    discount: The discount applied to the order.

    market: The market or region where the superstore operates.

    ji_lu_shu: An unknown or unspecified column.

    order_date: The date when the order was placed.

    order_id: A unique identifier for each order.

    order_priority: The priority level of the order.

    product_id: A unique identifier for each product.

    product_name: The name of the product.

    profit: The profit generated from the order.

    quantity: The quantity of products ordered.

    region: The region where the order was placed.

    row_id: A unique identifier for each row in the dataset.

    sales: The total sales amount for the order.

    segment: The customer segment (e.g., consumer, corporate, or home office).

    ship_date: The date when the order was shipped.

    ship_mode: The shipping mode used for the order.

    shipping_cost: The cost of shipping for the order.

    state: The state or region within the country.

    sub_category: The sub-category of products within the main category.

    year: The year in which the order was placed.

    market2: Another column related to market information.

    weeknum: The week number when the order was placed.

    This dataset can be used for various data analysis tasks, including understanding sales patterns, customer behavior, and profitability in the context of a global superstore.
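
    As a small worked example (assuming the extracted CSV is named global_superstore.csv and the columns match the list above), the following R sketch summarises profit by category and discounting by market:

        # Basic profitability summary for the Global Superstore data.
        # The filename is an assumption; use the name of the actual extracted file.
        superstore <- read.csv("global_superstore.csv", stringsAsFactors = FALSE)
        superstore$order_date <- as.Date(superstore$order_date)

        # Total profit and sales per category, most profitable first
        by_category <- aggregate(cbind(profit, sales) ~ category,
                                 data = superstore, FUN = sum)
        by_category[order(-by_category$profit), ]

        # Average discount per market
        aggregate(discount ~ market, data = superstore, FUN = mean)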

  10. Reddit: /r/Damnthatsinteresting

    • kaggle.com
    zip
    Updated Dec 18, 2022
    Cite
    The Devastator (2022). Reddit: /r/Damnthatsinteresting [Dataset]. https://www.kaggle.com/datasets/thedevastator/unlocking-the-power-of-user-engagement-on-damnth
    Explore at:
    zip (139409 bytes)
    Dataset updated
    Dec 18, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Reddit: /r/Damnthatsinteresting

    Investigating Popularity, Score and Engagement Across Subreddits

    By Reddit [source]

    About this dataset

    This dataset provides valuable insights into user engagement and popularity across the subreddit Damnthatsinteresting, with detailed metrics on each discussion such as the title, score, id, URL, comments, created date and time, body and timestamp. It opens a window into the world of user interaction on Reddit by letting researchers align their questions with data-driven results to understand social media behavior. Gain an understanding of what drives people to engage in certain conversations, as well as why certain topics become trending phenomena - it's all here for analysis. Enjoy exploring this fascinating collection of information about Reddit users' activities!


    How to use the dataset

    This dataset provides valuable insights into user engagement and the impact of user interactions on the popular subreddit DamnThatsInteresting. Exploring this dataset can help uncover trends in participation, what content is resonating with viewers, and how different users are engaging with each other. In order to get the most out of this dataset, you will need to understand its structure so that you can explore it and extract meaningful insights. The columns provided include: title, score, url, comms_num, created date/time (created), body and timestamp.

    Research Ideas

    • Analyzing the impact of user comments on the popularity and engagement of discussions
    • Examining trends in user behavior over time to gain insight into popular topics of discussion
    • Investigating which discussions reach higher levels of score, popularity or engagement to identify successful strategies for engaging users

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: Damnthatsinteresting.csv

    | Column name | Description |
    |:------------|:------------|
    | title | The title of the discussion thread. (String) |
    | score | The number of upvotes the discussion has received from users. (Integer) |
    | url | The URL link for the discussion thread itself. (String) |
    | comms_num | The number of comments made on a particular discussion. (Integer) |
    | created | The date and time when the discussion was first created on Reddit by its original poster (OP). (DateTime) |
    | body | Full content including text body with rich media embedded within posts such as images/videos etc. (String) |
    | timestamp | When was last post updated by any particular user. (DateTime) |
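
    For instance, a first pass at the engagement question, assuming Damnthatsinteresting.csv has been downloaded into the working directory, might look like this in R:

        # Relate upvote score to comment activity for /r/Damnthatsinteresting.
        dti <- read.csv("Damnthatsinteresting.csv", stringsAsFactors = FALSE)

        # Do higher-scoring posts attract more comments?
        cor(dti$score, dti$comms_num, use = "complete.obs")

        # Posting volume by hour of day; this assumes 'created' is an ISO
        # date-time string. If it is a Unix epoch instead, use
        # as.POSIXct(dti$created, origin = "1970-01-01").
        dti$created <- as.POSIXct(dti$created)
        table(format(dti$created, "%H"))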

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Reddit.

  11. movies

    • kaggle.com
    zip
    Updated Mar 9, 2023
    Cite
    vinay malik (2023). movies [Dataset]. https://www.kaggle.com/datasets/vinaymalik06/movies/discussion?sort=undefined
    Explore at:
    zip (1459362 bytes)
    Dataset updated
    Mar 9, 2023
    Authors
    vinay malik
    Description

    The Kaggle Movies dataset is available in CSV format and consists of one file: "movies.csv".

    The file contains data on over 10,000 movies and includes fields such as title, release date, director, cast, genre, language, budget, revenue, and rating. The file is approximately 3 MB in size and can be easily imported into popular data analysis tools such as Excel, Python, R, and Tableau.

    The data is organized into rows and columns, with each row representing a single movie and each column representing a specific attribute of the movie. The file contains a header row that provides a description of each column.

    The file has been cleaned and processed to remove any duplicates or inconsistencies. However, the data is provided as-is, without any warranties or guarantees of accuracy or completeness.

    The "movies.csv" file in the Kaggle Movies dataset includes the following columns:

    • id: The unique identifier for each movie.
    • title: The title of the movie.
    • overview: A brief summary of the movie.
    • release_date: The date when the movie was released (in YYYY-MM-DD format).
    • Popularity: A numerical score indicating the relative popularity of each movie, based on factors such as user ratings, social media mentions, and box office performance.
    • Vote Average: The average rating given to the movie by users of the IMDb website (on a scale of 0-10).
    • Vote Count: The number of ratings given to the movie by users of the IMDb website.
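
    A quick way to start exploring in R, assuming movies.csv sits in the working directory (read.csv turns headers such as "Vote Average" into Vote.Average; inspect names(movies) and adjust if the actual headers differ):

        # Load the movies data and list well-rated, frequently voted films.
        movies <- read.csv("movies.csv", stringsAsFactors = FALSE)
        names(movies)   # check the actual column names first

        movies$release_date <- as.Date(movies$release_date)   # YYYY-MM-DD format

        # Highest-rated movies with at least 1000 votes
        rated <- movies[movies$Vote.Count >= 1000, ]
        head(rated[order(-rated$Vote.Average),
                   c("title", "Vote.Average", "Vote.Count")], 10)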

  12. Reddit's /r/funny Subreddit

    • kaggle.com
    zip
    Updated Dec 15, 2022
    Cite
    The Devastator (2022). Reddit's /r/funny Subreddit [Dataset]. https://www.kaggle.com/datasets/thedevastator/explore-reddit-s-funny-subreddit-analyze-communi/code
    Explore at:
    zip (93052 bytes)
    Dataset updated
    Dec 15, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Explore Reddit's Funny Subreddit & Analyze Community Engagement!

    Quantifying Community Interaction Through Reddit Posts

    By Reddit [source]

    About this dataset

    This dataset offers an insightful analysis into one of the most talked-about online communities today: Reddit. Specifically, we are focusing on the funny subreddit, a subsection of the main forum that enjoys the highest engagement across all Reddit users. Not only does this dataset include post titles, scores and other details regarding post creation and engagement; it also includes powerful metrics to measure active community interaction such as comment numbers and timestamps. By diving deep into this data, we can paint a fuller picture in terms of what people find funny in our digital age - how well do certain topics draw responses? How does sentiment change over time? And how can community managers use these insights to grow their platforms and better engage their userbase for lasting success? With this comprehensive dataset at your fingertips, you'll be able to answer each question - and more.


    How to use the dataset

    Introduction

    Welcome to the Reddit's Funny Subreddit Kaggle Dataset. In this dataset you will explore and analyze posts from the popular subreddit to gain insights into community engagement. With this dataset, you can understand user engagement trends and learn how people interact with content from different topics. This guide will provide further information about how to use this dataset for your data analysis projects.

    Important Columns

    This dataset contains columns such as: title, score, url, comms_num (number of comments), created (date of post), body (content of post) and timestamp. All these columns are important in understanding user interactions with each post on Reddit’s Funny Subreddit.

    Exploratory Data Analysis

    In order to get a better understanding of user engagement on the subreddit, some initial exploration is necessary. By using graphical tools such as histograms or boxplots we can understand basic parameter values like scores or comments numbers for each post in the subreddit easily by just observing their distribution over time or through different parameters (for example: type of joke).

    Dimensionality reduction

    For more advanced analytics, it is recommended to apply a dimensionality-reduction technique such as PCA before tackling any real analysis tasks, so that similar posts can be grouped together and conclusions about them can be drawn more confidently later on, leaving out any conflicting or irrelevant variables that could otherwise cloud data-driven decisions.
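
    As a lightweight illustration of that idea (using only numeric engagement columns rather than full text features, which would first require a document-term matrix), a PCA in base R might look like this, assuming funny.csv is in the working directory:

        # PCA on simple numeric engagement features from /r/funny posts.
        funny <- read.csv("funny.csv", stringsAsFactors = FALSE)

        features <- data.frame(score     = funny$score,
                               comms_num = funny$comms_num,
                               title_len = nchar(funny$title))

        pca <- prcomp(features, center = TRUE, scale. = TRUE)
        summary(pca)   # variance explained by each component
        head(pca$x)    # posts projected onto the principal components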

    Further Guidance

    If further assistance with using this dataset is required, further reading on topics such as text mining, natural language processing, and machine learning is highly recommended. These areas explain in detail the steps that can help unlock greater value from Reddit's funny subreddit and suggest the kinds of approaches to take when analyzing text-based online platforms such as Reddit in data analytics and data science tasks.

    Research Ideas

    • Analyzing post title length vs. engagement (i.e., score, comments).
    • Comparing sentiment of post bodies between posts that have high/low scores and comments.
    • Comparing topics within the posts that have high/low scores and comments to look for any differences in content or style of writing based on engagement level

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: funny.csv | Column name | Description | |:--------------|:------------------------...

  13. Financial Transactions Dataset for Analysis

    • kaggle.com
    zip
    Updated Jul 12, 2024
    Cite
    Md Hossan R. (2024). Financial Transactions Dataset for Analysis [Dataset]. https://www.kaggle.com/datasets/mdhossanr/financial-transactions-dataset-for-analysis
    Explore at:
    zip (769156 bytes)
    Dataset updated
    Jul 12, 2024
    Authors
    Md Hossan R.
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Synthetic Financial Transaction Dataset

    This dataset contains a comprehensive collection of 37,417 synthetic financial transactions, designed to simulate a realistic and diverse range of financial activities. It includes detailed records of various transaction types, making it an ideal resource for machine learning tasks such as fraud detection, financial analysis, and predictive modeling.

    Dataset Description

    The dataset consists of the following columns:

    1. TransactionID: A unique identifier for each transaction, ranging from 1 to 37,417.

    2. AccountID: A unique identifier for each account, randomly assigned within the range of 1000 to 9999. This simulates multiple account holders and their respective transactions.

    3. Timestamp: The date and time when the transaction occurred, randomly generated between January 1, 2016, and July 1, 2024. The timestamps are sorted in ascending order to reflect the chronological order of transactions.

    4. TransactionType: The type of transaction, randomly selected from four categories:

      • deposit: Money added to the account.
      • withdrawal: Money taken out from the account.
      • transfer: Money transferred between accounts.
      • payment: Money paid for goods or services.
    5. TransactionAmount: The amount of money involved in the transaction, randomly generated within the range of $1 to $5000. The amounts are rounded to two decimal places to mimic real-world financial data.

    6. AccountBalance: The balance of the account after the transaction, randomly generated within the range of $0 to $100,000. This field provides a snapshot of the account's financial status after each transaction.

    Sample Data

    | TransactionID | AccountID | Timestamp | TransactionType | TransactionAmount | AccountBalance |
    |---|---|---|---|---|---|
    | 0 | 16633 | 2016-01-01 03:47:23 | transfer | 2446.41 | 96273.47 |
    | 1 | 23660 | 2016-01-01 04:20:25 | transfer | 2640.83 | 98629.95 |
    | 2 | 11806 | 2016-01-01 05:12:44 | withdrawal | 574.82 | 65602.63 |
    | 3 | 27498 | 2016-01-01 05:48:42 | payment | 1740.12 | 81461.66 |
    | 4 | 9345 | 2016-01-01 06:26:04 | transfer | 292.43 | 18084.81 |

    Applications

    This dataset can be utilized for various machine learning and data analysis tasks, including but not limited to:

    • Fraud Detection: Identifying unusual patterns and anomalies in transaction behavior that may indicate fraudulent activity.
    • Financial Analysis: Analyzing transaction trends, account balances, and transaction types to gain insights into financial behavior.
    • Predictive Modeling: Developing models to predict future transactions, account balances, and potential risks.
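
    For example, a very simple anomaly screen in R (a crude per-type z-score on transaction amounts, not a production fraud model) could look like the sketch below; the filename is an assumption, so use whatever name the download provides:

        # Flag unusually large transactions within each transaction type.
        # "financial_transactions.csv" is a placeholder filename.
        tx <- read.csv("financial_transactions.csv", stringsAsFactors = FALSE)
        tx$Timestamp <- as.POSIXct(tx$Timestamp)

        # z-score of TransactionAmount within each TransactionType
        tx$z <- ave(tx$TransactionAmount, tx$TransactionType,
                    FUN = function(x) (x - mean(x)) / sd(x))

        # Transactions more than 3 standard deviations above their type's mean
        suspicious <- tx[tx$z > 3,
                         c("TransactionID", "AccountID", "TransactionType",
                           "TransactionAmount", "z")]
        head(suspicious[order(-suspicious$z), ])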
