13 datasets found
  1. Reddit: /r/Art

    • kaggle.com
    zip
    Updated Dec 17, 2022
    Cite
    The Devastator (2022). Reddit: /r/Art [Dataset]. https://www.kaggle.com/datasets/thedevastator/uncovering-online-art-trends-with-reddit-posting/discussion?sort=undefined
    Explore at:
    zip (84621 bytes)
    Dataset updated
    Dec 17, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Reddit: /r/Art

    Examining Content by Title, Score, ID, URL, Comments, Create Date, and Timestamp

    By Reddit [source]

    About this dataset

    This dataset offers an in-depth exploration of the artistic world of Reddit, with a focus on the posts available on the website. By examining the titles, scores, IDs, URLs, comments, creation dates and timestamps associated with each post about art on Reddit, researchers can gain invaluable insight into how art enthusiasts share their work and build networks within this platform. By analyzing this data we can understand what sorts of topics attract more attention from viewers and how members interact with one another in online discussions. Moreover, this dataset can also be used to explore some of the larger underlying issues that shape art communities today - from examining production trends to better understanding consumption patterns. Overall, this comprehensive dataset is an essential resource for those aiming to analyze and comprehend digital spaces where art is circulated and discussed - giving unique insight into how ideas are created and promoted throughout creative networks.


    How to use the dataset

    This dataset is an excellent source of information related to online art trends, providing comprehensive analysis of Reddit posts related to art. In this guide, we’ll discuss how you can use this dataset to gather valuable insights about the way in which art is produced and shared on the web.
    First and foremost, you should start by familiarizing yourself with the columns included in the dataset. Each post contains a title, score (number of upvotes), URL, comments (number of comments), created date and timestamp. When interpreting each column individually or comparing different posts/threads, these values will provide invaluable insight into topics such as most discussed or favored content within the Reddit community.
    After exploring the general features of each post/thread in your analysis, it's time to move on to more specific components such as body content (including images) and creation dates (when users began responding to and interacting with content posted about a specific topic). Using these variables will help researchers uncover meaningful patterns in how communities interact with certain types of content over longer periods of time, and also give context on what topics are trending at any given moment when analyzing shorter intervals.
    Finally, one last creative use of this dataset is to examine titles for common words and phrases that appear frequently among posts discussing similar types of artwork or other forms of media production. Identifying keywords and symbols shared across several different groups can paint a holistic picture of the kind of engagement each group looks for, helped along by the score values, which measure the overall reception of each submission, and by the individual thoughts presented in the comment threads. Hopefully these techniques will bring to light conclusions that were previously hidden from view - good luck!

    Research Ideas

    • Analyzing topics and themes within art posts to determine what content is most popular.
    • Examining the score of art posts to determine how the responding audience engages with each piece.
    • Comparing across different subreddits to explore the ‘meta-discourse’ of topics that appear in multiple forums or platforms

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: Art.csv

    | Column name | Description |
    |:------------|:------------|
    | title | The title of the post. (String) |
    | score | The number of upvotes the post has received. (Integer) |
    | url | The URL of the post. (String) |
    | comms_num | ... |
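
    As a quick starting point, the following minimal R sketch (assuming Art.csv has been downloaded from Kaggle into the working directory and uses the column names listed above) loads the posts, ranks them by score, and counts the most frequent title words:

        # Load the /r/Art posts and look at basic engagement.
        art <- read.csv("Art.csv", stringsAsFactors = FALSE)

        # Top 10 posts by upvote score
        top_posts <- art[order(-art$score), c("title", "score", "comms_num")]
        head(top_posts, 10)

        # Most frequent words in post titles (very rough tokenisation)
        words <- unlist(strsplit(tolower(art$title), "[^a-z]+"))
        words <- words[nchar(words) > 3]   # drop very short tokens
        head(sort(table(words), decreasing = TRUE), 20)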

  2. Data and tools for studying isograms

    • figshare.com
    Updated Jul 31, 2017
    Cite
    Florian Breit (2017). Data and tools for studying isograms [Dataset]. http://doi.org/10.6084/m9.figshare.5245810.v1
    Explore at:
    application/x-sqlite3
    Dataset updated
    Jul 31, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Florian Breit
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A collection of datasets and python scripts for extraction and analysis of isograms (and some palindromes and tautonyms) from corpus-based word-lists, specifically Google Ngram and the British National Corpus (BNC). Below follows a brief description, first, of the included datasets and, second, of the included scripts.

    1. Datasets

    The data from English Google Ngrams and the BNC is available in two formats: as a plain text CSV file and as a SQLite3 database.

    1.1 CSV format

    The CSV files for each dataset actually come in two parts: one labelled ".csv" and one ".totals". The ".csv" file contains the actual extracted data, and the ".totals" file contains some basic summary statistics about the ".csv" dataset with the same name. The CSV files contain one row per data point, with the columns separated by a single tab stop. There are no labels at the top of the files. Each line has the following columns, in this order (the labels below are what I use in the database, which has an identical structure, see section below):

    Label Data type Description

    isogramy int The order of isogramy, e.g. "2" is a second order isogram

    length int The length of the word in letters

    word text The actual word/isogram in ASCII

    source_pos text The Part of Speech tag from the original corpus

    count int Token count (total number of occurences)

    vol_count int Volume count (number of different sources which contain the word)

    count_per_million int Token count per million words

    vol_count_as_percent int Volume count as percentage of the total number of volumes

    is_palindrome bool Whether the word is a palindrome (1) or not (0)

    is_tautonym bool Whether the word is a tautonym (1) or not (0)

    The ".totals" files have a slightly different format, with one row per data point, where the first column is the label and the second column is the associated value. The ".totals" files contain the following data:

    | Label | Data type | Description |
    |:------|:----------|:------------|
    | !total_1grams | int | The total number of words in the corpus |
    | !total_volumes | int | The total number of volumes (individual sources) in the corpus |
    | !total_isograms | int | The total number of isograms found in the corpus (before compacting) |
    | !total_palindromes | int | How many of the isograms found are palindromes |
    | !total_tautonyms | int | How many of the isograms found are tautonyms |

    The CSV files are mainly useful for further automated data processing. For working with the data set directly (e.g. to do statistics or cross-check entries), I would recommend using the database format described below.

    1.2 SQLite database format

    On the other hand, the SQLite database combines the data from all four of the plain text files, and adds various useful combinations of the two datasets, namely:

    • Compacted versions of each dataset, where identical headwords are combined into a single entry.
    • A combined compacted dataset, combining and compacting the data from both Ngrams and the BNC.
    • An intersected dataset, which contains only those words which are found in both the Ngrams and the BNC dataset.

    The intersected dataset is by far the least noisy, but is missing some real isograms, too. The columns/layout of each of the tables in the database is identical to that described for the CSV/.totals files above. To get an idea of the various ways the database can be queried for various bits of data, see the R script described below, which computes statistics based on the SQLite database.

    2. Scripts

    There are three scripts: one for tidying Ngram and BNC word lists and extracting isograms, one to create a neat SQLite database from the output, and one to compute some basic statistics from the data. The first script can be run using Python 3, the second script can be run using SQLite 3 from the command line, and the third script can be run in R/RStudio (R version 3).

    2.1 Source data

    The scripts were written to work with word lists from Google Ngram and the BNC, which can be obtained from http://storage.googleapis.com/books/ngrams/books/datasetsv2.html and https://www.kilgarriff.co.uk/bnc-readme.html (download all.al.gz). For Ngram the script expects the path to the directory containing the various files, for BNC the direct path to the *.gz file.

    2.2 Data preparation

    Before processing proper, the word lists need to be tidied to exclude superfluous material and some of the most obvious noise. This will also bring them into a uniform format. Tidying and reformatting can be done by running one of the following commands:

        python isograms.py --ngrams --indir=INDIR --outfile=OUTFILE
        python isograms.py --bnc --indir=INFILE --outfile=OUTFILE

    Replace INDIR/INFILE with the input directory or filename and OUTFILE with the filename for the tidied and reformatted output.

    2.3 Isogram Extraction

    After preparing the data as above, isograms can be extracted from the reformatted and tidied files by running the following command:

        python isograms.py --batch --infile=INFILE --outfile=OUTFILE

    Here INFILE should refer to the output from the previous data cleaning process. Please note that the script will actually write two output files, one named OUTFILE with a word list of all the isograms and their associated frequency data, and one named "OUTFILE.totals" with very basic summary statistics.

    2.4 Creating a SQLite3 database

    The output data from the above step can be easily collated into a SQLite3 database which allows for easy querying of the data directly for specific properties. The database can be created by following these steps:

    1. Make sure the files with the Ngrams and BNC data are named "ngrams-isograms.csv" and "bnc-isograms.csv" respectively. (The script assumes you have both of them; if you only want to load one, just create an empty file for the other one.)
    2. Copy the "create-database.sql" script into the same directory as the two data files.
    3. On the command line, go to the directory where the files and the SQL script are.
    4. Type: sqlite3 isograms.db
    5. This will create a database called "isograms.db".

    See section 1 for a basic description of the output data and how to work with the database.

    2.5 Statistical processing

    The repository includes an R script (R version 3) named "statistics.r" that computes a number of statistics about the distribution of isograms by length, frequency, contextual diversity, etc. This can be used as a starting point for running your own stats. It uses RSQLite to access the SQLite database version of the data described above.
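
    For example, once the database has been built it can be queried from R with RSQLite. The table name below is a placeholder (the exact table names created by create-database.sql are not listed here); substitute the compacted or intersected table you want to work with:

        # Sketch: query the isograms database from R.
        # "ngrams_isograms" is a placeholder table name; replace it with one of
        # the tables actually created by create-database.sql (see section 1.2).
        library(DBI)
        library(RSQLite)

        con <- dbConnect(SQLite(), "isograms.db")

        # Longest second-order isograms that are also palindromes
        res <- dbGetQuery(con, "
          SELECT word, length, count_per_million
          FROM ngrams_isograms
          WHERE isogramy = 2 AND is_palindrome = 1
          ORDER BY length DESC
          LIMIT 20")
        print(res)

        dbDisconnect(con)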

  3. R Package History on CRAN

    • kaggle.com
    zip
    Updated Jul 18, 2022
    Cite
    Heads or Tails (2022). R Package History on CRAN [Dataset]. https://www.kaggle.com/datasets/headsortails/r-package-history-on-cran/code
    Explore at:
    zip (5637913 bytes)
    Dataset updated
    Jul 18, 2022
    Authors
    Heads or Tails
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Comprehensive R Archive Network (CRAN) is the central repository for software packages in the powerful R programming language for statistical computing. It describes itself as "a network of ftp and web servers around the world that store identical, up-to-date, versions of code and documentation for R." If you're installing an R package in the standard way then it is provided by one of the CRAN mirrors.

    The ecosystem of R packages continues to grow at an accelerated pace, covering a multitude of aspects of statistics, machine learning, data visualisation, and many other areas. This dataset provides monthly updates of all the packages available through CRAN, as well as their release histories. Explore the evolution of the R multiverse and all of its facets through this comprehensive data.

    Content

    I'm providing 2 csv tables that describe the current set of R packages on CRAN, as well as the version history of these packages. To derive the data, I made use of the fantastic functionality of the tools package, via the CRAN_package_db function, and the equally wonderful packageRank package and its packageHistory function. The results from those functions were slightly adjusted and formatted. I might add further related tables over time.

    See the associated blog post for how the data was derived, and for some ideas on how to explore this dataset.

    These are the tables contained in this dataset:

    • cran_package_overview.csv: all R packages currently available through CRAN, with (usually) 1 row per package. (At the time of the creation of this Kaggle dataset there were a few packages with 2 entries and different dependencies. Feel free to contribute some EDA investigating those.) Packages are listed in alphabetical order according to their names.

    • cran_package_history.csv: version history of virtually all packages in the previous table. This table has one row for each combination of package name and version number, which in most cases leads to multiple rows per package. Packages are listed in alphabetical order according to their names.

    I will update this dataset on a roughly monthly cadence by checking which packages have a newer version than the one in the overview table, and then replacing the outdated entries.

    Column Description

    Table cran_package_overview.csv: I decided to simplify the large number of columns provided by CRAN and tools::CRAN_package_db into a smaller set of more focused features. All columns are formatted as strings, except for the boolean feature needs_compilation; note that date_published can be read as a ymd date:

    • package: package name following the official spelling and capitalisation. Table is sorted alphabetically according to this column.
    • version: current version.
    • depends: package depends on which other packages.
    • imports: package imports which other packages.
    • licence: the licence under which the package is distributed (e.g. GPL versions)
    • needs_compilation: boolean feature describing whether the package needs to be compiled.
    • author: package author.
    • bug_reports: where to send bugs.
    • url: where to read more.
    • date_published: when the current version of the package was published. Note: this is not the date of the initial package release. See the package history table for that.
    • description: relatively detailed description of what the package is doing.
    • title: the title and tagline of the package.

    Table cran_package_history.csv: The output of packageRank::packageHistory for each package from the overview table. Almost all of them have a match in this table, and can be matched by package and version. All columns are strings, and the date can again be parsed as a ymd date:

    • package: package name. Joins to the feature of the same name in the overview table. Table is sorted alphabetically according to this column.
    • version: historical or current package version. Also joins. Secondary sorting column within each package name.
    • date: when this version was published. Should sort in the same way as the version does.
    • repository: on CRAN or in the Archive.
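
    As an illustration of how the two tables fit together, the following R sketch (assuming both CSV files have been downloaded into the working directory) joins the release history to the current package metadata and parses the date columns:

        # Combine the overview and history tables described above.
        overview <- read.csv("cran_package_overview.csv", stringsAsFactors = FALSE)
        history  <- read.csv("cran_package_history.csv",  stringsAsFactors = FALSE)

        # Parse the ymd date columns
        overview$date_published <- as.Date(overview$date_published)
        history$date            <- as.Date(history$date)

        # Join current metadata onto the full release history by package and version
        joined <- merge(history,
                        overview[, c("package", "version", "title", "licence")],
                        by = c("package", "version"), all.x = TRUE)

        # Number of releases per package, most actively released first
        releases <- aggregate(version ~ package, data = history, FUN = length)
        head(releases[order(-releases$version), ], 10)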

    Acknowledgements

    All data is being made publicly available by the Comprehensive R Archive Network (CRAN). I'm grateful to the authors and maintainers of the packages tools and packageRank for providing the functionality to query CRAN packages smoothly and easily.

    The vignette photo is the official logo for the R language © 2016 The R Foundation. You can distribute the logo under the terms of the Creative Commons Attribution-ShareAlike 4.0 International license...

  4. myview

    • data.wu.ac.at
    Updated Dec 16, 2015
    Cite
    Sindhu (2015). myview [Dataset]. https://data.wu.ac.at/schema/data_kcmo_org/aG11ay1qdGk3
    Explore at:
    Dataset updated
    Dec 16, 2015
    Dataset provided by
    Sindhu
    Description

    This dataset contains basic data for each page on kcmo.gov. The data is monthly aggregate data and contains every page on the kcmo.gov domain.

    This data is pulled directly from Google Analytics into R via the RGoogleAnalytics package (https://github.com/Tatvic/RGoogleAnalytics). The data is then manipulated to change variable names (column headers) and to assign a row ID and sort them in the order page title > Year Month.
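
    The shaping step described above can be sketched in a few lines of base R; the Google Analytics field names used here are illustrative placeholders, since the exact names returned by RGoogleAnalytics are not given in the description:

        # Illustrative sketch of the post-pull shaping described above.
        # 'ga' stands in for the data frame returned by RGoogleAnalytics;
        # the column names are placeholders.
        ga <- data.frame(ga.pageTitle = c("Home", "Parks", "Home"),
                         ga.yearMonth = c("201511", "201511", "201512"),
                         ga.pageviews = c(1200, 340, 1150))

        # Change the variable names (column headers)
        names(ga) <- c("page_title", "year_month", "pageviews")

        # Sort in the order page title > Year Month, then assign a row ID
        ga <- ga[order(ga$page_title, ga$year_month), ]
        ga$row_id <- seq_len(nrow(ga))
        ga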

  5. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • gimi9.com
    Updated Jun 25, 2024
    Cite
    (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - PMVa | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_integration-of-slurry-separation-technology-refrigeration-units-air-quality-pmva-87359/
    Explore at:
    Dataset updated
    Jun 25, 2024
    Description

    This is the gravimetric data used to calibrate the real time readings. Each sheet (tab) is formatted to be exported as a .csv for use with the R-code (AQ-June20.R). In order for this code to work properly, it is important that this file remain intact. Do not change the column names or codes for data, for example. And to be safe, don’t even sort. One simple change in the excel file could make the code full of bugs.
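
    Because the analysis script is this sensitive to the column layout, a small defensive check in R before sourcing it can save debugging time. The filename and expected column names below are placeholders; copy the real ones from the intact Excel file:

        # Sanity-check an exported sheet before running AQ-June20.R.
        # Placeholder names; replace them with the headers from the original file.
        expected_cols <- c("sample_id", "date", "filter_mass_mg", "pm_ug_m3")

        pmva <- read.csv("gravimetric_pmva.csv", stringsAsFactors = FALSE)

        if (!identical(names(pmva), expected_cols)) {
          stop("Column names or order differ from the original file; ",
               "AQ-June20.R expects the sheets to remain unchanged.")
        }

        source("AQ-June20.R")   # only run the analysis once the layout checks out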

  6. Data from: Projections of Definitive Screening Designs by Dropping Columns:...

    • tandf.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Alan R. Vazquez; Peter Goos; Eric D. Schoen (2023). Projections of Definitive Screening Designs by Dropping Columns: Selection and Evaluation [Dataset]. http://doi.org/10.6084/m9.figshare.7624412.v2
    Explore at:
    txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Alan R. Vazquez; Peter Goos; Eric D. Schoen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract–Definitive screening designs permit the study of many quantitative factors in a few runs more than twice the number of factors. In practical applications, researchers often require a design for m quantitative factors, construct a definitive screening design for more than m factors and drop the superfluous columns. This is done when the number of runs in the standard m-factor definitive screening design is considered too limited or when no standard definitive screening design (sDSD) exists for m factors. In these cases, it is common practice to arbitrarily drop the last columns of the larger design. In this article, we show that certain statistical properties of the resulting experimental design depend on the exact columns dropped and that other properties are insensitive to these columns. We perform a complete search for the best sets of 1–8 columns to drop from sDSDs with up to 24 factors. We observed the largest differences in statistical properties when dropping four columns from 8- and 10-factor definitive screening designs. In other cases, the differences are small, or even nonexistent.

  7. Integration of Slurry Separation Technology & Refrigeration Units: Air...

    • gimi9.com
    Updated Jun 25, 2024
    Cite
    (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - CH4 | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_integration-of-slurry-separation-technology-refrigeration-units-air-quality-ch4-8abb6/
    Explore at:
    Dataset updated
    Jun 25, 2024
    Description

    Methane concentration of biogas. Each sheet (tab) is formatted to be exported as a .csv for use with the R-code (AQ-June20.R). In order for this code to work properly, it is important that this file remain intact. Do not change the column names or codes for data, for example. And to be safe, don’t even sort. Just in case. One simple change in the excel file could make the code full of bugs.

  8. Supplement 1. R code and data files used to train and evaluate species...

    • wiley.figshare.com
    html
    Updated Jun 2, 2023
    Cite
    Stephen J. Tulowiecki; Chris P. S. Larsen (2023). Supplement 1. R code and data files used to train and evaluate species distribution models (SDMs). [Dataset]. http://doi.org/10.6084/m9.figshare.3569064.v1
    Explore at:
    html
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Wiley (https://www.wiley.com/)
    Authors
    Stephen J. Tulowiecki; Chris P. S. Larsen
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    File List

    • Ecol_Monograph_supplement_code_biomod2.txt (md5: 1468e75dbf74ed624a8dce871743f924)
    • Ecol_Monograph_supplement_code_dismo_1.txt (md5: 55b20fbe747f7601c53d5b56a93459ea)
    • Ecol_Monograph_supplement_code_dismo_2.txt (md5: a33a1745062f1bf816c3d9ec797cdd46)
    • Ecol_Monograph_supplement_code_dismo_3.txt (md5: aff301c5ba52f04eff85e561122964c4)
    • Ecol_Monograph_supplement_code_dismo_4.txt (md5: 244ff730dbd9da02a5439cfd95a439ca)
    • Ecol_Monograph_supplement_code_dismo_5.txt (md5: bec6a05bf1d737b941d0a7a00bde3658)
    • lot_line_section_with_predictors.csv (md5: 48dc1b92e2d3d3b3e4875ef0dc3b87a7)
    • township_bt_post_with_predictors.csv (md5: 86f08554a0a65fec8065f85335aa8ec5)
    • township_line_section_with_predictors.csv (md5: d028af68dcd8f7bca5b28e969cc5c796)
    • biomod2_predictors.zip (md5: 7ab5a1d2ef1847fe64a47483e8220d70)

      Description
    
       This supplement contains the data and code that were used to train and evaluate species distribution models (SDMs). Included are six (6) .txt files that contain code to be run in R, and three (3) .csv files that contain the training data and evaluation data. For all files that contain code, comments are included (“#...”) to describe its functioning.
    
        There are two notes regarding the code files in this supplement. First, users seeking to recreate the results should be aware that minor edits to the code are necessary, in order to make sure all pathnames that are referenced in the code will match the locations where the user is storing the data files. Second, the presented code is for training SDMs that include Native American variables (NAVs). A few minor edits to the code would need to be made, in order to run SDMs that exclude NAVs; these edits are documented in the comments of the code files. Both edits are minor and should take little time to make.
    
        Also worth noting is the considerable processing time required to train and evaluate the models. While the “biomod2” code is highly-automated, it could still require several hours to a few days to run, on a personal computer. The “dismo” codes could take several days to one week to run properly; these codes also involve much more “manual” inputting of blocks of code into R. Alternatively, more advanced users of R could edit the code to function as a script and/or be more automated.
    
       The following is a description of each individual file.
    
    
         Ecol_Monograph_supplement_code_biomod2.txt – this file contains the code for training SDMs from the Holland Land Company (HLC) line-description (or “line section”) data, using three SDM algorithms from the “biomod2” package in R: Generalized Additive Models (GAMs), Generalized Linear Models (GLMs), and Multivariate Adaptive Regression Splines (MARS).
         Five .txt files contain additional code for training and evaluating boosted regression tree (BRT) models, using the “dismo” package in R. The code for BRT model development was broken down into five files, which must be run in succession. Note that due to the “stochastic” nature of BRT models, slightly different model results may result, in comparison to the results reported in the article.
         Ecol_Monograph_supplement_code_dismo_1.txt – this code loads the training data, and trains an initial set of BRT models. 
         Ecol_Monograph_supplement_code_dismo_2.txt – this code runs a procedure that suggests the number of variables that can be dropped from the initial set of BRT models.
         Ecol_Monograph_supplement_code_dismo_3.txt – this code creates a set of simplified BRT models with fewer variables, as determined by the previous step.
         Ecol_Monograph_supplement_code_dismo_4.txt – this code loads evaluation data, loads raster versions of predictor variables, projects models into geographic space, calculates variable importance, plots response curves, and evaluates models upon training data and evaluation data.
         Ecol_Monograph_supplement_code_dismo_5.txt – this code saves false positive rates and false negative rates for each model, when evaluated upon the training data and evaluation data.
        .csv files – these files contain the training data and evaluation data:
         lot_line_section_with_predictors.csv – this file contains the line-description data that was used to train SDMs.
         township_bt_post_with_predictors.csv – this file contains the township bearing-tree data, which was used to evaluate SDMs.
         township_line_section_with_predictors.csv – this file contains the township line-description data, which was used to evaluate SDMs.
         The township data above were used with the permission of Dr. Yi-Chen Wang. For more information regarding these datasets, see:
    
          Wang, Y.-C. 2007. Spatial patterns and vegetation-site relationships of the presettlement forests in western New York, USA. Journal of Biogeography 34:500–513.
          Tulowiecki, S. J., C. P. S. Larsen, and Y.-C. Wang. 2014. Effects of positional error on modeling species distributions: a perspective using presettlement land survey records. Plant Ecology 216:67–85. 
    
    
       The following table contains descriptions of the columns, and checksum values, for the .csv files (sorted alphabetically by column name). With the exception of the “weights” columns, the three .csv files share the same column names (but obviously with different values). The evaluation data (“township_bt_post_with_predictors.csv” and “township_line_section_with_predictors.csv”) do not contain case weight columns, because case weights were only used when training models using the training data (“lot_line_section_with_predictors.csv”). There are no blank cell values in these .csv files.
        -- TABLE: Please see in attached file. --
    
        biomod2_predictors.zip – this zipped file contains the predictor variables in raster format (coordinate system: UTM Zone 17N) that were used to project SDMs into geographic space, in order to train SDMs and create prediction surfaces.
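
    To give a sense of the dismo workflow used by the five BRT code files, here is a minimal hedged sketch of fitting one boosted regression tree with dismo::gbm.step; the response and predictor column positions and the tuning values are placeholders, not the settings used in the article:

        # Sketch: fit one BRT species distribution model with the dismo package.
        # Column positions and tuning parameters are illustrative only; see
        # Ecol_Monograph_supplement_code_dismo_1.txt for the settings actually used.
        library(dismo)

        train <- read.csv("lot_line_section_with_predictors.csv")

        # Suppose column 2 holds presence/absence and columns 5:20 hold predictors
        brt <- gbm.step(data            = train,
                        gbm.x           = 5:20,
                        gbm.y           = 2,
                        family          = "bernoulli",
                        tree.complexity = 3,
                        learning.rate   = 0.005,
                        bag.fraction    = 0.75)

        summary(brt)   # relative influence of each predictor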
    
  9. Data from: Global Superstore Dataset

    • kaggle.com
    zip
    Updated Nov 16, 2023
    Cite
    Fatih İlhan (2023). Global Superstore Dataset [Dataset]. https://www.kaggle.com/datasets/fatihilhan/global-superstore-dataset
    Explore at:
    zip (3349507 bytes)
    Dataset updated
    Nov 16, 2023
    Authors
    Fatih İlhan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    About this file The Kaggle Global Superstore dataset is a comprehensive dataset containing information about sales and orders in a global superstore. It is a valuable resource for data analysis and visualization tasks. This dataset has been processed and transformed from its original format (txt) to CSV using the R programming language. The original dataset is available here, and the transformed CSV file used in this analysis can be found here.

    Here is a description of the columns in the dataset:

    category: The category of products sold in the superstore.

    city: The city where the order was placed.

    country: The country in which the superstore is located.

    customer_id: A unique identifier for each customer.

    customer_name: The name of the customer who placed the order.

    discount: The discount applied to the order.

    market: The market or region where the superstore operates.

    ji_lu_shu: An unknown or unspecified column.

    order_date: The date when the order was placed.

    order_id: A unique identifier for each order.

    order_priority: The priority level of the order.

    product_id: A unique identifier for each product.

    product_name: The name of the product.

    profit: The profit generated from the order.

    quantity: The quantity of products ordered.

    region: The region where the order was placed.

    row_id: A unique identifier for each row in the dataset.

    sales: The total sales amount for the order.

    segment: The customer segment (e.g., consumer, corporate, or home office).

    ship_date: The date when the order was shipped.

    ship_mode: The shipping mode used for the order.

    shipping_cost: The cost of shipping for the order.

    state: The state or region within the country.

    sub_category: The sub-category of products within the main category.

    year: The year in which the order was placed.

    market2: Another column related to market information.

    weeknum: The week number when the order was placed.

    This dataset can be used for various data analysis tasks, including understanding sales patterns, customer behavior, and profitability in the context of a global superstore.
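
    As a small worked example (assuming the extracted CSV is named global_superstore.csv and the columns match the list above), the following R sketch summarises profit by category and discounting by market:

        # Basic profitability summary for the Global Superstore data.
        # The filename is an assumption; use the name of the actual extracted file.
        superstore <- read.csv("global_superstore.csv", stringsAsFactors = FALSE)
        superstore$order_date <- as.Date(superstore$order_date)

        # Total profit and sales per category, most profitable first
        by_category <- aggregate(cbind(profit, sales) ~ category,
                                 data = superstore, FUN = sum)
        by_category[order(-by_category$profit), ]

        # Average discount per market
        aggregate(discount ~ market, data = superstore, FUN = mean)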

  10. Reddit: /r/Damnthatsinteresting

    • kaggle.com
    zip
    Updated Dec 18, 2022
    Cite
    The Devastator (2022). Reddit: /r/Damnthatsinteresting [Dataset]. https://www.kaggle.com/datasets/thedevastator/unlocking-the-power-of-user-engagement-on-damnth
    Explore at:
    zip (139409 bytes)
    Dataset updated
    Dec 18, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Reddit: /r/Damnthatsinteresting

    Investigating Popularity, Score and Engagement Across Subreddits

    By Reddit [source]

    About this dataset

    This dataset provides valuable insights into user engagement and popularity across the subreddit Damnthatsinteresting, with detailed metrics on each discussion such as the title, score, id, URL, comments, created date and time, body and timestamp. It opens a window into the world of user interaction on Reddit by letting researchers align their questions with data-driven results to understand social media behavior. Gain an understanding of what drives people to engage in certain conversations, as well as why certain topics become trending phenomena - it's all here for analysis. Enjoy exploring this fascinating collection of information about Reddit users' activities!


    How to use the dataset

    This dataset provides valuable insights into user engagement and the impact of user interactions on the popular subreddit DamnThatsInteresting. Exploring this dataset can help uncover trends in participation, what content is resonating with viewers, and how different users are engaging with each other. In order to get the most out of this dataset, you will need to understand its structure so that you can explore it and extract meaningful insights. The columns provided include: title, score, url, comms_num, created date/time (created), body and timestamp.

    Research Ideas

    • Analyzing the impact of user comments on the popularity and engagement of discussions
    • Examining trends in user behavior over time to gain insight into popular topics of discussion
    • Investigating which discussions reach higher levels of score, popularity or engagement to identify successful strategies for engaging users

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: Damnthatsinteresting.csv

    | Column name | Description |
    |:------------|:------------|
    | title | The title of the discussion thread. (String) |
    | score | The number of upvotes the discussion has received from users. (Integer) |
    | url | The URL link for the discussion thread itself. (String) |
    | comms_num | The number of comments made on a particular discussion. (Integer) |
    | created | The date and time when the discussion was first created on Reddit by its original poster (OP). (DateTime) |
    | body | Full content including text body with rich media embedded within posts such as images/videos etc. (String) |
    | timestamp | When was last post updated by any particular user. (DateTime) |
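
    For instance, a first pass at the engagement question, assuming Damnthatsinteresting.csv has been downloaded into the working directory, might look like this in R:

        # Relate upvote score to comment activity for /r/Damnthatsinteresting.
        dti <- read.csv("Damnthatsinteresting.csv", stringsAsFactors = FALSE)

        # Do higher-scoring posts attract more comments?
        cor(dti$score, dti$comms_num, use = "complete.obs")

        # Posting volume by hour of day; this assumes 'created' is an ISO
        # date-time string. If it is a Unix epoch instead, use
        # as.POSIXct(dti$created, origin = "1970-01-01").
        dti$created <- as.POSIXct(dti$created)
        table(format(dti$created, "%H"))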

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Reddit.

  11. movies

    • kaggle.com
    zip
    Updated Mar 9, 2023
    Cite
    vinay malik (2023). movies [Dataset]. https://www.kaggle.com/datasets/vinaymalik06/movies/discussion?sort=undefined
    Explore at:
    zip (1459362 bytes)
    Dataset updated
    Mar 9, 2023
    Authors
    vinay malik
    Description

    The Kaggle Movies dataset is available in CSV format and consists of one file: "movies.csv".

    The file contains data on over 10,000 movies and includes fields such as title, release date, director, cast, genre, language, budget, revenue, and rating. The file is approximately 3 MB in size and can be easily imported into popular data analysis tools such as Excel, Python, R, and Tableau.

    The data is organized into rows and columns, with each row representing a single movie and each column representing a specific attribute of the movie. The file contains a header row that provides a description of each column.

    The file has been cleaned and processed to remove any duplicates or inconsistencies. However, the data is provided as-is, without any warranties or guarantees of accuracy or completeness.

    The "movies.csv" file in the Kaggle Movies dataset includes the following columns:

    • id: The unique identifier for each movie.
    • title: The title of the movie.
    • overview: A brief summary of the movie.
    • release_date: The date when the movie was released (in YYYY-MM-DD format).
    • Popularity: A numerical score indicating the relative popularity of each movie, based on factors such as user ratings, social media mentions, and box office performance.
    • Vote Average: The average rating given to the movie by users of the IMDb website (on a scale of 0-10).
    • Vote Count: The number of ratings given to the movie by users of the IMDb website.
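
    A quick way to start exploring in R, assuming movies.csv sits in the working directory (read.csv turns headers such as "Vote Average" into Vote.Average; inspect names(movies) and adjust if the actual headers differ):

        # Load the movies data and list well-rated, frequently voted films.
        movies <- read.csv("movies.csv", stringsAsFactors = FALSE)
        names(movies)   # check the actual column names first

        movies$release_date <- as.Date(movies$release_date)   # YYYY-MM-DD format

        # Highest-rated movies with at least 1000 votes
        rated <- movies[movies$Vote.Count >= 1000, ]
        head(rated[order(-rated$Vote.Average),
                   c("title", "Vote.Average", "Vote.Count")], 10)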

  12. Reddit's /r/funny Subreddit

    • kaggle.com
    zip
    Updated Dec 15, 2022
    Cite
    The Devastator (2022). Reddit's /r/funny Subreddit [Dataset]. https://www.kaggle.com/datasets/thedevastator/explore-reddit-s-funny-subreddit-analyze-communi/code
    Explore at:
    zip (93052 bytes)
    Dataset updated
    Dec 15, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Explore Reddit's Funny Subreddit & Analyze Community Engagement!

    Quantifying Community Interaction Through Reddit Posts

    By Reddit [source]

    About this dataset

    This dataset offers an insightful analysis into one of the most talked-about online communities today: Reddit. Specifically, we are focusing on the funny subreddit, a subsection of the main forum that enjoys the highest engagement across all Reddit users. Not only does this dataset include post titles, scores and other details regarding post creation and engagement; it also includes powerful metrics to measure active community interaction such as comment numbers and timestamps. By diving deep into this data, we can paint a fuller picture in terms of what people find funny in our digital age - how well do certain topics draw responses? How does sentiment change over time? And how can community managers use these insights to grow their platforms and better engage their userbase for lasting success? With this comprehensive dataset at your fingertips, you'll be able to answer each question - and more.


    How to use the dataset

    Introduction

    Welcome to the Reddit's Funny Subreddit Kaggle Dataset. In this dataset you will explore and analyze posts from the popular subreddit to gain insights into community engagement. With this dataset, you can understand user engagement trends and learn how people interact with content from different topics. This guide will provide further information about how to use this dataset for your data analysis projects.

    Important Columns

    This dataset contains columns such as: title, score, url, comms_num (number of comments), created (date of post), body (content of post) and timestamp. All these columns are important in understanding user interactions with each post on Reddit’s Funny Subreddit.

    Exploratory Data Analysis

    In order to get a better understanding of user engagement on the subreddit, some initial exploration is necessary. By using graphical tools such as histograms or boxplots we can understand basic parameter values like scores or comments numbers for each post in the subreddit easily by just observing their distribution over time or through different parameters (for example: type of joke).

    Dimensionality reduction

    For more advanced analytics, it is recommended to apply a dimensionality-reduction technique such as PCA before tackling any real analysis tasks, so that similar posts can be grouped together and conclusions about them can be drawn more confidently later on, leaving out any conflicting or irrelevant variables that could otherwise cloud data-driven decisions.
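
    As a lightweight illustration of that idea (using only numeric engagement columns rather than full text features, which would first require a document-term matrix), a PCA in base R might look like this, assuming funny.csv is in the working directory:

        # PCA on simple numeric engagement features from /r/funny posts.
        funny <- read.csv("funny.csv", stringsAsFactors = FALSE)

        features <- data.frame(score     = funny$score,
                               comms_num = funny$comms_num,
                               title_len = nchar(funny$title))

        pca <- prcomp(features, center = TRUE, scale. = TRUE)
        summary(pca)   # variance explained by each component
        head(pca$x)    # posts projected onto the principal components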

    Further Guidance

    If further assistance with using this dataset is required, further reading on topics such as text mining, natural language processing, and machine learning is highly recommended. These areas explain in detail the steps that can help unlock greater value from Reddit's funny subreddit and suggest the kinds of approaches to take when analyzing text-based online platforms such as Reddit in data analytics and data science tasks.

    Research Ideas

    • Analyzing post title length vs. engagement (i.e., score, comments).
    • Comparing sentiment of post bodies between posts that have high/low scores and comments.
    • Comparing topics within the posts that have high/low scores and comments to look for any differences in content or style of writing based on engagement level

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: funny.csv | Column name | Description | |:--------------|:------------------------...

  13. Financial Transactions Dataset for Analysis

    • kaggle.com
    zip
    Updated Jul 12, 2024
    Cite
    Md Hossan R. (2024). Financial Transactions Dataset for Analysis [Dataset]. https://www.kaggle.com/datasets/mdhossanr/financial-transactions-dataset-for-analysis
    Explore at:
    zip (769156 bytes)
    Dataset updated
    Jul 12, 2024
    Authors
    Md Hossan R.
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Synthetic Financial Transaction Dataset

    This dataset contains a comprehensive collection of 37,417 synthetic financial transactions, designed to simulate a realistic and diverse range of financial activities. It includes detailed records of various transaction types, making it an ideal resource for machine learning tasks such as fraud detection, financial analysis, and predictive modeling.

    Dataset Description

    The dataset consists of the following columns:

    1. TransactionID: A unique identifier for each transaction, ranging from 1 to 37,417.

    2. AccountID: A unique identifier for each account, randomly assigned within the range of 1000 to 9999. This simulates multiple account holders and their respective transactions.

    3. Timestamp: The date and time when the transaction occurred, randomly generated between January 1, 2016, and July 1, 2024. The timestamps are sorted in ascending order to reflect the chronological order of transactions.

    4. TransactionType: The type of transaction, randomly selected from four categories:

      • deposit: Money added to the account.
      • withdrawal: Money taken out from the account.
      • transfer: Money transferred between accounts.
      • payment: Money paid for goods or services.
    5. TransactionAmount: The amount of money involved in the transaction, randomly generated within the range of $1 to $5000. The amounts are rounded to two decimal places to mimic real-world financial data.

    6. AccountBalance: The balance of the account after the transaction, randomly generated within the range of $0 to $100,000. This field provides a snapshot of the account's financial status after each transaction.

    Sample Data

    | TransactionID | AccountID | Timestamp | TransactionType | TransactionAmount | AccountBalance |
    |---|---|---|---|---|---|
    | 0 | 16633 | 2016-01-01 03:47:23 | transfer | 2446.41 | 96273.47 |
    | 1 | 23660 | 2016-01-01 04:20:25 | transfer | 2640.83 | 98629.95 |
    | 2 | 11806 | 2016-01-01 05:12:44 | withdrawal | 574.82 | 65602.63 |
    | 3 | 27498 | 2016-01-01 05:48:42 | payment | 1740.12 | 81461.66 |
    | 4 | 9345 | 2016-01-01 06:26:04 | transfer | 292.43 | 18084.81 |

    Applications

    This dataset can be utilized for various machine learning and data analysis tasks, including but not limited to:

    • Fraud Detection: Identifying unusual patterns and anomalies in transaction behavior that may indicate fraudulent activity.
    • Financial Analysis: Analyzing transaction trends, account balances, and transaction types to gain insights into financial behavior.
    • Predictive Modeling: Developing models to predict future transactions, account balances, and potential risks.
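
    For example, a very simple anomaly screen in R (a crude per-type z-score on transaction amounts, not a production fraud model) could look like the sketch below; the filename is an assumption, so use whatever name the download provides:

        # Flag unusually large transactions within each transaction type.
        # "financial_transactions.csv" is a placeholder filename.
        tx <- read.csv("financial_transactions.csv", stringsAsFactors = FALSE)
        tx$Timestamp <- as.POSIXct(tx$Timestamp)

        # z-score of TransactionAmount within each TransactionType
        tx$z <- ave(tx$TransactionAmount, tx$TransactionType,
                    FUN = function(x) (x - mean(x)) / sd(x))

        # Transactions more than 3 standard deviations above their type's mean
        suspicious <- tx[tx$z > 3,
                         c("TransactionID", "AccountID", "TransactionType",
                           "TransactionAmount", "z")]
        head(suspicious[order(-suspicious$z), ])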
