100+ datasets found
  1. Kaggle Getting Started Discussion

    • kaggle.com
    zip
    Updated Mar 13, 2022
    Cite
    Yash Chauhan (2022). Kaggle Getting Started Discussion [Dataset]. https://www.kaggle.com/datasets/lazrus/kaggle-getting-started-discussion
    Available download formats: zip (81195 bytes)
    Dataset updated
    Mar 13, 2022
    Authors
    Yash Chauhan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    A while back I was going through the Discussion section of Kaggle, and while looking at discussions with varying numbers of upvotes, a question was seeded in my brain that would stay there rent-free for about two weeks: "What makes a good discussion?" More specifically, is there a conceivable pattern that explains why some discussions are upvoted more often than others?

    Such is the bug of curiosity that I had to feed it: I ended up scraping the top 2000 discussions (based on upvotes) to analyze and play with.

    This dataset contains all the information I thought could factor into the popularity of a discussion, from the point of view of both the users and the algorithms of Kaggle.

    If there is anyone out there with the same question lingering in their mind, this data is for you.

    Content

    The dataset contains information about the discussion itself (title, number of comments, elapsed time, upvotes) as well as information about its author (author tier, medals, author discussion tier).

    What the dataset doesn't have is the actual text of the discussions. That is because the aim, at least for now, is to find what attracts a reader to open a discussion rather than what makes them like it. In case you want to try something with the full text, I have included links to the scraped discussions, so you can scrape the full text yourself and add it in for further analysis.

    Format: CSV; Kaggle_Getting_Started_Discussion_2k.csv

    Columns:

    1. Author_Name: Name of the author of the discussion
    2. Title: Title of the discussion
    3. Elapsed_Time: Time elapsed from when the discussion was published
    4. Num_comments: Number of comments on the discussion
    5. Number_of_Upvotes: Number of upvotes
    6. Author_Tier: The highest tier of the author
    7. Author_Discussion_Tier: The Discussion tier of the author
    8. Num_Followers: Number of followers of the author
    9. Discussion_Golds: Discussion gold medals earned by the author
    10. Discussion_Silvers: Discussion silver medals earned by the author
    11. Discussion_Bronze: Discussion bronze medals earned by the author
    12. Current_Discussion_Rank: The current discussion rank of the author
    13. Highest_Discussion_Rank: The highest discussion rank of the author
    14. Discussion_link: Link to the discussion
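
As a rough illustration of how the columns above might be explored, the sketch below averages `Number_of_Upvotes` per `Author_Tier` using only the Python standard library. The rows in `SAMPLE` are made-up placeholders, not values from the dataset, and the sketch assumes that the upvote column holds plain integers and that tiers are stored as text labels; neither is confirmed by the description. The function name `mean_upvotes_by_tier` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up placeholder rows that reuse the documented column names.
# To analyze the real data, read Kaggle_Getting_Started_Discussion_2k.csv
# into a string and pass it in place of SAMPLE.
SAMPLE = """Author_Name,Title,Num_comments,Number_of_Upvotes,Author_Tier
alice,Example title A,10,40,Expert
bob,Example title B,4,20,Expert
carol,Example title C,2,6,Novice
"""


def mean_upvotes_by_tier(csv_text):
    """Return the average Number_of_Upvotes for each Author_Tier value."""
    totals = defaultdict(lambda: [0, 0])  # tier -> [upvote sum, row count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = totals[row["Author_Tier"]]
        # Assumption: upvotes are stored as plain integers.
        entry[0] += int(row["Number_of_Upvotes"])
        entry[1] += 1
    return {tier: total / count for tier, (total, count) in totals.items()}


result = mean_upvotes_by_tier(SAMPLE)
print(result)
```

If the real file stores counts with thousands separators or tiers as numeric codes, the `int(...)` conversion and the grouping key would need adjusting.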

    Inspiration

    Let's find out what makes the Kaggle discussion forum tick.

  2. kaggle-nlp-getting-start

    • huggingface.co
    Cite
    hui, kaggle-nlp-getting-start [Dataset]. https://huggingface.co/datasets/gdwangh/kaggle-nlp-getting-start
    Available formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    hui
    Description

    Dataset Summary

    Natural Language Processing with Disaster Tweets: https://www.kaggle.com/competitions/nlp-getting-started/data This particular challenge is perfect for data scientists looking to get started with Natural Language Processing. The competition dataset is not too big, and even if you don’t have much personal computing power, you can do all of the work in our free, no-setup, Jupyter Notebooks environment called Kaggle Notebooks.

    Columns

    id - a unique identifier for each tweet… See the full description on the dataset page: https://huggingface.co/datasets/gdwangh/kaggle-nlp-getting-start.

  3. Titanic Competition

    • kaggle.com
    zip
    Updated Jul 19, 2021
    Cite
    Togae Noh (2021). Titanic Competition [Dataset]. https://www.kaggle.com/datasets/togaenoh/titanic-competition
    Available download formats: zip (34877 bytes)
    Dataset updated
    Jul 19, 2021
    Authors
    Togae Noh
    Description

    Dataset

    This dataset was created by Togae Noh

    Contents

  4. home data for ml course

    • kaggle.com
    zip
    Updated Jan 23, 2019
    Cite
    DanB (2019). home data for ml course [Dataset]. https://www.kaggle.com/datasets/dansbecker/home-data-for-ml-course/discussion
    Available download formats: zip (96211 bytes)
    Dataset updated
    Jan 23, 2019
    Authors
    DanB
    Description

    Dataset

    This dataset was created by DanB

    Contents

  5. Kaggle Dataset

    • universe.roboflow.com
    zip
    Updated Oct 2, 2022
    Cite
    k (2022). Kaggle Dataset [Dataset]. https://universe.roboflow.com/k-5hqao/kaggle-wlshw
    Available download formats: zip
    Dataset updated
    Oct 2, 2022
    Dataset authored and provided by
    k
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Variables measured
    K Bounding Boxes
    Description

    Kaggle

    ## Overview
    
    Kaggle is a dataset for object detection tasks - it contains K annotations for 779 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [MIT license](https://opensource.org/licenses/MIT).
    
  6. Kaggle Annotation Dataset

    • universe.roboflow.com
    zip
    Updated Apr 2, 2025
    Cite
    kaggle annotate (2025). Kaggle Annotation Dataset [Dataset]. https://universe.roboflow.com/kaggle-annotate/kaggle-annotation
    Available download formats: zip
    Dataset updated
    Apr 2, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    kaggle annotate
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Objects Bounding Boxes
    Description

    Kaggle Annotation

    ## Overview
    
    Kaggle Annotation is a dataset for object detection tasks - it contains Objects annotations for 965 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  7. Machine Learning Basics for Beginners🤖🧠

    • kaggle.com
    zip
    Updated Jun 22, 2023
    Cite
    Bhanupratap Biswas (2023). Machine Learning Basics for Beginners🤖🧠 [Dataset]. https://www.kaggle.com/datasets/bhanupratapbiswas/machine-learning-basics-for-beginners
    Available download formats: zip (492015 bytes)
    Dataset updated
    Jun 22, 2023
    Authors
    Bhanupratap Biswas
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    Machine learning is a subfield of artificial intelligence (AI) that focuses on enabling computers to learn and make predictions or decisions without being explicitly programmed. Here are some key concepts and terms to help you get started:

    1. Supervised Learning: In supervised learning, the machine learning algorithm learns from labeled training data. The training data consists of input examples and their corresponding correct output or target values. The algorithm learns to generalize from this data and make predictions or classify new, unseen examples.

    2. Unsupervised Learning: Unsupervised learning involves learning patterns and relationships from unlabeled data. Unlike supervised learning, there are no target values provided. Instead, the algorithm aims to discover inherent structures or clusters in the data.

    3. Training Data and Test Data: Machine learning models require a dataset to learn from. The dataset is typically split into two parts: the training data and the test data. The model learns from the training data, and the test data is used to evaluate its performance and generalization ability.

    4. Features and Labels: In supervised learning, the input examples are often represented by features or attributes. For example, in a spam email classification task, features might include the presence of certain keywords or the length of the email. The corresponding output or target values are called labels, indicating the class or category to which the example belongs (e.g., spam or not spam).

    5. Model Evaluation Metrics: To assess the performance of a machine learning model, various evaluation metrics are used. Common metrics include accuracy (the proportion of correctly predicted examples), precision (the proportion of true positives among all positive predictions), recall (the proportion of actual positives that the model correctly identifies), and F1 score (the harmonic mean of precision and recall).

    6. Overfitting and Underfitting: Overfitting occurs when a model becomes too complex and learns to memorize the training data instead of generalizing well to unseen examples. On the other hand, underfitting happens when a model is too simple and fails to capture the underlying patterns in the data. Balancing the complexity of the model is crucial to achieve good generalization.

    7. Feature Engineering: Feature engineering involves selecting or creating relevant features that can help improve the performance of a machine learning model. It often requires domain knowledge and creativity to transform raw data into a suitable representation that captures the important information.

    8. Bias and Variance Trade-off: The bias-variance trade-off is a fundamental concept in machine learning. Bias refers to the errors introduced by the model's assumptions and simplifications, while variance refers to the model's sensitivity to small fluctuations in the training data. Reducing bias may increase variance and vice versa. Finding the right balance is important for building a well-performing model.

    9. Supervised Learning Algorithms: There are various supervised learning algorithms, including linear regression, logistic regression, decision trees, random forests, support vector machines (SVM), and neural networks. Each algorithm has its own strengths, weaknesses, and specific use cases.

    10. Unsupervised Learning Algorithms: Unsupervised learning algorithms include clustering algorithms like k-means clustering and hierarchical clustering, dimensionality reduction techniques like principal component analysis (PCA) and t-SNE, and anomaly detection algorithms, among others.

    These concepts provide a starting point for understanding the basics of machine learning. As you delve deeper, you can explore more advanced topics such as deep learning, reinforcement learning, and natural language processing. Remember to practice hands-on with real-world datasets to gain practical experience and further refine your skills.
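
The evaluation metrics in item 5 can be made concrete with a few lines of Python. This is a minimal sketch for binary classification, not taken from the dataset itself: the function name `classification_metrics`, the `positive` parameter, and the example labels are all illustrative assumptions. Ratios with a zero denominator are reported as 0.0 here, which is one common convention rather than the only one.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Precision: true positives among all positive predictions.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: true positives among all actual positives.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative labels only: 1 = spam, 0 = not spam.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this example there are three true positives, one false positive, one false negative and three true negatives, so all four metrics come out to 0.75.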

  8. Kaggle Fish Detection Dataset

    • universe.roboflow.com
    zip
    Updated May 27, 2024
    Cite
    Innoweave (2024). Kaggle Fish Detection Dataset [Dataset]. https://universe.roboflow.com/innoweave/kaggle-fish-detection-o8ghb
    Available download formats: zip
    Dataset updated
    May 27, 2024
    Dataset authored and provided by
    Innoweave
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Fish Polygons
    Description

    Kaggle Fish Detection

    ## Overview
    
    Kaggle Fish Detection is a dataset for instance segmentation tasks - it contains Fish annotations for 2,208 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  9. Kaggle Conversion Dataset

    • universe.roboflow.com
    zip
    Updated Jul 13, 2025
    Cite
    aasd (2025). Kaggle Conversion Dataset [Dataset]. https://universe.roboflow.com/aasd-e8cun/kaggle-dataset-conversion
    Available download formats: zip
    Dataset updated
    Jul 13, 2025
    Dataset authored and provided by
    aasd
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Cars Swimminpool Bounding Boxes
    Description

    Kaggle Dataset Conversion

    ## Overview
    
    Kaggle Dataset Conversion is a dataset for object detection tasks - it contains Cars Swimminpool annotations for 3,744 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  10. Iranian Plate From Kaggle Dataset

    • universe.roboflow.com
    zip
    Updated Dec 9, 2023
    Cite
    BarzanSaeedpour (2023). Iranian Plate From Kaggle Dataset [Dataset]. https://universe.roboflow.com/barzansaeedpour/iranian-plate-from-kaggle
    Available download formats: zip
    Dataset updated
    Dec 9, 2023
    Dataset authored and provided by
    BarzanSaeedpour
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Iran
    Variables measured
    Plate REgl Bounding Boxes
    Description

    Iranian Plate From Kaggle

    ## Overview
    
    Iranian Plate From Kaggle is a dataset for object detection tasks - it contains Plate REgl annotations for 433 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  11. Kaggle Wheat Detection Dataset

    • universe.roboflow.com
    zip
    Updated Jan 28, 2023
    Cite
    Tyler Chinn (2023). Kaggle Wheat Detection Dataset [Dataset]. https://universe.roboflow.com/tyler-chinn-xnddb/kaggle-wheat-detection-dataset
    Available download formats: zip
    Dataset updated
    Jan 28, 2023
    Dataset authored and provided by
    Tyler Chinn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Wheat Bounding Boxes
    Description

    Kaggle Wheat Detection Dataset

    ## Overview
    
    Kaggle Wheat Detection Dataset is a dataset for object detection tasks - it contains Wheat annotations for 3,373 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  12. Practice Panda and Dictionary

    • kaggle.com
    zip
    Updated Sep 4, 2020
    Cite
    Brenda N (2020). Practice Panda and Dictionary [Dataset]. https://www.kaggle.com/brendan45774/dictionary-and-pandas-csv
    Available download formats: zip (316 bytes)
    Dataset updated
    Sep 4, 2020
    Authors
    Brenda N
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    If you need a practice dataset to improve your skills, then use this dataset.

    I created this dataset for my notebook, Getting started with Dictionary and Pandas, to help people improve their dictionary and pandas skills: https://www.kaggle.com/brendan45774/getting-started-with-dictionary-and-pandas

  13. Kaggle Dataset

    • universe.roboflow.com
    zip
    Updated May 8, 2023
    Cite
    ML (2023). Kaggle Dataset [Dataset]. https://universe.roboflow.com/ml-pzfty/kaggle-pe7zv/model/1
    Available download formats: zip
    Dataset updated
    May 8, 2023
    Dataset authored and provided by
    ML
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Vest Helmet Bounding Boxes
    Description

    KAGGLE

    ## Overview
    
    KAGGLE is a dataset for object detection tasks - it contains Vest Helmet annotations for 2,696 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  14. Mask Kaggle Dataset

    • universe.roboflow.com
    zip
    Updated Nov 13, 2021
    + more versions
    Cite
    Mask Detection (2021). Mask Kaggle Dataset [Dataset]. https://universe.roboflow.com/mask-detection-jdtze/mask-kaggle-du4ji
    Available download formats: zip
    Dataset updated
    Nov 13, 2021
    Dataset authored and provided by
    Mask Detection
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Faces Bounding Boxes
    Description

    Mask Kaggle

    ## Overview
    
    Mask Kaggle is a dataset for object detection tasks - it contains Faces annotations for 848 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  15. Tomato Kaggle Dataset

    • universe.roboflow.com
    zip
    Updated Jan 14, 2022
    + more versions
    Cite
    new-workspace-zioyq (2022). Tomato Kaggle Dataset [Dataset]. https://universe.roboflow.com/new-workspace-zioyq/tomato-kaggle
    Available download formats: zip
    Dataset updated
    Jan 14, 2022
    Dataset authored and provided by
    new-workspace-zioyq
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Tomatoes Bounding Boxes
    Description

    Tomato Kaggle

    ## Overview
    
    Tomato Kaggle is a dataset for object detection tasks - it contains Tomatoes annotations for 895 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  16. Getting Started Power BI

    • kaggle.com
    zip
    Updated Oct 21, 2022
    Cite
    Shwetank Chaudhary (2022). Getting Started Power BI [Dataset]. https://www.kaggle.com/datasets/shwetankchaudhary/getting-started-power-bi
    Available download formats: zip (3635408 bytes)
    Dataset updated
    Oct 21, 2022
    Authors
    Shwetank Chaudhary
    Description

    Dataset

    This dataset was created by Shwetank Chaudhary

    Contents

  17. Daun Jagung (kaggle) Dataset

    • universe.roboflow.com
    zip
    Updated Oct 8, 2025
    Cite
    Ilham syah (2025). Daun Jagung (kaggle) Dataset [Dataset]. https://universe.roboflow.com/ilham-syah/daun-jagung-kaggle-g2pha
    Available download formats: zip
    Dataset updated
    Oct 8, 2025
    Dataset authored and provided by
    Ilham syah
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Objects DoCx Bounding Boxes
    Description

    Daun Jagung (kaggle)

    ## Overview
    
    Daun Jagung (kaggle) is a dataset for object detection tasks - it contains Objects DoCx annotations for 2,690 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  18. Potholes Kaggle Dataset

    • universe.roboflow.com
    zip
    Updated Apr 4, 2022
    + more versions
    Cite
    Kevin Schellinger (2022). Potholes Kaggle Dataset [Dataset]. https://universe.roboflow.com/kevin-schellinger-48g9l/potholes-kaggle
    Available download formats: zip
    Dataset updated
    Apr 4, 2022
    Dataset authored and provided by
    Kevin Schellinger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Pothole Bounding Boxes
    Description

    Potholes Kaggle

    ## Overview
    
    Potholes Kaggle is a dataset for object detection tasks - it contains Pothole annotations for 647 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  19. Labeled Licence Plates Kaggle Dataset

    • universe.roboflow.com
    zip
    Updated Jul 4, 2022
    + more versions
    Cite
    folio3 (2022). Labeled Licence Plates Kaggle Dataset [Dataset]. https://universe.roboflow.com/folio3-wjauw/labeled-licence-plates-dataset---kaggle/dataset/1
    Available download formats: zip
    Dataset updated
    Jul 4, 2022
    Dataset authored and provided by
    folio3
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Variables measured
    Number Plate Bounding Boxes
    Description

    Labeled Licence Plates Dataset Kaggle

    ## Overview
    
    Labeled Licence Plates Dataset  Kaggle is a dataset for object detection tasks - it contains Number Plate annotations for 702 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [Public Domain license](https://creativecommons.org/publicdomain/zero/1.0/).
    
  20. tpu-getting-started

    • kaggle.com
    zip
    Updated Dec 28, 2023
    Cite
    Maxim Kosarev (2023). tpu-getting-started [Dataset]. https://www.kaggle.com/datasets/maximkosarev/tpu-getting-started/code
    Available download formats: zip (5141132445 bytes)
    Dataset updated
    Dec 28, 2023
    Authors
    Maxim Kosarev
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Maxim Kosarev

    Released under Apache 2.0

    Contents
