100+ datasets found
  1. SSH login attempts on a Raspberry Pi

    • kaggle.com
    Updated Mar 21, 2022
    Cite
    Thomas (2022). SSH login attempts on a Raspberry Pi [Dataset]. https://www.kaggle.com/datasets/booroom/ssh-login-attempts-on-my-raspberry-pi
    Dataset updated
    Mar 21, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Thomas
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    I was scared when I saw the large number of connection attempts to my little raspberry pi. I had the idea to share some of them with you. Small mistake on my part, I can't get the exact date (month, year).

    Content

    The CSV file contains:

    • The month
    • The hour of the login attempt
    • The username used
    • The IP address used
    • The port used

    Inspiration

    What can you tell me about all these connections? Where do they come from? What are the most used usernames? Are there days when it is better to cut off the internet? At what time are the bots most active? Which port do I have to use?
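A minimal sketch of how two of these questions (most-used usernames, most active hour) might be approached. The column names `hour` and `username` and the sample rows are assumptions made for illustration; the listing does not give the real CSV headers.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; column names are assumed, not taken from the real file.
SAMPLE_CSV = """month,hour,username,ip,port
Mar,03,root,203.0.113.5,22
Mar,03,admin,203.0.113.9,22
Mar,04,root,198.51.100.7,22
Mar,14,pi,198.51.100.7,22
Mar,03,root,203.0.113.5,22
"""

def summarize_attempts(csv_text):
    """Count login attempts per username and per hour of day."""
    usernames = Counter()
    hours = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        usernames[row["username"]] += 1
        hours[int(row["hour"])] += 1
    return usernames, hours

usernames, hours = summarize_attempts(SAMPLE_CSV)
print(usernames.most_common(3))  # most frequently tried usernames
print(hours.most_common(1))      # busiest hour in the sample
```

With the real file, `io.StringIO(csv_text)` would be swapped for an opened file handle, and the keys adjusted to the actual headers.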

  2. Data from: Spam email Dataset

    • kaggle.com
    Updated Sep 1, 2023
    Cite
    _w1998 (2023). Spam email Dataset [Dataset]. https://www.kaggle.com/datasets/jackksoncsie/spam-email-dataset
    Dataset updated
    Sep 1, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    _w1998
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    Dataset Name: Spam Email Dataset

    Description: This dataset contains a collection of email text messages, labeled as either spam or not spam. Each email message is associated with a binary label, where "1" indicates that the email is spam, and "0" indicates that it is not spam. The dataset is intended for use in training and evaluating spam email classification models.

    Columns:

    text (Text): This column contains the text content of the email messages. It includes the body of the emails along with any associated subject lines or headers.

    spam_or_not (Binary): This column contains binary labels to indicate whether an email is spam or not. "1" represents spam, while "0" represents not spam.

    Usage: This dataset can be used for various Natural Language Processing (NLP) tasks, such as text classification and spam detection. Researchers and data scientists can train and evaluate machine learning models using this dataset to build effective spam email filters.
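As a rough illustration of the classification use case, here is a small multinomial Naive Bayes sketch in plain Python. The four sample messages are invented, and this baseline is only one possible approach, not something the dataset author prescribes. Labels follow the description above (1 = spam, 0 = not spam).

```python
import math
from collections import Counter

def train_nb(rows):
    """rows: iterable of (text, label) pairs, label 1 = spam, 0 = not spam."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in rows:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab

def predict_nb(model, text):
    """Return the label with the higher log-probability (Laplace smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (0, 1):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)

# Invented training examples, for illustration only.
sample_rows = [
    ("win free money now", 1),
    ("free prize claim now", 1),
    ("meeting agenda for monday", 0),
    ("please review the attached report", 0),
]
model = train_nb(sample_rows)
print(predict_nb(model, "claim your free money"))
print(predict_nb(model, "monday meeting report"))
```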

  3. Daily website users

    • kaggle.com
    Updated Feb 19, 2022
    Cite
    Bertie (2022). Daily website users [Dataset]. https://www.kaggle.com/bertiemackie/daily-website-users
    Dataset updated
    Feb 19, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Bertie
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This data set contains the number of unique customers who logged in to their accounts on a website. The value column shows this count.

    Potential use cases:

    • Time series modelling
    • In-month targeting

  4. Iris Species

    • kaggle.com
    zip
    Updated Sep 27, 2016
    Cite
    UCI Machine Learning (2016). Iris Species [Dataset]. https://www.kaggle.com/datasets/uciml/iris
    Available download formats: zip (3687 bytes)
    Dataset updated
    Sep 27, 2016
    Dataset authored and provided by
    UCI Machine Learning
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository.

    It includes three iris species with 50 samples each as well as some properties about each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other.

    The columns in this dataset are:

    • Id
    • SepalLengthCm
    • SepalWidthCm
    • PetalLengthCm
    • PetalWidthCm
    • Species
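The linear-separability remark can be illustrated with a single threshold on `PetalLengthCm`. This is only a sketch: the 2.5 cm cut-off is an assumed value chosen for illustration, and the handful of rows below are illustrative samples rather than the full 150-row file.

```python
# (PetalLengthCm, Species) pairs; a small illustrative sample, not the full data.
SAMPLE = [
    (1.4, "Iris-setosa"),
    (1.3, "Iris-setosa"),
    (4.7, "Iris-versicolor"),
    (4.5, "Iris-versicolor"),
    (6.0, "Iris-virginica"),
    (5.1, "Iris-virginica"),
]

PETAL_LENGTH_THRESHOLD_CM = 2.5  # assumed cut-off, not stated in the description

def is_setosa(petal_length_cm):
    """Classify a flower as setosa using one linear threshold."""
    return petal_length_cm < PETAL_LENGTH_THRESHOLD_CM

# Count rows where the threshold rule disagrees with the recorded species.
errors = sum(
    is_setosa(length) != (species == "Iris-setosa") for length, species in SAMPLE
)
print(errors)
```

No comparable single threshold cleanly splits the other two species, which is the point of the description's second claim.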

    Sepal Width vs. Sepal Length

  5. Pii-Mistral-2k-fit-competition

    • kaggle.com
    Updated Feb 24, 2024
    Cite
    Silvestre Bahi (2024). Pii-Mistral-2k-fit-competition [Dataset]. https://www.kaggle.com/datasets/mandrilator/pii-mistral-2k-fit-competition
    Dataset updated
    Feb 24, 2024
    Dataset provided by
    Kaggle
    Authors
    Silvestre Bahi
    Description

    Probabilities of getting a certain label (On v2, v1 are reverted):

    • name = 0.8
    • email = 0.5
    • phone_num = 0.3
    • address = 0.3
    • url = 0.5
    • username = 0.5

    The subject of the essay varies with the same probability among the following. I specified that it was a design thinking tool:

    • "Visualization tool"
    • "Storytelling tool"
    • "Mind Mapping tool"
    • "Learning launch tool"

    Model:

    • mistralai/Mistral-7B-Instruct-v0.2

    Warnings:

    • I-USERNAME and I-EMAIL appear to be in the dataset
    • Some addresses and other entities can be split by some punctuation

  6. Cost of Living | +144k Tweets - ENG | Aug/Sep 2022

    • kaggle.com
    Updated Sep 9, 2022
    Cite
    Tleonel (2022). Cost of Living | +144k Tweets - ENG | Aug/Sep 2022 [Dataset]. http://doi.org/10.34740/kaggle/ds/2438280
    Dataset updated
    Sep 9, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Tleonel
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    💸💸💸 Cost of Living - 144k Tweets in English | Aug - Sept 2022 💸💸💸

    UPDATED Sept 9th

    The cost of living is a scorching topic. This dataset is composed of tweets sent from August 20 to Sept 9 2022, with over 144k tweets. All tweets are in English and are from different countries. Below is a breakdown of columns and the data in them.


    Columns Description

    • date_time - Date and time the tweet was sent
    • username - Username that sent the tweet
    • user_location - Location entered in the account location info on Twitter
    • user_description - Text added to "about" in account
    • verified - If the user has the "verified by Twitter" blue tick
    • followers_count - Number of followers
    • following_count - Number of accounts followed by the person who sent the tweet
    • tweet_like_count - How many people liked the tweet
    • tweet_retweet_count - How many people retweeted the tweet
    • tweet_reply_count - How many people replied to that tweet
    • source - Where the tweet was sent from. The link has info if using iPhone, Android and others
    • tweet_text - Text sent in the tweet

  7. Unicorn Startups (Cleaned)

    • kaggle.com
    Updated Dec 12, 2021
    Cite
    Niek van der Zwaag (2021). Unicorn Startups (Cleaned) [Dataset]. https://www.kaggle.com/datasets/niekvanderzwaag/unicorn-startups-cleaned
    Dataset updated
    Dec 12, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Niek van der Zwaag
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    In business, a unicorn is a privately held startup company valued at over $1 billion. The term was first popularised in 2013 by venture capitalist Aileen Lee, choosing the mythical animal to represent the statistical rarity of such successful ventures.

    This dataset is a tidied up version of https://www.kaggle.com/ramjasmaurya/unicorn-startups/ shared by @ramjasmaurya

  8. 🎸🎹🎙️Speakers Sales Conversion Dataset🎸🎹🎙️

    • kaggle.com
    Updated Mar 30, 2025
    Cite
    Sandeep SD (2025). 🎸🎹🎙️Speakers Sales Conversion Dataset🎸🎹🎙️ [Dataset]. https://www.kaggle.com/datasets/sandeep1080/bassburst
    Dataset updated
    Mar 30, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sandeep SD
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🌟 Enjoying the Dataset? 🌟

    If this dataset helped you uncover new insights or made your day a little brighter, thanks a ton for checking it out! Let’s keep those insights rolling! 🔥📈


    Dataset Description:

    This dataset contains website conversion data for Bluetooth speaker sales. The dataset tracks user sessions on different landing page variants, with the primary goal of analyzing conversion rates, user behavior, and other factors influencing sales. It includes detailed user engagement metrics such as time spent, pages visited, device type, sign-in methods, and geographical information.

    Use Case:

    This dataset can be used for various analytical tasks including:

    • A/B testing and multivariate analysis to compare landing page designs.
    • User segmentation by demographics (age, gender, location, etc.).
    • Conversion rate optimization (CRO) analysis.
    • Predictive modeling for conversion likelihood based on session characteristics.
    • Revenue and payment analysis.
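For the A/B testing use case, a comparison of two landing page variants might look like the sketch below. The session and conversion counts are invented, and the pooled two-proportion z statistic is a standard textbook choice rather than anything the dataset itself specifies.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference in conversion rate between variants A and B,
    using the pooled standard error."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical counts: conversions and sessions per landing page variant.
z = two_proportion_z(conv_a=120, n_a=2000, conv_b=150, n_b=2000)
print(round(z, 2))
```

A larger absolute z value suggests the difference between variants is less likely to be noise; the usual next step is to convert it to a p-value.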

  9. Stroke Risk Prediction Dataset based on Literature

    • kaggle.com
    Updated Mar 1, 2025
    Cite
    Mahatir Ahmed Tusher (2025). Stroke Risk Prediction Dataset based on Literature [Dataset]. http://doi.org/10.34740/kaggle/dsv/10892812
    Dataset updated
    Mar 1, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mahatir Ahmed Tusher
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Stroke Risk Prediction Dataset (Version 2)

    Medically Validated, Age-Accurate, and Balanced
    Samples: 35,000 | Features: 16 | Targets: 2 (Binary + Regression)

    📌 Overview

    This dataset is designed for predicting stroke risk using symptoms, demographics, and medical literature-inspired risk modeling. Version 2 significantly improves upon Version 1 by incorporating age-dependent symptom probabilities, gender-specific risk modifiers, and medically validated feature engineering.

    Key Enhancements in Version 2:

    1. Age-Accurate Risk Modeling:

      • Stroke risk now follows a sigmoidal curve (sharp increase after age 50), reflecting real-world epidemiological trends.
      • Symptom probabilities (e.g., hypertension, chest pain) scale with age (see Medical Validity).
    2. Gender-Specific Risk:

      • Males under 60 have 1.5× higher risk, while females over 60 have 1.8× higher risk (post-menopausal hormonal changes).
    3. Balanced and Expanded Data:

      • 35,000 samples (vs. 10,000 in Version 1) to improve model generalizability and capture rare symptom combinations.
      • 50% at-risk (stroke risk ≥50%) and 50% not-at-risk (stroke risk <50%).
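A sigmoidal age curve of the kind described above can be sketched with a logistic function. Placing the midpoint at age 50 is one reading of "sharp increase after age 50", and the steepness value is an assumption; neither is the author's actual parameterization.

```python
import math

def age_risk_curve(age, midpoint=50.0, steepness=0.15):
    """Logistic curve in (0, 1). midpoint and steepness are assumed values."""
    return 1.0 / (1.0 + math.exp(-steepness * (age - midpoint)))

for age in (30, 50, 70):
    print(age, round(age_risk_curve(age), 3))
```

The curve stays low for young ages, passes 0.5 at the midpoint, and saturates toward 1 for older ages.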

    📊 Dataset Statistics

    Column | Type | Description
    age | Integer | Age (18–90)
    gender | String | Male/Female
    chest_pain | Binary | 1 = Present, 0 = Absent
    shortness_of_breath | Binary | 1 = Present, 0 = Absent
    irregular_heartbeat | Binary | 1 = Present, 0 = Absent
    fatigue_weakness | Binary | 1 = Present, 0 = Absent
    dizziness | Binary | 1 = Present, 0 = Absent
    swelling_edema | Binary | 1 = Present, 0 = Absent
    neck_jaw_pain | Binary | 1 = Present, 0 = Absent
    excessive_sweating | Binary | 1 = Present, 0 = Absent
    persistent_cough | Binary | 1 = Present, 0 = Absent
    nausea_vomiting | Binary | 1 = Present, 0 = Absent
    high_blood_pressure | Binary | 1 = Present, 0 = Absent
    chest_discomfort | Binary | 1 = Present, 0 = Absent
    cold_hands_feet | Binary | 1 = Present, 0 = Absent
    snoring_sleep_apnea | Binary | 1 = Present, 0 = Absent
    anxiety_doom | Binary | 1 = Present, 0 = Absent
    at_risk | Binary | Target for classification (1 = At Risk, 0 = Not At Risk)
    stroke_risk_percentage | Float | Target for regression (0–100%)

    Age distribution in Version 2 vs. Version 1

    🔬 Medical Validity

    This dataset is grounded in peer-reviewed medical literature, with symptom probabilities, risk weights, and demographic relationships directly derived from clinical guidelines and epidemiological studies. Below is a detailed breakdown of how medical knowledge was translated into dataset parameters:

    1. Age-Dependent Symptom Probabilities

    The prevalence of symptoms increases with age, reflecting real-world clinical observations. Probabilities are calibrated using population-level data from medical literature:

    Hypertension (High Blood Pressure)

    • Probability by Age: 10% (18–30), 25% (31–50), 45% (51–70), 60% (71–90).
    • Source: WHO Global Report on Stroke (2023) identifies hypertension as the leading modifiable stroke risk factor, with prevalence rising from ~12% in adults <30 to ~65% in adults >70.
    • Clinical Basis: Arterial stiffness and cumulative vascular damage over time explain the age-dependent increase (Chapter 4, Harrison’s Principles of Internal Medicine).

    Chest Pain

    • Probability by Age: 5% (18–30), 15% (31–50), 25% (51–70), 35% (71–90).
    • Source: The Stroke Book (Cambridge Medicine) notes that chest pain is rare in young adults but becomes prevalent in older populations due to atherosclerosis and coronary artery disease.
    • Clinical Basis: Atherosclerotic plaque buildup accelerates after age ...
  10. List of all the skills

    • kaggle.com
    Updated Aug 28, 2020
    Cite
    ask9 (2020). List of all the skills [Dataset]. https://www.kaggle.com/datasets/arbazkhan971/allskillandnonskill
    Dataset updated
    Aug 28, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    ask9
    Description

    Context

    This dataset contains all the skills from LinkedIn, GitHub, and Stack Overflow, as well as all the skills from job descriptions across different platforms such as Naukri, Indeed, and Monster.com.

    This is the world's largest collection of skills data, covering all the skills.


  11. ICR-integer-data

    • kaggle.com
    Updated May 27, 2023
    Cite
    raddar (2023). ICR-integer-data [Dataset]. https://www.kaggle.com/datasets/raddar/icr-integer-data
    Dataset updated
    May 27, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    raddar
    Description

    The dataset contains the https://www.kaggle.com/competitions/icr-identify-age-related-conditions competition dataset transformed into integerized data. A common denominator was found for each column. The distribution of even and odd numbers was examined to identify whether some values should be fractions.

    Columns 'FL' and 'GL' were untouched, probably float by nature.

    Please refer to notebook for exact transformations: https://www.kaggle.com/code/raddar/convert-icr-data-to-integers
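The general idea of finding a common denominator for a column can be sketched as follows. This is an assumed reconstruction for illustration only; the linked notebook holds the exact transformations, and the function name and `max_denominator` parameter here are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def integerize(values, max_denominator=1000):
    """Return (common_denominator, integer_values) for a column of floats.

    Each value is approximated by a fraction, the least common multiple of the
    denominators is taken, and every value is scaled by it.
    """
    fracs = [Fraction(v).limit_denominator(max_denominator) for v in values]
    common = reduce(lambda a, b: a * b // gcd(a, b), (f.denominator for f in fracs), 1)
    return common, [int(f * common) for f in fracs]

# Made-up column values.
common, ints = integerize([0.25, 0.5, 1.75, 3.0])
print(common, ints)
```

Real measurement columns usually carry rounding noise, so the choice of `max_denominator` would need care in practice.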

  12. Social Media Dataset

    • kaggle.com
    Updated Apr 17, 2025
    Cite
    Nixie6254 (2025). Social Media Dataset [Dataset]. https://www.kaggle.com/datasets/nixie6254/social-media-dataset
    Dataset updated
    Apr 17, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nixie6254
    Description

    This dataset consists of 734 entries representing social media activity and performance from a local MSME (Micro, Small, and Medium Enterprise) across TikTok, Instagram, and Twitter platforms. It captures key metrics related to audience interaction and content strategy effectiveness, and is valuable for evaluating and optimizing digital marketing efforts for small businesses.

    • Area: Target location or customer region where the MSME's content is directed.
    • Category: The business content category (e.g., product promotion, education, seasonal campaign).
    • Day: The day of the week the content was published.
    • Month: The month the post went live.
    • Platform: The social media platform used by the MSME (TikTok, Instagram, or Twitter).
    • Post Type: The format of the content posted: image, video, carousel, or text.
    • Timestamp: The exact date and time when the content was posted.
    • User: The username or business account that posted the content.
    • Week: Week number within the year for time-based analysis.
    • Year: The year the content was posted.
    • Comments: Total number of comments received on the post.
    • Engagement Rate: A calculated metric showing how engaging the content is (based on likes, comments, shares vs. reach/impressions).
    • Hour: Hour of the day the post was published.
    • Impressions: Number of times the content appeared on users' feeds.
    • Likes: Number of likes the post received.
    • Reach: Number of unique users who saw the content.
    • Shares: Number of times users shared the content.
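The description does not give the exact Engagement Rate formula. One common convention, assumed here purely for illustration, divides total interactions by reach:

```python
def engagement_rate(likes, comments, shares, reach):
    """Interactions per unique viewer. The formula is an assumed convention,
    not necessarily the one used to compute the dataset's column."""
    if reach <= 0:
        return 0.0  # avoid division by zero for posts with no recorded reach
    return (likes + comments + shares) / reach

# Made-up post metrics.
print(engagement_rate(likes=120, comments=15, shares=5, reach=2000))
```

Dividing by Impressions instead of Reach is an equally plausible variant, so results should be checked against the provided column before relying on either.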

  13. All ISIC Data 20240629

    • kaggle.com
    Updated Jul 17, 2024
    Cite
    tomoo inubushi (2024). All ISIC Data 20240629 [Dataset]. https://www.kaggle.com/datasets/tomooinubushi/all-isic-data-20240629
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    tomoo inubushi
    Description

    All images and metadata in ISIC archive.

    !pip install isic-cli
    !isic image download images/
    
    • image.hdf: Images in hdf5 format with no postprocessing
    • image_256sq.hdf: Images in hdf5 format with square cropping and resizing to 256x256

    See also:

    • https://www.kaggle.com/competitions/isic-2024-challenge/discussion/515356
    • https://www.kaggle.com/competitions/siim-isic-melanoma-classification/discussion/171801
    • https://www.kaggle.com/competitions/siim-isic-melanoma-classification/discussion/161943

  14. Google Maps Restaurant Reviews

    • kaggle.com
    Updated Aug 19, 2023
    Cite
    Deniz Bilgin (2023). Google Maps Restaurant Reviews [Dataset]. https://www.kaggle.com/datasets/denizbilginn/google-maps-restaurant-reviews
    Dataset updated
    Aug 19, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Deniz Bilgin
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    Data includes reviews of different restaurants on Google Maps. There are 1100 comments in total and pictures of each comment in the data set. The data is labeled according to 4 classes (Taste, Menu, Indoor atmosphere, Outdoor atmosphere) for the artificial intelligence to predict. The dataset has been prepared in a way that can be used in both text processing and image processing fields.

    The dataset contains the following columns: business_name, author_name, text, photo, rating, rating_category

    IMPORTANT: The rating_category column is related to the photo of the review. If you want to use this dataset for NLP, you need to label it yourself. I will label it for you when I am available.

  15. Heart Disease Prediction Dataset

    • kaggle.com
    Updated Jul 28, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Krish Ujeniya (2024). Heart Disease Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/krishujeniya/heart-diseae
    Dataset updated
    Jul 28, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Krish Ujeniya
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Overview

    This dataset contains medical data used for predicting heart disease. The data includes various attributes such as age, sex, chest pain type (cp), resting blood pressure (trestbps), cholesterol (chol), fasting blood sugar (fbs), resting electrocardiographic results (restecg), maximum heart rate achieved (thalach), exercise-induced angina (exang), and ST depression induced by exercise relative to rest (oldpeak).

    Columns

    • age: Age of the patient (in years)
    • sex: Sex of the patient (1 = male, 0 = female)
    • cp: Chest pain type (1-4)
    • trestbps: Resting blood pressure (in mm Hg on admission to the hospital)
    • chol: Serum cholesterol in mg/dl
    • fbs: Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
    • restecg: Resting electrocardiographic results (0-2)
    • thalach: Maximum heart rate achieved
    • exang: Exercise-induced angina (1 = yes; 0 = no)
    • oldpeak: ST depression induced by exercise relative to rest

  16. Multimodal Single-Cell Integration Related Data 01

    • kaggle.com
    Updated Sep 9, 2022
    Cite
    Alexander Chervov (2022). Multimodal Single-Cell Integration Related Data 01 [Dataset]. https://www.kaggle.com/datasets/alexandervc/multimodal-singlecell-integration-related-data-01
    Dataset updated
    Sep 9, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Alexander Chervov
    Description

    The preprocessed and "denoised" data for competition: https://www.kaggle.com/competitions/open-problems-multimodal

    For further information see discussion: https://www.kaggle.com/competitions/open-problems-multimodal/discussion/350856

  17. Incident_event_log_dataset

    • kaggle.com
    Updated Mar 24, 2022
    Cite
    winmedals (2022). Incident_event_log_dataset [Dataset]. https://www.kaggle.com/datasets/winmedals/incident-event-log-dataset
    Dataset updated
    Mar 24, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    winmedals
    Description

    Source: https://archive.ics.uci.edu/ml/datasets/Incident+management+process+enriched+event+log

    Reposting as kaggle dataset for convenience and fast usage

  18. Multilabel classification music emotions dataset

    • kaggle.com
    zip
    Updated Oct 25, 2018
    Cite
    srinivas365 (2018). Multilabel classification music emotions dataset [Dataset]. https://www.kaggle.com/datasets/srinivas365/multilabel-classification-emotions
    Available download formats: zip (322050 bytes)
    Dataset updated
    Oct 25, 2018
    Authors
    srinivas365
    Description

    Dataset

    This dataset was created by srinivas365

    Contents

  19. Customer Shopping Trends Dataset

    • kaggle.com
    Updated Oct 5, 2023
    Cite
    Sourav Banerjee (2023). Customer Shopping Trends Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/customer-shopping-trends-dataset
    Dataset updated
    Oct 5, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sourav Banerjee
    Description

    Context

    The Customer Shopping Preferences Dataset offers valuable insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses to tailor their products, marketing strategies, and overall customer experience. This dataset captures a wide range of customer attributes including age, gender, purchase history, preferred payment methods, frequency of purchases, and more. Analyzing this data can help businesses make informed decisions, optimize product offerings, and enhance customer satisfaction. The dataset stands as a valuable resource for businesses aiming to align their strategies with customer needs and preferences. It's important to note that this dataset is a Synthetic Dataset Created for Beginners to learn more about Data Analysis and Machine Learning.

    Content

    This dataset encompasses various features related to customer shopping preferences, gathering essential information for businesses seeking to enhance their understanding of their customer base. The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Additionally, data on the type of items purchased, shopping frequency, preferred shopping seasons, and interactions with promotional offers is included. With a collection of 3900 records, this dataset serves as a foundation for businesses looking to apply data-driven insights for better decision-making and customer-centric strategies.

    Dataset Glossary (Column-wise)

    • Customer ID - Unique identifier for each customer
    • Age - Age of the customer
    • Gender - Gender of the customer (Male/Female)
    • Item Purchased - The item purchased by the customer
    • Category - Category of the item purchased
    • Purchase Amount (USD) - The amount of the purchase in USD
    • Location - Location where the purchase was made
    • Size - Size of the purchased item
    • Color - Color of the purchased item
    • Season - Season during which the purchase was made
    • Review Rating - Rating given by the customer for the purchased item
    • Subscription Status - Indicates if the customer has a subscription (Yes/No)
    • Shipping Type - Type of shipping chosen by the customer
    • Discount Applied - Indicates if a discount was applied to the purchase (Yes/No)
    • Promo Code Used - Indicates if a promo code was used for the purchase (Yes/No)
    • Previous Purchases - The total count of transactions concluded by the customer at the store, excluding the ongoing transaction
    • Payment Method - Customer's most preferred payment method
    • Frequency of Purchases - Frequency at which the customer makes purchases (e.g., Weekly, Fortnightly, Monthly)

    Structure of the Dataset

    https://i.imgur.com/6UEqejq.png

    Acknowledgement

    This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.

    Cover Photo by: Freepik

    Thumbnail by: Clothing icons created by Flat Icons - Flaticon

  20. E-commerce Business Transaction

    • kaggle.com
    Updated May 14, 2022
    Cite
    Gabriel Ramos (2022). E-commerce Business Transaction [Dataset]. https://www.kaggle.com/datasets/gabrielramos87/an-online-shop-business
    Dataset updated
    May 14, 2022
    Dataset provided by
    Kaggle
    Authors
    Gabriel Ramos
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    E-commerce has become a new channel to support business development. Through e-commerce, businesses can get access and establish a wider market presence by providing cheaper and more efficient distribution channels for their products or services. E-commerce has also changed the way people shop and consume products and services. Many people are turning to their computers or smart devices to order goods, which can easily be delivered to their homes.

    Content

    This is a sales transaction data set of UK-based e-commerce (online retail) for one year. This London-based shop has been selling gifts and homewares for adults and children through the website since 2007. Their customers come from all over the world and usually make direct purchases for themselves. There are also small businesses that buy in bulk and sell to other customers through retail outlet channels.

    The data set contains 500K rows and 8 columns. The following is the description of each column.

    1. TransactionNo (categorical): a six-digit unique number that defines each transaction. The letter “C” in the code indicates a cancellation.
    2. Date (numeric): the date when each transaction was generated.
    3. ProductNo (categorical): a five or six-digit unique character used to identify a specific product.
    4. Product (categorical): product/item name.
    5. Price (numeric): the price of each product per unit in pound sterling (£).
    6. Quantity (numeric): the quantity of each product per transaction. Negative values relate to cancelled transactions.
    7. CustomerNo (categorical): a five-digit unique number that defines each customer.
    8. Country (categorical): name of the country where the customer resides.

    There is a small percentage of order cancellation in the data set. Most of these cancellations were due to out-of-stock conditions on some products. Under this situation, customers tend to cancel an order as they want all products delivered all at once.
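Based on the column descriptions (a "C" in TransactionNo, negative Quantity), cancelled rows could be flagged roughly as below. The sample rows, product names, and date format are invented for illustration, and treating either signal as sufficient is an assumption.

```python
import csv
import io

# Invented rows that follow the documented column names.
SAMPLE_CSV = """TransactionNo,Date,ProductNo,Product,Price,Quantity,CustomerNo,Country
581482,12/9/2019,22485,Sample Product A,21.47,12,17490,United Kingdom
C581484,12/9/2019,23843,Sample Product B,6.19,-3,16446,United Kingdom
581475,12/9/2019,22596,Sample Product C,10.65,36,13069,United Kingdom
"""

def is_cancellation(row):
    """Flag a row as cancelled if the transaction code contains 'C'
    or the quantity is negative (either signal is treated as sufficient)."""
    return "C" in row["TransactionNo"] or int(row["Quantity"]) < 0

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cancelled = [row for row in rows if is_cancellation(row)]
print(len(cancelled), "of", len(rows), "rows are cancellations")
```

Filtering these rows out before computing sales trends avoids counting cancelled orders as negative revenue.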

    Inspiration

    Information is a main asset of businesses nowadays. The success of a business in a competitive environment depends on its ability to acquire, store, and utilize information. Data is one of the main sources of information. Therefore, data analysis is an important activity for acquiring new and useful information. Analyze this dataset and try to answer the following questions.

    1. How did sales trend over the months?
    2. What are the most frequently purchased products?
    3. How many products does the customer purchase in each transaction?
    4. Which customer segments are the most profitable?
    5. Based on your findings, what strategy could you recommend to the business to gain more profit?

    Photo by CardMapr on Unsplash

SSH login attempts on a Raspberry Pi

All the failed authentications on my little strawberry
