2 datasets found
  1. Data from: Spam email Dataset

    • kaggle.com
    Updated Sep 1, 2023
    Cite
    _w1998 (2023). Spam email Dataset [Dataset]. https://www.kaggle.com/datasets/jackksoncsie/spam-email-dataset
    Dataset updated
    Sep 1, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    _w1998
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    Dataset Name: Spam Email Dataset

    Description: This dataset contains a collection of email text messages, labeled as either spam or not spam. Each email message is associated with a binary label, where "1" indicates that the email is spam, and "0" indicates that it is not spam. The dataset is intended for use in training and evaluating spam email classification models.

    Columns:

    text (Text): This column contains the text content of the email messages. It includes the body of the emails along with any associated subject lines or headers.

    spam_or_not (Binary): This column contains binary labels to indicate whether an email is spam or not. "1" represents spam, while "0" represents not spam.

    Usage: This dataset can be used for various Natural Language Processing (NLP) tasks, such as text classification and spam detection. Researchers and data scientists can train and evaluate machine learning models using this dataset to build effective spam email filters.
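As a rough illustration of the classification use described above, the following sketch reads rows with the two described columns and fits a very small naive Bayes baseline using only the Python standard library. The column names `text` and `spam_or_not` are taken from the description; the header in the actual file may differ and should be checked. The sample rows, the function names, and the whitespace tokenization are assumptions made for illustration, not part of the dataset.

```python
import csv
import io
import math
from collections import Counter

def load_rows(handle):
    """Read (text, label) pairs from a CSV with `text` and `spam_or_not` columns."""
    reader = csv.DictReader(handle)
    return [(row["text"], int(row["spam_or_not"])) for row in reader]

def train_naive_bayes(rows):
    """Count words per class and documents per class."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in rows:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def predict(model, text):
    """Return 1 (spam) or 0 (not spam) using add-one smoothed log-probabilities."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        total_words = sum(word_counts[label].values())
        score = math.log((class_counts[label] + 1) / (total_docs + 2))
        for word in text.lower().split():
            # The extra +1 in the denominator reserves mass for unseen words.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab) + 1)
            )
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0

# Tiny made-up sample standing in for the real file.
sample = io.StringIO(
    "text,spam_or_not\n"
    "Subject: win a free prize now,1\n"
    "Subject: free money claim now,1\n"
    "Subject: meeting agenda for monday,0\n"
    "Subject: lunch on monday?,0\n"
)
model = train_naive_bayes(load_rows(sample))
```

With the real data, `load_rows` would be given an open file handle instead of the in-memory sample, and a held-out split would be needed to evaluate the model.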

  2. Cheltenham's Facebook Groups

    • kaggle.com
    zip
    Updated Apr 2, 2018
    Cite
    Mike Chirico (2018). Cheltenham's Facebook Groups [Dataset]. https://www.kaggle.com/datasets/mchirico/cheltenham-s-facebook-group
    Dataset updated
    Apr 2, 2018
    Authors
    Mike Chirico
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Facebook is becoming an essential tool for more than just family and friends. Discover how Cheltenham Township (USA), a diverse community just outside of Philadelphia, deals with major issues such as the Bill Cosby trial, everyday traffic issues, sewer I/I problems and lost cats and dogs. And yes, theft.

    Communities work when they're connected and exchanging information. What and who are the essential forces making a positive impact, and when and how do conversational threads get directed or misdirected?

    Use Any Facebook Public Group

    You can leverage the examples here for any public Facebook group. For an example of the source code used to collect this data, and a quick start docker image, take a look at the following project: facebook-group-scrape.

    Data Sources

    There are 4 CSV files in the dataset, with data from 5 public Facebook groups.

    post.csv

    These are the main posts you will see on the page. It might help to take a quick look at the page. Commas in the msg field have been replaced with {COMMA}, and apostrophes have been replaced with {APOST}.

    • gid Group id (5 different Facebook groups)
    • pid Main Post id
    • id Id of the user posting
    • name User's name
    • timeStamp
    • shares
    • url
    • msg Text of the message posted.
    • likes Number of likes
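The placeholder substitution described for the msg field can be reversed when loading the file. The sketch below assumes that post.csv has a header row using the column names listed above and that the placeholders appear literally as `{COMMA}` and `{APOST}`. The helper names, the sample row, and its timestamp format are made up for illustration.

```python
import csv
import io

def decode_msg(msg):
    """Restore the characters that were replaced by placeholders in `msg`."""
    return msg.replace("{COMMA}", ",").replace("{APOST}", "'")

def read_posts(handle):
    """Yield post rows as dicts with the `msg` field decoded."""
    for row in csv.DictReader(handle):
        row["msg"] = decode_msg(row["msg"])
        yield row

# Made-up sample row; the real file's header and timestamp format may differ.
sample = io.StringIO(
    "gid,pid,id,name,timeStamp,shares,url,msg,likes\n"
    "1,p1,u1,Pat Doe,2018-01-05 10:00:00,0,,Lost cat{COMMA} it{APOST}s gray,3\n"
)
posts = list(read_posts(sample))
```

All values come back as strings from `csv.DictReader`, so numeric fields such as likes and shares would still need converting before any arithmetic.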

    comment.csv

    These are comments on the main post. Note that Facebook posts have comments, and comments on comments.

    • gid Group id
    • pid Matches Main Post identifier in post.csv
    • cid Comment Id.
    • timeStamp
    • id Id of user commenting
    • name Name of user commenting
    • rid Id of user responding to first comment
    • msg Message

    like.csv

    These are likes and responses. The two keys in this file (pid,cid) will join to post and comment respectively.

    • gid Group id
    • pid Matches Main Post identifier in post.csv
    • cid Matches Comments id.
    • response Response such as LIKE, ANGRY etc.
    • id The id of user responding
    • name Name of the user responding

    member.csv

    These are all the members in the group. Some members never, or rarely, post or comment. You may find multiple entries in this table for the same person. The name of the individual never changes, but they change their profile picture. Each profile picture change is captured in this table. Facebook gives users a new id in this table when they change their profile picture.

    • gid Group id
    • id Id of the member
    • name Name of the member
    • url URL of the member
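Because the same person can appear under several ids, one possible cleanup is to group member rows by group and name. This is only a sketch: it assumes a header row with the column names listed above, and it would wrongly merge two different people who share a name within a group. The function name and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

def group_member_ids(handle):
    """Map (gid, name) to the list of distinct ids seen for that member."""
    ids_by_member = defaultdict(list)
    for row in csv.DictReader(handle):
        key = (row["gid"], row["name"])
        if row["id"] not in ids_by_member[key]:
            ids_by_member[key].append(row["id"])
    return dict(ids_by_member)

# Made-up sample standing in for member.csv.
member_sample = io.StringIO(
    "gid,id,name,url\n"
    "1,100,Pat Doe,u/100\n"
    "1,101,Pat Doe,u/101\n"
    "1,200,Sam Roe,u/200\n"
)
members = group_member_ids(member_sample)
```

Entries with more than one id then correspond to members who, per the description, changed their profile picture at least once.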

Data from: Spam email Dataset

This dataset contains a collection of email text messages, spam or not spam.

7 scholarly articles cite this dataset (View in Google Scholar)