2 datasets found

Data from: Spam email Dataset
kaggle.com
Updated Sep 1, 2023
Cite
_w1998 (2023). Spam email Dataset [Dataset]. https://www.kaggle.com/datasets/jackksoncsie/spam-email-dataset
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
_w1998
License
http://www.gnu.org/licenses/lgpl-3.0.html
Description
Dataset Name: Spam Email Dataset

Description: This dataset contains a collection of email text messages, labeled as either spam or not spam. Each email message is associated with a binary label, where "1" indicates that the email is spam, and "0" indicates that it is not spam. The dataset is intended for use in training and evaluating spam email classification models.

Columns:

text (Text): This column contains the text content of the email messages. It includes the body of the emails along with any associated subject lines or headers.

spam_or_not (Binary): This column contains binary labels to indicate whether an email is spam or not. "1" represents spam, while "0" represents not spam.

Usage: This dataset can be used for various Natural Language Processing (NLP) tasks, such as text classification and spam detection. Researchers and data scientists can train and evaluate machine learning models using this dataset to build effective spam email filters.
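
As a starting point for the usage described above, here is a minimal sketch of loading the two columns and checking the class balance before any model is trained. The sample rows are hypothetical, and the column names text and spam_or_not are taken from this listing; the header in the downloaded file may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; column names follow the listing's description
# and may not match the header of the actual Kaggle file.
SAMPLE_CSV = '''text,spam_or_not
"Subject: win a free prize now, click here",1
"Subject: meeting notes attached for Monday",0
"Subject: cheap loans, approved instantly",1
'''

def load_emails(handle):
    """Return (text, label) pairs, with the label parsed as an int (1 = spam)."""
    reader = csv.DictReader(handle)
    return [(row["text"], int(row["spam_or_not"])) for row in reader]

emails = load_emails(io.StringIO(SAMPLE_CSV))
label_counts = Counter(label for _, label in emails)
spam_share = label_counts[1] / len(emails)  # fraction of messages labeled spam
print(label_counts, round(spam_share, 2))
```

With a real file, the same function can be called on an open file handle instead of the in-memory sample.
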
Cheltenham's Facebook Groups
kaggle.com
zip
Updated Apr 2, 2018
Cite
Mike Chirico (2018). Cheltenham's Facebook Groups [Dataset]. https://www.kaggle.com/datasets/mchirico/cheltenham-s-facebook-group
Authors
Mike Chirico
License
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Description
Facebook is becoming an essential tool for more than just family and friends. Discover how Cheltenham Township (USA), a diverse community just outside of Philadelphia, deals with major issues such as the Bill Cosby trial, everyday traffic issues, sewer I/I problems and lost cats and dogs. And yes, theft.

Communities work when they're connected and exchanging information. What and who are the essential forces making a positive impact, and when and how do conversational threads get directed or misdirected?

Use Any Facebook Public Group

You can adapt the examples here to any public Facebook group. For an example of the source code used to collect this data, and a quick-start Docker image, take a look at the following project: facebook-group-scrape.

Data Sources

There are 4 CSV files in the dataset, with data from the following 5 public Facebook groups:

Unofficial Cheltenham Township

Elkins Park Happenings!

Free Speech Zone

Cheltenham Lateral Solutions

Cheltenham Township Residents

post.csv

These are the main posts you will see on the page. It might help to take a quick look at the page. Commas in the msg field have been replaced with {COMMA}, and apostrophes have been replaced with {APOST}.

gid: Group id (5 different Facebook groups)

pid: Main Post id

id: Id of the user posting

name: User's name

timeStamp

shares

url

msg: Text of the message posted.

likes: Number of likes
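
The placeholder substitution described for the msg field can be reversed with plain string replacement. The sketch below assumes the two tokens appear exactly as written, {COMMA} and {APOST}; the function name and the example message are hypothetical.

```python
def restore_msg(msg: str) -> str:
    """Undo the placeholder substitutions described for the msg field."""
    return msg.replace("{COMMA}", ",").replace("{APOST}", "'")

# Hypothetical message in the encoded form used by post.csv
example = "Lost cat{COMMA} last seen near the station. She{APOST}s gray."
print(restore_msg(example))
```
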

comment.csv

These are comments on the main posts. Note that Facebook posts have comments, and comments on comments.

gid: Group id

pid: Matches Main Post identifier in post.csv

cid: Comment id

timeStamp

id: Id of user commenting

name: Name of user commenting

rid: Id of user responding to first comment

msg: Message

like.csv

These are likes and responses. The two keys in this file, pid and cid, join to post.csv and comment.csv respectively.

gid: Group id

pid: Matches Main Post identifier in post.csv

cid: Matches Comment id in comment.csv

response: Response such as LIKE, ANGRY, etc.

id: Id of the user responding

name: Name of the user responding
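
The join between like.csv and post.csv can be sketched with a dictionary lookup. The rows below are hypothetical, and only the column names come from the field lists above. Keying on the pair (gid, pid) is an assumption made in case post ids are not unique across groups; the listing itself names only pid as the join key.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical rows; only the column names come from the dataset description.
POSTS = (
    "gid,pid,msg\n"
    "g1,p1,Road closed on Main St\n"
    "g1,p2,Found dog\n"
)
LIKES = (
    "gid,pid,cid,response,id,name\n"
    "g1,p1,c1,LIKE,u1,Ann\n"
    "g1,p1,c2,ANGRY,u2,Bob\n"
    "g1,p2,c3,LIKE,u1,Ann\n"
    "g1,p9,c4,LIKE,u3,Cy\n"
)

# Index posts by (gid, pid) so each like row can be matched in constant time.
posts = {(r["gid"], r["pid"]): r for r in csv.DictReader(io.StringIO(POSTS))}

responses_by_post = defaultdict(Counter)
for row in csv.DictReader(io.StringIO(LIKES)):
    key = (row["gid"], row["pid"])
    if key in posts:  # inner join: skip likes whose post is not in post.csv
        responses_by_post[key][row["response"]] += 1

for key, counts in responses_by_post.items():
    print(posts[key]["msg"], dict(counts))
```

The same pattern applies to comment.csv by indexing on cid instead.
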

member.csv

These are all the members in the group. Some members never, or rarely, post or comment. You may find multiple entries in this table for the same person: the individual's name never changes, but Facebook gives users a new id in this table whenever they change their profile picture, so each profile picture change is captured as a separate entry.

gid: Group id

id: Id of the member

name: Name of the member

url: URL of the member
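
Because one person can appear under several ids, a rough way to collapse the repeats is to group rows by group id and name. The sketch below uses hypothetical rows and relies on the statement that names never change; it would also merge two different people who share a name within a group, so treat the result as an approximation.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows; only the column names come from the dataset description.
MEMBERS = (
    "gid,id,name,url\n"
    "g1,101,Ann Lee,https://example.com/a1\n"
    "g1,102,Ann Lee,https://example.com/a2\n"
    "g1,201,Bob Ray,https://example.com/b1\n"
)

# Collect every id seen for the same (group, name) pair.
ids_by_member = defaultdict(list)
for row in csv.DictReader(io.StringIO(MEMBERS)):
    ids_by_member[(row["gid"], row["name"])].append(row["id"])

print(len(ids_by_member), "distinct members after grouping")
```
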

7 scholarly articles cite the Spam email Dataset, according to Google Scholar.
