Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
This dataset presents a curated collection of over 200 listings from the Kids and Baby section of Etsy, showcasing a variety of trending items. Each entry encapsulates essential details such as product name, price, listing date, number of favorites, and reviews, offering a snapshot of consumer preferences and market trends within this niche.
Given its concise nature, this dataset is ideally suited for exploratory data analysis (EDA), trend identification, and preliminary market research. It provides a fertile ground for understanding consumer engagement through favorites and reviews, price point analysis, and temporal trends in product listings. Students and early-career data scientists can leverage this dataset to hone their skills in data cleaning, visualization, and basic predictive modeling, focusing on real-world e-commerce data.
This dataset is composed of ethically mined public data, adhering to data privacy norms and ethical guidelines. It respects user privacy by focusing on publicly available product information without delving into sensitive or personal data.
Gratitude is extended to Etsy for fostering a vibrant marketplace that connects creative entrepreneurs with consumers seeking unique, handcrafted items. This analysis acknowledges the Etsy platform's role in empowering small businesses and contributing to the diverse dataset.
The Etsy logo is acknowledged, with its image sourced from FreeLogoPNG, underlining the visual identity of the platform discussed in this dataset.
The FIFA World Cup (often simply called the World Cup), the most prestigious association football tournament and the most widely viewed and followed sporting event in the world, was frequently one of the top trending topics on Twitter while it was ongoing.
This dataset contains a random collection of tweets from the knockout stages through the World Cup Final, which took place on 15 July 2018.
A preliminary analysis of the data (up to the knockout stages) is available at:
https://medium.com/@ritu_rg/nlp-text-visualization-twitter-sentiment-analysis-in-r-5ac22c778448
Data Collection:
The dataset was created with Tweepy, by streaming tweets from football fans worldwide before, during, and after the matches.
Tweepy is a Python library for accessing the Twitter API that provides an easy-to-use interface for streaming real-time data from Twitter. More information about it can be found at: http://tweepy.readthedocs.io/en/v3.5.0/
Data Pre-processing:
The dataset includes English-language tweets containing any reference to FIFA or the World Cup. The collected tweets have been pre-processed to facilitate analysis, while trying to ensure that no information from the original tweets is lost.
- The original tweet has been stored in the column "Orig_Tweet".
- As part of pre-processing, using the "BeautifulSoup" & "regex" libraries in Python, the tweets have been cleaned of noise that hinders natural language processing, such as website names, hashtags, user mentions, special characters, RTs, tabs, and leading/trailing/multiple spaces.
- Contractions ending in n't, 'll, 're, and 've have been expanded to their full English forms. Duplicate tweets have been removed from the dataset.
- The original Hashtags & User Mentions extracted during the above step have also been stored in separate columns.
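The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code used to build the dataset: it relies only on the Python standard library (`re` and `html.unescape` stand in for the "BeautifulSoup" & "regex" combination mentioned above), and the function names, patterns, and contraction rules are all assumptions made for the example.

```python
import html
import re

# Assumed contraction rules; the mapping actually used for the dataset is not documented.
# Irregular forms come first so the generic "n't" rule does not produce "ca not".
CONTRACTIONS = [
    (re.compile(r"\bcan't\b", re.IGNORECASE), "cannot"),
    (re.compile(r"\bwon't\b", re.IGNORECASE), "will not"),
    (re.compile(r"n't\b"), " not"),
    (re.compile(r"'ll\b"), " will"),
    (re.compile(r"'re\b"), " are"),
    (re.compile(r"'ve\b"), " have"),
]


def clean_tweet(text):
    """Return (cleaned_text, hashtags, mentions) for one raw tweet."""
    cleaned = html.unescape(text)               # decode entities such as &amp;
    cleaned = cleaned.replace("\u2019", "'")    # normalize curly apostrophes
    cleaned = re.sub(r"https?://\S+|www\.\S+", " ", cleaned)  # drop links

    # Keep hashtags and mentions before they are stripped from the text.
    hashtags = re.findall(r"#(\w+)", cleaned)
    mentions = re.findall(r"@(\w+)", cleaned)

    cleaned = re.sub(r"\bRT\b", " ", cleaned)   # drop retweet markers
    cleaned = re.sub(r"[@#]\w+", " ", cleaned)  # drop the mention/hashtag tokens
    for pattern, replacement in CONTRACTIONS:
        cleaned = pattern.sub(replacement, cleaned)
    cleaned = re.sub(r"[^A-Za-z0-9\s]", " ", cleaned)  # drop special characters
    cleaned = re.sub(r"\s+", " ", cleaned).strip()     # collapse tabs and spaces
    return cleaned, hashtags, mentions


def drop_duplicates(tweets):
    """Remove exact duplicates, keeping the first occurrence of each tweet."""
    return list(dict.fromkeys(tweets))
```

Extracting the hashtags and mentions after entity decoding and link removal, but before the tokens are stripped, avoids picking up fragments such as the digits in `&#39;` or anchors inside URLs.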
Data Storage:
The collected tweets have been consolidated into a single dataset & shared as a comma-separated values (CSV) file.
Each tweet is uniquely identifiable by its ID, & characterized by the following attributes, where available:
- "Lang" - Language of the tweet
- "Date" - When it was tweeted
- "Source" - The device/medium where it was tweeted from
- "len" - The length of the tweet
- "Orig_Tweet" - The tweet in its original form
- "Tweet" - The updated tweet after pre-processing
- "Likes" - The number of likes received by the tweet (till the time the extraction was done)
- "RTs" - The number of times the tweet was shared
- "Hashtags" - The Hashtags found in the original tweet
- "UserMentionNames" & "UserMentionID" - Extracted from the original tweet
It also includes the following attributes about the person that the tweet is from:
- "Name" & "Place" of the user
- "Followers" - The number of followers that the user account has
- "Friends" - The number of friends the user account has
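As a rough sketch of how these attributes might be read, the snippet below summarizes a CSV with the standard library `csv` module. The column names "Source", "Likes", and "Hashtags" come from the list above; the function names, the comma separator inside the "Hashtags" cell, and the sample rows are assumptions made for illustration, not facts about the published file.

```python
import csv
import io
from collections import Counter


def to_int(value, default=0):
    """Parse a count column, tolerating blanks since attributes may be missing."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return default


def summarize(csv_file):
    """Return (hashtag counts, total likes per source) for a file-like CSV object."""
    hashtag_counts = Counter()
    likes_by_source = Counter()
    for row in csv.DictReader(csv_file):
        source = row.get("Source") or "unknown"
        likes_by_source[source] += to_int(row.get("Likes"))
        # Assumption: several hashtags in one cell are separated by commas.
        for tag in (row.get("Hashtags") or "").split(","):
            tag = tag.strip().lower()
            if tag:
                hashtag_counts[tag] += 1
    return hashtag_counts, likes_by_source


# Made-up rows that only mimic the column layout described above.
sample = io.StringIO(
    "ID,Source,Likes,RTs,Hashtags\n"
    '1,Twitter for iPhone,10,2,"WorldCup,FRA"\n'
    "2,Twitter for Android,,1,WorldCup\n"
    "3,Twitter for iPhone,5,0,\n"
)
tags, likes = summarize(sample)
print(tags.most_common(2), dict(likes))
```

With the real file, the same function could be called on `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded CSV.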
The following resources helped me in using the Tweepy API:
http://tweepy.readthedocs.io/en/v3.5.0/auth_tutorial.html
https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets
https://www.safaribooksonline.com/library/view/mining-the-social/9781449368180/ch01.html
This project gave me a fascinating look into the conversations & sentiments of people from all over the world who were following this prestigious football tournament, while also giving me the opportunity to explore streaming, natural language processing & visualization techniques in both R & Python.