50 datasets found

Average number of comments for Instagram post types 2020
statista.com
Updated Jul 8, 2025
Cite
Statista (2025). Average number of comments for Instagram post types 2020 [Dataset]. https://www.statista.com/statistics/1170479/median-comment-instagram-post-type/
Dataset updated
Jul 8, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jan 2017 - Jul 2020
Area covered
Worldwide
Description
A July 2020 study on the photo-sharing social media site Instagram revealed that video posts got the highest number of comments, with a median of **** comments per video post. Additionally, carousel posts received the second-highest median engagement with a median of **** comments per carousel post.
Instagram: engagement on posts 2023-2024, by type of interaction
statista.com
ai-chatbox.pro
Updated Jun 25, 2025
Cite
Statista (2025). Instagram: engagement on posts 2023-2024, by type of interaction [Dataset]. https://www.statista.com/statistics/1356226/instagram-engagement-reels-posts-type-of-interaction/
Dataset updated
Jun 25, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Sep 2023 - Mar 2024
Area covered
Worldwide
Description
In 2024, the average post on Instagram received ***** comments, ****** likes, and ***** saves. In 2023, the average post on the platform amassed ***** comments, ****** likes, and ***** saves.
BlogFeedback Data Set
kaggle.com
zip
Updated Jul 15, 2022
Cite
Julio Tentor (2022). BlogFeedback Data Set [Dataset]. https://www.kaggle.com/datasets/jtentor/blogfeedback-data-set
Available download formats: zip (2550651 bytes)
Dataset updated
Jul 15, 2022
Authors
Julio Tentor
Description
Source:

Krisztian Buza Budapest University of Technology and Economics buza '@' cs.bme.hu http://www.cs.bme.hu/~buza

You can download a zip file from https://archive.ics.uci.edu/ml/datasets/BlogFeedback

Data Set Information:

This data originates from blog posts. The raw HTML-documents of the blog posts were crawled and processed.

The prediction task associated with the data is the prediction of the number of comments in the upcoming 24 hours.

In order to simulate this situation, we choose a basetime (in the past) and select the blog posts that were published at most 72 hours before the selected base date/time. Then, we calculate all the features of the selected blog posts from the information that was available at the basetime, therefore each instance corresponds to a blog post. The target is the number of comments that the blog post received in the next 24 hours relative to the base time.

In the train data, the base times were in the years 2010 and 2011. In the test data the base times were in February and March 2012.

This simulates the real-world situation in which training data from the past is available to predict events in the future.

The train data was generated from different base times that may temporally overlap.

Therefore, if you simply split the train into disjoint partitions, the underlying time intervals may overlap.

Therefore, you should use the provided, temporally disjoint train and test splits in order to ensure that the evaluation is fair.

Attribute Information:

1...50: Average, standard deviation, min, max and median of the Attributes 51...60 for the source of the current blog post. With source we mean the blog on which the post appeared. For example, myblog.blog.org would be the source of the post myblog.blog.org/post_2010_09_10

51: Total number of comments before basetime
52: Number of comments in the last 24 hours before the base time
53: Let T1 denote the datetime 48 hours before basetime, and let T2 denote the datetime 24 hours before basetime. This attribute is the number of comments in the time period between T1 and T2
54: Number of comments in the first 24 hours after the publication of the blog post, but before basetime
55: The difference of Attribute 52 and Attribute 53
56...60: The same features as the attributes 51...55, but features 56...60 refer to the number of links (trackbacks), while features 51...55 refer to the number of comments.
61: The length of time between the publication of the blog post and base time
62: The length of the blog post
63...262: The 200 bag of words features for 200 frequent words of the text of the blog post
263...269: binary indicator features (0 or 1) for the weekday (Monday...Sunday) of the basetime
270...276: binary indicator features (0 or 1) for the weekday (Monday...Sunday) of the date of publication of the blog post
277: Number of parent pages: we consider a blog post P as a parent of blog post B, if B is a reply (trackback) to blog post P.
278...280: Minimum, maximum, average number of comments that the parents received
281: The target: the number of comments in the next 24 hours (relative to base time)
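The attribute layout above can be sketched as a lookup table that splits one 281-value row into named groups. This is only an illustrative sketch: the group names and the `split_row` helper are hypothetical and not part of the dataset, and the row is assumed to be already parsed into a list of numbers.

```python
# Attribute ranges are 1-based and inclusive, as in the attribute list above.
# The group names are hypothetical labels chosen for this sketch.
ATTRIBUTE_GROUPS = {
    "source_stats": (1, 50),
    "comment_counts": (51, 55),
    "link_counts": (56, 60),
    "time_since_publication": (61, 61),
    "post_length": (62, 62),
    "bag_of_words": (63, 262),
    "basetime_weekday": (263, 269),
    "publication_weekday": (270, 276),
    "parent_count": (277, 277),
    "parent_comment_stats": (278, 280),
    "target": (281, 281),
}


def split_row(row):
    """Split one 281-value row into named attribute groups."""
    if len(row) != 281:
        raise ValueError(f"expected 281 values, got {len(row)}")
    # Convert the 1-based inclusive ranges into Python slices.
    return {
        name: row[start - 1:end]
        for name, (start, end) in ATTRIBUTE_GROUPS.items()
    }


# Dummy row whose values equal their 1-based attribute index.
row = [float(i) for i in range(1, 282)]
groups = split_row(row)
print(len(groups["bag_of_words"]), groups["target"])
```

Keeping the ranges in one table makes it easy to check that the groups cover all 281 attributes without overlap before any model is trained.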

Relevant Papers:

Buza, K. (2014). Feedback Prediction for Blogs. In Data Analysis, Machine Learning and Knowledge Discovery (pp. 145-152). Springer International Publishing (http://cs.bme.hu/~buza/pdfs/gfkl2012_blogs.pdf).
European Parliament Election 2024: German Candidates Social Media Activities...
datacatalogue.cessda.eu
Updated Mar 25, 2025
Cite
Breuer, Johannes; Soldner, Felix; Shankaran, Vigneshwaran (2025). European Parliament Election 2024: German Candidates Social Media Activities [Dataset]. http://doi.org/10.4232/1.14457
Unique identifier
https://doi.org/10.4232/1.14457
Dataset updated
Mar 25, 2025
Dataset provided by
GESIS – Leibniz-Institut für Sozialwissenschaften
Authors
Breuer, Johannes; Soldner, Felix; Shankaran, Vigneshwaran
Time period covered
Apr 1, 2024 - Jun 9, 2024
Area covered
Germany
Variables measured
Media unit
Measurement technique
Automated data extraction: API query, Automated data extraction: Web scraping

A list of all the candidates’ accounts (i.e., handles) was created and provided by the GLES Candidate Study European Election 2024 project team to collect the social media content of the candidates. The candidates’ accounts are available in the Social Media Accounts dataset from GESIS (study nr. ZA7959). For some platforms, the number of collected accounts is lower than that of identified accounts since not all accounts were active (i.e., they did not post) during the collection period.

Facebook and Instagram content was collected using the Crowdtangle platform, using its web interface and API, based on the list of the candidates’ Facebook and Instagram page names. Facebook and Instagram Ad data was collected using the Meta Ad Library API. The Meta Ad library is accessible through a Facebook account, and API access requires a verified developer account. Ad data was collected based on the Facebook page IDs in the provided list. We also collected Instagram data with the same ID. As many ads are paid for by the party accounts instead of individual candidate accounts, we also collected ad data for the main party pages (national level + EU level if such an account existed for the party).

X (Twitter) data was collected through the paid X API and a web scraping approach. The candidates’ user handles were used for search queries, and we collected posts by the candidates and account metadata (nr. of followers, etc.). For the X API, the first 75% of the most active candidates were collected; hence, the dataset contains fewer candidates than in the scraped data.

TikTok data was collected using a web scraping approach. The candidates’ user handles were used for search queries, and metadata about the account (nr. of followers, nr. of videos, etc.) and the videos posted by the account (e.g., nr. of likes) was collected.
Description
In close cooperation with the German Longitudinal Election Study (GLES), the GESIS Web Data service has collected social media data from German candidates for the 2024 European elections. All public posts by the candidates on Twitter, Facebook, Instagram and TikTok were collected, as well as data on the candidates´ adverts published on Facebook and Instagram. The collected data provides an insight into the candidates´ online activities and campaign strategies during the election cycle. The dataset includes aggregated data for the period from 1 April 2024 to 9 June 2024 (Election Day). Data was also collected from web searches and the candidates´ websites, but is not currently included in this dataset for privacy reasons. Should you be interested in this data or other aggregated statistics, please contact css.datacollection@gesis.org.
Detailed platform variables: For each platform, the data was aggregated on the account level. Account-level information (e.g., count of followers) was collected as close to the election as possible (in most cases, the election day itself).

Facebook: Unique identifier for candidates; links to Social Media Accounts dataset; candidates’ party affiliation during the 2024 European Parliament election; total number of followers of the candidate at the time of the last post before the election; total number of posts by the candidate; total number of pictures, videos, and links posted by the candidate; total number of status updates posted by the candidate; total number of likes received on the candidates’ accounts; average number of likes per post; median number of likes per post; lowest number of likes received on a single post; highest number of likes received on a single post; total number of comments received on the candidates’ accounts; average number of comments per post; median number of comments per post; lowest number of comments received on a single post; highest number of comments received on a single post; total number of re-shares of the candidates’ posts; average number of re-shares per post; median number of re-shares per post; lowest number of re-shares of a single post; highest number of re-shares of a single post; total number of positive emojis (“Love”, “Wow”, “Haha” & “Care” emoji) used as reactions on the candidates’ posts; average number of positive emojis used as reaction per post; median number of positive emojis used as reaction per post; lowest number of positive emojis used as reaction on a single post; highest number of positive emojis used as reaction on a single post; total number of negative emojis (“Angry” and “Sad” emoji) used as reaction on the candidates’ accounts; median number of negative emojis (“Angry” and “Sad” emoji) used as reaction per post; lowest number of negative emojis (“Angry” and “Sad” emoji) used as reaction on a single post; highest number of negative emojis (“Angry” and “Sad” emoji) used as reaction on a single post; total number of interactions (likes, comments, re-shares, emoji-reactions) on the candidates’ posts; average number of interactions per post; median number of interactions per post; lowest number of interactions with a single post; highest number of interactions with a single post.

Instagram: Unique identifier for candidates; links to Social Media Accounts dataset; candidates’ party affiliation during the 2024 European Parliament election; total number of followers of the candidate at the time of the last post before the election; total number of posts, of pictures, of videos, of albums posted by the candidate; total number of views of posts of the candidate; average number of views per post; median number of views per post; lowest number of views received on a single post; highest number of views received on a single post; total number of likes received on the candidates´ accounts; average number of likes per post; median number of likes per post; lowest number of likes received on a single post; highest number of likes received on a single post; total number of comments received on the candidates´ accounts; average number of comments per post; median number of comments per post; lowest number of comments received on a single post; highest number of comments received on a single post; total number of interactions (likes, comments) on the candidates´ posts; average number of interactions per post; median number of interactions per post; lowest number of interactions with a single post; highest number of interactions with a single post.

X (Twitter): Unique identifier for candidates; links to Social Media Accounts dataset; candidates’ party affiliation during the 2024 European Parliament election; total number of followers of the account; total number of accounts the account is following; total number of tweets published by the account; total number of likes received by the account´s tweets; average number of likes per tweet; lowest number of likes received on a single tweet; median number of likes received per...
Facebook Spam Dataset
kaggle.com
Updated Apr 11, 2021
Cite
Khaja Hussain SK (2021). Facebook Spam Dataset [Dataset]. https://www.kaggle.com/khajahussainsk/facebook-spam-dataset/activity
Dataset updated
Apr 11, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Khaja Hussain SK
Description
Context: A collection of Facebook spam/legit profile and content-based data. It can be used for classification tasks.

Content: The dataset can be used for building machine learning models. The data was collected from public profiles using the Facebook API and the Facebook Graph API. There are 500 legit profiles and 100 spam profiles. The features are listed below, together with a Label (0 = legit, 1 = spam):
1. Number of friends
2. Number of followings
3. Number of Community
4. The age of the user account (in days)
5. Total number of posts shared
6. Total number of URLs shared
7. Total number of photos/videos shared
8. Fraction of the posts containing URLs
9. Fraction of the posts containing photos/videos
10. Average number of comments per post
11. Average number of likes per post
12. Average number of tags in a post (Rate of tagging)
13. Average number of hashtags present in a post

Inspiration: The dataset helps the community understand how features can help distinguish legitimate Facebook users from spam users.
CMFeed: A Benchmark Dataset for Controllable Multimodal Feedback Synthesis
zenodo.org
Updated May 11, 2025
Cite
Puneet Kumar; Sarthak Malik; Balasubramanian Raman; Xiaobai Li (2025). CMFeed: A Benchmark Dataset for Controllable Multimodal Feedback Synthesis [Dataset]. http://doi.org/10.5281/zenodo.11409612
Unique identifier
https://doi.org/10.5281/zenodo.11409612
Dataset updated
May 11, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Puneet Kumar; Sarthak Malik; Balasubramanian Raman; Xiaobai Li
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jun 1, 2024
Description
Overview
The Controllable Multimodal Feedback Synthesis (CMFeed) Dataset is designed to enable the generation of sentiment-controlled feedback from multimodal inputs, including text and images. This dataset can be used to train feedback synthesis models in both uncontrolled and sentiment-controlled manners. Serving a crucial role in advancing research, the CMFeed dataset supports the development of human-like feedback synthesis, a novel task defined by the dataset's authors. Additionally, the corresponding feedback synthesis models and benchmark results are presented in the associated code and research publication.

Task Uniqueness: The task of controllable multimodal feedback synthesis is unique, distinct from LLMs and tasks like VisDial, and not addressed by multi-modal LLMs. LLMs often exhibit errors and hallucinations, as evidenced by their auto-regressive and black-box nature, which can obscure the influence of different modalities on the generated responses [Ref1; Ref2]. Our approach includes an interpretability mechanism, as detailed in the supplementary material of the corresponding research publication, demonstrating how metadata and multimodal features shape responses and learn sentiments. This controllability and interpretability aim to inspire new methodologies in related fields.

Data Collection and Annotation
Data was collected by crawling Facebook posts from major news outlets, adhering to ethical and legal standards. The comments were annotated using four sentiment analysis models: FLAIR, SentimentR, RoBERTa, and DistilBERT. Facebook was chosen for dataset construction because of the following factors:
• Facebook was chosen for data collection because it uniquely provides metadata such as news article link, post shares, post reaction, comment like, comment rank, comment reaction rank, and relevance scores, not available on other platforms.
• Facebook is the most used social media platform, with 3.07 billion monthly users, compared to 550 million Twitter and 500 million Reddit users. [Ref]
• Facebook is popular across all age groups (18-29, 30-49, 50-64, 65+), with at least 58% usage, compared to 6% for Twitter and 3% for Reddit. [Ref]. Trends are similar for gender, race, ethnicity, income, education, community, and political affiliation [Ref]
• The male-to-female user ratio on Facebook is 56.3% to 43.7%; on Twitter, it's 66.72% to 23.28%; Reddit does not report this data. [Ref]

Filtering Process: To ensure high-quality and reliable data, the dataset underwent two levels of filtering:
a) Model Agreement Filtering: Retained only comments where at least three out of the four models agreed on the sentiment.
b) Probability Range Safety Margin: Comments with a sentiment probability between 0.49 and 0.51, indicating low confidence in sentiment classification, were excluded.
After filtering, 4,512 samples were marked as XX. Though these samples have been released for the reader's understanding, they were not used in training the feedback synthesis model proposed in the corresponding research paper.
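The two filtering levels can be sketched as a single predicate. This is a minimal sketch under stated assumptions, not the authors' code: it assumes each of the four models returns a probability for the positive class, that a model votes positive at 0.5 or above, and that the safety margin is checked against the mean of the four probabilities. The function name `keep_comment` and its parameters are hypothetical.

```python
from statistics import mean


def keep_comment(positive_probs, low=0.49, high=0.51, min_agree=3):
    """Return True if a comment passes both filtering levels.

    positive_probs: one positive-class probability per sentiment model
    (an assumption of this sketch; the source does not give the format).
    """
    # Level (a): at least `min_agree` models must vote for the same class.
    votes_positive = sum(p >= 0.5 for p in positive_probs)
    votes_negative = len(positive_probs) - votes_positive
    if max(votes_positive, votes_negative) < min_agree:
        return False
    # Level (b): drop comments whose probability sits in the low-confidence band.
    if low <= mean(positive_probs) <= high:
        return False
    return True


print(keep_comment([0.9, 0.8, 0.7, 0.2]))   # three models agree, confident
print(keep_comment([0.9, 0.8, 0.2, 0.1]))   # two-against-two split
```

Whether the margin applies to a per-model probability or to an aggregate is not stated in the description, so the aggregate used here should be treated as one possible reading.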

Dataset Description
• Total Samples: 61,734
• Total Samples Annotated: 57,222 after filtering.
• Total Posts: 3,646
• Average Likes per Post: 65.1
• Average Likes per Comment: 10.5
• Average Length of News Text: 655 words
• Average Number of Images per Post: 3.7

Components of the Dataset
The dataset comprises two main components:
• CMFeed.csv File: Contains metadata, comment, and reaction details related to each post.
• Images Folder: Contains folders with images corresponding to each post.

Data Format and Fields of the CSV File
The dataset is structured in CMFeed.csv file along with corresponding images in related folders. This CSV file includes the following fields:
• Id: Unique identifier
• Post: The heading of the news article.
• News_text: The text of the news article.
• News_link: URL link to the original news article.
• News_Images: A path to the folder containing images related to the post.
• Post_shares: Number of times the post has been shared.
• Post_reaction: A JSON object capturing reactions (like, love, etc.) to the post and their counts.
• Comment: Text of the user comment.
• Comment_like: Number of likes on the comment.
• Comment_reaction_rank: A JSON object detailing the type and count of reactions the comment received.
• Comment_link: URL link to the original comment on Facebook.
• Comment_rank: Rank of the comment based on engagement and relevance.
• Score: Sentiment score computed based on the consensus of sentiment analysis models.
• Agreement: Indicates the consensus level among the sentiment models, ranging from -4 (all negative) to 4 (all positive). For example, 3 negatives and 1 positive result in -2, and 3 positives and 1 negative result in +2.
• Sentiment_class: Categorizes the sentiment of the comment into 1 (positive) or 0 (negative).

More Considerations During Dataset Construction
We thoroughly considered issues such as the choice of social media platform for data collection, bias and generalizability of the data, selection of news handles/websites, ethical protocols, privacy and potential misuse before beginning data collection. While achieving completely unbiased and fair data is unattainable, we endeavored to minimize biases and ensure as much generalizability as possible. Building on these considerations, we made the following decisions about data sources and handling to ensure the integrity and utility of the dataset:

• Why not merge data from different social media platforms? We chose not to merge data from platforms such as Reddit and Twitter with Facebook due to the lack of comprehensive metadata, clear ethical guidelines, and control mechanisms—such as who can comment and whether users' anonymity is maintained—on these platforms other than Facebook. These factors are critical for our analysis. Our focus on Facebook alone was crucial to ensure consistency in data quality and format.

• Choice of four news handles: We selected four news handles (BBC News, Sky News, Fox News, and NY Daily News) to ensure diversity and comprehensive regional coverage. These news outlets were chosen for their distinct regional focuses and editorial perspectives: BBC News is known for its global coverage with a centrist view, Sky News offers geographically targeted and politically varied content leaning center/right in the UK/EU/US, Fox News is recognized for its right-leaning content in the US, and NY Daily News provides left-leaning coverage in New York. Many other news handles such as NDTV, The Hindu, Xinhua, and SCMP are also large-scale but may contain information in regional languages such as Indian and Chinese; hence, they have not been selected. This selection ensures a broad spectrum of political discourse and audience engagement.

• Dataset Generalizability and Bias: With 3.07 billion of the total 5 billion social media users, the extensive user base of Facebook, reflective of broader social media engagement patterns, ensures that the insights gained are applicable across various platforms, reducing bias and strengthening the generalizability of our findings. Additionally, the geographic and political diversity of these news sources, ranging from local (NY Daily News) to international (BBC News), and spanning political spectra from left (NY Daily News) to right (Fox News), ensures a balanced representation of global and political viewpoints in our dataset. This approach not only mitigates regional and ideological biases but also enriches the dataset with a wide array of perspectives, further solidifying the robustness and applicability of our research.

• Dataset size and diversity: Facebook prohibits the automatic scraping of its users' personal data. In compliance with this policy, we manually scraped publicly available data. This labor-intensive process, requiring around 800 hours of manual effort, limited our data volume but allowed for precise selection. We followed ethical protocols for scraping Facebook data, selecting 1000 posts from each of the four news handles to enhance diversity and reduce bias. Initially, 4000 posts were collected; after preprocessing (detailed in Section 3.1), 3646 posts remained. We then processed all associated comments, resulting in a total of 61734 comments. This manual method ensures adherence to Facebook’s policies and the integrity of our dataset.

Ethical considerations, data privacy and misuse prevention
The data collection adheres to Facebook’s ethical guidelines (https://developers.facebook.com/terms/)...
Engagement per post of leading sport and fitness influencers on Instagram...
statista.com
Updated Jul 21, 2025
Cite
Statista (2025). Engagement per post of leading sport and fitness influencers on Instagram 2020 [Dataset]. https://www.statista.com/statistics/1126447/leading-sport-and-fitness-influencers-on-instagram-in-italy/
Dataset updated
Jul 21, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Mar 2020
Area covered
Italy
Description
As of March 2020, Giovanni Fois (@giovanni_fois) was the leading sport and fitness influencer in Italy recording the highest engagement per post on Instagram. Posts published by this Instagram influencer registered an average of **** thousand likes and comments as of the survey period. Letizia Paternoster (@letiziapaternoster) followed in the list with an average of roughly ** thousand comments and likes per post.
Data_Sheet_1_Implementation and Evaluation of a Social Media-Based...
figshare.com
pdf
Updated Jun 2, 2023
Cite
Muneera A. Rasheed; Alma Arshad Hookmani; Sana Waleed; H. Sundus Fatima; Soha Siddiqui; Muhammad Khurram; Babar S. Hasan (2023). Data_Sheet_1_Implementation and Evaluation of a Social Media-Based Communication Strategy to Enhance Employee Engagement: Experiences From a Children's Hospital, Pakistan.PDF [Dataset]. http://doi.org/10.3389/fpubh.2021.584179.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fpubh.2021.584179.s001
Dataset updated
Jun 2, 2023
Dataset provided by
Frontiers
Authors
Muneera A. Rasheed; Alma Arshad Hookmani; Sana Waleed; H. Sundus Fatima; Soha Siddiqui; Muhammad Khurram; Babar S. Hasan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Pakistan
Description
Social media can complement organizational communication strategy, which is integral to employee engagement. However, successful case studies which can allow replication are limited. The objective of the study is to describe the design, implementation, and evaluation of a social media-based communication strategy in a tertiary care hospital in Pakistan. The leadership of the pediatric service line developed an intervention plan to engage the employees with the newly reframed vision to improve patient and family experience. An online communication platform (a Facebook page) was created for all employees of the pediatric service line. The strategy to influence employees was based on Cialdini's six principles of persuasion. Implementation of the strategy between October 2017 and December 2019 was evaluated for reach, discussion themes, and outcomes using the framework by Murdough (2009). Quantitative indicators included total posts, mean comments, and reactions per post. Posts were qualitatively analyzed with an emergent approach for insights into the discussion. The analysis revealed a total of 9,085 posts, with mean reactions per post of 8.4, mean comments of 7.2, and active viewership by 90% of members on average. In terms of post types, photos were the highest (4,779), while videos were the lowest (1,163). Qualitative analysis indicated 54% of the posts were of the theme “inspirational and thought provoking,” while the greatest engagement was generated on the theme “challenges and solution.” The authors conclude that the strategy was successfully implemented to maintain active membership, engage employees in meaningful conversations, and have them express intent to execute quality improvement projects.
The median, mean and standard deviation of the number of interactions...
figshare.com
xls
Updated May 31, 2023
Cite
Peter Burger; Soeradj Kanhai; Alexander Pleijter; Suzan Verberne (2023). The median, mean and standard deviation of the number of interactions (reactions, comments, and shares) per post, for mainstream news and junk news. [Dataset]. http://doi.org/10.1371/journal.pone.0220446.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0220446.t002
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Peter Burger; Soeradj Kanhai; Alexander Pleijter; Suzan Verberne
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The median, mean and standard deviation of the number of interactions (reactions, comments, and shares) per post, for mainstream news and junk news.
Kaggle Discussion Votes Per Post Ranking
kaggle.com
Updated Jan 20, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kostiantyn Isaienkov (2023). Kaggle Discussion Votes Per Post Ranking [Dataset]. https://www.kaggle.com/datasets/isaienkov/kaggle-discussions-votes-per-post-ranking
Dataset updated
Jan 20, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Kostiantyn Isaienkov
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The Kaggle Discussion Votes Per Post Ranking dataset contains publicly available information related to the Discussion section of Kaggle. Each week, the dataset is updated with the top 400 users from Discussion.

The unique item in this dataset is the VotePerPost ranking, which shows how many votes per post each user has on average.
Russia: number of online comments 2018, by platform and post type
statista.com
Updated Jul 8, 2025
Cite
Statista (2025). Russia: number of online comments 2018, by platform and post type [Dataset]. https://www.statista.com/statistics/1028123/russia-number-of-social-media-comments-by-platform-and-post-type/
Dataset updated
Jul 8, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Mar 2018
Area covered
Russia
Description
In March 2018, YouTube was the most commented social media platform on the Russian internet in terms of comments on video-containing posts, with each post getting *** comments on average. On Instagram, an average of *** comments was measured under Russian-language posts without videos.
A Decade of Reddit Politics: Comprehensive Dataset on User Political...
zenodo.org
zip
Updated Feb 27, 2024
Cite
Valentina Pansanella; Giulio Rossetti; Virginia Morini (2024). A Decade of Reddit Politics: Comprehensive Dataset on User Political Leanings and Interaction Networks (2011-2021) [Dataset]. http://doi.org/10.5281/zenodo.10715427
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.10715427
Dataset updated
Feb 27, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Valentina Pansanella; Giulio Rossetti; Virginia Morini
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data in brief

A short description of the Reddit Politics 10 years dataset (starting from 2011-01).

Political leanings

node_month_leaning_full_headings.zip

The zip file contains a csv where each row identifies a piece of user content (either a post or a comment) with the following structure:

node_id,month_progressive_id,leaning_lable,leaning_score,post_comment

where

node_id: the unique user identifier
month_progressive_id: a numeric value from 0 to 100 identifying the month in which the post/comment was published
leaning_label: a discrete variable identifying left/right/moderates (it is based on leaning_score and can be re-binned if needed)
leaning_score: the continuous score describing the political leaning (range [0,1])
post_comment: a flag P/C to differentiate the submission type

monthly_scores_json.zip

This archive contains 3 json files:
monthly_scores.json: a dictionary month->node_id->{post: [list political leanings], comments: [list political leanings]};
monthly_scores_post_agg.json: a dictionary month->node_id->political_leaning, where the aggregated score is the average of the interquartile range of the political leaning of the user's posts only;
monthly_scores_agg.json: a dictionary month->node_id->political_leaning, where the aggregated score is the weighted(*) average of (i) the mean value of the interquartile range of the political leaning of the user's posts and (ii) the mean value of the interquartile range of the political leaning of the user's comments;

(*) Since the annotations of posts are more reliable than those of comments, we weighted the former 10 times the latter when aggregating.
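
The aggregation described for monthly_scores_agg.json can be sketched as follows. This is one possible reading, not the authors' code: "mean value of the interquartile range" is interpreted here as the mean of the scores left after dropping the lowest and highest quarter of the sorted values, and the handling of users with only posts or only comments is an assumption. The function names are hypothetical.

```python
from statistics import mean


def interquartile_mean(values):
    """Mean of the middle half of the sorted values (assumed interpretation)."""
    ordered = sorted(values)
    k = len(ordered) // 4  # number of values trimmed from each end
    return mean(ordered[k:len(ordered) - k])


def aggregate_leaning(posts, comments, post_weight=10.0, comment_weight=1.0):
    """Weighted average of the post and comment scores, posts weighted 10:1.

    If one of the two lists is empty, only the other one is used
    (an assumption; the description does not cover this case).
    """
    parts = []
    if posts:
        parts.append((post_weight, interquartile_mean(posts)))
    if comments:
        parts.append((comment_weight, interquartile_mean(comments)))
    if not parts:
        return None
    total_weight = sum(weight for weight, _ in parts)
    return sum(weight * score for weight, score in parts) / total_weight


# Invented scores for one user in one month.
score = aggregate_leaning(posts=[0.0, 0.25, 0.25, 1.0], comments=[0.5])
```

Here the post scores reduce to 0.25 and the single comment score stays 0.5, so the result is (10 * 0.25 + 0.5) / 11.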

monthly_networks_full.zip

This archive contains all the monthly undirected, unweighted interaction networks (each row identifies an edge between two node ids). The networks cover all users with a political leaning computed using *both* posts and comments.

monthly_networks_posts.zip

This archive contains all the monthly undirected, unweighted interaction networks (each row identifies an edge between two node ids). The networks cover all users with a political leaning computed considering *only* posts.
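
An edge list of this kind can be loaded into an adjacency structure with the standard library alone. The sketch below assumes each row holds two comma-separated node ids and no header; the actual delimiter and file layout inside the archives are not stated in the description, and the sample edges are invented.

```python
import csv
import io
from collections import defaultdict


def load_undirected_edges(handle, delimiter=","):
    """Build an undirected, unweighted adjacency map from rows of node-id pairs."""
    adjacency = defaultdict(set)
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        u, v = row[0].strip(), row[1].strip()
        # Store both directions; sets make repeated edges harmless.
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


# Invented sample; "b,a" repeats the edge "a,b" in the other direction.
sample = io.StringIO("a,b\nb,c\nb,a\n")
graph = load_undirected_edges(sample)
edge_count = sum(len(neighbours) for neighbours in graph.values()) // 2
```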
Average frequency per post and content of comments to r/saplings before and...
figshare.com
xls
Updated Jun 6, 2024
Cite
Hannah Reygaerts; Sidney Smith; Lynette M. Renner; Yumary Ruiz; Laura M. Schwab-Reese (2024). Average frequency per post and content of comments to r/saplings before and during the COVID-19 pandemic. [Dataset]. http://doi.org/10.1371/journal.pone.0304336.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0304336.t003
Dataset updated
Jun 6, 2024
Dataset provided by
PLOS ONE
Authors
Hannah Reygaerts; Sidney Smith; Lynette M. Renner; Yumary Ruiz; Laura M. Schwab-Reese
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Average frequency per post and content of comments to r/saplings before and during the COVID-19 pandemic.
Reddit Dataset
paperswithcode.com
opendatalab.com
Updated Jun 9, 2017
Cite
William L. Hamilton; Rex Ying; Jure Leskovec (2017). Reddit Dataset [Dataset]. https://paperswithcode.com/dataset/reddit
Explore at:
Dataset updated
Jun 9, 2017
Authors
William L. Hamilton; Rex Ying; Jure Leskovec
Description
The Reddit dataset is a graph dataset from Reddit posts made in the month of September, 2014. The node label in this case is the community, or “subreddit”, that a post belongs to. 50 large communities have been sampled to build a post-to-post graph, connecting posts if the same user comments on both. In total this dataset contains 232,965 posts with an average degree of 492. The first 20 days are used for training and the remaining days for testing (with 30% used for validation). For features, off-the-shelf 300-dimensional GloVe CommonCrawl word vectors are used.
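
The construction rule (two posts are connected if the same user comments on both) can be illustrated with a small sketch. It is not the dataset authors' pipeline: the function names and the (user, post) pairs are invented, and the average degree is computed as twice the edge count divided by the number of posts.

```python
from collections import defaultdict
from itertools import combinations


def build_post_graph(comments):
    """Return the set of undirected post-post edges implied by shared commenters.

    `comments` is an iterable of (user_id, post_id) pairs.
    """
    posts_by_user = defaultdict(set)
    for user, post in comments:
        posts_by_user[user].add(post)
    edges = set()
    for posts in posts_by_user.values():
        # Every pair of posts commented on by the same user gets one edge.
        for a, b in combinations(sorted(posts), 2):
            edges.add((a, b))
    return edges


def average_degree(edges, num_posts):
    """Average node degree of an undirected graph: 2 * |E| / |V|."""
    return 2 * len(edges) / num_posts


# Invented sample: u1 links p1-p2, u2 links p2-p3, u3 comments on one post only.
sample_comments = [
    ("u1", "p1"), ("u1", "p2"),
    ("u2", "p2"), ("u2", "p3"),
    ("u3", "p1"),
]
edges = build_post_graph(sample_comments)
```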
‘📸 Most Followed on Instagram’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 13, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘📸 Most Followed on Instagram’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-most-followed-on-instagram-0687/b584c59d/?iid=000-706&v=presentation
Explore at:
Dataset updated
Feb 13, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘📸 Most Followed on Instagram’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/most-followed-on-instagrame on 13 February 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

Data from Iconsquare.com. The company delivers Instagram analytics to tens of thousands of businesses, agencies and individuals.

BRAND

CATEGORIES 1

CATEGORIES 2

FOLLOWERS

ER - Average engagement rate of all media. Engagement rate is based on the likes and comments received divided by the number of followers at the time of the post.

POSTS ON HASHTAG

MEDIA POSTED

Source: ICONSQUARE

This dataset was created by Social Media Data and contains around 100 samples along with Categories 1, Categories 2, technical information and other features such as: - Followers - Categories 1 - and more.

How to use this dataset

Analyze Categories 2 in relation to Followers

Study the influence of Categories 1 on Categories 2

More datasets

Acknowledgements

If you use this dataset in your research, please credit Social Media Data

--- Original source retains full ownership of the source dataset ---
X/Twitter average engagement on posts 2023-2024, by type of interaction
statista.com
ai-chatbox.pro
Updated Aug 8, 2024
Cite
Statista (2024). X/Twitter average engagement on posts 2023-2024, by type of interaction [Dataset]. https://www.statista.com/statistics/1483834/x-twitter-average-engagement-posts-type-interaction/
Explore at:
Dataset updated
Aug 8, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Sep 2023 - Mar 2024
Area covered
Worldwide
Description
In 2024, posts on X (formerly Twitter) generated an average of 31.46 likes, down from 37.82 likes in 2023. Mentions and reposts also declined year-over-year, and in 2024, the average post on the platform received 1.56 likes and 8.47 mentions.
Fake/Authentic User Instagram
kaggle.com
zip
Updated Feb 11, 2021
Cite
Kristo Radion Purba (2021). Fake/Authentic User Instagram [Dataset]. https://www.kaggle.com/krpurba/fakeauthentic-user-instagram
Explore at:
Available download formats: zip (3451107 bytes)
Dataset updated
Feb 11, 2021
Authors
Kristo Radion Purba
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Kindly refer to my paper for more information. Please cite my work if you use my dataset in any work : K. R. Purba, D. Asirvatham and R. K. Murugesan, "Classification of instagram fake users using supervised machine learning algorithms," International Journal of Electrical and Computer Engineering (IJECE), vol. 10, no. 3, pp. 2763-2772, 2020.

The dataset was collected using web scraping from third-party Instagram websites, to capture their metadata and up to 12 latest media posts from each user. The collection process was executed from September 1st, 2019, until September 20th, 2019. The dataset contains authentic users and fake users, which were filtered using human annotators. The authentic users were taken from followers of 24 private university pages (8 Indonesian, 8 Malaysian, 8 Australian) on Instagram. To reduce the number of users, they are picked using proportional random sampling based on their source university. All private users were removed, which is a total of 31,335 out of 63,795 users (49.11%). The final number of public users used in this research was 32,460 users.

Var name | Feature name | Description
pos | Num posts | Number of total posts that the user has ever posted
flg | Num following | Number of accounts the user follows
flr | Num followers | Number of followers
bl | Biography length | Length (number of characters) of the user's biography
pic | Picture availability | Value 0 if the user has no profile picture, or 1 if they have one
lin | Link availability | Value 0 if the user has no external URL, or 1 if they have one
cl | Average caption length | The average number of characters of captions in media
cz | Caption zero | Percentage (0.0 to 1.0) of captions that have almost zero (<=3) length
ni | Non image percentage | Percentage (0.0 to 1.0) of non-image media. There are three types of media on an Instagram post, i.e. image, video, carousel
erl | Engagement rate (Like) | Engagement rate (ER) is commonly defined as (num likes) divided by (num media) divided by (num followers)
erc | Engagement rate (Comm.) | Similar to ER like, but for comments
lt | Location tag percentage | Percentage (0.0 to 1.0) of posts tagged with location
hc | Average hashtag count | Average number of hashtags used in a post
pr | Promotional keywords | Average use of promotional keywords in hashtags, i.e. {regrann, contest, repost, giveaway, mention, share, give away, quiz}
fo | Followers keywords | Average use of follower-hunter keywords in hashtags, i.e. {follow, like, folback, follback, f4f}
cs | Cosine similarity | Average cosine similarity between all pairs of posts a user has
pi | Post interval | Average interval between posts (in hours)

Output:
2-class user classes: r (real/authentic user), f (fake user / bought followers)
4-class user classes: r (authentic/real user), a (active fake user), i (inactive fake user), s (spammer fake user)
Note that the 3 fake user classes (a, i, s) were judged by human annotators.
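
The two engagement-rate features (erl and erc) follow the same formula, which can be sketched as below. The function name and the sample counts are invented, and returning 0.0 for accounts with no media or no followers is an assumption made only to avoid division by zero; the dataset description does not say how such accounts are handled.

```python
def engagement_rate(total_interactions, num_media, num_followers):
    """(interactions) divided by (num media) divided by (num followers)."""
    if num_media == 0 or num_followers == 0:
        return 0.0  # assumption: undefined cases are mapped to zero
    return total_interactions / num_media / num_followers


# Invented example: 12 media items, 1000 followers.
erl = engagement_rate(total_interactions=600, num_media=12, num_followers=1000)  # likes
erc = engagement_rate(total_interactions=24, num_media=12, num_followers=1000)   # comments
```

For this sample account, 600 likes over 12 posts gives 50 likes per post, or 0.05 per follower.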
Social media post engagement 2021-2023
statista.com
Updated Sep 27, 2024
Cite
Statista (2024). Social media post engagement 2021-2023 [Dataset]. https://www.statista.com/statistics/1274133/engagement-rate-per-post-social-media/
Explore at:
Dataset updated
Sep 27, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jan 2021 - Dec 2023
Area covered
Worldwide
Description
According to a global report conducted between January 2021 and December 2023, TikTok had the highest engagement rate per post when compared to Instagram, Facebook, and X (formerly Twitter). Overall, TikTok's average engagement rate per post was 2.65 percent in 2023. Instagram's engagement rate stood at 0.7 percent in 2023, and Facebook's stood at 0.15 percent.
Github Comments Dataset
kaggle.com
Updated Feb 8, 2025
Cite
SAMBHAJI PATIL (2025). Github Comments Dataset [Dataset]. https://www.kaggle.com/datasets/sampat05/github-comments-dataset
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Feb 8, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
SAMBHAJI PATIL
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
To train an effective spam detection model, we first needed a high-quality dataset. However, we quickly realized that there was no existing dataset for GitHub spam detection. So, we had to build our own.

We wrote a Python script that used GitHub's GraphQL API to fetch comments from some of the most active repositories, including:

TensorFlow, VSCode, Hugging Face, OpenCV, FreeCodeCamp

These repositories have high engagement, making them ideal for collecting a diverse dataset. However, there was a problem—over 95% of the comments were ham (non-spam), and spam comments were hard to find.

Why Were Spam Comments So Rare?

Even after scraping 70,000+ comments, we found only 500 spam comments. The primary reason was that spam comments are often moderated and deleted quickly by repo maintainers. We found spam only in hidden/minimized comments left by repo owners.

Initially, we tried training a binary classification model using standard spam detection datasets from Kaggle and other sources. However, the model performed poorly on GitHub comments because spam on GitHub differs from spam in normal messages, and GitHub comments contain technical jargon, which differs significantly from typical spammy content.

Our Solution: Mixing GitHub Ham with External Spam

Since GitHub spam was scarce, we trained our model using ham comments from GitHub and spam comments from external datasets.

This ensured that the model learned to accurately classify non-spam GitHub comments, minimizing false positives. The final dataset contains 109,662 comments.
Quarterly Average Second Hand Property Price by Area | gimi9.com
gimi9.com
Updated Sep 10, 2016
Cite
(2016). Quarterly Average Second Hand Property Price by Area | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_c02b3b10-39ff-4b7a-83cb-f5c084196e0f
Explore at:
Dataset updated
Sep 10, 2016
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Average house prices are derived from data supplied by the mortgage lending agencies on loans approved by them rather than loans paid. In comparing house prices figures The most current data is published on these sheets. Previously published data may be subject to revision. Any change from the originally published data will be highlighted by a comment on the cell in question. These comments will be maintained for at least a year after the date of the value change. Figure changed on 27/6/16 as revised data was received from the local authority. Prices include houses and apartments and are measured in €.
