100+ datasets found

Social Media Datasets
brightdata.com
.json, .csv, .xlsx
Updated Sep 7, 2022
Cite
Bright Data (2022). Social Media Datasets [Dataset]. https://brightdata.com/products/datasets/social-media
Explore at:
Available download formats: .json, .csv, .xlsx
Dataset updated
Sep 7, 2022
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Gain valuable insights with our comprehensive Social Media Dataset, designed to help businesses, marketers, and analysts track trends, monitor engagement, and optimize strategies. This dataset provides structured and reliable social media data from multiple platforms.

Dataset Features

User Profiles: Access public social media profiles, including usernames, bios, follower counts, engagement metrics, and more. Ideal for audience analysis, influencer marketing, and competitive research.

Posts & Content: Extract posts, captions, hashtags, media (images/videos), timestamps, and engagement metrics such as likes, shares, and comments. Useful for trend analysis, sentiment tracking, and content strategy optimization.

Comments & Interactions: Analyze user interactions, including replies, mentions, and discussions. This data helps brands understand audience sentiment and engagement patterns.

Hashtag & Trend Tracking: Monitor trending hashtags, topics, and viral content across platforms to stay ahead of industry trends and consumer interests.

Customizable Subsets for Specific Needs

Our Social Media Dataset is fully customizable, allowing you to filter data based on platform, region, keywords, engagement levels, or specific user profiles. Whether you need a broad dataset for market research or a focused subset for brand monitoring, we tailor the dataset to your needs.

Popular Use Cases

Brand Monitoring & Reputation Management: Track brand mentions, customer feedback, and sentiment analysis to manage online reputation effectively.

Influencer Marketing & Audience Analysis: Identify key influencers, analyze engagement metrics, and optimize influencer partnerships.

Competitive Intelligence: Monitor competitor activity, content performance, and audience engagement to refine marketing strategies.

Market Research & Consumer Insights: Analyze social media trends, customer preferences, and emerging topics to inform business decisions.

AI & Predictive Analytics: Leverage structured social media data for AI-driven trend forecasting, sentiment analysis, and automated content recommendations.

Whether you're tracking brand sentiment, analyzing audience engagement, or monitoring industry trends, our Social Media Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
Social Media and Mental Health - Dataset - BSOS Data Repository
bsos-data.umd.edu
Updated Jul 24, 2024
Cite
(2024). Social Media and Mental Health - Dataset - BSOS Data Repository [Dataset]. https://bsos-data.umd.edu/dataset/social-media-and-mental-health
Explore at:
Dataset updated
Jul 24, 2024
License
ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
License information was derived automatically
Description
The dataset encompasses demographic, health, and mental health information of students from 48 different states in the USA, born between 1971 and 2003. It includes data on general health ratings, responses to the PHQ-9 depression screening tool, and the GAD-7 anxiety assessment tool. It details how often students experienced various mental health symptoms over the past two weeks, their depression severity scores, and anxiety severity scores. Also, it covers experiences of feeling overwhelmed, exhausted, and hopeless within the last 12 months, along with diagnoses of depression, therapy, and medication usage. The dataset also includes information on various medical conditions, student status (full-time or international), sex, and race.
Facebook users worldwide 2017-2027
statista.com
de.statista.com
+1more
Cite
Stacy Jo Dixon, Facebook users worldwide 2017-2027 [Dataset]. https://www.statista.com/topics/1164/social-networks/
Explore at:
Dataset provided by
Statista (http://statista.com/)
Authors
Stacy Jo Dixon
Description
The global number of Facebook users was forecast to increase continuously between 2023 and 2027 by a total of 391 million users (+14.36 percent). After the fourth consecutive year of growth, the Facebook user base is estimated to reach 3.1 billion users, a new peak, in 2027. Notably, the number of Facebook users has increased continuously over the past years. User figures, shown here for the platform Facebook, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
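The forecast figures quoted above can be cross-checked with a little arithmetic. The sketch below assumes that the +14.36 percent is measured against the 2023 user base, which the description does not state explicitly; the variable names are illustrative only.

```python
# Cross-check of the forecast figures quoted in the description.
# Assumption: the +14.36 percent growth is relative to the 2023 user base.
increase_millions = 391.0   # forecast increase between 2023 and 2027
growth_rate = 0.1436        # +14.36 percent

baseline_2023 = increase_millions / growth_rate     # implied 2023 user base (millions)
forecast_2027 = baseline_2023 + increase_millions   # implied 2027 user base (millions)

print(f"Implied 2023 base: {baseline_2023 / 1000:.2f} billion")
print(f"Implied 2027 peak: {forecast_2027 / 1000:.2f} billion")
```

Under that assumption the implied 2023 base is roughly 2.72 billion users and the implied 2027 figure roughly 3.11 billion, which is consistent with the 3.1 billion quoted.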
Twitter Dataset
brightdata.com
.json, .csv, .xlsx
Updated Oct 18, 2025
+ more versions
Cite
Bright Data (2025). Twitter Dataset [Dataset]. https://brightdata.com/products/datasets/twitter
Explore at:
Available download formats: .json, .csv, .xlsx
Dataset updated
Oct 18, 2025
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Utilize our Twitter dataset for diverse applications to enrich business strategies and market insights. Analyzing this dataset provides a comprehensive understanding of social media trends, empowering organizations to refine their communication and marketing strategies. Access the entire dataset or customize a subset to fit your needs. Popular use cases include market research to identify trending topics and hashtags, AI training by reviewing factors such as tweet content, retweets, and user interactions for predictive analytics, and trend forecasting by examining correlations between specific themes and user engagement to uncover emerging social media preferences.
Social Media Influencers Dataset
figshare.com
bin
Updated Jun 25, 2023
Cite
Esther Leander (2023). Social Media Influencers Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.23576037.v1
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.23576037.v1
Dataset updated
Jun 25, 2023
Dataset provided by
figshare
Authors
Esther Leander
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This data was used in a study to determine the role of social media influencers in shaping consumer behaviour for beauty products in the US market.
marketing_social_media
huggingface.co
Updated Aug 22, 2024
Cite
Rafael Montanez (2024). marketing_social_media [Dataset]. https://huggingface.co/datasets/RafaM97/marketing_social_media
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 22, 2024
Authors
Rafael Montanez
Description
Marketing Campaigns Dataset

This repository contains a dataset specifically designed for generating marketing content. The dataset includes various features that are crucial for crafting effective marketing strategies, such as industry, channel, objective, and more. This dataset is ideal for use in machine learning models, AI-powered marketing tools, and data-driven marketing analyses.

Dataset Overview

The dataset consists of multiple entries, each representing a specific… See the full description on the dataset page: https://huggingface.co/datasets/RafaM97/marketing_social_media.
Developer Community and Code Datasets
datarade.ai
Cite
Oxylabs, Developer Community and Code Datasets [Dataset]. https://datarade.ai/data-products/developer-community-and-code-datasets-oxylabs
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset authored and provided by
Oxylabs
Area covered
El Salvador, Tuvalu, Philippines, Bahamas, Marshall Islands, South Sudan, Djibouti, United Kingdom, Guyana, Saint Pierre and Miquelon
Description
Unlock the power of ready-to-use data sourced from developer communities and repositories with Developer Community and Code Datasets.

Data Sources:

GitHub: Access comprehensive data about GitHub repositories, developer profiles, contributions, issues, social interactions, and more.

StackShare: Receive information about companies, their technology stacks, reviews, tools, services, trends, and more.

DockerHub: Dive into data from container images, repositories, developer profiles, contributions, usage statistics, and more.

Developer Community and Code Datasets are a treasure trove of public data points gathered from tech communities and code repositories across the web.

With our datasets, you'll receive:

Usernames;

Companies;

Locations;

Job Titles;

Follower Counts;

Contact Details;

Employability Statuses;

And More.

Choose from various output formats, storage options, and delivery frequencies:

Get datasets in CSV, JSON, or other preferred formats.

Opt for data delivery via SFTP or directly to your cloud storage, such as AWS S3.

Receive datasets either once or as per your agreed-upon schedule.

Why choose our Datasets?

Fresh and accurate data: Access complete, clean, and structured data from scraping professionals, ensuring the highest quality.

Time and resource savings: Let us handle data extraction and processing cost-effectively, freeing your resources for strategic tasks.

Customized solutions: Share your unique data needs, and we'll tailor our data harvesting approach to fit your requirements perfectly.

Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is trusted by Fortune 500 companies and adheres to GDPR and CCPA standards.

Pricing Options:

Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

Experience a seamless journey with Oxylabs:

Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.

Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.

Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.

Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

Empower your data-driven decisions with Oxylabs Developer Community and Code Datasets!

Number of global social network users 2017-2028

statista.com
es.statista.com
+1more

Cite

Stacy Jo Dixon, Number of global social network users 2017-2028 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Explore at:

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

How many people use social media?

Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.

Who uses social media?

Social networking is one of the most popular digital activities worldwide and it is no surprise that social networking penetration across all regions is constantly increasing. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as lesser developed digital markets catch up with other regions when it comes to infrastructure development and the availability of cheap mobile devices. In fact, most of social media’s global growth is driven by the increasing usage of mobile devices. Mobile-first market Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.

How much time do people spend on social media?

Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. On average, internet users in Latin America had the highest average time spent per day on social media.

What are the most popular social media platforms?

Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.

Instagram Dataset
brightdata.com
.json, .csv, .xlsx
Updated Apr 26, 2022
Cite
Bright Data (2022). Instagram Dataset [Dataset]. https://brightdata.com/products/datasets/instagram
Explore at:
Available download formats: .json, .csv, .xlsx
Dataset updated
Apr 26, 2022
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Access detailed insights with our Instagram datasets, featuring follower counts, verified status, account types, and engagement scores. Explore post information including URLs, descriptions, hashtags, comments, likes, media, posting dates, locations, and reel URLs. Perfect for understanding user engagement and content trends to drive informed decisions and optimize your social media strategies.

Over 750M records available

Price starts at $250/100K records

Data formats are available in JSON, NDJSON, CSV, XLSX and Parquet.

100% ethical and compliant data collection

Included datapoints:

Account, Fbid, Id, Followers, Posts Count, Is Business Account, Is Professional Account, Is Verified, Avg Engagement, External Url, Biography, Business Category Name, Category Name, Post Hashtags, Following, Posts, Profile Image Link, Profile URL, Profile Name, Highlights Count, Highlights, Full Name, Is Private, Bio Hashtags, URL, Is Joined Recently, and much more
Cheltenham's Facebook Groups
kaggle.com
zip
Updated Apr 2, 2018
Cite
Mike Chirico (2018). Cheltenham's Facebook Groups [Dataset]. https://www.kaggle.com/datasets/mchirico/cheltenham-s-facebook-group
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Apr 2, 2018
Authors
Mike Chirico
License
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Description
Facebook is becoming an essential tool for more than just family and friends. Discover how Cheltenham Township (USA), a diverse community just outside of Philadelphia, deals with major issues such as the Bill Cosby trial, everyday traffic issues, sewer I/I problems and lost cats and dogs. And yes, theft.

Communities work when they're connected and exchanging information. What and who are the essential forces making a positive impact, and when and how do conversational threads get directed or misdirected?

Use Any Facebook Public Group

You can leverage the examples here for any public Facebook group. For an example of the source code used to collect this data, and a quick start docker image, take a look at the following project: facebook-group-scrape.

Data Sources

There are 4 csv files in the dataset, with data from the following 5 public Facebook groups:

Unofficial Cheltenham Township

Elkins Park Happenings!

Free Speech Zone

Cheltenham Lateral Solutions

Cheltenham Township Residents

post.csv

These are the main posts you will see on the page. It might help to take a quick look at the page. Commas in the msg field have been replaced with {COMMA}, and apostrophes have been replaced with {APOST}.

gid: Group id (5 different Facebook groups)

pid: Main Post id

id: Id of the user posting

name: User's name

timeStamp

shares

url

msg: Text of the message posted.

likes: Number of likes

comment.csv

These are comments to the main post. Note, Facebook postings have comments, and comments on comments.

gid: Group id

pid: Matches Main Post identifier in post.csv

cid: Comment Id.

timeStamp

id: Id of user commenting

name: Name of user commenting

rid: Id of user responding to first comment

msg: Message

like.csv

These are likes and responses. The two keys in this file (pid,cid) will join to post and comment respectively.

gid: Group id

pid: Matches Main Post identifier in post.csv

cid: Matches Comments id.

response: Response such as LIKE, ANGRY etc.

id: The id of user responding

name: Name of the user responding

member.csv

These are all the members in the group. Some members never, or rarely, post or comment. You may find multiple entries in this table for the same person. The name of the individual never changes, but they change their profile picture. Each profile picture change is captured in this table. Facebook gives users a new id in this table when they change their profile picture.

gid: Group id

id: Id of the member

name: Name of the member

url: URL of the member
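As a minimal sketch of how the files described above fit together, the following Python restores the {COMMA} and {APOST} placeholders in the msg field and groups comments under their posts by (gid, pid). The column names follow the field lists above; the inline sample rows and the helper names decode_msg and comments_by_post are made up for illustration, and the real files may differ in header layout.

```python
import csv
import io
from collections import defaultdict

def decode_msg(text):
    """Restore the commas and apostrophes that were replaced in the msg field."""
    return text.replace("{COMMA}", ",").replace("{APOST}", "'")

def comments_by_post(post_rows, comment_rows):
    """Attach decoded comment messages to each post, keyed by (gid, pid)."""
    grouped = defaultdict(list)
    for row in comment_rows:
        grouped[(row["gid"], row["pid"])].append(decode_msg(row["msg"]))
    return {
        (post["gid"], post["pid"]): {
            "msg": decode_msg(post["msg"]),
            "comments": grouped.get((post["gid"], post["pid"]), []),
        }
        for post in post_rows
    }

# Hypothetical sample rows standing in for post.csv and comment.csv.
POST_CSV = (
    "gid,pid,id,name,timeStamp,shares,url,msg,likes\n"
    "1,10,7,Ann,2017-01-01,0,,Lost cat{COMMA} please help,3\n"
)
COMMENT_CSV = (
    "gid,pid,cid,timeStamp,id,name,rid,msg\n"
    "1,10,100,2017-01-02,8,Bob,,I{APOST}ll keep an eye out\n"
)

posts = list(csv.DictReader(io.StringIO(POST_CSV)))
comments = list(csv.DictReader(io.StringIO(COMMENT_CSV)))
threads = comments_by_post(posts, comments)
print(threads[("1", "10")])
```

Keying on both gid and pid is a cautious choice here: the description does not say whether post ids are unique across the five groups.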
The 42 datasets and their test and training datasets
figshare.com
application/x-rar
Updated Jul 1, 2025
Cite
Yiltan Bitirim (2025). The 42 datasets and their test and training datasets [Dataset]. http://doi.org/10.6084/m9.figshare.26135284.v1
Explore at:
Available download formats: application/x-rar
Unique identifier
https://doi.org/10.6084/m9.figshare.26135284.v1
Dataset updated
Jul 1, 2025
Dataset provided by
figshare
Authors
Yiltan Bitirim
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The 42 datasets and their test and training datasets.

For detailed information, you may read the following article.

XXXXXXXXX, "Turkish-Tweet-Based Emoji Recommendation for the Top-N Emojis", Journal Name, Vol. XX, Issue. XX, pp. XX-XX, Date.

If you want to use all or a part of these datasets, you are free to use. However, please consider the following.

Copyright belongs to the author.

Do not redistribute all or a part of these datasets.

These datasets come without any warranty. The author is not responsible for any damage caused.

All studies that include all or a part of these datasets should cite the following article:

XXXXXXXXXX, "Turkish-Tweet-Based Emoji Recommendation for the Top-N Emojis", Journal Name, Vol. XX, Issue. XX, pp. XX-XX, Date.
Social Recommendation Data
cseweb.ucsd.edu
berd-platform.de
json
+ more versions
Cite
UCSD CSE Research Project, Social Recommendation Data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
These datasets include ratings as well as social (or trust) relationships between users. Data are from LibraryThing (a book review website) and epinions (general consumer reviews).

Metadata includes

reviews

price paid (epinions)

helpfulness votes (librarything)

flags (librarything)
Dataplex: Reddit Data | Global Social Media Data | 2.1M+ subreddits: trends,...
datarade.ai
.json, .csv
Updated Aug 12, 2024
+ more versions
Cite
Dataplex (2024). Dataplex: Reddit Data | Global Social Media Data | 2.1M+ subreddits: trends, audience insights + more | Ideal for Interest-Based Segmentation [Dataset]. https://datarade.ai/data-products/dataplex-reddit-data-global-social-media-data-1-1m-mill-dataplex
Explore at:
Available download formats: .json, .csv
Dataset updated
Aug 12, 2024
Dataset authored and provided by
Dataplex
Area covered
Chile, Côte d'Ivoire, Martinique, Jersey, Botswana, Macao, Christmas Island, Holy See, Gambia, Mexico
Description
The Reddit Subreddit Dataset by Dataplex offers a comprehensive and detailed view of Reddit’s vast ecosystem, now enhanced with appended AI-generated columns that provide additional insights and categorization. This dataset includes data from over 2.1 million subreddits, making it an invaluable resource for a wide range of analytical applications, from social media analysis to market research.

Dataset Overview:

This dataset includes detailed information on subreddit activities, user interactions, post frequency, comment data, and more. The inclusion of AI-generated columns adds an extra layer of analysis, offering sentiment analysis, topic categorization, and predictive insights that help users better understand the dynamics of each subreddit.

2.1 Million Subreddits with Enhanced AI Insights: The dataset covers over 2.1 million subreddits and now includes AI-enhanced columns that provide:

Sentiment Analysis: AI-driven sentiment scores for posts and comments, allowing users to gauge community mood and reactions.

Topic Categorization: Automated categorization of subreddit content into relevant topics, making it easier to filter and analyze specific types of discussions.

Predictive Insights: AI models that predict trends, content virality, and user engagement, helping users anticipate future developments within subreddits.

Sourced Directly from Reddit:

All social media data in this dataset is sourced directly from Reddit, ensuring accuracy and authenticity. The dataset is updated regularly, reflecting the latest trends and user interactions on the platform. This ensures that users have access to the most current and relevant data for their analyses.

Key Features:

Subreddit Metrics: Detailed data on subreddit activity, including the number of posts, comments, votes, and user participation.

User Engagement: Insights into how users interact with content, including comment threads, upvotes/downvotes, and participation rates.

Trending Topics: Track emerging trends and viral content across the platform, helping you stay ahead of the curve in understanding social media dynamics.

AI-Enhanced Analysis: Utilize AI-generated columns for sentiment analysis, topic categorization, and predictive insights, providing a deeper understanding of the data.

Use Cases:

Social Media Analysis: Researchers and analysts can use this dataset to study online behavior, track the spread of information, and understand how content resonates with different audiences.

Market Research: Marketers can leverage the dataset to identify target audiences, understand consumer preferences, and tailor campaigns to specific communities.

Content Strategy: Content creators and strategists can use insights from the dataset to craft content that aligns with trending topics and user interests, maximizing engagement.

Academic Research: Academics can explore the dynamics of online communities, studying everything from the spread of misinformation to the formation of online subcultures.

Data Quality and Reliability:

The Reddit Subreddit Dataset emphasizes data quality and reliability. Each record is carefully compiled from Reddit’s vast database, ensuring that the information is both accurate and up-to-date. The AI-generated columns further enhance the dataset's value, providing automated insights that help users quickly identify key trends and sentiments.

Integration and Usability:

The dataset is provided in a format that is compatible with most data analysis tools and platforms, making it easy to integrate into existing workflows. Users can quickly import, analyze, and utilize the data for various applications, from market research to academic studies.

User-Friendly Structure and Metadata:

The data is organized for easy navigation and analysis, with metadata files included to help users identify relevant subreddits and data points. The AI-enhanced columns are clearly labeled and structured, allowing users to efficiently incorporate these insights into their analyses.

Ideal For:

Data Analysts: Conduct in-depth analyses of subreddit trends, user engagement, and content virality. The dataset’s extensive coverage and AI-enhanced insights make it an invaluable tool for data-driven research.

Marketers: Use the dataset to better understand your target audience, tailor campaigns to specific interests, and track the effectiveness of marketing efforts across Reddit.

Researchers: Explore the social dynamics of online communities, analyze the spread of ideas and information, and study the impact of digital media on public discourse, all while leveraging AI-generated insights.

This dataset is an essential resource for anyone looking to understand the intricacies of Reddit's vast ecosystem, offering the data and AI-enhanced insights needed to drive informed decisions and strategies across various fields. Whether you’re tracking emerging trends, analyzing user behavior, or conduc...
Data from: Instagram Posts Dataset
gts.ai
json
Updated Jun 25, 2024
Cite
Globose Technology Solutions Private Limited (2024). Instagram Posts Dataset [Dataset]. https://gts.ai/dataset-download/instagram-posts-dataset/
Explore at:
Available download formats: json
Dataset updated
Jun 25, 2024
Dataset authored and provided by
Globose Technology Solutions Private Limited
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Variables measured
Engagement metrics, Content performance, Audience demographics
Description
A dataset of 1968 Instagram posts totaling 5,426 images, including images, captions, and metadata for AI and computer vision applications.
Global social network penetration 2019-2028
statista.com
es.statista.com
+1more
Cite
Stacy Jo Dixon, Global social network penetration 2019-2028 [Dataset]. https://www.statista.com/topics/1164/social-networks/
Explore at:
Dataset provided by
Statista (http://statista.com/)
Authors
Stacy Jo Dixon
Description
The global social media penetration rate was forecast to increase continuously between 2024 and 2028 by a total of 11.6 percentage points (+18.19 percent). After the ninth consecutive year of growth, the penetration rate is estimated to reach 75.31 percent, a new peak, in 2028. Notably, the social media penetration rate has increased continuously over the past years.
Steam Video Game and Bundle Data
cseweb.ucsd.edu
json
+ more versions
Cite
UCSD CSE Research Project, Steam Video Game and Bundle Data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
These datasets contain reviews from the Steam video game platform, and information about which games were bundled together.

Metadata includes

reviews

purchases, plays, recommends (likes)

product bundles

pricing information

Basic Statistics:

Reviews: 7,793,069

Users: 2,567,538

Items: 15,474

Bundles: 615
COVID-19 Social Media Counts & Sentiment
covid-hub.gio.georgia.gov
Updated Apr 6, 2020
Cite
foustl32 (2020). COVID-19 Social Media Counts & Sentiment [Dataset]. https://covid-hub.gio.georgia.gov/items/feb6280d42de4e91b47cf37344a91eae
Explore at:
Dataset updated
Apr 6, 2020
Dataset authored and provided by
foustl32
Description
Update: As of August 26th, 2020, we are sunsetting updates to this free dataset. Please reach out to lyden@spatial.ai if you have interest in this data, Geosocial data, or other related datasets.

As part of an effort to provide open source resources and data related to the COVID-19 outbreak, this feature layer includes counts of social media posts that mention COVID-19, aggregated at the county level. This data is provided historically, week over week, as far back as January 26th, 2020. This feature service will be refreshed regularly to remain up to date. It was most recently updated using data collected through August 24th.

Data also includes information about the sentiment of posts collected. Posts are classified as negative, neutral, or positive and aggregated at a county level per week. To perform sentiment analysis, the VADER (Valence Aware Dictionary and sEntiment Reasoner) model was used.

This feature service was developed in collaboration between Datastory & Spatial.ai. There's a powerful story hidden in your data... Datastory can help you see it. Visit www.datastoryconsulting.com to learn more.

Social media counts and statistics come from Twitter data collected by Spatial.ai for the creation of Geosocial data, which uses machine learning to create geographic social media segmentation. Learn more about the underlying data at https://spatial.ai/esri or reach out to lyden@spatial.ai for more information.
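The weekly county-level sentiment aggregation described above can be sketched as follows. This assumes each post already carries a VADER compound score in the range -1 to 1 and applies the ±0.05 cut-offs commonly recommended for VADER; the description does not say which thresholds were actually used, and the field names, helper names, and sample rows are hypothetical.

```python
from collections import Counter, defaultdict

def classify(compound, threshold=0.05):
    """Map a VADER compound score to negative, neutral, or positive.

    The 0.05 threshold is the conventional default and is an assumption here.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

def aggregate(posts):
    """Count sentiment labels per (county, week) pair."""
    counts = defaultdict(Counter)
    for post in posts:
        counts[(post["county"], post["week"])][classify(post["compound"])] += 1
    return counts

# Hypothetical scored posts; real scores would come from a VADER analyzer.
sample = [
    {"county": "Fulton", "week": "2020-03-29", "compound": 0.62},
    {"county": "Fulton", "week": "2020-03-29", "compound": -0.41},
    {"county": "Fulton", "week": "2020-03-29", "compound": 0.01},
    {"county": "Cobb", "week": "2020-03-29", "compound": -0.30},
]
weekly = aggregate(sample)
print(dict(weekly[("Fulton", "2020-03-29")]))
```

Keeping the raw label counts per county and week, rather than a single averaged score, matches the way the description says posts are classified first and aggregated afterwards.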
Google Restaurants dataset
cseweb.ucsd.edu
csv
+ more versions
Cite
UCSD CSE Research Project, Google Restaurants dataset [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
Available download formats: csv
Dataset authored and provided by
UCSD CSE Research Project
Description
This is a multi-modal dataset for restaurants from Google Local (Google Maps). Data includes images and reviews posted by users, as well as metadata for each restaurant.
Navigating News Narratives: A Media Bias Analysis Dataset
figshare.com
txt
Updated Dec 8, 2023
Cite
Shaina Raza (2023). Navigating News Narratives: A Media Bias Analysis Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24422122.v4
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.24422122.v4
Dataset updated
Dec 8, 2023
Dataset provided by
figshare
Authors
Shaina Raza
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The prevalence of bias in the news media has become a critical issue, affecting public perception on a range of important topics such as political views, health, insurance, resource distributions, religion, race, age, gender, occupation, and climate change. The media has a moral responsibility to ensure accurate information dissemination and to increase awareness about important issues and the potential risks associated with them. This highlights the need for a solution that can help mitigate the spread of false or misleading information and restore public trust in the media.

Data description: This is a dataset for news media bias covering different dimensions of bias: political, hate speech, toxicity, sexism, ageism, gender identity, gender discrimination, race/ethnicity, climate change, occupation, and spirituality, which makes it a unique contribution. The dataset used for this project does not contain any personally identifiable information (PII).

The data structure is tabulated as follows:
Text: The main content.
Dimension: Descriptive category of the text.
Biased_Words: A compilation of words regarded as biased.
Aspect: Specific sub-topic within the main content.
Label: Indicates the presence (True) or absence (False) of bias. The label is ternary: highly biased, slightly biased, and neutral.
Toxicity: Indicates the presence (True) or absence (False) of toxicity.
Identity_mention: Mention of any identity based on word match.

Annotation Scheme
The labels and annotations in the dataset are generated through a system of Active Learning, cycling through:
Manual Labeling
Semi-Supervised Learning
Human Verification

The scheme comprises:
Bias Label: Specifies the degree of bias (e.g., no bias, mild, or strong).
Words/Phrases Level Biases: Pinpoints specific biased terms or phrases.
Subjective Bias (Aspect): Highlights biases pertinent to content dimensions.

Due to the nuances of semantic match algorithms, certain labels such as 'identity' and 'aspect' may appear distinctively different.

List of datasets used: We curated different news categories, such as climate crisis news summaries, occupational, spiritual/faith, and general news, using RSS to capture different dimensions of news media bias. The annotation is performed using active learning to label each sentence (neutral, slightly biased, or highly biased) and to pick biased words from the news.

We also utilize publicly available data from the following sources. Our attribution to others:
MBIC (media bias): Spinde, Timo, Lada Rudnitckaia, Kanishka Sinha, Felix Hamborg, Bela Gipp, and Karsten Donnay. "MBIC--A Media Bias Annotation Dataset Including Annotator Characteristics." arXiv preprint arXiv:2105.11910 (2021). https://zenodo.org/records/4474336
Hyperpartisan news: Kiesel, Johannes, Maria Mestre, Rishabh Shukla, Emmanuel Vincent, Payam Adineh, David Corney, Benno Stein, and Martin Potthast. "Semeval-2019 task 4: Hyperpartisan news detection." In Proceedings of the 13th International Workshop on Semantic Evaluation, pp. 829-839. 2019. https://huggingface.co/datasets/hyperpartisan_news_detection
Toxic comment classification: Adams, C.J., Jeffrey Sorensen, Julia Elliott, Lucas Dixon, Mark McDonald, Nithum, and Will Cukierski. 2017. "Toxic Comment Classification Challenge." Kaggle. https://kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge
Jigsaw Unintended Bias: Adams, C.J., Daniel Borkan, Inversion, Jeffrey Sorensen, Lucas Dixon, Lucy Vasserman, and Nithum. 2019. "Jigsaw Unintended Bias in Toxicity Classification." Kaggle. https://kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification
Age Bias: Díaz, Mark, Isaac Johnson, Amanda Lazar, Anne Marie Piper, and Darren Gergle. "Addressing age-related bias in sentiment analysis." In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pp. 1-14. 2018. Age Bias Training and Testing Data - Age Bias and Sentiment Analysis Dataverse (harvard.edu)
Multi-dimensional news Ukraine: Färber, Michael, Victoria Burkard, Adam Jatowt, and Sora Lim. "A multidimensional dataset based on crowdsourcing for analyzing and detecting news bias." In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 3007-3014. 2020. https://zenodo.org/records/3885351#.ZF0KoxHMLtV
Social biases: Sap, Maarten, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. "Social bias frames: Reasoning about social and power implications of language." arXiv preprint arXiv:1911.03891 (2019). https://maartensap.com/social-bias-frames/

Goal of this dataset: We want to offer open and free access to the dataset, ensuring a wide reach to researchers and AI practitioners across the world. The dataset should be user-friendly, and uploading and accessing data should be straightforward, to facilitate usage.

If you use this dataset, please cite us.

Navigating News Narratives: A Media Bias Analysis Dataset © 2023 by Shaina Raza, Vector Institute is licensed under CC BY-NC 4.0
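The column layout described above (Text, Dimension, Biased_Words, Aspect, Label, Toxicity, Identity_mention) could be read into typed records roughly as follows. This is a minimal sketch, not the author's loader: the tab delimiter, the semicolon-separated Biased_Words, the exact label spellings, and the names BiasRecord, parse_row, and load_records are all assumptions made for illustration, and the sample row is invented rather than taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass

# Column names come from the dataset description; everything else below
# (delimiter, label spellings, word separator) is an assumption.
COLUMNS = ["Text", "Dimension", "Biased_Words", "Aspect",
           "Label", "Toxicity", "Identity_mention"]
ALLOWED_LABELS = {"neutral", "slightly biased", "highly biased"}


@dataclass
class BiasRecord:
    text: str
    dimension: str
    biased_words: list
    aspect: str
    label: str
    toxicity: bool
    identity_mention: str


def parse_row(row):
    """Convert one raw row (a dict of strings) into a BiasRecord."""
    label = row["Label"].strip().lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {row['Label']!r}")
    # Assumed serialization: biased words separated by semicolons.
    words = [w.strip() for w in row["Biased_Words"].split(";") if w.strip()]
    return BiasRecord(
        text=row["Text"],
        dimension=row["Dimension"],
        biased_words=words,
        aspect=row["Aspect"],
        label=label,
        toxicity=row["Toxicity"].strip().lower() == "true",
        identity_mention=row["Identity_mention"],
    )


def load_records(handle):
    """Read tab-delimited rows with a header line from an open text handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [parse_row(row) for row in reader]


# Invented sample row, used only to exercise the parser.
SAMPLE = (
    "\t".join(COLUMNS) + "\n"
    "An invented example sentence.\tpolitical\tinvented; example\t"
    "policy\tSlightly Biased\tFalse\tnone\n"
)

records = load_records(io.StringIO(SAMPLE))
```

Checking the header and the label values up front makes a mismatch between these assumptions and the real files fail loudly instead of producing silently wrong records.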
The 64 datasets and their test and training datasets
figshare.com
zip
Updated Jul 1, 2025
Cite
Yiltan Bitirim (2025). The 64 datasets and their test and training datasets [Dataset]. http://doi.org/10.6084/m9.figshare.29445437.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.29445437.v1
Dataset updated
Jul 1, 2025
Dataset provided by
figshare
Authors
Yiltan Bitirim
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The 64 datasets and their test and training datasets.

For detailed information, you may read the following article:
XXXXXXXXX, "The Impact of Word-Length and Tweet-Length on Emoji Recommendation for Short Turkish Texts", Journal Name, Vol. XX, Issue. XX, pp. XX-XX, Date.

If you want to use all or a part of these datasets, you are free to use them. However, please consider the following:
Copyright belongs to the author.
Do not redistribute all or a part of these datasets.
These datasets come without any warranty. The author is not responsible for any damage caused.
All studies that include all or a part of these datasets should cite the following article:
XXXXXXXXXX, "The Impact of Word-Length and Tweet-Length on Emoji Recommendation for Short Turkish Texts", Journal Name, Vol. XX, Issue. XX, pp. XX-XX, Date.

Social Media Datasets

Social Media and Mental Health - Dataset - BSOS Data Repository

Facebook users worldwide 2017-2027

Twitter Dataset

Social Media Influencers Dataset

marketing_social_media

Developer Community and Code Datasets

Number of global social network users 2017-2028

Instagram Dataset

Cheltenham's Facebook Groups

The 42 datasets and their test and training datasets

Social Recommendation Data

Dataplex: Reddit Data | Global Social Media Data | 2.1M+ subreddits: trends,...

Data from: Instagram Posts Dataset

Global social network penetration 2019-2028

Steam Video Game and Bundle Data

COVID-19 Social Media Counts & Sentiment

Google Restaurants dataset

Navigating News Narratives: A Media Bias Analysis Dataset

The 64 datasets and their test and training datasets
