Bluesky experienced rapid user growth in late 2024. The platform's user base expanded from 14.5 million in October to 25 million by December, showcasing its increasing popularity among social media users seeking new options.

Surge in downloads and user engagement

The platform's growth was particularly notable following the U.S. presidential elections in November 2024, when monthly downloads surged to 7.35 million. This increase in user adoption coincided with rising demand for Twitter alternatives. Earlier in the year, Bluesky had already shown strong performance, with 38,000 downloads from Android devices and 30,000 from iOS devices in July 2024.

Moderation challenges and user demographics

As Bluesky's user base expanded, so did the need for content moderation. In 2024, the platform received 6.48 million reports to its moderation service, a significant increase from 358,000 reports in 2023. These reports included 1.75 million for anti-social behavior, 1.2 million for misleading content, and 1.4 million for spam.
According to a 2024 survey conducted in the United States, daily users of Bluesky were more likely to use YouTube, TikTok, and X daily. Overall, Threads users are more likely to use other Meta Platforms networks every day.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the datasets required to reproduce the results presented in the paper "The Rise of Bluesky."
Due to its large size, the dataset used to construct the follower network in Fig. 1c is not included here. However, it may be made available upon request under exceptional circumstances.
https://bsky.social/about/support/tos
Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. We present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social to address this pressing issue.
The dataset contains the complete post history of over 4M users (81% of all registered accounts), totaling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions.
Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their “like” interactions and time of bookmarking.
Here is a description of the dataset files.
If used for research purposes, please cite the following paper describing the dataset details:
Andrea Failla and Giulio Rossetti. "I'm in the Bluesky Tonight: Insights from a Year's Worth of Social Data." PLOS ONE (2024) https://doi.org/10.1371/journal.pone.0310330
Note: If your account was created after March 21st, 2024, or if you did not post on Bluesky before that date, no data about your account exists in the dataset. Before sending a data removal request, please make sure that you were active and posting on Bluesky before March 21st, 2024.
Users included in the Bluesky Social dataset have the right to opt-out and request the removal of their data, per GDPR provisions (Article 17).
We emphasize that the released data has been thoroughly pseudonymized in compliance with GDPR (Article 4(5)). Specifically, usernames and object identifiers (e.g., URIs) have been removed, and object timestamps have been coarsened to protect individual privacy further and minimize reidentification risk. Moreover, it should be noted that the dataset was created for scientific research purposes, thereby falling under the scenarios for which GDPR provides opt-out derogations (Article 17(3)(d) and Article 89).
Nonetheless, if you wish to have your activities excluded from this dataset, please submit your request to blueskydatasetmoderation@gmail.com (with the subject "Removal request: [username]"). We will process your request within a reasonable timeframe. Updates will occur monthly, if necessary, and access to previous versions will be restricted.
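The timestamp coarsening mentioned above can be illustrated with a minimal sketch. The dataset description does not state which granularity was used, so the hour and day levels below, along with the function name `coarsen_timestamp`, are assumptions chosen purely for illustration and not the authors' actual procedure.

```python
from datetime import datetime, timezone


def coarsen_timestamp(ts: datetime, granularity: str = "hour") -> datetime:
    """Truncate a timestamp to the start of its hour or day.

    Hypothetical helper: the real dataset's granularity is not documented here.
    """
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity!r}")


# Made-up example timestamp, not taken from the dataset.
example = datetime(2024, 3, 20, 14, 37, 52, tzinfo=timezone.utc)
coarse_hour = coarsen_timestamp(example)
coarse_day = coarsen_timestamp(example, "day")
print(coarse_hour.isoformat())
print(coarse_day.isoformat())
```

Truncating rather than rounding keeps every coarsened value at or before the original time, so the relative order of events in different buckets is preserved while the exact posting moment is hidden.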
This work is supported by:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
POLITISKY24 (Political Stance Analysis on Bluesky for 2024) is a first-of-its-kind dataset for stance detection, focused on the 2024 U.S. presidential election. It is designed for target-specific user-level stance detection and contains 16,044 user-target stance pairs centered on two key political figures, Kamala Harris and Donald Trump. In addition, this dataset includes detailed metadata, such as complete user posting histories and engagement graphs (likes, reposts, and quotes).
Stance labels were generated using a robust and evaluated pipeline that integrates state-of-the-art Information Retrieval (IR) techniques with Large Language Models (LLMs), offering confidence scores, reasoning explanations, and text spans for each label. With an LLM-assisted labeling accuracy of 81%, POLITISKY24 provides a rich resource for the target-specific stance detection task. This dataset enables the exploration of the Bluesky platform, paving the way for deeper insights into political opinions and social discourse, and addressing gaps left by traditional datasets constrained by platform policies.
In the uploaded files:
The file user_post_history_dataset.parquet includes the posting history of 8,561 active Bluesky users who have shared content related to American politics.
The file user_post_list_for_stance_detection.parquet contains a list of up to 1,000 recent English-language post IDs per user, intended for use in the stance detection task.
The file user_network_dataset.parquet captures users’ interactions through likes, reposts, and quotes.
The file human_annotated_validation_user_stance_dataset.parquet contains human-annotated stance labels for 445 validation users toward Trump and Harris, resulting in a total of 890 user-target pairs. The labels are divided into three stances: 1 (favor), 2 (against), and 3 (neither).
The file llm_annotated_validation_user_stance_dataset.parquet contains stance labels annotated by an LLM for the same 445 validation users toward Trump and Harris, also totaling 890 user-target pairs. In addition to stance labels, each pair includes an explanation of the reasoning, the source tweets, spans from the source tweets used in the reasoning, and a confidence score.
The file llm_annotated_full_user_stance_dataset.parquet is similar to the above LLM-annotated validation file but covers all dataset users excluding the validation set. It provides stance labels for 8,022 users toward Trump and Harris, totaling 16,044 user-target pairs.
The file human_annotated_validation_stance_relevancy_dataset (post-target entity pairs).parquet contains human-annotated stance labels for 175 validation posts toward Trump and Harris, resulting in 350 post-target pairs. The labels are divided into three stances: 1 (favor), 2 (against), and 3 (neither).
The file human_annotated_validation_stance_relevancy_dataset (query-post stance relevancy pairs).parquet contains 700 query-post stance relevancy pairs derived from the post-target entity pairs.
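As a rough illustration of how the human-annotated and LLM-annotated validation labels could be compared, the sketch below decodes the stance codes 1 (favor), 2 (against), and 3 (neither) and computes simple agreement over shared user-target pairs. The dictionary layout, the helper `label_agreement`, and the toy records are all hypothetical; the actual column names in the parquet files are not described here, and this is not necessarily how the reported 81% accuracy was computed.

```python
from collections import Counter

# Stance codes as described for the annotated files.
STANCE_NAMES = {1: "favor", 2: "against", 3: "neither"}


def label_agreement(human: dict, llm: dict) -> float:
    """Fraction of shared (user, target) pairs on which both labels match."""
    shared = human.keys() & llm.keys()
    if not shared:
        raise ValueError("no shared user-target pairs")
    matches = sum(1 for key in shared if human[key] == llm[key])
    return matches / len(shared)


# Hypothetical toy data: (user, target) -> stance code. Not real dataset rows.
human = {
    ("u1", "Trump"): 2, ("u1", "Harris"): 1,
    ("u2", "Trump"): 1, ("u2", "Harris"): 3,
}
llm = {
    ("u1", "Trump"): 2, ("u1", "Harris"): 1,
    ("u2", "Trump"): 1, ("u2", "Harris"): 2,
}

agreement = label_agreement(human, llm)
distribution = Counter(STANCE_NAMES[code] for code in human.values())
print(f"agreement: {agreement:.2f}")
print(dict(distribution))
```

Keying each label by a (user, target) pair mirrors the user-target structure described above, where every validation user contributes one pair for Trump and one for Harris.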
As of March 2023, the decentralized social media platform Mastodon had over ten million registered users. In November 2022, there were 2.5 million users registered to the online network, an increase of around 300 percent within five months. Additionally, Mastodon, which shares micro-blogging features with Twitter, gained roughly 500 thousand users within ten days of Elon Musk's Twitter takeover on October 27th, 2022.