Bluesky experienced rapid user growth in late 2024. The platform's user base expanded from 14.5 million in October to 25 million by December, showcasing its increasing popularity among social media users seeking new options.
Surge in downloads and user engagement
The platform's growth was particularly notable following the U.S. presidential elections in November 2024, when monthly downloads surged to 7.35 million. This increase in user adoption coincided with rising demand for Twitter alternatives. Earlier in the year, Bluesky had already shown strong performance, with 38,000 downloads from Android devices and 30,000 from iOS devices in July 2024.
Moderation challenges and user demographics
As Bluesky's user base expanded, so did the need for content moderation. In 2024, the platform received 6.48 million reports to its moderation service, a significant increase from 358,000 reports in 2023. These reports included 1.75 million for anti-social behavior, 1.2 million for misleading content, and 1.4 million for spam.
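The figures quoted above can be turned into relative changes with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers stated in the text; the helper names `percent_change` and `share` are illustrative and not part of any published tooling.

```python
def percent_change(old, new):
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100.0


def share(part, total):
    """Part of a total, expressed in percent."""
    return part / total * 100.0


# Figures as quoted in the text above.
users_oct, users_dec = 14_500_000, 25_000_000
reports_2023, reports_2024 = 358_000, 6_480_000
reports_by_category = {
    "anti-social behavior": 1_750_000,
    "misleading content": 1_200_000,
    "spam": 1_400_000,
}

user_growth = percent_change(users_oct, users_dec)
report_ratio = reports_2024 / reports_2023

print(f"User growth, October to December 2024: {user_growth:.1f}%")
print(f"Moderation reports, 2024 vs. 2023: {report_ratio:.1f}x")
for category, count in reports_by_category.items():
    print(f"{category}: {share(count, reports_2024):.1f}% of 2024 reports")
```

The three named categories sum to 4.35 million reports, so roughly 2.13 million of the 2024 reports fall into categories that the text does not list.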
As of December 2024, 22.21 percent of global Bluesky.com visitors were aged between 25 and 34 years. Additionally, just over one in ten users were aged between 18 and 24 years.
According to a 2024 survey conducted in the United States, daily users of Bluesky were more likely to use YouTube, TikTok, and X daily. Overall, Threads users were more likely to use other Meta Platforms networks every day.
Launched in February 2023, Bluesky is a microblogging platform that was proposed as a Twitter alternative at the beginning of the year. In July 2024, the Bluesky app generated 38,000 downloads from Android devices worldwide, while iOS devices generated 30,000 downloads during the month. Bluesky is a semi-decentralized social media platform, and while users can access the platform only after being invited by members, all internet users can view Bluesky posts and content as of the last examined period.
Bluesky Social Dataset
Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. To address this pressing issue, we present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social.
The dataset contains the complete post history of over 4M users (81% of all registered accounts), totaling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions.
Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped “like” interactions and time of bookmarking.
This dataset allows unprecedented analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection, and performing content virality and diffusion analysis.
Dataset
Here is a description of the dataset files.
followers.csv.gz. This compressed file contains the anonymized follower edge list. Once decompressed, each row consists of two comma-separated integers u, v, representing a directed following relation (i.e., user u follows user v).
posts.tar.gz. This compressed folder contains data on the individual posts collected. Decompressing this file results in 100 files, each containing the full posts of up to 50,000 users. Each post is stored as a JSON-formatted line.
interactions.csv.gz. This compressed file contains the anonymized interactions edge list. Once decompressed, each row consists of six comma-separated integers and represents a comment, repost, or quote interaction. These integers correspond to the following fields, in this order: user_id, replied_author, thread_root_author, reposted_author, quoted_author, and date.
graphs.tar.gz. This compressed folder contains edge list files for the graphs emerging from reposts, quotes, and replies. Each interaction is timestamped. The folder also contains timestamped higher-order interactions emerging from discussion threads, each containing all users participating in a thread.
feed_posts.tar.gz. This compressed folder contains posts that appear in 11 thematic feeds. Decompressing this folder results in 11 files containing posts from one feed each. Each post is stored as a JSON-formatted line. Fields correspond to those in posts.tar.gz, except for those related to sentiment analysis (sent_label, sent_score) and reposts (repost_from, reposted_author).
feed_bookmarks.csv. This file contains users who bookmarked any of the collected feeds. Each record contains three comma-separated values, namely the feed name, the user id, and the timestamp.
feed_post_likes.tar.gz. This compressed folder contains data on likes to posts appearing in the feeds, one file per feed. Each record in the files contains the following information, in this order: the id of the "liker", the id of the post's author, the id of the liked post, and the like timestamp.
scripts.tar.gz. A collection of Python scripts, including the ones originally used to crawl the data and to perform experiments. These scripts are detailed in a document released within the folder.
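The file layouts described above can be read with the Python standard library alone. The following is a minimal sketch under stated assumptions, not the authors' own tooling (their scripts ship in scripts.tar.gz): the helper names are hypothetical, the files are assumed to have no header row, and interaction values are kept as raw strings because the description does not say how absent fields or dates are encoded. The demo runs on a tiny in-memory sample rather than the real followers.csv.gz.

```python
import csv
import gzip
import io
import json
from collections import Counter

# Field order as given in the description of interactions.csv.gz.
INTERACTION_FIELDS = [
    "user_id", "replied_author", "thread_root_author",
    "reposted_author", "quoted_author", "date",
]


def read_follow_edges(fileobj):
    """Yield (u, v) integer pairs, meaning user u follows user v."""
    for row in csv.reader(fileobj):
        if row:  # skip blank lines
            yield int(row[0]), int(row[1])


def read_interactions(fileobj):
    """Yield one dict per interaction row, values kept as raw strings."""
    for row in csv.reader(fileobj):
        if row:
            yield dict(zip(INTERACTION_FIELDS, row))


def read_json_lines(fileobj):
    """Yield one decoded object per non-empty JSON-formatted line."""
    for line in fileobj:
        line = line.strip()
        if line:
            yield json.loads(line)


def follower_counts(edges):
    """Count followers per user: each edge (u, v) adds one follower to v."""
    counts = Counter()
    for _follower, followed in edges:
        counts[followed] += 1
    return counts


# Demo on an in-memory gzip sample standing in for followers.csv.gz.
sample = io.BytesIO(gzip.compress(b"1,2\n3,2\n2,1\n"))
with gzip.open(sample, mode="rt", newline="") as fh:
    counts = follower_counts(read_follow_edges(fh))

print(counts.most_common(1))  # [(2, 2)]: user 2 has two followers
```

For the real files, passing a path such as `gzip.open("followers.csv.gz", "rt", newline="")` in place of the in-memory sample should work the same way. The generators stream row by row, which matters for an edge list of this size.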
Citation
If used for research purposes, please cite the following paper describing the dataset details:
Andrea Failla and Giulio Rossetti. "I'm in the Bluesky Tonight": Insights from a Year Worth of Social Data. (2024) arXiv:2404.18984
Acknowledgments: This work is supported by:
the European Union – Horizon 2020 Program under the scheme “INFRAIA-01-2018-2019 – Integrating Activities for Advanced Communities”, Grant Agreement n.871042, “SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics” (http://www.sobigdata.eu);
SoBigData.it which receives funding from the European Union – NextGenerationEU – National Recovery and Resilience Plan (Piano Nazionale di Ripresa e Resilienza, PNRR) – Project: “SoBigData.it – Strengthening the Italian RI for Social Mining and Big Data Analytics” – Prot. IR0000013 – Avviso n. 3264 del 28/12/2021;
EU NextGenerationEU programme under the funding schemes PNRR-PE-AI FAIR (Future Artificial Intelligence Research).
As of December 2024, 50.05 percent of visitors to the Bluesky.com website were male. Additionally, as of the end of 2024, the decentralized social platform had a total of 25 million users, up from 14.5 million in October 2024.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the datasets required to reproduce the results presented in the paper "The Rise of Bluesky."
Due to its large size, the dataset used to construct the follower network in Fig. 1c is not included here. However, it may be made available upon request under exceptional circumstances.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects and is filtered to the book Applying the Rasch model in social sciences using R and BlueSky statistics. It features 10 columns, including authors, average publication date, book publishers, book subject, and books. The preview is ordered by number of books (descending).
According to a 2024 survey conducted in the United States, Bluesky users were more likely than Threads users to say that AI would have at least a somewhat positive impact on their overall quality of life over the next ten years.
The microblogging platform Bluesky, which launched in February 2023, has been suggested as a Twitter substitute. As people look for a suitable replacement for the text-based social media platform, demand for Bluesky is rising. After the U.S. presidential elections in November 2024, monthly Bluesky downloads surged to 7.35 million.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. We present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social to address this pressing issue. The dataset contains the complete post history of over 4M users (81% of all registered accounts), totalling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions. Since Bluesky allows users to create and like feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped “like” interactions. This dataset allows novel analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection and performing content virality and diffusion analysis.
According to a 2024 survey conducted in the United States, 32 percent of Bluesky users read political websites and blogs daily, compared with 35 percent of Threads users. Additionally, 40 percent of Bluesky users followed politics very closely on a daily basis, compared to 33 percent of Threads users.
According to a 2024 survey conducted in the United States, daily Bluesky users were much more likely than daily Threads users to feel increased levels of stress following the 2024 U.S. election. Overall, 72 percent of Bluesky users said their stress levels increased, compared to 42 percent of Threads users.
As of March 2023, decentralized social media platform Mastodon had over ten million registered users. In November 2022, there were 2.5 million users registered to the online network, an increase of around 300 percent within five months. Additionally, Mastodon, which shares similar micro-blogging features to Twitter, gained roughly 500 thousand users within ten days of Elon Musk's Twitter takeover on October 27th, 2022.