Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1st Dec 2024. This version of the dataset has been superseded and is now restricted. Please refer to the most recent release.
Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. To address this pressing issue, we present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social.
The dataset contains the complete post history of over 4M users (81% of all registered accounts), totaling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions.
Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their “like” interactions and time of bookmarking.
Here is a description of the dataset files.
If used for research purposes, please cite the following paper describing the dataset details:
Andrea Failla and Giulio Rossetti. "I'm in the Bluesky Tonight: Insights from a Year Worth of Social Data". PLOS ONE (2024). https://doi.org/10.1371/journal.pone.0310330
Note: If your account was created after March 21st, 2024, or if you did not post on Bluesky before that date, no data about your account exists in the dataset. Before sending a data removal request, please make sure that you were active and posting on Bluesky before March 21st, 2024.
Users included in the Bluesky dataset have the right to opt out and request the removal of their data, in accordance with GDPR provisions (Article 17). It should be noted, however, that the dataset was created for scientific research purposes, thereby falling under the scenarios for which GDPR provides derogations (Article 17(3)(d) and Article 89).
We emphasize that, in compliance with GDPR (Article 4(5)), the released data has been thoroughly pseudonymized. Specifically, usernames and object identifiers (e.g., URIs) have been removed, and object timestamps have been coarsened to further protect individual privacy.
If you wish to have your activities excluded from this dataset, please submit your request to blueskydatasetmoderation@gmail.com (with subject "Removal request: [username]").
We will process your request within a reasonable timeframe.
This work is supported by:
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
2 Million Bluesky Posts
This dataset contains 2 million public posts collected from Bluesky Social's firehose API, intended for machine learning research and experimentation with social media data. The with-language-predictions config contains the same data as the default config but with language predictions added using the glotlid model.
Dataset Details
Dataset Description
This dataset consists of 2 million public posts from Bluesky Social, collected through the platform's firehose… See the full description on the dataset page: https://huggingface.co/datasets/alpindale/two-million-bluesky-posts.
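As a rough illustration of how the two configs mentioned above might be selected, here is a minimal sketch using the Hugging Face `datasets` library. The helper names `repo_id_from_url` and `stream_posts` are hypothetical, and `split="train"` is an assumption that should be checked against the dataset page. Streaming is used so that nothing is downloaded in full.

```python
from typing import Iterator, Optional
from urllib.parse import urlparse


def repo_id_from_url(url: str) -> str:
    """Extract the 'owner/name' repository id from a Hugging Face dataset URL."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected path layout: datasets/<owner>/<name>
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def stream_posts(url: str, config: Optional[str] = None, limit: int = 5) -> Iterator[dict]:
    """Yield the first `limit` records of a dataset without a full download."""
    # Imported lazily so repo_id_from_url works without the library installed.
    from datasets import load_dataset

    # split="train" is an assumption; the dataset page lists the actual splits.
    ds = load_dataset(repo_id_from_url(url), config, split="train", streaming=True)
    for i, record in enumerate(ds):
        if i >= limit:
            break
        yield record
```

Passing `config="with-language-predictions"` to `stream_posts` would request the variant with language predictions, while leaving `config` unset would request the default config.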
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
8 Million Bluesky Social Posts Collection
I've collected and curated 8 million public posts from Bluesky Social between November 27 and December 1, 2024, with an additional 12 million posts coming in the upcoming weeks. This growing dataset aims to provide researchers and developers with a comprehensive sample of real-world social media data for analysis and experimentation. This collection represents one of the largest publicly available Bluesky datasets, offering unique insights… See the full description on the dataset page: https://huggingface.co/datasets/withalim/bluesky-posts.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Five Million bluesky posts
This dataset contains 5 million public posts collected from Bluesky Social's firehose API, intended for machine learning research and experimentation with social media data. It was inspired by Alpindale's original 2 million posts dataset and expands on it with much more data. Alpindale's dataset did not include the author handles or the image URLs and metadata that were part of the posts. The images and their captions could potentially… See the full description on the dataset page: https://huggingface.co/datasets/Roronotalt/bluesky.
Traffic analytics, rankings, and competitive metrics for bluesky.social as of June 2025
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
including user-to-user interactions (following and blocking users) and user-to-community interactions (creating and joining communities).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the datasets required to reproduce the results presented in the paper "The Rise of Bluesky."
Due to its large size, the dataset used to construct the follower network in Fig. 1c is not included here. However, it may be made available upon request under exceptional circumstances.
Bluesky experienced rapid user growth in late 2024. The platform's user base expanded from 14.5 million in October 2024 to 38 million by August 2025, showcasing its increasing popularity among social media users seeking new options.
Surge in downloads and user engagement
The platform's growth was particularly notable following the U.S. presidential elections in November 2024, when monthly downloads surged to 7.35 million. This increase in user adoption coincided with rising demand for Twitter alternatives. Earlier in the year, Bluesky had already shown strong performance, with 38,000 downloads from Android devices and 30,000 from iOS devices in July 2024.
Moderation challenges and user demographics
As Bluesky's user base expanded, so did the need for content moderation. In 2024, the platform received 6.48 million reports to its moderation service, a significant increase from 358,000 reports in 2023. These reports included 1.75 million for anti-social behavior, 1.2 million for misleading content, and 1.4 million for spam.
Bluesky saw a significant increase in user reports to its moderation service in 2024. The number of reports jumped from 358,000 in 2023 to 6.48 million in 2024, indicating a growing user base and increased platform activity. This surge in moderation reports coincided with a spike in monthly downloads, particularly after the U.S. presidential elections in November 2024, when Bluesky downloads reached 7.35 million.
Breakdown of moderation reports
The 6.48 million reports submitted to Bluesky's moderation service in 2024 covered various issues. Anti-social behavior accounted for 1.75 million reports, while misleading content and spam received 1.2 million and 1.4 million reports, respectively. These figures suggest that users actively engaged in flagging content that violated platform guidelines. Additionally, Bluesky received 238 requests from law enforcement, governments, and legal entities, responding to 182 of them. The most common legal requests were for user data, followed by takedown requests and inquiries.
Comparison with other platforms
While Bluesky experienced growth in user reports, other social media platforms like Facebook saw fluctuations in content moderation. In the third quarter of 2024, Facebook removed 6.4 million pieces of hate speech content, down from 7.2 million in the previous quarter. Similarly, Facebook took action on 7.6 million pieces of bullying and harassment related content in the same period, a slight decrease from 7.8 million in the previous quarter. These comparisons highlight the ongoing challenges social media platforms face in content moderation and user safety.
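To put the moderation figures quoted above in proportion, the following short sketch derives the year-over-year growth factor, each named category's share of the 2024 total, and the response rate for legal requests. It uses only the numbers stated in the text. The variable names are illustrative, and the "other" bucket is simply the remainder not attributed to a named category.

```python
# Figures as quoted in the text above (Bluesky moderation reports).
reports_2023 = 358_000
reports_2024 = 6_480_000
by_category = {
    "anti-social behavior": 1_750_000,
    "misleading content": 1_200_000,
    "spam": 1_400_000,
}
legal_requests, legal_responses = 238, 182

# Year-over-year growth in total reports.
growth_factor = reports_2024 / reports_2023

# Share of the 2024 total taken by each named category.
shares = {name: count / reports_2024 for name, count in by_category.items()}

# Reports not covered by the three named categories.
other = reports_2024 - sum(by_category.values())

# Fraction of legal requests that received a response.
response_rate = legal_responses / legal_requests

print(f"growth 2023 -> 2024: {growth_factor:.1f}x")
for name, share in shares.items():
    print(f"{name}: {share:.1%} of 2024 reports")
print(f"other categories: {other:,} reports")
print(f"legal requests answered: {response_rate:.1%}")
```

On these numbers, reports grew roughly 18-fold, the three named categories together account for about two thirds of the 2024 total, and about three quarters of legal requests received a response.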
This dataset provides information about the number of properties, residents, and average property values for Blue Sky cross streets in Springville, IN.
Bluesky scenarios and result files used in the work "Velocity Obstacle Based Conflict Avoidance in Urban Environment with Variable Speed Limit". The scenario files can be used with the Bluesky simulator tool implementation found at https://github.com/TUDelft-CNS-ATM/bluesky. The result files exhibit the results obtained with the previous tool.
Bluesky scenarios and result files used in the work "Review of Conflict Resolution Methods for Manned and Unmanned Aviation". The scenario files can be used with the Bluesky simulator tool implementation found at https://github.com/TUDelft-CNS-ATM/bluesky. The result files exhibit the results obtained with the previous tool.
Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.
https://www.semrush.com/company/legal/terms-of-service/
bluesky.com is ranked #67329 in US with 119.79K Traffic. Categories: Retail. Learn more about website traffic, market share, and more!
This dataset provides information about the number of properties, residents, and average property values for Blue Sky Drive cross streets in Jeffersonville, KY.
This data set contains the output and products from the daily runs of the United States Forest Service (USFS) BlueSky modeling framework for the WE-CAN and BB-FLUX field projects. The runs for WE-CAN utilized the University of Washington 4-km resolution WRF model over the Pacific Northwest. The data set includes the model output in NetCDF format, KMZ files for display of the PM2.5 column average forecasts, KMZ files of the active fire locations, and PM2.5 forecast imagery all contained within daily gzipped tar files.
This dataset provides information about the number of properties, residents, and average property values for Blue Sky Loop cross streets in Jeffersonville, IN.
https://choosealicense.com/licenses/cc0-1.0/
Nightsky 40M Dataset
~40 million posts from the Bluesky Firehose API, reasonably anonymized. Licensed under CC0 and completely independently sourced to avoid licensing issues. Use it as you wish! Very little preprocessing.
Request data deletion
A user may request removal of their data by e-mailing nightsky-rm@proton.me with a subject line of "Delete My Data". As I don't collect usernames/DIDs, you must specify the position of every individual row you would like to be… See the full description on the dataset page: https://huggingface.co/datasets/Aranym/40-million-bluesky-posts.