Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset released with the paper titled: "Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from the Politically Incorrect Board".
The dataset is a single newline-delimited JSON (NDJSON) file. Each line in the file is a JSON object that holds a full 4chan /pol/ thread. The JSON objects contain all the key-value pairs returned by the 4chan API, along with three additional keys (entities, perspectives, and extracted_poster_id).
For each JSON object, we complement the data with the list of named entities we detect for each post, using the spaCy Python library. In addition, for each post we add the scores returned by Google's Perspective API: seven scores, each in the [0, 1] interval.
For the detailed description of every key in the JSON structure, along with the type of the value, please read the readme.pdf file provided with this dataset.
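Because the file holds one thread per line, it can be streamed rather than loaded whole. The following is a minimal sketch, not the authors' tooling: the helper names iter_threads and count_posts are hypothetical, and the "posts" key is an assumption based on the shape of 4chan API thread responses; readme.pdf remains the authoritative description of the keys.

```python
import json


def iter_threads(path):
    """Yield one thread (a dict) per non-empty line of an NDJSON file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)


def count_posts(thread):
    """Count the posts in a thread.

    Assumption: posts sit in a list under the "posts" key, as in a
    4chan API thread response. Check readme.pdf for the actual layout.
    """
    return len(thread.get("posts", []))
```

For example, sum(count_posts(t) for t in iter_threads("pol_062016-112019_labeled.ndjson")) would total the posts while keeping only one thread in memory at a time.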
If you find our dataset useful, please cite our paper:
@article{papasavva2020raiders,
  title={Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from the Politically Incorrect Board},
  author={Antonis Papasavva and Savvas Zannettou and Emiliano De Cristofaro and Gianluca Stringhini and Jeremy Blackburn},
  journal={14th International AAAI Conference On Web And Social Media (ICWSM), 2020},
  year={2020}
}
How to extract the data:
Note that the data is compressed. See the instructions below on how to extract the data:
Step 1: Open a terminal window and navigate to the path where the file pol_0616-1119_labeled.tar.zst is located.
Step 2: Run the following command:
unzstd pol_0616-1119_labeled.tar.zst
The above command produces a file named pol_0616-1119_labeled.tar in the same directory.
Step 3: Again, from your terminal window, run this command:
tar -xvf pol_0616-1119_labeled.tar
When the above command finishes, the extracted data will be in the same directory, in a file named pol_062016-112019_labeled.ndjson.
Many applications that can extract this data on Windows are available online. The authors cannot recommend specific applications. Note that the file is compressed twice, so you will need to perform the extraction twice: once on the downloaded file, and once on the file extracted from it.
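Where the tar command is not available, the second extraction stage can be approximated with Python's standard tarfile module. This is a hedged sketch rather than a recommended tool: extract_tar is a hypothetical helper, it assumes the .tar.zst file has already been decompressed to a plain .tar by some zstd-capable program, and it writes every regular file directly into the destination directory, ignoring any folder structure stored in the archive.

```python
import shutil
import tarfile
from pathlib import Path


def extract_tar(tar_path, dest_dir):
    """Extract the regular files of an uncompressed .tar into dest_dir.

    Only each member's base name is used, so nothing can be written
    outside dest_dir. Returns the list of paths that were written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    # "r:" opens the archive as a plain, uncompressed tar.
    with tarfile.open(tar_path, mode="r:") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and devices
            target = dest / Path(member.name).name
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)  # copy in chunks
            written.append(target)
    return written
```

Calling extract_tar("pol_0616-1119_labeled.tar", ".") would then play the role of Step 3 above.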
If you encounter any problems, please do not hesitate to contact the author of this study at: antonis.papasavva@ucl.ac.uk
https://bsky.social/about/support/tos
Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. We present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social to address this pressing issue.
The dataset contains the complete post history of over 4M users (81% of all registered accounts), totaling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions.
Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their “like” interactions and time of bookmarking.
Here is a description of the dataset files.
If used for research purposes, please cite the following paper describing the dataset details:
Andrea Failla and Giulio Rossetti. "I'm in the Bluesky Tonight: Insights from a Year's Worth of Social Data." PlosOne (2024) https://doi.org/10.1371/journal.pone.0310330
Note: If your account was created after March 21st, 2024, or if you did not post on Bluesky before that date, no data about your account exists in the dataset. Before sending a data removal request, please make sure that you were active and posting on Bluesky before March 21st, 2024.
Users included in the Bluesky Social dataset have the right to opt-out and request the removal of their data, per GDPR provisions (Article 17).
We emphasize that the released data has been thoroughly pseudonymized in compliance with GDPR (Article 4(5)). Specifically, usernames and object identifiers (e.g., URIs) have been removed, and object timestamps have been coarsened to protect individual privacy further and minimize reidentification risk. Moreover, it should be noted that the dataset was created for scientific research purposes, thereby falling under the scenarios for which GDPR provides opt-out derogations (Article 17(3)(d) and Article 89).
Nonetheless, if you wish to have your activities excluded from this dataset, please submit your request to blueskydatasetmoderation@gmail.com (with the subject "Removal request: [username]"). We will process your request within a reasonable timeframe - updates will occur monthly, if necessary, and access to previous versions will be restricted.
This work is supported by :
https://www.bco-dmo.org/dataset/773623/license
These data were used in a structural analysis study to evaluate how pCO2 and an additional stressor, elevated temperature, influence byssal thread quality and production. Mussels (M. trossulus) were collected in May 2012 from Argyle Creek, San Juan Island, WA (48.52° N, 123.01° W) and held in a mesh box submerged under the dock at Friday Harbor Laboratories (FHL), San Juan Island, WA for up to 14 d. Mussels were placed in controlled temperature and pCO2 treatments in the Ocean Acidification Experimental Laboratory (OAEL), then newly produced threads were counted and pulled to failure to determine byssus strength.
access_formats=.htmlTable,.csv,.json,.mat,.nc,.tsv
acquisition_description=Mussels (M. trossulus) were collected in May 2012 from Argyle Creek, San Juan Island, WA (48.52° N, 123.01° W) and held in a mesh box submerged under the dock at Friday Harbor Laboratories (FHL), San Juan Island, WA for up to 14 d. Mussels were placed in experimental mesocosms in the Ocean Acidification Experimental Laboratory (OAEL) at FHL as described in O’Donnell et al. (2013) and Timmins-Schiffman et al. (2012). Briefly, manipulations of pH were made by bubbling CO2 into a 150 L temperature-controlled seawater reservoir that supplied water to eight 3.5 L chambers at a turnover rate of 50 ml min-1. Air was bubbled into the reservoir to maintain 100% oxygen saturation and submersible pumps (model number P396, Annex Depot, Sacramento, CA) provided mixing in the chambers at 3.8 L min-1. The bottom of each chamber was lined with autoclaved pebbles, collected from an FHL beach, to provide a substrate for byssal thread attachment. pH and temperature were monitored continuously in each water reservoir with a Durafet pH and temperature probe, and the full carbonate chemistry of the system was evaluated with DIC and total alkalinity measurements once during each trial.
Mussels were acclimated to their treatment temperatures in ambient pH (~7.8) over 9 d, ramping temperature up no more than 2 °C per day, and fed a maintenance level of Shellfish Diet 1800 (6 g l-1 day-1, Reed Mariculture, Campbell, CA, USA).
The twelve independent temperature x pCO2 treatments spanned the range of local marine conditions (Newcomb, 2015; George et al., 2019; temperature at 10 °C, 18 °C, or 25 °C and pCO2 at 400, 750, 1200, or 2500 µatm). Each mussel was trimmed of external byssus before placement in an experimental treatment for 3 d, sufficient time to produce new mature byssal threads (Bell & Gosline 1996) while minimizing the effect of treatment on mussel condition. Mussels were starved during the 3 d trials to minimize changes in chamber water chemistry due to food addition and to reduce fouling. Three trials were conducted in succession to replicate treatments over time, increasing sample size (n=8 x 3) for each temperature x pCO2 treatment.
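The design arithmetic can be checked with a few lines of Python. This is only an illustrative sketch: the constant names are hypothetical, and reading "n=8 x 3" as eight mussels per treatment in each of three trials is an assumption.

```python
from itertools import product

# Treatment levels as listed in the description.
TEMPERATURES_C = (10, 18, 25)
PCO2_UATM = (400, 750, 1200, 2500)

# Assumption: "n=8 x 3" means 8 mussels per treatment per trial, 3 trials.
MUSSELS_PER_TRIAL = 8
TRIALS = 3

# Every temperature paired with every pCO2 level.
treatments = list(product(TEMPERATURES_C, PCO2_UATM))
mussels_per_treatment = MUSSELS_PER_TRIAL * TRIALS

print(len(treatments), "treatments,", mussels_per_treatment, "mussels each")
```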
At the end of each trial, mussels and the rocks to which they had attached with byssal threads were removed from the chambers. The entire byssus was dissected from each mussel and stored air-dried for up to 20 days. Byssus was rehydrated in seawater prior to testing, a method that does not alter the mechanical properties of the byssal threads (Brazee, 2004). The number of byssal threads each mussel produced was counted, and one thread was haphazardly chosen for mechanical testing following the procedure of Bell & Gosline (1996). Briefly, an individual thread was clamped with submersible pneumatic grips on either end by holding the proximal byssal stem between cardstock with cyanoacrylate glue and affixing the distal plaque with attached rock to an aluminum T-bar with epoxy. An Instron 5565 tensometer (Norwood, MA, USA) extended the thread at a rate of 10 mm min-1 in a temperature-controlled water bath (3130-100 BioPuls Bath, Instron, Norwood, MA, USA) until failure. The tensometer measured force (±10^-3 N) and extension (±10^-3 mm) at 10 Hz. Tests were performed in seawater with a pH of 7.8 and the relevant treatment temperature.
Pull-to-failure mechanical tests provided estimates of thread breaking force, yield force, extensibility, initial stiffness, and failure location (Bell & Gosline 1996). Yield, due to quasi-plastic deformation in the distal region, was identified as the point where the initial slope of the force-extension curve decreased by 40%. Extensibility was calculated by dividing thread extension at failure by initial length, and initial stiffness was determined from the initial slope of the force-extension curve. The location of failure (proximal, plaque, and/or distal region) was noted, and threads were retested to quantify the breaking force of each remaining region. Tests that broke at the grips were considered underestimates and were discarded.
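The definitions above can be illustrated on a sampled force-extension curve. What follows is a rough sketch and not the authors' analysis code: the function names are hypothetical, and estimating the initial slope as the mean of the first few point-to-point slopes (n_initial) is an assumption, since the description does not say how the slope was fitted.

```python
def extensibility(extension_at_failure_mm, initial_length_mm):
    """Extension at failure divided by initial thread length (unitless)."""
    return extension_at_failure_mm / initial_length_mm


def local_slopes(extension, force):
    """Point-to-point slopes of the force-extension curve."""
    return [
        (force[i + 1] - force[i]) / (extension[i + 1] - extension[i])
        for i in range(len(force) - 1)
    ]


def initial_stiffness(extension, force, n_initial=3):
    """Assumed estimator: mean of the first n_initial local slopes."""
    slopes = local_slopes(extension, force)[:n_initial]
    return sum(slopes) / len(slopes)


def yield_point(extension, force, n_initial=3, drop=0.40):
    """First (extension, force) point where the local slope has fallen by
    `drop` (40% by default) relative to the initial slope, or None."""
    k0 = initial_stiffness(extension, force, n_initial)
    for i, k in enumerate(local_slopes(extension, force)):
        if i >= n_initial and k <= (1.0 - drop) * k0:
            return extension[i], force[i]
    return None
```

With extension [0, 1, 2, 3, 4, 5] and force [0, 2, 4, 6, 7, 8], the initial slope is 2 and the slope halves after the fourth point, so yield_point returns (3, 6).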
The cross-sectional area of the proximal region was measured to evaluate morphological differences among treatments. The elliptical area was estimated from measures of the major and minor axes (± 1 µm) using a dissecting microscope (Brazee & Carrington 2006). Proximal breaking stress (N mm-2), a material property, was calculated as proximal breaking force divided by proximal area. Thread surface structure was examined using a scanning electron microscope (FEI Sirion XL30 SEM, Hillsboro, OR).
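The area and stress calculations reduce to two short formulas, sketched here under stated assumptions: the function names are hypothetical, the measured axes are taken to be full axis lengths (so the ellipse area is pi/4 times major times minor), and inputs are assumed to be already converted to millimetres so that stress comes out in N mm-2.

```python
import math


def ellipse_area_mm2(major_axis_mm, minor_axis_mm):
    """Area of an ellipse from its full major and minor axis lengths.

    Assumption: the axes are full lengths, so the semi-axes are half
    of each and area = pi * (major / 2) * (minor / 2).
    """
    return math.pi * major_axis_mm * minor_axis_mm / 4.0


def breaking_stress_n_per_mm2(breaking_force_n, area_mm2):
    """Proximal breaking force divided by proximal cross-sectional area."""
    return breaking_force_n / area_mm2
```

For instance, a thread with axes of 0.2 mm and 0.1 mm has an area of about 0.0157 mm2, and a breaking force of 0.5 N over that area corresponds to roughly 32 N mm-2.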
Whole mussel attachment strength was estimated using two mathematical models developed by Bell & Gosline (1996). Each model assumes a mussel is anchored with a constant thread number (n=50) arranged in a circle. The normal model estimates dislodgment force perpendicular to the substrate (e.g., lift); all threads are engaged and extend until they reach their maximum force. The parallel model estimates dislodgment force for an animal pulled parallel to the substrate (e.g., drag); threads on the upstream side are the first in tension, yield and extend until they reach maximum force and break, while more threads are recruited into tension until they have all broken. Additionally, we modified each model to incorporate the variation in thread production across treatments. Because thread production was measured for only three days, treatment means were scaled to a maximum value of 50 threads.
Detailed methods and results are provided in Newcomb (2015) and Newcomb et al. (2019).
Location: Friday Harbor Laboratories, Friday Harbor WA
awards_0_award_nid=55120
awards_0_award_number=OCE-1041213
awards_0_data_url=http://www.nsf.gov/awardsearch/showAward?AWD_ID=1041213
awards_0_funder_name=NSF Division of Ocean Sciences
awards_0_funding_acronym=NSF OCE
awards_0_funding_source_nid=355
awards_0_program_manager=Mary Beth Saffo
awards_0_program_manager_nid=51608
cdm_data_type=Other
comment=Thread number PI: Emily Carrington Data Version 1: 2019-07-24
Conventions=COARDS, CF-1.6, ACDD-1.3
data_source=extract_data_as_tsv version 2.3 19 Dec 2019
defaultDataQuery=&time<now
doi=10.1575/1912/bco-dmo.773623.1
infoUrl=https://www.bco-dmo.org/dataset/773623
institution=BCO-DMO
instruments_0_acronym=Materials Testing System
instruments_0_dataset_instrument_description=Instron’s (Norwood, MA) electromechanical testing systems are used to test a wide range of materials in tension or compression. The series 5560 are dual column table top models, the 5565 model has a load capacity of 5 kN (1125 lbf).
instruments_0_dataset_instrument_nid=773637
instruments_0_description=Testing systems that are used to test a wide range of materials in tension or compression.
instruments_0_instrument_name=Materials Testing System
instruments_0_instrument_nid=718
instruments_0_supplied_name=Instron 5565 load frame (Norwood, MA)
keywords_vocabulary=GCMD Science Keywords
metadata_source=https://www.bco-dmo.org/api/dataset/773623
param_mapping={'773623': {}}
parameter_source=https://www.bco-dmo.org/mapserver/dataset/773623/parameters
people_0_affiliation=University of Washington
people_0_affiliation_acronym=UW
people_0_person_name=Emily Carrington
people_0_person_nid=51609
people_0_role=Principal Investigator
people_0_role_type=originator
people_1_affiliation=Woods Hole Oceanographic Institution
people_1_affiliation_acronym=WHOI BCO-DMO
people_1_person_name=Karen Soenen
people_1_person_nid=748773
people_1_role=BCO-DMO Data Manager
people_1_role_type=related
project=OA - Ecomaterials Perspective
projects_0_acronym=OA - Ecomaterials Perspective
projects_0_description=Effects of Ocean Acidification on Coastal Organisms: An Ecomaterials Perspective This award will support researchers based at the University of Washington's Friday Harbor Laboratories. The overall focus of the project is to determine how ocean acidification affects the integrity of biomaterials and how these effects in turn alter interactions among members of marine communities. The research plan emphasizes an ecomaterial approach; a team of biomaterials and ecomechanics experts will apply their unique perspective to detail how different combinations of environmental conditions affect the structural integrity and ecological performance of organisms. The study targets a diversity of ecologically important taxa, including bivalves, snails, crustaceans, and seaweeds, thereby providing insight into the range of possible biological responses to future changes in climate conditions. The proposal will enhance our understanding of the ecological consequences of climate change, a significant societal problem. Each of the study systems has broader impacts in fields beyond ecomechanics. Engineers are particularly interested in biomaterials and in each system there are materials with commercial potential. The project will integrate research and education by supporting doctoral student dissertation research, providing undergraduate research opportunities via three training programs at FHL, and summer internships for talented high school students, recruited from the FHL Science Outreach Program. The participation of underrepresented groups will be broadened by actively recruiting URM and female students.