3 datasets found
  1. Dataset: Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from...

    • zenodo.org
    bin, pdf
    Updated Jul 22, 2024
    Cite
    Antonis Papasavva; Savvas Zannettou; Emiliano De Cristofaro; Gianluca Stringhini; Jeremy Blackburn (2024). Dataset: Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from the Politically Incorrect Board [Dataset]. http://doi.org/10.5281/zenodo.3606810
    Available download formats: bin, pdf
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Antonis Papasavva; Savvas Zannettou; Emiliano De Cristofaro; Gianluca Stringhini; Jeremy Blackburn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the dataset released with the paper titled: "Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from the Politically Incorrect Board".

    The dataset is a single newline-delimited JSON (NDJSON) file. Each line of the file is a JSON object representing one complete 4chan /pol/ thread. The JSON objects contain all the key/value pairs returned by the 4chan API, along with three additional keys: entities, perspectives, and extracted_poster_id.
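Because each line holds one self-contained thread, the file can be streamed line by line rather than loaded into memory at once. The Python sketch below does this. It assumes, following the 4chan API's thread format, that each thread object keeps its posts in a "posts" list; confirm this against readme.pdf. The function names are illustrative and are not part of the dataset release.

```python
import json
from typing import Iterator, Tuple


def iter_threads(path: str) -> Iterator[dict]:
    """Yield one thread (a dict) per non-empty line of an NDJSON file."""
    with open(path, "r", encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc


def count_threads_and_posts(path: str) -> Tuple[int, int]:
    """Count threads and posts, assuming posts are stored under a "posts" key."""
    n_threads = n_posts = 0
    for thread in iter_threads(path):
        n_threads += 1
        n_posts += len(thread.get("posts", []))
    return n_threads, n_posts
```

Streaming keeps memory use bounded by the largest single thread instead of by the whole multi-year file.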

    For each JSON object, we complement the data with the list of named entities that we detect in each post using the spaCy Python library. In addition, for each post we add the scores returned by Google's Perspective API, specifically seven scores in the [0, 1] interval.
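Since every score lies in [0, 1], a natural first step is to select posts above a threshold for one Perspective attribute. The sketch below assumes, hypothetically, that each post stores its scores in a "perspectives" dict that maps attribute names (such as "TOXICITY") to floats. The actual key layout is documented in readme.pdf and may differ. Posts without a score for the requested attribute are skipped.

```python
from typing import Iterator


def posts_with_score(thread: dict, attribute: str, threshold: float) -> Iterator[dict]:
    """Yield posts whose Perspective score for `attribute` is >= threshold.

    Assumed schema (hypothetical): post["perspectives"] = {attribute_name: score}.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1], like the scores themselves")
    for post in thread.get("posts", []):
        scores = post.get("perspectives") or {}
        score = scores.get(attribute)
        if score is not None and score >= threshold:
            yield post
```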

    For the detailed description of every key in the JSON structure, along with the type of the value, please read the readme.pdf file provided with this dataset.

    If you find our dataset useful, please cite our paper:

    @article{papasavva2020raiders,
     title={Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from the Politically Incorrect Board},
     author={Antonis Papasavva and Savvas Zannettou and Emiliano De Cristofaro and Gianluca Stringhini and Jeremy Blackburn},
     journal={14th International AAAI Conference On Web And Social Media (ICWSM), 2020},
     year={2020} 
    }

    How to extract the data:

    Note that the data is compressed. See the instructions below on how to extract the data:

    • Linux and Mac

    Step 1: Open a terminal window and navigate to the path where the file pol_0616-1119_labeled.tar.zst is located.

    Step 2: Run the following command:

    unzstd pol_0616-1119_labeled.tar.zst

    This command produces a file named pol_0616-1119_labeled.tar in the same directory.

    Step 3: Again, from your terminal window, run this command:

    tar -xvf pol_0616-1119_labeled.tar

    When this command finishes, the extracted data, a file named pol_062016-112019_labeled.ndjson, will be in the same directory.

    • Windows

    Many applications available online can extract this data on Windows; the authors cannot recommend a specific one. Note that the file is packed twice (a tar archive compressed with Zstandard), so you will need to extract it twice: once on the downloaded file, and once on the file extracted from it.
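A cross-platform alternative to a dedicated extraction tool is a short Python script that performs both steps (Zstandard decompression and tar unpacking) in a single pass. The sketch below is our own illustration, not a tool provided by the authors. It relies on the third-party zstandard package for decompression and on Python's standard tarfile module for unpacking, and its function names are hypothetical.

```python
import os
import tarfile


def extract_tar_stream(fileobj, dest: str) -> None:
    """Unpack an already-decompressed tar stream into `dest`, reading it sequentially."""
    os.makedirs(dest, exist_ok=True)
    # "r|" treats the input as a non-seekable stream, so the intermediate
    # .tar file never has to be written to disk.
    with tarfile.open(fileobj=fileobj, mode="r|") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: refuse absolute paths, "..", etc.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


def extract_tar_zst(archive_path: str, dest: str) -> None:
    """Decompress a .tar.zst archive and unpack it in one pass."""
    import zstandard  # third-party package: pip install zstandard

    with open(archive_path, "rb") as fh:
        # read_across_frames=True keeps reading if the archive holds several
        # zstd frames (requires a reasonably recent zstandard release).
        dctx = zstandard.ZstdDecompressor()
        with dctx.stream_reader(fh, read_across_frames=True) as reader:
            extract_tar_stream(reader, dest)
```

For example, extract_tar_zst("pol_0616-1119_labeled.tar.zst", ".") should leave the .ndjson file in the current directory, mirroring the two Linux and Mac steps.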

    If you face any problems, please do not hesitate to contact the author of this study at: antonis.papasavva@ucl.ac.uk

  2. Bluesky Social Dataset

    • zenodo.org
    application/gzip, csv
    Updated Jan 16, 2025
    Cite
    Andrea Failla; Giulio Rossetti (2025). Bluesky Social Dataset [Dataset]. http://doi.org/10.5281/zenodo.14669616
    Available download formats: application/gzip, csv
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrea Failla; Giulio Rossetti
    License

    https://bsky.social/about/support/tos

    Description

    Bluesky Social Dataset

    Pollution of online social spaces caused by rampant disinformation and misinformation is a growing societal concern. However, recent decisions to restrict access to social media APIs are causing a shortage of publicly available, recent social media data, hindering the advancement of computational social science as a whole. To address this pressing issue, we present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social.

    The dataset contains the complete post history of over 4M users (81% of all registered accounts), totaling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions.

    Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their “like” interactions and time of bookmarking.

    Dataset

    Here is a description of the dataset files.

    • followers.csv.gz. This compressed file contains the anonymized follower edge list. Once decompressed, each row consists of two comma-separated integers representing a directed following relation (i.e., user u follows user v).
    • user_posts.tar.gz. This compressed folder contains data on the individual posts collected. Decompressing this file results in a collection of files, each containing the posts of one anonymized user. Each post is stored as a single JSON-formatted line.
    • interactions.csv.gz. This compressed file contains the anonymized interactions edge list. Once decompressed, each row consists of six comma-separated integers representing a comment, repost, or quote interaction. These integers correspond to the following fields, in this order: user_id, replied_author, thread_root_author, reposted_author, quoted_author, and date.
    • graphs.tar.gz. This compressed folder contains edge list files for the graphs emerging from reposts, quotes, and replies. Each interaction is timestamped. The folder also contains timestamped higher-order interactions emerging from discussion threads, each containing all users participating in a thread.
    • feed_posts.tar.gz. This compressed folder contains posts that appear in 11 thematic feeds. Decompressing this folder results in 11 files, each containing the posts from one feed. Each post is stored as a JSON-formatted line. Fields correspond to those in posts.tar.gz, except for those related to sentiment analysis (sent_label, sent_score) and reposts (repost_from, reposted_author).
    • feed_bookmarks.csv. This file contains users who bookmarked any of the collected feeds. Each record contains three comma-separated values: the feed name, user id, and timestamp.
    • feed_post_likes.tar.gz. This compressed folder contains data on likes to posts appearing in the feeds, one file per feed. Each record in the files contains the following information, in this order: the id of the "liker", the id of the post's author, the id of the liked post, and the like timestamp.
    • scripts.tar.gz. A collection of Python scripts, including the ones originally used to crawl the data, and to perform experiments. These scripts are detailed in a document released within the folder.
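The CSV edge lists can be streamed directly from their .gz files using only Python's standard library. This sketch assumes that every data row contains plain integers as described above; any row that fails to parse as integers, such as a possible header row, is skipped. The interaction fields follow the documented order for interactions.csv.gz. The function and type names are illustrative only.

```python
import csv
import gzip
from collections import Counter
from typing import Iterator, List, NamedTuple


def _int_rows(path: str, width: int) -> Iterator[List[int]]:
    """Yield rows of exactly `width` integers from a gzipped CSV file."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        for row in csv.reader(fh):
            try:
                values = [int(x) for x in row]
            except ValueError:
                continue  # e.g. a header row, if the file has one
            if len(values) == width:
                yield values


def out_degrees(path: str = "followers.csv.gz") -> Counter:
    """Count how many accounts each user follows (row "u,v" means u follows v)."""
    counts: Counter = Counter()
    for follower, _followee in _int_rows(path, 2):
        counts[follower] += 1
    return counts


class Interaction(NamedTuple):
    # Field order as documented for interactions.csv.gz.
    user_id: int
    replied_author: int
    thread_root_author: int
    reposted_author: int
    quoted_author: int
    date: int


def iter_interactions(path: str = "interactions.csv.gz") -> Iterator[Interaction]:
    """Yield one Interaction per six-integer row."""
    for values in _int_rows(path, 6):
        yield Interaction(*values)
```

For instance, out_degrees("followers.csv.gz").most_common(10) would return the ten anonymized user ids that follow the most accounts, along with their counts.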

    Citation

    If used for research purposes, please cite the following paper describing the dataset details:

    Andrea Failla and Giulio Rossetti. "I'm in the Bluesky Tonight: Insights from a Year's Worth of Social Data." PLOS ONE (2024) https://doi.org/10.1371/journal.pone.0310330

    Right to Erasure (Right to be forgotten)

    Note: If your account was created after March 21st, 2024, or if you did not post on Bluesky before that date, no data about your account exists in the dataset. Before sending a data removal request, please make sure that you were active and posting on Bluesky before March 21st, 2024.

    Users included in the Bluesky Social dataset have the right to opt-out and request the removal of their data, per GDPR provisions (Article 17).

    We emphasize that the released data has been thoroughly pseudonymized in compliance with GDPR (Article 4(5)). Specifically, usernames and object identifiers (e.g., URIs) have been removed, and object timestamps have been coarsened to protect individual privacy further and minimize reidentification risk. Moreover, it should be noted that the dataset was created for scientific research purposes, thereby falling under the scenarios for which GDPR provides opt-out derogations (Article 17(3)(d) and Article 89).

    Nonetheless, if you wish to have your activities excluded from this dataset, please submit your request to blueskydatasetmoderation@gmail.com (with the subject "Removal request: [username]"). We will process your request within a reasonable timeframe; updates will occur monthly, if necessary, and access to previous versions will be restricted.

    Acknowledgments:

    This work is supported by:

    • the European Union – Horizon 2020 Program under the scheme “INFRAIA-01-2018-2019 – Integrating Activities for Advanced Communities”,
      Grant Agreement n.871042, “SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics” (http://www.sobigdata.eu);
    • SoBigData.it which receives funding from the European Union – NextGenerationEU – National Recovery and Resilience Plan (Piano Nazionale di Ripresa e Resilienza, PNRR) – Project: “SoBigData.it – Strengthening the Italian RI for Social Mining and Big Data Analytics” – Prot. IR0000013 – Avviso n. 3264 del 28/12/2021;
    • EU NextGenerationEU programme under the funding schemes PNRR-PE-AI FAIR (Future Artificial Intelligence Research).
  3. [Thread number] - Lab study on the effect of temperature and pCO2 on mussel...

    • erddap.bco-dmo.org
    Updated Aug 6, 2019
    Cite
    BCO-DMO (2019). [Thread number] - Lab study on the effect of temperature and pCO2 on mussel byssal attachment (thread number) with mussels collected in May 2012 from Argyle Creek, San Juan Island, WA (48.52˚ N, 123.01˚ W) (Effects of Ocean Acidification on Coastal Organisms: An Ecomaterials Perspective) [Dataset]. https://erddap.bco-dmo.org/erddap/info/bcodmo_dataset_773623/index.html
    Dataset updated
    Aug 6, 2019
    Dataset provided by
    Biological and Chemical Oceanographic Data Management Office (BCO-DMO)
    Authors
    BCO-DMO
    License

    https://www.bco-dmo.org/dataset/773623/license

    Area covered
    San Juan Island
    Variables measured
    pH, Temp, Trial, Mussel_ID, pCO2_Target, Thread_Count, Treatment_ID, pCO2_Measured
    Description

    These data were used in a structural analysis study to evaluate how pCO2 and an additional stressor, elevated temperature, influence byssal thread quality and production. Mussels (M. trossulus) were collected in May 2012 from Argyle Creek, San Juan Island, WA (48.52° N, 123.01° W) and held for up to 14 d in a mesh box submerged under the dock at Friday Harbor Laboratories (FHL), San Juan Island, WA. Mussels were placed in controlled temperature and pCO2 treatments in the Ocean Acidification Experimental Laboratory (OAEL); newly produced threads were then counted and pulled to failure to determine byssus strength.

    access_formats=.htmlTable,.csv,.json,.mat,.nc,.tsv

    acquisition_description=Mussels (M. trossulus) were collected in May 2012 from Argyle Creek, San Juan Island, WA (48.52° N, 123.01° W) and held for up to 14 d in a mesh box submerged under the dock at Friday Harbor Laboratories (FHL), San Juan Island, WA. Mussels were placed in experimental mesocosms in the Ocean Acidification Experimental Laboratory (OAEL) at FHL as described in O’Donnell et al. (2013) and Timmins-Schiffman et al. (2012). Briefly, pH was manipulated by bubbling CO2 into a 150 L temperature-controlled seawater reservoir that supplied water to eight 3.5 L chambers at a turnover rate of 50 mL min-1. Air was bubbled into the reservoir to maintain 100% oxygen saturation, and submersible pumps (model number P396, Annex Depot, Sacramento, CA) provided mixing in the chambers at 3.8 L min-1. The bottom of each chamber was lined with autoclaved pebbles, collected from an FHL beach, to provide a substrate for byssal thread attachment. pH and temperature were monitored continuously in each water reservoir with a Durafet pH and temperature probe, and the full carbonate chemistry of the system was evaluated with DIC and total alkalinity measurements once during each trial. Mussels were acclimated to their treatment temperatures at ambient pH (~7.8) over 9 d, with temperature ramped up by no more than 2 °C per day, and were fed a maintenance level of Shellfish Diet 1800 (6 g l-1 day-1, Reed Mariculture, Campbell, CA, USA).
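The ERDDAP access formats listed above can be requested directly over HTTP. The sketch below assumes that this dataset is served by ERDDAP's tabledap service under the ID shown in the info URL (bcodmo_dataset_773623) and that the column names match those listed under "Variables measured"; check the ERDDAP info page before relying on either assumption. The sketch builds a .csv request and splits ERDDAP's CSV output, in which line 1 holds column names and line 2 holds units. The function names are hypothetical.

```python
import csv
import io
from typing import Dict, List, Sequence, Tuple

# Assumption: tabledap endpoint and dataset ID inferred from the info URL above.
ERDDAP_BASE = "https://erddap.bco-dmo.org/erddap"
DATASET_ID = "bcodmo_dataset_773623"


def tabledap_csv_url(variables: Sequence[str]) -> str:
    """Build a tabledap request for the selected variables in .csv format."""
    return f"{ERDDAP_BASE}/tabledap/{DATASET_ID}.csv?" + ",".join(variables)


def parse_erddap_csv(text: str) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Return ({column: unit}, data rows) from ERDDAP .csv output.

    ERDDAP's .csv places column names on line 1 and units on line 2.
    """
    reader = csv.reader(io.StringIO(text))
    names = next(reader)
    units = next(reader)
    rows = [dict(zip(names, row)) for row in reader if row]
    return dict(zip(names, units)), rows
```

To fetch the data, pass the URL to urllib.request.urlopen and decode the response before calling parse_erddap_csv. This step assumes network access and an available endpoint. Values are returned as strings, so convert numeric columns such as Thread_Count as needed.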

    The twelve independent temperature × pCO2 treatments spanned the range of local marine conditions (Newcomb, 2015; George et al., 2019): temperature at 10 °C, 18 °C, or 25 °C, and pCO2 at 400, 750, 1200, or 2500 µatm. Each mussel was trimmed of external byssus before placement in an experimental treatment for 3 d, which is sufficient time to produce new mature byssal threads (Bell & Gosline 1996) while minimizing the effect of treatment on mussel condition. Mussels were starved during the 3 d trials to minimize changes in chamber water chemistry due to food addition and to reduce fouling. Three trials were conducted in succession to replicate treatments over time, increasing the sample size (n = 8 × 3) for each temperature × pCO2 treatment.
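The factorial design above can be checked with a few lines of arithmetic: three temperatures crossed with four pCO2 targets give twelve treatments. The sketch reads "n = 8 × 3" as eight mussels per treatment in each of the three trials, which is an interpretation of the text. The constant names are ours; the values come from the description.

```python
from itertools import product

# Factor levels as listed in the text.
TEMPERATURES_C = (10, 18, 25)
PCO2_TARGETS_UATM = (400, 750, 1200, 2500)

# Assumed reading of "n = 8 x 3": 8 mussels per treatment in each of 3 trials.
MUSSELS_PER_TRIAL = 8
TRIALS = 3

# Each combination of one temperature and one pCO2 target is a treatment.
treatments = list(product(TEMPERATURES_C, PCO2_TARGETS_UATM))
n_per_treatment = MUSSELS_PER_TRIAL * TRIALS

print(len(treatments), n_per_treatment)  # 12 24
```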

    At the end of each trial, mussels and the rocks to which they had attached with byssal threads were removed from the chambers. The entire byssus was dissected from each mussel and stored air-dried for up to 20 days. Byssus was rehydrated in seawater prior to testing, a method that does not alter the mechanical properties of the byssal threads (Brazee, 2004). The number of byssal threads each mussel produced was counted, and one thread was haphazardly chosen for mechanical testing following the procedure of Bell & Gosline (1996). Briefly, an individual thread was clamped at either end with submersible pneumatic grips by holding the proximal byssal stem between cardstock with cyanoacrylate glue and affixing the distal plaque with attached rock to an aluminum T-bar with epoxy. An Instron 5565 tensometer (Norwood, MA, USA) extended the thread at a rate of 10 mm min-1 in a temperature-controlled water bath (3130-100 BioPuls Bath, Instron, Norwood, MA, USA) until failure. The tensometer measured force (±10⁻³ N) and extension (±10⁻³ mm) at 10 Hz. Tests were performed in seawater with a pH of 7.8 at the relevant treatment temperature.

    Pull-to-failure mechanical tests provided estimates of thread breaking force, yield force, extensibility, initial stiffness, and failure location (Bell & Gosline 1996). Yield, due to quasi-plastic deformation in the distal region, was identified as the point where the initial slope of the force-extension curve decreased by 40%. Extensibility was calculated by dividing thread extension at failure by initial length, and initial stiffness was determined from the initial slope of the force-extension curve. The location of failure (proximal, plaque, and/or distal region) was noted, and threads were retested to quantify the breaking force of each remaining region. Tests that broke at the grips were considered underestimates and were discarded.
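The quantities defined above can be computed from a single sampled force-extension record. The Python sketch below is a simplified illustration, not the authors' analysis code. It takes initial stiffness as the least-squares slope of the first few samples (n_initial is a hypothetical parameter) and marks yield at the first sample where the local slope has fallen 40% below that initial value. It treats the peak force as the breaking force and computes extensibility from the extension at that peak. Real records are noisy, so the data would likely need smoothing before the slope test.

```python
from typing import Dict, Optional, Sequence


def _ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def pull_test_summary(
    extension_mm: Sequence[float],
    force_n: Sequence[float],
    initial_length_mm: float,
    n_initial: int = 5,  # hypothetical: samples used for the initial slope
    drop: float = 0.40,  # yield: slope reduced by 40% from the initial slope
) -> Dict[str, Optional[float]]:
    if len(extension_mm) != len(force_n) or len(force_n) <= n_initial:
        raise ValueError("need equal-length series longer than n_initial")
    stiffness = _ols_slope(extension_mm[:n_initial], force_n[:n_initial])  # N/mm

    # Assumption: failure occurs at the peak force (first peak if tied).
    i_fail = max(range(len(force_n)), key=lambda i: force_n[i])

    yield_force = None
    for i in range(i_fail):
        dx = extension_mm[i + 1] - extension_mm[i]
        if dx <= 0:
            continue  # skip repeated or out-of-order samples
        local_slope = (force_n[i + 1] - force_n[i]) / dx
        if local_slope < (1.0 - drop) * stiffness:
            yield_force = force_n[i]
            break

    return {
        "initial_stiffness": stiffness,
        "yield_force": yield_force,
        "breaking_force": force_n[i_fail],
        "extensibility": extension_mm[i_fail] / initial_length_mm,
    }
```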

    The cross-sectional area of the proximal region was measured to evaluate morphological differences among treatments. The elliptical area was estimated from measurements of the major and minor axes (± 1 µm) using a dissecting microscope (Brazee & Carrington 2006). Proximal breaking stress (N mm-2), a material property, was calculated as proximal breaking force divided by proximal area. Thread surface structure was examined using a scanning electron microscope (FEI Sirion XL30 SEM, Hillsboro, OR).
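The two calculations in this paragraph reduce to an ellipse area and a ratio. The sketch below assumes that the major and minor axes were recorded as full lengths (diameters) in µm, so that the area is π·D·d/4. If semi-axes were recorded instead, the halving step should be dropped. Because 1 N mm⁻² equals 1 MPa, the resulting stress can be read directly in MPa. The function names are illustrative.

```python
import math


def ellipse_area_mm2(major_axis_um: float, minor_axis_um: float) -> float:
    """Ellipse area in mm^2 from full major and minor axis lengths in micrometres."""
    a_mm = major_axis_um / 1000.0 / 2.0  # semi-major axis in mm
    b_mm = minor_axis_um / 1000.0 / 2.0  # semi-minor axis in mm
    return math.pi * a_mm * b_mm


def breaking_stress(breaking_force_n: float, area_mm2: float) -> float:
    """Proximal breaking stress in N mm^-2 (equivalently MPa)."""
    if area_mm2 <= 0:
        raise ValueError("area must be positive")
    return breaking_force_n / area_mm2
```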

    Whole-mussel attachment strength was estimated using two mathematical models developed by Bell & Gosline (1996). Each model assumes that a mussel is anchored with a constant number of threads (n = 50) arranged in a circle. The normal model estimates dislodgement force perpendicular to the substrate (e.g., lift): all threads are engaged and extend until they reach their maximum force. The parallel model estimates dislodgement force for an animal pulled parallel to the substrate (e.g., drag): threads on the upstream side are the first in tension, yield, and extend until they reach maximum force and break, while more threads are recruited into tension until all have broken. Additionally, we modified each model to incorporate the variation in thread production across treatments. Because thread production was measured for only three days, treatment means were scaled to a maximum value of 50 threads.
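The thread-number scaling and the normal model can be sketched as follows. This is a deliberately simplified reading, not Bell & Gosline's full formulation. The normal model is reduced to every thread contributing its breaking force at the same time. The scaling assumes that the treatment with the highest mean thread production maps to 50 threads and that the others scale proportionally. The parallel model, with its sequential recruitment of threads, is not attempted here. Treatment labels are hypothetical.

```python
from typing import Dict

MAX_THREADS = 50  # constant thread number assumed by both models


def scaled_thread_numbers(mean_counts: Dict[str, float],
                          max_threads: int = MAX_THREADS) -> Dict[str, float]:
    """Rescale mean thread counts so the most productive treatment maps to max_threads."""
    top = max(mean_counts.values())
    return {t: max_threads * c / top for t, c in mean_counts.items()}


def normal_model_force(thread_breaking_force_n: float,
                       n_threads: float = MAX_THREADS) -> float:
    """Simplified normal (lift) model: all threads reach their maximum force together."""
    return n_threads * thread_breaking_force_n
```

For two hypothetical treatments with mean counts of 10 and 5 threads, the scaled numbers are 50 and 25. Multiplying each by the treatment's mean thread breaking force gives the simplified lift estimate.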

    Detailed methods and results are provided in Newcomb, 2015 and Newcomb et al., 2019.

    Location: Friday Harbor Laboratories, Friday Harbor WA

    awards_0_award_nid=55120
    awards_0_award_number=OCE-1041213
    awards_0_data_url=http://www.nsf.gov/awardsearch/showAward?AWD_ID=1041213
    awards_0_funder_name=NSF Division of Ocean Sciences
    awards_0_funding_acronym=NSF OCE
    awards_0_funding_source_nid=355
    awards_0_program_manager=Mary Beth Saffo
    awards_0_program_manager_nid=51608
    cdm_data_type=Other
    comment=Thread number PI: Emily Carrington Data Version 1: 2019-07-24
    Conventions=COARDS, CF-1.6, ACDD-1.3
    data_source=extract_data_as_tsv version 2.3 19 Dec 2019
    defaultDataQuery=&time<now
    doi=10.1575/1912/bco-dmo.773623.1
    infoUrl=https://www.bco-dmo.org/dataset/773623
    institution=BCO-DMO
    instruments_0_acronym=Materials Testing System
    instruments_0_dataset_instrument_description=Instron’s (Norwood, MA) electromechanical testing systems are used to test a wide range of materials in tension or compression. The series 5560 are dual column table top models, the 5565 model has a load capacity of 5 kN (1125 lbf).
    instruments_0_dataset_instrument_nid=773637
    instruments_0_description=Testing systems that are used to test a wide range of materials in tension or compression.
    instruments_0_instrument_name=Materials Testing System
    instruments_0_instrument_nid=718
    instruments_0_supplied_name=Instron 5565 load frame (Norwood, MA)
    keywords_vocabulary=GCMD Science Keywords
    metadata_source=https://www.bco-dmo.org/api/dataset/773623
    param_mapping={'773623': {}}
    parameter_source=https://www.bco-dmo.org/mapserver/dataset/773623/parameters
    people_0_affiliation=University of Washington
    people_0_affiliation_acronym=UW
    people_0_person_name=Emily Carrington
    people_0_person_nid=51609
    people_0_role=Principal Investigator
    people_0_role_type=originator
    people_1_affiliation=Woods Hole Oceanographic Institution
    people_1_affiliation_acronym=WHOI BCO-DMO
    people_1_person_name=Karen Soenen
    people_1_person_nid=748773
    people_1_role=BCO-DMO Data Manager
    people_1_role_type=related
    project=OA - Ecomaterials Perspective
    projects_0_acronym=OA - Ecomaterials Perspective
    projects_0_description=Effects of Ocean Acidification on Coastal Organisms: An Ecomaterials Perspective. This award will support researchers based at the University of Washington's Friday Harbor Laboratories. The overall focus of the project is to determine how ocean acidification affects the integrity of biomaterials and how these effects in turn alter interactions among members of marine communities. The research plan emphasizes an ecomaterial approach; a team of biomaterials and ecomechanics experts will apply their unique perspective to detail how different combinations of environmental conditions affect the structural integrity and ecological performance of organisms. The study targets a diversity of ecologically important taxa, including bivalves, snails, crustaceans, and seaweeds, thereby providing insight into the range of possible biological responses to future changes in climate conditions. The proposal will enhance our understanding of the ecological consequences of climate change, a significant societal problem. Each of the study systems has broader impacts in fields beyond ecomechanics. Engineers are particularly interested in biomaterials, and each system contains materials with commercial potential. The project will integrate research and education by supporting doctoral student dissertation research, providing undergraduate research opportunities via three training programs at FHL, and offering summer internships to talented high school students recruited from the FHL Science Outreach Program. The participation of underrepresented groups will be broadened by actively recruiting URM and female students.


Dataset: Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from the Politically Incorrect Board is cited by 5 scholarly articles (according to Google Scholar).