100+ datasets found
  1. Twitter cascade dataset
    • researchdata.smu.edu.sg
    • smu.edu.sg
    • +1 more
    Format: pdf
    Updated May 31, 2023 (more versions available)
  2. Research data for "Across the great divides: Gender dynamics influence how...
    • researchdata.smu.edu.sg
    Format: bin
    Updated May 31, 2023
  3. Data and code for "Discoverability beyond the library: Search engine...
    • researchdata.smu.edu.sg
    Format: zip
    Updated Jun 4, 2023
  4. Online appendix to "Labor market implications of Taiwan's accession to the...
    • researchdata.smu.edu.sg
    Format: pdf
    Updated Jun 2, 2023
  5. Data from: Extended Comprehensive Study of Association Measures for Fault...
    • figshare.com
    • researchdata.smu.edu.sg
    Format: zip
    Updated Mar 12, 2021
  6. Data and codes for "SCANet: Self-paced semi-curricular attention network for...
    • researchdata.smu.edu.sg
    Format: zip
    Updated Oct 9, 2023
  7. Twitter bot profiling
    • figshare.com
    • researchdata.smu.edu.sg
    • +1 more
    Format: pdf
    Updated May 31, 2023 (more versions available)
  8. Data from: Online supplement to 'A panel clustering approach to analyzing...
    • datasetcatalog.nlm.nih.gov
    • researchdata.smu.edu.sg
    Updated Mar 23, 2022
  9. Data from: Cross-cultural variation in men’s preference for sexual...
    • figshare.com
    • researchdata.smu.edu.sg
    • +1 more
    Format: doc
    Updated Mar 12, 2021
  10. Replication data for "Geography, Trade, and Internal Migration in China"
    • data.mendeley.com
    • researchdata.smu.edu.sg
    • +1 more
    Updated Mar 3, 2020
  11. Data from: Worker selection, hiring, and vacancies
    • datasetcatalog.nlm.nih.gov
    • researchdata.smu.edu.sg
    Updated Apr 2, 2020
  12. Data and code for "DeepFacade: A deep learning approach to facade parsing"
    • researchdata.smu.edu.sg
    Format: zip
    Updated Jun 1, 2023
  13. Overfitting in semantics-based program repair
    • zenodo.org
    • researchdata.smu.edu.sg
    • +1 more
    Format: zip
    Updated Jan 24, 2020
  14. Data from: Employer image within and across industries: Moving beyond...
    • researchdata.smu.edu.sg
    • datasetcatalog.nlm.nih.gov
    Format: txt
    Updated Jun 1, 2023
  15. 2023 August Shandong Field Notes
    • figshare.com
    • researchdata.smu.edu.sg
    Format: pdf
    Updated Aug 12, 2024
  16. Data from: Estimating stranded coal assets in China's power sector
    • datasetcatalog.nlm.nih.gov
    • researchdata.smu.edu.sg
    Updated Sep 13, 2022
  17. Replication Data for: The Search for Spices and Souls: Catholic Missions as...
    • search.dataone.org
    • researchdata.smu.edu.sg
    Updated Nov 12, 2023
  18. GIGA Sanctions Dataset
    • da-ra.de
    • researchdata.smu.edu.sg
    • +1 more
    Updated 2012
  19. Earable & IoT Dataset from: ERICA - Enabling real-time mistake detection &...
    • datasetcatalog.nlm.nih.gov
    • researchdata.smu.edu.sg
    Updated Nov 11, 2020
  20. Replication data for: Media in a time of crisis
    • researchdata.smu.edu.sg
    Format: bin
    Updated Jun 8, 2023
Citation: Living Analytics Research Centre (2023). Twitter cascade dataset [Dataset]. http://doi.org/10.25440/smu.12062709.v1

Twitter cascade dataset

153 scholarly articles cite this dataset.
Available download formats: pdf
Dataset updated: May 31, 2023
Dataset provided by: SMU Research Data Repository (RDR)
Authors: Living Analytics Research Centre
License: http://rightsstatements.org/vocab/InC/1.0/

Description

This dataset comprises a set of information cascades generated by Singapore Twitter users. Here, a cascade is defined as a set of tweets about the same topic.

The data were collected via the Twitter REST and streaming APIs as follows. Starting from popular seed users (i.e., users with many followers), we crawled their follow, retweet, and user-mention links. We then added those followers/followees, retweet sources, and mentioned users who list Singapore as their profile location. This yielded a total of 184,794 Twitter user accounts. We then crawled the tweets these users posted from 1 April to 31 August 2012, obtaining 32,479,134 tweets in all.

To identify cascades, we extracted all URLs and hashtags from these tweets and treated each URL or hashtag as the identity of a cascade. In other words, all tweets that contain the same URL (or the same hashtag) form one cascade. Formally, a cascade is represented as a set of user-timestamp pairs. Figure 1 provides an example: the cascade C = {⟨u1, t1⟩, ⟨u2, t2⟩, ⟨u1, t3⟩, ⟨u3, t4⟩, ⟨u4, t5⟩}, in which user u1 takes part twice.

For evaluation, the dataset was split into two parts: the first four months of data (April to July 2012) for training and the last month (August 2012) for testing. Table 1 summarizes the basic count statistics of the dataset.

Each line in each file represents one cascade: the first term is a hashtag or URL, and the second term is a list of user-timestamp pairs. Due to privacy concerns, all user identities are anonymized.
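The cascade construction, time split, and per-line file layout described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the URL and hashtag regexes, integer Unix-second timestamps, a per-pair split at 1 August 2012 00:00 UTC, and the tab/space/comma line layout are all assumptions, because the description does not specify them. The helpers `build_cascades`, `split_by_time`, `format_line`, and `parse_line` are hypothetical names.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone

# Assumed extraction patterns: the description does not say how URLs and
# hashtags were identified, so these simple regexes are illustrative only.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")

# Assumed test-period start: 1 August 2012, 00:00 UTC, as Unix seconds.
TEST_START = int(datetime(2012, 8, 1, tzinfo=timezone.utc).timestamp())


def build_cascades(tweets):
    """Group (user, timestamp, text) tweets into cascades keyed by URL or hashtag.

    Each cascade is a time-ordered list of (user, timestamp) pairs. A user can
    appear more than once, as u1 does in the example cascade C.
    """
    cascades = defaultdict(list)
    for user, ts, text in tweets:
        # A tweet joins one cascade per distinct URL or hashtag it contains.
        keys = set(URL_RE.findall(text)) | set(HASHTAG_RE.findall(text))
        for key in keys:
            cascades[key].append((user, ts))
    return {key: sorted(pairs, key=lambda p: p[1]) for key, pairs in cascades.items()}


def split_by_time(cascades, cutoff=TEST_START):
    """Split every cascade's pairs at `cutoff` (assumed per-pair split)."""
    train, test = {}, {}
    for key, pairs in cascades.items():
        before = [p for p in pairs if p[1] < cutoff]
        after = [p for p in pairs if p[1] >= cutoff]
        if before:
            train[key] = before
        if after:
            test[key] = after
    return train, test


def format_line(key, pairs):
    """Hypothetical layout: key, a tab, then space-separated 'user,timestamp' items."""
    return key + "\t" + " ".join(f"{user},{ts}" for user, ts in pairs)


def parse_line(line):
    """Inverse of format_line under the same assumed layout."""
    key, rest = line.rstrip("\n").split("\t", 1)
    pairs = []
    for item in rest.split():
        user, ts = item.rsplit(",", 1)
        pairs.append((user, int(ts)))
    return key, pairs


if __name__ == "__main__":
    tweets = [
        ("u1", 100, "reading http://example.com/a #sg"),
        ("u2", 200, "RT http://example.com/a"),
        ("u1", 300, "#sg again"),
    ]
    cascades = build_cascades(tweets)
    train, test = split_by_time(cascades, cutoff=250)
    for key, pairs in sorted(cascades.items()):
        print(format_line(key, pairs))
```

In this sketch a tweet that contains several URLs or hashtags contributes to several cascades. Check the actual files for their real delimiters and timestamp format before relying on `parse_line`.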
