100+ datasets found
  1. Twitter Graph Example v2 43

    • kaggle.com
    zip
    Updated Jun 29, 2022
    Mathias Weiß (2022). Twitter Graph Example v2 43 [Dataset]. https://www.kaggle.com/datasets/weissmedia/twitter-graph-example-v2-43
    Available download formats: zip (17943518 bytes)
    Dataset updated
    Jun 29, 2022
    Authors
    Mathias Weiß
    License

    CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This project is inspired by https://github.com/neo4j-graph-examples/twitter-v2.

    Twitter Graph

    Show data from your personal Twitter account

    The Graph Your Network application inserts your Twitter activity into Neo4j.

    Data model diagram: https://neo4jsandbox.com/guides/twitter/img/twitter-data-model.svg

    Content

    ~10 MB of graph data (CSV)

    43,325 nodes with the labels: Hashtag, Link, Me, Source, Tweet, User

    57,896 relationships of the types: AMPLIFIES, CONTAINS, FOLLOWS, INTERACTS_WITH, MENTIONS, POSTS, REPLY_TO, RETWEETS, RT_MENTIONS, SIMILAR_TO, TAGS, USING

  2. TwitterFollowGraph

    • huggingface.co
    Updated Mar 31, 2023
    Twitter (2023). TwitterFollowGraph [Dataset]. https://huggingface.co/datasets/Twitter/TwitterFollowGraph
    Dataset updated
    Mar 31, 2023
    Dataset provided by
    X (http://x.com/)
    Authors
    Twitter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    kNN-Embed: Locally Smoothed Embedding Mixtures For Multi-interest Candidate Retrieval

    This repo contains the TwitterFollowGraph dataset from our paper kNN-Embed: Locally Smoothed Embedding Mixtures For Multi-interest Candidate Retrieval. This work is licensed under a Creative Commons Attribution 4.0 International License.

      TwitterFollowGraph
    

    TwitterFollowGraph is a bipartite directed graph from user (consumer) nodes to author (producer) nodes… See the full description on the dataset page: https://huggingface.co/datasets/Twitter/TwitterFollowGraph.
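    Because consumers and producers form two separate node sets, a bipartite directed graph like this can be held as a map from each consumer to the set of producers it follows. The sketch below assumes the edges are already available as (consumer, producer) ID pairs; the function name and toy IDs are hypothetical, and the real file layout is described on the dataset page.

```python
from collections import defaultdict


def build_bipartite(edges):
    """Build a consumer -> producers adjacency map from (consumer, producer) pairs.

    Returns the adjacency map and each producer's number of distinct
    followers. Duplicate pairs are counted once.
    """
    out_adj = defaultdict(set)  # consumer -> set of producers they follow
    in_deg = defaultdict(int)   # producer -> number of distinct consumers
    for consumer, producer in edges:
        if producer not in out_adj[consumer]:
            out_adj[consumer].add(producer)
            in_deg[producer] += 1
    return dict(out_adj), dict(in_deg)


# Hypothetical toy edges; real IDs come from the dataset files.
edges = [("u1", "a1"), ("u1", "a2"), ("u2", "a1"), ("u2", "a1")]
adj, followers = build_bipartite(edges)
print(sorted(adj["u1"]))  # ['a1', 'a2']
print(followers["a1"])    # 2
```

    Keeping consumers and producers in separate maps means the same raw ID on both sides never collapses into one node, which preserves the bipartite structure.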

  3. X/Twitter: number of worldwide users 2019-2024

    • statista.com
    Updated Jun 26, 2025
    Statista (2025). X/Twitter: number of worldwide users 2019-2024 [Dataset]. https://www.statista.com/statistics/303681/twitter-users-worldwide/
    Dataset updated
    Jun 26, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 2022
    Area covered
    Worldwide
    Description

    As of December 2022, X/Twitter's audience accounted for over *** million monthly active users worldwide. This figure was projected to ******** to approximately *** million by 2024, a ******* of around **** percent compared to 2022.

  4. Twitter API Data: Nike, Lululemon, Adidas Tweets

    • kaggle.com
    zip
    Updated Jun 10, 2024
    Conrad Kleykamp (2024). Twitter API Data: Nike, Lululemon, Adidas Tweets [Dataset]. https://www.kaggle.com/datasets/conradkleykamp/nikelululemonadidas-tweets-jsonl/code
    Available download formats: zip (105168717 bytes)
    Dataset updated
    Jun 10, 2024
    Authors
    Conrad Kleykamp
    Description

    This dataset was retrieved from CU Boulder's DTSA 5800 Network Analysis for Marketing Analytics course, part of the MS in Data Science degree. The course professor originally collected it from Twitter's (X) Standard Search API, and it has been uploaded to Kaggle for further use. The dataset contains 175,077 tweet objects, each encoded in JavaScript Object Notation (JSON). JSON is built from key-value pairs: named attributes and their associated values, which together describe an object.

    Each tweet object mentions at least one of three major brands: Nike, Adidas, Lululemon. This dataset was originally uploaded for practicing natural language processing and creating network graphs.

    For more information on the structure of a tweet object, see Twitter's (X) developer documentation.

    License: Twitter Oauth 1.0
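    Since each line is one JSON tweet object, brand counts and a mention network can be extracted in a single pass. A minimal sketch, assuming the objects follow the Twitter API v1.1 tweet layout (`text` or `full_text`, `user.screen_name`, `entities.user_mentions`); those field names and the function name are assumptions about this particular file, not something the dataset page confirms.

```python
import json
from collections import Counter

BRANDS = ("nike", "adidas", "lululemon")


def summarize_tweets(lines):
    """Count brand mentions and collect (author, mentioned user) edges.

    Field names follow the Twitter API v1.1 tweet object and are
    assumptions about this particular file.
    """
    brand_counts = Counter()
    mention_edges = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        # Extended-mode tweets carry "full_text"; classic ones carry "text".
        text = (tweet.get("full_text") or tweet.get("text") or "").lower()
        for brand in BRANDS:
            if brand in text:
                brand_counts[brand] += 1
        author = tweet.get("user", {}).get("screen_name")
        for mention in tweet.get("entities", {}).get("user_mentions", []):
            mention_edges.append((author, mention.get("screen_name")))
    return brand_counts, mention_edges
```

    An open file object can be passed directly as `lines`, since iterating over it yields one line per tweet. Substring matching is deliberately crude (it also matches "@adidas"); an NLP pipeline would tokenize first.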

  5. Twitter Connections with User Location

    • kaggle.com
    zip
    Updated Feb 13, 2019
    Sajid Hasan Apon (2019). Twitter Connections with User Location [Dataset]. https://www.kaggle.com/sajidhasanapon/twitter-connections-with-user-location
    Available download formats: zip (1412431294 bytes)
    Dataset updated
    Feb 13, 2019
    Authors
    Sajid Hasan Apon
    Description

    Content

    This dataset is a subset of all Twitter users, along with their connections and locations.

    The graph, with users as nodes and their connections as edges, is a connected component of the full Twitter graph (i.e., for every user in the subgraph, all of their connections in the original graph are also contained in the subgraph).

    Although the Twitter graph is directed (the follower-following relation is not mutual: "X follows Y" does not imply "Y follows X"), we treat its edges as undirected. Hence, for every u and v, if u→v is present in the original graph but v→u is not, we add the edge v→u.

    There are two directories, one with 10 million users and the other with 1 million. Each directory contains two .txt files: location and user.

    The location file contains the latitude and longitude of each user. The file format is:

    lat_1, long_1
    lat_2, long_2
    lat_3, long_3
    ...
    

    where lat_1 and long_1 are the latitude and longitude of user number 1 respectively, and so on.

    The user file contains the adjacency list of each user. The k-th row of this file enumerates the friends of user number k.
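    Both files are positional: row k belongs to user number k. A minimal parsing sketch, assuming friend IDs in the user file use the same 1-based numbering and are separated by whitespace and/or commas (the separator is not stated); `is_symmetric` checks the undirected conversion described above. All function names are hypothetical.

```python
def read_locations(lines):
    """Map user number (1-based, by row) to a (latitude, longitude) pair."""
    locations = {}
    for user, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate a trailing blank line
        lat, lon = (float(part) for part in line.split(","))
        locations[user] = (lat, lon)
    return locations


def read_adjacency(lines):
    """Map user number k to the set of friends listed on row k.

    Assumes friend IDs use the same 1-based numbering and are separated
    by whitespace and/or commas.
    """
    adjacency = {}
    for user, line in enumerate(lines, start=1):
        adjacency[user] = {int(tok) for tok in line.replace(",", " ").split()}
    return adjacency


def is_symmetric(adjacency):
    """Check the undirected conversion: every u->v edge has a matching v->u."""
    return all(u in adjacency.get(v, set())
               for u, friends in adjacency.items() for v in friends)
```

    With the files open, `read_locations(open(path))` and `read_adjacency(open(path))` work on the line iterators directly; at 10 million users a streaming or array-based representation would be more memory-friendly than Python sets.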

    Bibliographic References

    If you use this dataset, please cite it as:

    @article{DBLP:journals/pvldb/GhoshACHSL18,
    author  = {Bishwamittra Ghosh and
    Mohammed Eunus Ali and
    Farhana Murtaza Choudhury and
    Sajid Hasan Apon and
    Timos Sellis and
    Jianxin Li},
    title   = {The Flexible Socio Spatial Group Queries},
    journal  = {{PVLDB}},
    volume  = {12},
    number  = {2},
    pages   = {99--111},
    year   = {2018},
    url    = {http://www.vldb.org/pvldb/vol12/p99-ghosh.pdf},
    timestamp = {Mon, 03 Dec 2018 16:45:54 +0100},
    biburl  = {https://dblp.org/rec/bib/journals/pvldb/GhoshACHSL18},
    }
    

    Source Publications

    @inproceedings{DBLP:conf/kdd/LiWDWC12,
    author  = {Rui Li and Shengjie Wang and Hongbo Deng and Rui Wang and Kevin Chen-Chuan Chang},
    title   = {Towards social user profiling: unified and discriminative influence model for inferring home locations},
    booktitle = {KDD},
    year   = {2012},
    pages   = {1023--1031}
    }
    
  6. A study on real graphs of fake news spreading on Twitter

    • data.niaid.nih.gov
    Updated Aug 20, 2021
    Amirhosein Bodaghi (2021). A study on real graphs of fake news spreading on Twitter [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3711599
    Dataset updated
    Aug 20, 2021
    Dataset provided by
    Federal University of Rio de Janeiro
    Authors
    Amirhosein Bodaghi
    Description

    *** Fake News on Twitter ***

    These 5 datasets are the results of an empirical study on the spreading process of newly emerged fake news on Twitter. In particular, we focused on fake news stories that triggered truthful counter-messages spreading at the same time. The story of each fake news item is as follows:

    1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.

    2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."

    3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.

    4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.

    5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.

    The data were collected in two stages, each producing a new dataset: (1) the Dataset of Diffusion (DD), which contains the fake news/truth tweets and retweets, and (2) the Dataset of Graph (DG), obtained by querying the neighbors of the users who spread those tweets.

    DD

    DD for each fake news story is an Excel file named FNx_DD, where x is the number of the fake news story, with the following structure:

    Each row corresponds to one captured tweet/retweet related to the rumor, and each column holds a specific piece of information about it. From left to right, the columns are:

    User ID (the user who posted the current tweet/retweet)

    The profile description of the user who published the tweet/retweet

    The number of tweets/retweets the user had published at the time of posting the current tweet/retweet

    Date and time at which the account that posted the current tweet/retweet was created

    Language of the tweet/retweet

    Number of followers

    Number of followings (friends)

    Date and time of posting the current tweet/retweet

    Number of likes (favorites) the current tweet had acquired before it was crawled

    Number of times the current tweet had been retweeted before it was crawled

    Whether the current tweet/retweet contains another tweet (for example, when the current tweet is a quote, reply, or retweet)

    The source (OS) of the device from which the current tweet/retweet was posted

    Tweet/Retweet ID

    Retweet ID (if the post is a retweet then this feature gives the ID of the tweet that is retweeted by the current post)

    Quote ID (if the post is a quote then this feature gives the ID of the tweet that is quoted by the current post)

    Reply ID (if the post is a reply, this feature gives the ID of the tweet that the current post replies to)

    Frequency of tweet occurrences, i.e., the number of times the current tweet is repeated in the dataset (for example, the number of times it appears as a retweet posted by others)

    State of the tweet, which takes one of the following values (agreed upon by the annotators):

    r : The tweet/retweet is a fake news post

    a : The tweet/retweet is a truth post

    q : The tweet/retweet is a question about the fake news that neither confirms nor denies it

    n : The tweet/retweet is not related to the fake news (it matches the search queries related to the rumor but does not refer to the given fake news)
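    Because the annotated state is the last column, a per-story breakdown of fake, truth, question, and unrelated posts needs only that cell. A minimal sketch, assuming each row arrives as a sequence of cell values in the documented left-to-right order; how rows are read from the Excel file (and whether the sheet has a header row) is not stated, so that step is left to the caller. The function and mapping names are hypothetical.

```python
from collections import Counter

# Annotation codes from the dataset description.
STATE_NAMES = {"r": "fake", "a": "truth", "q": "question", "n": "unrelated"}


def state_breakdown(rows):
    """Count DD rows per annotated state.

    Assumes each row is a sequence of cell values in the documented
    left-to-right column order, so the state is the last cell.
    """
    counts = Counter()
    for row in rows:
        code = str(row[-1]).strip().lower()
        counts[STATE_NAMES.get(code, "unknown")] += 1
    return counts
```

    Rows could, for instance, come from `pandas.read_excel(path, header=None).itertuples(index=False)`; any header row would then show up under "unknown" and can be dropped.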

    DG

    DG for each fake news story contains two files:

    A file in graph format (.graph) that describes the graph structure, i.e., who is linked to whom. (This file is named FNx_DG.graph, where x is the number of the fake news story.)

    A file in JSONL format (.jsonl) that contains the real user IDs of the nodes in the graph file. (This file is named FNx_Labels.jsonl, where x is the number of the fake news story.)

    The labels file is needed because, in the graph file, each node is labeled by the order in which it entered the graph. For example, if the node with user ID 12345637 was the first node entered into the graph file, its label in the graph is 0 and its real ID (12345637) is at row 1 of the JSONL file (row 0 holds the column labels); the IDs of subsequent nodes follow on the next rows, one user ID per row. Therefore, to find the user ID of node 200 (labeled 200 in the graph), look at row 201 of the JSONL file.
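    The label-to-row rule (node k sits on row k + 1, because row 0 holds the column labels) can be written as a one-line lookup. A minimal sketch; the exact JSON shape of each row (bare ID, list, or object) is not documented, so the parsed value is returned unchanged, and the function name is hypothetical.

```python
import json


def user_id_for_node(jsonl_lines, label):
    """Return the parsed JSONL row holding the user ID of graph node `label`.

    Row 0 holds the column labels, so node k is on row k + 1.
    The structure of each row is not documented, so the parsed JSON
    value is returned as-is.
    """
    return json.loads(jsonl_lines[label + 1])
```

    For example, after `lines = open("FN1_Labels.jsonl").read().splitlines()`, `user_id_for_node(lines, 200)` reads row 201.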

    The user IDs of spreaders in DG (users who have a post in DD) can be looked up in DD for extra information about them and their tweets/retweets. The other user IDs in DG belong to the neighbors of these spreaders and might not appear in DD.

  7. TwitterFaveGraph

    • huggingface.co
    Updated Mar 31, 2023
    Twitter (2023). TwitterFaveGraph [Dataset]. https://huggingface.co/datasets/Twitter/TwitterFaveGraph
    Dataset updated
    Mar 31, 2023
    Dataset provided by
    X (http://x.com/)
    Authors
    Twitter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MiCRO: Multi-interest Candidate Retrieval Online

    This repo contains the TwitterFaveGraph dataset from our paper MiCRO: Multi-interest Candidate Retrieval Online. This work is licensed under a Creative Commons Attribution 4.0 International License.

      TwitterFaveGraph
    

    TwitterFaveGraph is a bipartite directed graph of user nodes to Tweet nodes where an edge represents a "fave" engagement. Each edge is binned into predetermined time chunks which… See the full description on the dataset page: https://huggingface.co/datasets/Twitter/TwitterFaveGraph.
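    Binning edges into time chunks amounts to mapping each engagement timestamp to a chunk index. A minimal sketch, assuming edges are (user, tweet, unix_timestamp) triples and a fixed chunk width; the one-day `CHUNK_SECONDS` value and the function name are hypothetical, since the dataset's own predetermined chunks are defined on its page.

```python
from collections import defaultdict

# Hypothetical chunk width; the dataset defines its own predetermined chunks.
CHUNK_SECONDS = 24 * 60 * 60


def bin_fave_edges(edges, start_ts, chunk_seconds=CHUNK_SECONDS):
    """Group (user, tweet, unix_ts) fave edges by time chunk index.

    Chunk i covers [start_ts + i * chunk_seconds, start_ts + (i + 1) * chunk_seconds).
    Edges before start_ts are dropped.
    """
    chunks = defaultdict(list)
    for user, tweet, ts in edges:
        if ts < start_ts:
            continue
        chunks[(ts - start_ts) // chunk_seconds].append((user, tweet))
    return dict(chunks)
```

    Each chunk then holds a bipartite user-to-Tweet snapshot, which is a convenient unit for time-split training and evaluation.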

  8. How Popular Is Twitter In The World?

    • searchlogistics.com
    Updated Apr 1, 2025
    (2025). How Popular Is Twitter In The World? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Twitter is ranked as the 12th most popular social media site in the world. The platform currently has 611 million monthly active users.

  9. Twitter Posts Network (SNAP)

    • kaggle.com
    zip
    Updated Dec 16, 2021
    Subhajit Sahu (2021). Twitter Posts Network (SNAP) [Dataset]. https://www.kaggle.com/datasets/wolfram77/graphs-snap-twitter7/code
    Available download formats: zip (6560110658 bytes)
    Dataset updated
    Dec 16, 2021
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    476 million Twitter tweets

    Dataset information

    476 million Twitter posts from 20 million users covering a 7 month period
    from June 1 2009 to December 31 2009. We estimate this is about 20-30% of
    all public tweets published on Twitter during the particular time frame.

    For each public tweet the following information was available:

    Author
    Time
    Content

    The Twitter social graph (who-follows-whom graph) is not included. A copy of the graph is available at http://an.kaist.ac.kr/traces/WWW2010.html
    (thanks to Haewoon Kwak, et al.).

    Dataset statistics
    Number of users: 17,069,982
    Number of tweets: 476,553,560
    Number of URLs: 181,611,080
    Number of hashtags: 49,293,684
    Number of retweets: 71,835,017

    Source (citation)
    J. Yang, J. Leskovec. Temporal Variation in Online Media. ACM Intl.
    Conf. on Web Search and Data Mining (WSDM '11), 2011.

    As per request from Twitter the data is no longer available.

    http://an.kaist.ac.kr/traces/WWW2010.html :

    What is Twitter, a Social Network or a News Media?

    Haewoon Kwak (http://an.kaist.ac.kr/~haewoon),
    Changhyun Lee (http://an.kaist.ac.kr/~chlee),
    Hosung Park (http://an.kaist.ac.kr/~hosung),
    and Sue Moon (http://an.kaist.ac.kr/~sbmoon)

    Proceedings of the 19th International World Wide Web (WWW) Conference,
    April 26-30, 2010, Raleigh NC (USA)

    Twitter, a microblogging service less than three years old, commands more
    than 41 million users as of July 2009 and is growing fast. Twitter users
    tweet about any topic within the 140-character limit and follow others to
    receive their tweets. The goal of this paper is to study the topological
    characteristics of Twitter and its power as a new medium of information
    sharing.

    We have crawled the entire Twitter site and obtained 41.7 million user
    profiles, 1.47 billion social relations, 4,262 trending topics, and 106
    million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low
    reciprocity, which all mark a deviation from known characteristics of human social networks. In order to identify influentials on
    Twitter, we have ranked users by the number of followers and by PageRank
    and found two rankings to be similar. Ranking by retweets differs from the previous two rankings, indicating a gap in influence inferred from the
    number of followers and that from the popularity of one's tweets. We have
    analyzed the tweets of top trending topics and reported on their temporal
    behavior and user participation. We have classified the trending topics
    based on the active period and th...
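    Reciprocity, one of the follower-graph properties the abstract reports as low, is commonly measured as the fraction of directed follow edges whose reverse edge also exists. The sketch below computes that ratio for an in-memory list of (follower, followee) pairs; it is a generic definition with a hypothetical function name, not necessarily the exact measure Kwak et al. used.

```python
def reciprocity(edges):
    """Fraction of distinct directed edges u->v whose reverse v->u also exists.

    Self-loops are ignored; an empty graph returns 0.0.
    """
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)
```

    For the full 1.47 billion-edge graph, an in-memory set of pairs is impractical; the same ratio is usually computed by sorting or hashing edges on disk.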

  10. Top retweeted users (highest indegree).

    • plos.figshare.com
    xls
    Updated Jun 17, 2023
    Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis (2023). Top retweeted users (highest indegree). [Dataset]. http://doi.org/10.1371/journal.pone.0270542.t003
    Available download formats: xls
    Dataset updated
    Jun 17, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Top retweeted users (highest indegree).
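    In a retweet graph with edges pointing from retweeter to original author, an author's in-degree counts how often they were retweeted, so the "top retweeted users" are the highest in-degree nodes. A minimal sketch, assuming (retweeter, author) pairs and counting every retweet (a multigraph); whether the source table counts distinct retweeters instead is not stated. The function name is hypothetical.

```python
from collections import Counter


def top_retweeted(retweets, k=10):
    """Rank users by in-degree in a retweet graph.

    Each (retweeter, author) pair is an edge retweeter -> author, so an
    author's in-degree is the number of retweets they received.
    Self-retweets are ignored; repeated pairs count once per retweet.
    """
    indegree = Counter(author for retweeter, author in retweets
                       if retweeter != author)
    return indegree.most_common(k)
```

    Counting distinct retweeters instead only requires deduplicating the pairs with `set(retweets)` before ranking.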

  11. The 10 most popular hashtags in our dataset.

    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis (2023). The 10 most popular hashtags in our dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0270542.t001
    Available download formats: xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The 10 most popular hashtags in our dataset.

  12. Lerman Twitter 2010 Dataset

    • academictorrents.com
    • marketplace.sshopencloud.eu
    bittorrent
    Updated Aug 15, 2014
    Kristina Lerman (2014). Lerman Twitter 2010 Dataset [Dataset]. https://academictorrents.com/details/d8b3a315172c8d804528762f37fa67db14577cdb
    Available download formats: bittorrent (292173969)
    Dataset updated
    Aug 15, 2014
    Dataset authored and provided by
    Kristina Lerman
    License

    No license specified (https://academictorrents.com/nolicensespecified)

    Description

    The Twitter_2010 data set contains tweets with URLs that were posted on Twitter during October 2010. In addition to tweets, it includes the followee links of tweeting users, allowing us to reconstruct the follower graph of active (tweeting) users.

    URLs: 66,059
    Tweets: 2,859,764
    Users: 736,930
    Links: 36,743,448

    The tweets table (in CSV format) link_status_search_with_ordering_real_csv contains tweets with the following information:

    link: URL within the text of the tweet
    id: tweet id
    create_at: date added to the db
    create_at_long
    inreplyto_screen_name: screen name of the user this tweet is replying to
    inreplyto_user_id: user id of the user this tweet is replying to
    source: device from which the tweet originated
    bad_user_id: alternate user id
    user_screen_name: tweeting user's screen name
    order_of_users: tweet's index within the sequence of tweets of the same URL
    user_id: user id

    The table (in CSV format) distinct_users_from_search_table_real_map contains names of tweeting users, and the following information for
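    The order_of_users field makes it possible to rebuild, for each URL, the sequence in which users tweeted it. A minimal sketch, assuming the tweets table is a CSV with a header row using the field names listed above (whether the distributed file has such a header is not stated); the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def url_cascades(csv_text):
    """Return {link: [user_id, ...]} ordered by order_of_users.

    Assumes a header row containing at least the columns
    link, order_of_users and user_id.
    """
    cascades = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        cascades[row["link"]].append((int(row["order_of_users"]), row["user_id"]))
    # Sort each URL's tweets by their position in the sequence.
    return {link: [uid for _, uid in sorted(items)]
            for link, items in cascades.items()}
```

    Combined with the followee links, each ordered cascade can be checked against the follower graph to see whether later tweeters followed earlier ones.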

  13. Graph-Based Social Media Data on Mental Health Topics

    • data.mendeley.com
    Updated Nov 4, 2024
    Samuel Ady Sanjaya (2024). Graph-Based Social Media Data on Mental Health Topics [Dataset]. http://doi.org/10.17632/z45txpdp7f.2
    Dataset updated
    Nov 4, 2024
    Authors
    Samuel Ady Sanjaya
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is structured as a graph, where nodes represent users and edges capture their interactions, including tweets, retweets, replies, and mentions. Each node provides detailed user attributes, such as unique ID, follower and following counts, and verification status, offering insights into each user's identity, role, and influence in the mental health discourse. The edges illustrate user interactions, highlighting engagement patterns and types of content that drive responses, such as tweet impressions. This interconnected structure enables sentiment analysis and public reaction studies, allowing researchers to explore engagement trends and identify the mental health topics that resonate most with users.

    The dataset consists of three files:

    1. Edges Data: Contains graph data essential for social network analysis, including fields for UserID (Source), UserID (Destination), Post/Tweet ID, and Date of Relationship. This file enables analysis of user connections without including tweet content, maintaining compliance with Twitter/X's data-sharing policies.

    2. Nodes Data: Offers user-specific details relevant to network analysis, including UserID, Account Creation Date, Follower and Following counts, Verified Status, and Date Joined Twitter. This file allows researchers to examine user behavior (e.g., identifying influential users or spam-like accounts) without direct reference to tweet content.

    3. Twitter/X Content Data: Contains only the raw tweet text as a single-column dataset, without associated user identifiers or metadata. By isolating the text, we ensure alignment with anonymization standards observed in similar published datasets, safeguarding user privacy in compliance with Twitter/X's data guidelines. This content is crucial for addressing the research focus on mental health discourse in social media.

    (References to prior Data in Brief publications involving Twitter/X data informed the dataset's structure.)
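    The Edges Data file is enough to find users who receive the most interactions, one simple proxy for influence. A minimal sketch, assuming rows are (source, destination, post_id, date) tuples in the field order listed above; the actual column headers and file format are not given here, and the function name is hypothetical.

```python
from collections import Counter


def interaction_degrees(edges):
    """Count outgoing and incoming interactions per user.

    `edges` holds (source_user, destination_user, post_id, date) rows,
    in the field order listed for the Edges Data file.
    """
    out_degree, in_degree = Counter(), Counter()
    for source, destination, _post_id, _date in edges:
        out_degree[source] += 1
        in_degree[destination] += 1
    return out_degree, in_degree
```

    `in_degree.most_common(10)` then lists the ten most-interacted-with users, which can be joined with the Nodes Data on UserID to compare against follower counts or verification status.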

  14. Why Do People Use Twitter?

    • searchlogistics.com
    Updated Apr 1, 2025
    (2025). Why Do People Use Twitter? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    One of the biggest advantages of Twitter is the speed at which information can be passed around. People use Twitter primarily to get news and for entertainment. This is the breakdown of why people use Twitter today.

  15. How Popular Is Twitter In The US?

    • searchlogistics.com
    Updated Apr 1, 2025
    (2025). How Popular Is Twitter In The US? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The US has the largest number of Twitter users, with over 100 million users. They account for about 16.7% of all Twitter users worldwide.

  16. COVID-19 Tweets : A dataset containing more than 600k tweets on the novel...

    • data.niaid.nih.gov
    Updated Jan 23, 2021
    Yassine Drias; Habiba Drias (2021). COVID-19 Tweets : A dataset containing more than 600k tweets on the novel CoronaVirus [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4024176
    Dataset updated
    Jan 23, 2021
    Dataset provided by
    LRIA - USTHB
    LRIA - University of Algiers
    Authors
    Yassine Drias; Habiba Drias
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains 653 996 tweets related to the Coronavirus topic and highlighted by hashtags such as: #COVID-19, #COVID19, #COVID, #Coronavirus, #NCoV and #Corona. The tweets' crawling period started on the 27th of February and ended on the 25th of March 2020, which is spread over four weeks.

    The tweets were generated by 390 458 users from 133 different countries and were written in 61 languages. English is the most used language, with almost 400k tweets, followed by Spanish with around 80k tweets.

    The data is stored as a CSV file, where each line represents a tweet. The CSV file provides the following fields:

    Author: the user who posted the tweet

    Recipient: the name of the user being replied to, in the case of a reply; otherwise, the same value as the Author field

    Tweet: the full content of the tweet

    Hashtags: the list of hashtags present in the tweet

    Language: the language of the tweet

    Relationship: gives information on the type of the tweet, whether it is a retweet, a reply, a tweet with a mention, etc.

    Location: the country of the author of the tweet, which is unfortunately not always available

    Date: the publication date of the tweet

    Source: the device or platform used to send the tweet

    The dataset can also be used to construct a social graph, since it includes the relations "Replies to", "Retweet", "MentionsInRetweet" and "Mentions".
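    Only replies carry an explicit target in the Recipient column; for the other relations the target has to be recovered from the tweet text. A minimal sketch, assuming the CSV header uses the field names listed above and that retweet and mention targets appear as @handles (matched here with a hypothetical regex for handles of up to 15 word characters); the exact Relationship values other than those named in the description are also assumptions.

```python
import csv
import io
import re
from collections import Counter

# Assumed handle format: "@" followed by up to 15 word characters.
MENTION = re.compile(r"@(\w{1,15})")


def build_interaction_graph(csv_text):
    """Return weighted (author, target, relationship) edges.

    Replies use the Recipient column; otherwise targets are the
    @handles found in the Tweet text. Self-edges are skipped.
    """
    edges = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        author = row["Author"]
        if row["Recipient"] != author:
            targets = [row["Recipient"]]
        else:
            targets = MENTION.findall(row["Tweet"])
        for target in targets:
            if target != author:
                edges[(author, target, row["Relationship"])] += 1
    return edges
```

    Keeping the relationship type in the edge key lets the same function produce separate reply, retweet, and mention layers, or a single collapsed graph after summing over types.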

  17. Data from: Discovery and classification of Twitter bots

    • data.europa.eu
    unknown
    Updated Apr 22, 2021
    Zenodo (2021). Discovery and classification of Twitter bots [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-4715885?locale=de
    Available download formats: unknown (22740)
    Dataset updated
    Apr 22, 2021
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Online Social Networks (OSN) are used daily by millions of users. This user base shares and discovers different opinions on popular topics. The social influence of large groups may be shaped by user beliefs or by interest in particular news or products. A large number of users, gathered in a single group or as followers, increases the probability of influencing OSN users. Botnets, collections of automated accounts controlled by a single agent, are a common mechanism for exerting maximum influence. Botnets may be used to better infiltrate the social graph over time and create an illusion of community behaviour, amplifying their message and increasing persuasion. This paper investigates Twitter botnets, their behavior, their interaction with user communities and their evolution over time. We analyze a dense crawl of a subset of Twitter traffic, amounting to nearly all interactions by Greek-speaking Twitter users over a period of 36 months. The collected users are labeled as botnets based on long-term and frequent content similarity events. We detect over a million events in which seemingly unrelated accounts tweeted nearly identical content at almost the same time. We filter these concurrent content injection events and detect a set of 1,850 accounts that repeatedly exhibit this pattern of behavior, suggesting that they are fully or in part controlled and orchestrated by the same entity. We find botnets that appear for brief intervals and disappear, as well as botnets that evolve and grow, spanning the duration of our dataset. We analyze statistical differences between the bot accounts and human users, as well as the botnet interactions with the user communities and the Twitter trending topics.
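    A concurrent content injection event, as described above, is a burst of distinct accounts posting (nearly) identical text within a short time span. The sketch below is a simplified illustration, not the paper's method: it treats "nearly identical" as exact equality after lowercasing and stripping URLs, and uses a sliding time window. The window length, the `min_accounts` threshold, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, drop URLs and collapse whitespace so near-duplicates match."""
    text = re.sub(r"https?://\S+", "", text.lower())
    return " ".join(text.split())


def concurrent_injections(tweets, window=60, min_accounts=3):
    """Map normalized text -> accounts that posted it in a concurrent burst.

    tweets: iterable of (account, unix_ts, text). A burst is any span of at
    most `window` seconds containing >= min_accounts distinct accounts.
    """
    by_text = defaultdict(list)
    for account, ts, text in tweets:
        by_text[normalize(text)].append((ts, account))
    flagged = {}
    for text, posts in by_text.items():
        posts.sort()
        start = 0
        involved = set()
        for end in range(len(posts)):
            # Shrink the window until it spans at most `window` seconds.
            while posts[end][0] - posts[start][0] > window:
                start += 1
            accounts = {acc for _, acc in posts[start:end + 1]}
            if len(accounts) >= min_accounts:
                involved |= accounts
        if involved:
            flagged[text] = involved
    return flagged
```

    Counting how often each account appears across flagged texts would approximate the "repeatedly exhibit this pattern" filter; a fuzzy similarity measure would be needed to catch paraphrased content.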

  18. The complete list of all entities with the corresponding keywords that were...

    • plos.figshare.com
    xls
    Updated Jun 17, 2023
    Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis (2023). The complete list of all entities with the corresponding keywords that were used for each one. [Dataset]. http://doi.org/10.1371/journal.pone.0270542.t002
    Available download formats: xls
    Dataset updated
    Jun 17, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The complete list of all entities with the corresponding keywords that were used for each one.

  19. Twitter16

    • resodate.org
    • service.tib.eu
    Updated Dec 16, 2024
    Bita Azarijoo; Mostafa Salehi; Shaghayegh Najari (2024). Twitter16 [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvdHdpdHRlcjE2
    Dataset updated
    Dec 16, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    Bita Azarijoo; Mostafa Salehi; Shaghayegh Najari
    Description

    Rumor detection on Twitter with Claim-Guided Hierarchical Graph Attention Networks. This paper proposes a novel Claim-guided Hierarchical Graph Attention Network based on undirected interaction graphs to learn graph attention-based embeddings that attend to user interactions for rumor detection.
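
    The core building block of a graph attention network can be sketched as a single-head attention layer in plain Python. This is the generic GAT formulation (score each neighbor with a LeakyReLU over a learned vector applied to the concatenated projected features, softmax over the neighborhood, then aggregate), not the claim-guided hierarchical model described above; the weights `W` and `a` are arbitrary placeholders rather than trained parameters, and the output nonlinearity is omitted.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU used for attention scoring (slope 0.2 is a common choice)."""
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, neighbors, W, a):
    """One single-head graph attention layer.

    features:  {node: list[float]} input feature vectors
    neighbors: {node: list[node]} adjacency of an undirected interaction graph
    W:         projection matrix, shape (d_out, d_in), as a list of rows
    a:         attention vector of length 2 * d_out
    Returns (new_features, attention) where attention[i][j] = alpha_ij.
    """
    z = {v: matvec(W, h) for v, h in features.items()}  # projected features W h
    out, attention = {}, {}
    for i, nbrs in neighbors.items():
        cand = [i] + [j for j in nbrs if j != i]  # neighborhood plus self-loop
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]); list + list concatenates
        scores = [leaky_relu(sum(ak * x for ak, x in zip(a, z[i] + z[j]))) for j in cand]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        attention[i] = dict(zip(cand, alpha))
        # h_i' = sum_j alpha_ij * W h_j (output nonlinearity omitted)
        out[i] = [sum(al * z[j][k] for al, j in zip(alpha, cand)) for k in range(len(W))]
    return out, attention
```

    With an all-zero attention vector every neighbor receives the same weight, so the layer reduces to mean aggregation over a node and its neighbors; learned values of `a` are what let such a model emphasize some user interactions over others.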

  20. NLP feature set variables for TwiBot-20.

    • plos.figshare.com
    xls
    Updated Dec 23, 2024
    Agata Skorupka (2024). NLP feature set variables for TwiBot-20. [Dataset]. http://doi.org/10.1371/journal.pone.0315849.t001
    Available download formats: xls
    Dataset updated
    Dec 23, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Agata Skorupka
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The study examines different graph-based methods for detecting anomalous activity on digital markets, aiming to find the most efficient way to increase market actors’ protection and reduce information asymmetry. Anomalies are defined here as both bots and fraudulent users (who can be either bots or real people). The methods are compared against each other and against state-of-the-art results from the literature, and a new algorithm is proposed. The goal is to find an efficient method for threat detection, in terms of both predictive performance and computational efficiency; it should scale well and remain robust to advances in the newest technologies. The article uses three publicly accessible graph-based datasets: one describing the Twitter social network (TwiBot-20) and two describing Bitcoin cryptocurrency markets (Bitcoin OTC and Bitcoin Alpha). In the former, an anomaly is a bot, as opposed to a human user; in the latter, an anomaly is a user who conducted a fraudulent transaction, which may (but need not) imply being a bot. The study shows that graph-based data is a better-performing predictor than text data. It compares different graph algorithms for extracting feature sets for anomaly detection models and finds that methods based on nodes’ statistics yield better model performance than state-of-the-art graph embeddings. They also significantly improve computational efficiency, often reducing run time by hours or enabling modeling on significantly larger graphs (usually not feasible with embeddings). On that basis, the article proposes its own graph-based statistics algorithm. Furthermore, using embeddings requires two engineering choices: the type of embedding and its dimension. The research examines whether some embedding types and dimensions perform significantly better than others.
    The answer turned out to be dataset-specific and had to be tailored case by case, adding even more engineering overhead to using embeddings (building a leaderboard over a grid of embedding instances, each of which takes hours to generate). This, again, speaks in favor of the proposed algorithm based on nodes’ statistics, which makes this engineering overhead redundant.
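
    The article’s own node-statistics algorithm is not specified in this description, so the following is only a hypothetical illustration of the general approach: computing cheap per-node statistics from a directed edge list (for example follower -> followee edges in a graph like TwiBot-20) that could then be fed to any tabular classifier. The feature names and the add-one smoothing in `follower_ratio` are assumptions, not the paper’s feature set.

```python
from collections import defaultdict
from statistics import mean


def node_statistics(edges):
    """Compute simple per-node statistics from directed (src, dst) edges.

    Hypothetical feature set, e.g. for follower -> followee edges:
      in_degree, out_degree  - number of distinct incoming / outgoing neighbors
      follower_ratio         - (in + 1) / (out + 1), add-one smoothed
      reciprocity            - share of out-neighbors that link back
      mean_neighbor_degree   - average total degree of all adjacent nodes
    """
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        if src != dst:  # ignore self-loops
            out_nbrs[src].add(dst)
            in_nbrs[dst].add(src)
    nodes = set(out_nbrs) | set(in_nbrs)
    # total degree = in-degree + out-degree (a reciprocal link counts twice)
    degree = {v: len(out_nbrs[v]) + len(in_nbrs[v]) for v in nodes}

    features = {}
    for v in nodes:
        out_d, in_d = len(out_nbrs[v]), len(in_nbrs[v])
        adjacent = out_nbrs[v] | in_nbrs[v]
        features[v] = {
            "in_degree": in_d,
            "out_degree": out_d,
            "follower_ratio": (in_d + 1) / (out_d + 1),
            "reciprocity": len(out_nbrs[v] & in_nbrs[v]) / out_d if out_d else 0.0,
            "mean_neighbor_degree": mean(degree[u] for u in adjacent) if adjacent else 0.0,
        }
    return features
```

    Each feature needs only one pass over the edges plus each node’s immediate neighborhood, which is why statistics of this kind are typically much cheaper to compute than learned embeddings and involve no choice of embedding type or dimension.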

