100+ datasets found

Twitter Graph Example v2 43
kaggle.com
zip
Updated Jun 29, 2022
Cite
Mathias Weiß (2022). Twitter Graph Example v2 43 [Dataset]. https://www.kaggle.com/datasets/weissmedia/twitter-graph-example-v2-43
Available download formats: zip (17943518 bytes)
Dataset updated
Jun 29, 2022
Authors
Mathias Weiß
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
This project is inspired by https://github.com/neo4j-graph-examples/twitter-v2.

Twitter Graph

Show data from your personal Twitter account

The Graph Your Network application inserts your Twitter activity into Neo4j.

Data model diagram: https://neo4jsandbox.com/guides/twitter/img/twitter-data-model.svg

Content

~10 MB of graph data (CSV)

43,325 nodes with the labels: Hashtag, Link, Me, Source, Tweet, User

57,896 relationships of the types: AMPLIFIES, CONTAINS, FOLLOWS, INTERACTS_WITH, MENTIONS, POSTS, REPLY_TO, RETWEETS, RT_MENTIONS, SIMILAR_TO, TAGS, USING
TwitterFollowGraph
huggingface.co
Updated Mar 31, 2023
Cite
Twitter (2023). TwitterFollowGraph [Dataset]. https://huggingface.co/datasets/Twitter/TwitterFollowGraph
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 31, 2023
Dataset provided by
X (http://x.com/)
Authors
Twitter
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
kNN-Embed: Locally Smoothed Embedding Mixtures For Multi-interest Candidate Retrieval

This repo contains the TwitterFollowGraph dataset from our paper kNN-Embed: Locally Smoothed Embedding Mixtures For Multi-interest Candidate Retrieval. This work is licensed under a Creative Commons Attribution 4.0 International License.

TwitterFollowGraph

TwitterFollowGraph is a bipartite directed graph of users (consumer) nodes to author (producer) nodes… See the full description on the dataset page: https://huggingface.co/datasets/Twitter/TwitterFollowGraph.
X/Twitter: number of worldwide users 2019-2024
statista.com
Updated Jun 26, 2025
Cite
Statista (2025). X/Twitter: number of worldwide users 2019-2024 [Dataset]. https://www.statista.com/statistics/303681/twitter-users-worldwide/
Dataset updated
Jun 26, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Dec 2022
Area covered
Worldwide
Description
As of December 2022, X/Twitter's audience accounted for over *** million monthly active users worldwide. This figure was projected to ******** to approximately *** million by 2024, a ******* of around **** percent compared to 2022.
Twitter API Data: Nike, Lululemon, Adidas Tweets
kaggle.com
zip
Updated Jun 10, 2024
Cite
Conrad Kleykamp (2024). Twitter API Data: Nike, Lululemon, Adidas Tweets [Dataset]. https://www.kaggle.com/datasets/conradkleykamp/nikelululemonadidas-tweets-jsonl/code
Available download formats: zip (105168717 bytes)
Dataset updated
Jun 10, 2024
Authors
Conrad Kleykamp
Description
This dataset was retrieved from CU Boulder's DTSA 5800 Network Analysis for Marketing Analytics course for the MS - Data Science degree. The course professor originally retrieved it from Twitter's (X) Standard Search API, and it has been uploaded to Kaggle for further use. The dataset contains 175,077 tweet objects, each encoded in JavaScript Object Notation (JSON). JSON is based on key-value pairs: named attributes and their associated values together describe an object.

Each tweet object mentions at least one of three major brands: Nike, Adidas, Lululemon. This dataset was originally uploaded for practicing natural language processing and creating network graphs.
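
As a rough illustration of working with such tweet objects, the sketch below reads one JSON object per line and counts how many tweets mention each brand. It assumes the file is in JSON Lines layout and that the tweet body lives under a "text" key; both are assumptions, since the description does not list the field names, and count_brand_mentions is a hypothetical helper.

```python
import json
from collections import Counter

# The three brands named in the dataset description.
BRANDS = ("nike", "adidas", "lululemon")

def count_brand_mentions(lines, text_key="text"):
    """Count tweets mentioning each brand; text_key is an assumed field name."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        text = str(tweet.get(text_key, "")).lower()
        for brand in BRANDS:
            if brand in text:
                counts[brand] += 1
    return counts

# Tiny made-up sample standing in for lines of the real file.
sample = [
    '{"id": 1, "text": "New Nike shoes"}',
    '{"id": 2, "text": "adidas vs NIKE"}',
    '{"id": 3, "text": "Lululemon leggings"}',
]
brand_counts = count_brand_mentions(sample)
print(brand_counts)
```

With the real file, passing an open file handle as lines should work the same way, since the function only iterates over lines.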

For more information on the structure of a tweet object, please see Twitter's (X) documentation here.

License: Twitter Oauth 1.0
Twitter Connections with User Location
kaggle.com
zip
Updated Feb 13, 2019
Cite
Sajid Hasan Apon (2019). Twitter Connections with User Location [Dataset]. https://www.kaggle.com/sajidhasanapon/twitter-connections-with-user-location
Available download formats: zip (1412431294 bytes)
Dataset updated
Feb 13, 2019
Authors
Sajid Hasan Apon
Description
Content

This dataset is a subset of all the Twitter-users, along with their connections and locations.

The graph created by considering the users as nodes and their connections as edges is a connected component of the total Twitter graph (i.e. for every user in the subgraph, all their connections in the original graph are contained within the subgraph).

Although Twitter is a directed graph (follower-following relation is not mutual. "X follows Y" does not imply "Y follows X"), we have considered the directed edges as undirected. Hence, if u→v is present in the original graph but v→u is not, we have added the edge v→u for every u and v.
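
The conversion described above can be sketched as follows: for every directed edge u→v, the reverse edge v→u is added if it is missing. This is a minimal illustration on a made-up edge set, not the authors' code; symmetrize is a hypothetical helper name.

```python
def symmetrize(directed_edges):
    """Return an edge set that contains (v, u) for every (u, v)."""
    undirected = set()
    for u, v in directed_edges:
        undirected.add((u, v))
        undirected.add((v, u))
    return undirected

# 1 and 2 follow each other; 2 follows 3 but not the other way round.
edges = {(1, 2), (2, 1), (2, 3)}
sym = symmetrize(edges)
print(sorted(sym))
```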

There are two directories: one with 10 million users and the other with 1 million. Each directory contains two .txt files: location, and user.

The location file contains the latitude and longitude of each user. The file format is:

lat_1, long_1
lat_2, long_2
lat_3, long_3
...

where lat_1 and long_1 are the latitude and longitude of user number 1 respectively, and so on.

The user file contains the adjacency list of each user. The k-th row of this file enumerates the friends of user number k.
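
A minimal sketch of reading the two files is given below. It assumes that users are numbered from 1 in row order, that friend IDs on a row are separated by whitespace or commas, and that each location row holds a single "lat, long" pair; the description does not state the delimiters, so these are assumptions, and both function names are hypothetical.

```python
import re

def parse_adjacency(rows):
    """Map user number k (1-based row index) to the friend IDs on row k."""
    adjacency = {}
    for k, row in enumerate(rows, start=1):
        tokens = [t for t in re.split(r"[,\s]+", row.strip()) if t]
        adjacency[k] = [int(t) for t in tokens]
    return adjacency

def parse_locations(rows):
    """Map user number k to the (latitude, longitude) pair on row k."""
    locations = {}
    for k, row in enumerate(rows, start=1):
        lat, lon = (float(x) for x in row.split(","))
        locations[k] = (lat, lon)
    return locations

# Made-up rows standing in for the user and location files.
user_rows = ["2 3", "1", "1"]
location_rows = ["23.7, 90.4", "40.7, -74.0", "51.5, -0.1"]

friends = parse_adjacency(user_rows)
coords = parse_locations(location_rows)
print(friends[1], coords[1])
```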

Bibliographic References

If you use this dataset, please cite it as:

@article{DBLP:journals/pvldb/GhoshACHSL18,
  author = {Bishwamittra Ghosh and Mohammed Eunus Ali and Farhana Murtaza Choudhury and Sajid Hasan Apon and Timos Sellis and Jianxin Li},
  title = {The Flexible Socio Spatial Group Queries},
  journal = {{PVLDB}},
  volume = {12},
  number = {2},
  pages = {99--111},
  year = {2018},
  url = {http://www.vldb.org/pvldb/vol12/p99-ghosh.pdf},
  timestamp = {Mon, 03 Dec 2018 16:45:54 +0100},
  biburl = {https://dblp.org/rec/bib/journals/pvldb/GhoshACHSL18},
}

Source Publications

@inproceedings{DBLP:conf/kdd/LiWDWC12,
  author = {Rui Li and Shengjie Wang and Hongbo Deng and Rui Wang and Kevin Chen-Chuan Chang},
  title = {Towards social user profiling: unified and discriminative influence model for inferring home locations},
  booktitle = {KDD},
  year = {2012},
  pages = {1023-1031}
}
A study on real graphs of fake news spreading on Twitter
data.niaid.nih.gov
Updated Aug 20, 2021
Cite
Amirhosein Bodaghi (2021). A study on real graphs of fake news spreading on Twitter [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3711599
Dataset updated
Aug 20, 2021
Dataset provided by
Federal University of Rio de Janeiro
Authors
Amirhosein Bodaghi
Description
*** Fake News on Twitter ***

These 5 datasets are the results of an empirical study on how newly emerging fake news spreads on Twitter. In particular, we focused on fake news stories that gave rise to a simultaneous spread of the truth countering them. The story behind each fake news item is as follows:

1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.

2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."

3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.

4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.

5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.

The data collection was done in two stages, each of which produced a new dataset: 1- obtaining the Dataset of Diffusion (DD), which includes information on fake news/truth tweets and retweets; 2- querying the neighbors of the tweet spreaders, which provides the Dataset of Graph (DG).

DD

DD for each fake news story is an Excel file named FNx_DD, where x is the number of the fake news story.

Each row belongs to one captured tweet/retweet related to the rumor, and each column presents a specific piece of information about it. From left to right, the columns are:

User ID (user who has posted the current tweet/retweet)

The description sentence in the profile of the user who has published the tweet/retweet

The number of published tweet/retweet by the user at the time of posting the current tweet/retweet

Date and time of creation of the account by which the current tweet/retweet has been posted

Language of the tweet/retweet

Number of followers

Number of followings (friends)

Date and time of posting the current tweet/retweet

Number of likes (favorites) the current tweet had acquired before it was crawled

Number of times the current tweet had been retweeted before it was crawled

Whether there is another tweet inside the current tweet/retweet (for example, when the current tweet is a quote, reply, or retweet)

The source (OS) of device by which the current tweet/retweet was posted

Tweet/Retweet ID

Retweet ID (if the post is a retweet then this feature gives the ID of the tweet that is retweeted by the current post)

Quote ID (if the post is a quote then this feature gives the ID of the tweet that is quoted by the current post)

Reply ID (if the post is a reply then this feature gives the ID of the tweet that is replied by the current post)

Frequency of tweet occurrences which means the number of times the current tweet is repeated in the dataset (for example the number of times that a tweet exists in the dataset in the form of retweet posted by others)

State of the tweet which can be one of the following forms (achieved by an agreement between the annotators):

r : The tweet/retweet is a fake news post

a : The tweet/retweet is a truth post

q : The tweet/retweet is a question about the fake news that neither confirms nor denies it

n : The tweet/retweet is not related to the fake news (it contains the queries related to the rumor but does not refer to the given fake news)

DG

DG for each fake news contains two files:

A file in graph format (.graph) that records the graph structure, such as who is linked to whom. (This file is named FNx_DG.graph, where x is the number of the fake news story.)

A file in JSONL format (.jsonl) that holds the real user IDs of the nodes in the graph file. (This file is named FNx_Labels.jsonl, where x is the number of the fake news story.)

In the graph file, the label of each node is the order in which it entered the graph. For example, if the node with user ID 12345637 was the first node entered into the graph file, its label in the graph is 0 and its real ID (12345637) is at row number 1 of the jsonl file (row number 0 holds the column labels); the other node IDs follow on the next rows, one user ID per row. Therefore, to find the user ID of node 200 (labeled 200 in the graph), look at row number 201 of the jsonl file under this zero-based numbering, which is line 202 when lines are counted from 1.
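
The label-to-row arithmetic can be sketched as below, assuming (as the description states) that the first row of the jsonl file holds column labels and every following row holds one user ID. The helper names are hypothetical, and the lookup simply returns the raw line because the exact layout of each row is not described.

```python
def label_to_row(label):
    """Zero-based row index in FNx_Labels.jsonl (row 0 is the header)."""
    return label + 1

def label_to_line(label):
    """One-based line number, as most text editors count lines."""
    return label + 2

def user_id_for_label(lines, label):
    """Return the raw row that holds the user ID for a graph node label."""
    return lines[label_to_row(label)].strip()

# Made-up file contents: a header row followed by one ID per row.
lines = ["user_id\n", "12345637\n", "999\n"]
first_id = user_id_for_label(lines, 0)
print(first_id, label_to_row(200), label_to_line(200))
```

Keeping both helpers makes the two ways of counting explicit: node 200 sits at zero-based row 201, which a text editor shows as line 202.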

The user IDs of spreaders in DG (those who have had a post in DD) would be available in DD to get extra information about them and their tweet/retweet. The other user IDs in DG are the neighbors of these spreaders and might not exist in DD.
TwitterFaveGraph
huggingface.co
Updated Mar 31, 2023
Cite
Twitter (2023). TwitterFaveGraph [Dataset]. https://huggingface.co/datasets/Twitter/TwitterFaveGraph
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 31, 2023
Dataset provided by
X (http://x.com/)
Authors
Twitter
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MiCRO: Multi-interest Candidate Retrieval Online

This repo contains the TwitterFaveGraph dataset from our paper MiCRO: Multi-interest Candidate Retrieval Online. This work is licensed under a Creative Commons Attribution 4.0 International License.

TwitterFaveGraph

TwitterFaveGraph is a bipartite directed graph of user nodes to Tweet nodes where an edge represents a "fave" engagement. Each edge is binned into predetermined time chunks which… See the full description on the dataset page: https://huggingface.co/datasets/Twitter/TwitterFaveGraph.
How Popular Is Twitter In The World?
searchlogistics.com
Updated Apr 1, 2025
Cite
(2025). How Popular Is Twitter In The World? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Twitter is ranked as the 12th most popular social media site in the world. The platform currently has 611 million monthly active users.
Twitter Posts Network (SNAP)
kaggle.com
zip
Updated Dec 16, 2021
Cite
Subhajit Sahu (2021). Twitter Posts Network (SNAP) [Dataset]. https://www.kaggle.com/datasets/wolfram77/graphs-snap-twitter7/code
Available download formats: zip (6560110658 bytes)
Dataset updated
Dec 16, 2021
Authors
Subhajit Sahu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
476 million Twitter tweets

Dataset information

467 million Twitter posts from 20 million users covering a 7 month period
from June 1 2009 to December 31 2009. We estimate this is about 20-30% of
all public tweets published on Twitter during the particular time frame.

For each public tweet the following information was available:

Author Time Content

We have no Twitter social graph (who-follows-whom graph) available. You can find a copy of the graph at http://an.kaist.ac.kr/traces/WWW2010.html
(thanks to Haewoon Kwak, et al.).

Dataset statistics
Number of users 17,069,982
Number of tweets 476,553,560
Number of URLs 181,611,080
Number of Hashtags 49,293,684
Number of re-tweets 71,835,017

Source (citation)
J. Yang, J. Leskovec. Temporal Variation in Online Media. ACM Intl.
Conf. on Web Search and Data Mining (WSDM '11), 2011.

As per request from Twitter the data is no longer available.

http://an.kaist.ac.kr/traces/WWW2010.html :

What is Twitter, a Social Network or a News Media?

Haewoon Kwak (http://an.kaist.ac.kr/~haewoon),
Changhyun Lee (http://an.kaist.ac.kr/~chlee),
Hosung Park (http://an.kaist.ac.kr/~hosung),
and Sue Moon (http://an.kaist.ac.kr/~sbmoon)

Proceedings of the 19th International World Wide Web (WWW) Conference,
April 26-30, 2010, Raleigh NC (USA)

Twitter, a microblogging service less than three years old, commands more
than 41 million users as of July 2009 and is growing fast. Twitter users
tweet about any topic within the 140-character limit and follow others to
receive their tweets. The goal of this paper is to study the topological
characteristics of Twitter and its power as a new medium of information
sharing.

We have crawled the entire Twitter site and obtained 41.7 million user
profiles, 1.47 billion social relations, 4,262 trending topics, and 106
million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low
reciprocity, which all mark a deviation from known characteristics of human social networks [Newman03]. In order to identify influentials on
Twitter, we have ranked users by the number of followers and by PageRank
and found two rankings to be similar. Ranking by retweets differs from the previous two rankings, indicating a gap in influence inferred from the
number of followers and that from the popularity of one's tweets. We have
analyzed the tweets of top trending topics and reported on their temporal
behavior and user participation. We have classified the trending topics
based on the active period and th...
Top retweeted users (highest indegree).
plos.figshare.com
xls
Updated Jun 17, 2023
Cite
Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis (2023). Top retweeted users (highest indegree). [Dataset]. http://doi.org/10.1371/journal.pone.0270542.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0270542.t003
Dataset updated
Jun 17, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Top retweeted users (highest indegree).
The 10 most popular hashtags in our dataset.
plos.figshare.com
xls
Updated Jun 16, 2023
Cite
Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis (2023). The 10 most popular hashtags in our dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0270542.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0270542.t001
Dataset updated
Jun 16, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The 10 most popular hashtags in our dataset.
Lerman Twitter 2010 Dataset
academictorrents.com
marketplace.sshopencloud.eu
bittorrent
Updated Aug 15, 2014
+ more versions
Cite
Kristina Lerman (2014). Lerman Twitter 2010 Dataset [Dataset]. https://academictorrents.com/details/d8b3a315172c8d804528762f37fa67db14577cdb
Available download formats: bittorrent (292173969)
Dataset updated
Aug 15, 2014
Dataset authored and provided by
Kristina Lerman
License
No license specified (https://academictorrents.com/nolicensespecified)
Description
The Twitter_2010 data set contains tweets containing URLs that were posted on Twitter during October 2010. In addition to tweets, we also collected the followee links of tweeting users, allowing us to reconstruct the follower graph of active (tweeting) users.

URLs: 66,059
Tweets: 2,859,764
Users: 736,930
Links: 36,743,448

Tweets table (in csv format): link_status_search_with_ordering_real_csv contains tweets with the following information:

link: URL within the text of the tweet
id: tweet id
create_at: date added to the db
create_at_long
inreplyto_screen_name: screen name of user this tweet is replying to
inreplyto_user_id: user id of user this tweet is replying to
source: device from which the tweet originated
bad_user_id: alternate user id
user_screen_name: tweeting user screen name
order_of_users: tweet's index within sequence of tweets of the same URL
user_id: user id

Table (in csv format): distinct_users_from_search_table_real_map contains names of tweeting users, and the following information for
Graph-Based Social Media Data on Mental Health Topics
data.mendeley.com
Updated Nov 4, 2024
+ more versions
Cite
Samuel Ady Sanjaya (2024). Graph-Based Social Media Data on Mental Health Topics [Dataset]. http://doi.org/10.17632/z45txpdp7f.2
Unique identifier
https://doi.org/10.17632/z45txpdp7f.2
Dataset updated
Nov 4, 2024
Authors
Samuel Ady Sanjaya
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is structured as a graph, where nodes represent users and edges capture their interactions, including tweets, retweets, replies, and mentions. Each node provides detailed user attributes, such as unique ID, follower and following counts, and verification status, offering insights into each user's identity, role, and influence in the mental health discourse. The edges illustrate user interactions, highlighting engagement patterns and types of content that drive responses, such as tweet impressions. This interconnected structure enables sentiment analysis and public reaction studies, allowing researchers to explore engagement trends and identify the mental health topics that resonate most with users.

The dataset consists of three files:

1. Edges Data: Contains graph data essential for social network analysis, including fields for UserID (Source), UserID (Destination), Post/Tweet ID, and Date of Relationship. This file enables analysis of user connections without including tweet content, maintaining compliance with Twitter/X’s data-sharing policies.

2. Nodes Data: Offers user-specific details relevant to network analysis, including UserID, Account Creation Date, Follower and Following counts, Verified Status, and Date Joined Twitter. This file allows researchers to examine user behavior (e.g., identifying influential users or spam-like accounts) without direct reference to tweet content.

3. Twitter/X Content Data: This file contains only the raw tweet text as a single-column dataset, without associated user identifiers or metadata. By isolating the text, we ensure alignment with anonymization standards observed in similar published datasets, safeguarding user privacy in compliance with Twitter/X's data guidelines. This content is crucial for addressing the research focus on mental health discourse in social media. (References to prior Data in Brief publications involving Twitter/X data informed the dataset's structure.)
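
To illustrate how the edges file could feed a simple network analysis, the sketch below counts in-degree and out-degree per user. It assumes the CSV header uses the field names exactly as listed in the description ("UserID (Source)", "UserID (Destination)"); the real header spelling may differ, and the sample rows are made up.

```python
import csv
import io
from collections import Counter

# Assumed header names, copied from the field list in the description.
SRC, DST = "UserID (Source)", "UserID (Destination)"

def degree_counts(csv_file):
    """Return (out_degree, in_degree) counters from an edges CSV."""
    out_deg, in_deg = Counter(), Counter()
    for row in csv.DictReader(csv_file):
        out_deg[row[SRC]] += 1
        in_deg[row[DST]] += 1
    return out_deg, in_deg

# Made-up sample standing in for the edges file.
sample_edges = io.StringIO(
    "UserID (Source),UserID (Destination),Post/Tweet ID,Date of Relationship\n"
    "u1,u2,t1,2024-01-01\n"
    "u3,u2,t2,2024-01-02\n"
    "u2,u1,t3,2024-01-03\n"
)
out_deg, in_deg = degree_counts(sample_edges)
print(in_deg.most_common(1))
```

Users with a high in-degree receive the most interactions, which is one rough way to shortlist candidates for the "influential users" the description mentions.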
Why Do People Use Twitter?
searchlogistics.com
Updated Apr 1, 2025
Cite
(2025). Why Do People Use Twitter? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
One of the biggest advantages of Twitter is the speed at which information can be passed around. People use Twitter primarily to get news and for entertainment. This is the breakdown of why people use Twitter today.
How Popular Is Twitter In The US?
searchlogistics.com
Updated Apr 1, 2025
Cite
(2025). How Popular Is Twitter In The US? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The US has the largest number of Twitter users, with over 100 million. They account for about 16.7% of all Twitter users worldwide.
COVID-19 Tweets : A dataset contaning more than 600k tweets on the novel...
data.niaid.nih.gov
Updated Jan 23, 2021
Cite
Yassine Drias; Habiba Drias (2021). COVID-19 Tweets : A dataset contaning more than 600k tweets on the novel CoronaVirus [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4024176
Dataset updated
Jan 23, 2021
Dataset provided by
LRIA - USTHB
LRIA - University of Algiers
Authors
Yassine Drias; Habiba Drias
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains 653 996 tweets related to the Coronavirus topic and highlighted by hashtags such as: #COVID-19, #COVID19, #COVID, #Coronavirus, #NCoV and #Corona. The tweets' crawling period started on the 27th of February and ended on the 25th of March 2020, which is spread over four weeks.

The tweets were generated by 390 458 users from 133 different countries and were written in 61 languages. English was the most used language with almost 400k tweets, followed by Spanish with around 80k tweets.

The data is stored as a CSV file, where each line represents a tweet. The CSV file provides information on the following fields:

Author: the user who posted the tweet

Recipient: contains the name of the user in case of a reply, otherwise it would have the same value as the previous field

Tweet: the full content of the tweet

Hashtags: the list of hashtags present in the tweet

Language: the language of the tweet

Relationship: gives information on the type of the tweet, whether it is a retweet, a reply, a tweet with a mention, etc.

Location: the country of the author of the tweet, which is unfortunately not always available

Date: the publication date of the tweet

Source: the device or platform used to send the tweet

The dataset can also be used to construct a social graph, since it includes the relations "Replies to", "Retweet", "MentionsInRetweet" and "Mentions".
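
A minimal sketch of building such a social graph from the CSV is shown below. It assumes the header names match the field names listed above (Author, Recipient, Relationship) and that the Relationship column uses the four relation strings exactly as quoted; both are assumptions about the file, and the sample rows are made up. Rows where the recipient equals the author (a plain tweet, per the Recipient field description) are skipped.

```python
import csv
import io
from collections import defaultdict

# Relation names as quoted in the description; exact spelling in the file is assumed.
GRAPH_RELATIONS = {"Replies to", "Retweet", "MentionsInRetweet", "Mentions"}

def build_graph(csv_file):
    """Map each author to a set of (recipient, relation) edges."""
    graph = defaultdict(set)
    for row in csv.DictReader(csv_file):
        author, recipient = row["Author"], row["Recipient"]
        relation = row["Relationship"]
        if relation in GRAPH_RELATIONS and author != recipient:
            graph[author].add((recipient, relation))
    return graph

# Made-up sample standing in for the real CSV file.
sample_tweets = io.StringIO(
    "Author,Recipient,Relationship\n"
    "alice,bob,Replies to\n"
    "carol,alice,Retweet\n"
    "bob,bob,Tweet\n"
)
graph = build_graph(sample_tweets)
print(dict(graph))
```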
Data from: Discovery and classification of Twitter bots
data.europa.eu
unknown
Updated Apr 22, 2021
Cite
Zenodo (2021). Discovery and classification of Twitter bots [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-4715885?locale=de
Available download formats: unknown (22740)
Dataset updated
Apr 22, 2021
Dataset authored and provided by
Zenodo (http://zenodo.org/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Online Social Networks (OSN) are used by millions of users daily. This user base shares and discovers different opinions on popular topics. The social influence of large groups may be shaped by user beliefs or drawn toward particular news or products. A large number of users, gathered in a single group or as a number of followers, increases the probability of influencing OSN users. Botnets, collections of automated accounts controlled by a single agent, are a common mechanism for exerting maximum influence. Botnets may be used to better infiltrate the social graph over time and create an illusion of community behaviour, amplifying their message and increasing persuasion. This paper investigates Twitter botnets, their behavior, their interaction with user communities and their evolution over time. We analyze a dense crawl of a subset of Twitter traffic, amounting to nearly all interactions by Greek-speaking Twitter users for a period of 36 months. The collected users are labeled as botnets, based on long-term and frequent content similarity events. We detect over a million events where seemingly unrelated accounts tweeted nearly identical content at almost the same time. We filter these concurrent content injection events and detect a set of 1,850 accounts that repeatedly exhibit this pattern of behavior, suggesting that they are fully or in part controlled and orchestrated by the same entity. We find botnets that appear for brief intervals and disappear, as well as botnets that evolve and grow, spanning the duration of our dataset. We analyze statistical differences between the bot accounts and human users, as well as the botnet interactions with the user communities and the Twitter trending topics.
The complete list of all entities with the corresponding keywords that were...
plos.figshare.com
xls
Updated Jun 17, 2023
Cite
Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis (2023). The complete list of all entities with the corresponding keywords that were used for each one. [Dataset]. http://doi.org/10.1371/journal.pone.0270542.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0270542.t002
Dataset updated
Jun 17, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The complete list of all entities with the corresponding keywords that were used for each one.
Twitter16
resodate.org
service.tib.eu
Updated Dec 16, 2024
Cite
Bita Azarijoo; Mostafa Salehi; Shaghayegh Najari (2024). Twitter16 [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvdHdpdHRlcjE2
Dataset updated
Dec 16, 2024
Dataset provided by
Leibniz Data Manager
Authors
Bita Azarijoo; Mostafa Salehi; Shaghayegh Najari
Description
Rumor detection on Twitter with Claim-Guided Hierarchical Graph Attention Networks. This paper proposes a novel Claim-guided Hierarchical Graph Attention Network based on undirected interaction graphs to learn graph attention-based embeddings that attend to user interactions for rumor detection.
NLP feature set variables for TwiBot-20.
plos.figshare.com
xls
Updated Dec 23, 2024
Cite
Agata Skorupka (2024). NLP feature set variables for TwiBot-20. [Dataset]. http://doi.org/10.1371/journal.pone.0315849.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0315849.t001
Dataset updated
Dec 23, 2024
Dataset provided by
PLOS (http://plos.org/)
Authors
Agata Skorupka
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The study examines different graph-based methods of detecting anomalous activities on digital markets, proposing the most efficient way to increase market actors’ protection and reduce information asymmetry. Anomalies are defined below as both bots and fraudulent users (who can be both bots and real people). Methods are compared against each other, and state-of-the-art results from the literature and a new algorithm is proposed. The goal is to find an efficient method suitable for threat detection, both in terms of predictive performance and computational efficiency. It should scale well and remain robust on the advancements of the newest technologies. The article utilized three publicly accessible graph-based datasets: one describing the Twitter social network (TwiBot-20) and two describing Bitcoin cryptocurrency markets (Bitcoin OTC and Bitcoin Alpha). In the former, an anomaly is defined as a bot, as opposed to a human user, whereas in the latter, an anomaly is a user who conducted a fraudulent transaction, which may (but does not have to) imply being a bot. The study proves that graph-based data is a better-performing predictor than text data. It compares different graph algorithms to extract feature sets for anomaly detection models. It states that methods based on nodes’ statistics result in better model performance than state-of-the-art graph embeddings. They also yield a significant improvement in computational efficiency. This often means reducing the time by hours or enabling modeling on significantly larger graphs (usually not feasible in the case of embeddings). On that basis, the article proposes its own graph-based statistics algorithm. Furthermore, using embeddings requires two engineering choices: the type of embedding and its dimension. The research examines whether there are types of graph embeddings and dimensions that perform significantly better than others. 
The solution turned out to be dataset-specific and needed to be tailored on a case-by-case basis, adding even more engineering overhead to using embeddings (building a leaderboard of grid of embedding instances, where each of them takes hours to be generated). This, again, speaks in favor of the proposed algorithm based on nodes’ statistics. The research proposes its own efficient algorithm, which makes this engineering overhead redundant.
