Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Truth Social data set containing a network of users, their associated posts, and additional information about each post. Collected from February 2022 through September 2022, this dataset contains 454,458 user entries and 845,060 Truth (Truth Social’s term for post) entries.
The dataset comprises 12 files; the entry count for each file is shown below.
| File | Data Points |
| --- | --- |
| users.tsv | 454,458 |
| follows.tsv | 4,002,115 |
| truths.tsv | 823,927 |
| quotes.tsv | 10,508 |
| replies.tsv | 506,276 |
| media.tsv | 184,884 |
| hashtags.tsv | 21,599 |
| external_urls.tsv | 173,947 |
| truth_hashtag_edges.tsv | 213,295 |
| truth_media_edges.tsv | 257,500 |
| truth_external_url_edges.tsv | 252,877 |
| truth_user_tag_edges.tsv | 145,234 |
A readme file is provided that describes the structure of the files, relevant terms, and details of the data collection.
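For quick inspection, a minimal Python loading sketch (this assumes the .tsv files are tab-separated with header rows; the readme is the authoritative reference for the schema):

```python
import pandas as pd

# Load three of the core tables; tab-separated with header rows is an assumption.
users = pd.read_csv("users.tsv", sep="\t")
truths = pd.read_csv("truths.tsv", sep="\t")
follows = pd.read_csv("follows.tsv", sep="\t")

# Row counts should match the table above: 454,458 / 823,927 / 4,002,115.
print(len(users), len(truths), len(follows))
```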
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
Youtube social network and ground-truth communities
Dataset information
Youtube is a video-sharing website that includes a social network. In the Youtube social network, users form friendships with each other, and users can create groups that other users can join. We consider such user-defined groups as ground-truth communities. This data is provided by Alan Mislove et al.
We regard each connected component in a group as a separate ground-truth community. We remove ground-truth communities that have fewer than 3 nodes. We also provide the top 5,000 highest-quality communities, as described in our paper. As for the network, we provide the largest connected component.
More info: https://snap.stanford.edu/data/com-Youtube.html
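For reference, a small sketch of reading a SNAP-style community file (one community per line, whitespace-separated node ids) and applying the minimum-size filter described above; the uncompressed filename is taken from the SNAP page:

```python
# Keep only communities with at least 3 nodes, mirroring the filtering above.
communities = []
with open("com-youtube.all.cmty.txt") as f:
    for line in f:
        members = line.split()
        if len(members) >= 3:
            communities.append(members)

print(len(communities), "communities with >= 3 nodes")
```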
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
This dataset was obtained by parsing statements and their veracity verdicts from Politifact.com. It contains roughly 14,000 statements, collected up to late 2020.
The statements fall into 6 categories: True, Mostly True, Half-True, Mostly False, False, and Pants on Fire!
This dataset can be used for multiple purposes: attempting to detect truthfulness based on statement language (or, conversely, detecting lies), integration into fact-checking pipelines, or simply exploratory data analysis (EDA) for political purposes.
There are 4 columns in politifact.csv:
statement - the statement made by a celebrity or politician.
source - the source of the statement; can be a person, but not necessarily.
link - URL of the statement's fact-check.
veracity - degree of truthfulness assigned by the Politifact.com team.
Other variants of the dataset have certain classes removed or binarized (into truths and lies). Have a quick look at this notebook for more details: https://www.kaggle.com/thesergiu/part-1-quick-eda-on-politifact-csv
Initial Source: www.politifact.com Creator GitHub Link: https://github.com/the-sergiu GitHub Repo Link for more context: https://github.com/the-sergiu/TruthDetection
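A minimal sketch for loading and inspecting the CSV with pandas (column names taken from the description above):

```python
import pandas as pd

df = pd.read_csv("politifact.csv")
print(df.columns.tolist())            # expect: statement, source, link, veracity
print(df["veracity"].value_counts())  # distribution over the 6 truthfulness classes
```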
This dataset was created by Vivian
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The real dataset consists of movie evaluations from IMDB, which provides a platform where individuals can rate movies on a scale of 1 to 10. If a user rates a movie and clicks the share button, a Twitter message is generated. We then extract the rating from the Twitter message. We treat the ratings on the IMDB website as the event truths, since they are based on the aggregated evaluations of all users, whereas our observations come from only the subset of users who share their ratings on Twitter. Using the Twitter API, we collect the follower and following relationships between the individuals who generate movie evaluation Twitter messages. To better show the influence of social network information on event truth discovery, we delete small subnetworks that consist of fewer than 5 agents. The final dataset consists of 2,266 evaluations from 209 individuals on 245 movies (events), along with the social network between these 209 individuals. We regard the social network as undirected, since both follower and following relationships indicate that two users have similar tastes.
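The small-subnetwork filtering step described above can be illustrated with networkx; this is a toy sketch with made-up edges, not the authors' code:

```python
import networkx as nx

# Undirected follower/following network between raters (toy example edges).
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 5), (6, 7)])

# Drop connected components ("subnetworks") with fewer than 5 agents.
for component in list(nx.connected_components(G)):
    if len(component) < 5:
        G.remove_nodes_from(component)

print(G.nodes())  # the 2-node component {6, 7} is gone
```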
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pollution of online social spaces caused by rampant dis/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent social media data, hindering the advancement of computational social science as a whole. To address this pressing issue, we present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social. The dataset contains the complete post history of over 4M users (81% of all registered accounts), totalling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions. Since Bluesky allows users to create and like feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped "like" interactions. This dataset allows novel analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection, and for performing content virality and diffusion analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
https://snap.stanford.edu/data/com-Orkut.html
Dataset information
Orkut (http://www.orkut.com/) is a free online social network where users form friendships with each other. Orkut also allows users to form groups that other members can then join. We consider such user-defined groups as ground-truth communities. We provide the Orkut friendship social network and ground-truth communities. This data is provided by Alan Mislove et al. (http://socialnetworks.mpi-sws.org/data-imc2007.html)
We regard each connected component in a group as a separate ground-truth community. We remove ground-truth communities that have fewer than 3 nodes. We also provide the top 5,000 highest-quality communities, as described in our paper (http://arxiv.org/abs/1205.6233). As for the network, we provide the largest connected component.
Dataset statistics
Nodes: 3,072,441
Edges: 117,185,083
Nodes in largest WCC: 3,072,441 (1.000)
Edges in largest WCC: 117,185,083 (1.000)
Nodes in largest SCC: 3,072,441 (1.000)
Edges in largest SCC: 117,185,083 (1.000)
Average clustering coefficient: 0.1666
Number of triangles: 627,584,181
Fraction of closed triangles: 0.01414
Diameter (longest shortest path): 9
90-percentile effective diameter: 4.8
Source (citation)
J. Yang and J. Leskovec. Defining and Evaluating Network Communities based
on Ground-truth. ICDM, 2012. http://arxiv.org/abs/1205.6233
Files
File Description
com-orkut.ungraph.txt.gz Undirected Orkut network
com-orkut.all.cmty.txt.gz Orkut communities
com-orkut.top5000.cmty.txt.gz Orkut communities (Top 5,000)
The graph in the SNAP data set is 1-based, with nodes numbered 1 to
3,072,626.
In the SuiteSparse Matrix Collection, Problem.A is the undirected Orkut network, a matrix of size n-by-n with n = 3,072,441, which is the number of unique user ids appearing in any edge.
Problem.aux.nodeid is a list of the node ids that appear in the SNAP data set. A(i,j) = 1 if person nodeid(i) is friends with person nodeid(j). The node ids are the same as in the SNAP data set (1-based).
C = Problem.aux.Communities_all is a sparse matrix of size n-by-15,301,901, which represents the same number of communities as in the com-orkut.all.cmty.txt file. The kth line in that file defines the kth community and is the column C(:,k), where C(i,k) = 1 if person nodeid(i) is in the kth community. Row C(i,:) and row/column i of the A matrix thus refer to the same person, nodeid(i).
Ctop = Problem.aux.Communities_to...
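To make the indexing convention concrete, here is a sketch that builds an adjacency matrix from the SNAP edge list in Python (assuming the standard SNAP format: '#'-prefixed comment lines, then one whitespace-separated pair of 1-based node ids per line). At 117 million edges this needs a lot of memory; it illustrates the 1-based-to-0-based shift rather than serving as a production loader:

```python
import gzip
import numpy as np
from scipy.sparse import coo_matrix

rows, cols = [], []
with gzip.open("com-orkut.ungraph.txt.gz", "rt") as f:
    for line in f:
        if line.startswith("#"):        # skip SNAP header comments
            continue
        u, v = map(int, line.split())
        rows += [u - 1, v - 1]          # shift 1-based ids to 0-based
        cols += [v - 1, u - 1]          # add both directions (undirected)

n = max(max(rows), max(cols)) + 1
A = coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n)).tocsr()
```

Note that SNAP node ids are sparse (numbered up to 3,072,626), so this matrix is slightly larger than Problem.A, which remaps rows/columns to the 3,072,441 unique ids via Problem.aux.nodeid.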
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was generated from multiple interactions between a social robot (NAO) and 5th-grade students from a private school in São Paulo, Brazil.
In the interactions, the robot engaged the participating students on the content their teachers were covering at the time: the waste system in Brazil.
The measures here are the readings that the R-CASTLE system took for each answer the students gave to the questions the robot asked.
For more information about how these measures were collected, please refer to this thesis: https://doi.org/10.11606/T.55.2020.tde-31082020-093935
Since the goal of R-CASTLE is to provide autonomous adaptation, we built a ground-truth dataset based on the feedback of an education expert operating the robot in loco. The expert teleoperated the robot, changing its behaviour (or not) according to observed measures of the participants: face gaze, facial emotion displayed, number of spoken words, correctness of the answer (based on pre-defined answers), and the time students took to answer. These measures are the first 5 columns of this csv file. The evaluator could decide to increase (1), maintain (0), or decrease (-1) the level of difficulty of the following questions depending on these observed measures. This is the human true label, stored in the 6th column.
Deviations (integer): number of face deviations by the participant during question answering, as identified by the system.
EmotionCount (integer): balance between "good" and "bad" emotions (good - bad), as identified by the system.
NumberWord (integer): number of words in the sentence the participant gave.
SucRate/Ans/RWa (between 0 and 1, where 0 is completely wrong and 1 is completely right): success rate of the participant's answer to that question, based on the expected answer programmed by their teachers.
Time2ans (float): time, in seconds, from when the robot finished asking the question until the end of the participant's speech.
True Value (-1, 0, 1): ground-truth value; the adaptation chosen by the human observing the interaction, indicating whether the system needed to decrease (-1), maintain (0), or increase (1) the level of difficulty of the asked questions.
Final Crisp Value (float): calculated fuzzy output value, based on the implementations in this paper: https://doi.org/10.1145/3395035.3425201
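A minimal pandas sketch for loading the file (the filename is hypothetical, and the column headers are taken from the descriptions above, so they may differ in the distributed csv):

```python
import pandas as pd

df = pd.read_csv("rcastle.csv")  # hypothetical filename; use the csv shipped with the dataset
features = df[["Deviations", "EmotionCount", "NumberWord", "SucRate/Ans/RWa", "Time2ans"]]
labels = df["True Value"]        # ground-truth adaptation label: -1, 0, or 1
print(labels.value_counts())
```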
CoAID includes diverse COVID-19 healthcare misinformation, including fake news on websites and social platforms, along with users' social engagement with such news. CoAID includes 4,251 news items, 296,000 related user engagements, 926 social platform posts about COVID-19, and ground-truth labels.
Networks from the SNAP (Stanford Network Analysis Platform) Network Data Sets, Jure Leskovec, http://snap.stanford.edu/data/index.html (email: jure at cs.stanford.edu).
Citation for the SNAP collection:
@misc{snapnets,
  author       = {Jure Leskovec and Andrej Krevl},
  title        = {{SNAP Datasets}: {Stanford} Large Network Dataset Collection},
  howpublished = {\url{http://snap.stanford.edu/data}},
  month        = jun,
  year         = 2014
}
The following matrices/graphs were added to the collection in June 2010 by Tim Davis (problem id and name):
2284 SNAP/soc-Epinions1 who-trusts-whom network of Epinions.com
2285 SNAP/soc-LiveJournal1 LiveJournal social network
2286 SNAP/soc-Slashdot0811 Slashdot social network, Nov 2008
2287 SNAP/soc-Slashdot0902 Slashdot social network, Feb 2009
2288 SNAP/wiki-Vote Wikipedia who-votes-on-whom network
2289 SNAP/email-EuAll Email network from a EU research institution
2290 SNAP/email-Enron Email communication network from Enron
2291 SNAP/wiki-Talk Wikipedia talk (communication) network
2292 SNAP/cit-HepPh Arxiv High Energy Physics paper citation network
2293 SNAP/cit-HepTh Arxiv High Energy Physics paper citation network
2294 SNAP/cit-Patents Citation network among US Patents
2295 SNAP/ca-AstroPh Collaboration network of Arxiv Astro Physics
2296 SNAP/ca-CondMat Collaboration network of Arxiv Condensed Matter
2297 SNAP/ca-GrQc Collaboration network of Arxiv General Relativity
2298 SNAP/ca-HepPh Collaboration network of Arxiv High Energy Physics
2299 SNAP/ca-HepTh Collaboration network of Arxiv High Energy Physics Theory
2300 SNAP/web-BerkStan Web graph of Berkeley and Stanford
2301 SNAP/web-Google Web graph from Google
2302 SNAP/web-NotreDame Web graph of Notre Dame
2303 SNAP/web-Stanford Web graph of Stanford.edu
2304 SNAP/amazon0302 Amazon product co-purchasing network from March 2 2003
2305 SNAP/amazon0312 Amazon product co-purchasing network from March 12 2003
2306 SNAP/amazon0505 Amazon product co-purchasing network from May 5 2003
2307 SNAP/amazon0601 Amazon product co-purchasing network from June 1 2003
2308 SNAP/p2p-Gnutella04 Gnutella peer to peer network from August 4 2002
2309 SNAP/p2p-Gnutella05 Gnutella peer to peer network from August 5 2002
2310 SNAP/p2p-Gnutella06 Gnutella peer to peer network from August 6 2002
2311 SNAP/p2p-Gnutella08 Gnutella peer to peer network from August 8 2002
2312 SNAP/p2p-Gnutella09 Gnutella peer to peer network from August 9 2002
2313 SNAP/p2p-Gnutella24 Gnutella peer to peer network from August 24 2002
2314 SNAP/p2p-Gnutella25 Gnutella peer to peer network from August 25 2002
2315 SNAP/p2p-Gnutella30 Gnutella peer to peer network from August 30 2002
2316 SNAP/p2p-Gnutella31 Gnutella peer to peer network from August 31 2002
2317 SNAP/roadNet-CA Road network of California
2318 SNAP/roadNet-PA Road network of Pennsylvania
2319 SNAP/roadNet-TX Road network of Texas
2320 SNAP/as-735 733 daily instances (graphs) from November 8 1997 to January 2 2000
2321 SNAP/as-Skitter Internet topology graph, from traceroutes run daily in 2005
2322 SNAP/as-caida The CAIDA AS Relationships Datasets, from January 2004 to November 2007
2323 SNAP/Oregon-1 AS peering information inferred from Oregon route-views between March 31 and May 26 2001
2324 SNAP/Oregon-2 AS peering information inferred from Oregon route-views between March 31 and May 26 2001
2325 SNAP/soc-sign-epinions Epinions signed social network
2326 SNAP/soc-sign-Slashdot081106 Slashdot Zoo signed social network from November 6 2008
2327 SNAP/soc-sign-Slashdot090216 Slashdot Zoo signed social network from February 16 2009
2328 SNAP/soc-sign-Slashdot090221 Slashdot Zoo signed social network from February 21 2009
Then the following problems were added in July 2018. All data and metadata from the SNAP data set was imported into the SuiteSparse Matrix Collection.
2777 SNAP/CollegeMsg Messages on a Facebook-like platform at UC-Irvine
2778 SNAP/com-Amazon Amazon product network
2779 SNAP/com-DBLP DBLP collaboration network
2780 SNAP/com-Friendster Friendster online social network
2781 SNAP/com-LiveJournal LiveJournal online social network
2782 SNAP/com-Orkut Orkut online social network
2783 SNAP/com-Youtube Youtube online social network
2784 SNAP/email-Eu-core E-mail network
2785 SNAP/email-Eu-core-temporal E-mails between users at a research institution
2786 SNAP/higgs-twitter Twitter messages re: Higgs boson on 4th July 2012
2787 SNAP/loc-Brightkite Brightkite location-based online social network
2788 SNAP/loc-Gowalla Gowalla location-based online social network
2789 SNAP/soc-Pokec Pokec online social network
2790 SNAP/soc-sign-bitcoin-alpha Bitcoin Alpha web of trust network
2791 SNAP/soc-sign-bitcoin-otc Bitcoin OTC web of trust network
2792 SNAP/sx-askubuntu Comments, questions, and answers on Ask Ubuntu
2793 SNAP/sx-mathoverflow Comments, questions, and answers on Math Overflow
2794 SNAP/sx-stackoverflow Comments, questions, and answers on Stack Overflow
2795 SNAP/sx-superuser Comments, questions, and answers on Super User
2796 SNAP/twitter7 A collection of 476 million tweets collected between June-Dec 2009
2797 SNAP/wiki-RfA Wikipedia Requests for Adminship (with text)
2798 SNAP/wiki-talk-temporal Users editing talk pages on Wikipedia
2799 SNAP/wiki-topcats Wikipedia hyperlinks (with communities)
The following graphs/networks were in the SNAP data set in July 2018 but have not yet been imported into the SuiteSparse Matrix Collection. They may be added in the future:
amazon-meta
ego-Facebook
ego-Gplus
ego-Twitter
gemsec-Deezer
gemsec-Facebook
ksc-time-series
memetracker9
web-flickr
web-Reddit
web-RedditPizzaRequests
wiki-Elec
wiki-meta
wikispeedia
The 2010 description of the SNAP data set gave these categories:
Social networks: online social networks, edges represent interactions between people
Communication networks: email communication networks with edges representing communication
Citation networks: nodes represent papers, edges represent citations
Collaboration networks: nodes represent scientists, edges represent collaborations (co-authoring a paper)
Web graphs: nodes represent webpages and edges are hyperlinks
Blog and Memetracker graphs: nodes represent time-stamped blog posts, edges are hyperlinks [revised below]
Amazon networks: nodes represent products and edges link commonly co-purchased products
Internet networks: nodes represent computers and edges communication
Road networks: nodes represent intersections and edges roads connecting the intersections
Autonomous systems: graphs of the internet
Signed networks: networks with positive and negative edges (friend/foe, trust/distrust)
By July 2018, the following categories had been added:
Networks with ground-truth communities : ground-truth network communities in social and information networks
Location-based online social networks : Social networks with geographic check-ins
Wikipedia networks, articles, and metadata : Talk, editing, voting, and article data from Wikipedia
Temporal networks : networks where edges have timestamps
Twitter and Memetracker : Memetracker phrases, links and 467 million Tweets
Online communities : Data from online communities such as Reddit and Flickr
Online reviews : Data from online review systems such as BeerAdvocate and Amazon
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Residential School Locations Dataset [IRS_Locations.csv] contains the locations (latitude and longitude) of Residential Schools and student hostels operated by the federal government in Canada. All residential schools and hostels listed in the Indian Residential School Settlement Agreement (IRSSA) are included in this dataset, as well as several industrial schools and residential schools that were not part of the IRSSA. This version of the dataset does not include the five schools under the Newfoundland and Labrador Residential Schools Settlement Agreement. The original school location data was created by the Truth and Reconciliation Commission and was provided to the researcher (Rosa Orlandini) by the National Centre for Truth and Reconciliation in April 2017. The dataset was created by Rosa Orlandini, and builds upon and enhances the previous work of the Truth and Reconciliation Commission, Morgan Hite (creator of the Atlas of Indian Residential Schools in Canada that was produced for the Tk'emlups First Nation and the Justice for Day Scholars Initiative), and Stephanie Pyne (project lead for the Residential Schools Interactive Map). Each individual school location in this dataset is attributed to RSIM, Morgan Hite, NCTR, or Rosa Orlandini.
Many schools/hostels had several locations throughout the history of the institution. If a school/hostel moved from its original location to another property, the school is considered to have two unique locations in this dataset: the original location and the new location. For example, Lejac Indian Residential School had two locations while it was operating, Stuart Lake and Fraser Lake. If a new school building was constructed on the same property as the original school building, it is not considered a new location, as is the case of Girouard Indian Residential School. When the precise location is known, the coordinates of the main building are provided; when the precise location of the building is not known, an approximate location is provided.
For each residential school institution location, the following information is provided: official names, alternative names, dates of operation, religious affiliation, latitude and longitude coordinates, community location, Indigenous community name, contributor (of the location coordinates), school/institution photo (when available), location point precision, type of school (hostel or residential school), and a list of references used to determine the location of the main buildings or sites.
It is a widely accepted fact that state-sponsored Twitter accounts operated during the 2016 US presidential election, spreading millions of tweets with misinformation and inflammatory political content. Whether these social media campaigns of the so-called "troll" accounts were able to manipulate public opinion is still in question. Here we aim to quantify the influence of troll accounts and the impact they had on Twitter by analyzing 152.5 million tweets from 9.9 million users, including 822 troll accounts. The data, collected during the US election campaign, contain original troll tweets before they were deleted by Twitter. From these data, we constructed a very large interaction graph: a directed graph of 9.3 million nodes and 169.9 million edges. Recently, Twitter released datasets on the misinformation campaigns of 8,275 state-sponsored accounts linked to Russia, Iran, and Venezuela as part of the investigation into foreign interference in the 2016 US election. These data serve as a ground-truth identifier of troll users in our dataset. Using graph analysis techniques, we quantify the diffusion cascades of web and media content that was shared by the troll accounts. We present strong evidence that authentic users were the source of the viral cascades. Although the trolls participated in the viral cascades, they did not have a leading role in them, and only four troll accounts were truly influential. With this version, we are correcting an error in the Acknowledgments regarding the research funding that supports this work. The correct one is the European Union's Horizon 2020 Research and Innovation program under the Cybersecurity CONCORDIA project (Grant Agreement No. 830927).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Social networks are a battlefield for political propaganda. Protected by the anonymity of the internet, political actors use computational propaganda to influence the masses. Their methods include the use of synchronized or individual bots, multiple accounts operated by one social media management tool, and various manipulations of search engines and social network algorithms, all aiming to promote their ideology. While computational propaganda influences modern society, it is hard to measure or detect. Furthermore, with the recent exponential growth of large language models (LLMs) and growing concerns about information overload, which make the alternative truth spheres noisier than ever before, the complexity and magnitude of computational propaganda are expected to increase, making detection even harder. Propaganda in social networks is disguised as legitimate news sent from authentic users, smartly blending real users with fake accounts. We seek here to detect efforts to manipulate the spread of information in social networks through one of the fundamental macro-scale properties of rhetoric: repetitiveness. We use 16 data sets with a total size of 13 GB, 10 related to political topics and 6 related to non-political ones (large-scale disasters), each ranging from tens of thousands to a few million tweets. We compare them and identify statistical and network properties that distinguish between these two types of information cascades. These features are based on the repetition distribution of hashtags and user mentions, as well as on the network structure. Together, they enable us to distinguish (p-value = 0.0001) between the two classes of information cascades. In addition to constructing a bipartite graph connecting words and tweets for each cascade, we develop a quantitative measure and show how it can be used to distinguish between political and non-political discussions. Our method is indifferent to the cascade's country of origin, language, or cultural background, since it is based only on the statistical properties of repetitiveness and word appearance in the tweets' bipartite network structure.
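As a toy illustration of the bipartite word-tweet construction mentioned above (not the authors' implementation):

```python
import networkx as nx

# One node per tweet, one node per token, an edge when the token appears in the tweet.
tweets = ["vote for X #election", "vote vote X #election", "earthquake relief now"]

B = nx.Graph()
for i, tweet in enumerate(tweets):
    tweet_node = f"tweet_{i}"
    B.add_node(tweet_node, bipartite=0)
    for word in tweet.split():
        B.add_node(word, bipartite=1)
        B.add_edge(tweet_node, word)

# Word-side degrees approximate the repetition distribution used to separate cascades.
word_degrees = {w: d for w, d in B.degree() if B.nodes[w].get("bipartite") == 1}
print(sorted(word_degrees.items(), key=lambda kv: -kv[1]))
```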
ClaimsKG is a knowledge graph of metadata for 59,580 fact-checked claims scraped from 13 fact-checking sites. In addition to providing a single dataset of claims and associated metadata, truth ratings are harmonised, and additional information is provided for each claim, e.g., about mentioned entities. Please see https://data.gesis.org/claimskg/ for further details about the data model and statistics.
The dataset facilitates structured queries about claims, their truth values, involved entities, authors, dates, and other kinds of metadata. ClaimsKG is generated through a (semi-)automated pipeline, which harvests claim-related data from popular fact-checking websites, annotates it with related entities from DBpedia/Wikipedia, and lifts all data to RDF using established vocabularies (such as schema.org).
The latest release of ClaimsKG covers 59,580 claims. The data was scraped through August 2022 and contains claims published between 1996 and 2022 from 13 fact-checking websites. The entity-fishing Python client (https://github.com/hirmeos/entity-fishing-client-python) was used for entity linking and disambiguation in this release. The dataset contains a total of 1,371,271 entities detected and referenced with DBpedia. More information, such as detailed statistics, query examples, and a user-friendly interface to explore the knowledge graph, is available at https://data.gesis.org/claimskg/.
The first two releases of ClaimsKG are hosted at Zenodo (https://doi.org/10.5281/zenodo.3518960): ClaimsKG V1.0 (published on 04.04.2019) and ClaimsKG V2.0 (published on 01.09.2019). This latest release supersedes the previous versions, as it contains all the claims from the previous versions together with additional claims and improved entity annotations.
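Since the data is lifted to RDF with schema.org vocabularies, a query sketch with rdflib might look as follows; the filename is hypothetical and the exact predicates should be verified against the ClaimsKG data model at https://data.gesis.org/claimskg/:

```python
from rdflib import Graph

g = Graph()
g.parse("claimskg.ttl", format="turtle")  # hypothetical local dump filename

# schema:ClaimReview / schema:claimReviewed / schema:reviewRating are standard
# schema.org terms; whether ClaimsKG uses exactly these should be checked.
query = """
PREFIX schema: <http://schema.org/>
SELECT ?claim ?rating WHERE {
    ?review a schema:ClaimReview ;
            schema:claimReviewed ?claim ;
            schema:reviewRating ?rating .
} LIMIT 10
"""
for row in g.query(query):
    print(row.claim, row.rating)
```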
Social media bots pose as humans to influence users for commercial, political, or ideological purposes. For example, bots can artificially inflate the popularity of a product by promoting it and/or writing positive ratings, as well as undermine the reputation of competing products through negative valuations. The threat is even greater when the purpose is political or ideological (see the Brexit referendum or the US presidential elections). Fearing the effect of this influence, the German political parties rejected the use of bots in their electoral campaign for the general elections. Furthermore, bots are commonly linked to fake news spreading. Therefore, approaching the identification of bots from an author profiling perspective is of high importance from the point of view of marketing, forensics, and security.
After having addressed several aspects of author profiling in social media from 2013 to 2018 (age and gender, also together with personality; gender and language variety; and gender from a multimodality perspective), this year we aim to investigate whether the author of a Twitter feed is a bot or a human and, in the case of a human, to profile the author's gender.
The uncompressed dataset consists of a folder per language (en, es). Each folder contains:
An XML file per author (Twitter user) with 100 tweets. The name of the XML file corresponds to the unique author id.
A truth.txt file with the list of authors and the ground truth.
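A minimal Python sketch for reading this layout (the ':::' field separator in truth.txt and the <document> tag inside the XML files are assumptions carried over from earlier PAN author-profiling editions; check the task's readme):

```python
import xml.etree.ElementTree as ET
from pathlib import Path

data_dir = Path("en")  # one folder per language: en, es

# truth.txt: one author per line; ':::'-separated fields are an assumption.
labels = {}
for line in (data_dir / "truth.txt").read_text().splitlines():
    author_id, *truth = line.split(":::")
    labels[author_id] = truth

# One XML file per author, named by the unique author id, holding 100 tweets.
for xml_file in data_dir.glob("*.xml"):
    tree = ET.parse(xml_file)
    tweets = [doc.text for doc in tree.iter("document")]  # assumed tag name
    print(xml_file.stem, labels.get(xml_file.stem), len(tweets))
```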
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Individual characteristics of deliberate and accidental fake news distributors.