12 datasets found

Truth Social Dataset

zenodo.org

zip

Updated Jan 13, 2023


Patrick Gerard; Nicholas Botzer; Tim Weninger (2023). Truth Social Dataset [Dataset]. http://doi.org/10.5281/zenodo.7531625

Available download formats: zip

Unique identifier

https://doi.org/10.5281/zenodo.7531625

Dataset updated

Jan 13, 2023

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Patrick Gerard; Nicholas Botzer; Tim Weninger

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

A Truth Social data set containing a network of users, their associated posts, and additional information about each post. Collected from February 2022 through September 2022, this dataset contains 454,458 user entries and 845,060 Truth (Truth Social’s term for post) entries.

The dataset comprises 12 files; the entry count for each file is shown below.

File	Data Points
users.tsv	454,458
follows.tsv	4,002,115
truths.tsv	823,927
quotes.tsv	10,508
replies.tsv	506,276
media.tsv	184,884
hashtags.tsv	21,599
external_urls.tsv	173,947
truth_hashtag_edges.tsv	213,295
truth_media_edges.tsv	257,500
truth_external_url_edges.tsv	252,877
truth_user_tag_edges.tsv	145,234

A readme file is provided that describes the structure of the files, necessary terms, and necessary information about the data collection.
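
As a quick sanity check after download, the row counts of the files can be compared against the table above. The following is a minimal sketch, assuming each file is tab-separated with exactly one header row and standard quoting (the readme is the authority on the actual layout). The names `EXPECTED`, `count_rows`, and `check_counts` are introduced here for illustration only.

```python
import csv
from pathlib import Path

# Expected entry counts, copied from the table above (subset shown).
EXPECTED = {
    "users.tsv": 454_458,
    "follows.tsv": 4_002_115,
    "truths.tsv": 823_927,
}

def count_rows(path, has_header=True):
    """Count data rows in a tab-separated file, optionally skipping one header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        total = sum(1 for _ in reader)
    # Assumption: a single header line; adjust if the readme says otherwise.
    return total - 1 if has_header and total > 0 else total

def check_counts(data_dir, expected=EXPECTED):
    """Return {filename: (expected, found)} for files whose row count differs."""
    mismatches = {}
    for name, want in expected.items():
        path = Path(data_dir) / name
        if not path.exists():
            continue  # skip files that were not downloaded
        found = count_rows(path)
        if found != want:
            mismatches[name] = (want, found)
    return mismatches
```

An empty result from `check_counts` means every file that was found matches its listed count. Post text may contain quote characters or embedded line breaks, so a mismatch can also indicate that the quoting assumption does not hold.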

Orkut Social Network and Communities (SNAP)
kaggle.com
Updated Dec 16, 2021
Subhajit Sahu (2021). Orkut Social Network and Communities (SNAP) [Dataset]. https://www.kaggle.com/wolfram77/graphs-snap-com-orkut/discussion
Dataset updated
Dec 16, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Subhajit Sahu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Orkut social network and ground-truth communities

https://snap.stanford.edu/data/com-Orkut.html

Dataset information

Orkut (http://www.orkut.com/) is a free on-line social network where users form friendships with each other. Orkut also allows users to form a group which
other members can then join. We consider such user-defined groups as
ground-truth communities. We provide the Orkut friendship social network
and ground-truth communities. This data is provided by Alan Mislove et al. (http://socialnetworks.mpi-sws.org/data-imc2007.html)

We regard each connected component in a group as a separate ground-truth
community. We remove the ground-truth communities which have fewer than 3
nodes. We also provide the top 5,000 communities with highest quality
which are described in our paper (http://arxiv.org/abs/1205.6233). As for
the network, we provide the largest connected component.

Dataset statistics
Nodes	3,072,441
Edges	117,185,083
Nodes in largest WCC	3,072,441 (1.000)
Edges in largest WCC	117,185,083 (1.000)
Nodes in largest SCC	3,072,441 (1.000)
Edges in largest SCC	117,185,083 (1.000)
Average clustering coefficient	0.1666
Number of triangles	627,584,181
Fraction of closed triangles	0.01414
Diameter (longest shortest path)	9
90-percentile effective diameter	4.8

Source (citation)
J. Yang and J. Leskovec. Defining and Evaluating Network Communities based on Ground-truth. ICDM, 2012. http://arxiv.org/abs/1205.6233

Files
File	Description
com-orkut.ungraph.txt.gz	Undirected Orkut network
com-orkut.all.cmty.txt.gz	Orkut communities
com-orkut.top5000.cmty.txt.gz	Orkut communities (Top 5,000)
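
The files listed above can be streamed without unpacking them first. The sketch below rests on assumptions that are not stated in this listing: that the edge list holds one whitespace-separated pair of node IDs per line with `#` marking comment lines, and that each line of a community file lists the member IDs of one community. The function names `iter_edges` and `iter_communities` are hypothetical.

```python
import gzip

def iter_edges(path):
    """Yield (u, v) integer pairs from a gzipped edge list, skipping '#' comment lines."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue
            u, v = line.split()[:2]
            yield int(u), int(v)

def iter_communities(path, min_size=3):
    """Yield each community as a list of integer node IDs, one community per line.

    min_size=3 mirrors the description's rule that communities with
    fewer than 3 nodes are removed; it also skips empty lines.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            members = [int(token) for token in line.split()]
            if len(members) >= min_size:
                yield members
```

Both functions are generators, so a graph with over 117 million edges is never held in memory unless the caller chooses to collect it.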

Notes on inclusion into the SuiteSparse Matrix Collection, July 2018:

The graph in the SNAP data set is 1-based, with nodes numbered 1 to
3,072,626.

In the SuiteSparse Matrix Collection, Problem.A is the undirected
Orkut network, a matrix of size n-by-n with n=3,072,441, which is
the number of unique user IDs appearing in any edge.

Problem.aux.nodeid is a list of the node IDs that appear in the SNAP data set. A(i,j)=1 if person nodeid(i) is friends with person nodeid(j). The
node IDs are the same as in the SNAP data set (1-based).

C = Problem.aux.Communities_all is a sparse matrix of size n by 15,301,901, which represents the same number of communities as in the com-orkut.all.cmty.txt file. The kth line in that file defines the kth community, and is the
column C(:,k), where C(i,k)=1 if person nodeid(i) is in the kth
community. Row C(i,:) and row/column i of the A matrix thus refer to the
same person, nodeid(i).
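
The relationship between SNAP node IDs, the nodeid list, and the entries of C can be illustrated with a small sketch. This is not the SuiteSparse import code: it assumes the nodeid list is the sorted set of IDs that appear in at least one edge, and it uses 0-based indices where the notes above use 1-based ones. The names `build_node_index` and `community_entries` are made up for this example.

```python
def build_node_index(edges):
    """Map each node ID appearing in any edge to a compact 0-based row index."""
    node_ids = sorted({node for edge in edges for node in edge})
    index_of = {node: i for i, node in enumerate(node_ids)}
    return node_ids, index_of

def community_entries(communities, index_of):
    """Return (i, k) pairs meaning 'person node_ids[i] is in community k'.

    These are the nonzero coordinates of the membership matrix C,
    using 0-based indices rather than MATLAB's 1-based ones.
    """
    entries = []
    for k, members in enumerate(communities):
        for node in members:
            if node in index_of:  # ignore IDs that never appear in an edge
                entries.append((index_of[node], k))
    return entries

# Toy data: IDs 3 and 4 never occur, so the 3 remaining IDs get rows 0..2.
edges = [(1, 2), (2, 5)]
node_ids, index_of = build_node_index(edges)
entries = community_entries([[1, 2], [5, 2, 1]], index_of)
```

The compaction step is why n (3,072,441) is smaller than the largest node ID (3,072,626): IDs that appear in no edge receive no row.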

Ctop = Problem.aux.Communities_to...
Collected feed statistics.
plos.figshare.com
xls
Updated Nov 5, 2024
Andrea Failla; Giulio Rossetti (2024). Collected feed statistics. [Dataset]. http://doi.org/10.1371/journal.pone.0310330.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0310330.t003
Dataset updated
Nov 5, 2024
Dataset provided by
PLOS ONE
Authors
Andrea Failla; Giulio Rossetti
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. We present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social to address this pressing issue. The dataset contains the complete post history of over 4M users (81% of all registered accounts), totalling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions. Since Bluesky allows users to create and like feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped “like” interactions. This dataset allows novel analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection and performing content virality and diffusion analysis.
Using social network information to discover truth of movie ranking
researchdata.ntu.edu.sg
tsv, txt
Updated Jun 10, 2018
DR-NTU (Data) (2018). Using social network information to discover truth of movie ranking [Dataset]. http://doi.org/10.21979/N9/L5TTRW
Available download formats: tsv (4143), tsv (26553), txt (1857)
Unique identifier
https://doi.org/10.21979/N9/L5TTRW
Dataset updated
Jun 10, 2018
Dataset provided by
DR-NTU (Data)
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The real dataset consists of movie evaluations from IMDB, which provides a platform where individuals can evaluate movies on a scale of 1 to 10. If a user rates a movie and clicks the share button, a Twitter message is generated. We then extract the rating from the Twitter message. We treat the ratings on the IMDB website as the event truths, which are based on the aggregated evaluations from all users, whereas our observations come from only a subset of users who share their ratings on Twitter. Using the Twitter API, we collect information about the follower and following relationships between individuals who generate movie evaluation Twitter messages. To better show the influence of social network information on event truth discovery, we delete small subnetworks that consist of fewer than 5 agents. The final dataset we use consists of 2266 evaluations from 209 individuals on 245 movies (events) and also the social network between these 209 individuals. We regard the social network as undirected, since both follower and following relationships indicate that the two users have similar taste.
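The filtering step described above (dropping subnetworks of fewer than 5 agents from an undirected network) can be sketched with a breadth-first search over connected components. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function name `drop_small_components` and the edge-list input format are hypothetical, and users with no edges at all are not represented.

```python
from collections import defaultdict, deque

def drop_small_components(edges, min_size=5):
    """Keep only edges whose connected component has at least min_size nodes."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)  # follower/following treated as undirected

    keep = set()
    seen = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from start.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) >= min_size:
            keep |= component
    # Both endpoints share a component, so checking one endpoint suffices.
    return [(u, v) for u, v in edges if u in keep]
```

For example, a chain of five users survives the default threshold, while an isolated pair of users is removed.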
Post metadata.
plos.figshare.com
xls
Updated Nov 5, 2024
Andrea Failla; Giulio Rossetti (2024). Post metadata. [Dataset]. http://doi.org/10.1371/journal.pone.0310330.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0310330.t001
Dataset updated
Nov 5, 2024
Dataset provided by
PLOS ONE
Authors
Andrea Failla; Giulio Rossetti
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. We present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social to address this pressing issue. The dataset contains the complete post history of over 4M users (81% of all registered accounts), totalling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions. Since Bluesky allows users to create and like feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped “like” interactions. This dataset allows novel analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection and performing content virality and diffusion analysis.
CoAID Dataset
paperswithcode.com
opendatalab.com
Limeng Cui; Dongwon Lee, CoAID Dataset [Dataset]. https://paperswithcode.com/dataset/coaid
Authors
Limeng Cui; Dongwon Lee
Description
CoAID includes diverse COVID-19 healthcare misinformation, including fake news on websites and social platforms, along with users' social engagement with such news. CoAID includes 4,251 news items, 296,000 related user engagements, 926 social platform posts about COVID-19, and ground truth labels.
Residential School Locations Dataset (CSV Format)
borealisdata.ca
search.dataone.org
Updated Jun 5, 2019
Rosa Orlandini (2019). Residential School Locations Dataset (CSV Format) [Dataset]. http://doi.org/10.5683/SP2/RIYEMU
Unique identifier
https://doi.org/10.5683/SP2/RIYEMU
Dataset updated
Jun 5, 2019
Dataset provided by
Borealis
Authors
Rosa Orlandini
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 1863 - Jun 30, 1998
Area covered
Canada
Description
The Residential School Locations Dataset [IRS_Locations.csv] contains the locations (latitude and longitude) of Residential Schools and student hostels operated by the federal government in Canada. All the residential schools and hostels that are listed in the Indian Residential School Settlement Agreement are included in this dataset, as well as several Industrial schools and residential schools that were not part of the IRSSA. This version of the dataset doesn’t include the five schools under the Newfoundland and Labrador Residential Schools Settlement Agreement. The original school location data was created by the Truth and Reconciliation Commission, and was provided to the researcher (Rosa Orlandini) by the National Centre for Truth and Reconciliation in April 2017. The dataset was created by Rosa Orlandini, and builds upon and enhances the previous work of the Truth and Reconciliation Commission, Morgan Hite (creator of the Atlas of Indian Residential Schools in Canada that was produced for the Tk'emlups First Nation and Justice for Day Scholar's Initiative), and Stephanie Pyne (project lead for the Residential Schools Interactive Map). Each individual school location in this dataset is attributed either to RSIM, Morgan Hite, NCTR or Rosa Orlandini. Many schools/hostels had several locations throughout the history of the institution. If the school/hostel moved from its original location to another property, then the school is considered to have two unique locations in this dataset, the original location and the new location. For example, Lejac Indian Residential School had two locations while it was operating, Stuart Lake and Fraser Lake.
If a new school building was constructed on the same property as the original school building, it isn't considered to be a new location, as is the case of Girouard Indian Residential School. When the precise location is known, the coordinates of the main building are provided, and when the precise location of the building isn’t known, an approximate location is provided. For each residential school institution location, the following information is provided: official names, alternative name, dates of operation, religious affiliation, latitude and longitude coordinates, community location, Indigenous community name, contributor (of the location coordinates), school/institution photo (when available), location point precision, type of school (hostel or residential school) and list of references used to determine the location of the main buildings or sites.
Data sets used for user analysis.
plos.figshare.com
xlsx
Updated Jan 30, 2025
Alon Sela; Omer Neter; Václav Lohr; Petr Cihelka; Fan Wang; Moti Zwilling; John Phillip Sabou; Miloš Ulman (2025). Data sets used for user analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0309688.s002
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0309688.s002
Dataset updated
Jan 30, 2025
Dataset provided by
PLOS ONE
Authors
Alon Sela; Omer Neter; Václav Lohr; Petr Cihelka; Fan Wang; Moti Zwilling; John Phillip Sabou; Miloš Ulman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Social networks are a battlefield for political propaganda. Protected by the anonymity of the internet, political actors use computational propaganda to influence the masses. Their methods include the use of synchronized or individual bots, multiple accounts operated by one social media management tool, or different manipulations of search engines and social network algorithms, all aiming to promote their ideology. While computational propaganda influences modern society, it is hard to measure or detect. Furthermore, with the recent exponential growth in large language models (LLMs), and the growing concerns about information overload, which makes the alternative truth spheres noisier than ever before, the complexity and magnitude of computational propaganda is also expected to increase, making its detection even harder. Propaganda in social networks is disguised as legitimate news sent from authentic users. It smartly blends real users with fake accounts. We seek here to detect efforts to manipulate the spread of information in social networks, by one of the fundamental macro-scale properties of rhetoric: repetitiveness. We use 16 data sets of a total size of 13 GB, 10 related to political topics and 6 related to non-political ones (large-scale disasters), each ranging from tens of thousands to a few million tweets. We compare them and identify statistical and network properties that distinguish between these two types of information cascades. These features are based on both the repetition distribution of hashtags and the mentions of users, as well as the network structure. Together, they enable us to distinguish (p-value = 0.0001) between the two different classes of information cascades. In addition to constructing a bipartite graph connecting words and tweets to each cascade, we develop a quantitative measure and show how it can be used to distinguish between political and non-political discussions.
Our method is indifferent to the cascade’s country of origin, language, or cultural background since it is based only on the statistical properties of repetitiveness and the word appearance in the tweets' bipartite network structures.
Data from: On the Influence of Twitter Trolls during the 2016 US...
explore.openaire.eu
Updated Oct 1, 2019
Nikos Salamanos; Michael J. Jensen; Xinlei He; Yang Chen; Michael Sirivianos (2019). On the Influence of Twitter Trolls during the 2016 US Presidential Election [Dataset]. http://doi.org/10.5281/zenodo.3540801
Unique identifier
https://doi.org/10.5281/zenodo.3540801, https://identifiers.org/arxiv:1910.00531v2
Dataset updated
Oct 1, 2019
Authors
Nikos Salamanos; Michael J. Jensen; Xinlei He; Yang Chen; Michael Sirivianos
Area covered
United States
Description
It is a widely accepted fact that state-sponsored Twitter accounts operated during the 2016 US presidential election, spreading millions of tweets with misinformation and inflammatory political content. Whether these social media campaigns of the so-called "troll" accounts were able to manipulate public opinion is still in question. Here we aim to quantify the influence of troll accounts and the impact they had on Twitter by analyzing 152.5 million tweets from 9.9 million users, including 822 troll accounts. The data, collected during the US election campaign, contain original troll tweets before they were deleted by Twitter. From these data, we constructed a very large interaction graph: a directed graph of 9.3 million nodes and 169.9 million edges. Recently, Twitter released datasets on the misinformation campaigns of 8,275 state-sponsored accounts linked to Russia, Iran and Venezuela as part of the investigation on the foreign interference in the 2016 US election. These data serve as ground-truth identifiers of troll users in our dataset. Using graph analysis techniques we qualify the diffusion cascades of web and media context that have been shared by the troll accounts. We present strong evidence that authentic users were the source of the viral cascades. Although the trolls were participating in the viral cascades, they did not have a leading role in them and only four troll accounts were truly influential. With this version, we are correcting an error in the Acknowledgments regarding the research funding that supports this work. The correct one is the European Union's Horizon 2020 Research and Innovation program under the Cybersecurity CONCORDIA project (Grant Agreement No. 830927).
PAN19 Authorship Analysis: Bots and Gender Profiling
data.niaid.nih.gov
zenodo.org
Updated Apr 26, 2020
Rosso, Paolo (2020). PAN19 Authorship Analysis: Bots and Gender Profiling [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3530207
Dataset updated
Apr 26, 2020
Dataset provided by
Rangel, Francisco
Rosso, Paolo
Description
Social media bots pose as humans to influence users with commercial, political or ideological purposes. For example, bots could artificially inflate the popularity of a product by promoting it and/or writing positive ratings, as well as undermine the reputation of competitive products through negative valuations. The threat is even greater when the purpose is political or ideological (see Brexit referendum or US Presidential elections). Fearing the effect of this influence, the German political parties have rejected the use of bots in their electoral campaign for the general elections. Furthermore, bots are commonly related to fake news spreading. Therefore, to approach the identification of bots from an author profiling perspective is of high importance from the point of view of marketing, forensics and security.

After having addressed several aspects of author profiling in social media from 2013 to 2018 (age and gender, also together with personality, gender and language variety, and gender from a multimodality perspective), this year we aim at investigating whether the author of a Twitter feed is a bot or a human. Furthermore, in the case of a human, we aim to profile the gender of the author.

The uncompressed dataset consists of one folder per language (en, es). Each folder contains:

An XML file per author (Twitter user) with 100 tweets. The name of the XML file corresponds to the unique author id.

A truth.txt file with the list of authors and the ground truth.
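
The folder layout described above could be read along the following lines. The description does not specify the file formats, so two things are assumed here and should be checked against the actual files: that each line of truth.txt has the form `author_id:::label:::label`, and that each tweet sits in a `<document>` element of the author's XML file. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def read_truth(path, sep=":::"):
    """Parse truth.txt into {author_id: labels}, assuming 'id:::label:::label' lines."""
    truth = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            author_id, *labels = line.strip().split(sep)
            truth[author_id] = labels
    return truth

def read_tweets(xml_path, tag="document"):
    """Return the text of every <document> element in one author's XML file."""
    root = ET.parse(str(xml_path)).getroot()
    # Assumption: one tweet per <document> element, wherever it is nested.
    return [(node.text or "").strip() for node in root.iter(tag)]

def load_language(folder):
    """Pair each author's tweets with the ground truth for one language folder."""
    folder = Path(folder)
    truth = read_truth(folder / "truth.txt")
    return {
        author_id: (read_tweets(folder / f"{author_id}.xml"), labels)
        for author_id, labels in truth.items()
        if (folder / f"{author_id}.xml").exists()
    }
```

Calling `load_language` on the `en` or `es` folder would then return a dictionary keyed by author id, with authors that lack an XML file silently skipped.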
ClaimsKG - A Knowledge Graph of Fact-Checked Claims (August, 2022)
search.gesis.org
datacatalogue.cessda.eu
Updated Aug 15, 2022
Gangopadhyay, Susmita; Boland, Katarina; Schüller, Sascha; Todorov, Konstantin; Tchechmedjiev, Andon; Zapilko, Benjamin; Fafalios, Pavlos; Jabeen, Hajira; Dietze, Stefan (2022). ClaimsKG - A Knowledge Graph of Fact-Checked Claims (August, 2022) [Dataset]. http://doi.org/10.7802/2620
Unique identifier
https://doi.org/10.7802/2620
Dataset updated
Aug 15, 2022
Dataset provided by
GESIS, Köln
GESIS search
Authors
Gangopadhyay, Susmita; Boland, Katarina; Schüller, Sascha; Todorov, Konstantin; Tchechmedjiev, Andon; Zapilko, Benjamin; Fafalios, Pavlos; Jabeen, Hajira; Dietze, Stefan
License
https://www.gesis.org/en/institute/data-usage-terms
Description
ClaimsKG is a knowledge graph of metadata information for 59580 fact-checked claims scraped from 13 fact-checking sites. In addition to providing a single dataset of claims and associated metadata, truth ratings are harmonised and additional information is provided for each claim, e.g., about mentioned entities. Please see (https://data.gesis.org/claimskg/) for further details about the data model and statistics.

The dataset facilitates structured queries about claims, their truth values, involved entities, authors, dates, and other kinds of metadata. ClaimsKG is generated through a (semi-)automated pipeline, which harvests claim-related data from popular fact-checking web sites, annotates them with related entities from DBpedia/Wikipedia, and lifts all data to RDF using established vocabularies (such as schema.org). 

The latest release of ClaimsKG covers 59580 claims. The data was scraped until August 2022 and contains claims published between 1996 and 2022 from 13 fact-checking websites. The claim-review (fact-checking) period for claims ranges from 1996 to 2022. The entity-fishing Python client (https://github.com/hirmeos/entity-fishing-client-python) was used for entity linking and disambiguation in this release. The dataset contains a total of 1371271 entities detected and referenced with DBpedia. More information, such as detailed statistics, query examples and a user-friendly interface to explore the knowledge graph, is available at: https://data.gesis.org/claimskg/ .

The first two releases of ClaimsKG are hosted at Zenodo (https://doi.org/10.5281/zenodo.3518960), ClaimsKGV1.0 (published on 04.04.2019), ClaimsKGV2.0 (published on 01.09.2019). This latest release of ClaimsKG supersedes the previous versions as it contains all the claims from the previous versions together with additional claims as well as improved entity annotations.
Individual characteristics of deliberate and accidental fake news...
plos.figshare.com
xls
Updated Apr 9, 2024
Kerstin Unfried; Jan Priebe (2024). Individual characteristics of deliberate and accidental fake news distributors. [Dataset]. http://doi.org/10.1371/journal.pone.0301818.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0301818.t002
Dataset updated
Apr 9, 2024
Dataset provided by
PLOS ONE
Authors
Kerstin Unfried; Jan Priebe
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Individual characteristics of deliberate and accidental fake news distributors.
Truth Social Dataset: 2 scholarly articles cite this dataset (per Google Scholar).