Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Truth Social data set containing a network of users, their associated posts, and additional information about each post. Collected from February 2022 through September 2022, this dataset contains 454,458 user entries and 845,060 Truth (Truth Social’s term for post) entries.
The dataset comprises 12 files; the entry count for each file is shown below.
| File | Data Points |
| --- | --- |
| users.tsv | 454,458 |
| follows.tsv | 4,002,115 |
| truths.tsv | 823,927 |
| quotes.tsv | 10,508 |
| replies.tsv | 506,276 |
| media.tsv | 184,884 |
| hashtags.tsv | 21,599 |
| external_urls.tsv | 173,947 |
| truth_hashtag_edges.tsv | 213,295 |
| truth_media_edges.tsv | 257,500 |
| truth_external_url_edges.tsv | 252,877 |
| truth_user_tag_edges.tsv | 145,234 |
A readme file is provided that describes the structure of the files, relevant terms, and details of the data collection.
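For quick inspection, a minimal Python loading sketch (this assumes the .tsv files are tab-separated with header rows; the readme is the authoritative reference for the schema):

```python
import pandas as pd

# Load three of the core tables; tab-separated with header rows is an assumption.
users = pd.read_csv("users.tsv", sep="\t")
truths = pd.read_csv("truths.tsv", sep="\t")
follows = pd.read_csv("follows.tsv", sep="\t")

# Row counts should match the table above: 454,458 / 823,927 / 4,002,115.
print(len(users), len(truths), len(follows))
```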
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
Youtube social network and ground-truth communities
Dataset information
Youtube is a video-sharing website that includes a social network. In the Youtube social network, users form friendships with each other, and users can create groups that other users can join. We consider such user-defined groups as ground-truth communities. This data is provided by Alan Mislove et al.
We regard each connected component in a group as a separate ground-truth community. We remove ground-truth communities that have fewer than 3 nodes. We also provide the top 5,000 highest-quality communities, as described in our paper. As for the network, we provide the largest connected component.
More info: https://snap.stanford.edu/data/com-Youtube.html
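For reference, a small sketch of reading a SNAP-style community file (one community per line, whitespace-separated node ids) and applying the minimum-size filter described above; the uncompressed filename is taken from the SNAP page:

```python
# Keep only communities with at least 3 nodes, mirroring the filtering above.
communities = []
with open("com-youtube.all.cmty.txt") as f:
    for line in f:
        members = line.split()
        if len(members) >= 3:
            communities.append(members)

print(len(communities), "communities with >= 3 nodes")
```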
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
This dataset was obtained by parsing statements and their veracity verdicts from Politifact.com. It contains roughly 14,000 statements, collected up to late 2020.
The statements fall into 6 categories: True, Mostly True, Half-True, Mostly False, False, and Pants on Fire!
This dataset can be used for multiple purposes: attempting to detect truthfulness based on statement language (or, conversely, detecting lies), integration into fact-checking pipelines, or simply exploratory data analysis (EDA) for political purposes.
There are 4 columns in politifact.csv:
statement - the statement made by a celebrity or politician.
source - the source of the statement; can be a person, but not necessarily.
link - URL of the statement's fact-check.
veracity - degree of truthfulness assigned by the Politifact.com team.
Other variants of the dataset have certain classes removed or binarized (into truths and lies). Have a quick look at this notebook for more details: https://www.kaggle.com/thesergiu/part-1-quick-eda-on-politifact-csv
Initial Source: www.politifact.com Creator GitHub Link: https://github.com/the-sergiu GitHub Repo Link for more context: https://github.com/the-sergiu/TruthDetection
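A minimal sketch for loading and inspecting the CSV with pandas (column names taken from the description above):

```python
import pandas as pd

df = pd.read_csv("politifact.csv")
print(df.columns.tolist())            # expect: statement, source, link, veracity
print(df["veracity"].value_counts())  # distribution over the 6 truthfulness classes
```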
This dataset was created by Vivian
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The real dataset consists of movie evaluations from IMDB, which provides a platform where individuals can rate movies on a scale of 1 to 10. If a user rates a movie and clicks the share button, a Twitter message is generated. We then extract the rating from the Twitter message. We treat the ratings on the IMDB website as the event truths, since they are based on the aggregated evaluations of all users, whereas our observations come from only the subset of users who share their ratings on Twitter. Using the Twitter API, we collect the follower and following relationships between the individuals who generate movie evaluation Twitter messages. To better show the influence of social network information on event truth discovery, we delete small subnetworks that consist of fewer than 5 agents. The final dataset consists of 2,266 evaluations from 209 individuals on 245 movies (events), along with the social network between these 209 individuals. We regard the social network as undirected, since both follower and following relationships indicate that two users have similar tastes.
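The small-subnetwork filtering step described above can be illustrated with networkx; this is a toy sketch with made-up edges, not the authors' code:

```python
import networkx as nx

# Undirected follower/following network between raters (toy example edges).
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (3, 4), (4, 5), (6, 7)])

# Drop connected components ("subnetworks") with fewer than 5 agents.
for component in list(nx.connected_components(G)):
    if len(component) < 5:
        G.remove_nodes_from(component)

print(G.nodes())  # the 2-node component {6, 7} is gone
```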
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pollution of online social spaces caused by rampant dis/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent social media data, hindering the advancement of computational social science as a whole. To address this pressing issue, we present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social. The dataset contains the complete post history of over 4M users (81% of all registered accounts), totalling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions. Since Bluesky allows users to create and like feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped "like" interactions. This dataset allows novel analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection, and for performing content virality and diffusion analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
https://snap.stanford.edu/data/com-Orkut.html
Dataset information
Orkut (http://www.orkut.com/) is a free online social network where users form friendships with each other. Orkut also allows users to form groups that other members can then join. We consider such user-defined groups as ground-truth communities. We provide the Orkut friendship social network and ground-truth communities. This data is provided by Alan Mislove et al. (http://socialnetworks.mpi-sws.org/data-imc2007.html)
We regard each connected component in a group as a separate ground-truth community. We remove ground-truth communities that have fewer than 3 nodes. We also provide the top 5,000 highest-quality communities, as described in our paper (http://arxiv.org/abs/1205.6233). As for the network, we provide the largest connected component.
Dataset statistics
Nodes: 3,072,441
Edges: 117,185,083
Nodes in largest WCC: 3,072,441 (1.000)
Edges in largest WCC: 117,185,083 (1.000)
Nodes in largest SCC: 3,072,441 (1.000)
Edges in largest SCC: 117,185,083 (1.000)
Average clustering coefficient: 0.1666
Number of triangles: 627,584,181
Fraction of closed triangles: 0.01414
Diameter (longest shortest path): 9
90-percentile effective diameter: 4.8
Source (citation)
J. Yang and J. Leskovec. Defining and Evaluating Network Communities based
on Ground-truth. ICDM, 2012. http://arxiv.org/abs/1205.6233
Files
File Description
com-orkut.ungraph.txt.gz Undirected Orkut network
com-orkut.all.cmty.txt.gz Orkut communities
com-orkut.top5000.cmty.txt.gz Orkut communities (Top 5,000)
The graph in the SNAP data set is 1-based, with nodes numbered 1 to
3,072,626.
In the SuiteSparse Matrix Collection, Problem.A is the undirected Orkut network, a matrix of size n-by-n with n = 3,072,441, which is the number of unique user ids appearing in any edge.
Problem.aux.nodeid is a list of the node ids that appear in the SNAP data set. A(i,j) = 1 if person nodeid(i) is friends with person nodeid(j). The node ids are the same as in the SNAP data set (1-based).
C = Problem.aux.Communities_all is a sparse matrix of size n-by-15,301,901, which represents the same number of communities as in the com-orkut.all.cmty.txt file. The kth line in that file defines the kth community and is the column C(:,k), where C(i,k) = 1 if person nodeid(i) is in the kth community. Row C(i,:) and row/column i of the A matrix thus refer to the same person, nodeid(i).
Ctop = Problem.aux.Communities_to...
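To make the indexing convention concrete, here is a sketch that builds an adjacency matrix from the SNAP edge list in Python (assuming the standard SNAP format: '#'-prefixed comment lines, then one whitespace-separated pair of 1-based node ids per line). At 117 million edges this needs a lot of memory; it illustrates the 1-based-to-0-based shift rather than serving as a production loader:

```python
import gzip
import numpy as np
from scipy.sparse import coo_matrix

rows, cols = [], []
with gzip.open("com-orkut.ungraph.txt.gz", "rt") as f:
    for line in f:
        if line.startswith("#"):        # skip SNAP header comments
            continue
        u, v = map(int, line.split())
        rows += [u - 1, v - 1]          # shift 1-based ids to 0-based
        cols += [v - 1, u - 1]          # add both directions (undirected)

n = max(max(rows), max(cols)) + 1
A = coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n)).tocsr()
```

Note that SNAP node ids are sparse (numbered up to 3,072,626), so this matrix is slightly larger than Problem.A, which remaps rows/columns to the 3,072,441 unique ids via Problem.aux.nodeid.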
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was generated from multiple interactions between a social robot (NAO) and 5th-grade students from a private school in São Paulo, Brazil.
In the interactions, the robot engaged the participating students on the content their teachers were covering at the time: the waste system in Brazil.
The measures here are the readings that the R-CASTLE system took for each answer the students gave to the questions the robot asked.
For more information about how these measures were collected, please refer to this thesis: https://doi.org/10.11606/T.55.2020.tde-31082020-093935
Since the goal of R-CASTLE is to provide autonomous adaptation, we built a ground-truth dataset based on the feedback of an education expert operating the robot in loco. The expert teleoperated the robot, changing its behaviour (or not) according to observed measures of the participants: face gaze, facial emotion displayed, number of spoken words, correctness of the answer (based on pre-defined answers), and the time students took to answer. These measures are the first 5 columns of this csv file. The evaluator could decide to increase (1), maintain (0), or decrease (-1) the level of difficulty of the following questions depending on these observed measures. This is the human true label, stored in the 6th column.
Deviations (integer): number of face deviations by the participant during question answering, as identified by the system.
EmotionCount (integer): balance between "good" and "bad" emotions (good - bad), as identified by the system.
NumberWord (integer): number of words in the sentence the participant gave.
SucRate/Ans/RWa (between 0 and 1, where 0 is completely wrong and 1 is completely right): success rate of the participant's answer to that question, based on the expected answer programmed by their teachers.
Time2ans (float): time, in seconds, from when the robot finished asking the question until the end of the participant's speech.
True Value (-1, 0, 1): ground-truth value; the adaptation chosen by the human observing the interaction, indicating whether the system needed to decrease (-1), maintain (0), or increase (1) the level of difficulty of the asked questions.
Final Crisp Value (float): calculated fuzzy output value, based on the implementations in this paper: https://doi.org/10.1145/3395035.3425201
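A minimal pandas sketch for loading the file (the filename is hypothetical, and the column headers are taken from the descriptions above, so they may differ in the distributed csv):

```python
import pandas as pd

df = pd.read_csv("rcastle.csv")  # hypothetical filename; use the csv shipped with the dataset
features = df[["Deviations", "EmotionCount", "NumberWord", "SucRate/Ans/RWa", "Time2ans"]]
labels = df["True Value"]        # ground-truth adaptation label: -1, 0, or 1
print(labels.value_counts())
```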
CoAID includes diverse COVID-19 healthcare misinformation, including fake news on websites and social platforms, along with users' social engagement with such news. CoAID includes 4,251 news items, 296,000 related user engagements, 926 social platform posts about COVID-19, and ground-truth labels.
Networks from the SNAP (Stanford Network Analysis Platform) Network Data Sets, Jure Leskovec, http://snap.stanford.edu/data/index.html (email: jure at cs.stanford.edu).
Citation for the SNAP collection:
@misc{snapnets,
  author       = {Jure Leskovec and Andrej Krevl},
  title        = {{SNAP Datasets}: {Stanford} Large Network Dataset Collection},
  howpublished = {\url{http://snap.stanford.edu/data}},
  month        = jun,
  year         = 2014
}
The following matrices/graphs were added to the collection in June 2010 by Tim Davis (problem id and name):
2284 SNAP/soc-Epinions1 who-trusts-whom network of Epinions.com
2285 SNAP/soc-LiveJournal1 LiveJournal social network
2286 SNAP/soc-Slashdot0811 Slashdot social network, Nov 2008
2287 SNAP/soc-Slashdot0902 Slashdot social network, Feb 2009
2288 SNAP/wiki-Vote Wikipedia who-votes-on-whom network
2289 SNAP/email-EuAll Email network from a EU research institution
2290 SNAP/email-Enron Email communication network from Enron
2291 SNAP/wiki-Talk Wikipedia talk (communication) network
2292 SNAP/cit-HepPh Arxiv High Energy Physics paper citation network
2293 SNAP/cit-HepTh Arxiv High Energy Physics paper citation network
2294 SNAP/cit-Patents Citation network among US Patents
2295 SNAP/ca-AstroPh Collaboration network of Arxiv Astro Physics
2296 SNAP/ca-CondMat Collaboration network of Arxiv Condensed Matter
2297 SNAP/ca-GrQc Collaboration network of Arxiv General Relativity
2298 SNAP/ca-HepPh Collaboration network of Arxiv High Energy Physics
2299 SNAP/ca-HepTh Collaboration network of Arxiv High Energy Physics Theory
2300 SNAP/web-BerkStan Web graph of Berkeley and Stanford
2301 SNAP/web-Google Web graph from Google
2302 SNAP/web-NotreDame Web graph of Notre Dame
2303 SNAP/web-Stanford Web graph of Stanford.edu
2304 SNAP/amazon0302 Amazon product co-purchasing network from March 2 2003
2305 SNAP/amazon0312 Amazon product co-purchasing network from March 12 2003
2306 SNAP/amazon0505 Amazon product co-purchasing network from May 5 2003
2307 SNAP/amazon0601 Amazon product co-purchasing network from June 1 2003
2308 SNAP/p2p-Gnutella04 Gnutella peer to peer network from August 4 2002
2309 SNAP/p2p-Gnutella05 Gnutella peer to peer network from August 5 2002
2310 SNAP/p2p-Gnutella06 Gnutella peer to peer network from August 6 2002
2311 SNAP/p2p-Gnutella08 Gnutella peer to peer network from August 8 2002
2312 SNAP/p2p-Gnutella09 Gnutella peer to peer network from August 9 2002
2313 SNAP/p2p-Gnutella24 Gnutella peer to peer network from August 24 2002
2314 SNAP/p2p-Gnutella25 Gnutella peer to peer network from August 25 2002
2315 SNAP/p2p-Gnutella30 Gnutella peer to peer network from August 30 2002
2316 SNAP/p2p-Gnutella31 Gnutella peer to peer network from August 31 2002
2317 SNAP/roadNet-CA Road network of California
2318 SNAP/roadNet-PA Road network of Pennsylvania
2319 SNAP/roadNet-TX Road network of Texas
2320 SNAP/as-735 733 daily instances (graphs) from November 8 1997 to January 2 2000
2321 SNAP/as-Skitter Internet topology graph, from traceroutes run daily in 2005
2322 SNAP/as-caida The CAIDA AS Relationships Datasets, from January 2004 to November 2007
2323 SNAP/Oregon-1 AS peering information inferred from Oregon route-views between March 31 and May 26 2001
2324 SNAP/Oregon-2 AS peering information inferred from Oregon route-views between March 31 and May 26 2001
2325 SNAP/soc-sign-epinions Epinions signed social network
2326 SNAP/soc-sign-Slashdot081106 Slashdot Zoo signed social network from November 6 2008
2327 SNAP/soc-sign-Slashdot090216 Slashdot Zoo signed social network from February 16 2009
2328 SNAP/soc-sign-Slashdot090221 Slashdot Zoo signed social network from February 21 2009
Then the following problems were added in July 2018. All data and metadata from the SNAP data set was imported into the SuiteSparse Matrix Collection.
2777 SNAP/CollegeMsg Messages on a Facebook-like platform at UC-Irvine
2778 SNAP/com-Amazon Amazon product network
2779 SNAP/com-DBLP DBLP collaboration network
2780 SNAP/com-Friendster Friendster online social network
2781 SNAP/com-LiveJournal LiveJournal online social network
2782 SNAP/com-Orkut Orkut online social network
2783 SNAP/com-Youtube Youtube online social network
2784 SNAP/email-Eu-core E-mail network
2785 SNAP/email-Eu-core-temporal E-mails between users at a research institution
2786 SNAP/higgs-twitter Twitter messages re: Higgs boson on 4th July 2012
2787 SNAP/loc-Brightkite Brightkite location-based online social network
2788 SNAP/loc-Gowalla Gowalla location-based online social network
2789 SNAP/soc-Pokec Pokec online social network
2790 SNAP/soc-sign-bitcoin-alpha Bitcoin Alpha web of trust network
2791 SNAP/soc-sign-bitcoin-otc Bitcoin OTC web of trust network
2792 SNAP/sx-askubuntu Comments, questions, and answers on Ask Ubuntu
2793 SNAP/sx-mathoverflow Comments, questions, and answers on Math Overflow
2794 SNAP/sx-stackoverflow Comments, questions, and answers on Stack Overflow
2795 SNAP/sx-superuser Comments, questions, and answers on Super User
2796 SNAP/twitter7 A collection of 476 million tweets collected between June-Dec 2009
2797 SNAP/wiki-RfA Wikipedia Requests for Adminship (with text)
2798 SNAP/wiki-talk-temporal Users editing talk pages on Wikipedia
2799 SNAP/wiki-topcats Wikipedia hyperlinks (with communities)
The following graphs/networks were in the SNAP data set in July 2018 but have not yet been imported into the SuiteSparse Matrix Collection. They may be added in the future:
amazon-meta
ego-Facebook
ego-Gplus
ego-Twitter
gemsec-Deezer
gemsec-Facebook
ksc-time-series
memetracker9
web-flickr
web-Reddit
web-RedditPizzaRequests
wiki-Elec
wiki-meta
wikispeedia
The 2010 description of the SNAP data set gave these categories:
Social networks: online social networks, edges represent interactions between people
Communication networks: email communication networks with edges representing communication
Citation networks: nodes represent papers, edges represent citations
Collaboration networks: nodes represent scientists, edges represent collaborations (co-authoring a paper)
Web graphs: nodes represent webpages and edges are hyperlinks
Blog and Memetracker graphs: nodes represent time-stamped blog posts, edges are hyperlinks [revised below]
Amazon networks: nodes represent products and edges link commonly co-purchased products
Internet networks: nodes represent computers and edges communication
Road networks: nodes represent intersections and edges roads connecting the intersections
Autonomous systems: graphs of the internet
Signed networks: networks with positive and negative edges (friend/foe, trust/distrust)
By July 2018, the following categories had been added:
Networks with ground-truth communities : ground-truth network communities in social and information networks
Location-based online social networks : Social networks with geographic check-ins
Wikipedia networks, articles, and metadata : Talk, editing, voting, and article data from Wikipedia
Temporal networks : networks where edges have timestamps
Twitter and Memetracker : Memetracker phrases, links and 467 million Tweets
Online communities : Data from online communities such as Reddit and Flickr
Online reviews : Data from online review systems such as BeerAdvocate and Amazon
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Residential School Locations Dataset [IRS_Locations.csv] contains the locations (latitude and longitude) of Residential Schools and student hostels operated by the federal government in Canada. All residential schools and hostels listed in the Indian Residential School Settlement Agreement (IRSSA) are included in this dataset, as well as several industrial schools and residential schools that were not part of the IRSSA. This version of the dataset does not include the five schools under the Newfoundland and Labrador Residential Schools Settlement Agreement. The original school location data was created by the Truth and Reconciliation Commission and was provided to the researcher (Rosa Orlandini) by the National Centre for Truth and Reconciliation in April 2017. The dataset was created by Rosa Orlandini, and builds upon and enhances the previous work of the Truth and Reconciliation Commission, Morgan Hite (creator of the Atlas of Indian Residential Schools in Canada that was produced for the Tk'emlups First Nation and the Justice for Day Scholars Initiative), and Stephanie Pyne (project lead for the Residential Schools Interactive Map). Each individual school location in this dataset is attributed to RSIM, Morgan Hite, NCTR, or Rosa Orlandini.
Many schools/hostels had several locations throughout the history of the institution. If a school/hostel moved from its original location to another property, the school is considered to have two unique locations in this dataset: the original location and the new location. For example, Lejac Indian Residential School had two locations while it was operating, Stuart Lake and Fraser Lake. If a new school building was constructed on the same property as the original school building, it is not considered a new location, as is the case of Girouard Indian Residential School. When the precise location is known, the coordinates of the main building are provided; when the precise location of the building is not known, an approximate location is provided.
For each residential school institution location, the following information is provided: official names, alternative names, dates of operation, religious affiliation, latitude and longitude coordinates, community location, Indigenous community name, contributor (of the location coordinates), school/institution photo (when available), location point precision, type of school (hostel or residential school), and a list of references used to determine the location of the main buildings or sites.
It is a widely accepted fact that state-sponsored Twitter accounts operated during the 2016 US presidential election, spreading millions of tweets with misinformation and inflammatory political content. Whether these social media campaigns of the so-called "troll" accounts were able to manipulate public opinion is still in question. Here we aim to quantify the influence of troll accounts and the impact they had on Twitter by analyzing 152.5 million tweets from 9.9 million users, including 822 troll accounts. The data, collected during the US election campaign, contain original troll tweets before they were deleted by Twitter. From these data, we constructed a very large interaction graph: a directed graph of 9.3 million nodes and 169.9 million edges. Recently, Twitter released datasets on the misinformation campaigns of 8,275 state-sponsored accounts linked to Russia, Iran, and Venezuela as part of the investigation into foreign interference in the 2016 US election. These data serve as a ground-truth identifier of troll users in our dataset. Using graph analysis techniques, we quantify the diffusion cascades of web and media content that was shared by the troll accounts. We present strong evidence that authentic users were the source of the viral cascades. Although the trolls participated in the viral cascades, they did not have a leading role in them, and only four troll accounts were truly influential. With this version, we are correcting an error in the Acknowledgments regarding the research funding that supports this work. The correct one is the European Union's Horizon 2020 Research and Innovation program under the Cybersecurity CONCORDIA project (Grant Agreement No. 830927).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Social networks are a battlefield for political propaganda. Protected by the anonymity of the internet, political actors use computational propaganda to influence the masses. Their methods include the use of synchronized or individual bots, multiple accounts operated by one social media management tool, and various manipulations of search engines and social network algorithms, all aiming to promote their ideology. While computational propaganda influences modern society, it is hard to measure or detect. Furthermore, with the recent exponential growth of large language models (LLMs) and growing concerns about information overload, which make the alternative truth spheres noisier than ever before, the complexity and magnitude of computational propaganda are expected to increase, making detection even harder. Propaganda in social networks is disguised as legitimate news sent from authentic users, smartly blending real users with fake accounts. We seek here to detect efforts to manipulate the spread of information in social networks through one of the fundamental macro-scale properties of rhetoric: repetitiveness. We use 16 data sets with a total size of 13 GB, 10 related to political topics and 6 related to non-political ones (large-scale disasters), each ranging from tens of thousands to a few million tweets. We compare them and identify statistical and network properties that distinguish between these two types of information cascades. These features are based on the repetition distribution of hashtags and user mentions, as well as on the network structure. Together, they enable us to distinguish (p-value = 0.0001) between the two classes of information cascades. In addition to constructing a bipartite graph connecting words and tweets for each cascade, we develop a quantitative measure and show how it can be used to distinguish between political and non-political discussions. Our method is indifferent to the cascade's country of origin, language, or cultural background, since it is based only on the statistical properties of repetitiveness and word appearance in the tweets' bipartite network structure.
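As a toy illustration of the bipartite word-tweet construction mentioned above (not the authors' implementation):

```python
import networkx as nx

# One node per tweet, one node per token, an edge when the token appears in the tweet.
tweets = ["vote for X #election", "vote vote X #election", "earthquake relief now"]

B = nx.Graph()
for i, tweet in enumerate(tweets):
    tweet_node = f"tweet_{i}"
    B.add_node(tweet_node, bipartite=0)
    for word in tweet.split():
        B.add_node(word, bipartite=1)
        B.add_edge(tweet_node, word)

# Word-side degrees approximate the repetition distribution used to separate cascades.
word_degrees = {w: d for w, d in B.degree() if B.nodes[w].get("bipartite") == 1}
print(sorted(word_degrees.items(), key=lambda kv: -kv[1]))
```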
ClaimsKG is a knowledge graph of metadata for 59,580 fact-checked claims scraped from 13 fact-checking sites. In addition to providing a single dataset of claims and associated metadata, truth ratings are harmonised, and additional information is provided for each claim, e.g., about mentioned entities. Please see https://data.gesis.org/claimskg/ for further details about the data model and statistics.
The dataset facilitates structured queries about claims, their truth values, involved entities, authors, dates, and other kinds of metadata. ClaimsKG is generated through a (semi-)automated pipeline, which harvests claim-related data from popular fact-checking websites, annotates it with related entities from DBpedia/Wikipedia, and lifts all data to RDF using established vocabularies (such as schema.org).
The latest release of ClaimsKG covers 59,580 claims. The data was scraped through August 2022 and contains claims published between 1996 and 2022 from 13 fact-checking websites. The entity-fishing Python client (https://github.com/hirmeos/entity-fishing-client-python) was used for entity linking and disambiguation in this release. The dataset contains a total of 1,371,271 entities detected and referenced with DBpedia. More information, such as detailed statistics, query examples, and a user-friendly interface to explore the knowledge graph, is available at https://data.gesis.org/claimskg/.
The first two releases of ClaimsKG are hosted at Zenodo (https://doi.org/10.5281/zenodo.3518960): ClaimsKG V1.0 (published on 04.04.2019) and ClaimsKG V2.0 (published on 01.09.2019). This latest release supersedes the previous versions, as it contains all the claims from the previous versions together with additional claims and improved entity annotations.
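Since the data is lifted to RDF with schema.org vocabularies, a query sketch with rdflib might look as follows; the filename is hypothetical and the exact predicates should be verified against the ClaimsKG data model at https://data.gesis.org/claimskg/:

```python
from rdflib import Graph

g = Graph()
g.parse("claimskg.ttl", format="turtle")  # hypothetical local dump filename

# schema:ClaimReview / schema:claimReviewed / schema:reviewRating are standard
# schema.org terms; whether ClaimsKG uses exactly these should be checked.
query = """
PREFIX schema: <http://schema.org/>
SELECT ?claim ?rating WHERE {
    ?review a schema:ClaimReview ;
            schema:claimReviewed ?claim ;
            schema:reviewRating ?rating .
} LIMIT 10
"""
for row in g.query(query):
    print(row.claim, row.rating)
```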
Social media bots pose as humans to influence users for commercial, political, or ideological purposes. For example, bots can artificially inflate the popularity of a product by promoting it and/or writing positive ratings, as well as undermine the reputation of competing products through negative valuations. The threat is even greater when the purpose is political or ideological (see the Brexit referendum or the US presidential elections). Fearing the effect of this influence, the German political parties rejected the use of bots in their electoral campaign for the general elections. Furthermore, bots are commonly linked to fake news spreading. Therefore, approaching the identification of bots from an author profiling perspective is of high importance from the point of view of marketing, forensics, and security.
After having addressed several aspects of author profiling in social media from 2013 to 2018 (age and gender, also together with personality; gender and language variety; and gender from a multimodality perspective), this year we aim to investigate whether the author of a Twitter feed is a bot or a human and, in the case of a human, to profile the author's gender.
The uncompressed dataset consists of a folder per language (en, es). Each folder contains:
An XML file per author (Twitter user) with 100 tweets. The name of the XML file corresponds to the unique author id.
A truth.txt file with the list of authors and the ground truth.
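A minimal Python sketch for reading this layout (the ':::' field separator in truth.txt and the <document> tag inside the XML files are assumptions carried over from earlier PAN author-profiling editions; check the task's readme):

```python
import xml.etree.ElementTree as ET
from pathlib import Path

data_dir = Path("en")  # one folder per language: en, es

# truth.txt: one author per line; ':::'-separated fields are an assumption.
labels = {}
for line in (data_dir / "truth.txt").read_text().splitlines():
    author_id, *truth = line.split(":::")
    labels[author_id] = truth

# One XML file per author, named by the unique author id, holding 100 tweets.
for xml_file in data_dir.glob("*.xml"):
    tree = ET.parse(xml_file)
    tweets = [doc.text for doc in tree.iter("document")]  # assumed tag name
    print(xml_file.stem, labels.get(xml_file.stem), len(tweets))
```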
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Individual characteristics of deliberate and accidental fake news distributors.