Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Truth Social dataset containing a network of users, their associated posts, and additional information about each post. Collected from February 2022 through September 2022, this dataset contains 454,458 user entries and 845,060 Truth (Truth Social’s term for a post) entries.
The dataset comprises 12 files; the entry count for each file is shown below.
File | Data Points |
---|---|
users.tsv | 454,458 |
follows.tsv | 4,002,115 |
truths.tsv | 823,927 |
quotes.tsv | 10,508 |
replies.tsv | 506,276 |
media.tsv | 184,884 |
hashtags.tsv | 21,599 |
external_urls.tsv | 173,947 |
truth_hashtag_edges.tsv | 213,295 |
truth_media_edges.tsv | 257,500 |
truth_external_url_edges.tsv | 252,877 |
truth_user_tag_edges.tsv | 145,234 |
A readme file is provided that describes the structure of the files, necessary terms, and necessary information about the data collection.
https://creativecommons.org/publicdomain/zero/1.0/
Youtube social network and ground-truth communities
Dataset information
Youtube is a video-sharing web site that includes a social network. In the Youtube social network, users form friendships with each other and can create groups which other users can join. We consider such user-defined groups as ground-truth communities. This data is provided by Alan Mislove et al.
We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, which are described in our paper. As for the network, we provide the largest connected component.
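The community construction described above (one community per connected component of a group, dropping components with fewer than 3 nodes) can be sketched as follows. This is a minimal illustration, not the code used to build the dataset; the function name `split_into_communities` and the in-memory input format (a list of edge pairs and a list of member lists) are assumptions made for the example.

```python
from collections import defaultdict, deque


def split_into_communities(edges, groups, min_size=3):
    """Split each group into connected components; keep those with >= min_size nodes.

    edges:  iterable of (u, v) pairs of an undirected graph (assumed format).
    groups: iterable of member lists, one per user-defined group (assumed format).
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    communities = []
    for members in groups:
        unvisited = set(members)
        while unvisited:
            start = unvisited.pop()
            component = {start}
            queue = deque([start])
            while queue:
                node = queue.popleft()
                # Follow only edges whose other endpoint is still inside the group.
                for neighbor in adjacency[node] & unvisited:
                    unvisited.discard(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
            if len(component) >= min_size:
                communities.append(component)
    return communities


# Toy graph: the group {1, 2, 3, 4, 5} splits into {1, 2, 3} and {4, 5};
# the second component is dropped because it has fewer than 3 nodes.
toy_edges = [(1, 2), (2, 3), (4, 5), (3, 6)]
toy_groups = [[1, 2, 3, 4, 5]]
toy_communities = split_into_communities(toy_edges, toy_groups)
```

Node 6 is outside the group, so the edge (3, 6) does not extend the component.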
More info: https://snap.stanford.edu/data/com-Youtube.html
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Social networks are a battlefield for political propaganda. Protected by the anonymity of the internet, political actors use computational propaganda to influence the masses. Their methods include the use of synchronized or individual bots, multiple accounts operated by one social media management tool, or different manipulations of search engines and social network algorithms, all aiming to promote their ideology. While computational propaganda influences modern society, it is hard to measure or detect. Furthermore, with the recent exponential growth in large language models (LLMs) and the growing concerns about information overload, which makes the alternative truth spheres noisier than ever before, the complexity and magnitude of computational propaganda are also expected to increase, making its detection even harder. Propaganda in social networks is disguised as legitimate news sent from authentic users. It smartly blends real users with fake accounts. We seek here to detect efforts to manipulate the spread of information in social networks by means of one of the fundamental macro-scale properties of rhetoric: repetitiveness. We use 16 data sets with a total size of 13 GB, 10 related to political topics and 6 related to non-political ones (large-scale disasters), each ranging from tens of thousands to a few million tweets. We compare them and identify statistical and network properties that distinguish between these two types of information cascades. These features are based on both the repetition distribution of hashtags and the mentions of users, as well as the network structure. Together, they enable us to distinguish (p-value = 0.0001) between the two different classes of information cascades. In addition to constructing a bipartite graph connecting words and tweets to each cascade, we develop a quantitative measure and show how it can be used to distinguish between political and non-political discussions.
Our method is indifferent to the cascade’s country of origin, language, or cultural background, since it is based only on the statistical properties of repetitiveness and the structure of the bipartite network of word appearances in tweets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. We present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social to address this pressing issue. The dataset contains the complete post history of over 4M users (81% of all registered accounts), totalling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions. Since Bluesky allows users to create and like feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped “like” interactions. This dataset allows novel analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection and performing content virality and diffusion analysis.
Task
Fake news has become one of the main threats to our society. Although fake news is not a new phenomenon, the exponential growth of social media has offered an easy platform for its fast propagation. A great amount of fake news and rumors are propagated in online social networks, usually with the aim of deceiving users and shaping specific opinions. Users play a critical role in the creation and propagation of fake news online by consuming and sharing articles with inaccurate information, either intentionally or unintentionally. To this end, in this task, we aim at identifying possible fake news spreaders on social media as a first step towards preventing fake news from being propagated among online users.
After having addressed several aspects of author profiling in social media from 2013 to 2019 (bot detection, age and gender, also together with personality, gender and language variety, and gender from a multimodality perspective), this year we aim to investigate whether it is possible to discriminate authors who have shared some fake news in the past from those who, to the best of our knowledge, have never done so.
As in previous years, we propose the task from a multilingual perspective:
NOTE: Although we recommend participating in both languages (English and Spanish), it is possible to address the problem for just one language.
Data
Input
The uncompressed dataset consists of one folder per language (en, es). Each folder contains:
The format of the XML files is:
The format of the truth.txt file is as follows. The first column corresponds to the author id. The second column contains the truth label.
b2d5748083d6fdffec6c2d68d4d4442d:::0
2bed15d46872169dc7deaf8d2b43a56:::0
8234ac5cca1aed3f9029277b2cb851b:::1
5ccd228e21485568016b4ee82deb0d28:::0
60d068f9cafb656431e62a6542de2dc0:::1
...
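As an illustration, the two-column truth.txt format described above could be read with a few lines of Python. This is a sketch only: the function name `parse_truth` is hypothetical, and it assumes one `author-id:::label` record per line with integer labels.

```python
def parse_truth(lines):
    """Map author id -> integer label from 'author-id:::label' records.

    Assumes one record per line; blank lines are skipped.
    """
    labels = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        author_id, label = line.split(":::")
        labels[author_id] = int(label)
    return labels


# Two records copied from the sample shown above.
sample_lines = [
    "b2d5748083d6fdffec6c2d68d4d4442d:::0",
    "60d068f9cafb656431e62a6542de2dc0:::1",
]
truth = parse_truth(sample_lines)
```

With a real file, the same function can be called on an open file handle, for example `parse_truth(open(path, encoding="utf-8"))`, where `path` points to the truth.txt of one language folder.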
Output
Your software must take as input the absolute path to an unpacked dataset, and has to output for each document of the dataset a corresponding XML file that looks like this:
The naming of the output files is up to you. However, we recommend using the author-id as the filename and "xml" as the extension.
IMPORTANT! Languages should not be mixed. Create one folder per language and place in it only the files with the predictions for that language.
Evaluation
The performance of your system will be ranked by accuracy. For each language, we will calculate individual accuracies in discriminating between the two classes. Finally, we will average the accuracy values per language to obtain the final ranking.
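The ranking measure described above (accuracy per language, then the mean over languages) can be sketched as below. This is not the official evaluator; the function names are hypothetical, and the sketch assumes that an author without a prediction counts as an error.

```python
def accuracy(truth, predictions):
    """Fraction of authors whose predicted label equals the true label.

    Assumption: an author missing from `predictions` counts as wrong.
    """
    if not truth:
        raise ValueError("truth must not be empty")
    correct = sum(
        1 for author, label in truth.items() if predictions.get(author) == label
    )
    return correct / len(truth)


def final_score(truth_by_lang, predictions_by_lang):
    """Return (mean accuracy over languages, per-language accuracies)."""
    per_language = {
        lang: accuracy(truth, predictions_by_lang.get(lang, {}))
        for lang, truth in truth_by_lang.items()
    }
    return sum(per_language.values()) / len(per_language), per_language


# Toy example with made-up author ids: 1 of 2 correct for en, 2 of 2 for es.
toy_truth = {"en": {"a": 0, "b": 1}, "es": {"c": 1, "d": 1}}
toy_predictions = {"en": {"a": 0, "b": 0}, "es": {"c": 1, "d": 1}}
score, per_language = final_score(toy_truth, toy_predictions)
```

Averaging per-language accuracies gives each language equal weight regardless of how many authors it contains.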
Submission
Once you have finished tuning your approach on the validation set, your software will be tested on the test set. During the competition, the test set will not be released publicly. Instead, we ask you to submit your software for evaluation at our site as described below.
We ask you to prepare your software so that it can be executed via command line calls. The command shall take as input (i) an absolute path to the directory of the test corpus and (ii) an absolute path to an empty output directory:
mySoftware -i INPUT-DIRECTORY -o OUTPUT-DIRECTORY
Within OUTPUT-DIRECTORY, we require two subfolders: en and es, one folder per language, respectively. As the provided output directory is guaranteed to be empty, your software needs to create those subfolders. Within each of these subfolders, you need to create one xml file per author. The xml file looks like this:
The naming of the output files is up to you. However, we recommend using the author-id as the filename and "xml" as the extension.
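A command-line entry point matching the call `mySoftware -i INPUT-DIRECTORY -o OUTPUT-DIRECTORY` might be laid out as in the sketch below. Only the `-i` and `-o` options and the `en` and `es` subfolders come from the task description; the function names are hypothetical, and writing the per-author xml files is deliberately left out because their format is not reproduced here.

```python
import argparse
from pathlib import Path

LANGUAGES = ("en", "es")


def build_parser():
    """Command-line interface with the two options named in the task description."""
    parser = argparse.ArgumentParser(prog="mySoftware")
    parser.add_argument("-i", dest="input_dir", type=Path, required=True,
                        help="absolute path to the unpacked test corpus")
    parser.add_argument("-o", dest="output_dir", type=Path, required=True,
                        help="absolute path to the (empty) output directory")
    return parser


def prepare_output_dirs(output_dir):
    """Create one subfolder per language and return them keyed by language."""
    created = {}
    for lang in LANGUAGES:
        lang_dir = output_dir / lang
        lang_dir.mkdir(parents=True, exist_ok=True)
        created[lang] = lang_dir
    return created


def main(argv=None):
    """Parse arguments and prepare the output layout; prediction code would follow."""
    args = build_parser().parse_args(argv)
    return prepare_output_dirs(args.output_dir)
```

A deployed script would call `main()` without arguments so that argparse reads the real command line, and would then write one prediction file per author into the folder returned for that author's language.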
Note: By submitting your software you retain full copyrights. You agree to grant us usage rights only for the purpose of the PAN competition. We agree not to share your software with a third party or use it for other purposes than the PAN competition.
Related Work
Since 1970, scores of states have established truth commissions to document political violence. Despite their prevalence and potential consequence, the question of why commissions are adopted in some contexts, but not in others, is not well understood. Relatedly, little is known about why some commissions possess strong investigative powers while others do not. I argue that the answer to both questions lies with domestic and international civil society actors, who are connected by a global transitional justice (TJ) network and who share the burden of guiding commission adoption and design. I propose that commissions are more likely to be adopted where network members can leverage information and moral authority over governments. I also suggest that commissions are more likely to possess strong powers where international experts, who steward TJ best practices, advise governments. I evaluate these expectations by analyzing two datasets in the novel Varieties of Truth Commissions Project, interviews with representatives from international non-governmental organizations, interviews with Guatemalan non-governmental organization leaders, a focus group with Argentinian human rights advocates, and a focus group at the International Center for Transitional Justice. My results indicate that network members share the burden—domestic members are essential to commission adoption, while international members are important for strong commission design.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LiveJournal is a free on-line blogging community where users declare friendship with each other. LiveJournal also allows users to form a group which other members can then join. We consider such user-defined groups as ground-truth communities. We provide the LiveJournal friendship social network and ground-truth communities.
We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, which are described in our paper. As for the network, we provide the largest connected component.
Friendster is an on-line gaming network. Before re-launching as a game website, Friendster was a social networking site where users could form friendship edges with each other. The Friendster social network also allowed users to form a group which other members could then join. We consider such user-defined groups as ground-truth communities. For the social network, we take the induced subgraph of the nodes that either belong to at least one community or are connected to other nodes that belong to at least one community. This data is provided by The Web Archive Project, where the full graph is available.
We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, which are described in our paper. As for the network, we provide the largest connected component.
Orkut is a free on-line social network where users form friendships with each other. Orkut also allows users to form a group which other members can then join. We consider such user-defined groups as ground-truth communities. We provide the Orkut friendship social network and ground-truth communities. This data is provided by Alan Mislove et al.
We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, which are described in our paper. As for the network, we provide the largest connected component.
Youtube is a video-sharing web site that includes a social network. In the Youtube social network, users form friendships with each other and can create groups which other users can join. We consider such user-defined groups as ground-truth communities. This data is provided by Alan Mislove et al.
We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, which are described in our paper. As for the network, we provide the largest connected component.
The DBLP computer science bibliography provides a comprehensive list of research papers in computer science. We construct a co-authorship network where two authors are connected if they have published at least one paper together. Each publication venue, e.g., a journal or conference, defines an individual ground-truth community; authors who published in a certain journal or conference form a community.
We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, which are described in our paper. As for the network, we provide the largest connected component.
The network was collected by crawling the Amazon website. It is based on the "Customers Who Bought This Item Also Bought" feature of the Amazon website. If a product i is frequently co-purchased with product j, the graph contains an undirected edge from i to j. Each product category provided by Amazon defines a ground-truth community.
We regard each connected component in a product category as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, which are described in our paper. As for the network, we provide the largest connected component.
The network was generated using email data from a large European research institution. We have anonymized information about all incoming and outgoing email between members of the research institution. Th...
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The real dataset consists of movie evaluations from IMDB, which provides a platform where individuals can evaluate movies on a scale of 1 to 10. If a user rates a movie and clicks the share button, a Twitter message is generated. We then extract the rating from the Twitter message. We treat the ratings on the IMDB website as the event truths, which are based on the aggregated evaluations from all users, whereas our observations come from only the subset of users who share their ratings on Twitter. Using the Twitter API, we collect information about the follower and following relationships between the individuals who generate movie evaluation Twitter messages. To better show the influence of social network information on event truth discovery, we delete small subnetworks that consist of fewer than 5 agents. The final dataset we use consists of 2266 evaluations from 209 individuals on 245 movies (events), along with the social network between these 209 individuals. We regard the social network as undirected, since both follower and following relationships indicate that the two users have similar taste.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is generated from multiple interactions between a Social Robot (NAO) and 5th grade students from a private school in São Paulo, Brazil.
In the interaction, the robot covered the content that teachers were covering with the participating students at the time, about the waste system in Brazil.
The measures here are the readings that the R-CASTLE system took for each answer the students gave to the questions the robot asked.
For more information about how these measures were collected, please refer to this thesis at: https://doi.org/10.11606/T.55.2020.tde-31082020-093935
Since the goal of R-CASTLE is to provide autonomous adaptation, we built a ground-truth dataset based on the feedback of a human expert in education operating the robot in loco. The person was teleoperating the robot to change its behaviour (or not) according to observed values of the participants, such as face gaze, facial emotion displayed, number of spoken words, the correctness of the answer (based on pre-defined answers), and the time students took to answer. These measures are the first five columns of this csv file. The evaluator could decide to increase (1), maintain (0), or decrease (-1) the level of difficulty of the following questions depending on the observed measures. This is the human true label, stored in the 6th column.
Deviations (integer): number of face deviations of the participant during the question answering identified by the system.
EmotionCount (integer): a balance between "good" and "bad" emotions (good - bad) identified by the system.
NumberWord (integer): number of words comprised in the sentence the participant gave.
SucRate/Ans/RWa: (between 0 and 1, where 0 is completely wrong and 1 is completely right): The success rate of the participant’s answer to that question, based on the expected answer programmed by their teachers.
Time2ans (float): The time spent answering the question, in seconds, measured from the moment the robot finishes the question until the end of the participant’s speech.
True Value (-1, 0, 1): Ground-truth value. Value of adaptation chosen by the human observing the interaction if the system needed to decrease, maintain, or increase the level of difficulty of asked questions.
Final Crisp Value (float): value of calculated fuzzy output based on the implementations in the paper: https://doi.org/10.1145/3395035.3425201
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This is the multimodal SWELL knowledge work (SWELL-KW) dataset for research on stress and user modeling. The dataset was collected in an experiment, in which 25 people performed typical knowledge work (writing reports, making presentations, reading e-mail, searching for information). We manipulated their working conditions with the stressors: email interruptions and time pressure. A varied set of data was recorded: computer logging, facial expression from camera recordings, body postures from a Kinect 3D sensor and heart rate (variability) and skin conductance from body sensors. Our dataset not only contains raw data, but also preprocessed data and extracted features. The participants' subjective experience on task load, mental effort, emotion and perceived stress was assessed with validated questionnaires as a ground truth. The resulting dataset on working behavior and affect is suitable for several research fields, such as work psychology, user modeling and context aware systems.
The collection of this dataset was supported by the Dutch national program COMMIT (project P7 SWELL). SWELL is an acronym of Smart Reasoning Systems for Well-being at Work and at Home.
Notes on the content of the dataset:
- The uLog XML files refer to documents in the dataset. Most extensions of these files have changed due to file conversions. The original extension is now included in the file names at the end.
- Due to copyrights not all original documents and images are included in the dataset.
- Variable C in 'D - Physiology features (HR_HRV_SCL - final).csv' refers to the type of block, 1, 2 or 3.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
https://snap.stanford.edu/data/com-Youtube.html
Dataset information
Youtube (http://www.youtube.com/) is a video-sharing web site that includes
a social network. In the Youtube social network, users form friendships with
each other and can create groups which other users can join. We consider
such user-defined groups as ground-truth communities. This data is provided
by Alan Mislove et al.
(http://socialnetworks.mpi-sws.org/data-imc2007.html)
We regard each connected component in a group as a separate ground-truth
community. We remove the ground-truth communities which have fewer than 3
nodes. We also provide the top 5,000 communities with the highest quality,
which are described in our paper (http://arxiv.org/abs/1205.6233). As for
the network, we provide the largest connected component.
Network statistics
Nodes 1,134,890
Edges 2,987,624
Nodes in largest WCC 1134890 (1.000)
Edges in largest WCC 2987624 (1.000)
Nodes in largest SCC 1134890 (1.000)
Edges in largest SCC 2987624 (1.000)
Average clustering coefficient 0.0808
Number of triangles 3056386
Fraction of closed triangles 0.002081
Diameter (longest shortest path) 20
90-percentile effective diameter 6.5
Community statistics
Number of communities 8,385
Average community size 13.50
Average membership size 0.10
Source (citation)
J. Yang and J. Leskovec. Defining and Evaluating Network Communities based
on Ground-truth. ICDM, 2012. http://arxiv.org/abs/1205.6233
Files
File Description
com-youtube.ungraph.txt.gz Undirected Youtube network
com-youtube.all.cmty.txt.gz Youtube communities
com-youtube.top5000.cmty.txt.gz Youtube communities (Top 5,000)
The graph in the SNAP data set is 1-based, with nodes numbered 1 to
1,157,827.
In the SuiteSparse Matrix Collection, Problem.A is the undirected Youtube
network, a matrix of size n-by-n with n=1,134,890, which is the number of
unique user id's appearing in any edge.
Problem.aux.nodeid is a list of the node id's that appear in the SNAP data
set. A(i,j)=1 if person nodeid(i) is friends with person nodeid(j). The
node id's are the same as the SNAP data set (1-based).
C = Problem.aux.Communities_all is a sparse matrix of size n by 16,386
which represents the communities in the com-youtube.all.cmty.txt file.
The kth line in that file defines the kth community, and is the column
C(:,k), where C(i,k)=1 if person ...
Data derived from weekly public opinion polls in the Netherlands in 1996 concerning social and political issues. Samples were drawn from the Dutch population aged 18 years and older.
All data from the surveys held between 1962 and 2000 are available in the DANS data collections.
Background variables:
Sex / age / religion / income / vote recall latest elections / party preference / if stated not knowing what party to vote for at next elections: what party will have most chances that respondent will vote for? / level of education / union membership / professional status / left‐right rating / party alignment / province / degree of urbanization / weight factor.
Topical variables:
n9605: If VVD will be largest party after elections, whom from VVD for prime-minister: Bolkestein / Wiegel / Someone else.
n9606: If VVD will be largest party after elections, whom from VVD for prime-minister: Bolkestein / Wiegel / Someone else.
n9607: If VVD will be largest party after elections, whom from VVD for prime-minister: Bolkestein / Wiegel / Someone else.
n9608: If VVD will be largest party after elections, whom from VVD for prime-minister: Bolkestein / Wiegel / Someone else.
n9615: Who to decide how to govern our country: Bolkestein, Borst, Heerma, de Hoop Scheffer, Jorritsma, Kok, Lubbers, van Mierlo, Rosenmuller, Rottenberg, Sorgdrager, Terpstra, Wallage, Wiegel, Wolffensperger / In two years time new Parliamentary elections: from what parties should there be ministers in Dutch cabinet? / Knowing what parties form government at this time? / Mention parties that form present government / Who to become prime-minister after elections in two years' time?
n9634: Type of people in general speaking truth, being honest: teachers, doctors, priests, vicars, TV-newsreaders, professors, judges, 'man or woman in the street', police, survey researchers, civil servants, union-leaders, business-people, politicians, ministers, journalists, scientists (for example natural scientists, chemical scientists, etc.). Reading specific newspapers like: Algemeen Dagblad, De Telegraaf, Het Nieuws van de Dag, NRC Handelsblad, Het Parool, Trouw, De Volkskrant. Reading these newspapers regularly or not.
n9636: Statements about politicians and political parties: Members of Parliament are not concerned with opinions of people like me / At elections political parties only interested in somebody's vote not in one's opinion.
n9638: Placing particular political parties on left-right scale: CDA, PvdA, VVD, D66. Placing particular politicians on left-right scale: Bolkestein, de Boer, Heerma, de Hoop Scheffer, Jorritsma, Kok, Lubbers, van Mierlo, Wallage, Wiegel, Wolffensperger, Weijers.
Data derived from weekly public opinion polls in the Netherlands concerning social and political issues. Samples were drawn from the Dutch population aged 21 or 18 years and older. The weekly data are available as separate files in annual records, containing overviews of the standard background variables as well as the topical variables.
The dataset 'NIPO weeksurveys 1962-2000' (Creator: R.N. Eisinga, Radboud Universiteit Nijmegen) contains a cumulative datafile with a selection of the standard background variables: political party vote last election / political party vote intention / left-right political self-rating / union membership / sex / age / religious denomination / education / income / occupational status / province / municipality size and codes / postal code.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study investigates the features of fake news networks and how they spread during the 2020 South Korean election. Using Actor-Network Theory (ANT), we assessed the network's central players and how they are connected. Results reveal the characteristics of the videoclips and channel networks responsible for the propagation of fake news. Analysis of the videoclip network reveals a high number of detected fake news videos and a high density of connections among users. Assessment of news videoclips on both actual and fake news networks reveals that the real news network is more concentrated. However, the scale of the network may play a role in these variations. Statistics for network centralization reveal that users are spread out over the network, pointing to its decentralized character. A closer look at the real and fake news networks inside videos and channels reveals similar trends. We find that the density of the real news videoclip network is higher than that of the fake news network, whereas the fake news channel networks are denser than their real news counterparts, which may indicate greater activity and interconnectedness in their transmission. We also found that fake news videoclips had more likes than real news videoclips, whereas real news videoclips had more dislikes than fake news videoclips. These findings strongly suggest that fake news videoclips are more accepted when people watch them on YouTube. In addition, we used semantic networks and automated content analysis to uncover common language patterns in fake news which helps us better understand the structure and dynamics of the networks involved in the dissemination of fake news. The findings reported here provide important insights on how fake news spread via social networks during the South Korean election of 2020. The results of this study have important implications for the campaign against fake news and ensuring factual coverage.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains sentences in Amharic and their corresponding translations in English that were collected using crowdsourcing. These ground-truth sentences are from across different domains such as news headlines, social media, Wikipedia and everyday conversation.
amen.tsv
- Domain: news | wiki | twitter | convo
- Source Sentence: Amharic sentence
- Reference Translation: English translation
- Google Translate: output of Google Translate
- Yandex Translate: output of Yandex Translate
enam.tsv
- Domain: news | wiki | twitter | convo
- Source Sentence: English sentence
- Reference Translation: Amharic translation
- Google Translate: output of Google Translate
- Yandex Translate: output of Yandex Translate
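A file with the five fields listed above could be read as sketched below. This assumes, without confirmation from the dataset itself, that each .tsv file is tab-separated and starts with a header row using exactly those field names; the sample rows are placeholders made up for the example, not data from the corpus.

```python
import csv
import io
from collections import Counter

# Field names taken from the file description; their use as literal
# header names in the .tsv files is an assumption.
COLUMNS = [
    "Domain",
    "Source Sentence",
    "Reference Translation",
    "Google Translate",
    "Yandex Translate",
]


def read_rows(tsv_text):
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def count_by_domain(rows):
    """Number of sentence pairs per domain."""
    return Counter(row["Domain"] for row in rows)


# Placeholder content only; real rows hold Amharic and English sentences.
sample_tsv = "\n".join([
    "\t".join(COLUMNS),
    "news\tsrc-1\tref-1\tgoogle-1\tyandex-1",
    "news\tsrc-2\tref-2\tgoogle-2\tyandex-2",
    "wiki\tsrc-3\tref-3\tgoogle-3\tyandex-3",
]) + "\n"
domain_counts = count_by_domain(read_rows(sample_tsv))
```

Quoting is disabled so that quotation marks inside sentences are kept as ordinary characters.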
News - These are news headlines from Ethiopian news websites.
Wikipedia - A random sample of sentences from the Amharic Wikipedia.
Twitter - Amharic Twitter posts on consumer products.
Conversational - Everyday conversational expressions from Amharic native speakers.
The dataset also contains an evaluation of two commercial systems: Google Translate (https://translate.google.com/) and Yandex Translate (https://translate.yandex.com/). Both systems provide free APIs for which users can sign up and get access keys. The translations were generated on 14th February 2020.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is a Twitter follower-followee graph with 269,640 nodes and 6,818,501 edges from [Kwak]; the ground-truth labels are obtained from [SybilSCAR]. Of these nodes, 178,377 are benign and 91,263 are Sybil. We select 9,000 Sybil and 17,000 benign users (about 10%) as the training set and test on the overall social graph.
[Kwak] H. Kwak, C. Lee, H. Park, and S. Moon, “What is twitter, a social network or a news media?” in WWW, 2010.
[SybilSCAR] B. Wang, L. Zhang, and N. Z. Gong, “SybilSCAR: Sybil detection in online social networks via local rule based propagation,” in IEEE INFOCOM, 2017.
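The training split described above takes 9,000 Sybil and 17,000 benign users, i.e. 26,000 of 269,640 nodes, or roughly 9.6%. A per-class sample of that shape could be drawn as in the sketch below. How the original split was actually selected is not stated, so uniform random sampling, the label encoding (1 for Sybil, 0 for benign), and the function name are all assumptions.

```python
import random


def sample_training_set(labels, n_sybil, n_benign, seed=0):
    """Draw a fixed number of training nodes from each class.

    labels: dict mapping node id -> 1 (Sybil) or 0 (benign); assumed encoding.
    Sampling is uniform at random within each class (an assumption).
    """
    rng = random.Random(seed)
    sybil = sorted(node for node, label in labels.items() if label == 1)
    benign = sorted(node for node, label in labels.items() if label == 0)
    return set(rng.sample(sybil, n_sybil)) | set(rng.sample(benign, n_benign))


# Toy graph labels: 10 Sybil nodes and 20 benign nodes.
toy_labels = {node: 1 if node < 10 else 0 for node in range(30)}
toy_train = sample_training_set(toy_labels, n_sybil=3, n_benign=5)

# Share of the full graph used for training, from the counts given above.
training_fraction = (9000 + 17000) / 269640
```

Sorting the node ids before sampling keeps the result reproducible for a given seed.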
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
https://snap.stanford.edu/data/com-LiveJournal.html
Dataset information
LiveJournal (http://www.livejournal.com/) is a free on-line blogging
community where users declare friendship with each other. LiveJournal also
allows users to form a group which other members can then join. We consider
such user-defined groups as ground-truth communities. We provide the
LiveJournal friendship social network and ground-truth communities.
We regard each connected component in a group as a separate ground-truth
community. We remove the ground-truth communities which have fewer than 3
nodes. We also provide the top 5,000 communities with the highest quality,
which are described in our paper (http://arxiv.org/abs/1205.6233). As for
the network, we provide the largest connected component.
Dataset statistics
Nodes 3,997,962
Edges 34,681,189
Nodes in largest WCC 3997962 (1.000)
Edges in largest WCC 34681189 (1.000)
Nodes in largest SCC 3997962 (1.000)
Edges in largest SCC 34681189 (1.000)
Average clustering coefficient 0.2843
Number of triangles 177820130
Fraction of closed triangles 0.04559
Diameter (longest shortest path) 17
90-percentile effective diameter 6.5
Source (citation)
J. Yang and J. Leskovec. Defining and Evaluating Network Communities based
on Ground-truth. ICDM, 2012. http://arxiv.org/abs/1205.6233
Files
File | Description |
---|---|
com-lj.ungraph.txt.gz | Undirected LiveJournal network |
com-lj.all.cmty.txt.gz | LiveJournal communities |
com-lj.top5000.cmty.txt.gz | LiveJournal communities (Top 5,000) |
The graph in the SNAP data set is 0-based, with nodes numbering 0 to
4,036,537.
In the SuiteSparse Matrix Collection, Problem.A is the undirected
LiveJournal network, a matrix of size n-by-n with n=3,997,962, which is
the number of unique user id's appearing in any edge.
Problem.aux.nodeid is a list of the node id's that appear in the SNAP data
set. A(i,j)=1 if person nodeid(i) is friends with person nodeid(j). The
node id's are the same as the SNAP data set (0-based).
C = Problem.aux.Communities_all is a sparse matrix of size n by 664,414
which represents the communities in the com-lj.all.cmty.txt file. The kth
line in that file defines the kth community, and is the column C(:,k),
where C(i,k)=1 if person nodeid(i) is in the kth community. Row C(i,:)
and row/column i of the A matrix thus refer to the same person, nodeid(i).
Ctop = Problem.aux.Communities_top5000 is n-by-5000, with the same
structure as the C array above, with the content of the
com-lj.top5000.cmty.txt file.
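The relationship between the SNAP files and the matrices A and C described
above can be sketched in plain Python. This is an illustration only: the
helper names are hypothetical, sparse matrices are represented as sets of
(row, column) pairs rather than a real sparse-matrix type, indices are
0-based (unlike the 1-based nodeid(i) notation above), and sorting the node
ids to assign rows is an assumption about how Problem.aux.nodeid is ordered.

```python
def build_index(edges):
    """Return (nodeid, row_of): the ids appearing in any edge and their rows."""
    nodeid = sorted({n for edge in edges for n in edge})  # ordering is assumed
    row_of = {nid: i for i, nid in enumerate(nodeid)}
    return nodeid, row_of

def adjacency_entries(edges, row_of):
    """Nonzero positions of the symmetric n-by-n matrix A."""
    A = set()
    for u, v in edges:
        i, j = row_of[u], row_of[v]
        A.add((i, j))
        A.add((j, i))  # undirected network: A(i,j) = A(j,i) = 1
    return A

def community_entries(community_lines, row_of):
    """Nonzero positions of C: line k of the community file gives column k.

    Assumes every community member also appears in some edge.
    """
    C = set()
    for k, line in enumerate(community_lines):
        for token in line.split():
            C.add((row_of[int(token)], k))
    return C

# Toy stand-ins for com-lj.ungraph.txt and com-lj.all.cmty.txt.
toy_edges = [(0, 5), (5, 9), (0, 9), (9, 12)]
toy_community_lines = ["0 5 9", "9 12"]

nodeid, row_of = build_index(toy_edges)
A = adjacency_entries(toy_edges, row_of)
C = community_entries(toy_community_lines, row_of)
```

Because the rows of C and the rows and columns of A are both indexed through
the same row_of mapping, row i of either structure refers to the same
person, nodeid[i], as stated above.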
Friendster is an on-line gaming network. Before re-launching as a game website, Friendster was a social networking site where users could form friendship edges with each other. The Friendster social network also allowed users to form groups which other members could then join. The Friendster dataset consists of ground-truth communities (based on user-defined groups) and the social network, given as the induced subgraph of the nodes that either belong to at least one community or are connected to other nodes that belong to at least one community.
Attitudes towards religious practices.
Topics: assessment of personal happiness; attitudes towards pre-marital sexual intercourse; attitudes towards adultery; attitudes towards homosexual relationships between adults; attitudes towards abortion in case of serious disability or illness of the baby or low income of the family; attitudes towards gender roles in marriage; trust in institutions (parliament, business and industry, churches and religious organizations, courts and the legal system, schools and the educational system); mobility; attitudes towards the influence of religious leaders on voters and government; attitudes towards the benefits of science and religion (scale: modern science does more harm than good, too much trust in science and not enough in religious faith, religions bring more conflicts than peace, intolerance of people with very strong religious beliefs); judgment on the power of churches and religious organizations; attitudes towards equal rights for all religious groups in the country and respect for all religions; acceptance of persons from a different religion or with different religious views in case of marrying a relative or being a candidate of the preferred political party (social distance); attitudes towards allowing religious extremists to hold public meetings and to publish books expressing their views (freedom of expression); doubt or firm belief in God (deism, scale); belief in: a life after death, heaven, hell, religious miracles, reincarnation, Nirvana, supernatural powers of deceased ancestors; attitudes towards a higher truth and towards meaning of life (scale: God is concerned with every human being personally, little that people can do to change the course of their lives (fatalism), life is meaningful only because God exists, life does not serve any purpose, life is only meaningful if someone provides the meaning himself, connection with God without churches or religious services); religious preference (affiliation) of mother, father and spouse/partner; additional country-specific item for Kenya: religious preference (affiliation) of mother, father and spouse/partner; religion respondent was raised in; additional country-specific item for Kenya: religion respondent was raised in; frequency of church attendance (of attendance in religious services) of father and mother; personal frequency of church attendance when young; frequency of prayers and participation in religious activities; shrine, altar or a religious object in respondent’s home; frequency of visiting a holy place (shrine, temple, church or mosque) for religious reasons except regular religious services; self-classification of personal religiousness and spirituality; truth in one or in all religions; attitudes towards the benefits of practicing a religion (scale: finding inner peace and happiness, making friends, gaining comfort in times of trouble and sorrow, meeting the right kind of people). Optional items: conversion of faith after crucial experience; personal sacrifice as an expression of faith such as fasting or following a special diet during holy season such as Lent or Ramadan.
Demography: sex; age; marital status; steady life partner; years of schooling; highest education level; country specific education and degree; current employment status (respondent and partner); hours worked weekly; occupation (ISCO 1988) (respondent and partner); supervising function at work; working for private or public sector or self-employed (respondent and partner); if self-employed: number of employees; trade union membership; earnings of respondent (country specific); family income (country specific); size of household; household composition; party affiliation (left-right); country specific party affiliation; participation in last election; religious denomination; religious main groups; attendance of religious services; self-placement on a top-bottom scale; region (country specific); size of community (country specific); type of community: urban-rural area; country of origin or ethnic group affiliation; additional country specific for Kenya and Tanzania: ethnic group affiliation. Additionally coded: administrative mode of data-collection; case substitution; weighting factor.
https://creativecommons.org/publicdomain/zero/1.0/
As the COVID-19 virus spreads quickly around the world, misinformation related to COVID-19 is also created and spreads like wildfire. Such misinformation has caused confusion among people, disruptions in society, and even deadly health consequences. Understanding, detecting, and mitigating COVID-19 misinformation therefore has not only deep intellectual value but also huge societal impact. This dataset was created to help researchers combat COVID-19 health misinformation.
The dataset is a diverse COVID-19 healthcare misinformation dataset, including fake news on websites and social platforms, along with users' social engagement with such news. It includes 4,251 news articles, 296,000 related user engagements, 926 social platform posts about COVID-19, and ground truth labels.
Version 0.1 (05/17/2020) initial version corresponding to arXiv paper CoAID: COVID-19 HEALTHCARE MISINFORMATION DATASET
Version 0.2 (08/03/2020) added data from May 1, 2020 through July 1, 2020
Version 0.3 (11/03/2020) added data from July 1, 2020 through September 1, 2020
Limeng Cui and Dongwon Lee, Pennsylvania State University.