21 datasets found
  1. Truth Social Dataset

    • zenodo.org
    zip
    Updated Jan 13, 2023
    Cite
    Patrick Gerard; Nicholas Botzer; Tim Weninger (2023). Truth Social Dataset [Dataset]. http://doi.org/10.5281/zenodo.7531625
    Available download formats: zip
    Dataset updated
    Jan 13, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Patrick Gerard; Nicholas Botzer; Tim Weninger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A Truth Social dataset containing a network of users, their associated posts, and additional information about each post. Collected from February 2022 through September 2022, the dataset contains 454,458 user entries and 845,060 Truth (Truth Social’s term for a post) entries.

    The dataset comprises 12 files; the entry count for each file is shown below.

    File                            Data Points
    users.tsv                       454,458
    follows.tsv                     4,002,115
    truths.tsv                      823,927
    quotes.tsv                      10,508
    replies.tsv                     506,276
    media.tsv                       184,884
    hashtags.tsv                    21,599
    external_urls.tsv               173,947
    truth_hashtag_edges.tsv         213,295
    truth_media_edges.tsv           257,500
    truth_external_url_edges.tsv    252,877
    truth_user_tag_edges.tsv        145,234

    A readme file is provided that describes the structure of the files, necessary terms, and necessary information about the data collection.

  2. Data from: Youtube social network

    • kaggle.com
    zip
    Updated Sep 1, 2019
    Cite
    Lorenzo De Tomasi (2019). Youtube social network [Dataset]. https://www.kaggle.com/datasets/lodetomasi1995/youtube-social-network
    Available download formats: zip (10604317 bytes)
    Dataset updated
    Sep 1, 2019
    Authors
    Lorenzo De Tomasi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    YouTube
    Description

    Youtube social network and ground-truth communities. Dataset information: Youtube is a video-sharing website that includes a social network. In the Youtube social network, users form friendships with each other and can create groups that other users can join. We consider such user-defined groups as ground-truth communities. This data is provided by Alan Mislove et al.

    We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities that have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, which are described in our paper. As for the network, we provide the largest connected component.

    More info: https://snap.stanford.edu/data/com-Youtube.html

  3. Friendster Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Oct 28, 2020
    Cite
    Jaewon Yang; Jure Leskovec (2020). Friendster Dataset [Dataset]. https://paperswithcode.com/dataset/friendster
    Dataset updated
    Oct 28, 2020
    Authors
    Jaewon Yang; Jure Leskovec
    Description

    Friendster is an online gaming network. Before relaunching as a game website, Friendster was a social networking site where users could form friendship edges with each other. The Friendster social network also allowed users to form groups that other members could then join. The Friendster dataset consists of ground-truth communities (based on user-defined groups) and the social network from the induced subgraph of the nodes that either belong to at least one community or are connected to other nodes that belong to at least one community.

  4. YouTube Social Network with Communities (SNAP)

    • kaggle.com
    Updated Dec 16, 2021
    Cite
    Subhajit Sahu (2021). YouTube Social Network with Communities (SNAP) [Dataset]. https://www.kaggle.com/datasets/wolfram77/graphs-snap-com-youtube/discussion?sort=undefined
    Dataset updated
    Dec 16, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    Youtube social network and ground-truth communities

    https://snap.stanford.edu/data/com-Youtube.html

    Dataset information

    Youtube (http://www.youtube.com/) is a video-sharing website that includes a social network. In the Youtube social network, users form friendships with each other and can create groups that other users can join. We consider such user-defined groups as ground-truth communities. This data is provided by Alan Mislove et al. (http://socialnetworks.mpi-sws.org/data-imc2007.html)

    We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities that have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, which are described in our paper (http://arxiv.org/abs/1205.6233). As for the network, we provide the largest connected component.
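
The community construction described above (split each user-defined group into connected components, then drop components with fewer than 3 nodes) can be sketched as follows. This is a minimal illustration, not the SNAP authors' code: the function name `split_group_into_communities`, its parameters, and the toy data are all hypothetical.

```python
from collections import defaultdict, deque


def split_group_into_communities(members, edges, min_size=3):
    """Split one user-defined group into its connected components.

    members: iterable of node ids belonging to the group.
    edges: iterable of (u, v) undirected friendship edges of the whole network.
    Returns a list of sorted node-id lists, one per component that has at
    least `min_size` nodes (smaller components are dropped).
    """
    members = set(members)
    # Keep only edges with both endpoints inside the group.
    adj = defaultdict(set)
    for u, v in edges:
        if u in members and v in members:
            adj[u].add(v)
            adj[v].add(u)

    seen = set()
    communities = []
    for start in sorted(members):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if len(component) >= min_size:
            communities.append(sorted(component))
    return communities


# Toy data (made up for illustration, not taken from the dataset).
toy_edges = [(1, 2), (2, 3), (4, 5), (6, 7), (7, 8), (8, 6), (3, 9)]
toy_group = [1, 2, 3, 4, 5, 6, 7, 8]
communities = split_group_into_communities(toy_group, toy_edges)
```

In the toy group, nodes 4 and 5 form a two-node component and are therefore discarded, while the edge to node 9 is ignored because node 9 is not a group member.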

    Network statistics
    Nodes 1,134,890
    Edges 2,987,624
    Nodes in largest WCC 1134890 (1.000)
    Edges in largest WCC 2987624 (1.000)
    Nodes in largest SCC 1134890 (1.000)
    Edges in largest SCC 2987624 (1.000)
    Average clustering coefficient 0.0808
    Number of triangles 3056386
    Fraction of closed triangles 0.002081
    Diameter (longest shortest path) 20
    90-percentile effective diameter 6.5
    Community statistics
    Number of communities 8,385
    Average community size 13.50
    Average membership size 0.10
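
Two of the statistics listed above, the average clustering coefficient and the number of triangles, can be illustrated on a tiny graph. The sketch below is an assumption-laden example rather than the procedure used to produce the published figures: `clustering_stats` is a hypothetical helper, and it assumes that nodes with fewer than 2 neighbors contribute a local coefficient of 0.

```python
from itertools import combinations


def clustering_stats(edges):
    """Return (average clustering coefficient, number of triangles)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    local = []
    closed_pairs_total = 0
    for node, nbrs in adj.items():
        k = len(nbrs)
        # Count pairs of neighbors that are themselves connected.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        closed_pairs_total += links
        # Assumed convention: nodes with fewer than 2 neighbors contribute 0.
        local.append(2 * links / (k * (k - 1)) if k >= 2 else 0.0)

    avg_clustering = sum(local) / len(local) if local else 0.0
    # Each triangle is counted once at each of its 3 corners.
    triangles = closed_pairs_total // 3
    return avg_clustering, triangles


# Toy graph (made up): a triangle 1-2-3 with a pendant node 4 attached to 3.
avg_cc, n_triangles = clustering_stats([(1, 2), (2, 3), (1, 3), (3, 4)])
```

Enumerating neighbor pairs is quadratic in node degree, so this form is only suitable for small graphs; a network with millions of edges would need a dedicated graph library.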

    Source (citation)
    J. Yang and J. Leskovec. Defining and Evaluating Network Communities based on Ground-truth. ICDM, 2012. http://arxiv.org/abs/1205.6233

    Files
    File                               Description
    com-youtube.ungraph.txt.gz         Undirected Youtube network
    com-youtube.all.cmty.txt.gz        Youtube communities
    com-youtube.top5000.cmty.txt.gz    Youtube communities (Top 5,000)

    Notes on inclusion into the SuiteSparse Matrix Collection, July 2018:

    The graph in the SNAP data set is 1-based, with nodes numbered 1 to
    1,157,827.

    In the SuiteSparse Matrix Collection, Problem.A is the undirected Youtube
    network, a matrix of size n-by-n with n=1,134,890, which is the number of
    unique user id's appearing in any edge.

    Problem.aux.nodeid is a list of the node id's that appear in the SNAP data set. A(i,j)=1 if person nodeid(i) is friends with person nodeid(j). The
    node id's are the same as the SNAP data set (1-based).

    C = Problem.aux.Communities_all is a sparse matrix of size n by 16,386
    which represents the communities in the com-youtube.all.cmty.txt file.
    The kth line in that file defines the kth community, and is the column
    C(:,k), where C(i,k)=1 if person ...

  5. Data sets used for user analysis.

    • plos.figshare.com
    xlsx
    Updated Jan 30, 2025
    Cite
    Alon Sela; Omer Neter; Václav Lohr; Petr Cihelka; Fan Wang; Moti Zwilling; John Phillip Sabou; Miloš Ulman (2025). Data sets used for user analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0309688.s002
    Available download formats: xlsx
    Dataset updated
    Jan 30, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Alon Sela; Omer Neter; Václav Lohr; Petr Cihelka; Fan Wang; Moti Zwilling; John Phillip Sabou; Miloš Ulman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Social networks are a battlefield for political propaganda. Protected by the anonymity of the internet, political actors use computational propaganda to influence the masses. Their methods include the use of synchronized or individual bots, multiple accounts operated by one social media management tool, or different manipulations of search engines and social network algorithms, all aiming to promote their ideology. While computational propaganda influences modern society, it is hard to measure or detect. Furthermore, with the recent exponential growth in large language models (LLMs), and the growing concerns about information overload, which makes the alternative truth spheres noisier than ever before, the complexity and magnitude of computational propaganda are also expected to increase, making its detection even harder. Propaganda in social networks is disguised as legitimate news sent from authentic users. It smartly blends real users with fake accounts. We seek here to detect efforts to manipulate the spread of information in social networks through one of the fundamental macro-scale properties of rhetoric: repetitiveness. We use 16 data sets with a total size of 13 GB, 10 related to political topics and 6 related to non-political ones (large-scale disasters), each ranging from tens of thousands to a few million tweets. We compare them and identify statistical and network properties that distinguish between these two types of information cascades. These features are based on both the repetition distribution of hashtags and the mentions of users, as well as the network structure. Together, they enable us to distinguish (p-value = 0.0001) between the two different classes of information cascades. In addition to constructing a bipartite graph connecting words and tweets to each cascade, we develop a quantitative measure and show how it can be used to distinguish between political and non-political discussions. Our method is indifferent to the cascade’s country of origin, language, or cultural background since it is based only on the statistical properties of repetitiveness and on the structure of the bipartite network of word appearances in tweets.

  6. Orkut Social Network and Communities (SNAP)

    • kaggle.com
    Updated Dec 16, 2021
    Cite
    Subhajit Sahu (2021). Orkut Social Network and Communities (SNAP) [Dataset]. https://www.kaggle.com/datasets/wolfram77/graphs-snap-com-orkut/data
    Dataset updated
    Dec 16, 2021
    Dataset provided by
    Kaggle
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Orkut social network and ground-truth communities

    https://snap.stanford.edu/data/com-Orkut.html

    Dataset information

    Orkut (http://www.orkut.com/) is a free online social network where users form friendships with each other. Orkut also allows users to form groups that other members can then join. We consider such user-defined groups as ground-truth communities. We provide the Orkut friendship social network and ground-truth communities. This data is provided by Alan Mislove et al. (http://socialnetworks.mpi-sws.org/data-imc2007.html)

    We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities that have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, which are described in our paper (http://arxiv.org/abs/1205.6233). As for the network, we provide the largest connected component.

    Dataset statistics
    Nodes 3,072,441
    Edges 117,185,083
    Nodes in largest WCC 3072441 (1.000)
    Edges in largest WCC 117185083 (1.000)
    Nodes in largest SCC 3072441 (1.000)
    Edges in largest SCC 117185083 (1.000)
    Average clustering coefficient 0.1666
    Number of triangles 627584181
    Fraction of closed triangles 0.01414
    Diameter (longest shortest path) 9
    90-percentile effective diameter 4.8

    Source (citation)
    J. Yang and J. Leskovec. Defining and Evaluating Network Communities based on Ground-truth. ICDM, 2012. http://arxiv.org/abs/1205.6233

    Files
    File                             Description
    com-orkut.ungraph.txt.gz         Undirected Orkut network
    com-orkut.all.cmty.txt.gz        Orkut communities
    com-orkut.top5000.cmty.txt.gz    Orkut communities (Top 5,000)

    Notes on inclusion into the SuiteSparse Matrix Collection, July 2018:

    The graph in the SNAP data set is 1-based, with nodes numbered 1 to
    3,072,626.

    In the SuiteSparse Matrix Collection, Problem.A is the undirected
    Orkut network, a matrix of size n-by-n with n=3,072,441, which is
    the number of unique user id's appearing in any edge.

    Problem.aux.nodeid is a list of the node id's that appear in the SNAP data set. A(i,j)=1 if person nodeid(i) is friends with person nodeid(j). The
    node id's are the same as the SNAP data set (1-based).

    C = Problem.aux.Communities_all is a sparse matrix of size n by 15,301,901, which represents the same number of communities in the com-orkut.all.cmty.txt file. The kth line in that file defines the kth community, and is the column C(:,k), where C(i,k)=1 if person nodeid(i) is in the kth community. Row C(i,:) and row/column i of the A matrix thus refer to the same person, nodeid(i).
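
The mapping from community file lines to the entries of C can be sketched as below. This is an illustrative example only: `build_membership` is a hypothetical helper, the toy ids are made up, and it assumes each line of the community file holds whitespace-separated node ids. The (i, k) pairs are kept 1-based to match the notation above.

```python
def build_membership(nodeid, community_lines):
    """Return the set of 1-based (i, k) pairs for which C(i, k) = 1.

    nodeid: list of node ids; nodeid[i - 1] is the id of row i.
    community_lines: iterable of strings; line k holds the node ids of
    community k, separated by whitespace (assumed file layout).
    """
    row_of = {nid: i for i, nid in enumerate(nodeid, start=1)}
    entries = set()
    for k, line in enumerate(community_lines, start=1):
        for token in line.split():
            entries.add((row_of[int(token)], k))
    return entries


# Toy data (made up): four node ids with gaps, two communities.
toy_nodeid = [1, 2, 5, 9]
toy_lines = ["1 5 9", "2 5"]
C = build_membership(toy_nodeid, toy_lines)
# Communities that contain node id 5, which sits in row 3.
communities_of_row_3 = sorted(k for i, k in C if i == 3)
```

Storing only the nonzero (row, column) pairs mirrors the sparse representation: node ids in the file need not be contiguous, and `nodeid` supplies the translation to matrix rows.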

    Ctop = Problem.aux.Communities_to...

  7. Reevaluating Political Trust and Social Desirability in China - Dataset -...

    • bsos-data.umd.edu
    Cite
    Reevaluating Political Trust and Social Desirability in China - Dataset - BSOS Data Repository [Dataset]. https://bsos-data.umd.edu/dataset/making-the-list-reevaluating-political-trust-and-social-desirability-in-china
    Area covered
    China
    Description

    The data comes from the Harvard Dataverse and covers political trust and regime support in China, along with self-monitoring, which gauges participants' concern for social desirability. Authors Nicholson and Huang obtained the data via a standard survey experiment that contains an embedded list experiment. The list experiment aspect is significant because list experiments are an "indirect way to gauge overreporting" (Nicholson and Huang). The data can help in understanding Chinese politics, such as how support varies at different government levels and how overreporting is affected by a person's concern for social desirability. This data can be used in government classes and coding classes. The data should be used when learning about ordered logit and simple bar graphs. A regression should not be used. It could be used to compare the levels of trust in different regime types. It would be interesting to compare the results of other authoritarian countries, such as Turkey and Vietnam, to the results of these datasets from China. Additionally, data from these countries could be compared to democracies. People underreport in authoritarian governments and might not always tell the truth, so there is a chance that authoritarian countries could have similar levels of reported trust to the democratic countries. This experiment is also a list experiment, which reduces some of the underreporting. The data can be used to see whether people with certain demographic characteristics show more or less support for their government. Examples of demographic characteristics that could be examined are gender, age, and education level.

  8. Using social network information to discover truth of movie ranking

    • researchdata.ntu.edu.sg
    tsv, txt
    Updated Jun 10, 2018
    Cite
    DR-NTU (Data) (2018). Using social network information to discover truth of movie ranking [Dataset]. http://doi.org/10.21979/N9/L5TTRW
    Available download formats: tsv (4143), tsv (26553), txt (1857)
    Dataset updated
    Jun 10, 2018
    Dataset provided by
    DR-NTU (Data)
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The real dataset consists of movie evaluations from IMDB, which provides a platform where individuals can evaluate movies on a scale of 1 to 10. If a user rates a movie and clicks the share button, a Twitter message is generated. We then extract the rating from the Twitter message. We treat the ratings on the IMDB website as the event truths, which are based on the aggregated evaluations from all users, whereas our observations come from only a subset of users who share their ratings on Twitter. Using the Twitter API, we collect information about the follower and following relationships between individuals who generate movie evaluation Twitter messages. To better show the influence of social network information on event truth discovery, we delete small subnetworks that consist of fewer than 5 agents. The final dataset we use consists of 2266 evaluations from 209 individuals on 245 movies (events), together with the social network between these 209 individuals. We regard the social network as undirected because both follower and following relationships indicate that the two users have similar tastes.

  9. Most frequent words appearing in each feed.

    • plos.figshare.com
    xls
    Updated Nov 5, 2024
    Cite
    Andrea Failla; Giulio Rossetti (2024). Most frequent words appearing in each feed. [Dataset]. http://doi.org/10.1371/journal.pone.0310330.t004
    Available download formats: xls
    Dataset updated
    Nov 5, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Andrea Failla; Giulio Rossetti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. We present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social to address this pressing issue. The dataset contains the complete post history of over 4M users (81% of all registered accounts), totalling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions. Since Bluesky allows users to create and like feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped “like” interactions. This dataset allows novel analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection and performing content virality and diffusion analysis.

  10. Dataset of adaptive Children-Robot Interaction for Education based on...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Feb 3, 2025
    Cite
    Daniel Tozadore; Roseli Romero (2025). Dataset of adaptive Children-Robot Interaction for Education based on Autonomous Multimodal Users' Readings [Dataset]. http://doi.org/10.5281/zenodo.11174782
    Available download formats: zip
    Dataset updated
    Feb 3, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Daniel Tozadore; Roseli Romero
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    # Dataset of adaptive Children-Robot Interaction for Education based on Autonomous Multimodal Users’ Readings

    ## Background

    This dataset is generated from multiple interactions between a Social Robot (NAO) and 5th grade students from a private school in São Paulo, Brazil.

    In the interaction, the robot covered the content that teachers were covering at the time with the participating students: the waste system in Brazil.

    The measures here are the readings that the R-CASTLE system did for each answer the students gave to the questions the robot asked.

    For more information about how these measures were collected, please refer to this thesis at: https://doi.org/10.11606/T.55.2020.tde-31082020-093935

    Since the goal of R-CASTLE is to provide autonomous adaptation, we built a ground-truth dataset based on the feedback of an education expert operating the robot in loco. This person teleoperated the robot to change its behaviour (or not) according to observed values for the participants, such as face gaze, facial emotion displayed, number of spoken words, the correctness of the answer (based on pre-defined answers), and the time students took to answer. These measures are the first 5 columns of this csv file. The evaluator could decide to increase (1), maintain (0), or decrease (-1) the difficulty level of the following questions depending on these observed measures. This is the human true label, stored in the 6th column.

    ## Description:
    Each row of this file is a tuple of the autonomous readings the robot made in the first 5 columns, plus the true label in the 6th column (True Value) and the final crisp value from fuzzy classification in the 7th column (Final Crisp Value).


    Deviations (integer): number of face deviations of the participant during the question answering identified by the system.

    EmotionCount (integer): a balance between "good" and "bad" emotions (good - bad) identified by the system.

    NumberWord (integer): number of words comprised in the sentence the participant gave.

    SucRate/Ans/RWa: (between 0 and 1, where 0 is completely wrong and 1 is completely right): The success rate of the participant’s answer to that question, based on the expected answer programmed by their teachers.

    Time2ans (float): The time in seconds from when the robot finished asking the question until the end of the participant’s speech.

    True Value (-1, 0, 1): Ground-truth value. Value of adaptation chosen by the human observing the interaction if the system needed to decrease, maintain, or increase the level of difficulty of asked questions.

    Final Crisp Value (float): value of calculated fuzzy output based on the implementations in the paper: https://doi.org/10.1145/3395035.3425201
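
The seven-column layout described above can be read into typed records along these lines. This is a sketch under stated assumptions: the `Reading` class and `parse_rows` function are hypothetical names, the file is assumed to be comma-separated with no header row, and the sample rows are invented for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Reading:
    deviations: int           # Deviations
    emotion_count: int        # EmotionCount (good - bad)
    number_word: int          # NumberWord
    success_rate: float       # SucRate/Ans/RWa, between 0 and 1
    time_to_answer: float     # Time2ans, in seconds
    true_value: int           # True Value: -1, 0, or 1
    final_crisp_value: float  # Final Crisp Value


def parse_rows(text):
    """Parse comma-separated rows (no header row assumed) into Reading objects."""
    readings = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        d, e, n, s, t, y, f = row
        reading = Reading(int(d), int(e), int(n), float(s), float(t), int(y), float(f))
        if reading.true_value not in (-1, 0, 1):
            raise ValueError(f"unexpected True Value: {reading.true_value}")
        readings.append(reading)
    return readings


# Made-up sample rows for illustration only; they are not from the dataset.
sample = "2,1,7,1.0,4.5,1,0.8\n5,-2,3,0.0,12.25,-1,-0.6\n"
readings = parse_rows(sample)
# Count how often each adaptation label occurs.
label_counts = {v: sum(r.true_value == v for r in readings) for v in (-1, 0, 1)}
```

If the actual file does carry a header row, the first row would need to be skipped before conversion.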


    ## Creators
    Daniel Tozadore: dtozadore@gmail.com
    Roseli Romero: rafrance@icmc.usp.br


    ## License:
    [Creative Commons Licenses](https://creativecommons.org/share-your-work/cclicenses/)

  11. Group SNAP Dataset

    • paperswithcode.com
    Updated Jul 21, 2018
    Cite
    (2018). Group SNAP Dataset [Dataset]. https://paperswithcode.com/dataset/group-snap-snap-suitesparse-matrix-collection
    Dataset updated
    Jul 21, 2018
    Description

    Networks from SNAP (Stanford Network Analysis Platform) Network Data Sets, Jure Leskovec http://snap.stanford.edu/data/index.html email jure at cs.stanford.edu

    Citation for the SNAP collection:

    @misc{snapnets,
      author = {Jure Leskovec and Andrej Krevl},
      title = {{SNAP Datasets}: {Stanford} Large Network Dataset Collection},
      howpublished = {\url{http://snap.stanford.edu/data}},
      month = jun,
      year = 2014
    }

    The following matrices/graphs were added to the collection in June 2010 by Tim Davis (problem id and name):

    2284 SNAP/soc-Epinions1 who-trusts-whom network of Epinions.com
    2285 SNAP/soc-LiveJournal1 LiveJournal social network
    2286 SNAP/soc-Slashdot0811 Slashdot social network, Nov 2008
    2287 SNAP/soc-Slashdot0902 Slashdot social network, Feb 2009
    2288 SNAP/wiki-Vote Wikipedia who-votes-on-whom network
    2289 SNAP/email-EuAll Email network from a EU research institution
    2290 SNAP/email-Enron Email communication network from Enron
    2291 SNAP/wiki-Talk Wikipedia talk (communication) network
    2292 SNAP/cit-HepPh Arxiv High Energy Physics paper citation network
    2293 SNAP/cit-HepTh Arxiv High Energy Physics paper citation network
    2294 SNAP/cit-Patents Citation network among US Patents
    2295 SNAP/ca-AstroPh Collaboration network of Arxiv Astro Physics
    2296 SNAP/ca-CondMat Collaboration network of Arxiv Condensed Matter
    2297 SNAP/ca-GrQc Collaboration network of Arxiv General Relativity
    2298 SNAP/ca-HepPh Collaboration network of Arxiv High Energy Physics
    2299 SNAP/ca-HepTh Collaboration network of Arxiv High Energy Physics Theory
    2300 SNAP/web-BerkStan Web graph of Berkeley and Stanford
    2301 SNAP/web-Google Web graph from Google
    2302 SNAP/web-NotreDame Web graph of Notre Dame
    2303 SNAP/web-Stanford Web graph of Stanford.edu
    2304 SNAP/amazon0302 Amazon product co-purchasing network from March 2 2003
    2305 SNAP/amazon0312 Amazon product co-purchasing network from March 12 2003
    2306 SNAP/amazon0505 Amazon product co-purchasing network from May 5 2003
    2307 SNAP/amazon0601 Amazon product co-purchasing network from June 1 2003
    2308 SNAP/p2p-Gnutella04 Gnutella peer to peer network from August 4 2002
    2309 SNAP/p2p-Gnutella05 Gnutella peer to peer network from August 5 2002
    2310 SNAP/p2p-Gnutella06 Gnutella peer to peer network from August 6 2002
    2311 SNAP/p2p-Gnutella08 Gnutella peer to peer network from August 8 2002
    2312 SNAP/p2p-Gnutella09 Gnutella peer to peer network from August 9 2002
    2313 SNAP/p2p-Gnutella24 Gnutella peer to peer network from August 24 2002
    2314 SNAP/p2p-Gnutella25 Gnutella peer to peer network from August 25 2002
    2315 SNAP/p2p-Gnutella30 Gnutella peer to peer network from August 30 2002
    2316 SNAP/p2p-Gnutella31 Gnutella peer to peer network from August 31 2002
    2317 SNAP/roadNet-CA Road network of California
    2318 SNAP/roadNet-PA Road network of Pennsylvania
    2319 SNAP/roadNet-TX Road network of Texas
    2320 SNAP/as-735 733 daily instances(graphs) from November 8 1997 to January 2 2000
    2321 SNAP/as-Skitter Internet topology graph, from traceroutes run daily in 2005
    2322 SNAP/as-caida The CAIDA AS Relationships Datasets, from January 2004 to November 2007
    2323 SNAP/Oregon-1 AS peering information inferred from Oregon route-views between March 31 and May 26 2001
    2324 SNAP/Oregon-2 AS peering information inferred from Oregon route-views between March 31 and May 26 2001
    2325 SNAP/soc-sign-epinions Epinions signed social network
    2326 SNAP/soc-sign-Slashdot081106 Slashdot Zoo signed social network from November 6 2008
    2327 SNAP/soc-sign-Slashdot090216 Slashdot Zoo signed social network from February 16 2009
    2328 SNAP/soc-sign-Slashdot090221 Slashdot Zoo signed social network from February 21 2009

    Then the following problems were added in July 2018. All data and metadata from the SNAP data set was imported into the SuiteSparse Matrix Collection.

    2777 SNAP/CollegeMsg Messages on a Facebook-like platform at UC-Irvine
    2778 SNAP/com-Amazon Amazon product network
    2779 SNAP/com-DBLP DBLP collaboration network
    2780 SNAP/com-Friendster Friendster online social network
    2781 SNAP/com-LiveJournal LiveJournal online social network
    2782 SNAP/com-Orkut Orkut online social network
    2783 SNAP/com-Youtube Youtube online social network
    2784 SNAP/email-Eu-core E-mail network
    2785 SNAP/email-Eu-core-temporal E-mails between users at a research institution
    2786 SNAP/higgs-twitter twitter messages re: Higgs boson on 4th July 2012.
    2787 SNAP/loc-Brightkite Brightkite location based online social network
    2788 SNAP/loc-Gowalla Gowalla location based online social network
    2789 SNAP/soc-Pokec Pokec online social network
    2790 SNAP/soc-sign-bitcoin-alpha Bitcoin Alpha web of trust network
    2791 SNAP/soc-sign-bitcoin-otc Bitcoin OTC web of trust network
    2792 SNAP/sx-askubuntu Comments, questions, and answers on Ask Ubuntu
    2793 SNAP/sx-mathoverflow Comments, questions, and answers on Math Overflow
    2794 SNAP/sx-stackoverflow Comments, questions, and answers on Stack Overflow
    2795 SNAP/sx-superuser Comments, questions, and answers on Super User
    2796 SNAP/twitter7 A collection of 476 million tweets collected between June-Dec 2009
    2797 SNAP/wiki-RfA Wikipedia Requests for Adminship (with text)
    2798 SNAP/wiki-talk-temporal Users editing talk pages on Wikipedia
    2799 SNAP/wiki-topcats Wikipedia hyperlinks (with communities)

    The following 13 graphs/networks were in the SNAP data set in July 2018 but have not yet been imported into the SuiteSparse Matrix Collection. They may be added in the future:

    amazon-meta, ego-Facebook, ego-Gplus, ego-Twitter, gemsec-Deezer, gemsec-Facebook, ksc-time-series, memetracker9, web-flickr, web-Reddit, web-RedditPizzaRequests, wiki-Elec, wiki-meta, wikispeedia

    The 2010 description of the SNAP data set gave these categories:

    • Social networks: online social networks, edges represent interactions between people

    • Communication networks: email communication networks with edges representing communication

    • Citation networks: nodes represent papers, edges represent citations

    • Collaboration networks: nodes represent scientists, edges represent collaborations (co-authoring a paper)

    • Web graphs: nodes represent webpages and edges are hyperlinks

    • Blog and Memetracker graphs: nodes represent time stamped blog posts, edges are hyperlinks [revised below]

    • Amazon networks: nodes represent products and edges link commonly co-purchased products

    • Internet networks: nodes represent computers and edges communication

    • Road networks: nodes represent intersections and edges roads connecting the intersections

    • Autonomous systems: graphs of the internet

    • Signed networks: networks with positive and negative edges (friend/foe, trust/distrust)

    By July 2018, the following categories had been added:

    • Networks with ground-truth communities: ground-truth network communities in social and information networks

    • Location-based online social networks: Social networks with geographic check-ins

    • Wikipedia networks, articles, and metadata: Talk, editing, voting, and article data from Wikipedia

    • Temporal networks: networks where edges have timestamps

    • Twitter and Memetracker: Memetracker phrases, links and 467 million Tweets

    • Online communities: Data from online communities such as Reddit and Flickr

    • Online reviews: Data from online review systems such as BeerAdvocate and Amazon

    https://sparse.tamu.edu/SNAP

  12. Data from: The SWELL Knowledge Work Dataset for Stress and User Modeling...

    • ssh.datastations.nl
    • datacatalogue.cessda.eu
    • +1more
    bin, csv, docx, ods +7
    Updated Jun 4, 2025
    Cite
    W. Kraaij; S. Koldijk; M. Sappelli (2025). The SWELL Knowledge Work Dataset for Stress and User Modeling Research [Dataset]. http://doi.org/10.17026/DANS-X55-69ZP
    Explore at:
    Available download formats: bin, csv, docx, ods, pdf, text/x-matlab, tsv, txt, xlsx, xml, zip
    Dataset updated
    Jun 4, 2025
    Dataset provided by
    DANS Data Station Social Sciences and Humanities
    Authors
    W. Kraaij; S. Koldijk; M. Sappelli
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This is the multimodal SWELL knowledge work (SWELL-KW) dataset for research on stress and user modeling. The dataset was collected in an experiment in which 25 people performed typical knowledge work (writing reports, making presentations, reading e-mail, searching for information). We manipulated their working conditions with two stressors: email interruptions and time pressure. A varied set of data was recorded: computer logging, facial expressions from camera recordings, body postures from a Kinect 3D sensor, and heart rate (variability) and skin conductance from body sensors. The dataset contains not only raw data but also preprocessed data and extracted features. The participants' subjective experience of task load, mental effort, emotion and perceived stress was assessed with validated questionnaires as a ground truth. The resulting dataset on working behavior and affect is suitable for several research fields, such as work psychology, user modeling and context-aware systems.

    The collection of this dataset was supported by the Dutch national program COMMIT (project P7 SWELL). SWELL is an acronym of Smart Reasoning Systems for Well-being at Work and at Home.

    Notes on the content of the dataset:

    • The uLog XML files refer to documents in the dataset. Most extensions of these files have changed due to file conversions. The original extension is now included at the end of the file names.
    • Due to copyright, not all original documents and images are included in the dataset.
    • Variable C in 'D - Physiology features (HR_HRV_SCL - final).csv' refers to the type of block: 1, 2 or 3.

  13. Profiling Fake News Spreaders on Twitter

    • zenodo.org
    Updated Sep 20, 2020
    Cite
    FRANCISCO RANGEL; PAOLO ROSSO; BILAL GHANEM; ANASTASIA GIACHANOU (2020). Profiling Fake News Spreaders on Twitter [Dataset]. http://doi.org/10.5281/zenodo.3692319
    Dataset updated
    Sep 20, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    FRANCISCO RANGEL; PAOLO ROSSO; BILAL GHANEM; ANASTASIA GIACHANOU
    Description

    Task

    Fake news has become one of the main threats to our society. Although fake news is not a new phenomenon, the exponential growth of social media has offered an easy platform for its fast propagation. A great amount of fake news and rumors is propagated in online social networks, usually with the aim of deceiving users and shaping specific opinions. Users play a critical role in the creation and propagation of fake news online by consuming and sharing articles with inaccurate information, either intentionally or unintentionally. To this end, in this task we aim to identify possible fake news spreaders on social media as a first step towards preventing fake news from being propagated among online users.

    After having addressed several aspects of author profiling in social media from 2013 to 2019 (bot detection, age and gender, also together with personality, gender and language variety, and gender from a multimodality perspective), this year we aim to investigate whether it is possible to discriminate authors who have shared some fake news in the past from those who, to the best of our knowledge, have never done so.

    As in previous years, we propose the task from a multilingual perspective:

    • English
    • Spanish

    NOTE: Although we recommend participating in both languages (English and Spanish), it is possible to address the problem for just one language.

    Data

    Input

    The uncompressed dataset consists of one folder per language (en, es). Each folder contains:

    • An XML file per author (Twitter user) with 100 tweets. The name of the XML file corresponds to the unique author id.
    • A truth.txt file with the list of authors and the ground truth.

    The format of the XML files is:

      
       

    The format of the truth.txt file is as follows. The first column corresponds to the author id. The second column contains the truth label.

      b2d5748083d6fdffec6c2d68d4d4442d:::0
      2bed15d46872169dc7deaf8d2b43a56:::0
      8234ac5cca1aed3f9029277b2cb851b:::1
      5ccd228e21485568016b4ee82deb0d28:::0
      60d068f9cafb656431e62a6542de2dc0:::1
      ...
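
A minimal Python sketch of reading this format, assuming one `author_id:::label` record per line as in the sample above. The function name `parse_truth` is hypothetical, and interpreting the label as an integer class is an assumption based on the 0/1 values shown.

```python
from typing import Dict, Iterable


def parse_truth(lines: Iterable[str]) -> Dict[str, int]:
    """Map author id -> label from lines shaped like 'author_id:::label'."""
    labels: Dict[str, int] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        # Split on the last ':::' so the id is everything before it.
        author_id, sep, label = line.rpartition(":::")
        if not sep:
            raise ValueError(f"missing ':::' separator: {line!r}")
        labels[author_id] = int(label)
    return labels


# Two records copied from the sample above.
sample = [
    "b2d5748083d6fdffec6c2d68d4d4442d:::0",
    "60d068f9cafb656431e62a6542de2dc0:::1",
]
truth = parse_truth(sample)
```

For a real file, the same function can be fed an open file handle, for example `parse_truth(open(path, encoding="utf-8"))`.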
      

    Output

    Your software must take as input the absolute path to an unpacked dataset, and has to output for each document of the dataset a corresponding XML file that looks like this:

      

    The naming of the output files is up to you. However, we recommend using the author-id as filename and "xml" as extension.

    IMPORTANT! Languages should not be mixed. Create one folder per language and place in it only the prediction files for that language.

    Evaluation

    The performance of your system will be ranked by accuracy. For each language, we will calculate individual accuracies in discriminating between the two classes. Finally, we will average the accuracy values per language to obtain the final ranking.
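
The ranking rule above can be sketched in Python as follows. This is an illustration only, not the organisers' evaluation script: the function names are hypothetical, the toy labels are invented, and counting a missing prediction as wrong is an assumption.

```python
from typing import Dict


def accuracy(truth: Dict[str, int], predicted: Dict[str, int]) -> float:
    """Fraction of authors in `truth` whose predicted label matches."""
    if not truth:
        raise ValueError("truth must not be empty")
    # Assumption: an author without a prediction counts as an error.
    correct = sum(
        1 for author, label in truth.items() if predicted.get(author) == label
    )
    return correct / len(truth)


def final_score(per_language: Dict[str, float]) -> float:
    """Unweighted mean of the per-language accuracies."""
    return sum(per_language.values()) / len(per_language)


# Toy labels (invented for illustration).
truth_en = {"a": 0, "b": 1, "c": 1, "d": 0}
pred_en = {"a": 0, "b": 1, "c": 0, "d": 0}
truth_es = {"e": 1, "f": 0}
pred_es = {"e": 1, "f": 0}

scores = {
    "en": accuracy(truth_en, pred_en),  # 3 of 4 correct
    "es": accuracy(truth_es, pred_es),  # 2 of 2 correct
}
ranking_value = final_score(scores)
```

Because the mean is unweighted, each language contributes equally regardless of how many authors it contains.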

    Submission

    Once you have finished tuning your approach on the validation set, your software will be tested on the test set. During the competition, the test set will not be released publicly. Instead, we ask you to submit your software for evaluation at our site as described below.

    We ask you to prepare your software so that it can be executed via command line calls. The command shall take as input (i) an absolute path to the directory of the test corpus and (ii) an absolute path to an empty output directory:

    mySoftware -i INPUT-DIRECTORY -o OUTPUT-DIRECTORY

    Within OUTPUT-DIRECTORY, we require two subfolders, en and es, one per language. As the provided output directory is guaranteed to be empty, your software needs to create those subfolders. Within each of these subfolders, you need to create one XML file per author. The XML file looks like this:

      

    The naming of the output files is up to you. However, we recommend using the author-id as filename and "xml" as extension.
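
A minimal Python sketch of the command-line contract described above: it accepts `-i` and `-o`, creates the `en` and `es` subfolders, and derives one output path per author id. The helper names are hypothetical, the input path in the demo is a placeholder, and the XML content itself is deliberately not written because its format is not reproduced here.

```python
import argparse
import tempfile
from pathlib import Path
from typing import Dict, List, Optional

LANGUAGES = ("en", "es")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the -i INPUT-DIRECTORY and -o OUTPUT-DIRECTORY options."""
    parser = argparse.ArgumentParser(prog="mySoftware")
    parser.add_argument("-i", dest="input_dir", type=Path, required=True)
    parser.add_argument("-o", dest="output_dir", type=Path, required=True)
    return parser.parse_args(argv)


def prepare_output(output_dir: Path) -> Dict[str, Path]:
    """Create one subfolder per language and return their paths."""
    folders: Dict[str, Path] = {}
    for lang in LANGUAGES:
        folder = output_dir / lang
        folder.mkdir(parents=True, exist_ok=True)
        folders[lang] = folder
    return folders


def output_path(folders: Dict[str, Path], lang: str, author_id: str) -> Path:
    """Recommended naming: author id as filename, 'xml' as extension."""
    return folders[lang] / f"{author_id}.xml"


# Demo against a temporary, empty output directory.
with tempfile.TemporaryDirectory() as tmp:
    args = parse_args(["-i", "/path/to/test-corpus", "-o", tmp])
    folders = prepare_output(args.output_dir)
    created = sorted(p.name for p in args.output_dir.iterdir())
    example_name = output_path(
        folders, "en", "b2d5748083d6fdffec6c2d68d4d4442d"
    ).name
```

Keeping the per-language folders in a dictionary makes it harder to write a prediction for one language into the other language's folder.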

    Note: By submitting your software you retain full copyrights. You agree to grant us usage rights only for the purpose of the PAN competition. We agree not to share your software with a third party or use it for other purposes than the PAN competition.

    Related Work

  14. CoAID Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Jun 30, 2022
    Cite
    Limeng Cui; Dongwon Lee (2022). CoAID Dataset [Dataset]. https://paperswithcode.com/dataset/coaid
    Dataset updated
    Jun 30, 2022
    Authors
    Limeng Cui; Dongwon Lee
    Description

    CoAID includes diverse COVID-19 healthcare misinformation, including fake news on websites and social platforms, along with users' social engagement with such news. It comprises 4,251 news items, 296,000 related user engagements, 926 social platform posts about COVID-19, and ground-truth labels.

  15. Communities Graphs

    • kaggle.com
    Updated Nov 15, 2021
    Cite
    Subhajit Sahu (2021). Communities Graphs [Dataset]. https://www.kaggle.com/datasets/wolfram77/graphs-communities/code
    Dataset updated
    Nov 15, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    com-LiveJournal: LiveJournal social network and ground-truth communities

    LiveJournal is a free on-line blogging community where users declare friendship with each other. LiveJournal also allows users to form groups which other members can then join. We consider such user-defined groups as ground-truth communities. We provide the LiveJournal friendship social network and ground-truth communities.

    We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, which are described in our paper. As for the network, we provide the largest connected component.
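
The community-extraction rule above (one community per connected component of a group, dropping components with fewer than 3 nodes) can be sketched in Python as follows. This is an illustration under assumptions, not the original authors' code: the function name `split_group` and the toy edge list are hypothetical, and the graph is treated as undirected.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def split_group(
    edges: Iterable[Tuple[Hashable, Hashable]],
    group: Iterable[Hashable],
    min_size: int = 3,
) -> List[Set[Hashable]]:
    """Split one user-defined group into connected components and keep
    only the components with at least `min_size` nodes."""
    members = set(group)
    # Adjacency restricted to edges whose endpoints are both in the group.
    adjacency: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in edges:
        if u in members and v in members:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen: Set[Hashable] = set()
    communities: List[Set[Hashable]] = []
    for start in sorted(members):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        if len(component) >= min_size:
            communities.append(component)
    return communities


# Toy data: node 6 is outside the group, so edge (3, 6) is ignored.
toy_edges = [(1, 2), (2, 3), (4, 5), (3, 6)]
toy_group = [1, 2, 3, 4, 5]
communities = split_group(toy_edges, toy_group)
```

In the toy example the group splits into the components {1, 2, 3} and {4, 5}, and only the first survives the size filter.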

    com-Friendster: Friendster social network and ground-truth communities

    Friendster is an on-line gaming network. Before re-launching as a game website, Friendster was a social networking site where users could form friendship edges with each other. The Friendster social network also allowed users to form groups which other members could then join. We consider such user-defined groups as ground-truth communities. For the social network, we take the induced subgraph of the nodes that either belong to at least one community or are connected to other nodes that belong to at least one community. This data is provided by The Web Archive Project, where the full graph is available.

    We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, which are described in our paper. As for the network, we provide the largest connected component.

    com-Orkut: Orkut social network and ground-truth communities

    Orkut is a free on-line social network where users form friendships with each other. Orkut also allows users to form groups which other members can then join. We consider such user-defined groups as ground-truth communities. We provide the Orkut friendship social network and ground-truth communities. This data is provided by Alan Mislove et al.

    We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, which are described in our paper. As for the network, we provide the largest connected component.

    com-Youtube: Youtube social network and ground-truth communities

    Youtube is a video-sharing web site that includes a social network. In the Youtube social network, users form friendships with each other and can create groups which other users can join. We consider such user-defined groups as ground-truth communities. This data is provided by Alan Mislove et al.

    We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, which are described in our paper. As for the network, we provide the largest connected component.

    com-DBLP: DBLP collaboration network and ground-truth communities

    The DBLP computer science bibliography provides a comprehensive list of research papers in computer science. We construct a co-authorship network where two authors are connected if they have published at least one paper together. Each publication venue, e.g., a journal or conference, defines an individual ground-truth community; authors who published in a certain journal or conference form a community.

    We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, which are described in our paper. As for the network, we provide the largest connected component.
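
As a rough illustration of the com-DBLP construction described above, the following Python sketch derives co-authorship edges and venue-based communities from a list of papers. The function name, the `(authors, venue)` input shape, and the toy records are all hypothetical; the actual DBLP processing pipeline is not described in this listing.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def build_coauthorship(
    papers: Iterable[Tuple[List[str], str]],
) -> Tuple[Set[Tuple[str, str]], Dict[str, Set[str]]]:
    """Return (undirected co-author edges, authors grouped by venue)."""
    edges: Set[Tuple[str, str]] = set()
    venues: Dict[str, Set[str]] = defaultdict(set)
    for authors, venue in papers:
        unique = sorted(set(authors))
        # Every pair of co-authors on a paper is connected; sorting makes
        # (a, b) and (b, a) collapse into a single undirected edge.
        edges.update(combinations(unique, 2))
        # Authors who published at a venue belong to that venue's community.
        venues[venue].update(unique)
    return edges, dict(venues)


# Toy records (invented for illustration).
toy_papers = [
    (["ann", "bob"], "KDD"),
    (["bob", "cy", "ann"], "WWW"),
    (["dee"], "KDD"),
]
coauthor_edges, venue_communities = build_coauthorship(toy_papers)
```

A single-author paper adds its author to the venue community but contributes no edge, which is why such authors can end up in components that the size filter later removes.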

    com-Amazon: Amazon product co-purchasing network and ground-truth communities

    The network was collected by crawling the Amazon website. It is based on the "Customers Who Bought This Item Also Bought" feature of the Amazon website. If a product i is frequently co-purchased with product j, the graph contains an undirected edge between i and j. Each product category provided by Amazon defines a ground-truth community.

    We regard each connected component in a product category as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, which are described in our paper. As for the network, we provide the largest connected component.

    email-Eu-core: email-Eu-core network

    The network was generated using email data from a large European research institution. We have anonymized information about all incoming and outgoing email between members of the research institution. Th...

  16. Residential School Locations Dataset (CSV Format)

    • borealisdata.ca
    • search.dataone.org
    Updated Jun 5, 2019
    Cite
    Rosa Orlandini (2019). Residential School Locations Dataset (CSV Format) [Dataset]. http://doi.org/10.5683/SP2/RIYEMU
    Dataset updated
    Jun 5, 2019
    Dataset provided by
    Borealis
    Authors
    Rosa Orlandini
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1863 - Jun 30, 1998
    Area covered
    Canada
    Description

    The Residential School Locations Dataset [IRS_Locations.csv] contains the locations (latitude and longitude) of Residential Schools and student hostels operated by the federal government in Canada. All the residential schools and hostels that are listed in the Indian Residential School Settlement Agreement are included in this dataset, as well as several Industrial schools and residential schools that were not part of the IRSSA. This version of the dataset doesn’t include the five schools under the Newfoundland and Labrador Residential Schools Settlement Agreement. The original school location data was created by the Truth and Reconciliation Commission, and was provided to the researcher (Rosa Orlandini) by the National Centre for Truth and Reconciliation in April 2017. The dataset was created by Rosa Orlandini, and builds upon and enhances the previous work of the Truth and Reconciliation Commission, Morgan Hite (creator of the Atlas of Indian Residential Schools in Canada that was produced for the Tk'emlups First Nation and Justice for Day Scholar's Initiative), and Stephanie Pyne (project lead for the Residential Schools Interactive Map). Each individual school location in this dataset is attributed either to RSIM, Morgan Hite, NCTR or Rosa Orlandini.

    Many schools/hostels had several locations throughout the history of the institution. If the school/hostel moved from its original location to another property, then the school is considered to have two unique locations in this dataset: the original location and the new location. For example, Lejac Indian Residential School had two locations while it was operating, Stuart Lake and Fraser Lake. If a new school building was constructed on the same property as the original school building, it isn't considered to be a new location, as is the case of Girouard Indian Residential School.

    When the precise location is known, the coordinates of the main building are provided, and when the precise location of the building isn’t known, an approximate location is provided. For each residential school institution location, the following information is provided: official names, alternative name, dates of operation, religious affiliation, latitude and longitude coordinates, community location, Indigenous community name, contributor (of the location coordinates), school/institution photo (when available), location point precision, type of school (hostel or residential school) and list of references used to determine the location of the main buildings or sites.

  17. Twitter follower-followee graph, labeled with benign/Sybil

    • figshare.com
    txt
    Updated May 31, 2023
    Cite
    Haoyu Lu (2023). Twitter follower-followee graph, labeled with benign/Sybil [Dataset]. http://doi.org/10.6084/m9.figshare.20057300.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Haoyu Lu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A Twitter follower-followee graph with 269,640 nodes and 6,818,501 edges from [Kwak], with ground-truth labels obtained from [SybilSCAR]. Of the nodes, 178,377 are benign and 91,263 are Sybil. We set aside 9,000 Sybil and 17,000 benign users (about 10% of all nodes) as the training set and test on the overall social graph.

    [Kwak] H. Kwak, C. Lee, H. Park, and S. Moon, “What is twitter, a social network or a news media?” in WWW, 2010.
    [SybilSCAR] B. Wang, L. Zhang, and N. Z. Gong, “SybilSCAR: Sybil detection in online social networks via local rule based propagation,” in IEEE INFOCOM, 2017.
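
A small Python sketch of drawing a labelled training set with fixed per-class counts, as in the split described above. How the 9,000 Sybil and 17,000 benign users were actually chosen is not stated, so uniform random sampling is an assumption here; the function name, the label strings, and the toy node labels are hypothetical.

```python
import random
from typing import Dict, Set


def sample_training_set(
    labels: Dict[int, str], n_sybil: int, n_benign: int, seed: int = 0
) -> Set[int]:
    """Pick a fixed number of nodes per class (assumed: uniformly at random)."""
    rng = random.Random(seed)
    sybil = sorted(node for node, label in labels.items() if label == "sybil")
    benign = sorted(node for node, label in labels.items() if label == "benign")
    return set(rng.sample(sybil, n_sybil)) | set(rng.sample(benign, n_benign))


# Toy labels: 30 Sybil and 60 benign nodes (not the real data).
toy_labels = {n: ("sybil" if n < 30 else "benign") for n in range(90)}
train = sample_training_set(toy_labels, n_sybil=3, n_benign=6)

# Share of all nodes used for training with the counts quoted above.
train_fraction = (9_000 + 17_000) / 269_640
```

With the quoted counts, `train_fraction` comes out slightly below 0.1, consistent with the "about 10%" in the description.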

  18. Data from: Truths and Tales: Understanding Online Fake News Networks in...

    • researchdata.canberra.edu.au
    Updated Nov 24, 2023
    Cite
    Benedict Sheehy (2023). Truths and Tales: Understanding Online Fake News Networks in South Korea [Dataset]. http://doi.org/10.17632/3xb4n9n6t4.1
    Dataset updated
    Nov 24, 2023
    Authors
    Benedict Sheehy
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Korea
    Description

    This study investigates the features of fake news networks and how they spread during the 2020 South Korean election. Using Actor-Network Theory (ANT), we assessed the network's central players and how they are connected. Results reveal the characteristics of the videoclips and channel networks responsible for the propagation of fake news. Analysis of the videoclip network reveals a high number of detected fake news videos and a high density of connections among users. Assessment of news videoclips on both actual and fake news networks reveals that the real news network is more concentrated. However, the scale of the network may play a role in these variations. Statistics for network centralization reveal that users are spread out over the network, pointing to its decentralized character. A closer look at the real and fake news networks inside videos and channels reveals similar trends. We find that the density of the real news videoclip network is higher than that of the fake news network, whereas the fake news channel networks are denser than their real news counterparts, which may indicate greater activity and interconnectedness in their transmission. We also found that fake news videoclips had more likes than real news videoclips, whereas real news videoclips had more dislikes than fake news videoclips. These findings strongly suggest that fake news videoclips are more accepted when people watch them on YouTube. In addition, we used semantic networks and automated content analysis to uncover common language patterns in fake news which helps us better understand the structure and dynamics of the networks involved in the dissemination of fake news. The findings reported here provide important insights on how fake news spread via social networks during the South Korean election of 2020. The results of this study have important implications for the campaign against fake news and ensuring factual coverage.

  19. Data from: On the Influence of Twitter Trolls during the 2016 US...

    • explore.openaire.eu
    Updated Oct 1, 2019
    + more versions
    Nikos Salamanos; Michael J. Jensen; Xinlei He; Yang Chen; Michael Sirivianos (2019). On the Influence of Twitter Trolls during the 2016 US Presidential Election [Dataset]. http://doi.org/10.5281/zenodo.3540801
    Explore at:
    Dataset updated
    Oct 1, 2019
    Authors
    Nikos Salamanos; Michael J. Jensen; Xinlei He; Yang Chen; Michael Sirivianos
    Area covered
    United States
    Description

    It is widely accepted that state-sponsored Twitter accounts operated during the 2016 US presidential election, spreading millions of tweets with misinformation and inflammatory political content. Whether these social media campaigns of the so-called "troll" accounts were able to manipulate public opinion is still in question. Here we aim to quantify the influence of troll accounts and the impact they had on Twitter by analyzing 152.5 million tweets from 9.9 million users, including 822 troll accounts. The data, collected during the US election campaign, contain original troll tweets before they were deleted by Twitter. From these data, we constructed a very large interaction graph: a directed graph of 9.3 million nodes and 169.9 million edges. Recently, Twitter released datasets on the misinformation campaigns of 8,275 state-sponsored accounts linked to Russia, Iran and Venezuela as part of the investigation into foreign interference in the 2016 US election. These data serve as ground-truth identifiers of troll users in our dataset. Using graph analysis techniques, we qualify the diffusion cascades of web and media context that have been shared by the troll accounts. We present strong evidence that authentic users were the source of the viral cascades. Although the trolls were participating in the viral cascades, they did not have a leading role in them, and only four troll accounts were truly influential. With this version, we are correcting an error in the Acknowledgments regarding the research funding that supports this work. The correct funding source is the European Union's Horizon 2020 Research and Innovation program under the Cybersecurity CONCORDIA project (Grant Agreement No. 830927).

  20. Machine Translation Evaluation Dataset for Amharic

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 31, 2020
    Anonymous (2020). Machine Translation Evaluation Dataset for Amharic [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_3669948
    Explore at:
    Dataset updated
    Mar 31, 2020
    Dataset authored and provided by
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Machine Translation Evaluation Dataset for Amharic

    The dataset contains sentences in Amharic and their corresponding English translations, collected through crowdsourcing. These ground-truth sentences come from different domains such as news headlines, social media, Wikipedia and everyday conversation.

    Metadata of files in the dataset

    amen.tsv
    - Domain: news | wiki | twitter | convo
    - Source Sentence: Amharic sentence
    - Reference Translation: English translation
    - Google Translate: output of Google Translate
    - Yandex Translate: output of Yandex Translate

    enam.tsv
    - Domain: news | wiki | twitter | convo
    - Source Sentence: English sentence
    - Reference Translation: Amharic translation
    - Google Translate: output of Google Translate
    - Yandex Translate: output of Yandex Translate
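
As a rough illustration of how the two files described above might be read, the sketch below parses tab-separated text with Python's csv module and groups rows by domain. It assumes each file begins with a header row that uses exactly the column names listed above; the listing does not confirm that layout, so treat it as an assumption. The sample rows are placeholders, not data from the dataset.

```python
import csv
import io
from collections import defaultdict

# Assumed column names, copied from the metadata listing; the real header may differ.
COLUMNS = [
    "Domain",
    "Source Sentence",
    "Reference Translation",
    "Google Translate",
    "Yandex Translate",
]


def rows_by_domain(handle):
    """Group the rows of a tab-separated file by their Domain value."""
    # QUOTE_NONE keeps quote characters inside sentences as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    grouped = defaultdict(list)
    for row in reader:
        grouped[row["Domain"]].append(row)
    return dict(grouped)


# Placeholder rows for illustration only; not taken from the dataset.
sample = (
    "\t".join(COLUMNS) + "\n"
    "news\tsrc-1\tref-1\tg-1\ty-1\n"
    "wiki\tsrc-2\tref-2\tg-2\ty-2\n"
    "news\tsrc-3\tref-3\tg-3\ty-3\n"
)

grouped = rows_by_domain(io.StringIO(sample))
counts = {domain: len(rows) for domain, rows in grouped.items()}
print(counts)
```

For a downloaded file, the same function could be called on `open("amen.tsv", encoding="utf-8", newline="")` in place of the in-memory sample.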

    Reference translations across domains

    News - These are news headlines from Ethiopian news websites.

    Wikipedia - A random sample of sentences from the Amharic Wikipedia.

    Twitter - Amharic Twitter posts on consumer products.

    Conversational - Everyday conversational expressions from Amharic native speakers.

    Evaluation of two systems that provide Amharic translation

    The dataset also contains an evaluation of two commercial systems: Google Translate (https://translate.google.com/) and Yandex Translate (https://translate.yandex.com/). Both systems provide free APIs for which users can sign up and obtain access keys. The translations were generated on 14th February 2020.

Patrick Gerard; Nicholas Botzer; Tim Weninger (2023). Truth Social Dataset [Dataset]. http://doi.org/10.5281/zenodo.7531625
Truth Social Dataset

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip
Dataset updated
Jan 13, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Patrick Gerard; Nicholas Botzer; Tim Weninger
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

A Truth Social dataset containing a network of users, their associated posts, and additional information about each post. Collected from February 2022 through September 2022, this dataset contains 454,458 user entries and 845,060 Truth (Truth Social’s term for post) entries.

The dataset comprises 12 files; the entry count for each file is shown below.

File                            Data Points
users.tsv                       454,458
follows.tsv                     4,002,115
truths.tsv                      823,927
quotes.tsv                      10,508
replies.tsv                     506,276
media.tsv                       184,884
hashtags.tsv                    21,599
external_urls.tsv               173,947
truth_hashtag_edges.tsv         213,295
truth_media_edges.tsv           257,500
truth_external_url_edges.tsv    252,877
truth_user_tag_edges.tsv        145,234

A readme file is provided that describes the structure of the files, necessary terms, and necessary information about the data collection.
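
One way to sanity-check a download against the entry counts listed above is to count the data lines in each file. The sketch below is a minimal illustration, not part of the dataset's tooling. It assumes one entry per line and, by default, a single header line; the readme may describe a different layout, and fields with embedded newlines would make a plain line count overstate the number of entries. The helper names `count_entries` and `check_count` are hypothetical.

```python
import io

# Entry counts copied from the file table above.
EXPECTED_COUNTS = {
    "users.tsv": 454_458,
    "follows.tsv": 4_002_115,
    "truths.tsv": 823_927,
    "quotes.tsv": 10_508,
    "replies.tsv": 506_276,
    "media.tsv": 184_884,
    "hashtags.tsv": 21_599,
    "external_urls.tsv": 173_947,
    "truth_hashtag_edges.tsv": 213_295,
    "truth_media_edges.tsv": 257_500,
    "truth_external_url_edges.tsv": 252_877,
    "truth_user_tag_edges.tsv": 145_234,
}


def count_entries(handle, has_header=True):
    """Count non-empty lines, optionally skipping one header line (assumed layout)."""
    total = sum(1 for line in handle if line.strip())
    if has_header and total > 0:
        total -= 1
    return total


def check_count(name, handle, has_header=True):
    """Return (actual, expected, matches) for one of the listed files."""
    actual = count_entries(handle, has_header)
    expected = EXPECTED_COUNTS[name]
    return actual, expected, actual == expected


# Tiny in-memory stand-in with placeholder rows; a real check would pass
# open("quotes.tsv", encoding="utf-8") instead.
demo = io.StringIO("id\tvalue\n1\ta\n2\tb\n")
result = check_count("quotes.tsv", demo)
print(result)
```

A mismatch does not necessarily mean a corrupt download; it may only mean the header or line-layout assumption is wrong for that file.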
