Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We demonstrate that Search Engine Optimization (SEO) attributes provide strong signals for predicting news site reliability. We introduce a novel attributed webgraph dataset with labeled news domains and their connections to outlinking and backlinking domains. Finally, we introduce and evaluate a novel graph-based algorithm for discovering previously unknown misinformation news sources.
This dataset is provided courtesy of Ahrefs.com. The associated paper is upcoming at ICWSM 2024.
GeNeG is a knowledge graph constructed from news articles on the topic of refugees and migration, collected from German online media outlets. GeNeG contains rich textual and metadata information, as well as named entities extracted from the articles' content and metadata and linked to Wikidata. The graph is expanded with up to three-hop neighbors from Wikidata of the initial set of linked entities.
GeNeG comes in three flavors:
Information about uploaded files:
(All files are bzip2-compressed and in the N-Triples format.)
File | Description |
---|---|
geneg_type-metadata.nt.bz2 | Metadata about the dataset, described using the VoID vocabulary. |
geneg_type-instances_types.nt.bz2 | Class definitions of articles and events. |
geneg_type-instances_labels.nt.bz2 | Labels of instances. |
geneg_type-instances_metadata_literals.nt.bz2 | Relations between news article resources and metadata literals (e.g. URL, publishing date, modification date, polarity score, stance). |
geneg_type-instances_metadata_resources.nt.bz2 | Relations between news article resources and metadata entities (i.e. publishers, authors, keywords). |
geneg_type-instances_content_relations.nt.bz2 | Relations between news article resources and content components (e.g. titles, abstracts, article bodies). |
geneg_type-instances_event_mapping.nt.bz2 | Mapping of news article resources to events. |
geneg_type-event_relations.nt.bz2 | Relations between news events and entities mentioned (i.e. actors, places, mentions). |
geneg_type-wiki_relations.nt.bz2 | Relations between news event Wikidata entities and their k-hop entity neighbors from Wikidata. |
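Since every file is a bzip2-compressed N-Triples dump, a file can be streamed line by line without decompressing it to disk first. The following is a minimal sketch using only the Python standard library. The function names `iter_triples` and `count_predicates` are hypothetical, and the line splitting is deliberately naive (it is not a full N-Triples parser), so a dedicated RDF library may be preferable for production use.

```python
import bz2
from collections import Counter


def iter_triples(path):
    """Yield (subject, predicate, object) strings from a .nt.bz2 file.

    Naive approach: N-Triples puts one triple per line, and subjects and
    predicates contain no whitespace, so two splits isolate the object.
    """
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            subject, predicate, rest = line.split(None, 2)
            if rest.endswith("."):
                rest = rest[:-1].rstrip()  # drop the terminating " ."
            yield subject, predicate, rest


def count_predicates(path):
    """Count how often each predicate occurs in the file."""
    return Counter(predicate for _, predicate, _ in iter_triples(path))
```

Calling `count_predicates` on one of the files listed above gives a quick overview of which relations it contains.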
Changelog
v1.0.1
*** Fake News on Twitter ***
These 5 datasets are the results of an empirical study on the spreading process of newly emerged fake news on Twitter. In particular, we focused on fake news stories that gave rise to a simultaneous spread of truth countering them. The story of each fake news item is as follows:
1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.
2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."
3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.
4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.
5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.
The data collection was done in two stages, each of which produced a new dataset: 1- building the Dataset of Diffusion (DD), which includes information on fake news/truth tweets and retweets; 2- querying the neighbors of the tweet spreaders, which yields the Dataset of Graph (DG).
DD
DD for each fake news story is an Excel file named FNx_DD, where x is the number of the fake news story, and has the following structure:
Each row belongs to one captured tweet/retweet related to the rumor, and each column presents a specific piece of information about the tweet/retweet. From left to right, the columns present the following information:
User ID (user who has posted the current tweet/retweet)
The description sentence in the profile of the user who has published the tweet/retweet
The number of tweets/retweets published by the user at the time of posting the current tweet/retweet
Date and time of creation of the account by which the current tweet/retweet has been posted
Language of the tweet/retweet
Number of followers
Number of followings (friends)
Date and time of posting the current tweet/retweet
Number of likes (favorites) the current tweet had acquired before it was crawled
Number of times the current tweet had been retweeted before crawling it
Whether there is another tweet inside the current tweet/retweet (for example, this happens when the current tweet is a quote, reply, or retweet)
The source (OS) of the device from which the current tweet/retweet was posted
Tweet/Retweet ID
Retweet ID (if the post is a retweet then this feature gives the ID of the tweet that is retweeted by the current post)
Quote ID (if the post is a quote then this feature gives the ID of the tweet that is quoted by the current post)
Reply ID (if the post is a reply then this feature gives the ID of the tweet that is replied to by the current post)
Frequency of tweet occurrences, i.e. the number of times the current tweet is repeated in the dataset (for example, the number of times a tweet exists in the dataset in the form of retweets posted by others)
State of the tweet, which can take one of the following values (determined by agreement between the annotators):
r : The tweet/retweet is a fake news post
a : The tweet/retweet is a truth post
q : The tweet/retweet is a question about the fake news that neither confirms nor denies it
n : The tweet/retweet is not related to the fake news (it contains the queries related to the rumor but does not refer to the given fake news)
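As an illustration of how the state codes can be used, the sketch below tallies tweets by state. It assumes the rows have already been read from an FNx_DD Excel file with a reader of your choice and that the state code is the last value in each row, following the column order listed above. The names `STATE_MEANINGS` and `tally_states` are hypothetical.

```python
from collections import Counter

# Meaning of each state code, as described in the column list.
STATE_MEANINGS = {
    "r": "fake news post",
    "a": "truth post",
    "q": "question about the fake news",
    "n": "not related to the fake news",
}


def tally_states(rows):
    """Count rows per state, assuming the state code is the last column."""
    counts = Counter()
    for row in rows:
        code = str(row[-1]).strip().lower()
        counts[STATE_MEANINGS.get(code, "unknown state")] += 1
    return counts
```

Rows whose last value is not one of the four documented codes are grouped under "unknown state" rather than silently dropped.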
DG
DG for each fake news contains two files:
A file in graph format (.graph) that contains the graph information, such as who is linked to whom. (This file is named FNx_DG.graph, where x is the number of the fake news story.)
A file in JSONL format (.jsonl) that contains the real user IDs of the nodes in the graph file. (This file is named FNx_Labels.jsonl, where x is the number of the fake news story.)
In the graph file, the label of each node is the order in which it entered the graph. For example, if the node with user ID 12345637 is the first node entered into the graph file, then its label in the graph is 0 and its real ID (12345637) is at row number 1 of the jsonl file (row number 0 holds the column labels); the IDs of the other nodes follow in the next rows (each row corresponds to one user ID). In general, the node labeled n is found at row n + 1 when rows are counted from 0. Therefore, the user ID of node 200 (labeled 200 in the graph) is at row number 201 counting from 0, which is the 202nd line of the jsonl file.
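The lookup from a node label to its row in the labels file can be sketched as follows. This is a minimal example under the assumption stated above that the first line of the jsonl file holds the column labels and each following line holds one user ID. The function name `user_record_for_node` is hypothetical, and the raw line is returned unparsed because the exact record layout is not specified here.

```python
from itertools import islice


def user_record_for_node(lines, node_label):
    """Return the raw labels-file line for the node with the given label.

    `lines` is any iterable of lines (an open file or a list). Row 0 is
    assumed to be the header, so node n sits at 0-based row n + 1.
    """
    if node_label < 0:
        raise ValueError("node labels start at 0")
    record = next(islice(lines, node_label + 1, None), None)
    if record is None:
        raise IndexError(f"no row for node label {node_label}")
    return record.strip()
```

With an open file handle for FNx_Labels.jsonl passed as `lines`, `user_record_for_node(handle, 200)` reads the line at 0-based row 201.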
The user IDs of spreaders in DG (those who have a post in DD) are also available in DD, where extra information about them and their tweets/retweets can be found. The other user IDs in DG are neighbors of these spreaders and might not exist in DD.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In this project, we aimed to map the design space of visualisations embedded in right-to-left (RTL) scripts. We aimed to expand our knowledge of visualisation design beyond the dominance of research based on left-to-right (LTR) scripts. Through this project, we identify common design practices regarding the chart structure, the text, and the source. We also identify ambiguity, particularly regarding the axis position and direction, suggesting that the community may benefit from unified standards similar to those found in web design for RTL scripts. To achieve this goal, we curated a dataset that covers 128 visualisations found in Arabic news media and coded these visualisations based on the chart composition (e.g., chart type, x-axis direction, y-axis position, legend position, interaction, embellishment type), text (e.g., availability of text, availability of caption, annotation type), and source (source position, attribution to designer, ownership of the visualisation design). Links are also provided to the articles and the visualisations. This dataset is limited to stand-alone visualisations, whether they were single-panelled or included small multiples. We also did not consider infographics in this project, nor any visualisation that did not have an identifiable chart type (e.g., bar chart, line chart). The attached documents also include some graphs from our analysis of the dataset provided, where we illustrate common design patterns and their popularity within our sample.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The chart shows that the oldest Americans, especially those over 65, were more likely to share fake news with their Facebook friends. This is true even when holding other characteristics (including education, ideology, and partisanship) constant. The coefficient on “Age over 65” implies that being in the oldest age group was associated with sharing nearly seven times as many articles from fake news domains on Facebook as those in the youngest age group, or about 2.3 times as many as those in the next-oldest age group, holding the effect of ideology, education, and the total number of web links shared constant.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains news articles and information about organizations and persons mentioned in the articles. The dataset has the form of a graph. It has been produced by the SmartDataLake project (https://smartdatalake.eu), using data collected from the GDELT project (https://www.gdeltproject.org).
During a 2025 survey, ** percent of respondents from Nigeria stated that they used social media as a source of news. In comparison, just ** percent of Japanese respondents said the same. Large portions of social media users around the world admit that they do not trust social platforms either as media sources or as a way to get news, and yet they continue to access such networks on a daily basis.
Social media: trust and consumption
Despite the majority of adults surveyed in each country reporting that they used social networks to keep up to date with news and current affairs, a 2018 study showed that social media is the least trusted news source in the world. Less than ** percent of adults in Europe considered social networks to be trustworthy in this respect, yet more than ** percent of adults in Portugal, Poland, Romania, Hungary, Bulgaria, Slovakia and Croatia said that they got their news on social media.
What is clear is that we live in an era where social media is such an enormous part of daily life that consumers will still use it in spite of their doubts or reservations. Concerns about fake news and propaganda on social media have not stopped billions of users accessing their favorite networks on a daily basis. Most Millennials in the United States use social media for news every day, and younger consumers in European countries are much more likely to use social networks for national political news than their older peers. Like it or not, reading news on social media is fast becoming the norm for younger generations, and this form of news consumption will likely increase further regardless of whether consumers fully trust their chosen network or not.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The latest release of ClaimsKG is available in Datorium.
ClaimsKG is a knowledge graph of metadata information for thousands of fact-checked claims which facilitates structured queries about their truth values, authors, dates, and other kinds of metadata. ClaimsKG is generated through a (semi-)automated pipeline, which harvests claim-related data from popular fact-checking web sites, annotates them with related entities from DBpedia, and lifts all data to RDF using an RDF/S model that makes use of established vocabularies (such as schema.org).
ClaimsKG does NOT contain the text of the reviews from the fact-checking web sites; it only contains structured metadata information and links to the reviews.
More information, such as statistics, query examples and a user friendly interface to explore the knowledge graph, is available at: https://data.gesis.org/claimskg/site
If you use ClaimsKG, please cite the following paper:
Tchechmedjiev, Andon, Pavlos Fafalios, Katarina Boland, Malo Gasquet, Matthäus Zloch, Benjamin Zapilko, Stefan Dietze, and Konstantin Todorov. "ClaimsKG: a Knowledge Graph of Fact-Checked Claims." In International Semantic Web Conference, pp. 309-324. Springer, Cham, 2019. https://doi.org/10.1007/978-3-030-30796-7_20
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The unemployment rate in the United States increased to 4.20 percent in July from 4.10 percent in June of 2025. This dataset provides the latest reported value for the United States Unemployment Rate, plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - Producer Price Index by Commodity: Pulp, Paper, and Allied Products: News and Other Low Grade Recyclable Paper was 160.00400 Index 1982=100 in July of 2025, according to the United States Federal Reserve. Historically, United States - Producer Price Index by Commodity: Pulp, Paper, and Allied Products: News and Other Low Grade Recyclable Paper reached a record high of 675.60000 in June of 1995 and a record low of 85.40000 in December of 2019. Trading Economics provides the current actual value, a historical data chart and related indicators for United States - Producer Price Index by Commodity: Pulp, Paper, and Allied Products: News and Other Low Grade Recyclable Paper, last updated from the United States Federal Reserve in September of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - Producer Price Index by Industry: Material Recyclers: News and Other Low Grade Recyclable Paper was 115.82800 Index Dec 1986=100 in April of 2025, according to the United States Federal Reserve. Historically, United States - Producer Price Index by Industry: Material Recyclers: News and Other Low Grade Recyclable Paper reached a record high of 374.90000 in June of 1995 and a record low of 40.40000 in September of 1991. Trading Economics provides the current actual value, a historical data chart and related indicators for United States - Producer Price Index by Industry: Material Recyclers: News and Other Low Grade Recyclable Paper, last updated from the United States Federal Reserve in July of 2025.
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
A survey conducted in December 2020 assessing if news consumers in the United States had ever unknowingly shared fake news or information on social media found that 38.2 percent had done so. A similar share had not, whereas seven percent were unsure if they had accidentally disseminated misinformation on social networks.
Fake news in the U.S.
Fake news, or news that contains misinformation, has become a prevalent issue within the American media landscape. Fake news can be circulated online as news stories with deliberately misleading headings, or clickbait, but the rise of misinformation cannot be solely attributed to online social media. Forms of fake news are also found in print media, with 47 percent of Americans witnessing fake news in newspapers and magazines as of January 2019.
News consumers in the United States are aware of the spread of misinformation, with many Americans believing online news websites regularly report fake news stories. With such a high volume of online news websites publishing false information, it can be difficult to assess the credibility of a story. This can have damaging effects on society in that the public struggles to stay informed, creating a great deal of confusion about even basic facts and contributing to incivility.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Quotegraph is a large social network represented as a directed graph extracted from quotations in Quotebank. Edges point from the speaker of a quotation to a person mentioned in that quotation. The names of the actors are linked to Wikidata using quotebank-toolkit. Quotegraph boasts 528k nodes and 8.6 million edges, which makes it suitable for a large-scale analysis of speaker interactions in news articles.
Below is the schema of Quotegraph, stored in quotegraph.parquet.tar.gz:
|-- quoteID: string: primary key of the quotation (format: "YYYY-MM-DD-{increasing int:06d}")
|-- speaker: string: Wikidata QID of a speaker who uttered the quotation (source node)
|-- target: string: Wikidata QID of a person mentioned in the quotation (target node)
|-- quotation: string: Textual content of the quotation
|-- date: Earliest occurrence date of any version of the quotation
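To make the edge direction concrete, the sketch below aggregates quotation rows into weighted directed edges from speaker to target. It assumes the rows have already been loaded from the parquet file (with any parquet reader) as dictionaries keyed by the column names above. The function names `build_edge_weights` and `out_degree` are hypothetical.

```python
from collections import Counter, defaultdict


def build_edge_weights(rows):
    """Count quotations per directed (speaker, target) pair of QIDs."""
    weights = Counter()
    for row in rows:
        weights[(row["speaker"], row["target"])] += 1
    return weights


def out_degree(weights):
    """Number of distinct people each speaker mentions."""
    targets = defaultdict(set)
    for speaker, target in weights:
        targets[speaker].add(target)
    return {speaker: len(found) for speaker, found in targets.items()}
```

Keeping the edge weights separate from the degree computation makes it straightforward to filter out rarely occurring pairs before analysing the network.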
The data can be further enriched with information about the quotations and the articles they appear in (available in Quotebank) and with Wikidata information about the actors, available in speaker_attributes.parquet.tar.gz. The schema of the latter is given below:
|-- id: Wikidata item QID of the speaker, primary key
|-- aliases: list of speaker's aliases
|-- date_of_birth: list of possible speaker's dates of birth
|-- nationality: list of speaker's nationalities
|-- gender: list of speaker's previous or current genders
|-- ethnic_group: list of ethnic groups the speaker belongs to
|-- US_congress_bio_ID: identifier for the speaker in the Biographical Directory of the United States Congress
|-- occupation: list of speaker's occupations
|-- party: list of parties the speaker is/was affiliated with
|-- academic_degree: list of academic degrees obtained by the speaker
|-- label: Wikidata label of the speaker
|-- religion: previous/current religious affiliations of the speaker
The graph shows the consumer perceptions of news and fake news in Canada as of May 2017. It was found that ** percent of respondents either strongly or somewhat agreed that they have falsely believed a news story to be true until discovering otherwise, and ** percent strongly or somewhat agreed that they did not know how to differentiate between real and fake news.
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Network of 42 papers and 59 citation links related to "Leading Sentence News TextRank".
According to a survey conducted on digital news in the Philippines between January and February 2025, news from GMA Network was the most trustworthy, as stated by 67 percent of respondents. In contrast, Rappler received the highest share of respondents who said they distrust the news the media outlet publishes.
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Network of 33 papers and 57 citation links related to "Smart Grid Handbook [Book News]".
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
International Paper reported $42.38B in Assets for its fiscal quarter ending in June of 2025. Data for International Paper | IP - Assets, including historical data, tables and charts, was last updated by Trading Economics in September of 2025.
This organisation chart gives an overview of the top-level structure of the office.