Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The chart shows that Americans over 65 were more likely to share fake news with their Facebook friends, regardless of their education, ideology, and partisanship. The oldest age group was likely to share nearly seven times as many articles from fake news domains on Facebook as those in the youngest age group, and about 2.3 times as many as those in the next-oldest age group. The data for the 18-29 and 30-44 age groups are not displayed in the source, so the values in this chart are approximate and were determined by pixel count.
This graph shows the share of news stories about the Trump administration that were positive from January to April 2017. According to the source, ** percent of the news stories in the early days of the Trump administration carried a positive assessment.
NeMig consists of two knowledge graphs, one German and one English, constructed from news articles on the topic of migration collected from online media outlets in Germany and the US, respectively. NeMig contains rich textual and metadata information, sub-topic and sentiment annotations, as well as named entities extracted from the articles' content and metadata and linked to Wikidata. The graphs are expanded with up to two-hop neighbors from Wikidata of the initial set of linked entities.
NeMig comes in four flavors for both the German and the English corpora:
Information about uploaded files:
(All files are bzip2-compressed and in the N-Triples format.)
File | Description |
---|---|
nemig_${language}_${graph_type}-metadata.nt.bz2 | Metadata about the dataset, described using the VoID vocabulary. |
nemig_${language}_${graph_type}-instances_types.nt.bz2 | Class definitions of news and event instances. |
nemig_${language}_${graph_type}-instances_labels.nt.bz2 | Labels of instances. |
nemig_${language}_${graph_type}-instances_related.nt.bz2 | Relations between news instances that are related to one another. |
nemig_${language}_${graph_type}-instances_metadata_literals.nt.bz2 | Relations between news instances and metadata literals (e.g. URL, publishing date, modification date, sentiment label, political orientation of news outlets). |
nemig_${language}_${graph_type}-instances_content_mapping.nt.bz2 | Mapping of news instances to content instances (e.g. title, abstract, body). |
nemig_${language}_${graph_type}-instances_topic_mapping.nt.bz2 | Mapping of news instances to sub-topic instances. |
nemig_${language}_${graph_type}-instances_content_literals.nt.bz2 | Relations between content instances and corresponding literals (e.g. text of title, abstract, body). |
nemig_${language}_${graph_type}-instances_metadata_resources.nt.bz2 | Relations between news or sub-topic instances and entities extracted from metadata (i.e. publishers, authors, keywords). |
nemig_${language}_${graph_type}-instances_event_mapping.nt.bz2 | Mapping of news instances to event instances. |
nemig_${language}_${graph_type}-event_resources.nt.bz2 | Relations between event instances and entities extracted from the text of the news (i.e. actors, places, mentions). |
nemig_${language}_${graph_type}-resources_provenance.nt.bz2 | Provenance information about the entities extracted from the text of the news (e.g. title, abstract, body). |
nemig_${language}_${graph_type}-wiki_resources.nt.bz2 | Relations between Wikidata entities from news and their k-hop entity neighbors from Wikidata. |
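Since every file listed above is a bzip2-compressed N-Triples file, it can be streamed line by line without decompressing it to disk first. The following is a minimal sketch using only the Python standard library; the function name `iter_ntriples` is hypothetical, and the line splitting is deliberately naive (it does not validate or unescape terms), so a full RDF library may be preferable for real work.

```python
import bz2
from typing import Iterator, Tuple


def iter_ntriples(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) strings from a .nt.bz2 file.

    Simplified reader: it assumes one triple per line and performs no
    validation or unescaping of IRIs and literals.
    """
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            # Subject and predicate contain no whitespace; the object is
            # everything that remains before the terminating " ."
            subject, predicate, rest = line.split(None, 2)
            obj = rest.rstrip()
            if obj.endswith("."):
                obj = obj[:-1].rstrip()
            yield subject, predicate, obj
```

For example, `sum(1 for _ in iter_ntriples(path))` counts the triples in one file while holding only a single line in memory at a time.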
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
*** Fake News on Twitter ***
These five datasets are the results of an empirical study of how newly emerged fake news spreads on Twitter. In particular, we focused on fake news stories that gave rise to a simultaneous spread of the truth countering them. The story behind each fake news item is as follows:
1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.
2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."
3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.
4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.
5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.
The data were collected in two stages, each of which produced a new dataset: (1) the Dataset of Diffusion (DD), which contains information about the fake news/truth tweets and retweets; and (2) the Dataset of Graph (DG), obtained by querying the neighbors of the users who spread those tweets.
DD
DD for each fake news story is an Excel file named FNx_DD, where x is the number of the fake news story.
Each row belongs to one captured tweet/retweet related to the rumor, and each column presents a specific piece of information about that tweet/retweet. From left to right, the columns contain:
User ID (the user who posted the current tweet/retweet)
The description sentence in the profile of the user who published the tweet/retweet
The number of tweets/retweets published by the user at the time of posting the current tweet/retweet
Date and time of creation of the account from which the current tweet/retweet was posted
Language of the tweet/retweet
Number of followers
Number of followings (friends)
Date and time of posting the current tweet/retweet
Number of likes (favorites) the current tweet had acquired before it was crawled
Number of times the current tweet had been retweeted before it was crawled
Whether there is another tweet inside the current tweet/retweet (this happens, for example, when the current tweet is a quote, reply, or retweet)
The source (OS) of the device from which the current tweet/retweet was posted
Tweet/Retweet ID
Retweet ID (if the post is a retweet, this is the ID of the tweet that the current post retweets)
Quote ID (if the post is a quote, this is the ID of the tweet that the current post quotes)
Reply ID (if the post is a reply, this is the ID of the tweet that the current post replies to)
Frequency of tweet occurrences, i.e. the number of times the current tweet is repeated in the dataset (for example, the number of times the tweet appears in the dataset as a retweet posted by others)
State of the tweet, which takes one of the following values (determined by agreement between the annotators):
r : The tweet/retweet is a fake news post
a : The tweet/retweet is a truth post
q : The tweet/retweet is a question about the fake news that neither confirms nor denies it
n : The tweet/retweet is not related to the fake news (it contains the query terms related to the rumor but does not refer to the given fake news)
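The four state codes above lend themselves to a small lookup table when tallying a DD file. The sketch below assumes the state column has already been read into a list of strings (how the Excel file is loaded is left open), and the names `STATE_LABELS` and `summarize_states` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Code-to-meaning table taken from the state descriptions above.
STATE_LABELS = {
    "r": "fake news post",
    "a": "truth post",
    "q": "question about the fake news",
    "n": "not related to the fake news",
}


def summarize_states(codes: Iterable[str]) -> Dict[str, int]:
    """Count how many tweets/retweets fall into each annotated state."""
    counts = Counter(code.strip().lower() for code in codes)
    unknown = set(counts) - set(STATE_LABELS)
    if unknown:
        raise ValueError(f"unexpected state codes: {sorted(unknown)}")
    # Report every state, including those that never occur.
    return {label: counts.get(code, 0) for code, label in STATE_LABELS.items()}
```

Raising on unknown codes rather than silently dropping them makes annotation typos visible early.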
DG
DG for each fake news story contains two files:
A file in graph format (.graph) that describes the graph, i.e. who is linked to whom. (This file is named FNx_DG.graph, where x is the number of the fake news story.)
A file in JSONL format (.jsonl) that contains the real user IDs of the nodes in the graph file. (This file is named FNx_Labels.jsonl, where x is the number of the fake news story.)
The second file is needed because, in the graph file, the label of each node is the order in which it entered the graph. For example, if the node with user ID 12345637 was the first node entered into the graph file, its label in the graph is 0 and its real ID (12345637) is at row number 1 of the jsonl file (row number 0 holds the column labels). The remaining node IDs follow in the subsequent rows, one user ID per row. In general, the user ID of the node labeled k is at row k + 1 when rows are counted from 0. For example, the user ID of node 200 is at row number 201, which is the 202nd line of the file.
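The label-to-ID lookup described above can be sketched as follows. This is a minimal illustration under two assumptions not confirmed by the dataset description: every line of the .jsonl file, including the first line with the column labels, is a valid JSON value, and the lines are in node-entry order. The function name `load_node_rows` is hypothetical.

```python
import json
from typing import Any, List


def load_node_rows(labels_path: str) -> List[Any]:
    """Return a list whose index k holds the jsonl row for graph node k."""
    with open(labels_path, encoding="utf-8") as handle:
        rows = [json.loads(line) for line in handle if line.strip()]
    # Row 0 holds the column labels, so node k sits at row k + 1.
    # Dropping the first row makes list indices equal to node labels.
    return rows[1:]
```

With this list, `load_node_rows(path)[200]` returns the row for the node labeled 200 without any manual offset arithmetic.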
The user IDs of spreaders in DG (those who have a post in DD) also appear in DD, where extra information about them and their tweets/retweets is available. The other user IDs in DG are neighbors of these spreaders and might not exist in DD.
GeNeG is a knowledge graph constructed from news articles on the topic of refugees and migration, collected from German online media outlets. GeNeG contains rich textual and metadata information, as well as named entities extracted from the articles' content and metadata and linked to Wikidata. The graph is expanded with up to three-hop neighbors from Wikidata of the initial set of linked entities.
GeNeG comes in three flavors:
Information about uploaded files:
(All files are bzip2-compressed and in the N-Triples format.)
File | Description |
---|---|
geneg_type-metadata.nt.bz2 | Metadata about the dataset, described using the VoID vocabulary. |
geneg_type-instances_types.nt.bz2 | Class definitions of articles and events. |
geneg_type-instances_labels.nt.bz2 | Labels of instances. |
geneg_type-instances_metadata_literals.nt.bz2 | Relations between news article resources and metadata literals (e.g. URL, publishing date, modification date, polarity score, stance). |
geneg_type-instances_metadata_resources.nt.bz2 | Relations between news article resources and metadata entities (i.e. publishers, authors, keywords). |
geneg_type-instances_content_relations.nt.bz2 | Relations between news article resources and content components (e.g. titles, abstracts, article bodies). |
geneg_type-instances_event_mapping.nt.bz2 | Mapping of news article resources to events. |
geneg_type-event_relations.nt.bz2 | Relations between news events and entities mentioned (i.e. actors, places, mentions). |
geneg_type-wiki_relations.nt.bz2 | Relations between Wikidata entities from news events and their k-hop entity neighbors from Wikidata. |
Changelog
v1.0.1
During a 2025 survey, ** percent of respondents from Nigeria stated that they used social media as a source of news. In comparison, just ** percent of Japanese respondents said the same. Large portions of social media users around the world admit that they do not trust social platforms either as media sources or as a way to get news, and yet they continue to access such networks on a daily basis.
Social media: trust and consumption
Despite the majority of adults surveyed in each country reporting that they used social networks to keep up to date with news and current affairs, a 2018 study showed that social media is the least trusted news source in the world. Less than ** percent of adults in Europe considered social networks to be trustworthy in this respect, yet more than ** percent of adults in Portugal, Poland, Romania, Hungary, Bulgaria, Slovakia and Croatia said that they got their news on social media. What is clear is that we live in an era where social media is such an enormous part of daily life that consumers will still use it in spite of their doubts or reservations.
Concerns about fake news and propaganda on social media have not stopped billions of users accessing their favorite networks on a daily basis. Most Millennials in the United States use social media for news every day, and younger consumers in European countries are much more likely to use social networks for national political news than their older peers. Like it or not, reading news on social media is fast becoming the norm for younger generations, and this form of news consumption will likely increase further regardless of whether consumers fully trust their chosen network or not.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Here you find the History of Work resources as Linked Open Data. They enable you to look up HISCO and HISCAM scores for a very large number of occupational titles in numerous languages.
Data can be queried (obtained) via the SPARQL endpoint or via the example queries. If the Linked Open Data format is new to you, you might enjoy the data stories on History of Work as Linked Open Data and the user question "Is there a list of female occupations?".
This version is dated Apr 2025 and is not backwards compatible with the previous version (Feb 2021). The major changes are:
- considerable simplification of the graph representation (from 81 to 12);
- use of sdo (https://schema.org/) rather than schema (http://schema.org);
- replacement of prov:wasDerivedFrom with sdo:isPartOf to link occupational titles to originating datasets;
- etl files (used for conversion to Linked Data) now publicly available via https://github.com/rlzijdeman/rdf-hisco;
- fixes for issues with language tags;
- specification of language tags for English (e.g. @en-gb instead of @en);
- new preferred API: https://api.druid.datalegend.net/datasets/HistoryOfWork/historyOfWork-all-latest/sparql (the old API will be deprecated at some point: https://api.druid.datalegend.net/datasets/HistoryOfWork/historyOfWork-all-latest/services/historyOfWork-all-latest/sparql).
There are bound to be some issues. Please report them here.
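As a rough illustration of querying the preferred API listed above, the sketch below builds a standard SPARQL 1.1 Protocol POST request with the Python standard library. The function names and the generic sample query are hypothetical and not taken from the dataset documentation. That the endpoint accepts form-encoded POST requests and returns SPARQL JSON results is likewise an assumption.

```python
import json
import urllib.parse
import urllib.request

# Preferred API URL as stated in the changelog above.
ENDPOINT = (
    "https://api.druid.datalegend.net/datasets/HistoryOfWork/"
    "historyOfWork-all-latest/sparql"
)

# Generic placeholder query; it makes no assumption about the data model.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a form-encoded POST request asking for JSON results."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    headers = {
        "Accept": "application/sparql-results+json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers)


def run_sparql(query: str, endpoint: str = ENDPOINT, timeout: float = 30.0) -> list:
    """Send the query and return the list of result bindings."""
    request = build_sparql_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_sparql(SAMPLE_QUERY)` requires network access; separating request construction from sending keeps the query logic checkable offline.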
Figure 1. Part of model illustrating the basic relation between occupations, schema.org and HISCO.
Image (hisco-basic): https://druid.datalegend.net/HistoryOfWork/historyOfWork-all-latest/assets/601beed0f7d371035bca5521
Figure 2. Part of model illustrating the relation between occupation, provenance and HISCO auxiliary variables.
Image (hisco-aux): https://druid.datalegend.net/HistoryOfWork/historyOfWork-all-latest/assets/601beed0f7d371035bca551e
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set contains automated sentiment and emotionality annotations of 23 million headlines from 47 news media outlets popular in the United States.
The set of 47 news media outlets analysed (listed in Figure 1 of the main manuscript) was derived from the AllSides organization 2019 Media Bias Chart v1.1. The human ratings of outlets’ ideological leanings were also taken from this chart and are listed in Figure 2 of the main manuscript.
News article headlines from the set of outlets analyzed in the manuscript are available in the outlets’ online domains and/or public cache repositories such as The Internet Wayback Machine, Google cache and Common Crawl. Headlines were located in the articles’ raw HTML using outlet-specific XPath expressions.
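To illustrate what an outlet-specific XPath lookup can look like, here is a minimal sketch. The outlet keys and XPath expressions are invented placeholders, not the expressions used by the authors. It relies on the limited XPath subset in Python's standard `xml.etree.ElementTree`, so it only handles well-formed markup; real-world HTML would typically need a more tolerant parser.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical per-outlet expressions (placeholders for illustration only).
OUTLET_XPATHS = {
    "example-outlet-a": ".//h1[@class='headline']",
    "example-outlet-b": ".//header/h2",
}


def extract_headline(markup: str, outlet: str) -> Optional[str]:
    """Return the headline text for one article, or None if not found."""
    root = ET.fromstring(markup)
    node = root.find(OUTLET_XPATHS[outlet])
    if node is None:
        return None  # the expression did not match this page layout
    # Join nested text fragments and collapse runs of whitespace.
    return " ".join("".join(node.itertext()).split())
```

Returning `None` on a miss, rather than raising, mirrors the situation described in the text where an expression occasionally fails to capture a headline.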
The temporal coverage of headlines across news outlets is not uniform. For some media organizations, news article availability in online domains or Internet cache repositories becomes sparse for earlier years. Furthermore, some news outlets popular in 2019, such as The Huffington Post or Breitbart, did not exist in the early 2000s. Hence, our data set is sparser in headline sample size and representativeness for earlier years in the 2000-2019 timeline. Nevertheless, 20 outlets in our data set have chronologically continuous partial or full headline data availability since the year 2000. Figure S1 in the SI reports the number of headlines per outlet and per year in our analysis.
In a small percentage of articles, outlet-specific XPath expressions might fail to properly capture the content of the headline due to the heterogeneity of HTML elements and CSS styling combinations with which article text is arranged in outlets' online domains. After manual testing, we determined that the percentage of headlines falling into this category is very small. Additionally, our method might miss some articles in the online domains of news outlets. In a data analysis of over 23 million headlines, we cannot manually check the correctness of every single data instance, and one hundred percent accuracy at capturing headlines’ content is elusive due to the small number of difficult-to-detect boundary cases, such as incorrect HTML markup syntax in online domains. Overall, however, we are confident that our headline set is representative of headlines in print news media content for the time period and outlets analyzed.
The compressed files in this data set are listed next:
-analysisScripts.rar contains the analysis scripts used in the main manuscript, as well as aggregated data of the automated sentiment and emotionality annotations of the headlines and the human annotations of sentiment and emotionality for a subset of headlines, which were used as ground truth.
-models.rar contains the Transformer sentiment and emotion annotation models used in the analysis, namely:
siebert/sentiment-roberta-large-english from https://huggingface.co/siebert/sentiment-roberta-large-english. This model is a fine-tuned checkpoint of RoBERTa-large (Liu et al. 2019). It enables reliable binary sentiment analysis for various types of English-language text. For each instance, it predicts either positive (1) or negative (0) sentiment. The model was fine-tuned and evaluated on 15 data sets from diverse text sources to enhance generalization across different types of texts (reviews, tweets, etc.). More information from the original authors is available at the URL above.
DistilbertSST2.rar is the default sentiment classification model of the HuggingFace Transformers library (https://huggingface.co/). This model is only used to replicate the results of the sentiment analysis obtained with sentiment-roberta-large-english.
j-hartmann/emotion-english-distilroberta-base from https://huggingface.co/j-hartmann/emotion-english-distilroberta-base. The model is a fine-tuned checkpoint of DistilRoBERTa-base. It allows annotation of English text with Ekman's 6 basic emotions, plus a neutral class. The model was trained on 6 diverse datasets. Please refer to the original author's page at the URL above for an overview of the data sets used for fine-tuning.
-headlinesDataWithSentimentLabelsAnnotationsFromSentimentRobertaLargeModel.rar contains the URLs of the headlines analyzed and the sentiment annotations of the siebert/sentiment-roberta-large-english Transformer model (https://huggingface.co/siebert/sentiment-roberta-large-english).
-headlinesDataWithSentimentLabelsAnnotationsFromDistilbertSST2.rar contains the URLs of the headlines analyzed and the sentiment annotations of the default HuggingFace sentiment analysis model fine-tuned on the SST-2 dataset (https://huggingface.co/).
-headlinesDataWithEmotionLabelsAnnotationsFromDistilRoberta.rar contains the URLs of the headlines analyzed and the emotion category annotations of the j-hartmann/emotion-english-distilroberta-base Transformer model (https://huggingface.co/j-hartmann/emotion-english-distilroberta-base).
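For readers who want to produce comparable binary sentiment labels, the sketch below wraps the Hugging Face `transformers` text-classification pipeline around the sentiment model named above. It is not the authors' analysis script. The helper names are hypothetical, and the mapping assumes the pipeline returns the label strings `POSITIVE` and `NEGATIVE` for this model. The `transformers` package is a third-party dependency, imported lazily so the rest of the code can be exercised without it.

```python
from typing import Callable, Iterable, List, Tuple

MODEL_NAME = "siebert/sentiment-roberta-large-english"


def load_sentiment_classifier(model_name: str = MODEL_NAME):
    """Create a sentiment pipeline (downloads the model on first use)."""
    from transformers import pipeline  # third-party; imported lazily

    return pipeline("sentiment-analysis", model=model_name)


def label_to_binary(label: str) -> int:
    """Map a label string to the 1 (positive) / 0 (negative) coding."""
    mapping = {"POSITIVE": 1, "NEGATIVE": 0}  # assumed label names
    return mapping[label.upper()]


def annotate_headlines(
    headlines: Iterable[str], classifier: Callable
) -> List[Tuple[str, int]]:
    """Return (headline, binary sentiment) pairs for a batch of headlines."""
    batch = list(headlines)
    results = classifier(batch)  # one {"label": ..., "score": ...} per input
    return [(text, label_to_binary(res["label"])) for text, res in zip(batch, results)]
```

A typical call would be `annotate_headlines(headlines, load_sentiment_classifier())`; passing the classifier in as an argument also makes it easy to swap in the other models for replication.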
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
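Because the files hold cumulative counts, daily new cases have to be derived by differencing consecutive days within each state. The sketch below shows one way to do that with the standard library. The column names `date`, `state` and `cases` are assumptions about the CSV layout (the description above does not list the columns), and `daily_new_cases` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from typing import List, Tuple


def daily_new_cases(csv_text: str) -> List[Tuple[str, str, int]]:
    """Convert cumulative per-state counts into daily increments.

    Assumes columns named date (ISO format), state and cases.
    """
    rows = sorted(
        csv.DictReader(io.StringIO(csv_text)),
        key=lambda row: (row["state"], row["date"]),
    )
    previous = defaultdict(int)  # last cumulative total seen per state
    increments = []
    for row in rows:
        state, total = row["state"], int(row["cases"])
        increments.append((row["date"], state, total - previous[state]))
        previous[state] = total
    return increments
```

Negative increments are possible if a source later revises its cumulative totals downward, so they are left in place rather than clipped to zero.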
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Housing Inventory: New Listing Count in Story County, IA was 100.00000 in May of 2025, according to the United States Federal Reserve. Historically, Housing Inventory: New Listing Count in Story County, IA reached a record high of 176.00000 in June of 2019 and a record low of 28.00000 in December of 2024. Trading Economics provides the current actual value, a historical data chart and related indicators for Housing Inventory: New Listing Count in Story County, IA - last updated from the United States Federal Reserve in July of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
New Patent Assignments in Story County, IA was 1.00000 Patents in November of 2023, according to the United States Federal Reserve. Historically, New Patent Assignments in Story County, IA reached a record high of 34.00000 in May of 2006 and a record low of 1.00000 in October of 1983. Trading Economics provides the current actual value, a historical data chart and related indicators for New Patent Assignments in Story County, IA - last updated from the United States Federal Reserve in June of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Housing Inventory: New Listing Count Year-Over-Year in Story County, IA was -3.85% in May of 2025, according to the United States Federal Reserve. Historically, Housing Inventory: New Listing Count Year-Over-Year in Story County, IA reached a record high of 75.00% in September of 2019 and a record low of -51.95% in September of 2020. Trading Economics provides the current actual value, a historical data chart and related indicators for Housing Inventory: New Listing Count Year-Over-Year in Story County, IA - last updated from the United States Federal Reserve in June of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Housing Inventory: New Listing Count Month-Over-Month in Story County, IA was 2.04% in May of 2025, according to the United States Federal Reserve. Historically, Housing Inventory: New Listing Count Month-Over-Month in Story County, IA reached a record high of 120.00% in January of 2019 and a record low of -58.82% in December of 2024. Trading Economics provides the current actual value, a historical data chart and related indicators for Housing Inventory: New Listing Count Month-Over-Month in Story County, IA - last updated from the United States Federal Reserve in July of 2025.
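Year-over-year and month-over-month series such as the two above are percentage changes of a count relative to an earlier period. The sketch below shows the arithmetic. The prior-period counts of 104 and 98 mentioned afterwards are hypothetical values chosen only because they reproduce the reported -3.85% and 2.04% for a current count of 100; they are not taken from the source.

```python
def pct_change(previous: float, current: float, digits: int = 2) -> float:
    """Percentage change from a previous-period value to the current one."""
    if previous == 0:
        raise ValueError("percentage change is undefined for a zero base")
    return round((current - previous) / previous * 100, digits)
```

Under those hypothetical inputs, `pct_change(104, 100)` gives the year-over-year figure and `pct_change(98, 100)` the month-over-month figure.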
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Inflation Rate in the United States increased to 2.40 percent in May of 2025 from 2.30 percent in April of 2025. This dataset provides actual values, historical data, forecast, chart, statistics, economic calendar and news for the United States Inflation Rate.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
New Zealand Exports of printed books, newspapers, pictures to Cyprus were US$1.81 Thousand during 2022, according to the United Nations COMTRADE database on international trade. New Zealand Exports of printed books, newspapers, pictures to Cyprus - data, historical chart and statistics - was last updated in June of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Argentina Imports of printed books, newspapers, pictures from New Zealand were US$434 during 2019, according to the United Nations COMTRADE database on international trade. Argentina Imports of printed books, newspapers, pictures from New Zealand - data, historical chart and statistics - was last updated in July of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Slovakia Exports of printed books, newspapers, pictures to New Caledonia were US$66 during 2010, according to the United Nations COMTRADE database on international trade. Slovakia Exports of printed books, newspapers, pictures to New Caledonia - data, historical chart and statistics - was last updated in July of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
New Zealand Exports of printed books, newspapers, pictures to Greece were US$79 during 2024, according to the United Nations COMTRADE database on international trade. New Zealand Exports of printed books, newspapers, pictures to Greece - data, historical chart and statistics - was last updated in June of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
New Zealand Exports of printed books, newspapers, pictures to Qatar were US$22.92 Thousand during 2024, according to the United Nations COMTRADE database on international trade. New Zealand Exports of printed books, newspapers, pictures to Qatar - data, historical chart and statistics - was last updated in July of 2025.