According to the most recently available data, there were 1,279 daily newspapers in the United States in 2018. The number of daily newspapers in the U.S. has been on the decline since 1970, when there were 1,748 daily news publications in the country. However, given the ongoing struggle of print media around the world, a decrease of around 470 newspapers over several decades is more positive than many might expect.
Daily newspapers in the U.S.
Whilst the actual number of daily newspapers has remained comparatively stable since the 1970s, the same cannot be said for circulation figures. In 2017, the paid circulation of daily newspapers in the United States amounted to 30.92 million, roughly half the figure recorded for 1985. Even the major players in the industry are suffering: the Chicago Tribune’s daily circulation fell from just over 438 thousand in September 2017 to 238 thousand in early 2019. Household names like The New York Times and The Wall Street Journal also saw a sharp drop in circulation figures, leaving little hope for smaller publications.
News consumers and the reluctance to pay
Media markets across the world have become saturated with digital alternatives to print, and U.S. consumers can now access news content, gossip columns and reports on the latest sports games from a variety of sources, rendering daily newspapers in particular less valuable and sought after than ever before. The reasons to pick up a newspaper on your daily commute when the information you seek is available online are becoming hazy – why pay for print when you can get the digital version for free? In fact, a 2018 study revealed that the vast majority of surveyed U.S. adults had not paid for any local news content in the last year.
However, a small solace for print-only news outlets is that even digital news providers are not completely safe from consumers’ reluctance to pay. A report revealed that the wealth of free content available was the main reason why U.S. consumers were unwilling to pay for online news.
Sadly, the future situation for print outlets does not look bright, and for news providers in general there lies a constant uphill struggle to maintain integrity, prove accuracy and capture the attention of new and current consumers alike.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1 Introduction
There are several works that apply Natural Language Processing to newspaper reports. In mining opinions from headlines [ 1 ] using Stanford NLP and SVM, Rameshbhai et al. compared several algorithms on a small and a large dataset. Rubin et al., in their paper [ 2 ], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. in [ 3 ] implemented LDA, a topic modeling approach, to study bias present in online news media.
However, not much NLP research has been invested in studying COVID-19. Most applications involve the classification of chest X-rays and CT scans to detect the presence of pneumonia in the lungs [ 4 ], a consequence of the virus. Other research areas include studying the genome sequence of the virus [ 5 ][ 6 ][ 7 ] and replicating its structure to fight the virus and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of online tweets by Samuel et al. [ 8 ] to understand the fear persisting in people due to the virus. Similar work has been done by Jelodar et al. [ 9 ], who used an LSTM network to classify sentiments from online discussion forums. To the best of our knowledge, the NKK dataset is the first study on a comparatively large dataset of newspaper reports on COVID-19 that contributed to awareness of the virus.
2 Data-set Introduction
2.1 Data Collection
We accumulated 1000 online newspaper reports on COVID-19 from the United States of America (USA). The newspapers include The Washington Post (USA) and StarTribune (USA). We named this dataset “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports on the issue from Bangladesh and named this dataset “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All of these newspapers are among the top providers and the most read in their respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was preferred over automation to ensure that the news reports were highly relevant to the subject. The newspapers' online sites had dynamic content with advertisements in no particular order, so there was a high chance that online scrapers would collect inaccurate news reports. One of the challenges in collecting the data was the subscription requirement: each newspaper required $1 per subscription. Some criteria for collecting the news reports, provided as guidelines to the human data-collectors, were as follows:
The headline must have one or more words directly or indirectly related to COVID-19.
The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.
The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.
Avoid taking duplicate reports.
Maintain a time frame for the above mentioned newspapers.
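The headline, body, and duplicate criteria above can also be expressed as a small automated check, for example as a second pass over manually collected entries. The following is only a sketch: the keyword list is hypothetical (the source does not publish the exact terms), the function names are illustrative, and counting keyword occurrences rather than distinct keywords is an assumption.

```python
import re

# Hypothetical keyword list; the source does not publish the exact terms used.
COVID_KEYWORDS = {
    "covid", "coronavirus", "pandemic", "quarantine",
    "lockdown", "outbreak", "virus", "infection",
}

def count_keyword_hits(text, keywords=COVID_KEYWORDS):
    """Count tokens that start with any keyword (so 'infections' matches 'infection')."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sum(1 for tok in tokens if any(tok.startswith(k) for k in keywords))

def meets_criteria(headline, body, seen_headlines):
    """Apply the headline, body, and duplicate guidelines to one report."""
    if count_keyword_hits(headline) < 1:   # headline needs one or more related words
        return False
    if count_keyword_hits(body) < 5:       # body needs five or more related words
        return False
    key = headline.strip().lower()
    if key in seen_headlines:              # avoid duplicate reports
        return False
    seen_headlines.add(key)
    return True
```

A check like this cannot judge indirect relevance the way a human collector can, so it would complement the manual process rather than replace it.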
To collect these data, we used a Google Form for both USA and BD. Two human editors went through each entry to check for spam or troll entries.
2.2 Data Pre-processing and Statistics
Some pre-processing steps performed on the newspaper report dataset are as follows:
Remove hyperlinks.
Remove non-English alphanumeric characters.
Remove stop words.
Lemmatize text.
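The four steps above could be chained roughly as follows. This is a minimal sketch and not the authors' script: the stop-word list is a tiny illustrative subset, and `toy_lemmatize` is a deliberately crude stand-in that only strips a plural "s". A real pipeline would use a proper lemmatizer, for example NLTK's WordNetLemmatizer.

```python
import re

# Tiny illustrative stop-word list; a real run would use a fuller list.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on",
              "is", "are", "was", "were", "for", "with"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")

def toy_lemmatize(token):
    """Crude stand-in for a lemmatizer: strips a plural 's' only."""
    if len(token) > 3 and token.endswith("s") and not token.endswith(("ss", "us", "is")):
        return token[:-1]
    return token

def preprocess(text):
    """Return the cleaned tokens of one report."""
    text = URL_RE.sub(" ", text)                          # 1. remove hyperlinks
    text = NON_ALNUM_RE.sub(" ", text)                    # 2. remove non-English alphanumeric characters
    tokens = text.lower().split()
    tokens = [t for t in tokens if t not in STOP_WORDS]   # 3. remove stop words
    return [toy_lemmatize(t) for t in tokens]             # 4. lemmatize (toy version)
```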
While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any remaining instances of the above-mentioned items.
The primary data statistics of the two datasets are shown in Tables 1 and 2.
Table 1: Covid-News-USA-NNK data statistics
No. of words per headline: 7 to 20
No. of words per body content: 150 to 2100
Table 2: Covid-News-BD-NNK data statistics
No. of words per headline: 10 to 20
No. of words per body content: 100 to 1500
2.3 Dataset Repository
We used GitHub as our primary data repository under the account name NKK^1. There, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON using a Python script. We also provide a Python script file for essential operations. We welcome all outside collaboration to enrich the dataset.
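Regenerating the JSON from an updated CSV file might look like the sketch below. It is not the repository's actual script: the function name and file paths are hypothetical, and the only assumption about the data is that the CSV has a header row.

```python
import csv
import json

def csv_to_json(csv_path, json_path):
    """Read every CSV row as a dict and write the rows out as a JSON array."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)
    return len(rows)  # number of reports written
```

Keeping the CSV as the single source of truth and always regenerating the JSON from it avoids the two formats drifting apart.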
3 Literature Review
Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text into numerical representations, which can be fed to multiple machine learning and deep learning algorithms.
Some well-known applications of NLP include fraud detection on online media sites[ 10 ], authorship attribution in fallback authentication systems[ 11 ], intelligent conversational agents or chatbots[ 12 ], and machine translation as used by Google Translate[ 13 ]. While these are all downstream tasks, several exciting developments have been made in algorithms designed solely for Natural Language Processing tasks. The two most prominent ones are BERT[ 14 ], which uses a bidirectional Transformer encoder architecture and performs strongly on classification and masked-word prediction tasks, and the GPT-3 models released by OpenAI[ 15 ], which can generate almost human-like text. However, these are typically used as pre-trained models, since training them carries a huge computational cost.
Information Extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc[ 16 ]. Information extraction in texts could be identifying named entities and locations or essential data.
Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of texts. One commonly used topic modeling method is Latent Dirichlet Allocation or LDA[17].
Keyword extraction is a process of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [ 18 ] is an efficient keyword extraction technique that uses a graph to calculate the weight of each word and picks the words with the highest weights.
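To make the graph-based weighting concrete, here is a simplified TextRank-style sketch. It links words that co-occur within a small window and then applies the unweighted PageRank update, score(w) = (1 - d) + d * sum(score(n) / degree(n)) over the neighbors n of w. The function name, window size, and iteration count are illustrative assumptions, and the part-of-speech filtering used in the original TextRank paper is omitted.

```python
import re
from collections import defaultdict

def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words by PageRank score on an undirected co-occurrence graph."""
    words = re.findall(r"[a-z]+", text.lower())
    # Link each word to the distinct words that follow it within `window` positions.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    if not neighbors:
        return []
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        updated = {}
        for w in neighbors:
            # Each neighbor shares its score equally among its own neighbors.
            rank = sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            updated[w] = (1 - damping) + damping * rank
        scores = updated
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

In practice the input would be pre-processed first (stop words removed), otherwise frequent function words tend to dominate the ranking.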
Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.
4 Our experiments and Result analysis
We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2, and 3, we can point out a few observations:
In February, both newspapers discussed China and the source of the outbreak.
StarTribune emphasized Minnesota as the state of most concern. In April, its concern appeared to increase.
Both newspapers discussed the virus's impact on the economy, i.e., banks, elections, administrations, and markets.
The Washington Post discussed global issues more than StarTribune.
In February, StarTribune mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.
While both newspapers mentioned the outbreak in China in February, the spread in the United States was given more weight from March through May, reflecting the critical impact of the virus.
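Monthly word clouds of this kind can be produced by counting word frequencies per month and passing each count table to the wordcloud library. The sketch below assumes the reports are available as (month, text) pairs, which is a hypothetical input shape; the function names are illustrative. The rendering function imports the third-party `wordcloud` package lazily, so the counting step runs without it.

```python
import re
from collections import Counter, defaultdict

def monthly_word_frequencies(reports):
    """reports: iterable of (month, text) pairs -> {month: Counter of word counts}."""
    freqs = defaultdict(Counter)
    for month, text in reports:
        freqs[month].update(re.findall(r"[a-z]+", text.lower()))
    return dict(freqs)

def render_cloud(frequencies, out_path):
    """Render one word cloud image; requires the third-party `wordcloud` package."""
    from wordcloud import WordCloud  # lazy import: only needed for rendering
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(out_path)
```

Calling `render_cloud(freqs["February"], "february.png")` for each month would then give one image per month, comparable across newspapers.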
We used a script to extract all numbers related to certain keywords like ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’, ’Diagnosed’, etc. from the news reports and created a series of case numbers for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID-19 cases, which rose gradually from February. Both newspapers clearly show that the rise in cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack.
We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiments ranged from -0.5 to -0.9. The Vader Sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us separate the most concerning (most negative) news from the positive news, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide information about how a state or country is reacting to the pandemic.
We used the PageRank algorithm to extract keywords from the headlines as well as the body content. PageRank efficiently highlights important, relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
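The keyword-based number extraction described in this section could be approximated with a regular expression, as in the sketch below. This is not the authors' script: the keyword list is a subset of the terms named in the text, and the rule "a number followed by a keyword within three words" is an assumption made for illustration.

```python
import re

# Subset of the keywords named in the text.
KEYWORDS = ["deaths", "infected", "died", "infections", "quarantined", "diagnosed"]

# A standalone number (optional thousands separators) followed, within at most
# three intervening words, by one of the keywords. The lookbehind skips digits
# that are part of a token such as "COVID-19".
PATTERN = re.compile(
    r"(?<![\w-])(\d[\d,]*)\s+(?:\w+\s+){0,3}?(" + "|".join(KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def extract_counts(text):
    """Return (keyword, number) pairs found in one news report."""
    return [
        (keyword.lower(), int(number.replace(",", "")))
        for number, keyword in PATTERN.findall(text)
    ]
```

Summing the extracted numbers per month and per newspaper would then yield a series like the one plotted in Figure 4, although a rule this simple will miss numbers written as words and may pick up figures unrelated to case counts.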
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The coronavirus disease (COVID-19) spread rampantly around the world at the beginning of 2020 before the governments of each country could prevent it by making decisions based on medical data analysis. With proper formalization, the terabytes of new textual data available online every day could have been used for the early description and detection of cases of this virus. Since then, the number of Event-Based Surveillance (EBS) applications has increased exponentially. These applications aim to mine channels of unstructured information to detect signs of the progression of possible public health events. However, one problem with such systems is the need for expert intervention to define which events will be captured and which relevant terms should be used in the search, and to analyze the events in order to modify the search procedure constantly. Another problem is that many of these applications do not consider both spatial and temporal characteristics. To address such limitations, this dataset presents a novel approach. We propose the use of BioPropaPhenKG to replace such systems. In this dataset, BioPropaPhen was enhanced with information coming from unstructured texts from online newspapers and medical articles. BioPropaPhenKG, its ontology, and other useful information can be found at https://zenodo.org/records/10911980. The code used for this use case can be found at https://github.com/Gabriel382/DDPF-Health-Risks . Finally, the datasets used were UMLS MetamorphoSys, OpenStreetMap, Wikidata, Aylien (data only from November of 2019) and CORD-19 (data only from December of 2019).
To read it, you just need to load it with Neo4j 4.4.3. Alternatively, you can open it with Docker using the following command:
docker run --interactive --tty --rm \
  --publish=7474:7474 --publish=7687:7687 \
  --volume=/path-to-data-folder:/data --user="$(id -u):$(id -g)" \
  neo4j:4.4.3 \
  neo4j-admin load --from=/data/BioPropaPhenKG-Journal-Medical.dump --database "neo4j" --force
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This file contains information for each of the images analyzed for: Rebich-Hespanha, S., Rice, R.E., Montello, D.R., Retzloff, S., Tien, S., and Hespanha, J.P. (2015) Image Themes and Frames in U.S. Print News Stories about Climate Change, Environmental Communication, 9(4), 491-519. doi:10.1080/17524032.2014.983534.
The 350 images referenced in this data table (one image per row) appeared with 200 articles that were randomly sampled from among 5,637 image-containing newspaper and magazine stories about climate change that appeared in 11 US news source archives between the time each source first included any reference to climate change (earliest was 1969) and the date of data collection (September 2009). Sources were selected because they were associated with image metadata for at least some records, and were available on microfilm or as paper copy at the University of California, Santa Barbara library. Because this corpus was collected primarily for algorithmic text analysis, only English-language articles were included. Text stories were retrieved from the LexisNexis news database using a query for the subject terms ‘climate change’ or ‘global warming’. Queries returned 14,910 stories about climate change, and, of those, 5,637 (37.8%) were associated with metadata indicating inclusion of one or more images.
Candidate articles from this image-containing set were randomly selected, and the first author read each article and excluded (1) stories that mentioned climate change but were focused on an unrelated topic (e.g., referring to climate change or global warming when describing a ‘hot’ sports team), (2) stories that were about environment-themed topics (e.g., alternative energy, ‘green’ lifestyles, weather phenomena) but did not explicitly refer to climate change or global warming, (3) stories that briefly referred to climate change or global warming but provided only minimal information about the relationship between climate change and the main topic of the story, and (4) news summaries that included only brief information about multiple top stories of the day. Articles were selected iteratively until 200 articles met the selection criteria. Once image identity was determined through microfilm scans, all images that could be located via web search in high-quality digital format were acquired. All of the stories that were selected for the sample based on the presence of graphics metadata contained at least one image. The final data consisted of 350 images from 200 articles, which are listed in this table. Information provided for each image includes the following (which are columns in the table):
imageNUM: unique number corresponding with each image
ImageID: unique alphanumeric ID that provides information about the article (beginning numeric digits) and sequence in article (following alphabetic and numeric characters)
publication date: date article and image(s) were published
headline: headline of the article with which the image(s) appeared
caption: caption (if any) that appeared with image
commissioned by (e.g. AP images, MCT direct): organization responsible for creating image; ni = no information
created by (name and occupation of photographer, journalist, scientist, etc.): person/people who created the image; ni = no information
data source (if applicable): source of data upon which image is based; ni = no information, na = not applicable (i.e., no data were used)
brief description: a basic description of the visual denotative elements of the image
ImageType_chart: if 1, image was classified as a chart
ImageType_illustration: if 1, image was classified as an illustration
ImageType_photo: if 1, image was classified as a photo
ImageType_diagram: if 1, image was classified as a diagram
ImageType_hedcut: if 1, image was classified as a hedcut (portrait created using stipple method of drawing, used by Wall Street Journal to present images of columnists)
ImageType_infographic: if 1, image was classified as an infographic
ImageType_map: if 1, image was classified as a map
ImageType_table: if 1, image was classified as a table
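Because the ImageType_* columns are 0/1 indicators, the number of images of each type can be tallied directly. The sketch below assumes the table has been exported as a CSV file whose header row uses the column names listed above; the function name and file path are hypothetical.

```python
import csv
from collections import Counter

# Indicator columns as named in the table description.
IMAGE_TYPE_COLUMNS = [
    "ImageType_chart", "ImageType_illustration", "ImageType_photo",
    "ImageType_diagram", "ImageType_hedcut", "ImageType_infographic",
    "ImageType_map", "ImageType_table",
]

def count_image_types(csv_path):
    """Count how many rows (images) are flagged with each image type."""
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            for col in IMAGE_TYPE_COLUMNS:
                if (row.get(col) or "").strip() == "1":
                    counts[col[len("ImageType_"):]] += 1
    return counts
```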
Financial narratives have always been relevant to economic fluctuations, rationalising current actions, such as spending and investing, inspiring and linking activities to important values and needs (Shiller 2017). In 2014, the European Parliament adopted the Bank Recovery and Resolution Directive (BRRD) which includes the bail-in tool. This means that taxpayers would not risk losses, but rather creditors and depositors would take a loss on their holdings. This Directive was applied to four banks and the press and media coverage of both resolutions and their effects was remarkable, influencing several issues regarding these banks' bondholders. The present study will investigate a corpus of articles from The Financial Times (FT.com, Europe) and one from The Times (thetimes.co.uk), selected around the keyword bail*-in, attempting to highlight how financial information is provided multimodally. The choice of the expression bail*-in was made because of its highly specialised semantic load in the financial field. The use of textual organisation, tables, graphs, and the relationship between text and image will be dealt with and applied to the corpus gathered. Verbal and visual elements have been considered as fulfilling, on the one hand, the three functions of informing, narrating and persuading, characterising news discourse, and, on the other hand, those of informing, evaluating and predicting, typical of financial discourse. This paper is part of an ongoing study on financial newspaper articles and whether and to what extent knowledge dissemination is popularised from specialised to non-specialised texts, recombined and recontextualised to be more intelligible to the layman. The main aim will be to analyse the combination of the verbal and visual structures of these articles, trying to detect any differences in the multimodal strategies employed by a specialised and a non-specialised newspaper.
The project investigates (a) how staged global political media events (i.e. the global climate summits) are produced, and (b) which discursive effects these events have on national climate debates in the media of five leading democratic countries around the world, namely the U.S., Germany, India, South Africa and Brazil.
I. Formal and general content related categories 1. Formal variables: article-ID; coder-ID; title (main headline of the article); date of publication; media outlet (newspaper, magazine or news website in which the article was published); length of the article; format of the article (fact-based article, opinion-based article, interview, press review, stand-alone visual image as an independent article, letter to the editor, other); placement of the article (front page article or cover story, article inside the newspaper and magazine referenced on the front page, article inside the newspaper and magazine without reference on the front page); section of newspaper, magazine and news website; author of the article.
II. Visual level 1. Formal variables: visual present; photo present; number of visual images; number of photos; visual image-ID, type of visual image (photograph, photomontage, chart, map or table, cartoon / caricature, official logo of COP, topical vignette by newspaper or magazine); source of visual image. 2. Visual framing (if the visual image is a photograph or photomontage): denotative level: institutional reference depicted in the photo; content of the photo: urban landscape, natural landscape (woods, mountains and/ or lake, plants and/ or grassland / meadow), ocean and/or ocean coast, snow, ice, glacier, desert or steppe, polar bear, other animals, transportation or conventional traffic, agriculture, conventional energy generation, green technology, other industry / technology, PR stunt installation; person(s) depicted in the photo: political actor, NGO representative(s), business representatives, scientists, celebrities, police / security personnel, ordinary citizen(s), other type of person; origin of depicted person; activity of depicted person (e.g. symbolic activity, demonstration and other form of protest, etc.); location of depicted scene.
Stylistic level: camera angle, distance / field size of photo.
III. Narration: 1. Narrative characteristics: narrativity (dramatization, emotion, narrative personalization, fictionalization, stylistic ornamentation); narrative genre: overall theme (everyday business, failure after struggle, triumph over adversity, struggle over destiny or planet or civilization, political or social conflict); tone (fatalistic, optimistic, unexcited, neutral, passionate, pessimistic); expected outcome; no conceivable outcome. 2. Character specification: character as victim: narrative role: victim present; victim type; victim name; victim action taken; character as villain: narrative role: villain present; villain type; villain name; villain action taken; character as hero: narrative role: hero present; hero type; hero name; hero action taken; sum of all actors in the article; sum of NGO representatives, politicians, representatives, international organizations, business representatives, scientists, journalists, citizens, and other actors.
IV. Actor-statement level Actors: actor-statement-ID; name of the actor; type of actor; occupation / office of actor; origin of actor; type of quotation; prominence of actor-statement; type of ‘we’ reference; frames: denial of reality of global warming; denial of problematic character / urgency of action; central aspect of problem definition: increase of temperature, extreme weather, melting ice or glaciers / rising sea levels, economic opportunities due to global warming, economic difficulties and hardships due to global warming, other societal consequences; causal attribution (situations or processes the actor identifies as causing or contributing to global warming): natural causes; anthropogenic causes (burning of fossil fuels / greenhouse gas emissions, deforestation), colliding national interests, other causes; countries responsible for causing global warming; endorsed and rejected remedies (no action should be taken, clean energy, reforestation and avoided deforestation); adaption action: adaption in agricultural production; adjusting political process: adoption of new legally binding, all-inclusive treaty on emission cuts; stronger focus on local efforts / working on the ground; other measures: financial assistance to disadvantaged countries; attributed responsibility for solving the problem.
Additionally coded was: country; COP (COP 16 Cancun, COP 17 Durban, COP 18 Doha, COP 19 Warsaw); 4 Cluster Solution Frames (political dispute, common...
The aim of this study was to chart the media participation of adolescents in Argentina, Egypt, Finland and India. The questions surveyed personal, social and public use of media as well as media literacy. First, the respondents were asked what media equipment (e.g. newspapers, radio, television, mobile phone, music players, video game consoles, computers) they had access to at home or somewhere else, and how often the respondents used them. They were also asked if a variety of media-related services were offered in their community/neighbourhood (e.g. libraries, movie theatres, internet cafés). In addition, the survey examined respondents' opinions on a variety of statements relating to matters such as media skills, rules for media use in the family, interest in media content, and fandom. Next, the respondents were asked which media they used for information-seeking regarding schoolwork, practical everyday matters such as timetables or the weather, current events and politics, puberty and sexuality, products and services, and which media they used for e.g. contacting relatives and friends, searching for a romantic partner, or contacting celebrities. They were also asked if they used media together with their families, friends, partners or virtual friends. It was asked if the respondents published their opinions via different media channels about social, political, cultural and environmental matters or human rights matters. The respondents were also asked if they would have liked to publish their opinions more often and why they had not done so. In addition to publishing opinions, they were asked if they published their own creative content, such as text, music, drawings, videos or computer programmes. Regarding media literacy, opinions were examined on the reliability and truthfulness of news articles, television programmes, internet pages, and advertisements. 
It was also surveyed whether the respondents thought that young people should be discussed more often in the media. They were also asked if the media often portrayed a distorted image of foreign countries and cultures. Next, the respondents' interest in a variety of themes was examined (both in "factual" and in "fictional" media), along with whether they published opinions or content regarding these themes or discussed them with other people (the themes included e.g. news, politics and society, the environment, human rights, sports, beauty and fashion, art, science, history, technology, and celebrities). The respondents were also asked if they had ever been interviewed for a newspaper article, a radio or television programme or some other medium, and to what extent it was possible for young people to have their opinions heard. The survey also charted respondents' experiences of limitations set on media use by e.g. governmental authorities or legislation, religion, school, family or friends. Finally, the respondents' internet use was surveyed with regard to how often they used the internet in different places (e.g. at school, at home, at a friend's home, in public or on a mobile phone) and for different purposes (e.g. information-seeking, e-mail, shopping, instant messaging, social media, gaming, or watching videos/movies). The respondents' mobile phone use was also examined with regard to how often they used a mobile phone for phone calls, SMS messages, gaming, listening to music or radio, using the internet, taking photos and watching videos, among others. Background variables included age, gender, country, type of neighbourhood, parents' occupations (categorised), ethnic group as well as the language spoken at home and the language used for responding to the questionnaire (mother tongue/English).
This dataset contains references to newspaper articles relating to what is now described as child sexual abuse 1918-1970 that have been collected through keyword searches of British newspapers that are available in digitised form. The dataset was created as part of the ESRC-funded project ES/M009750/1 ‘Historicising “historical child sexual abuse” cases: social, political and criminal justice contexts’. The purpose of this specific element of the project was to identify patterns in newspaper coverage across time.
The historical sexual abuse of children has become a central focal point of political, social and legal concern. On 7 July 2014 Home Secretary Theresa May announced a public inquiry into how complaints of sexual abuse have been dealt with by public bodies over the last 40 years; the inquiry will produce an interim report by May 2015, with a full report to follow at a later stage. A 10-week investigation has also been launched into allegations relating to Whitehall politicians. These announcements follow the NHS and Department of Health Investigations into Matters Relating to Jimmy Savile (published on 26 June 2014); a second report is due in 2015. The enquiries will hear important evidence from witnesses and examine files associated with the bodies under scrutiny. As yet, however, our knowledge of the broader history of sexual abuse in the twentieth century is extremely partial, with some incidents well charted and others ignored. A full understanding of the wider historical circumstances that have shaped social, legal and political responses to child sexual abuse (or their lack) is urgently needed to provide missing information to contextualise and complement these public inquiries.
This research project will carry out rapid desk-based research, using very significant sets of online sources that are already available in digital form, but whose potential for research into the history of child sexual abuse has not been realised. It will cover four significant areas:
We will construct quantitative profiles of the extent of the reporting and convictions of sexual offences from 1918 to 1990, making use of the published Criminal Justice Statistics for England and Wales.
We will carry out a qualitative longitudinal study of the role of the national and local newspaper press in reporting cases of child sexual abuse, and in shaping social attitudes towards young people and sexuality in the period 1918-1990. The newspaper press was a crucial arena through which public opinion was shaped and shifting moralities were discussed and debated for much of the twentieth century. Whilst the press cannot be viewed as an unproblematic barometer of opinion, it provides historians with an important lens through which to access a range of viewpoints and to chart dominant tropes and narratives. A survey of the newspaper press also enables us to access reports of the decisions that were made in the court-room and thus to further explain the trends for reporting and conviction that analysis of the criminal justice statistics reveals.
We will examine the shifting viewpoints of key professional groups, including social workers and lawyers, by undertaking a survey of publications associated with these occupational groups.
We will begin a mapping of organisations, bodies and associations that have commented on and campaigned around issues relating to children and sexuality across the broad period 1918-1990. This initial mapping will involve research into the availability of archival and manuscript sources (including those held in the National Archives and local repositories) and will form the basis of a further funding application.
Our timetable is designed to coincide with the undertaking of the public inquiries and the preparation of the further report relating to the NHS and Department of Health Investigations. We will run seminars/workshops for civil servants, lawyers and other professionals involved in these investigations, and make our findings available in a free and easily accessible format as briefings on the History & Policy website. Thus our project will provide essential knowledge to shape discussion and debate, and to inform the final public inquiry reports.
https://spdx.org/licenses/CC0-1.0.html
In an era where some find fake news around every corner, the use of sensationalism has inevitably found its way into the scientific literature. This is especially the case for host manipulation by parasites, a phenomenon in which a parasite causes a remarkable change in the appearance or behaviour of its host. This concept, which has deservedly garnered popular interest throughout the world in recent years, is nearly 50 years old. In the past two decades, the use of scientific metaphors, including anthropomorphisms and science fiction, to describe host manipulation has become more and more prevalent. It is possible that the repeated use of such catchy, yet misleading words in both the popular media and the scientific literature could unintentionally hamper our understanding of the complexity and extent of host manipulation, ultimately shaping its narrative in part or in full. In this commentary, the impacts of exaggerating host manipulation are brought to light by examining trends in the use of embellishing words. By looking at key examples of exaggerated claims from widely reported host-parasite systems found in the recent scientific literature, it would appear that some of the fiction surrounding host manipulation has since become fact.
Methods
Document entitled "Doherty_procb_ESM": Electronic supplementary material containing the original data used to make Figure 1.
PART A: Online newspaper and magazine articles covering host manipulation that were used to create the word cloud in Figure 1A. METHODS: Online newspaper and magazine articles were found with the Google search engine by combining the word “parasite” with “manipulation”, “mind”, “hijack”, or “zombie”. Articles were then randomly selected from the first five pages of results. A text file was created to include the titles and headlines (if any) from all the selected articles. This file was then used to generate a word cloud with the packages tm, wordcloud, and RColorBrewer in R version 3.6.3 (R Core Team, 2020).
PART B: Scientific papers that were used to create the graph in Figure 1B. For each group, the search terms used in Web of Science are provided in parentheses. METHODS: Using the results from the search terms, articles were assessed individually and then grouped by year.
Document entitled "figure_1a_media": Text file including all the media titles and headlines (if any) described in the document entitled "Doherty_procb_ESM". The titles and headlines were trimmed down to contain only nouns, verbs, and descriptive words.
Document entitled "figure_1a_rcode": Code used in R to create the word cloud in Figure 1A.
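The word-count step behind a word cloud like the one in Figure 1A can be sketched as follows. This is a minimal Python illustration, not the authors' R code (which uses the tm, wordcloud, and RColorBrewer packages). The stop-word list, the function name `word_frequencies`, and the sample headlines are all assumptions invented for illustration, and no headline is taken from the dataset.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; the original R workflow
# relies on the tm package, which provides its own.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "in", "into", "to",
    "that", "their", "its", "how", "is", "are",
}


def word_frequencies(headlines):
    """Count how often each non-stop word appears across the headlines."""
    counts = Counter()
    for line in headlines:
        # Lower-case the text and keep runs of ASCII letters as tokens.
        tokens = re.findall(r"[a-z]+", line.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


# Invented example headlines (assumption), standing in for the media titles file.
sample = [
    "Zombie ants and the parasite that controls them",
    "How a parasite can hijack the mind of its host",
    "Mind control: the parasite behind zombie snails",
]

freqs = word_frequencies(sample)
top = freqs.most_common(3)  # the most frequent words would be drawn largest
```

A word-cloud renderer then scales each word by its count, so frequently repeated terms such as "zombie" or "hijack" dominate the image.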
The graph shows leading daily newspapers with paywalls in the United States from April to September 2014, by number of paid restricted access website accounts. In that time period, the Los Angeles Times ranked fourth with nearly 60 thousand paid restricted access website accounts.
Digital publishing – additional information
The New York Times has been the most successful American daily when it comes to attracting readers willing to pay for its online content. The paywall, which was introduced in March 2011, allows users to read 20 articles a month for free. Once that limit has been reached, users are required to pay in order to read more articles. According to the New York Times Company’s own data, the number of paid subscribers to its digital-only products amounted to 990 thousand in the second quarter of 2015, reflecting steady growth since the paywall was introduced. Circulation revenue now exceeds advertising revenue. The New York Times seems to belong to a successful minority. In a survey in late 2013, more publishers reported that introducing a paywall had decreased traffic on their website than reported an increase: a third of the respondents saw traffic fall, whereas a quarter stated the paywall boosted traffic. Most publishers allow their readers to view five or ten free articles a month, which makes the aforementioned New York Times model of 20 articles more than generous in comparison. In general, the U.S. digital publishing industry is expected to thrive. Between 2014 and 2020, revenues are predicted to nearly double, reaching approximately 10.3 billion U.S. dollars. Among the three major types of digital publishing products – e-books, e-magazines and e-newspapers – it is e-newspapers that will develop the most rapidly over that period.
During a 2024 survey, 77 percent of respondents from Nigeria stated that they used social media as a source of news. In comparison, just 23 percent of Japanese respondents said the same. Large portions of social media users around the world admit that they do not trust social platforms either as media sources or as a way to get news, and yet they continue to access such networks on a daily basis.
Social media: trust and consumption
Despite the majority of adults surveyed in each country reporting that they used social networks to keep up to date with news and current affairs, a 2018 study showed that social media is the least trusted news source in the world. Less than 35 percent of adults in Europe considered social networks to be trustworthy in this respect, yet more than 50 percent of adults in Portugal, Poland, Romania, Hungary, Bulgaria, Slovakia and Croatia said that they got their news on social media. What is clear is that we live in an era where social media is such an enormous part of daily life that consumers will still use it in spite of their doubts or reservations. Concerns about fake news and propaganda on social media have not stopped billions of users accessing their favorite networks on a daily basis. Most Millennials in the United States use social media for news every day, and younger consumers in European countries are much more likely to use social networks for national political news than their older peers. Like it or not, reading news on social media is fast becoming the norm for younger generations, and this form of news consumption will likely increase further regardless of whether consumers fully trust their chosen network or not.
According to a survey held among adults in the United States in February 2022, ABC and CBS were considered to be the most credible news sources in the country, with 61 percent of respondents believing the organizations to be very or somewhat credible. Sources which fared less well were MSNBC, Fox News, National Public Radio, and HuffPost, with less than 50 percent of adults agreeing that they found these to be reliable news outlets. The credibility of all the news sources in the ranking was higher in 2022 than in the previous year, though the figures in 2021 were particularly low.
Trust and bias in news
Finding trustworthy, impartial news sources can be difficult for audiences in a world where fake news is in constant circulation and bias in news is a growing concern. More than 50 percent of total respondents to a survey held in early 2020 believed that there was a fair amount or great deal of bias in the news sources they used most often. The same study found that close to 70 percent of respondents were more concerned with bias in news that other people may consume than with their own news source.
A report exploring trust in news found that radio, network news, and newspapers were the most trusted news sources in the United States, whereas social media was not considered reliable in this regard. The lack of trust in news on social media has yet to affect consumption – social networks are the most used source of news among many consumers, particularly younger generations. In fact, some news consumers are moving away from official news platforms altogether and getting their updates from influencers rather than journalists.
According to the most recently available data, there were 1,279 daily newspapers in the United States in 2018. The number of daily newspapers in the U.S. has been on the decline since 1970, when there were 1,748 daily news publications in the country. However, given the ongoing struggle of print media around the world, a decrease of around 470 newspapers over several decades is more positive than many might expect.
Daily newspapers in the U.S.
Whilst the actual number of daily newspapers has remained comparatively stable since the 1970s, the same cannot be said for circulation figures. In 2017, the paid circulation of daily newspapers in the United States amounted to 30.92 million, only around half the figure recorded for 1985. Even the major players in the industry are suffering – the Chicago Tribune’s daily circulation fell from just over 438 thousand in September 2017 to 238 thousand in early 2019. Household names like The New York Times and The Wall Street Journal also saw a sharp drop in circulation figures, leaving little hope for smaller publications.
News consumers and the reluctance to pay
Media markets across the world have become saturated with digital alternatives to print, and U.S. consumers can now access news content, gossip columns and reports on the latest sports games from a variety of sources, rendering daily newspapers in particular less valuable and sought after than ever before. The reasons to pick up a newspaper on your daily commute when the information you seek is available online are becoming hazy – why pay for print when you can get the digital version for free? In fact, a 2018 study revealed that the vast majority of surveyed U.S. adults had not paid for any local news content in the last year.
However, a small solace for print-only news outlets is that even digital news providers are not completely safe from consumers’ reluctance to pay. A report revealed that the wealth of free content available was the main reason why U.S. consumers were unwilling to pay for online news.
Sadly, the future for print outlets does not look bright, and news providers in general face a constant uphill struggle to maintain integrity, prove accuracy and capture the attention of new and current consumers alike.