Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Dataset Card for CNN Dailymail Dataset
Dataset Summary
The CNN / DailyMail Dataset is an English-language dataset containing just over 300k unique news articles as written by journalists at CNN and the Daily Mail. The current version supports both extractive and abstractive summarization, though the original version was created for machine reading and comprehension and abstractive question answering.
Supported Tasks and Leaderboards
'summarization': Versions… See the full description on the dataset page: https://huggingface.co/datasets/abisee/cnn_dailymail.
The dataset consists of 348 online news stories from two UK online newspapers (Daily Mail and Evening Standard) during four specific periods of time [Period 1: May 7th – May 21st 2014 / Period 2: September 16th – September 30th 2014 / Period 3: May 7th – May 21st 2015 / Period 4: September 16th – September 30th 2015]. The aim of the dataset is to identify the presence or absence of EU values and identities within the selected UK media products, for example regarding euro-skeptical and/or pro-EU arguments, and their relation to young people. The dataset builds on existing media discourse analyses of European issues and young people in the UK media by highlighting contemporary media-based conceptualizations of Europe and young people from both a news and a cultural perspective, in the context of a political and social climate where European identity is deeply contested, and where the global migration effects of regional conflicts have also challenged understandings of and claims to European citizenship.
The national news brand with the highest reported drop in circulation in the United Kingdom was Sunday People, with a circulation decrease of 19 percent. The leading national paper by circulation, the Daily Mail, saw a drop of 10 percent.
Online newspaper readership
The newspaper industry is having to make major readjustments to accommodate digital readership. Physical paper sales have been falling as consumers increasingly turn to online news outlets. Data on the monthly reach of national newspapers revealed that smartphones are the most popular platform for newspaper consumption across all major news brands, though the free paper Metro had the highest print reach thanks to its regular distribution across London.
Journalists
Despite the fall in newspaper circulation, the number of journalists in the UK is climbing. In 2020, over 100 thousand journalists and newspaper and periodical editors were employed in the United Kingdom, the highest number recorded since 2010.
The newspaper with the highest print circulation in the United States in the six months running to September 2023 was The Wall Street Journal, with an average weekday print circulation of 555.2 thousand. Ranking second was The New York Times, followed by The New York Post. The paper in the ranking with the highest year-over-year drop in circulation was The Denver Post, with a decline of 25 percent (although Buffalo News recorded a higher drop, its data does not refer to September 2022 to September 2023; see notes).