This dataset contains metadata of millions of news articles from Google News, including title, publisher, DateTime, link, and category.
This is also an automation project: data is scraped every day at 4am UTC across 8 major categories. The dataset is updated monthly: the data collected daily is merged into a single monthly CSV file and published on Kaggle at the end of each month. The value of the dataset should therefore grow continuously over time.
If you find this dataset useful, feel free to drop a like. If you have any requests, suggestions, or inquiries, feel free to leave them in the comments section as well.
As mentioned, each monthly CSV file contains 5 main columns:
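The daily-to-monthly merge described above could look roughly like the following. This is a minimal sketch, not the author's pipeline: the function name `merge_daily_frames`, the placeholder rows, and the choice to de-duplicate on a `Link` column are all assumptions made for illustration.

```python
import pandas as pd


def merge_daily_frames(daily_frames):
    """Concatenate per-day DataFrames into one monthly DataFrame."""
    monthly = pd.concat(daily_frames, ignore_index=True)
    # Assumption: the same article can be scraped on more than one day,
    # so keep only the first row seen for each link.
    monthly = monthly.drop_duplicates(subset="Link", keep="first")
    return monthly.reset_index(drop=True)


# Placeholder rows for illustration only; not taken from the real dataset.
day1 = pd.DataFrame(
    {"Title": ["Example A"], "Link": ["https://example.com/a"]}
)
day2 = pd.DataFrame(
    {
        "Title": ["Example A", "Example B"],
        "Link": ["https://example.com/a", "https://example.com/b"],
    }
)

monthly = merge_daily_frames([day1, day2])
print(len(monthly))
```

The merged frame could then be written out with `monthly.to_csv(...)` under whatever file name the monthly release uses.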
1. Title: The title of the news article
2. Publisher: The publisher of the news article
3. DateTime: The date and time at which the news article was published on Google News
4. Link: A link to the corresponding article; users may follow the links to scrape extended content
5. Category: One of the 8 major categories defined by Google News: Business, Entertainment, Headlines, Health, Science, Sports, Technology, and WorldWide.
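As a rough illustration of working with the five columns listed above, the sketch below loads CSV data with pandas and counts articles per category. It assumes the column headers are spelled exactly as in the list and that `DateTime` is in a format `pandas.to_datetime` can parse; neither is confirmed by the dataset description. The sample rows are placeholders, and `load_monthly_news` is a hypothetical helper name.

```python
import io

import pandas as pd

# Placeholder rows for illustration only; not taken from the real dataset.
SAMPLE_CSV = """Title,Publisher,DateTime,Link,Category
Example headline A,Example Publisher,2022-01-01T04:00:00Z,https://example.com/a,Business
Example headline B,Example Publisher,2022-01-01T04:00:00Z,https://example.com/b,Health
Example headline C,Another Publisher,2022-01-02T04:00:00Z,https://example.com/c,Business
"""


def load_monthly_news(source):
    """Read one monthly CSV (a path or file-like object) into a DataFrame."""
    df = pd.read_csv(source)
    # Assumption: timestamps are parseable; unparseable values become NaT.
    df["DateTime"] = pd.to_datetime(df["DateTime"], utc=True, errors="coerce")
    return df


news = load_monthly_news(io.StringIO(SAMPLE_CSV))
per_category = news.groupby("Category").size()
print(per_category)
```

For a real monthly file, pass its path to `load_monthly_news` in place of the in-memory sample.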

Terms: https://webtechsurvey.com/terms
A complete list of live websites using the Google News technology, compiled through global website indexing conducted by WebTechSurvey.

A total of 20 percent of U.S. adults responding to a survey in February 2022 said that they thought Google News was very credible, and eight percent found the source to be not at all credible. Google News' credibility rating was higher among Black and Hispanic respondents than among their white counterparts, and Gen Z and millennials were also more likely than their older peers to consider Google News a very credible source of information.

Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset was created by Didier Salazar
Released under Apache 2.0

Terms: https://webtechsurvey.com/terms
A complete list of live websites using the Simple Google News De technology, compiled through global website indexing conducted by WebTechSurvey.

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset consists of article titles with their snippets, publisher names, and the timestamps at which they were posted. The articles are categorised into 7 categories, i.e. ['Buisness', 'entertainment', 'world', 'health', 'sport', 'science', 'technology']. Have fun with it!

Terms: https://webtechsurvey.com/terms
A complete list of live websites using the Google News Automatic Widget technology, compiled through global website indexing conducted by WebTechSurvey.

This dataset provides comprehensive access to news articles and headlines from Google News in real time. Get top news globally or by specific topics, with support for geographic targeting and custom search queries. It suits applications that require news monitoring, media analysis, and content aggregation. The dataset is delivered in JSON format via a REST API.

Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This is a dataset of sports news headlines from Google News. The dataset is updated daily. It has 3 fields:
1. Headline - The headline of the news item
2. Sport - The sport in question
3. Date - The day the news was scraped.

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset contains the set of records extracted from the main pages of several editions of Google News (Brazil, Colombia, Mexico, Portugal, Spain). The data were extracted using a web scraping tool and integrated into a structured database.

This repository hosts the word2vec word vector model pre-trained on the Google News corpus (3 billion running words). The model contains 3 million 300-dimensional English word vectors.
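To get a feel for the scale of 3 million 300-dimensional vectors, the arithmetic below estimates the size of the raw embedding matrix. It is a back-of-the-envelope sketch that assumes each value is stored as a 4-byte float32, which the description does not state, and it ignores the vocabulary strings and any file-format overhead.

```python
NUM_VECTORS = 3_000_000   # from the description above
DIMENSIONS = 300          # from the description above
BYTES_PER_VALUE = 4       # assumption: float32 storage


def embedding_matrix_bytes(num_vectors, dimensions, bytes_per_value):
    """Size in bytes of a dense num_vectors x dimensions matrix."""
    return num_vectors * dimensions * bytes_per_value


size_bytes = embedding_matrix_bytes(NUM_VECTORS, DIMENSIONS, BYTES_PER_VALUE)
print(f"{size_bytes / 1e9:.1f} GB")      # 3.6 GB
print(f"{size_bytes / 2**30:.2f} GiB")   # 3.35 GiB
```

Under that assumption, loading the full model needs several gigabytes of memory.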

Correlation between scientific production (as captured by Google Scholar and PubMed), news coverage (as captured by Google News), web queries (as captured by Google Trends), access to Wikipedia pages, and Internet activities (as captured by Twitter and YouTube).

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset was created by Ahmad Khaled
Released under MIT

The Google News app was downloaded more than ****** times in Japan in December 2023. The total number of downloads during that year reached more than *******. The news aggregation app was released by Google LLC in 2012.

Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Results of scraping Google News search results for "JPY" (2017-2022).
Markets
jpy,news,google news,google
1233
$1700.00

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
General characteristics of health news and scientific articles.
*[IQR], interquartile range

Opera News ranked as the leading news and magazine mobile app in the Google Play Store in Germany as of December 2021, with around 99.1 thousand downloads. It was followed by ZDFheute - Nachrichten with 86.9 thousand.

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Attributes:
This dataset comprises three attributes: the first corresponds to Headlines 1, the second to Headlines 2, and the third to the target variable. Both headlines come from news extracted from Google News, while the target variable indicates whether the two headlines relate to the same event (1) or not (0).
Data Source:
The dataset is derived from Google News headlines between July 23, 2022, and July 30, 2022, which were manually annotated.… See the full description on the dataset page: https://huggingface.co/datasets/cmunhozc/google_news_en.

Stereotype Content Dictionary: A semantic space of 3 million words and phrases using Google News word2vec embeddings