64 datasets found
  1. Hong Kong's most visited websites 2024

    • statista.com
    Updated Feb 15, 2025
    Cite
    Statista (2025). Hong Kong's most visited websites 2024 [Dataset]. https://www.statista.com/statistics/1054071/hong-kong-most-popular-websites/
    Dataset updated
    Feb 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Sep 1, 2024 - Nov 30, 2024
    Area covered
    Hong Kong
    Description

    Between September and November 2024, google.com was the most visited website in Hong Kong with 338 million average monthly visits. In terms of monthly traffic and pages per visit, international news website Yahoo.com ranked higher than the local news website hk01.com.

  2. Google Analytics & Twitter dataset from a movies, TV series and videogames...

    • figshare.com
    • portalcientificovalencia.univeuropea.com
    txt
    Updated Feb 7, 2024
    Cite
    Víctor Yeste (2024). Google Analytics & Twitter dataset from a movies, TV series and videogames website [Dataset]. http://doi.org/10.6084/m9.figshare.16553061.v4
    Available download formats
    txt
    Dataset updated
    Feb 7, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Víctor Yeste
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Author: Víctor Yeste. Universitat Politècnica de Valencia.

    The object of this study is the design of a cybermetric methodology whose objectives are to measure the success of content published in online media and, where possible, to predict the selected success variables.

    Because the study needs to integrate data from two separate areas, web publishing and the analysis of shares and related topics on Twitter, the data were collected programmatically through both the Google Analytics v4 reporting API and the Twitter Standard API, always respecting their limits.

    The website analyzed is hellofriki.com. It is an online media outlet whose primary intention is to meet the need for information on topics that generate a vast amount of news every day, as well as analysis, reports, interviews, and many other information formats. All of this content falls under the sections of cinema, series, video games, literature, and comics.

    This dataset has contributed to the elaboration of the PhD thesis: Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009

    Data have been obtained from each breaking news article published online, according to the indicators described in the doctoral thesis. All related data are stored in a database, divided into the following tables:

    tesis_followers: User ID list of media account followers.

    tesis_hometimeline: data from tweets posted by the media account sharing breaking news from the web.
    • status_id: Tweet ID
    • created_at: date of publication
    • text: content of the tweet
    • path: URL extracted after processing the shortened URL in text
    • post_shared: Article ID in WordPress that is being shared
    • retweet_count: number of retweets
    • favorite_count: number of favorites

    tesis_hometimeline_other: data from tweets posted by the media account that do not share breaking news from the web: other typologies, automatic Facebook shares, custom tweets without a link to an article, etc. It has the same fields as tesis_hometimeline.

    tesis_posts: data of articles published by the web and processed for some analysis.
    • stats_id: Analysis ID
    • post_id: Article ID in WordPress
    • post_date: article publication date in WordPress
    • post_title: title of the article
    • path: URL of the article on the media website
    • tags: Tags ID or WordPress tags related to the article
    • uniquepageviews: unique page views
    • entrancerate: entrance rate
    • avgtimeonpage: average time on page
    • exitrate: exit rate
    • pageviewspersession: page views per session
    • adsense_adunitsviewed: number of ads viewed by users
    • adsense_viewableimpressionpercent: ad display ratio
    • adsense_ctr: ad click ratio
    • adsense_ecpm: estimated ad revenue per 1000 page views

    tesis_stats: data from a particular analysis, performed at each published breaking news item. Fields with statistical values can be computed from the data in the other tables, but total and average calculations are saved for faster and easier further processing.
    • id: ID of the analysis
    • phase: phase of the thesis in which analysis has been carried out (right now all are 1)
    • time: "0" if at the time of publication, "1" if 14 days later
    • start_date: date and time of measurement on the day of publication
    • end_date: date and time when the measurement is made 14 days later
    • main_post_id: ID of the published article to be analysed
    • main_post_theme: Main section of the published article to analyze
    • superheroes_theme: "1" if about superheroes, "0" if not
    • trailer_theme: "1" if trailer, "0" if not
    • name: empty field, possibility to add a custom name manually
    • notes: empty field, possibility to add personalized notes manually, for example if a tag has been removed manually for being considered too generic, even though the editor added it
    • num_articles: number of articles analysed
    • num_articles_with_traffic: number of articles analysed with traffic (which will be taken into account for traffic analysis)
    • num_articles_with_tw_data: number of articles with data from when they were shared on the media’s Twitter account
    • num_terms: number of terms analyzed
    • uniquepageviews_total: total page views
    • uniquepageviews_mean: average page views
    • entrancerate_mean: average entrance rate
    • avgtimeonpage_mean: average duration of visits
    • exitrate_mean: average exit rate
    • pageviewspersession_mean: average page views per session
    • total: total of ads viewed
    • adsense_adunitsviewed_mean: average of ads viewed
    • adsense_viewableimpressionpercent_mean: average ad display ratio
    • adsense_ctr_mean: average ad click ratio
    • adsense_ecpm_mean: estimated ad revenue per 1000 page views
    • Total: total income
    • retweet_count_mean: average income
    • favorite_count_total: total of favorites
    • favorite_count_mean: average of favorites
    • terms_ini_num_tweets: total tweets on the terms on the day of publication
    • terms_ini_retweet_count_total: total retweets on the terms on the day of publication
    • terms_ini_retweet_count_mean: average retweets on the terms on the day of publication
    • terms_ini_favorite_count_total: total of favorites on the terms on the day of publication
    • terms_ini_favorite_count_mean: average of favorites on the terms on the day of publication
    • terms_ini_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the terms on the day of publication
    • terms_ini_user_num_followers_mean: average followers of users who have spoken of the terms on the day of publication
    • terms_ini_user_num_tweets_mean: average number of tweets published by users who spoke about the terms on the day of publication
    • terms_ini_user_age_mean: average age in days of users who have spoken of the terms on the day of publication
    • terms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about terms on the day of publication
    • terms_end_num_tweets: total tweets on terms 14 days after publication
    • terms_ini_retweet_count_total: total retweets on terms 14 days after publication
    • terms_ini_retweet_count_mean: average retweets on terms 14 days after publication
    • terms_ini_favorite_count_total: total bookmarks on terms 14 days after publication
    • terms_ini_favorite_count_mean: average of favorites on terms 14 days after publication
    • terms_ini_followers_talking_rate: ratio of media Twitter account followers who have recently posted a tweet talking about the terms 14 days after publication
    • terms_ini_user_num_followers_mean: average followers of users who have spoken of the terms 14 days after publication
    • terms_ini_user_num_tweets_mean: average number of tweets published by users who have spoken about the terms 14 days after publication
    • terms_ini_user_age_mean: the average age in days of users who have spoken of the terms 14 days after publication
    • terms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about terms 14 days after publication

    tesis_terms: data of the terms (tags) related to the processed articles.
    • stats_id: Analysis ID
    • time: "0" if at the time of publication, "1" if 14 days later
    • term_id: Term ID (tag) in WordPress
    • name: Name of the term
    • slug: URL of the term
    • num_tweets: number of tweets
    • retweet_count_total: total retweets
    • retweet_count_mean: average retweets
    • favorite_count_total: total of favorites
    • favorite_count_mean: average of favorites
    • followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the term
    • user_num_followers_mean: average followers of users who were talking about the term
    • user_num_tweets_mean: average number of tweets published by users who were talking about the term
    • user_age_mean: average age in days of users who were talking about the term
    • url_inclusion_rate: URL inclusion ratio
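
    The description says that the statistical fields in tesis_stats can be computed from the other tables. As a rough sketch of what that could look like, the Python snippet below derives a total and a mean of uniquepageviews for one analysis from rows shaped like tesis_posts. The sample rows and the helper name summarize_pageviews are made up for illustration, and grouping by stats_id is an assumption about how the tables relate, not the author's documented procedure.

```python
from statistics import mean

# Hypothetical rows shaped like tesis_posts; the values are invented.
posts = [
    {"stats_id": 1, "post_id": 101, "uniquepageviews": 120},
    {"stats_id": 1, "post_id": 102, "uniquepageviews": 80},
    {"stats_id": 2, "post_id": 103, "uniquepageviews": 50},
]


def summarize_pageviews(rows, stats_id):
    """Return (total, mean) of uniquepageviews for one analysis ID.

    Returns (0, None) when the analysis has no rows, so callers can
    tell "no data" apart from "zero page views".
    """
    values = [row["uniquepageviews"] for row in rows if row["stats_id"] == stats_id]
    if not values:
        return 0, None
    return sum(values), mean(values)


total, average = summarize_pageviews(posts, 1)
print(total, average)
```

    The same pattern would apply to the other *_total and *_mean fields, assuming they aggregate the matching column of tesis_posts.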

  3. Buzzfeednews.com average visit length per user worldwide 2022-2024

    • statista.com
    Updated Feb 15, 2024
    + more versions
    Cite
    Statista (2024). Buzzfeednews.com average visit length per user worldwide 2022-2024 [Dataset]. https://www.statista.com/statistics/1477780/buzzfeednews-com-time-spent-per-visit/
    Dataset updated
    Feb 15, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Apr 2022 - Jan 2024
    Area covered
    World
    Description

    In the period between its release in November 2022 and January 2024, the average duration of global visits to Buzzfeednews.com fluctuated noticeably. Despite the website's news division shutting down in April 2023, visitors worldwide spent *** seconds on average on the platform's domain in the last examined month, equating to ** minutes and ** seconds. The session length peaked in November 2023, when users worldwide spent an average of *** seconds on the web page.

  4. MIT AI news dataset

    • kaggle.com
    zip
    Updated Aug 21, 2025
    Cite
    Yousef Fawzi (2025). MIT AI news dataset [Dataset]. https://www.kaggle.com/datasets/losif01/mit-ai-news-dataset
    Available download formats
    zip (808350 bytes)
    Dataset updated
    Aug 21, 2025
    Authors
    Yousef Fawzi
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    📄 Dataset Description

    This dataset contains articles scraped from the Massachusetts Institute of Technology (MIT) News website, specifically focusing on topics related to Artificial Intelligence, Machine Learning, Robotics, and Emerging Technologies.

    The data was collected from the MIT News topic page:
    👉 https://news.mit.edu/topic/artificial-intelligence2

    Each entry includes:
    • Title of the article
    • Author(s)
    • Publication date
    • Summary (dek)
    • Full article body text
    • URL to the original article
    • Link to related research paper (e.g., Nature, Science) when available

    The dataset spans multiple research domains, including:
    • AI for drug discovery & healthcare
    • Protein language models
    • Sustainable AI and eco-driving
    • Robotics and embodied intelligence
    • Chemistry and materials science
    • Climate and clean energy

    This dataset is ideal for:
    • Natural Language Processing (NLP) tasks (summarization, topic modeling, sentiment analysis)
    • Trend analysis in AI and scientific research
    • Text classification and information retrieval
    • Educational projects and AI literacy
    • Knowledge graph construction of AI research

    ⚠️ Important Notes

    • All content is copyright of MIT News and is shared under non-commercial, educational use only.
    • This dataset was collected respectfully, with delays between requests, in accordance with MIT’s robots.txt and ethical web scraping practices.
    • The full text of articles is included to enable research, but users are encouraged to cite original sources and visit the MIT News website for the latest updates.

    📁 Columns

    Column              Description
    title               Article headline
    author              Author(s) of the article
    publication_date    Human-readable publication date
    datetime            ISO-formatted publication timestamp
    summary             Article summary (lead paragraph)
    body                Full article text
    paper_link          URL to the related research paper (e.g., Nature)
    url                 Direct link to the MIT News article
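
    To make the column layout above concrete, here is a minimal Python sketch that reads rows with these eight columns and parses the ISO timestamp. It assumes the archive contains a CSV file with exactly these headers, which the listing does not confirm; the sample row, SAMPLE_CSV, and the helper name load_articles are invented for illustration.

```python
import csv
import io
from datetime import datetime

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "title", "author", "publication_date", "datetime",
    "summary", "body", "paper_link", "url",
]

# Invented sample standing in for the real file, whose name is not given.
SAMPLE_CSV = (
    "title,author,publication_date,datetime,summary,body,paper_link,url\n"
    "Example headline,Jane Doe,\"August 1, 2025\",2025-08-01T10:00:00,"
    "Short dek,Full text,,https://example.org/article\n"
)


def load_articles(handle):
    """Read article rows, parse `datetime`, and map empty paper links to None."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    articles = []
    for row in reader:
        row["datetime"] = datetime.fromisoformat(row["datetime"])
        row["paper_link"] = row["paper_link"] or None
        articles.append(row)
    return articles


articles = load_articles(io.StringIO(SAMPLE_CSV))
print(articles[0]["title"], articles[0]["datetime"].date())
```

    For the real data, the io.StringIO object would be replaced by an open file handle; timestamps that carry a format other than plain ISO 8601 would need different parsing.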

    🔗 Source

    🙌 Inspiration

    Use this dataset to:
    • Track how AI is being applied across scientific disciplines
    • Build a news aggregator for AI research
    • Train a model to predict research trends
    • Create a search engine for MIT’s AI breakthroughs

    License

    This dataset is shared under Kaggle’s Terms of Service for non-commercial, educational, and research purposes.
    The original content remains the property of MIT News and should be properly attributed.

  5. News articles and front pages from 19 Swedish news sites during the...

    • researchdata.se
    Updated Nov 2, 2021
    Cite
    Peter M. Dahlgren (2021). News articles and front pages from 19 Swedish news sites during the covid-19/corona pandemic 2020–2021 [Dataset]. http://doi.org/10.5878/d18f-q220
    Available download formats
    (477962370), (255819)
    Dataset updated
    Nov 2, 2021
    Dataset provided by
    University of Gothenburg
    Authors
    Peter M. Dahlgren
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2021 - Apr 26, 2021
    Area covered
    Sweden
    Description

    This dataset contains news articles from Swedish news sites during the covid-19 corona pandemic 2020–2021. The purpose was to develop and test new methods for collection and analyses of large news corpora by computational means. In total, there are 677,151 articles collected from 19 news sites during 2020-01-01 to 2021-04-26. The articles were collected by scraping all links on the homepages and main sections of each site every two hours, day and night.

    The dataset also includes about 45 million timestamps at which the articles were present on the front pages (homepages and main sections of each news site, such as domestic news, sports, editorials, etc.). This allows for detailed analysis of what articles any reader likely was exposed to when visiting a news site. The time resolution is (as stated previously) two hours, meaning that you can detect changes in which articles were on the front pages every two hours.
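
    Since each timestamp corresponds to one two-hourly scrape, a rough exposure estimate for an article is the number of distinct scrapes in which it appeared, multiplied by two hours. The Python sketch below illustrates that arithmetic on invented data; the (article_id, timestamp) pair layout and the helper name approximate_exposure_hours are assumptions, and the real column names are documented in the dataset's README.

```python
from collections import Counter

SCRAPE_INTERVAL_HOURS = 2  # time resolution stated in the description


def approximate_exposure_hours(observations):
    """Estimate front-page exposure per article.

    `observations` is an iterable of (article_id, timestamp) pairs.
    Duplicate pairs (e.g. the same article seen in two sections during
    one scrape) are counted once.
    """
    scrapes_per_article = Counter(article_id for article_id, _ in set(observations))
    return {
        article_id: count * SCRAPE_INTERVAL_HOURS
        for article_id, count in scrapes_per_article.items()
    }


# Invented observations for illustration only.
observations = [
    ("a", "2021-01-01T00:00"),
    ("a", "2021-01-01T02:00"),
    ("b", "2021-01-01T00:00"),
    ("a", "2021-01-01T00:00"),  # duplicate of the first pair
]
print(approximate_exposure_hours(observations))
```

    This is only an approximation: an article could have entered or left the front page at any point between two scrapes.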

    The 19 news sites are aftonbladet.se, arbetet.se, da.se, di.se, dn.se, etc.se, expressen.se, feministisktperspektiv.se, friatider.se, gp.se, nyatider.se, nyheteridag.se, samnytt.se, samtiden.nu, svd.se, sverigesradio.se, svt.se, sydsvenskan.se and vlt.se.

    Due to copyright, the full text is not available but instead transformed into a document-term matrix (in long format) which contains the frequency of all words for each article (in total, 80 million words). Each article also includes extensive metadata that was extracted from the articles themselves (URL, document title, article heading, author, publish date, edit date, language, section, tags, category) and metadata that was inferred by simple heuristic algorithms (page type, article genre, paywall).

    The dataset consists of the following: article_metadata.csv (53 MB): The file contains information about each news article, one article per row. In total, there are 677,151 observations and 17 variables.

    article_text.csv (236 MB): The file contains the id of each news article and how many times (count) a specific word occurs in the news article. The file contains 80,090,784 observations and 3 variables in long format.

    frontpage_timestamps.csv (175 MB): The file contains when each news article was found on the front page (homepage and main sections) of the news sites. The file contains 45,337,740 observations and 4 variables in long format.

    More information about the content in the files is found in the README-file. In it you will also find the R-script for using the data.
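
    As an illustration of the long-format document-term matrix described for article_text.csv, the Python sketch below regroups (id, word, count) rows into one word-count mapping per article. The header names id, word, and count are assumptions based on the description (only id and count are named there), and the sample rows are invented; the README-file remains the authoritative reference.

```python
import csv
import io
from collections import defaultdict

# Invented sample in long format: one row per (article, word) pair.
SAMPLE_LONG = "id,word,count\n1,vaccin,3\n1,sverige,1\n2,vaccin,2\n"


def build_term_counts(handle):
    """Turn long-format rows into {article_id: {word: count}}."""
    counts = defaultdict(dict)
    for row in csv.DictReader(handle):
        counts[row["id"]][row["word"]] = int(row["count"])
    return dict(counts)


term_counts = build_term_counts(io.StringIO(SAMPLE_LONG))
print(term_counts)
```

    With roughly 80 million rows in the real file, loading everything into nested dictionaries may be too memory-hungry; streaming per article or using a sparse matrix would be a reasonable alternative.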

  6. Most popular online news properties in Colombia 2022, by average views per...

    • statista.com
    Updated Jun 15, 2022
    Cite
    Statista (2022). Most popular online news properties in Colombia 2022, by average views per visitor [Dataset]. https://www.statista.com/statistics/1251581/online-news-sites-views-per-visitor-colombia/
    Dataset updated
    Jun 15, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 2022
    Area covered
    Colombia
    Description

    In May 2022, Eltiempo.com had an average of 11 views per visitor, the highest figure among Colombia's news and information-oriented online properties with the highest number of unique users. Semana.com and Pulzo.com followed, each with an average of seven views per visitor. El Tiempo and Pulzo were also among Colombia's most popular online news brands in 2022.

  7. bbc-news

    • huggingface.co
    • opendatalab.com
    Updated Jun 28, 2022
    Cite
    SetFit (2022). bbc-news [Dataset]. https://huggingface.co/datasets/SetFit/bbc-news
    Dataset updated
    Jun 28, 2022
    Dataset authored and provided by
    SetFit
    Description

    BBC News Topic Dataset

    Dataset on BBC News Topic Classification consisting of 2,225 articles published on the BBC News website during 2004-2005. Each article is labeled under one of 5 categories: business, entertainment, politics, sport or tech. Original source for this dataset:

    Derek Greene, Pádraig Cunningham, “Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering,” in Proc. 23rd International Conference on Machine learning (ICML’06)… See the full description on the dataset page: https://huggingface.co/datasets/SetFit/bbc-news.

  8. CBS News/New York Times National Surveys, 1982

    • icpsr.umich.edu
    ascii, sas, spss
    Updated Jan 12, 2006
    + more versions
    Cite
    Inter-university Consortium for Political and Social Research [distributor] (2006). CBS News/New York Times National Surveys, 1982 [Dataset]. http://doi.org/10.3886/ICPSR09053.v1
    Available download formats
    spss, ascii, sas
    Dataset updated
    Jan 12, 2006
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/9053/terms

    Time period covered
    1982
    Area covered
    United States
    Description

    This data collection is part of a continuing series of surveys that solicit public opinion on the presidency and on a range of other political and social issues. Respondents were asked to give their opinions of President Ronald Reagan and his handling of the presidency, foreign policy, and the economy, as well as their views on the Israeli-Lebanese conflict, El Salvador, and the Equal Rights Amendment. These national surveys were administered by telephone to one eligible respondent per household. The data are contained in seven files. Part 1, January 1982, includes data about the Reagan presidency and standard CBS demographic or background variables. Part 2, March 1982, contains questions on El Salvador and the policies of the Reagan Administration. Part 3, May 1982, contains questions on the nuclear freeze movement. Part 4, June 1982 (Part 1), contains a small set of background variables, and several questions about the Israeli-Lebanese conflict and Alexander Haig's resignation as Secretary of State. Part 5, June 1982 (All), contains data about the Equal Rights Amendment and women's movement. Part 6, September 1982, and Part 7, October 1982, are pre-election surveys and they include a number of questions relating to the forthcoming congressional elections, evaluation of the Reagan Administration's policies, the political parties, the impact of various issues on the elections, and the respondent's past voting behavior as well as current voting intentions. Information on demographic characteristics, such as age, sex, race, religion, income, and education, is available for each respondent.

  9. German news headlines (politics and economics)

    • kaggle.com
    zip
    Updated Jan 7, 2022
    Cite
    MatthiasS (2022). German news headlines (politics and economics) [Dataset]. https://www.kaggle.com/datasets/matthiasse/german-news-headlines-politics-and-economics/discussion
    Available download formats
    zip (344492 bytes)
    Dataset updated
    Jan 7, 2022
    Authors
    MatthiasS
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Context

    The project was started as a web-scraping exercise to get more experience, particularly with the scrapy framework. Since I check news from several sources daily, I decided to have a web scraper do the work for me and look for interesting headlines from politics and economics. The news sources have been anonymised and the licence limited to non-commercial use, since this is the prerequisite for scraping the data from those homepages.

    Content

    In the csv file you find around 8400 records of news headlines from 7 different sources. For each record a teaser (or sub-headline) and a headline is provided.

    Acknowledgements

    My thanks go to Upendra who has a great Youtube channel on webscraping (https://www.youtube.com/user/eupendras).

    Inspiration

    All data enthusiasts are highly welcome to use the data and make something out of it. I will try and practise topic modelling as well as translation tasks with transformer models. Any inspiration for this or comments on my notebooks (which I will publish shortly) are highly appreciated!

  10. Market News Price Dataset

    • fisheries.noaa.gov
    • datasets.ai
    • +1 more
    Updated Aug 9, 2022
    + more versions
    Cite
    Northeast Fisheries Science Center (NEFSC) (2022). Market News Price Dataset [Dataset]. https://www.fisheries.noaa.gov/inport/item/26732
    Dataset updated
    Aug 9, 2022
    Dataset provided by
    Northeast Fisheries Science Center
    Authors
    Northeast Fisheries Science Center (NEFSC)
    Time period covered
    Jul 1, 2012 - Nov 22, 2125
    Area covered
    New York, New England, Gloucester, MA, Portland, ME, New Bedford, MA
    Description

    Real-time price data collected by the Boston Market News Reporter. The NOAA Fisheries' "Fishery Market News" began operations in New York City on February 14, 1938. The primary function of this joint Federal/industry program is to provide accurate and unbiased reports depicting current conditions affecting the trade in fish and fishery products. The Boston and New York Market News Reports are...

  11. multinews_dense_oracle

    • huggingface.co
    Updated Feb 23, 2023
    + more versions
    Cite
    Ai2 (2023). multinews_dense_oracle [Dataset]. https://huggingface.co/datasets/allenai/multinews_dense_oracle
    Dataset updated
    Feb 23, 2023
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    License

    https://choosealicense.com/licenses/other/

    Description

    This is a copy of the Multi-News dataset, except the input source documents of the train, validation, and test splits have been replaced with documents returned by a dense retriever. The retrieval pipeline used:

    • query: The summary field of each example
    • corpus: The union of all documents in the train, validation and test splits
    • retriever: facebook/contriever-msmarco via PyTerrier with default settings
    • top-k strategy: "oracle", i.e. the number of documents retrieved, k, is set as the original number of input documents…

    See the full description on the dataset page: https://huggingface.co/datasets/allenai/multinews_dense_oracle.
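
    The "oracle" top-k strategy can be illustrated with a small Python sketch: the summary is the query, documents are ranked against it, and k is taken from the number of documents the example originally had. The scoring function below is a toy word-overlap measure standing in for the dense retriever (it is not facebook/contriever-msmarco or PyTerrier), and all names and sample texts are invented.

```python
def overlap_score(query, document):
    """Toy relevance score: number of distinct lowercase words shared."""
    return len(set(query.lower().split()) & set(document.lower().split()))


def oracle_retrieve(summary, corpus, original_num_docs):
    """Return the top-k documents for `summary`, with k fixed by the oracle.

    k equals the number of source documents the example originally had,
    so the retrieved set has the same size as the original input.
    """
    ranked = sorted(corpus, key=lambda doc: overlap_score(summary, doc), reverse=True)
    return ranked[:original_num_docs]


# Invented corpus and summary for illustration only.
corpus = [
    "storm hits the coast",
    "election results announced",
    "coast guard rescues sailors after storm",
]
retrieved = oracle_retrieve("storm coast rescue", corpus, original_num_docs=2)
print(retrieved)
```

    The point of the oracle setting is that the number of retrieved documents is known in advance, so any difference from the original inputs comes from which documents are retrieved rather than how many.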

  12. ag_news_test

    • huggingface.co
    + more versions
    Cite
    SZ, ag_news_test [Dataset]. https://huggingface.co/datasets/szhuggingface/ag_news_test
    Authors
    SZ
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Stratified and Nested Subsets of AG News for Performance Benchmarking

      Dataset Summary
    

    This repository contains stratified and progressively smaller, nested subsets of the AG News dataset. It was specifically created to benchmark the performance (e.g., accuracy, training time, and resource usage) of fine-tuning language models on varying amounts of training data. By using stratified samples, each training subset maintains the original class distribution of the AG News… See the full description on the dataset page: https://huggingface.co/datasets/szhuggingface/ag_news_test.
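
    One way such stratified, nested subsets could be built is sketched below in Python: indices are shuffled once per class, and each smaller subset takes a shorter prefix of the same shuffled lists, which keeps the class proportions and guarantees nesting. This is an illustrative approach under those assumptions, not the repository's actual procedure; the function name and the sample labels are invented.

```python
import random
from collections import defaultdict


def nested_stratified_subsets(labels, fractions, seed=0):
    """Return one sorted index list per fraction, largest fraction first.

    Each subset keeps roughly the same share of every class, and every
    smaller subset is contained in the larger ones.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    for indices in by_class.values():
        rng.shuffle(indices)  # fixed order per class; prefixes give nesting

    subsets = []
    for fraction in sorted(fractions, reverse=True):
        subset = []
        for indices in by_class.values():
            keep = int(len(indices) * fraction)
            subset.extend(indices[:keep])
        subsets.append(sorted(subset))
    return subsets


# Invented labels: 8 examples of class "a" and 4 of class "b".
labels = ["a"] * 8 + ["b"] * 4
half, quarter = nested_stratified_subsets(labels, [0.5, 0.25])
print(len(half), len(quarter))
```

    Rounding down with int() means very small classes can drop out of the smallest subsets; a production version would need a policy for that case.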

  13. news-categories

    • huggingface.co
    Updated Oct 1, 2025
    Cite
    Momentum AI (2025). news-categories [Dataset]. https://huggingface.co/datasets/momentum-lab/news-categories
    Dataset updated
    Oct 1, 2025
    Authors
    Momentum AI
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    English News Headline Dataset

      Overview
    

    This dataset contains 50,000 English news headlines categorized into 10 topical classes, designed for text classification and NLP studies such as news topic modeling, transfer learning, and zero‑shot evaluation. Each record includes:

    title: news headline text
    topic: one of ten predefined categories
    genre: one of four predefined descriptors of the story style (e.g., Informational, Analysis)
    source: media outlet name
    date:… See the full description on the dataset page: https://huggingface.co/datasets/momentum-lab/news-categories.

  14. ABC News Panama Poll #1, December 1989

    • icpsr.umich.edu
    ascii, sas, spss +1
    Updated Jul 3, 2007
    + more versions
    Cite
    ABC News (2007). ABC News Panama Poll #1, December 1989 [Dataset]. http://doi.org/10.3886/ICPSR09433.v1
    Available download formats
    spss, stata, sas, ascii
    Dataset updated
    Jul 3, 2007
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors
    ABC News
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/9433/terms

    Time period covered
    Dec 20, 1989
    Area covered
    United States
    Description

    This survey focused on the United States military action in Panama. Respondents were asked if they approved of the way President George Bush was handling the situation, if they approved of the United States' having sent military forces to overthrow Manuel Noriega, if they would still approve if the action resulted in a large number of Panamanian civilian casualties, if the reasons Bush had given for invading Panama were good enough to warrant the action, and if sending military forces into Panama to overthrow Noriega was legal under United States law. Other topics covered include comparisons to Viet Nam, using similar military action in Nicaragua, the level of danger to Americans in Panama, Bush's trip to Colombia to discuss the drug problem, and if the action affected the respondents' feelings of pride in the United States. Background information on respondents includes political alignment, age, sex, and state/region of residence.

  15. Ten Thousand German News Articles Dataset

    • tblock.github.io
    • kaggle.com
    csv
    Updated Mar 5, 2019
    Cite
    T. Block (2019). Ten Thousand German News Articles Dataset [Dataset]. https://tblock.github.io/10kGNAD/
    Available download formats
    csv
    Dataset updated
    Mar 5, 2019
    Authors
    T. Block
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    10kGNAD - A German topic classification dataset. Visit the dataset page for more information: https://tblock.github.io/10kGNAD/

  16. News Articles

    • kaggle.com
    zip
    Updated May 6, 2018
    Cite
    harishaaram (2018). News Articles [Dataset]. https://www.kaggle.com/harishcscode/all-news-articles-from-home-page-media-house
    Available download formats
    zip (327948548 bytes)
    Dataset updated
    May 6, 2018
    Authors
    harishaaram
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The data was collected from the home pages of various media houses to see which news outlet shares or writes articles with fewer gory words.

    Content

    The data was downloaded from the following websites between Oct 2017 and Nov 2017:

    1. "http://www.nytimes.com/"
    2. "http://www.foxnews.com/"
    3. "http://www.reuters.com/"
    4. "http://www.cnn.com/"
    5. "http://www.huffingtonpost.com/"
    

    Each folder is named using the mmddyyyy convention, and each CSV file has the media house name as the file name (e.g. reuters.csv). The CSV has the following columns:

    • TITLE: the Title of the article.
    • SUMMARY: first few lines of the article's text.
    • TEXT: Full text inside the article
    • URL: web link to the article.
    • KEYWORDS: important words in the article.
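
The layout described above (mmddyyyy folders, one CSV per media house, five named columns) could be read roughly as follows. This is a minimal sketch, not the dataset author's code: the function names `parse_folder_date` and `read_articles` are hypothetical, and it assumes the CSV header row uses exactly the column names listed above.

```python
import csv
from datetime import date, datetime

# Column names as listed in the dataset description.
COLUMNS = ["TITLE", "SUMMARY", "TEXT", "URL", "KEYWORDS"]


def parse_folder_date(folder_name: str) -> date:
    """Convert an mmddyyyy folder name (e.g. '10152017') into a date."""
    return datetime.strptime(folder_name, "%m%d%Y").date()


def read_articles(csv_file, media_house: str, folder_name: str):
    """Yield one dict per article, tagged with media house and crawl date.

    csv_file is any open text file object; media_house is the CSV file
    name without its extension (hypothetical convention for this sketch).
    """
    reader = csv.DictReader(csv_file)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    crawl_date = parse_folder_date(folder_name)
    for row in reader:
        article = {c: row[c] for c in COLUMNS}
        article["media_house"] = media_house
        article["date"] = crawl_date
        yield article
```

With a real download, `csv_file` would typically come from `open(path, newline="", encoding="utf-8")`; the encoding is an assumption and may need adjusting for the actual files.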

    Acknowledgements

    This dataset is under CC0: public domain license.

    Inspiration

    All around the world, both good and bad things happen, and we learn only of those that are exposed to us. Exposing them is the primary responsibility of the media. But the bigger responsibility of these media houses lies in the way they present the content to people.

    A responsible media house's content should be original, unbiased, and free of exaggeration, and it should handle the emotions of its readers and viewers with great sensitivity. The same story can be told in different ways, and these different ways can trigger different emotions among its readers.

    It is known that we become who we are through what we say and what we read. Reading a story filled with positive words makes us feel more positive, and vice versa. So the wording of a piece plays as important a role as the content itself.

    This dataset serves as a sample for finding out which media house conveys the news in a more optimistic way!

  17. AllSides : Ratings of bias in electronic media

    • kaggle.com
    zip
    Updated Sep 23, 2021
    Cite
    Supratim Haldar (2021). AllSides : Ratings of bias in electronic media [Dataset]. https://www.kaggle.com/datasets/supratimhaldar/allsides-ratings-of-bias-in-electronic-media
    Explore at:
    Available download formats: zip (32548 bytes)
    Dataset updated
    Sep 23, 2021
    Authors
    Supratim Haldar
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Context

    Media is the fourth pillar of democracy, so it must carry out its duty with integrity. While the majority do so, news articles are very often contaminated by the personal perspectives of the journalists who write them or by the beliefs of the people running those media houses. Per Wikipedia's definition, media bias is the bias or perceived bias of journalists and news producers within the mass media in the selection of events and stories that are reported and how they are covered.

    Content

    https://www.allsides.com is doing a wonderful job of analyzing the bias of renowned media houses and showing how a particular news story is presented from completely different perspectives by different media publications. Based on this analysis, each media publication is assigned a "bias" direction (left, right or neutral). The general public can vote on whether they agree with this analysis. The details are captured at https://www.allsides.com/media-bias/media-bias-ratings and are constantly updated based on new votes. The content of this dataset is scraped from this page and subsequent pages.

    Acknowledgements

    https://www.allsides.com is the owner of this data and holds all rights to it. Many thanks to them for their effort!

    Inspiration

    A deeper analysis can reveal which side most media houses lean towards. The analysis can be extended by comparing news articles on the same event from different media publications and, as a final step, by building a classifier that estimates the bias of any article on the internet just by reading it. This might help in the fight against fake news as well.

    AllSides would love to see any work that brings out insightful information from this data. Please feel free to share your work with AllSides (https://www.allsides.com/contact).

    Licenses and Attribution

    AllSides Media Bias Ratings by AllSides.com are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You may use this data for research or noncommercial purposes provided you include this attribution.

    For commercial use, or to request this data as a CSV or JSON file, go to www.allsides.com/contact.

  18. Top 100 YouTube Channels - News & Politics Category

    • vidiq.com
    Updated May 8, 2023
    Cite
    vidIQ (2023). Top 100 YouTube Channels - News & Politics Category [Dataset]. https://vidiq.com/youtube-stats/top/category/news/
    Explore at:
    Dataset updated
    May 8, 2023
    Dataset authored and provided by
    vidIQ
    Time period covered
    Dec 2, 2025
    Area covered
    YouTube, Worldwide
    Variables measured
    rank, subscribers, total views, video count
    Description

    Comprehensive ranking dataset of the top 100 YouTube channels in the News & Politics category. This dataset features 100 channels with detailed statistics including subscriber counts, total video views, video count, and global rankings. The leading channel has 74,400,000 subscribers and 42,602,103,612 total views. Each entry includes comprehensive metrics to analyze channel performance, growth trends, and competitive positioning. This dataset is regularly updated to reflect the latest YouTube channel statistics and ranking changes, providing valuable insights for content creators, marketers, and researchers analyzing YouTube ecosystem trends and channel performance benchmarks.

  19. Leading websites worldwide 2025, by monthly visits

    • statista.com
    • boostndoto.org
    Updated Oct 29, 2025
    Cite
    Statista (2025). Leading websites worldwide 2025, by monthly visits [Dataset]. https://www.statista.com/statistics/1201880/most-visited-websites-worldwide/
    Explore at:
    Dataset updated
    Oct 29, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Aug 2025
    Area covered
    Worldwide
    Description

    In August 2025, Google.com was the most visited website worldwide, with an average of 98.2 billion monthly visits. The platform has maintained its leading position since June 2010, when it surpassed Yahoo to take first place. YouTube ranked second during the same period, recording over 48 billion monthly visits.

    The internet leaders: search, social, and e-commerce

    Social networks, search engines, and e-commerce websites shape the online experience as we know it. While Google leads the global online search market by far, YouTube and Facebook have become the world’s most popular websites for user-generated content, solidifying Alphabet’s and Meta’s leadership over the online landscape. Meanwhile, websites such as Amazon and eBay generate millions in profits from the sale and distribution of goods, making the e-market sector an integral part of the global retail scene.

    What is next for online content?

    Powering social media and websites like Reddit and Wikipedia, user-generated content keeps moving the internet’s engines. However, the rise of generative artificial intelligence will bring significant changes to how online content is produced and handled. ChatGPT is already transforming how online search is performed, and news of Google's 2024 deal for licensing Reddit content to train large language models (LLMs) signals that the internet is likely to go through a new revolution. While AI's impact on the online market might bring both opportunities and challenges, effective content management will remain crucial for profitability on the web.

  20. multinews_sparse_max

    • huggingface.co
    Updated Jan 27, 2023
    Cite
    Ai2 (2023). multinews_sparse_max [Dataset]. https://huggingface.co/datasets/allenai/multinews_sparse_max
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 27, 2023
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    License

    https://choosealicense.com/licenses/other/

    Description

    This is a copy of the Multi-News dataset, except that the input source documents of its test split have been replaced with documents returned by a sparse retriever. The retrieval pipeline used:

    • query: The summary field of each example
    • corpus: The union of all documents in the train, validation and test splits
    • retriever: BM25 via PyTerrier with default settings
    • top-k strategy: "max", i.e. the number of documents retrieved, k, is set as the maximum number of documents seen across examples in this dataset, in this case… See the full description on the dataset page: https://huggingface.co/datasets/allenai/multinews_sparse_max.
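
The "max" top-k strategy described above can be illustrated with a small sketch. It does not use PyTerrier or compute BM25. The function names `max_k` and `top_k` are hypothetical, and the scores are placeholders standing in for whatever a retriever would return. It only shows how k is derived from the dataset and then applied to a ranked list.

```python
from typing import Dict, List


def max_k(examples: List[List[str]]) -> int:
    """Return k as the largest number of source documents in any example."""
    return max(len(docs) for docs in examples)


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring document ids (ties broken by id)."""
    ranked = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ranked[:k]


# Illustrative data only: each inner list is one example's source documents.
examples = [["a", "b"], ["c"], ["d", "e", "f"]]
k = max_k(examples)

# Placeholder relevance scores for one query (not real BM25 output).
scores = {"d1": 0.2, "d2": 0.9, "d3": 0.5, "d4": 0.1}
retrieved = top_k(scores, k)
```

Under this strategy every query retrieves the same number of documents, so examples that originally had fewer source documents end up with more inputs than before.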
