The BBC News Article Dataset is a collection of 2,225 news articles published by BBC News, spanning various categories including Sport, Business, Politics, Tech, and Entertainment. Each article is accompanied by its corresponding category label, allowing for easy classification and analysis of the news content.
Dataset Columns:
2. Text: The actual textual content of the news article, providing detailed information on the topic covered in the article.
License: https://creativecommons.org/publicdomain/zero/1.0/
The BBC Articles Dataset is a widely used dataset in natural language processing (NLP) and machine learning tasks, particularly for text classification and sentiment analysis. It consists of a collection of news articles from the BBC (British Broadcasting Corporation) covering various topics such as politics, sports, entertainment, technology, and business. Each article is labeled with its respective category, making it an ideal resource for supervised learning tasks where the goal is to classify text into predefined categories.
Components of the BBC Articles Dataset
News Articles: The dataset contains hundreds or even thousands of news articles sourced from BBC News. These articles are written in English and cover a broad range of subjects. The articles are typically stored in plain text format, and each one is associated with a specific category or topic.
Categories/Labels: The dataset is often split into distinct categories or labels, which correspond to different topics. For instance, the BBC News dataset might include labels like:
- Business
- Entertainment
- Politics
- Sports
- Technology
These labels are crucial for classification models, as they serve as the "target" variable that the model tries to predict based on the textual content of the articles.
Preprocessing: Before using the dataset for training a machine learning model, it often requires some preprocessing. This typically involves cleaning the text by removing punctuation, special characters, and stopwords (commonly used words like "the," "is," etc., which don't add much meaning to the text). The text might also be tokenized (split into individual words or phrases), and some advanced preprocessing techniques like stemming or lemmatization might be applied to reduce words to their base forms.
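The cleaning steps described above can be sketched in a few lines of standard-library Python. This is only a minimal illustration: the `STOPWORDS` set and the `preprocess` function are hypothetical names, the stopword list is deliberately tiny, and stemming or lemmatization is left out because it normally relies on a third-party NLP library.

```python
import re

# Tiny illustrative stopword list (an assumption); real projects use a fuller one.
STOPWORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "on", "for"}

def preprocess(text):
    """Lowercase, drop punctuation/special characters, tokenize, remove stopwords."""
    text = text.lower()
    # Replace anything that is not a lowercase letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()  # whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]

tokens = preprocess("The market is up: shares of the bank rose 5%!")
print(tokens)
```

Note that the regular expression keeps only ASCII letters and digits, which suits English text such as these articles but would need adjusting for other languages.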
Training and Testing: The dataset is often divided into a training set and a testing set. The training set is used to train the machine learning model, while the testing set is used to evaluate its performance on unseen data. Some versions of the dataset also include a validation set, which helps in fine-tuning the model's hyperparameters.
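A train/test split of this kind can be sketched as follows. The `split_train_test` helper, the 20% test fraction, and the toy `articles` list are assumptions for illustration; libraries such as scikit-learn ship a more complete `train_test_split` that also supports stratification.

```python
import random

def split_train_test(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical (text, label) pairs standing in for the real articles.
articles = [(f"article {i}", "sport" if i % 2 else "business") for i in range(10)]
train, test = split_train_test(articles)
print(len(train), len(test))
```

A validation set can be carved out the same way by splitting the training portion a second time.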
Application: Classifying BBC News Articles
The BBC Articles Dataset is typically used to build machine learning models that can classify news articles into their respective categories. Here's a step-by-step outline of how this process usually works:
Text Representation: Once the news articles are preprocessed, they need to be converted into a numerical format that a machine learning model can understand. This is often done using techniques like:
- Bag of Words (BoW): Represents text as a frequency distribution of words.
- TF-IDF (Term Frequency-Inverse Document Frequency): Weights words based on how often they appear in a document relative to how often they appear across all documents in the dataset.
- Word Embeddings: More advanced techniques like Word2Vec or GloVe can be used to represent words in a dense, continuous vector space that captures semantic relationships between words.
Choosing a Model: Various machine learning algorithms can be applied to classify BBC news articles:
- Naive Bayes: A probabilistic classifier that works well for text classification.
- Support Vector Machines (SVM): Known for high performance in text classification tasks.
- Random Forest: A robust ensemble learning method.
- Deep Learning Models: More advanced models like Recurrent Neural Networks (RNNs) or Transformers can be used to capture complex relationships in text data.
Model Training: The chosen model is trained on the preprocessed dataset, learning patterns that associate textual features (words, phrases) with specific categories.
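To make the representation and training steps concrete, here is a minimal sketch of a multinomial Naive Bayes classifier trained on bag-of-words counts, written with the standard library only. The function names `train_naive_bayes` and `predict` and the four toy documents are hypothetical; in practice a library implementation would be the more likely choice.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Collect per-class document counts and bag-of-words counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word frequency table
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(model, tokens):
    """Return the label with the highest log-posterior for the tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data, already tokenized.
docs = [
    ["match", "goal", "team"],
    ["team", "win", "cup"],
    ["shares", "market", "profit"],
    ["bank", "market", "rates"],
]
labels = ["sport", "sport", "business", "business"]
model = train_naive_bayes(docs, labels)
prediction = predict(model, ["market", "profit", "team"])
print(prediction)
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once documents are the length of real news articles.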
Evaluation: After training, the model is evaluated on the test set to determine its accuracy, precision, recall, and F1-score, which measure how well the model can classify unseen articles.
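The four metrics named above can be computed directly from the true and predicted labels. The sketch below treats one category as the "positive" class; the `classification_metrics` name and the sample labels are assumptions for illustration.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy plus precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical test-set labels and model predictions.
y_true = ["sport", "sport", "business", "tech", "sport"]
y_pred = ["sport", "business", "business", "tech", "sport"]
metrics = classification_metrics(y_true, y_pred, positive="sport")
print(metrics)
```

For a multi-class problem such as this one, the per-class values are usually averaged across all categories to give a single summary figure.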
Deployment: Once a model achieves satisfactory performance, it can be deployed in real-world applications, such as automatically categorizing new articles published on the BBC website.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The MAAD dataset is a large collection of Arabic news articles that can be used for a range of Arabic Natural Language Processing (NLP) tasks, including classification, text generation, and summarization. It was assembled with purpose-built Python scripts that targeted six prominent news platforms: Al Jazeera, BBC Arabic, Youm7, Russia Today, and Al Ummah, in conjunction with regional and local media outlets, resulting in a total of 602,792 articles. The dataset contains 29,371,439 words, of which 296,518 are unique; the average word length is 6.36 characters and the mean article length is 736.09 characters. Articles are grouped into ten categories: Political, Economic, Cultural, Arts, Sports, Health, Technology, Community, Incidents, and Local. Each record has five fields: Title, Article, Summary, Category, and Published_Date. The MAAD dataset is structured into six files, each named after the news outlet from which the data was sourced; within each directory, text files are provided, containing the number of categories represented in a single file, formatted in txt to accommodate all news articles. The dataset is intended as a large standard resource for use in our research.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Name: BBC Articles Sentiment Analysis Dataset
Source: BBC News
Description: This dataset consists of articles from the BBC News website, containing a diverse range of topics such as business, politics, entertainment, technology, sports, and more. The dataset includes articles from various time periods and categories, along with labels representing the sentiment of the article. The sentiment labels indicate whether the tone of the article is positive, negative, or neutral, making it suitable for sentiment analysis tasks.
Number of Instances: [Specify the number of articles in the dataset, for example, 2,225 articles]
Number of Features:
1. Article Text: The content of the article (string).
2. Sentiment Label: The sentiment classification of the article. The possible labels are:
- Positive
- Negative
- Neutral
Data Fields:
- id: Unique identifier for each article.
- category: The category or topic of the article (e.g., business, politics, sports).
- title: The title of the article.
- content: The full text of the article.
- sentiment: The sentiment label (positive, negative, or neutral).
Example:
| id | category   | title                    | content                                                                     | sentiment |
|----|------------|--------------------------|-----------------------------------------------------------------------------|-----------|
| 1  | Business   | "Stock Market Surge"     | "The stock market has surged to new highs, driven by strong earnings..."    | Positive  |
| 2  | Politics   | "Election Results"       | "The election results were a mixed bag, with some surprises along the way." | Neutral   |
| 3  | Sports     | "Team Wins Championship" | "The team won the championship after a thrilling final match."              | Positive  |
| 4  | Technology | "New Smartphone Release" | "The new smartphone release has received mixed reactions from users."       | Negative  |
Preprocessing Notes:
- The text has been preprocessed to remove special characters and any HTML tags that might have been included in the original articles.
- Tokenization or further text cleaning (e.g., lowercasing, stopword removal) may be necessary depending on the model and method used for sentiment classification.
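The source does not say how the HTML tags were removed. As a sketch of one way it could be done, the following uses Python's standard `html.parser` module to keep only the text nodes; the `strip_html` helper and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores the tags around them."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(raw):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join("".join(parser.parts).split())

clean = strip_html("<p>Profits &amp; losses <b>rose</b>.</p>")
print(clean)
```

Character references such as `&amp;` are decoded by the parser's default settings, so the output is plain text ready for tokenization.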
Use Case: This dataset is ideal for training and evaluating machine learning models for sentiment classification, where the goal is to predict the sentiment (positive, negative, or neutral) based on the article's text.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 48 verified BBC locations in Indonesia with complete contact information, ratings, reviews, and location data.
Bangladesh Business Consulting (Bbc) Sarl Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
As of November 2021, the BBC was the leading global business news provider linked to by African government websites, with *** governments and ministries linking to its online content. Other main international news providers in the ranking were CNN, Reuters, and the New York Times.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 1 verified BBC location in Sumatera Selatan, Indonesia with complete contact information, ratings, reviews, and location data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Interbank Rate in the United Kingdom remained unchanged at 5.30 percent on Wednesday July 10. This dataset provides the United Kingdom Three Month Interbank Rate: actual values, historical data, forecast, chart, statistics, economic calendar and news.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
About the Authors

Prof. Jeffrey Sachs
Director, SDSN; Project Director of the SDG Index
Jeffrey D. Sachs is a world-renowned professor of economics, leader in sustainable development, senior UN advisor, bestselling author, and syndicated columnist whose monthly newspaper columns appear in more than 100 countries. He is the co-recipient of the 2015 Blue Planet Prize, the leading global prize for environmental leadership, and many other international awards and honors. He has twice been named among Time magazine’s 100 most influential world leaders. He was called by the New York Times, “probably the most important economist in the world,” and by Time magazine, “the world’s best known economist.” A survey by The Economist in 2011 ranked Professor Sachs as amongst the world’s three most influential living economists of the first decade of the 21st century.
Professor Sachs serves as the Director of the Center for Sustainable Development at Columbia University. He is University Professor at Columbia University, the university’s highest academic rank. From 2002 to 2016 he served as the Director of the Earth Institute. Sachs is Special Advisor to United Nations Secretary-General António Guterres on the Sustainable Development Goals, and previously advised UN Secretary-General Ban Ki-moon on both the Sustainable Development Goals and Millennium Development Goals and UN Secretary-General Kofi Annan on the Millennium Development Goals.

Guillaume Lafortune
Director, SDSN Paris; Scientific Co-Director of the SDG Index
Guillaume Lafortune took up his duties as Director of SDSN Paris in January 2021. He joined SDSN in 2017 to coordinate the production of the Sustainable Development Report and other projects on SDG data and statistics. Previously, he served as an economist at the Organisation for Economic Co-operation and Development (OECD) working on public governance reforms and statistics. He was one of the lead advisors for the production of the 2015 and 2017 flagship statistical report Government at a Glance. He also contributed to analytical work related to public sector efficiency, open government data and citizens’ satisfaction with public services. Earlier, Guillaume worked as an economist at the Ministry of Economic Development in the Government of Quebec (Canada). Guillaume holds an M.Sc in public administration from the National School of Public Administration (ENAP) in Montreal and a B.Sc in international economics from the University of Montreal.

Prof. Christian Kroll
Prof. of Sustainability, IU International University of Applied Sciences; Senior Advisor, SDSN; Scientific Co-Director of the SDG Index
Christian Kroll is Professor of Sustainability. He created the prototype SDG Index as the world’s first measurement tool of the SDGs in the September 2015 publication “Sustainable Development Goals: Are the rich countries ready” with a foreword by Kofi Annan. Christian was honoured as a Young Global Leader by the World Economic Forum in 2018 for his achievements. He authored articles in scientific journals spanning several disciplines. He lectures as full professor on sustainable development, sustainable finance (ESG), circular economy, and CSR at IU International University of Applied Sciences, and previously taught classes at the London School of Economics and Political Science, Hertie School of Governance in Berlin, and held positions at Jacobs University Bremen and Bertelsmann Stiftung. Christian gained a PhD from the London School of Economics and Political Science with a thesis entitled “Towards a Sociology of Happiness”. His research has featured in national and international media such as BBC World News, Harvard Business Review, Washington Post, Le Monde, Die Zeit, ARD, and Spiegel Online, among others.

Grayson Fuller
Senior Analyst, SDG Index, SDSN
Grayson Fuller is the Senior Analyst at SDSN. His role consists of managing the data, coding, and statistical analyses for the SDG Index and Dashboards report. He additionally carries out research related to sustainable development. Grayson received his Master's degree in Economic Development at Sciences Po Paris. He holds a Bachelor's in Latin American Studies from Harvard University, where he graduated cum laude. Grayson has lived in several Latin American countries and speaks English, Spanish, French, Portuguese, and Russian. He enjoys playing violin and hails from Atlanta, GA.

Finn Woelm
Coordinator for Data Science and Research, SDSN
Finn Woelm coordinates data science and research projects at the SDSN. He focuses on statistical analyses, data visualization, and web development. Prior to joining the SDSN, Finn co-founded a startup and worked for a number of organizations, including the International Panel on Social Progress. He holds a Bachelor of Arts in Interdisciplinary Studies from Naropa University in Boulder, Colorado, USA. Finn is passionate about open source, collaborative governance, and the environment.

About the Publishers

Sustainable Development Solutions Network (SDSN)
The Sustainable Development Solutions Network (SDSN) has been operating since 2012 under the auspices of the UN Secretary-General. SDSN mobilizes global scientific and technological expertise to promote practical solutions for sustainable development, including the implementation of the Sustainable Development Goals (SDGs) and the Paris Climate Agreement.

Bertelsmann Stiftung
The Bertelsmann Stiftung is one of the largest foundations in Germany. It works to promote social inclusion and is committed to advancing this goal through programs that improve education, shape democracy, advance society, promote health, vitalize culture and strengthen economies. The Bertelsmann Stiftung is a non-partisan, private operating foundation.

Cambridge University Press
Cambridge University Press dates from 1534 and is part of the University of Cambridge. Its mission is to unlock people’s potential with the best learning and research solutions. Its vision is a world of learning and research inspired by Cambridge. Playing a leading role in today’s global market place, Cambridge University Press has over 50 offices around the globe, and distributes products to nearly every country in the world.
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains tweets from the Twitter accounts of BBC, CNN and the Economist from 2010 to 2021.
The tweets were scraped using twint, an advanced Twitter scraping tool that can scrape tweets from Twitter profiles without using Twitter's API.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Techsalerator's News Events Data for the United Kingdom: A Comprehensive Overview
Techsalerator's News Events Data for the United Kingdom provides a robust resource for businesses, researchers, and media organizations. This dataset aggregates information on major news events across the UK from various media sources, including news outlets, online publications, and social platforms. It offers valuable insights for those looking to track trends, analyze public sentiment, or monitor industry-specific developments.
Key Data Fields
- Event Date: Records the exact date of the news event. Essential for analysts tracking trends over time or businesses reacting to market changes.
- Event Title: A concise headline summarizing the event. Allows users to quickly categorize and evaluate news content based on relevance.
- Source: Indicates the news outlet or platform reporting the event. Helps users gauge credibility and assess the event's reach and influence.
- Location: Provides geographic details about where the event occurred within the UK. Useful for regional analysis or localized marketing strategies.
- Event Description: Offers a detailed summary of the event, including key developments, participants, and potential impact. Important for understanding the context and implications.
Top 5 News Categories in the United Kingdom
- Politics: Covers major news on government decisions, political movements, elections, and policy changes affecting the national landscape.
- Economy: Focuses on economic indicators, inflation rates, international trade, and corporate activities impacting business and finance sectors.
- Social Issues: Includes news on protests, public health, education, and other societal concerns driving public discourse.
- Sports: Highlights events in football, cricket, and other popular sports, often generating widespread attention and engagement.
- Technology and Innovation: Reports on tech developments, startups, and innovations in the UK’s tech sector, featuring emerging companies and advancements.
Top 5 News Sources in the United Kingdom
- BBC News: A leading news outlet known for its comprehensive coverage of national and international news, including politics, economy, and social issues.
- The Guardian: Provides in-depth reporting on a wide range of topics, including politics, culture, and current affairs.
- Sky News: Offers breaking news updates and live coverage on major events across the UK and globally.
- The Times: A well-established newspaper delivering detailed reports on politics, business, and social issues.
- The Telegraph: Features extensive coverage of news, politics, and lifestyle topics, known for its analysis and commentary.
Accessing Techsalerator’s News Events Data for the United Kingdom
To access Techsalerator’s News Events Data for the United Kingdom, please contact info@techsalerator.com with your specific needs. We will provide a customized quote based on the data fields and records you require, with delivery available within 24 hours. Ongoing access options can also be discussed.
Included Data Fields
- Event Date
- Event Title
- Source
- Location
- Event Description
- Event Category (Politics, Economy, Sports, etc.)
- Participants (if applicable)
- Event Impact (Social, Economic, etc.)
Techsalerator’s dataset is an invaluable tool for tracking significant events in the United Kingdom. It supports informed decision-making, whether for business strategy, market analysis, or academic research, providing a clear view of the country’s news landscape.