https://creativecommons.org/publicdomain/zero/1.0/
By Hugging Face Hub [source]
The Reuters-21578 dataset, a widely used collection of newswire articles from the Reuters financial newswire service, is a standard benchmark for text categorization research. It covers topics frequently reported on by financial publications and is available in multiple train/test splits for machine learning experiments.
The dataset's columns include text (the full body of the article), text_type (whether the article belongs to the training or test set), topics (the topic labels assigned to the document), lewis_split and cgis_split (two alternative train/test assignments used in prior research), the places, people, orgs, and exchanges mentioned in the article, plus date and title. Separate files contain the Reuters-21578 articles not used in specific splits (ModApte_unused.csv and ModLewis_unused.csv). Together these fields provide a rich basis for building and evaluating financial news categorization models.
The Reuters-21578 dataset is a great resource for uncovering valuable insights in financial news. With its wide range of topics and data splits, it is well suited as a benchmark dataset for text categorization research. Here are some tips on how to get the most out of it:
1. Familiarize yourself with the columns: Before getting started, make sure you understand what each column in the dataset means, and identify which ones are essential for your research project.
2. Use an appropriate split: Depending on your research goals, you may need different training and test sets from those provided (ModHayes_train/test or ModLewis_train/test). You can also create custom splits from the 'ModApte_unused' set contained within this collection.
3. Explore other methods: While text categorization is the usual task for this data, methods such as topic modelling or sentiment analysis can also uncover useful information.
4. Leverage related packages: In Python, NLTK ships a copy of this corpus (nltk.corpus.reuters, the ApteMod version) and Keras provides a preprocessed Reuters newswire dataset (keras.datasets.reuters), while scikit-learn's vectorizers (e.g. CountVectorizer, TfidfVectorizer) transform article text into feature vectors for models such as Naive Bayes or Random Forest classifiers.
5. Tackle low-level preprocessing tasks: Before building models, remember that input data benefits greatly from being cleaned up first, particularly removing invalid characters and symbols from languages other than English, which can hurt model accuracy. Minor steps such as stopword removal and stemming words to their root form can also improve overall performance.
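The vectorize-then-classify workflow in the tips above can be sketched end to end. This is a minimal illustration, not a recipe tied to this dataset's files: the tiny inline corpus and its 'gold'/'grain' labels are invented stand-ins for the dataset's article texts and topic labels.

```python
# Minimal sketch: TF-IDF features + Multinomial Naive Bayes topic
# classifier. The four toy documents below are invented examples that
# stand in for the dataset's article text and topic columns.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "gold prices rose on the bullion market today",
    "wheat and grain exports fell sharply this quarter",
    "gold futures climbed as the dollar weakened",
    "grain shipments to asia increased",
]
labels = ["gold", "grain", "gold", "grain"]

# English stopword removal mirrors the preprocessing tip above
model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["bullion and gold markets steadied"])[0])  # gold
```

On the real dataset the same pipeline would be fit on the article bodies of a chosen training split, with the topic column as the label.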
- Automated text classification - Using the data from the Reuters-21578 dataset, machine learning algorithms can be trained to automatically classify and categorize newswire articles into their appropriate topics. This not only saves time, but also ensures reliable results with minimal human intervention.
- Sentiment analysis - By analyzing the sentiment of individual news articles in the Reuters-21578 dataset, one could gain valuable insight into how people generally perceive financial news and use this information to make more informed investing decisions.
- Stock market predictions - By applying data mining techniques to the content of news articles in this dataset, correlations between the topics or exchanges mentioned in an article and their effects on stock prices can be identified and used in algorithmic trading strategies aimed at predicting short-term stock price movements.
If you use this dataset in your research, please credit the orig...
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains the Reuters-21578 text categorization collection, a widely used benchmark for text classification tasks. The data consists of news articles from the Reuters newswire in 1987, categorized into various topics. This upload provides the dataset in its raw Standard Generalized Markup Language (SGML) format, allowing users maximum flexibility in parsing and preprocessing the text.
Folder Structure:
The main downloaded folder (reuters21578) contains the following files:
- all-exchanges-strings.lc.txt: A text file listing all the exchange-related categories present in the dataset.
- all-orgs-strings.lc.txt: A text file listing all the organization-related categories.
- all-people-strings.lc.txt: A text file listing all the people-related categories.
- all-places-strings.lc.txt: A text file listing all the place-related categories.
- all-topics-strings.lc.txt: Crucially, this file lists all the topic categories used for classifying the news articles. This is the primary set of labels for the text classification task.
- cat-descriptions_120396.txt: A text file providing descriptions for some of the categories.
- feldman-cia-worldfactbook-data.txt: This file appears to contain data related to the CIA World Factbook and might not be directly relevant to the Reuters article classification task.
- lewis.dtd: A Document Type Definition (DTD) file, which defines the structure and rules for the SGML files in the dataset. It is essential for correctly parsing the SGML files.
- README.txt (within the main folder and potentially within the reuters21578 subfolder): These files contain important information about the dataset, its origin, and usage. Users should definitely read these files to understand the dataset in detail.
- reut2-000.sgm to reut2-021.sgm (and potentially more): These are the core files of the dataset. Each .sgm file contains multiple Reuters news articles marked up in SGML format. These files include the article text, metadata, and the assigned topic labels.

Content of the Data:
The primary data for classification resides within the .sgm files. Each .sgm file contains one or more <REUTERS> blocks, representing individual news articles. Within these blocks, you will find:
- <TEXT>: Contains the main body of the news article, often including <TITLE> and <BODY> tags.
- <TOPICS>: Contains the topic labels assigned to the article, enclosed within <D> tags. An article can have multiple topics.
- <DATE>: The date of the news article.
- <LEWISSPLIT>, <CGISPLIT>, <OLDID>, <NEWID>: Metadata related to how the dataset has been split in different research contexts.

The all-topics-strings.lc.txt file provides the vocabulary of the topic labels you will be trying to predict.
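The tag layout described above can be illustrated with a quick extraction sketch. This is for illustration only, run on an invented inline snippet that mimics the real markup; a production pipeline should use an SGML-aware parser (e.g. Beautiful Soup) guided by lewis.dtd rather than regular expressions.

```python
# Illustrative only: regex extraction from one invented <REUTERS> block
# shaped like the tags described above. Real parsing should use an
# SGML-aware parser driven by lewis.dtd.
import re

sgml = """<REUTERS LEWISSPLIT="TRAIN" NEWID="1">
<DATE>26-FEB-1987 15:01:01.79</DATE>
<TOPICS><D>cocoa</D><D>coffee</D></TOPICS>
<TEXT>
<TITLE>BAHIA COCOA REVIEW</TITLE>
<BODY>Showers continued throughout the week ...</BODY>
</TEXT>
</REUTERS>"""

topics = re.findall(r"<D>(.*?)</D>", sgml)                       # multi-label topics
title = re.search(r"<TITLE>(.*?)</TITLE>", sgml, re.S).group(1)  # article title
body = re.search(r"<BODY>(.*?)</BODY>", sgml, re.S).group(1).strip()

print(topics)  # ['cocoa', 'coffee']
print(title)   # BAHIA COCOA REVIEW
```

Note that topics come back as a list: the multi-label nature of the collection means a classifier over this data must handle more than one topic per article.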
How to Use This Dataset:
1. Read the README.txt files to get a comprehensive understanding of the dataset and its conventions.
2. Use an SGML-capable parsing library (e.g., Python's legacy sgmllib, or Beautiful Soup with an SGML parser) to process the .sgm files. You will need to understand the lewis.dtd to correctly interpret the SGML structure.
3. Extract the article text from the <TEXT> tags for each article.
4. Extract the topic labels from the <TOPICS> and <D> tags. Be aware that an article can have multiple labels.
5. Consult the all-topics-strings.lc.txt file to understand the possible output classes for your classification model.

Citation:
Please cite the original source of the Reuters-21578 dataset:
David D. Lewis. Reuters-21578 Text Categorization Test Collection. Distribution 1.0, 1991.
Data Contribution:
Thank you for uploading this raw SGML version of the Reuters-21578 dataset. By providing the data in its original format, you offer the Kaggle community the opportunity to work with the data at its most fundamental level, allowing for diverse approaches to parsing, preprocessing, and feature engineering in text classification tasks.
If you find this description helpful and the dataset well-represented, please consider giving it an upvote after downloading. Your feedback is valuable!
This dataset was created by Paladugula Lakshmi Snigdha
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Reuters News Articles
An open-source dataset designed for information retrieval and natural language processing tasks.
Abstract
This dataset is a processed version of the Reuters-21578 dataset.
Reuters-21578 text categorization test collection, Distribution 1.0 (v 1.2), 26 September 1997. David D. Lewis, AT&T Labs - Research, lewis@research.att.com
Profile
The dataset was processed as part of our work on the reuters-search-engine project, where it was my primary… See the full description on the dataset page: https://huggingface.co/datasets/IsmaelMousa/reuters.
Multi-label dataset. A subset of the Reuters dataset includes 2000 observations for text classification.
Susant-Achary/reuters-articles dataset hosted on Hugging Face and contributed by the HF Datasets community
http://www.daviddlewis.com/resources/testcollections/reuters21578/readme.txt
Reuters-21578 text categorization test collection
Distribution 1.0
README file (v 1.3)
14 May 2004
David D. Lewis
David D. Lewis Consulting and Ornarose, Inc.
www.daviddlewis.com
I. Introduction
[Note: There's much that could be improved in this document, but given that Reuters-21578 is being superseded by RCV1, I'm not likely to make those improvements myself. Anyone who would like to create a revised version of this document is invited to contact me.]
This README describes Distribution 1.0 of the Reuters-21578 text categorization test collection, a resource for research in information retrieval, machine learning, and other corpus-based research.
II. Copyright & Notification
The copyright for the text of newswire articles and Reuters
annotations in the Reuters-21578 collection resides with Reuters Ltd.
Reuters Ltd. and Carnegie Group, Inc. have agreed to allow the free
distribution of this data for research purposes only.
If you publish results based on this data set, please acknowledge
its use, refer to the data set by the name "Reuters-21578,
Distribution 1.0", and inform your readers of the current location of
the data set (see "Availability & Questions").
III. Availability & Questions
The Reuters-21578, Distribution 1.0 test collection is available from http://www.daviddlewis.com/resources/testcollections/reuters21578
Besides this README file, the collection consists of 22 data files, an SGML DTD file describing the data file format, and six files describing the categories used to index the data. (See Sections VI and VII for more details.) Some additional files, which are not part of the collection but have been contributed by other researchers as useful resources are also included. All files are available uncompressed, and in addition a single gzipped Unix tar archive of the entire distribution is available as reuters21578.tar.gz.
The text categorization mailing list, DDLBETA, is a good place to send questions about this collection and other text categorization issues. You may join the list by writing David Lewis at ddlbeta-request@daviddlewis.com.
IV. History & Acknowledgements
The documents in the Reuters-21578 collection appeared on the Reuters newswire in 1987. The documents were assembled and indexed with categories by personnel from Reuters Ltd. (Sam Dobbins, Mike Topliss, Steve Weinstein) and Carnegie Group, Inc. (Peggy Andersen, Monica Cellio, Phil Hayes, Laura Knecht, Irene Nirenburg) in 1987.
In 1990, the documents were made available by Reuters and CGI for research purposes to the Information Retrieval Laboratory (W. Bruce Croft, Director) of the Computer and Information Science Department at the University of Massachusetts at Amherst. Formatting of the documents and production of associated data files was done in 1990 by David D. Lewis and Stephen Harding at the Information Retrieval Laboratory.
Further formatting and data file production was done in 1991 and 1992 by David D. Lewis and Peter Shoemaker at the Center for Information and Language Studies, University of Chicago. This version of the data was made available for anonymous FTP as "Reuters-22173, Distribution 1.0" in January 1993. From 1993 through 1996, Distribution 1.0 was hosted at a succession of FTP sites maintained by the Center for Intelligent Information Retrieval (W. Bruce Croft, Director) of the Computer Science Department at the University of Massachusetts at Amherst.
At the ACM SIGIR '96 conference in August, 1996 a group of text categorization researchers discussed how published results on Reuters-22173 could be made more comparable across studies. It was decided that a new version of the collection should be produced with less ambiguous formatting, and including documentation carefully spelling out standard methods of using the collection. The opportunity would also be used to correct a variety of typographical and other errors in the categorization and formatting of the collection.
Steve Finch and David D. Lewis did this cleanup of the collection September through November of 1996, relying heavily on Finch's SGML-tagged version of the collection from an earlier study. One result of the re-examination of the collection was the removal of 595 documents which were exact duplicates (based on identity of timestamps down to the second) of other documents in the collection. The new collection therefore has only 21,578 documents, and thus is called the Reuters-21578 collection. This README describes version 1.0 of this new collection, which we refer to as "Reuters-21578, Distribution 1.0".
In preparing the collection...
shashverma05/reuters dataset hosted on Hugging Face and contributed by the HF Datasets community
The Reuters-21578 dataset is a collection of documents containing news articles. Originally, the corpus comprises 10,369 documents and has a vocabulary of 29,930 unique words.
An additional challenge arises when the labels of the training instances are provided by noisy, heterogeneous crowdworkers of unknown quality. As a starting point, assuming the labels come from a perfect source can help in modeling the problem effectively.
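The simplest baseline for aggregating such noisy crowd labels is a per-document majority vote, which ignores annotator quality entirely. The sketch below uses invented document IDs and votes purely for illustration.

```python
# Sketch of the simplest crowd-label aggregation: per-document majority
# vote over annotator labels. Annotator qualities are ignored, and the
# votes below are invented for illustration.
from collections import Counter

crowd_labels = {
    "doc1": ["earn", "earn", "acq"],
    "doc2": ["grain", "wheat", "grain"],
}

# most_common(1) returns the single most frequent vote per document
aggregated = {
    doc: Counter(votes).most_common(1)[0][0]
    for doc, votes in crowd_labels.items()
}

print(aggregated)  # {'doc1': 'earn', 'doc2': 'grain'}
```

More refined approaches (e.g. weighting annotators by estimated reliability) build on this same per-document aggregation step.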
Source of data: https://paperswithcode.com/dataset/reuters-21578
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset from Kaggle. The split is done on the training set using iterative_train_test_split from scikit-multilearn. There are the following 90 labels: 'interest', 'groundnut-oil', 'potato', 'palmkernel', 'sun-meal', 'lei', 'cotton-oil', 'sunseed', 'sorghum', 'barley', 'dlr', 'groundnut', 'wpi', 'strategic-metal', 'livestock', 'l-cattle', 'lin-oil', 'gold', 'fuel', 'nzdlr', 'oat', 'soybean', 'hog', 'tin', 'lumber', 'bop', 'soy-oil', 'dfl', 'nkr', 'gas', 'carcass'… See the full description on the dataset page: https://huggingface.co/datasets/KushT/reuters-21578-train-val-test.
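A multi-label splitter such as scikit-multilearn's iterative_train_test_split, mentioned above, expects the labels as a binary indicator matrix. One common way to build that matrix is scikit-learn's MultiLabelBinarizer; the three articles below are invented, using topic names from the 90-label list.

```python
# Sketch: turning per-article topic lists into the 0/1 indicator matrix
# that multi-label splitters and classifiers expect. The three articles
# are invented; topic names come from the label list above.
from sklearn.preprocessing import MultiLabelBinarizer

article_topics = [
    ["interest", "dlr"],     # e.g. interest rates and the dollar
    ["gold"],
    ["barley", "sorghum"],
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(article_topics)

print(list(mlb.classes_))  # ['barley', 'dlr', 'gold', 'interest', 'sorghum']
print(Y.tolist())          # one 0/1 row per article, one column per topic
```

On the full dataset, fitting the binarizer over all 90 topic strings yields a 90-column matrix that can be passed directly to iterative_train_test_split together with the feature matrix.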
Results – Reuters.
The Thomson Reuters IPSOS Primary Consumer Sentiment Index (PCSI) in Japan measures consumer confidence by aggregating data on personal financial conditions, economic expectations, investment climate, and employment outlook.
Traffic analytics, rankings, and competitive metrics for reuters.com as of January 2026
Jukaboo/Reuters dataset hosted on Hugging Face and contributed by the HF Datasets community
https://sem1.heaventechit.com/company/legal/terms-of-service/
reuters.com is ranked #261 in US with 78.33M Traffic. Categories: Finance, Newspapers.
Quarterly and annual financial metrics, earnings history, and company performance data for Thomson Reuters.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This English corpus is based on the well-known Reuters-21578 corpus, which contains economic news articles. In particular, we chose 128 articles containing at least one NE. Compared to the News-100 corpus, the documents of Reuters-128 are significantly shorter and thus carry a smaller context.
To create the annotation of NEs with URIs, we implemented a supporting judgement tool. The input for the tool was a subset of more than 150 Reuters-21578 news articles sampled randomly. First, FOX (Ngonga Ngomo et al., 2011) was used to recognize an initial set of NEs, which reduced the manual work to a feasible amount given the size of this dataset. Afterwards, domain experts corrected FOX's mistakes manually using the annotation tool, which highlighted the entities in the texts and added initial URI candidates via simple string-matching algorithms. Two scientists determined the correct URI for each named entity manually, with an initial voter agreement of 74%. This low initial agreement rate hints at the difficulty of the disambiguation task. In some cases the judges did not agree initially, but came to an agreement shortly after reviewing the cases. While annotating, we left out ticker symbols of companies (e.g., GOOG for Google Inc.), abbreviations, and job descriptions, because those are always preceded by the full company name or a person's name, respectively.
The workforce of Thomson Reuters declined significantly between 2009 and 2023. In 2023, however, their workforce grew slightly by approximately *** employees.
Thomson Reuters
The Thomson Reuters Corporation is a multinational mass media and information company headquartered in Toronto, Canada. Outside of professional circles, the company is perhaps most associated with the provision of unaffiliated news content to media outlets under the Reuters name, including stories and photos for publication in newspapers. When broken down by business line, however, these services constituted a small share of the revenue generated by the company. The majority of revenue came from the provision of information services to corporations and governments, covering legal, tax and accounting, and policy-making more broadly. Of these services, the provision of legal information to law firms was the largest source of revenue.
Reason for decline in employee numbers
As with its employee numbers, the revenue of Thomson Reuters saw a major decline between 2011 and 2018, but has somewhat recovered since then. This decline was primarily due to the sale of the company's stake in its financial and risk division. Formerly this division comprised a majority of the company's revenue, and the sharp drop in revenue for 2017 reflects the removal of this division's revenue from Thomson Reuters' balance sheet. Despite this loss of gross revenue, the company's net income remained relatively unaffected.
The revenue of Thomson Reuters, headquartered in Canada, amounted to ************* U.S. dollars in 2024. The reported fiscal year ends on December 31. Compared to 2020, this marks an increase of approximately ************* U.S. dollars, and the increase from 2020 to 2024 was continuous.