Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
Anonymized message classification data and actantial analyses of public Telegram channels pertaining to the paper "A Computational Analysis of Telegram's Narrative Affordances".
Message classification data
All files are included in the zipped folder 'narrative_affordances_data.zip'. Each file contains the message classification data for a single Telegram channel. Numbered files are included for each of the six datasets (1 combined, 5 thematic) discussed in the paper.
Actantial analysis
Frequency lists of retrieved actants are included in the zipped folder 'overview_of_actants.zip'.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
2019-12-01 to 2024-08-31. It includes anonymized channel information, sampled messages, and topics identified using BERTopic. The data has been anonymized and structured for ease of analysis. The dataset comprises two main CSV files:
topics.csv
This file contains topics extracted from the full dataset using BERTopic. Each topic is described by a concise text generated by OpenAI o1.
| Column Name | Description |
|---|---|
| Topic | Numeric identifier for each topic. -1 is the generic topic for non-assignable messages. |
| Name | Human-readable name summarizing the topic. |
| Representation | List of representative keywords for the topic. |
| Description | Concise description of the topic generated by OpenAI. |
messages.csv
This file contains a 25% sample of messages from Telegram channels, stratified on the topic column.
| Column Name | Description |
|---|---|
| channel_id | Anonymized identifier for the Telegram channel. |
| week_year | Week and year when the message was posted (format: week_year). |
| media_type | Type of media included in the message (txt, img, video, audio, doc, web). |
| reach | Number of users reached by the message. |
| virality | Virality score of the message. |
| is_viral | Boolean indicating whether the message is considered viral. |
| topics | Topic identifier associated with the message. |
| probs | Probability scores for topic assignment. |
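The two files can be joined on the topic identifier. The following is a minimal sketch, assuming the column names listed in the tables, a `week_year` value written as week, underscore, year (for example `12_2021`), and booleans stored as the strings `True` and `False`. None of these storage details are confirmed by the description, and the inline sample rows are invented for illustration only.

```python
import csv
import io

# Invented sample rows that mimic the documented columns; not real data.
TOPICS_CSV = """Topic,Name,Representation,Description
-1,generic,"['the', 'and']",Messages that could not be assigned to a topic.
0,example_topic,"['word_a', 'word_b']",Hypothetical topic description.
"""

MESSAGES_CSV = """channel_id,week_year,media_type,reach,virality,is_viral,topics,probs
c1,12_2021,txt,1500,0.8,True,0,0.91
c2,12_2021,img,300,0.1,False,-1,0.0
c1,13_2021,video,900,0.4,False,0,0.77
"""


def parse_week_year(value):
    """Split a 'week_year' string into (year, week); the field order is an assumption."""
    week, year = value.split("_")
    return int(year), int(week)


def join_messages_with_topics(messages_file, topics_file):
    """Attach the topic name to each message and skip the generic topic -1."""
    topic_names = {int(row["Topic"]): row["Name"] for row in csv.DictReader(topics_file)}
    joined = []
    for row in csv.DictReader(messages_file):
        topic_id = int(row["topics"])
        if topic_id == -1:  # -1 marks non-assignable messages
            continue
        year, week = parse_week_year(row["week_year"])
        joined.append({
            "channel_id": row["channel_id"],
            "year": year,
            "week": week,
            "reach": int(row["reach"]),
            "is_viral": row["is_viral"] == "True",  # assumed string encoding
            "topic_name": topic_names[topic_id],
        })
    return joined


rows = join_messages_with_topics(io.StringIO(MESSAGES_CSV), io.StringIO(TOPICS_CSV))
for row in rows:
    print(row)
```

With the real files, the two `io.StringIO` objects would be replaced by file handles opened on `messages.csv` and `topics.csv`.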
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Anonymized message classification data from public Telegram channels pertaining to the paper "A Computational Analysis of Telegram's Narrative Affordances".
Each file contains the message classification data for a single Telegram channel. Numbered files are included for each of the six datasets (1 combined, 5 thematic) discussed in the paper.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set consists of public Telegram channels, concentrated on German-language discussions of QAnon.
The date range of the data is from the creation of the channel to 01 July 2021.
To collect the data, I first downloaded the chat history of 3 channels (listed under "Primary") and counted the number of messages forwarded from other channels/accounts. I then selected the top 5 most-forwarded-from channels/accounts from my Primary level and used them as my Secondary level.
I then repeated the process for the Secondary level, downloading the chat histories and determining the most-forwarded-from channels/accounts; the top 5 for each channel/account in the Secondary level became the Tertiary level.
I repeated this for the members of the Tertiary level, downloading their chat histories and determining what channels/groups were forwarded into the Tertiary level, but stopped the process there. For the visualization, I used the unique channels/accounts as nodes and the forwarding of a message as an edge connecting nodes.
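The collection procedure described above can be sketched as a level-by-level expansion. This is only an illustration under stated assumptions: `fetch_history` is a hypothetical callable standing in for the chat-history download, each message is a dict with a hypothetical `forwarded_from` key, and skipping channels that were already collected is my assumption rather than something the description states.

```python
from collections import Counter


def top_forward_sources(messages, k=5):
    """Return the k sources most often forwarded into one channel's history."""
    counts = Counter(m["forwarded_from"] for m in messages if m.get("forwarded_from"))
    return [source for source, _ in counts.most_common(k)]


def snowball(primary, fetch_history, k=5, rounds=3):
    """Download histories for Primary, Secondary and Tertiary levels.

    Returns the list of levels and a Counter of (source, target) forwarding
    edges. The last round records edges only and adds no further level.
    """
    tiers = [list(primary)]
    seen = set(primary)
    edges = Counter()
    for round_index in range(rounds):
        next_tier = []
        for channel in tiers[-1]:
            history = fetch_history(channel)
            # One edge per forwarded message: source channel -> this channel.
            edges.update(
                (m["forwarded_from"], channel) for m in history if m.get("forwarded_from")
            )
            if round_index < rounds - 1:
                for source in top_forward_sources(history, k):
                    if source not in seen:  # assumption: do not revisit channels
                        seen.add(source)
                        next_tier.append(source)
        if round_index < rounds - 1:
            tiers.append(next_tier)
    return tiers, edges


# Invented toy histories; channel names are placeholders.
HISTORIES = {
    "A": [{"forwarded_from": "B"}, {"forwarded_from": "B"},
          {"forwarded_from": "C"}, {"forwarded_from": None}],
    "B": [{"forwarded_from": "D"}],
    "C": [{"forwarded_from": "A"}],
    "D": [{"forwarded_from": "E"}],
}

tiers, edges = snowball(["A"], lambda channel: HISTORIES.get(channel, []))
print(tiers)
print(dict(edges))
```

The keys of `edges` give the node pairs for the visualization, and the counts can serve as edge weights.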
Also included in this data set are the full text histories of the channels I collected data from, in the "Corpus" folder. The text of the messages was extracted from the JSON files of the chat history, leaving only the content of the messages.
My own analysis of this dataset has been basic, but I hope other researchers find this data useful.
WF Thomas
wfthomas@protonmail.com
www.wfthomas.com
2021
USE WITH ATTRIBUTION ONLY
https://www.gesis.org/en/institute/data-usage-terms
TeleScope is an extensive dataset suite that comprises metadata for about 500K Telegram channels, plus downloaded message metadata from all 71K public channels among them, accounting for about 120M crawled messages. In addition to metadata, the TeleScope suite provides enrichments such as language detection and active periods for each channel, and Telegram entities extracted from messages. It also comprises channel connections and user interaction data built using Telegram's message-forwarding feature, to support use cases including the study of information spread and message-forwarding patterns. The dataset is designed for diverse applications, independent of specific research objectives, and is sufficiently versatile to facilitate the replication of social media studies comparable to those conducted on platforms like X (formerly Twitter).
Further information on the content of the files can be found in the file TeleScope_readme_v1-0-0.txt (see 'Technical Report').
keywords: Computational Social Science; Information Science, Web and Social Media; text analysis; text processing; text communication; social media; Online discourse; Information Dissemination; Information Analysis
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
The IDF Telegram Channel News Dataset is scraped from the IDF Telegram channel, the official Israeli news update channel, for the period between October 7, 2023, and June 20, 2024. It contains structured information extracted from news posts, including publication dates, textual content, and entities mentioned in the text. The dataset is ideal for natural language processing (NLP) tasks, entity recognition, and trend analysis. The first version of this dataset contains news from October 7, 2023, to April 23, 2024. The second version contains all news from October 7, 2023, to June 23, 2024.
Features
The dataset includes the following features:
Date: The publication date of the news post.
Text: The textual content of the news post.
Location: Names of locations mentioned in the news text.
Person: Names of individuals mentioned in the news text.
Organization: Names of organizations mentioned in the news text.
Siren: Indicator of whether the news text is about sirens being launched from or to a state: 1 if the news text mentions sirens being launched, 0 otherwise.
Contributions
Contributions to the dataset, such as improving the scraping methods, expanding the feature set, or enhancing data quality, are welcome. Please refer to the contribution guidelines in the repository.
License
If you use IDF in your research, please cite the DOI of this dataset.
This dataset contains the raw data used in the article “Using AI to detect misinformation and emotions on Telegram: a comparison with the media”, accepted for publication in index.comunicación. The data includes:
• Telegram dataset (tg_messages.csv): 54,456 posts extracted from 33 public Telegram channels between 23 July and 16 November 2023, related to the political debate around the Amnesty Law in Spain. Each entry includes message metadata such as channel, date, views, and content.
• News headlines dataset (Titulares.csv): 46,022 news headlines mentioning “amnesty”, extracted from 377 Spanish national media outlets indexed in MediaCloud, during the same period.
• Analysis scripts: Available upon request or pending publication in the article’s supplementary materials.
The data was used for topic modelling, sentiment and emotion detection with NLP techniques based on Python libraries like BERTopic and pysentimiento. All data is anonymized and publicly accessible or derived from open sources.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication files (data, code, and outputs) for "Amplifying the Regime: Identifying Coordinated Activity of Pro-Government Telegram Channels in Russia and Belarus" http://dx.doi.org/10.1080/19331681.2025.2540822.
https://creativecommons.org/publicdomain/zero/1.0/
The data used in this analysis was obtained from the Cumta Telegram channel, which provides real-time alerts about rocket sirens (Red Alerts) in Israel. The channel broadcasts information about alerts, including affected regions, cities, and timestamps. The dataset consists of extracted historical messages from this channel for further analysis and visualization.
cumta_detailed_df - Each row represents an alert recorded in a specific district (including city neighborhoods). This approach provides a more accurate estimate of the affected area.
cumta_df - Each row represents one alert event per city, even if the alert was heard in multiple neighborhoods within two minutes. This method ensures a more accurate count of alerts by city.
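The relationship between the two tables can be illustrated with a small sketch that collapses district-level alerts into per-city events. It rests on several assumptions that the description does not confirm: the field names `city`, `district` and `time` are hypothetical, the two-minute window is measured from the first alert of each event, and the sample rows are invented placeholders.

```python
from datetime import datetime, timedelta


def collapse_city_alerts(detailed_rows, window=timedelta(minutes=2)):
    """Merge district-level alerts into one event per city per time window.

    An alert joins the city's current event if it falls within `window` of
    that event's first alert (an assumed reading of "within two minutes").
    """
    events = []
    event_start = {}  # city -> start time of that city's current event
    for row in sorted(detailed_rows, key=lambda r: (r["time"], r["city"])):
        start = event_start.get(row["city"])
        if start is None or row["time"] - start > window:
            event_start[row["city"]] = row["time"]
            events.append({"city": row["city"], "time": row["time"]})
    return events


# Invented placeholder rows; not taken from the channel.
detailed = [
    {"city": "city_a", "district": "north", "time": datetime(2024, 1, 1, 12, 0, 0)},
    {"city": "city_a", "district": "south", "time": datetime(2024, 1, 1, 12, 0, 30)},
    {"city": "city_b", "district": "center", "time": datetime(2024, 1, 1, 12, 1, 0)},
    {"city": "city_a", "district": "north", "time": datetime(2024, 1, 1, 12, 5, 0)},
]

city_events = collapse_city_alerts(detailed)
for event in city_events:
    print(event["city"], event["time"].isoformat())
```

Here the two `city_a` alerts thirty seconds apart count as one event, while the alert five minutes later starts a new one.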
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset includes Telegram channels with both pro-Kremlin and anti-Kremlin communications, collected over a timeframe covering one year prior to and one year following the Russian invasion. It consists of 404 pro-Kremlin channels featuring 4,109,645 posts and 114 anti-Kremlin channels containing 1,117,768 posts, all provided in JSON format. The anti_kremlin_channel_list and pro_kremlin_channel_list files contain details such as the channel name, username, Telegram link, and corresponding annotations. The overall data structure is defined in the accompanying data_schema (json) file.
Important Note: For proper attribution, researchers who use this dataset in their work are invited to cite the following papers that describe this dataset and an example analysis.
Bawa, A., Kursuncu, U., Achilov, D., Shalin, V. L., Agarwal, N., & Akbas, E. (2025). Telegram as a Battlefield: Kremlin-related Communications during the Russia-Ukraine Conflict. arXiv preprint arXiv:2501.01884.
Bawa, A., Kursuncu, U., Achilov, D., & Shalin, V. L. (2024). The adaptive strategies of anti-Kremlin digital dissent in Telegram during the Russian invasion of Ukraine. arXiv preprint arXiv:2408.07135.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To research the illegal activities of underground apps on Telegram, we have created a dataset called TUApps. TUApps is a progressively growing dataset of underground apps, collected from September 2023 to February 2024, consisting of a total of 1,000 underground apps and 200 million messages distributed across 71,332 Telegram channels.
In the process of creating this dataset, we followed strict ethical standards to ensure the lawful use of the data and the protection of user privacy. The dataset includes the following files:
(1) dataset.zip: We have packaged the underground app samples. The naming of Android app files is based on the SHA256 hash of the file, and the naming of iOS app files is based on the SHA256 hash of the publishing webpage.
(2) code.zip: We have packaged the code used for crawling data from Telegram and for performing data analysis.
(3) message.zip: We have packaged the messages crawled from Telegram. The files are named after the corresponding Telegram channels.
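The hash-based naming in dataset.zip can be checked with a short sketch like the one below. It only computes SHA256 hex digests; whether file names carry an extension, and whether the iOS hash is taken over the publishing page's URL or its content, is not stated in the description, so the helper names here are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_hex(data: bytes) -> str:
    """Return the SHA256 digest of a byte string as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large app samples need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def name_matches_hash(path: str) -> bool:
    """Check that a file's base name (extension ignored) equals its SHA256 digest."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem.lower() == sha256_of_file(path)


# Self-contained demonstration with a temporary file holding placeholder bytes.
payload = b"placeholder app bytes"
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, sha256_hex(payload) + ".bin")
    with open(sample_path, "wb") as handle:
        handle.write(payload)
    sample_ok = name_matches_hash(sample_path)

print(sample_ok)
```

A check of this kind is a quick way to detect samples that were corrupted or renamed after download.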
Availability of code and messages
Upon acceptance of our research paper, the dataset containing user messages and the code used for data collection and analysis will only be made available upon request to researchers who agree to adhere to strict ethical principles and maintain the confidentiality of the data.
Description in English:
Dataset collected from 30 Russian-language Telegram channels on the topic of investments and shares. It contains information about shares, securities, and bonds and their value, as well as investment news and stock exchange analysis. The dataset was collected and labeled automatically using the Scoutie data collection and labeling service. Try Scoutie and collect the same or another dataset using the link.
Dataset fields:
taskId - task… See the full description on the dataset page: https://huggingface.co/datasets/ScoutieAutoML/Investments_shares_stock_exchange_dataset.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Summary
This dataset contains structured textual content in Markdown format extracted from Telegram messages written in the Tatar language. The messages were collected from several public Tatar-speaking groups and channels. The dataset is intended to provide clean, structured, and semantically meaningful conversational content to support natural language processing tasks, social media analysis, and research in Tatar language technologies. The extracted Markdown preserves key… See the full description on the dataset page: https://huggingface.co/datasets/yasalma/tg-messages.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study examines climate change narratives within the Midnight Rider Telegram channel, an alternative media platform characterized by minimal content moderation and high susceptibility to disinformation. Using inductive thematic analysis, we identify four interrelated narrative themes: Faultfinding, Disinformation, Elitism, and Appeal to Emotion. This channel was selected for analysis due to data availability around the COP period, the number of subscribers, and its association with right-wing followers. Our research focuses on how these narratives were constructed in this media Telegram channel between November 2, 2023, and January 10, 2024, around the time of COP28. We filtered messages using specific keywords relevant to climate change. For our analysis, we decided to focus exclusively on channel messages and short videos (shorter than 10 minutes) and to exclude images, due to the low number of images and their minimal text content. We transcribed the videos using the AI-driven video editor Descript (https://www.descript.com/) and merged the transcribed video content and the text messages into one document in chronological order. Thematic analysis was conducted using MAXQDA, a qualitative data analysis software, to identify patterns and themes in the collected data. The process involved familiarizing ourselves with the data, generating initial codes, searching for themes, reviewing and refining these themes, and producing a report on the findings. We ultimately identified nine codes, which were then grouped into four overarching themes.
https://www.datainsightsmarket.com/privacy-policy
The anonymous messaging app market, currently valued at $250 million in 2025, is projected to experience robust growth, driven by increasing concerns over online privacy and the need for secure communication. A Compound Annual Growth Rate (CAGR) of 9.3% from 2025 to 2033 indicates a significant expansion of this market. This growth is fueled by several factors. The rising adoption of smartphones and internet penetration globally provides a wider user base for these applications. Furthermore, the increasing instances of cyberbullying, data breaches, and government surveillance are driving users towards platforms that prioritize anonymity and data security. The market is segmented by features (e.g., self-destructing messages, end-to-end encryption), user demographics (age, location), and the type of messaging service offered (text, voice, video). Competitive pressures are high, with established players like WhatsApp and newer entrants vying for market share. The diverse range of apps, each offering unique features, caters to a wide spectrum of user needs and preferences, leading to a dynamic and competitive landscape. The success of individual apps depends on their ability to balance anonymity with user safety and compliance with data protection regulations. Challenges include managing the potential for misuse of the platforms, ensuring user privacy without compromising law enforcement efforts, and navigating the evolving regulatory landscape concerning data protection and online communication. Despite these challenges, the long-term outlook for the anonymous messaging app market remains positive, propelled by the unwavering demand for secure and private communication in an increasingly digital world. Continued innovation in encryption technology and user-friendly interfaces will further drive market expansion. Strategic partnerships and mergers and acquisitions among players in this space are expected, leading to further market consolidation.
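The growth figures above can be turned into a rough projection with the standard compound-growth formula. This sketch assumes that 2025 is the base year and that the 9.3% rate compounds annually over the eight years to 2033; the resulting value is an arithmetic illustration, not a figure given in the report.

```python
def project_value(base_value: float, cagr: float, years: int) -> float:
    """Compound a base value at a constant annual growth rate."""
    return base_value * (1 + cagr) ** years


BASE_MILLION_USD = 250.0   # stated 2025 market value, in millions of dollars
CAGR = 0.093               # stated compound annual growth rate
YEARS = 2033 - 2025        # assumed compounding period of eight years

projected_2033 = project_value(BASE_MILLION_USD, CAGR, YEARS)
growth_factor = projected_2033 / BASE_MILLION_USD
print(f"Projected 2033 value: about {projected_2033:.0f} million USD")
print(f"Growth factor over {YEARS} years: {growth_factor:.2f}")
```

Under these assumptions the stated rate implies roughly a doubling of the market over the forecast period.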