8 datasets found

Albania Telegram Data
listtodata.com
ha.listtodata.com
.csv, .xls, .txt
Updated Jul 17, 2025
Cite
List to Data (2025). Albania Telegram Data [Dataset]. https://listtodata.com/albania-telegram-data
Available download formats: .csv, .xls, .txt
Dataset updated
Jul 17, 2025
Dataset authored and provided by
List to Data
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Time period covered
Jan 1, 2025 - Dec 31, 2025
Area covered
Albania
Variables measured
phone numbers, email address, full name, address, city, state, gender, age, income, IP address
Description
Albania Telegram data provides an accurate TG phone number list: contact information for active Telegram users. If you want to sell your items using Telegram marketing campaigns, you may use our Albania Telegram list. Data from 2025 is fresh and up to date. This tag data does not generate additional sales. It cannot be sold. The database is accurate and authentic. Albania Telegram screening data will provide you with live and accurate Telegram phone number leads. The Albania Telegram dataset includes the following data: all numbers are open in Telegram, gender, age, Telegram username, last activity date, and industry classification. Albania tg powder might help you increase your business sales. Telegram is becoming one of the most effective tools for direct marketing. The offered data provides you with an accurate and up-to-date tg powder database. We also provide after-sales assistance to meet your company’s needs. Check out our packages here.
TUApps
zenodo.org
zip
Updated May 16, 2024
Cite
Anonymous Anonymous; Anonymous Anonymous (2024). TUApps [Dataset]. http://doi.org/10.5281/zenodo.11201267
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.11201267
Dataset updated
May 16, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Anonymous Anonymous; Anonymous Anonymous
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
To research the illegal activities of underground apps on Telegram, we have created a dataset called TUApps. TUApps is a progressively growing dataset of underground apps, collected from September 2023 to February 2024, consisting of a total of 1,000 underground apps and 200 million messages distributed across 71,332 Telegram channels.
In the process of creating this dataset, we followed strict ethical standards to ensure the lawful use of the data and the protection of user privacy. The dataset includes the following files:
(1) dataset.zip: We have packaged the underground app samples. The naming of Android app files is based on the SHA256 hash of the file, and the naming of iOS app files is based on the SHA256 hash of the publishing webpage.
(2) code.zip: We have packaged the code used for crawling data from Telegram and for performing data analysis.
(3) message.zip: We have packaged the messages crawled from Telegram; the files are named after the names of the channels in Telegram.
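The hash-based naming described for dataset.zip can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and keeping the original file extension is an assumption, since the description does not say whether extensions are preserved. Only the Android case (hash of the file contents) is shown, because the description does not specify exactly what is hashed for the iOS publishing webpage.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large app files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashed_name(path: Path) -> str:
    """Build a hash-based file name; keeping the extension is an assumption."""
    return sha256_of_file(path) + path.suffix
```

Recomputing the digest of a downloaded sample and comparing it with its file name is one way to check that the sample was not corrupted in transit.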
Availability of code and messages
Upon acceptance of our research paper, the dataset containing user messages and the code used for data collection and analysis will only be made available upon request to researchers who agree to adhere to strict ethical principles and maintain the confidentiality of the data.
Denmark Telegram Data
listtodata.com
ha.listtodata.com
.csv, .xls, .txt
Updated Jul 17, 2025
Cite
List to Data (2025). Denmark Telegram Data [Dataset]. https://listtodata.com/denmark-telegram-data
Available download formats: .csv, .xls, .txt
Dataset updated
Jul 17, 2025
Dataset authored and provided by
List to Data
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Time period covered
Jan 1, 2025 - Dec 31, 2025
Area covered
Denmark
Variables measured
phone numbers, email address, full name, address, city, state, gender, age, income, IP address
Description
Denmark Telegram data includes 100% accurate contact information. If you want to promote your business or product and build your marketing campaign, you can use this Denmark Telegram data without any hesitation. This database includes active Telegram user contact information. If you use this TG data, you can receive a return on investment (ROI). List to Data generates new and active leads, which drive corporate success. Denmark Telegram screening data offers the latest and most reliable leads for Telegram phone numbers. The information will be provided as follows: all numbers are open in Telegram, gender, age, Telegram username, last activity date, and industry classification. Denmark tg powder pertains to Telegram data originating from Denmark. This information offers valuable insights into the behavior of Danish consumers, enabling businesses to customize their marketing strategies effectively.
Vent Here Dataset - Emotion and Sentiment Analysis
kaggle.com
Updated Dec 14, 2024
Cite
Alpha mintamir (2024). Vent Here Dataset - Emotion and Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/alphamintamir/vent-here-dataset-emotion-and-sentiment-analysis
Available metadata format: Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 14, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Alpha mintamir
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
This dataset contains venting content scraped from the Ethiopian-based Telegram channel Vent Here. It has been pre-processed to remove non-English entries, emojis, and unwanted prefixes, with sentiment and emotion labels added for each entry.

Key Features

Source: Sourced from the public Telegram channel Vent Here, where users share emotional experiences.

Pre-processing:

Removed emojis

Cleaned text prefixes

Filtered out non-English text

Analysis

Sentiment: Classified using a Hugging Face sentiment pipeline (positive, negative, neutral) with a sentiment score ranging from -1 (negative) to 1 (positive).

Emotion: Analyzed using a RoBERTa-based model for emotion classification (e.g., happiness, sadness, anger).

Columns

date: Timestamp of the venting content.

text: Cleaned vent text.

emotion_label: Predicted emotion.

sentiment_label: Sentiment label (positive, negative, neutral).

sentiment_score: Sentiment score, ranging from -1 to 1.

This dataset provides valuable insights into emotional expression and sentiment in online communities.
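As a rough illustration of how the columns above fit together, the sketch below averages sentiment_score per emotion_label and checks that each score lies in the documented range of -1 to 1. The sample rows, the date format, and the function name are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only; the values and date format are assumptions.
SAMPLE_CSV = """date,text,emotion_label,sentiment_label,sentiment_score
2024-01-01 10:00:00,i feel stuck lately,sadness,negative,-0.8
2024-01-02 11:30:00,finally passed my exam,joy,positive,0.9
2024-01-03 09:15:00,nothing much happened,neutral,neutral,0.0
2024-01-04 18:45:00,so tired of everything,sadness,negative,-0.6
"""


def mean_score_by_emotion(csv_text: str) -> dict:
    """Average sentiment_score for each emotion_label."""
    totals = defaultdict(lambda: [0.0, 0])  # emotion -> [sum of scores, row count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        score = float(row["sentiment_score"])
        # The description states that scores range from -1 to 1.
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        bucket = totals[row["emotion_label"]]
        bucket[0] += score
        bucket[1] += 1
    return {emotion: total / count for emotion, (total, count) in totals.items()}


if __name__ == "__main__":
    print(mean_score_by_emotion(SAMPLE_CSV))
```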
MultiSocial
zenodo.org
Updated May 21, 2025
Cite
Dominik Macko; Jakub Kopal; Robert Moro; Ivan Srba (2025). MultiSocial [Dataset]. http://doi.org/10.5281/zenodo.13846152
Unique identifier
https://doi.org/10.5281/zenodo.13846152
Dataset updated
May 21, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Dominik Macko; Jakub Kopal; Robert Moro; Ivan Srba
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
MultiSocial is a dataset (described in a paper) for benchmarking multilingual (22 languages) machine-generated text detection in the social-media domain (5 platforms). It contains 472,097 texts, of which about 58k are human-written and approximately the same number were generated by each of 7 multilingual large language models using 3 iterations of paraphrasing. The dataset has been anonymized to minimize the amount of sensitive data by hiding email addresses, usernames, and phone numbers.

If you use this dataset in any publication, project, tool, or in any other form, please cite the paper.

Disclaimer

Due to the data sources (described below), the dataset may contain harmful, disinformation, or offensive content. Based on a multilingual toxicity detector, about 8% of the text samples are probably toxic (from 5% in WhatsApp to 10% in Twitter). Although we have used older data sources (which are less likely to include machine-generated texts), the labeling (of human-written text) might not be 100% accurate. The anonymization procedure might not successfully hide all the sensitive/personal content; thus, use the data cautiously (if you feel affected by such content, report any issues found to dpo[at]kinit.sk). The intended use is for non-commercial research purposes only.

Data Source

The human-written part consists of a pseudo-randomly selected subset of social media posts from 6 publicly available datasets:

Telegram data originated in Pushshift Telegram, containing 317M messages (Baumgartner et al., 2020). It contains messages from 27k+ channels. The collection started with a set of right-wing extremist and cryptocurrency channels (about 300 in total) and was expanded based on occurrence of forwarded messages from other channels. In the end, it thus contains a wide variety of topics and societal movements reflecting the data collection time.

Twitter data originated in CLEF2022-CheckThat! Task 1, containing 34k tweets on COVID-19 and politics (Nakov et al., 2022), combined with Sentiment140, containing 1.6M tweets on various topics (Go et al., 2009).

Gab data originated in the dataset containing 22M posts from Gab social network. The authors of the dataset (Zannettou et al., 2018) found out that “Gab is predominantly used for the dissemination and discussion of news and world events, and that it attracts alt-right users, conspiracy theorists, and other trolls.” They also found out that hate speech is much more prevalent there compared to Twitter, but lower than 4chan's Politically Incorrect board.

Discord data originated in Discord-Data, containing 51M messages. This is a long-context, anonymized, clean, multi-turn and single-turn conversational dataset based on Discord data scraped from a large variety of servers, big and small. According to the dataset authors, it contains around 0.1% of potentially toxic comments (based on the applied heuristic/classifier).

WhatsApp data originated in whatsapp-public-groups, containing 300k messages (Garimella & Tyson, 2018). The public dataset contains the anonymised data, collected for around 5 months from around 178 groups. Original messages were made available to us on request to dataset authors for research purposes.

From these datasets, we have pseudo-randomly sampled up to 1300 texts (up to 300 for test split and the remaining up to 1000 for train split if available) for each of the selected 22 languages (using a combination of automated approaches to detect the language) and platform. This process resulted in 61,592 human-written texts, which were further filtered out based on occurrence of some characters or their length, resulting in about 58k human-written texts.
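The per-language, per-platform sampling described above can be sketched as follows. This is not the authors' pipeline: the function name and seed handling are hypothetical, and filling the test split before the train split is an assumption drawn from the phrase "the remaining up to 1000 for train split if available".

```python
import random


def sample_and_split(texts, seed=0, test_size=300, train_size=1000):
    """Pseudo-randomly pick up to test_size + train_size texts and split them.

    Intended to be called once per (language, platform) group.
    """
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    pool = list(texts)
    rng.shuffle(pool)
    selected = pool[: test_size + train_size]
    # Assumption: the test split is filled first; train gets whatever remains.
    return {"test": selected[:test_size], "train": selected[test_size:]}
```

With fewer than 300 texts in a group, everything lands in the test split under this assumption; with 1300 or more, the splits reach their full sizes of 300 and 1000.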

The machine-generated part contains texts generated by 7 LLMs (Aya-101, Gemini-1.0-pro, GPT-3.5-Turbo-0125, Mistral-7B-Instruct-v0.2, opt-iml-max-30b, v5-Eagle-7B-HF, vicuna-13b). All these models were self-hosted except for GPT and Gemini, where we used the publicly available APIs. We generated the texts using 3 paraphrases of the original human-written data and then preprocessed the generated texts (filtered out cases when the generation obviously failed).

The dataset has the following fields:

'text' - a text sample,

'label' - 0 for human-written text, 1 for machine-generated text,

'multi_label' - a string representing a large language model that generated the text or the string "human" representing a human-written text,

'split' - a string identifying train or test split of the dataset for the purpose of training and evaluation respectively,

'language' - the ISO 639-1 language code identifying the detected language of the given text,

'length' - word count of the given text,

'source' - a string identifying the source dataset / platform of the given text,

'potential_noise' - 0 for text without identified noise, 1 for text with potential noise.
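As a small illustration of how these fields relate, the sketch below counts records per split and generator and checks that 'label' agrees with 'multi_label' (0 exactly when the generator is "human"). The sample records and the function name are invented for illustration; only the field names come from the list above.

```python
from collections import Counter


def count_by_split_and_generator(records):
    """Count records per (split, multi_label) and validate the binary label."""
    counts = Counter()
    for record in records:
        is_human = record["multi_label"] == "human"
        # Per the field list: label is 0 for human-written, 1 for machine-generated.
        if (record["label"] == 0) != is_human:
            raise ValueError(f"inconsistent labels in record: {record}")
        counts[(record["split"], record["multi_label"])] += 1
    return counts


# Illustrative records only; not taken from the dataset.
SAMPLE = [
    {"text": "hello", "label": 0, "multi_label": "human", "split": "train"},
    {"text": "hi there", "label": 1, "multi_label": "vicuna-13b", "split": "train"},
    {"text": "good day", "label": 1, "multi_label": "vicuna-13b", "split": "test"},
]
```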

ToDo Statistics (under construction)
Dataset on the online cryptocurrency discussion on Twitter, Telegram, and...
data.niaid.nih.gov
Updated Nov 22, 2022
Cite
Tardelli, Serena (2022). Dataset on the online cryptocurrency discussion on Twitter, Telegram, and Discord [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3895020
Dataset updated
Nov 22, 2022
Dataset provided by
Avvenuti, Marco
Ferrara, Emilio
Cresci, Stefano
Nizzoli, Leonardo
Tardelli, Serena
Tesconi, Maurizio
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This Dataset is described in Charting the Landscape of Online Cryptocurrency Manipulation. IEEE Access (2020), a study that aims to map and assess the extent of cryptocurrency manipulations within and across the online ecosystems of Twitter, Telegram, and Discord. Starting from tweets mentioning cryptocurrencies, we leveraged and followed invite URLs from platform to platform, building the invite-link network, in order to study the invite link diffusion process.

Please, refer to the paper below for more details.

Nizzoli, L., Tardelli, S., Avvenuti, M., Cresci, S., Tesconi, M. & Ferrara, E. (2020). Charting the Landscape of Online Cryptocurrency Manipulation. IEEE Access (2020).

This dataset is composed of:

~16M tweet ids shared between March and May 2019, mentioning at least one of the 3,822 cryptocurrencies (cashtags) provided by the CryptoCompare public API;

~13k nodes of the invite-link network, i.e., the information about the Telegram/Discord channels and Twitter users involved in the cryptocurrency discussion (e.g., id, name, audience, invite URL);

~62k edges of the invite-link network, i.e., the information about the flow of invites (e.g., source id, target id, weight).

With such information, one can easily retrieve the content of channels and messages through Twitter, Telegram, and Discord public APIs.

Please, refer to the README file for more details about the fields.
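To make the edge list concrete, the sketch below sums the invite weight arriving at each node of the invite-link network. It assumes each edge is a (source id, target id, weight) triple, as suggested by the description; the actual field names and file layout are defined in the README, and the sample edges and function name here are invented.

```python
from collections import defaultdict


def weighted_in_degree(edges):
    """Sum the weight of incoming invite links for each target node."""
    totals = defaultdict(float)
    for source_id, target_id, weight in edges:
        totals[target_id] += weight
    return dict(totals)


# Illustrative edges only: (source id, target id, weight).
SAMPLE_EDGES = [
    ("twitter_user_1", "telegram_channel_a", 3),
    ("twitter_user_2", "telegram_channel_a", 1),
    ("telegram_channel_a", "discord_server_x", 2),
]
```

Nodes with a high weighted in-degree are the channels most often advertised through invite links, which is one simple way to start exploring the diffusion process the study describes.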
Data from: Messages from SA covid vax chat Telegram channel
zenodo.org
Updated Jun 30, 2025
Cite
Rebecca Pointer; Peter van Heusden (2025). Messages from SA covid vax chat Telegram channel [Dataset]. http://doi.org/10.25379/uwc.26965024.v1
Unique identifier
https://doi.org/10.25379/uwc.26965024.v1
Dataset updated
Jun 30, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Rebecca Pointer; Peter van Heusden
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
1. SQLite format database containing messages from SA COVID VAX CHAT from 10 May 2021 to 22 May 2022. Original user IDs and user names have been replaced with anonymous IDs.
2. Messages in CSV format filtered to remove spam and with themes annotated, from May 2021 to end 2022.
3. Spam messages (messages showing up more than 100 times in the message dataset) in JSON format, with one JSON record per line (thus JSON-L format).
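The spam rule and the JSON-L layout described in item 3 can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function names and the record shape ({"text": ..., "count": ...}) are hypothetical, and exact text matching is assumed for deciding that two messages are the same.

```python
import json
from collections import Counter


def find_spam(messages, threshold=100):
    """Return texts that appear more than `threshold` times, with their counts."""
    counts = Counter(messages)
    return {text: n for text, n in counts.items() if n > threshold}


def to_jsonl(spam_counts):
    """Serialize one JSON record per line (JSON-L); the record shape is hypothetical."""
    return "\n".join(
        json.dumps({"text": text, "count": n}, ensure_ascii=False)
        for text, n in spam_counts.items()
    )


def read_jsonl(text):
    """Parse JSON-L text back into a list of records, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```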
MentalRiskES corpus
zenodo.org
investigacion.ujaen.es
Updated Apr 24, 2025
Cite
Alba María Mármol-Romero; Adrián Moreno-Muñoz; Flor Miriam Plaza del Arco; María Dolores Molina-González; Maria-Teresa Martin-Valdivia; Alfonso Ureña López; Arturo Montejo-Ráez (2025). MentalRiskES corpus [Dataset]. http://doi.org/10.5281/zenodo.15275274
Unique identifier
https://doi.org/10.5281/zenodo.15275274
Dataset updated
Apr 24, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Alba María Mármol-Romero; Adrián Moreno-Muñoz; Flor Miriam Plaza del Arco; María Dolores Molina-González; Maria-Teresa Martin-Valdivia; Alfonso Ureña López; Arturo Montejo-Ráez
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
MentalRiskES is a new dataset about mental disorders in Spanish. The dataset is divided into three distinct mental disorders:

Eating Disorder

Depression

Anxiety

Each dataset contains a set of subjects and their message thread in a Telegram social network chat.

How is it constructed?
Public groups on the Telegram social network were accessed, and conversations were extracted from them. This data was processed, and we kept only the text messages, excluding images, audio, etc. In order to carry out the annotation, a subset of messages was extracted from each subject. This message thread was annotated by 10 different annotators through the Prolific platform and made use of the Doccano annotation platform.

In this way, we associated a user ID with some tags that emerged after averaging the annotators' decisions. The labels available for each set are:

Eating Disorder: suffer (s), control (c)

Depression: suffer + in favour (sf), suffer + against (sa), suffer + other (so), control (c)

Anxiety: suffer (s), control (c)

Labels
The values available in Anxiety files are:

bs (binary suffer): 1 if the subject suffers and 0 if not according to the frequency of the labels (in case of a tie it is marked as suffers)

bc (binary control): 1 if the subject does not suffer and 0 if they do according to the frequency of the labels (in case of a tie it is marked as suffers)

rbs (regression binary suffer): number of times the subject has been marked as suffering among the total number of scorers, i.e., 10

rbc (regression binary control): number of times the subject has been marked as not suffering among the total number of scorers, i.e., 10

The values available in the Depression and Eating Disorders files are:

bs (binary suffer): 1 if the subject suffers and 0 if not, according to the frequency of the labels (in case of a tie it is marked as suffers)

bsf (binary suffer favour): 1 if the subject suffers and is in favour, and 0 if not according to the frequency of the labels

bsa (binary suffer against): 1 if the subject suffers and is against, and 0 if not according to the frequency of the labels

bso (binary suffer other): 1 if the subject suffers and is neither in favour nor against and 0 if not according to the frequency of the labels

bc (binary control): 1 if the subject does not suffer and 0 if they do according to the frequency of the labels (in case of a tie it is marked as suffers)

rbs (regression binary suffer): number of times the subject has been marked as suffering among the total number of scorers, i.e., 10

rbc (regression binary control): number of times the subject has been marked as not suffering among the total number of scorers, i.e., 10

rsf (regression suffer favour): number of times the subject has been marked as suffering and in favour among the total number of scorers, i.e., 10

rsa (regression suffer against): number of times the subject has been marked as suffering and against among the total number of scorers, i.e., 10

rso (regression suffer other): number of times the subject has been marked as suffering and is neither in favour nor against among the total number of scorers, i.e., 10

rc (regression control): number of times the subject has been marked as not suffering among the total number of scorers, i.e., 10 (Note that it is equal to 'rbc')
So, the labels 'rbs' and 'rbc' must sum to 1, and the labels 'rsf','rsa', 'rso' and 'rc' must sum to 1 too.
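For the two-label case (suffer vs. control), the values above can be derived from the annotators' votes roughly as follows. This is a minimal sketch, not the corpus authors' code: the function name is hypothetical, and 'rbs' and 'rbc' are computed as fractions of the total number of annotators so that they sum to 1, as stated above.

```python
from collections import Counter


def binary_labels(votes):
    """Derive bs, bc, rbs and rbc from per-annotator votes ('s' or 'c')."""
    if not votes or any(vote not in ("s", "c") for vote in votes):
        raise ValueError("votes must be a non-empty list of 's' or 'c'")
    counts = Counter(votes)
    total = len(votes)  # 10 annotators per subject in the corpus
    rbs = counts["s"] / total
    rbc = counts["c"] / total
    # A tie is resolved as "suffers", as stated in the label descriptions.
    bs = 1 if counts["s"] >= counts["c"] else 0
    return {"bs": bs, "bc": 1 - bs, "rbs": rbs, "rbc": rbc}
```

For example, five 's' votes and five 'c' votes give bs = 1 and bc = 0 because of the tie rule, with rbs = rbc = 0.5.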

Preprocessing
The corpus is provided in two versions: the folder 'processed' contains the corpus with emojis converted to text format, while the folder 'raw' contains the corpus with emojis in their original format.

MentalRiskES evaluation campaign
MentalRiskES is a shared task organized at IberLEF. The aim of this task is to promote the early detection of mental risk disorders in Spanish. In this task we made use of the corpusMentalRiskES; the partitions used are available in the folder MentalRiskES2023edition.zip provided in git (https://github.com/sinai-uja/corpusMentalRiskES). To cite the task: Mármol-Romero, A. M., Moreno-Muñoz, A., Plaza-del-Arco, F. M., Molina-González, M. D., Martín-Valdivia, M. T., Ureña-López, L. A., & Montejo-Raéz, A. (2023). Overview of MentalriskES at IberLEF 2023: Early Detection of Mental Disorders Risk in Spanish. Procesamiento del Lenguaje Natural, 71, 329-350.