8 datasets found
  1. Albania Telegram Data

    • listtodata.com
    • ha.listtodata.com
    .csv, .xls, .txt
    Updated Jul 17, 2025
    Cite
    List to Data (2025). Albania Telegram Data [Dataset]. https://listtodata.com/albania-telegram-data
    Explore at:
    Available download formats: .csv, .xls, .txt
    Dataset updated
    Jul 17, 2025
    Dataset authored and provided by
    List to Data
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 2025 - Dec 31, 2025
    Area covered
    Albania
    Variables measured
    phone numbers, email address, full name, address, city, state, gender, age, income, IP address
    Description

    Albania Telegram data provides an accurate TG phone number list: contact information for active Telegram users. If you want to sell your items using Telegram marketing campaigns, you may utilize our Albania Telegram list. Data from 2025 is fresh and up to date. This tag data does not generate additional sales. It cannot be sold. The database is accurate and authentic. Albania Telegram screening data will provide you with live and accurate Telegram phone number leads. The Albania Telegram dataset includes the following data: all numbers are open in Telegram, gender, age, TG user name, last activity date, and industry classification. Albania tg powder might help you increase your business sales. Telegram is becoming one of the most effective tools for direct marketing. The offered data provides you with an accurate and up-to-date tg powder database. We also provide after-sales assistance to meet your company’s needs. Check out our packages here.

  2. TUApps

    • zenodo.org
    zip
    Updated May 16, 2024
    Cite
    Anonymous Anonymous; Anonymous Anonymous (2024). TUApps [Dataset]. http://doi.org/10.5281/zenodo.11201267
    Explore at:
    Available download formats: zip
    Dataset updated
    May 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous Anonymous; Anonymous Anonymous
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    To research the illegal activities of underground apps on Telegram, we have created a dataset called TUApps. TUApps is a progressively growing dataset of underground apps, collected from September 2023 to February 2024, consisting of a total of 1,000 underground apps and 200 million messages distributed across 71,332 Telegram channels.
    In the process of creating this dataset, we followed strict ethical standards to ensure the lawful use of the data and the protection of user privacy. The dataset includes the following files:
    (1) dataset.zip: We have packaged the underground app samples. The naming of Android app files is based on the SHA256 hash of the file, and the naming of iOS app files is based on the SHA256 hash of the publishing webpage.
    (2) code.zip: We have packaged the code used for crawling data from Telegram and for performing data analysis.
    (3) message.zip: We have packaged the messages crawled from Telegram; the files are named after the Telegram channels they come from.
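    The hash-based naming described for dataset.zip can be illustrated with a minimal sketch. It assumes the name is the hex SHA256 digest of the file contents; keeping the original extension, the helper names, and the placeholder file are assumptions made for illustration, not details confirmed by the dataset.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashed_name(path: Path) -> str:
    """Build a name from the content hash (keeping the suffix is an assumption)."""
    return sha256_of_file(path) + path.suffix


# Hypothetical placeholder file standing in for an app sample.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.apk"
    sample.write_bytes(b"placeholder bytes, not a real app")
    new_name = hashed_name(sample)
    print(new_name)
```

    Naming by content hash means two byte-identical samples map to the same name, which makes duplicates easy to spot.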
    Availability of code and messages
    Upon acceptance of our research paper, the dataset containing user messages and the code used for data collection and analysis will only be made available upon request to researchers who agree to adhere to strict ethical principles and maintain the confidentiality of the data.

  3. Denmark Telegram Data

    • listtodata.com
    • ha.listtodata.com
    .csv, .xls, .txt
    Updated Jul 17, 2025
    Cite
    List to Data (2025). Denmark Telegram Data [Dataset]. https://listtodata.com/denmark-telegram-data
    Explore at:
    Available download formats: .csv, .xls, .txt
    Dataset updated
    Jul 17, 2025
    Dataset authored and provided by
    List to Data
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 2025 - Dec 31, 2025
    Area covered
    Denmark
    Variables measured
    phone numbers, email address, full name, address, city, state, gender, age, income, IP address
    Description

    Denmark Telegram data includes 100% accurate contact information. If you want to promote your business or product and build your marketing campaign, you can use this Denmark Telegram data without any hesitation. This database includes active Telegram user contact information. If you use this TG data, you can receive a return on investment (ROI). List to Data generates new and active leads, which drive corporate success. Denmark Telegram screening data offers the latest and most reliable leads for Telegram phone numbers. The information will be provided as follows: all numbers are open in Telegram, gender, age, Telegram username, last activity date, and industry classification. Denmark tg powder pertains to Telegram data originating from Denmark. This information offers valuable insights into the behavior of Danish consumers, enabling businesses to customize their marketing strategies effectively.

  4. Vent Here Dataset - Emotion and Sentiment Analysis

    • kaggle.com
    Updated Dec 14, 2024
    Cite
    Alpha mintamir (2024). Vent Here Dataset - Emotion and Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/alphamintamir/vent-here-dataset-emotion-and-sentiment-analysis
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 14, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Alpha mintamir
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset contains venting content scraped from the Ethiopian-based Telegram channel Vent Here. It has been pre-processed to remove non-English entries, emojis, and unwanted prefixes, with sentiment and emotion labels added for each entry.

    Key Features

    • Source: Sourced from the public Telegram channel Vent Here, where users share emotional experiences.
    • Pre-processing:
      • Removed emojis
      • Cleaned text prefixes
      • Filtered out non-English text

    Analysis

    • Sentiment: Classified using a Hugging Face sentiment pipeline (positive, negative, neutral) with a sentiment score ranging from -1 (negative) to 1 (positive).
    • Emotion: Analyzed using a RoBERTa-based model for emotion classification (e.g., happiness, sadness, anger).

    Columns

    • date: Timestamp of the venting content.
    • text: Cleaned vent text.
    • emotion_label: Predicted emotion.
    • sentiment_label: Sentiment label (positive, negative, neutral).
    • sentiment_score: Sentiment score, ranging from -1 to 1.
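
    As a minimal sketch of how the columns above could be checked on load, the following reads a CSV and validates the label set and score range. The sample rows, the timestamp format, and the column order are invented for illustration; only the column names and value ranges come from the description.

```python
import csv
import io

EXPECTED_COLUMNS = ["date", "text", "emotion_label", "sentiment_label", "sentiment_score"]
SENTIMENT_LABELS = {"positive", "negative", "neutral"}

# Hypothetical sample rows; the real file contents and date format may differ.
SAMPLE_CSV = """date,text,emotion_label,sentiment_label,sentiment_score
2024-01-05 10:15:00,I finally passed my exam,happiness,positive,0.92
2024-01-06 22:40:00,Nothing went right today,sadness,negative,-0.81
"""


def load_rows(source):
    """Parse rows and check the label set and the [-1, 1] score range."""
    reader = csv.DictReader(source)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for row in reader:
        score = float(row["sentiment_score"])
        if row["sentiment_label"] not in SENTIMENT_LABELS:
            raise ValueError(f"unknown sentiment label: {row['sentiment_label']}")
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        row["sentiment_score"] = score  # store the parsed float
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), rows[0]["sentiment_label"], rows[0]["sentiment_score"])
```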

    This dataset provides valuable insights into emotional expression and sentiment in online communities.

  5. MultiSocial

    • zenodo.org
    Updated May 21, 2025
    Cite
    Dominik Macko; Jakub Kopal; Robert Moro; Ivan Srba (2025). MultiSocial [Dataset]. http://doi.org/10.5281/zenodo.13846152
    Explore at:
    Dataset updated
    May 21, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Dominik Macko; Jakub Kopal; Robert Moro; Ivan Srba
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    MultiSocial is a dataset (described in a paper) for benchmarking multilingual (22 languages) machine-generated text detection in the social-media domain (5 platforms). It contains 472,097 texts, of which about 58k are human-written and approximately the same amount is generated by each of 7 multilingual large language models using 3 iterations of paraphrasing. The dataset has been anonymized to minimize the amount of sensitive data by hiding email addresses, usernames, and phone numbers.

    If you use this dataset in any publication, project, tool, or in any other form, please cite the paper.

    Disclaimer

    Due to the data source (described below), the dataset may contain harmful, disinformation, or offensive content. Based on a multilingual toxicity detector, about 8% of the text samples are probably toxic (from 5% in WhatsApp to 10% in Twitter). Although we have used older data sources (with a lower probability of including machine-generated texts), the labeling (of human-written text) might not be 100% accurate. The anonymization procedure might not successfully hide all the sensitive/personal content; thus, use the data cautiously (if you feel affected by such content, report the issues found to dpo[at]kinit.sk). The intended use is for non-commercial research purposes only.

    Data Source

    The human-written part consists of a pseudo-randomly selected subset of social media posts from 6 publicly available datasets:

    1. Telegram data originated in Pushshift Telegram, containing 317M messages (Baumgartner et al., 2020). It contains messages from 27k+ channels. The collection started with a set of right-wing extremist and cryptocurrency channels (about 300 in total) and was expanded based on occurrence of forwarded messages from other channels. In the end, it thus contains a wide variety of topics and societal movements reflecting the data collection time.

    2. Twitter data originated in CLEF2022-CheckThat! Task 1, containing 34k tweets on COVID-19 and politics (Nakov et al., 2022), combined with Sentiment140, containing 1.6M tweets on various topics (Go et al., 2009).

    3. Gab data originated in the dataset containing 22M posts from Gab social network. The authors of the dataset (Zannettou et al., 2018) found out that “Gab is predominantly used for the dissemination and discussion of news and world events, and that it attracts alt-right users, conspiracy theorists, and other trolls.” They also found out that hate speech is much more prevalent there compared to Twitter, but lower than 4chan's Politically Incorrect board.

    4. Discord data originated in Discord-Data, containing 51M messages. This is a long-context, anonymized, clean, multi-turn and single-turn conversational dataset based on Discord data scraped from a large variety of servers, big and small. According to the dataset authors, it contains around 0.1% of potentially toxic comments (based on the applied heuristic/classifier).

    5. WhatsApp data originated in whatsapp-public-groups, containing 300k messages (Garimella & Tyson, 2018). The public dataset contains the anonymised data, collected for around 5 months from around 178 groups. Original messages were made available to us on request to dataset authors for research purposes.

    From these datasets, we have pseudo-randomly sampled up to 1300 texts (up to 300 for test split and the remaining up to 1000 for train split if available) for each of the selected 22 languages (using a combination of automated approaches to detect the language) and platform. This process resulted in 61,592 human-written texts, which were further filtered out based on occurrence of some characters or their length, resulting in about 58k human-written texts.
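
    The capped sampling described above can be sketched for a single (language, platform) group as follows. This is an illustration under assumptions: the seed, the function name, and filling the test split before the train split are not stated in the source, which gives only the caps (up to 300 test, up to 1000 train).

```python
import random

TEST_CAP = 300    # "up to 300 for test split"
TRAIN_CAP = 1000  # "the remaining up to 1000 for train split"


def sample_split(texts, seed=0):
    """Pseudo-randomly pick up to 1300 texts and divide them into test and train."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    picked = rng.sample(texts, min(len(texts), TEST_CAP + TRAIN_CAP))
    # Assumption: the test split is filled first, the rest goes to train.
    return {"test": picked[:TEST_CAP], "train": picked[TEST_CAP:]}


# Hypothetical groups: one small, one larger than the overall cap.
small_group = [f"text {i}" for i in range(450)]
large_group = [f"text {i}" for i in range(5000)]

small = sample_split(small_group)
large = sample_split(large_group)
print(len(small["test"]), len(small["train"]))
print(len(large["test"]), len(large["train"]))
```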

    The machine-generated part contains texts generated by 7 LLMs (Aya-101, Gemini-1.0-pro, GPT-3.5-Turbo-0125, Mistral-7B-Instruct-v0.2, opt-iml-max-30b, v5-Eagle-7B-HF, vicuna-13b). All these models were self-hosted except for GPT and Gemini, where we used the publicly available APIs. We generated the texts using 3 paraphrases of the original human-written data and then preprocessed the generated texts (filtered out cases when the generation obviously failed).

    The dataset has the following fields:

    • 'text' - a text sample,

    • 'label' - 0 for human-written text, 1 for machine-generated text,

    • 'multi_label' - a string representing a large language model that generated the text or the string "human" representing a human-written text,

    • 'split' - a string identifying train or test split of the dataset for the purpose of training and evaluation respectively,

    • 'language' - the ISO 639-1 language code identifying the detected language of the given text,

    • 'length' - word count of the given text,

    • 'source' - a string identifying the source dataset / platform of the given text,

    • 'potential_noise' - 0 for text without identified noise, 1 for text with potential noise.
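
    To make the field list concrete, here is a small sketch that types one record and counts texts per generator in a split while skipping potentially noisy texts. The field names come from the list above; the record values (including the lowercase `source` strings) and the helper name are invented for illustration.

```python
from collections import Counter
from typing import Iterable, TypedDict


class Record(TypedDict):
    text: str
    label: int            # 0 = human-written, 1 = machine-generated
    multi_label: str      # generator name, or "human"
    split: str            # "train" or "test"
    language: str         # ISO 639-1 code
    length: int           # word count
    source: str           # source dataset / platform
    potential_noise: int  # 0 = no identified noise, 1 = potential noise


def generator_counts(records: Iterable[Record], split: str) -> Counter:
    """Count texts per generator in one split, skipping potentially noisy texts."""
    return Counter(
        r["multi_label"]
        for r in records
        if r["split"] == split and r["potential_noise"] == 0
    )


# Hypothetical records, not taken from the dataset.
records = [
    {"text": "see you at the meetup tonight", "label": 0, "multi_label": "human",
     "split": "train", "language": "en", "length": 6, "source": "telegram", "potential_noise": 0},
    {"text": "catch you at tonight's meetup", "label": 1, "multi_label": "vicuna-13b",
     "split": "train", "language": "en", "length": 5, "source": "telegram", "potential_noise": 0},
    {"text": "meetup tonight, see you there!!", "label": 1, "multi_label": "vicuna-13b",
     "split": "test", "language": "en", "length": 5, "source": "telegram", "potential_noise": 0},
    {"text": "@@@ ###", "label": 0, "multi_label": "human",
     "split": "train", "language": "en", "length": 2, "source": "discord", "potential_noise": 1},
]

train_counts = generator_counts(records, "train")
print(train_counts)
```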

    ToDo Statistics (under construction)

  6. Dataset on the online cryptocurrency discussion on Twitter, Telegram, and...

    • data.niaid.nih.gov
    Updated Nov 22, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tardelli, Serena (2022). Dataset on the online cryptocurrency discussion on Twitter, Telegram, and Discord [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3895020
    Explore at:
    Dataset updated
    Nov 22, 2022
    Dataset provided by
    Avvenuti, Marco
    Ferrara, Emilio
    Cresci, Stefano
    Nizzoli, Leonardo
    Tardelli, Serena
    Tesconi, Maurizio
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This Dataset is described in Charting the Landscape of Online Cryptocurrency Manipulation. IEEE Access (2020), a study that aims to map and assess the extent of cryptocurrency manipulations within and across the online ecosystems of Twitter, Telegram, and Discord. Starting from tweets mentioning cryptocurrencies, we leveraged and followed invite URLs from platform to platform, building the invite-link network, in order to study the invite link diffusion process.

    Please, refer to the paper below for more details.

    Nizzoli, L., Tardelli, S., Avvenuti, M., Cresci, S., Tesconi, M. & Ferrara, E. (2020). Charting the Landscape of Online Cryptocurrency Manipulation. IEEE Access (2020).

    This dataset is composed of:

    ~16M tweet ids shared between March and May 2019, mentioning at least one of the 3,822 cryptocurrencies (cashtags) provided by the CryptoCompare public API;

    ~13k nodes of the invite-link network, i.e., the information about the Telegram/Discord channels and Twitter users involved in the cryptocurrency discussion (e.g., id, name, audience, invite URL);

    ~62k edges of the invite-link network, i.e., the information about the flow of invites (e.g., source id, target id, weight).

    With such information, one can easily retrieve the content of channels and messages through Twitter, Telegram, and Discord public APIs.

    Please, refer to the README file for more details about the fields.
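
    The invite-link network described above (nodes plus weighted edges with a source id and target id) can be sketched as a weighted directed graph. The node ids, weights, and tuple layout below are invented for illustration; the actual field names are documented in the dataset's README.

```python
from collections import defaultdict

# Hypothetical edges as (source id, target id, weight); not taken from the dataset.
edges = [
    ("twitter_user_1", "telegram_channel_a", 3),
    ("twitter_user_2", "telegram_channel_a", 1),
    ("telegram_channel_a", "discord_server_x", 2),
]


def build_graph(edge_list):
    """Build an adjacency map: source -> {target: accumulated weight}."""
    graph = defaultdict(dict)
    for source, target, weight in edge_list:
        graph[source][target] = graph[source].get(target, 0) + weight
    return graph


def weighted_in_degree(graph):
    """Sum the incoming invite weight for every target node."""
    totals = defaultdict(int)
    for targets in graph.values():
        for target, weight in targets.items():
            totals[target] += weight
    return dict(totals)


graph = build_graph(edges)
in_degree = weighted_in_degree(graph)
print(in_degree)
```

    A weighted in-degree like this is one simple way to see which channels receive the most invites.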

  7. Data from: Messages from SA covid vax chat Telegram channel

    • zenodo.org
    Updated Jun 30, 2025
    Cite
    Rebecca Pointer; Peter van Heusden (2025). Messages from SA covid vax chat Telegram channel [Dataset]. http://doi.org/10.25379/uwc.26965024.v1
    Explore at:
    Dataset updated
    Jun 30, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rebecca Pointer; Peter van Heusden
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    1. SQLite format database containing messages from SA COVID VAX CHAT from 10 May 2021 to 22 May 2022. Original user IDs and user names have been replaced with anonymous IDs.
    2. Messages in CSV format filtered to remove spam and with themes annotated, from May 2021 to end 2022.
    3. Spam messages (messages showing up more than 100 times in the message dataset) in JSON format, one JSON record per line (thus JSON-L format).
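
    The spam rule above (a message showing up more than 100 times) and the one-record-per-line output can be sketched as follows. Exact-text matching, the `text` and `count` field names, and the sample messages are assumptions for illustration; the source does not describe how repeats were detected.

```python
import json
from collections import Counter

SPAM_THRESHOLD = 100  # "more than 100 times" per the description


def split_spam(messages, threshold=SPAM_THRESHOLD):
    """Separate messages whose exact text occurs more than `threshold` times."""
    counts = Counter(messages)
    spam = {text for text, n in counts.items() if n > threshold}
    kept = [text for text in messages if text not in spam]
    return kept, sorted(spam)


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Hypothetical messages: one text just over the threshold, one exactly at it.
messages = ["buy now"] * 101 + ["hello"] * 100 + ["when is the next clinic?"]
kept, spam = split_spam(messages)
jsonl = to_jsonl({"text": t, "count": messages.count(t)} for t in spam)
print(jsonl)
```

    Note that a text occurring exactly 100 times is kept, because the rule is strictly "more than 100".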

  8. MentalRiskES corpus

    • zenodo.org
    • investigacion.ujaen.es
    Updated Apr 24, 2025
    Cite
    Alba María Mármol-Romero; Adrián Moreno-Muñoz; Flor Miriam Plaza del Arco; María Dolores Molina-González; Maria-Teresa Martin-Valdivia; Alfonso Ureña López; Arturo Montejo-Ráez (2025). MentalRiskES corpus [Dataset]. http://doi.org/10.5281/zenodo.15275274
    Explore at:
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alba María Mármol-Romero; Adrián Moreno-Muñoz; Flor Miriam Plaza del Arco; María Dolores Molina-González; Maria-Teresa Martin-Valdivia; Alfonso Ureña López; Arturo Montejo-Ráez
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    MentalRiskES is a new dataset about mental disorders in Spanish. The dataset is divided into three distinct mental disorders:

    • Eating Disorder
    • Depression
    • Anxiety

    Each dataset contains a set of subjects and their message thread in a Telegram social network chat.

    How is it constructed?
    Public groups on the Telegram social network were accessed, and conversations were extracted from them. This data was processed, and we kept only the text messages, excluding images, audio, etc. In order to carry out the annotation, a subset of messages was extracted from each subject. This message thread was annotated by 10 different annotators through the Prolific platform and made use of the Doccano annotation platform.

    In this way, we associated a user ID with some tags that emerged after averaging the annotators' decisions. The labels available for each set are:

    • Eating Disorder: suffer (s), control (c)
    • Depression: suffer + in favour (sf), suffer + against (sa), suffer + other (so), control (c)
    • Anxiety: suffer (s), control (c)

    Labels
    The values available in Anxiety files are:

    • bs (binary suffer): 1 if the subject suffers and 0 if not according to the frequency of the labels (in case of a tie it is marked as suffers)
    • bc (binary control): 1 if the subject does not suffer and 0 if they do according to the frequency of the labels (in case of a tie it is marked as suffers)
    • rbs (regression binary suffer): number of times the subject has been marked as suffering among the total number of scorers, i.e., 10
    • rbc (regression binary control): number of times the subject has been marked as not suffering among the total number of scorers, i.e., 10

    The values available in the Depression and Eating Disorders files are:

    • bs (binary suffer): 1 if the subject suffers and 0 if not, according to the frequency of the labels (in case of a tie it is marked as suffers)
    • bsf (binary suffer favour): 1 if the subject suffers and is in favour, and 0 if not, according to the frequency of the labels
    • bsa (binary suffer against): 1 if the subject suffers and is against, and 0 if not according to the frequency of the labels
    • bso (binary suffer other): 1 if the subject suffers and is neither in favour nor against and 0 if not according to the frequency of the labels
    • bc (binary control): 1 if the subject does not suffer and 0 if they do according to the frequency of the labels (in case of a tie it is marked as suffers)
    • rbs (regression binary suffer): number of times the subject has been marked as suffering among the total number of scorers, i.e., 10
    • rbc (regression binary control): number of times the subject has been marked as not suffering among the total number of scorers, i.e., 10
    • rsf (regression suffer favour): number of times the subject has been marked as suffering and in favour among the total number of scorers, i.e., 10
    • rsa (regression suffer against): number of times the subject has been marked as suffering and against among the total number of scorers, i.e., 10
    • rso (regression suffer other): number of times the subject has been marked as suffering and is neither in favour nor against among the total number of scorers, i.e., 10
    • rc (regression control): number of times the subject has been marked as not suffering among the total number of scorers, i.e., 10 (Note that it is equal to 'rbc')
      So, the labels 'rbs' and 'rbc' must sum to 1, and the labels 'rsf','rsa', 'rso' and 'rc' must sum to 1 too.
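
    The label definitions above can be sketched as an aggregation over one subject's annotator votes under the four-label scheme (sf, sa, so, c). This is a minimal illustration, not the authors' code: the function name and the sample votes are invented, the regression values are computed as fractions of the scorers so that they sum to 1 as stated, and the per-category binary labels (bsf, bsa, bso) are left out because their tie-breaking is not specified.

```python
from collections import Counter

VALID_LABELS = {"sf", "sa", "so", "c"}


def aggregate(votes):
    """Aggregate one subject's annotator votes into binary and regression labels."""
    unknown = set(votes) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    total = len(votes)
    counts = Counter(votes)
    control = counts["c"]
    suffer = total - control
    bs = 1 if suffer >= control else 0  # a tie is marked as suffers
    result = {
        "bs": bs,
        "bc": 1 - bs,
        "rbs": suffer / total,
        "rbc": control / total,
        "rc": control / total,  # equal to rbc, as noted above
    }
    for label in ("sf", "sa", "so"):
        result["r" + label] = counts[label] / total
    return result


# Hypothetical votes from 10 annotators with a 5-5 tie between suffer and control.
votes = ["sf"] * 3 + ["sa"] + ["so"] + ["c"] * 5
labels = aggregate(votes)
print(labels)
```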

    Preprocessing
    The corpus is provided in two versions: the folder 'processed' contains the corpus with emojis in text format, while the folder 'raw' contains the corpus with emojis in their original format.

    MentalRiskES evaluation campaign
    MentalRiskES is a shared task organized at IberLEF. The aim of this task is to promote the early detection of mental risk disorders in Spanish. The task used the MentalRiskES corpus; the partitions used are available in the folder MentalRiskES2023edition.zip provided on GitHub (https://github.com/sinai-uja/corpusMentalRiskES). To cite the task: Mármol-Romero, A. M., Moreno-Muñoz, A., Plaza-del-Arco, F. M., Molina-González, M. D., Martín-Valdivia, M. T., Ureña-López, L. A., & Montejo-Ráez, A. (2023). Overview of MentalriskES at IberLEF 2023: Early Detection of Mental Disorders Risk in Spanish. Procesamiento del Lenguaje Natural, 71, 329-350.
