51 datasets found
  1. Twemoji Dataset

    • uvaauas.figshare.com
    txt
    Updated Feb 28, 2018
    + more versions
    Cite
    S.H. Cappallo (2018). Twemoji Dataset [Dataset]. http://doi.org/10.21942/uva.5822100.v3
    Available download formats: txt
    Dataset updated
    Feb 28, 2018
    Dataset provided by
    University of Amsterdam / Amsterdam University of Applied Sciences
    Authors
    S.H. Cappallo
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Collection of 13M tweets divided into training, validation, and test sets for the purpose of predicting emoji based on text and/or images. The data provides the tweet status ID and the emoji annotations associated with it. In the case of image-containing subsets, the image URL is also listed. The Full, unbalanced dataset consists of random test and validation sets of 1M tweets, with the remainder in the training set. The Balanced test set is a subset of the test set chosen to improve emoji class balance. The Image subsets are image-containing tweets. Finally, emoji_map_1791.csv provides information regarding the emoji labels and potential metadata.
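The random split described above can be sketched roughly as follows. This is only an illustration, not the author's procedure: the function name `split_ids`, the fixed seed, and the reading that the test and validation sets each hold 1M tweets are assumptions, and the example runs at toy scale.

```python
import random

def split_ids(tweet_ids, n_test, n_val, seed=0):
    """Shuffle IDs, carve off test and validation sets, keep the rest for training."""
    ids = list(tweet_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test

# Toy scale: 13 hypothetical IDs stand in for 13M tweets, 1 for each held-out 1M.
train, val, test = split_ids(range(1000, 1013), n_test=1, n_val=1)
print(len(train), len(val), len(test))  # 11 1 1
```

Shuffling once and slicing keeps the three sets disjoint by construction.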

  2. Tweets containing emojis 2013-2023

    • statista.com
    • ai-chatbox.pro
    Updated Dec 4, 2024
    Cite
    Statista (2024). Tweets containing emojis 2013-2023 [Dataset]. https://www.statista.com/statistics/1399380/tweets-containing-emojis/
    Dataset updated
    Dec 4, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jul 2013 - Mar 2023
    Area covered
    Worldwide
    Description

    The share of posts on microblogging platform Twitter that contain emojis has increased significantly over the past ten years. In July 2013, 4.25 percent of tweets contained at least one emoji. Just under one decade later, in March 2023, 26.7 percent of tweets contained an emoji. The most common reason for using emojis, according to users in the United States, was to make conversations more fun.
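To make the size of that increase concrete, the two shares quoted above can be compared in percentage points and in relative terms. The variable names are illustrative; only the two percentages come from the description.

```python
share_2013 = 4.25  # percent of tweets with at least one emoji, July 2013
share_2023 = 26.7  # percent of tweets with at least one emoji, March 2023

point_change = share_2023 - share_2013       # difference in percentage points
relative_change = point_change / share_2013  # growth relative to the 2013 share
ratio = share_2023 / share_2013              # how many times larger the share became

print(f"{point_change:.2f} percentage points")     # 22.45 percentage points
print(f"{relative_change:.0%} relative increase")  # 528% relative increase
print(f"{ratio:.1f}x the 2013 share")              # 6.3x the 2013 share
```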

  3. Coincidence matrix for tweets with emojis.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Petra Kralj Novak; Jasmina Smailović; Borut Sluban; Igor Mozetič (2023). Coincidence matrix for tweets with emojis. [Dataset]. http://doi.org/10.1371/journal.pone.0144296.t009
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Petra Kralj Novak; Jasmina Smailović; Borut Sluban; Igor Mozetič
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Coincidence matrix for tweets with emojis.

  4. Data from: Multimodal Emoji Prediction Dataset

    • paperswithcode.com
    • opendatalab.com
    Cite
    Francesco Barbieri; Miguel Ballesteros; Horacio Saggion (2021). Multimodal Emoji Prediction Dataset [Dataset]. https://paperswithcode.com/dataset/multimodal-emoji-prediction
    Authors
    Francesco Barbieri; Miguel Ballesteros; Horacio Saggion
    Description

    The Twitter emoji dataset obtained from CodaLab comprises 50 thousand tweets along with the associated emoji label. Each tweet in the dataset has a corresponding numerical label which maps to a specific emoji. The emojis are the 20 most frequent ones, so the labels range from 0 to 19.
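A label scheme of this kind could be built along the following lines. This is a sketch under assumptions: the helper `build_label_map` is hypothetical, and assigning label 0 to the most frequent emoji is a guess at the ordering, which the description does not state.

```python
from collections import Counter

def build_label_map(emoji_per_tweet, n_classes=20):
    """Map the n_classes most frequent emojis to integer labels 0..n_classes-1."""
    counts = Counter(emoji_per_tweet)
    most_common = [emoji for emoji, _ in counts.most_common(n_classes)]
    # Assumed ordering: label 0 is the most frequent emoji.
    return {emoji: label for label, emoji in enumerate(most_common)}

# Toy input with one emoji per tweet; n_classes=2 keeps the example small.
sample = ["😂", "❤", "😂", "🔥", "😂", "❤"]
label_map = build_label_map(sample, n_classes=2)
print(label_map)  # {'😂': 0, '❤': 1}
```

With `n_classes=20` the resulting labels cover 0 to 19, matching the range given in the description.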

  5. Sentiment of tweets with and without emojis.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Petra Kralj Novak; Jasmina Smailović; Borut Sluban; Igor Mozetič (2023). Sentiment of tweets with and without emojis. [Dataset]. http://doi.org/10.1371/journal.pone.0144296.t002
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Petra Kralj Novak; Jasmina Smailović; Borut Sluban; Igor Mozetič
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    For each set, the mean, sd and sem are computed from the distribution of negative, neutral, and positive tweets.
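One way such statistics can be derived from label counts is sketched below. It assumes the three classes are coded as -1, 0, and +1 and uses the population standard deviation; neither choice is confirmed by the description, and the counts in the example are invented for illustration.

```python
import math

def sentiment_stats(n_neg, n_neu, n_pos):
    """Mean, sd and sem of a label distribution coded as -1 / 0 / +1 (assumed coding)."""
    n = n_neg + n_neu + n_pos
    mean = (n_pos - n_neg) / n
    # E[x^2] is the share of non-neutral tweets, because (-1)^2 == (+1)^2 == 1.
    variance = (n_neg + n_pos) / n - mean ** 2
    sd = math.sqrt(variance)   # population standard deviation (assumption)
    sem = sd / math.sqrt(n)    # standard error of the mean
    return mean, sd, sem

# Invented counts, for illustration only.
mean, sd, sem = sentiment_stats(n_neg=20, n_neu=50, n_pos=30)
print(round(mean, 3), round(sd, 3), round(sem, 3))  # 0.1 0.7 0.07
```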

  6. ITAmoji dataset

    • live.european-language-grid.eu
    json
    Updated Nov 25, 2021
    Cite
    (2021). ITAmoji dataset [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7477
    Available download formats: json
    Dataset updated
    Nov 25, 2021
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    The ITAmoji dataset collects 275,000 tweets that each contain one and only one emoji from among the 25 most frequent emojis. The dataset was created and used in the context of the ITAmoji task (https://sites.google.com/view/itamoji/), organised as part of EVALITA 2018 (http://www.evalita.it/2018). The task challenged participants to develop automatic systems that predict, given an Italian tweet, its most likely associated emoji, selected in a wide and heterogeneous emoji space. The dataset is split into a training set (250,000 tweets) and a test set (25,000 tweets).

    In order to comply with GDPR privacy rules and Twitter’s policies, the identifiers of tweets and users have been anonymized and replaced by unique identifiers.

  7. Most popular emojis on Twitter by usage rate 2022

    • statista.com
    Updated Mar 18, 2024
    Cite
    Statista (2024). Most popular emojis on Twitter by usage rate 2022 [Dataset]. https://www.statista.com/statistics/1367508/most-popular-emojis-twitter-usage-rate/
    Dataset updated
    Mar 18, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 1, 2022 - Jan 31, 2022
    Area covered
    Worldwide
    Description

    In January 2022, the face with tears of joy emoji was the most used emoji on Twitter, with a usage rate of 1.81 for every ten thousand tweets. Loudly crying face emoji followed, with a usage rate of 1.78. Other popular emojis on Twitter included sparkles, rolling on the floor laughing, pleading face, and the red heart emoji.

  8. Italian Tweet Embeddings Used For Emoji Prediction

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Giacomo Zara (2020). Italian Tweet Embeddings Used For Emoji Prediction [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1467219
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Yaroslav Nechaev
    Andrei Catalin Coman
    Giacomo Zara
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset contains 100d word embeddings trained on 48M Italian tweets using fastText and employed by our team to predict emojis during the ITAmoji competition of the EVALITA 2018 Evaluation Campaign.
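If the embeddings are distributed in fastText's textual `.vec` layout (an assumption; the listing does not name the file format), they could be read with a small parser like the one below. The layout is a header line holding the vocabulary size and dimension, followed by one word and its values per line. `parse_vec` is a hypothetical helper, and the sample uses 3 dimensions instead of 100 to stay short.

```python
def parse_vec(lines):
    """Parse fastText-style text vectors: header 'count dim', then 'word v1 ... vdim'."""
    rows = iter(lines)
    n_words, dim = map(int, next(rows).split())
    vectors = {}
    for line in rows:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip rows that do not match the declared dimension
        vectors[word] = [float(v) for v in values]
    return dim, vectors

# Tiny in-memory sample standing in for a real file.
sample = ["2 3", "ciao 0.1 0.2 0.3", "grazie -0.4 0.5 0.6"]
dim, vectors = parse_vec(sample)
print(dim, sorted(vectors))  # 3 ['ciao', 'grazie']
```

For a real file, the same function can be fed an open file handle, since it only iterates over lines.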

  9. Emoji Gestures in English Tweets: California

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated May 18, 2022
    Cite
    Marina Zhukova (2022). Emoji Gestures in English Tweets: California [Dataset]. http://doi.org/10.5281/zenodo.5802317
    Available download formats: csv
    Dataset updated
    May 18, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marina Zhukova
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    California
    Description

    The dataset consists of 479,193 tweets, each of which contains one of the 31 gesture emoji (different hand configurations) and its skin tone modifier options (e.g. 🙏🙏🏿🙏🏾🙏🏽🙏🏼🙏🏻), posted within 250 km of San Jose, CA and within 200 km of Los Angeles, CA, in English, during May-August 2021. The dataset can be used to investigate the use of gesture emoji by English-speaking California Twitter users. Python libraries used for collecting tweets and preprocessing: tweepy, re, preprocessor, emoji, regex, string, nltk.

    The dataset contains 12 columns:

    1. tweet_original: original text of the tweet
    2. preprocessed: preprocessed text of the tweet (4 steps)
    3. all_emoji: lists all emoji in a given tweet
    4. hashtags: lists all hashtags in a given tweet
    5. user_encoded: encoded Twitter user name: the first 3 characters of the user name and the first 3 characters of the user's location
    6. location_encoded: location of the user: "los_angeles", "san_diego", "san_jose", "san_francisco", "fresno", "long_beach", "sacramento", "oakland", "bakersfield", "anaheim", or "other"
    7. mention_present: checks whether each tweet contains mentions
    8. url_present: checks whether each tweet contains a URL
    9. preprocess_tweet: preprocessing step 1: tokenizing mentions, urls, and hashtags
    10. lowercase_tweet: preprocessing step 2: lowercasing
    11. remove_punct_tweet: preprocessing step 3: removing punctuation
    12. tokenize_tweet: preprocessing step 4: tokenizing
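The four preprocessing steps named in columns 9 to 12 can be approximated with the standard library alone. This is a rough sketch, not the project's code: the placeholder tokens `URL`, `MENTION`, and `HASHTAG`, the regular expressions, and whitespace tokenization are all assumptions (the original pipeline lists tweepy, preprocessor, and nltk among its libraries).

```python
import re
import string

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")

def preprocess_tweet(text):
    """Step 1: replace urls, mentions and hashtags with placeholder tokens (assumed names)."""
    text = URL_RE.sub("URL", text)  # urls first, so '@' or '#' inside them is not touched
    text = MENTION_RE.sub("MENTION", text)
    return HASHTAG_RE.sub("HASHTAG", text)

def lowercase_tweet(text):
    """Step 2: lowercase."""
    return text.lower()

def remove_punct_tweet(text):
    """Step 3: drop ASCII punctuation; emoji are left in place."""
    return text.translate(str.maketrans("", "", string.punctuation))

def tokenize_tweet(text):
    """Step 4: split on whitespace (a simple stand-in for a real tokenizer)."""
    return text.split()

def preprocess(text):
    """Apply the four steps in order."""
    return tokenize_tweet(remove_punct_tweet(lowercase_tweet(preprocess_tweet(text))))

print(preprocess("Thanks @bob! See https://example.com #grateful 🙏"))
# ['thanks', 'mention', 'see', 'url', 'hashtag', '🙏']
```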

    Further information on the research project can be found here: https://github.com/mzhukovaucsb/emoji_gestures/

  10. Coincidence matrix for tweets without emojis.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Petra Kralj Novak; Jasmina Smailović; Borut Sluban; Igor Mozetič (2023). Coincidence matrix for tweets without emojis. [Dataset]. http://doi.org/10.1371/journal.pone.0144296.t010
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Petra Kralj Novak; Jasmina Smailović; Borut Sluban; Igor Mozetič
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Coincidence matrix for tweets without emojis.

  11. Emoji usage on X/Twitter 2016-2021

    • statista.com
    Updated Dec 12, 2023
    Cite
    Statista (2023). Emoji usage on X/Twitter 2016-2021 [Dataset]. https://www.statista.com/statistics/1367443/share-of-tweets-containing-emojis/
    Dataset updated
    Dec 12, 2023
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jul 2016 - Jul 2021
    Area covered
    Worldwide
    Description

    In July 2021, 20.69 percent of monitored tweets contained at least one emoji, up from 20.15 percent in July of the previous year. Between 2016 and 2021, emoji usage on the micro-blogging platform increased by over 42 percent. Overall, 2018 to 2019 saw the largest year-on-year increase in emoji usage on Twitter.

  12. Emoji Gestures in Russian Tweets: Moscow

    • zenodo.org
    csv
    Updated May 18, 2022
    Cite
    Marina Zhukova (2022). Emoji Gestures in Russian Tweets: Moscow [Dataset]. http://doi.org/10.5281/zenodo.5800200
    Available download formats: csv
    Dataset updated
    May 18, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marina Zhukova
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Moscow
    Description

    The dataset consists of 48,838 tweets, each of which contains one of the 31 gesture emoji (different hand configurations) and its skin tone modifier options (e.g. 🙏🙏🏿🙏🏾🙏🏽🙏🏼🙏🏻), posted within 50 km of Moscow, Russia, in Russian, during May-August 2021. The dataset can be used to investigate the use of gesture emoji by Russian users of the Twitter platform. Python libraries used for collecting tweets and preprocessing: tweepy, re, preprocessor, emoji, regex, string, nltk.
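A distance criterion like the 50 km radius above can be checked with the haversine great-circle formula. This is a sketch of that general technique, not the collection method actually used (which the description does not detail); the centre coordinates for Moscow are approximate and the helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

MOSCOW = (55.7558, 37.6173)  # approximate city-centre coordinates (assumption)

def within_radius(lat, lon, centre=MOSCOW, radius_km=50.0):
    """True if the point lies within radius_km of the centre."""
    return haversine_km(lat, lon, *centre) <= radius_km

print(within_radius(55.7558, 37.6173))  # True  (the centre itself)
print(within_radius(59.9343, 30.3351))  # False (Saint Petersburg, several hundred km away)
```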

    The dataset contains 12 columns:

    1. tweet_original: original text of the tweet
    2. preprocessed: preprocessed text of the tweet (4 steps)
    3. all_emoji: lists all emoji in a given tweet
    4. hashtags: lists all hashtags in a given tweet
    5. user_encoded: encoded Twitter user name: the first 3 characters of the user name and the first 3 characters of the user's location
    6. location_encoded: location of the user: "moscow", "moscow_region", or "other"
    7. mention_present: checks whether each tweet contains mentions
    8. url_present: checks whether each tweet contains a URL
    9. preprocess_tweet: preprocessing step 1: tokenizing mentions, urls, and hashtags
    10. lowercase_tweet: preprocessing step 2: lowercasing
    11. remove_punct_tweet: preprocessing step 3: removing punctuation
    12. tokenize_tweet: preprocessing step 4: tokenizing

    Further information on the research project can be found here: https://github.com/mzhukovaucsb/emoji_gestures/

  13. Top 10 emojis.

    • plos.figshare.com
    tiff
    Updated Jun 1, 2023
    Cite
    Petra Kralj Novak; Jasmina Smailović; Borut Sluban; Igor Mozetič (2023). Top 10 emojis. [Dataset]. http://doi.org/10.1371/journal.pone.0144296.g001
    Available download formats: tiff
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Petra Kralj Novak; Jasmina Smailović; Borut Sluban; Igor Mozetič
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Emojis are ordered by the number of occurrences N. The average position ranges from 0 (the beginning of the tweets) to 1 (the end of the tweets). p_c, c ∈ {−1, 0, +1}, are the negativity, neutrality, and positivity, respectively. s̄ is the sentiment score.

  14. Tweet IDs used to study emoji syntax - Covid

    • search.dataone.org
    Updated Sep 24, 2024
    Cite
    Pereira, Alexandre (2024). Tweet IDs used to study emoji syntax - Covid [Dataset]. http://doi.org/10.7910/DVN/EVZVAN
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Pereira, Alexandre
    Description

    Tweet IDs used to study emoji syntax - Covid

    According to:

    Pereira, A., & Pestana, G. (2022). Is There Meaning in the Emoji Sequences Used on Social Media? The Architecture of a Model for Emoji Sequences Analysis. World Conference on Information Systems and Technologies (pp. 279–292). https://doi.org/10.1007/978-3-031-04819-7_28

    Pereira, A., & Pestana, G. (2024). Syntax in Emoji Sequences on Social Media Posts. In World Conference on Information Systems and Technologies (pp. 97–107).

    Pereira, A., & Leite M.C., & Pestana, G. (2024) [Forthcoming]. Analyzing Syntactic Patterns in Emoji Sequences on Social Media.

  15. tweet_eval

    • huggingface.co
    Updated Nov 22, 2021
    + more versions
    Cite
    Cardiff NLP (2021). tweet_eval [Dataset]. https://huggingface.co/datasets/cardiffnlp/tweet_eval
    Available in Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    Nov 22, 2021
    Dataset authored and provided by
    Cardiff NLP
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Dataset Card for tweet_eval

    Dataset Summary

    TweetEval consists of seven heterogeneous tasks on Twitter, all framed as multi-class tweet classification. The tasks are irony, hate, offensive, stance, emoji, emotion, and sentiment. All tasks have been unified into the same benchmark, with each dataset presented in the same format and with fixed training, validation and test splits.

    Supported Tasks and Leaderboards

    text_classification: The dataset can be… See the full description on the dataset page: https://huggingface.co/datasets/cardiffnlp/tweet_eval.

  16. New Years 2021 Tweets

    • opendatabay.com
    • kaggle.com
    Updated Jun 20, 2025
    Cite
    Datasimple (2025). New Years 2021 Tweets [Dataset]. https://www.opendatabay.com/data/ai-ml/e621ef68-74a1-4014-9005-d8e7e51fba1b
    Dataset updated
    Jun 20, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Social Media and Networking
    Description

    Context: I wrote a quick script to scrape 100k tweets that mentioned the keywords "New Year". I pulled these tweets from the Twitter API over the span of a couple of hours so there wouldn't be a clustering of tweets from a single timezone/country.

    Content: These tweets were all scraped from the Twitter API between the evening and the night of December 31st, 2021. I ignored all tweets that were just retweets or quote tweets of other users.

    Columns:

    • Column 1: Keeps track of the tweet number in this dataset. Since the id column tracks the tweet id from Twitter and those numbers are quite large, I wanted something smaller to keep track of ids in this scope.
    • author_id: The unique id of the author of the tweet.
    • id: The tweet id provided by Twitter.
    • text: The text of the tweet. Some tweets contain emojis, links, and mentions.
    • username: The username of the author of the tweet.

    Acknowledgements: This dataset would not exist without the Twitter API.

    Inspiration: One of my main ideas for this data is a sentiment analysis of how people were feeling about the new year starting.

    License

    CC0

    Original Data Source: New Years 2021 Tweets

  17. Inter-annotator agreement on tweets with and without emojis.

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Petra Kralj Novak; Jasmina Smailović; Borut Sluban; Igor Mozetič (2023). Inter-annotator agreement on tweets with and without emojis. [Dataset]. http://doi.org/10.1371/journal.pone.0144296.t003
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Petra Kralj Novak; Jasmina Smailović; Borut Sluban; Igor Mozetič
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The agreement is computed in terms of three measures over a subset of tweets that were labeled by two different annotators.

  18. Video game tweets

    • kaggle.com
    Updated Jun 2, 2021
    Cite
    Aditya Manikantan (2021). Video game tweets [Dataset]. https://www.kaggle.com/datasets/adimanz/video-game-tweets
    Available in Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    Jun 2, 2021
    Dataset provided by
    Kaggle
    Authors
    Aditya Manikantan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset consists of tweets scraped from Twitter containing the hashtag "videogames". There are 1135 tweets from August 2020 to December 2020. For simplicity of use, I have added another column that consists of clean tweets in tokenized form.

    Content

    This dataset consists of 1135 tweets and 5 columns:

    • timestamp: Contains both the date in YYYY-MM-DD format and the time in HH:MM:SS format, from August 2020 to December 2020.
    • text: Tweets in their raw text format.
    • likes: Number of likes the tweet received.
    • retweets: Number of times the tweet was retweeted.
    • clean_text: Tweets after they were cleaned (punctuation, stopwords, emojis and URLs removed; lemmatized; tokenized).

    Inspiration

    • Predict the number of likes or retweets for a given tweet.
    • Find the sentiment polarity of tweets.
    • Predict the sentiments of the given tweets using a pre-trained model.
    • Analyze the tweets to understand the positive and negative aspects of games through the perception of a user.
    • Frequency of Positive vs Negative tweets.
    • Topic modeling to group the most recurrent topic for a given sentiment.
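For time-based analyses such as these, the timestamp column described under Content would first need parsing. A minimal sketch follows, assuming the date and time are separated by a single space (the description gives the two formats but not the separator); the helper names and the sample value are illustrative.

```python
from datetime import datetime

# Assumed layout: "YYYY-MM-DD HH:MM:SS" with a single space between date and time.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

def parse_timestamp(value):
    """Parse one timestamp string into a datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)

def in_collection_window(ts):
    """True if ts falls between August 2020 and December 2020 inclusive."""
    return datetime(2020, 8, 1) <= ts < datetime(2021, 1, 1)

ts = parse_timestamp("2020-08-15 13:45:00")  # made-up sample value
print(ts.month, in_collection_window(ts))  # 8 True
```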

  19. Tweet IDs used to study emoji syntax - description

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jun 21, 2024
    Cite
    Alexandre Pereira (2024). Tweet IDs used to study emoji syntax - description [Dataset]. http://doi.org/10.7910/DVN/GWCGHL
    Available in Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    Jun 21, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Alexandre Pereira
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Tweet IDs used to study emoji syntax - description

  20. Tweet IDs used to study emoji syntax - Climate change

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Sep 24, 2024
    + more versions
    Cite
    Pereira, Alexandre (2024). Tweet IDs used to study emoji syntax - Climate change [Dataset]. http://doi.org/10.7910/DVN/LJVCER
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Pereira, Alexandre
    Description

    Tweet IDs used to study emoji syntax - Climate change

    According to:

    Pereira, A., & Pestana, G. (2022). Is There Meaning in the Emoji Sequences Used on Social Media? The Architecture of a Model for Emoji Sequences Analysis. World Conference on Information Systems and Technologies (pp. 279–292). https://doi.org/10.1007/978-3-031-04819-7_28

    Pereira, A., & Pestana, G. (2024). Syntax in Emoji Sequences on Social Media Posts. In World Conference on Information Systems and Technologies (pp. 97–107).

    Pereira, A., & Leite M.C., & Pestana, G. (2024) [Forthcoming]. Analyzing Syntactic Patterns in Emoji Sequences on Social Media.

Twemoji Dataset: 221 scholarly articles cite this dataset (per Google Scholar).