2 datasets found
  1. 𝒙 Twemoji Dataset

    • kaggle.com
    Updated Sep 22, 2023
    Citation: mexwell (2023). 𝒙 Twemoji Dataset [Dataset]. https://www.kaggle.com/datasets/mexwell/twemoji-dataset/discussion
    Format: Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant)
    Dataset updated: Sep 22, 2023
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: mexwell
    License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    A collection of 13M tweets, divided into training, validation, and test sets, for predicting emoji from text and/or images.

    For each tweet, the data provides the status ID and its associated emoji annotations. For the image-containing subsets, the image URL is also listed.

    The Full (unbalanced) dataset consists of randomly sampled test and validation sets of 1M tweets, with the remaining tweets in the training set.
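    As an illustration of this kind of partition, the sketch below randomly holds out test and validation items and keeps the rest for training. It is a minimal sketch, not the dataset authors' procedure: the function name `random_split`, the seed, and the example sizes are hypothetical. Because the description does not say whether 1M applies to each held-out set or to both combined, the sizes are left as parameters.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(
    items: Sequence[T], n_test: int, n_val: int, seed: int = 0
) -> Tuple[List[T], List[T], List[T]]:
    """Randomly hold out n_test and n_val items; the remainder is training data.

    Hypothetical helper: it mirrors the description of the Full split,
    not the dataset authors' actual code.
    """
    if n_test < 0 or n_val < 0:
        raise ValueError("held-out sizes must be non-negative")
    if n_test + n_val > len(items):
        raise ValueError("not enough items for the requested test/validation sizes")
    shuffled = list(items)                  # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)   # seeded, so the split is reproducible
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]       # everything not held out
    return train, val, test


if __name__ == "__main__":
    ids = [str(i) for i in range(10)]  # stand-in status IDs
    train, val, test = random_split(ids, n_test=2, n_val=2)
    print(len(train), len(val), len(test))  # prints: 6 2 2
```

    To mirror the Full split at scale, `items` would be the list of status IDs and `n_test` / `n_val` would be set to the held-out sizes stated above.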

    The Balanced test set is a subset of the test set, chosen to improve emoji class balance.

    The Image subsets consist of image-containing tweets.

    Finally, emoji_map_1791.csv provides information about the emoji labels and possible metadata.

    URL to get the tweet based on ID: `https://twitter.com/anyuser/status/
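    A minimal sketch of turning a status ID into a tweet URL, assuming the ID is appended directly to the `https://twitter.com/anyuser/status/` prefix quoted above (the original line is cut off, so this is an assumption). The helper name `tweet_url` is hypothetical. IDs are kept as strings because tweet IDs are 64-bit integers, which exceed the range of integers a double-precision float can represent exactly.

```python
# ASSUMPTION: the status ID is appended directly to this prefix;
# the dataset description is cut off after ".../status/".
TWEET_URL_PREFIX = "https://twitter.com/anyuser/status/"


def tweet_url(status_id: str) -> str:
    """Build a tweet URL from a status ID kept as a string (hypothetical helper)."""
    status_id = status_id.strip()
    # Accept only ASCII digits; str.isdigit() alone also accepts e.g. superscripts.
    if not (status_id.isascii() and status_id.isdigit()):
        raise ValueError(f"not a numeric status ID: {status_id!r}")
    return TWEET_URL_PREFIX + status_id


if __name__ == "__main__":
    example_id = "1234567890123456789"  # made-up ID, for illustration only
    print(tweet_url(example_id))
    # Why strings: a round trip through float silently changes a large ID.
    print(int(float(example_id)))  # prints 1234567890123456768
```

    Keeping IDs as strings matters when loading them from files: any step that converts them to floating point (for example, a numeric column with missing values, or a JSON parser that uses doubles) can silently round large IDs.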

  2. Twemoji Dataset

    • uvaauas.figshare.com
    Updated Feb 28, 2018
    Citation: S.H. Cappallo (2018). Twemoji Dataset [Dataset]. http://doi.org/10.21942/uva.5822100.v3
    Available download formats: txt
    Dataset updated: Feb 28, 2018
    Dataset provided by: University of Amsterdam / Amsterdam University of Applied Sciences
    Authors: S.H. Cappallo
    License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/); license information was derived automatically.

    Description

    A collection of 13M tweets, divided into training, validation, and test sets, for predicting emoji from text and/or images.

    For each tweet, the data provides the status ID and its associated emoji annotations. For the image-containing subsets, the image URL is also listed.

    The Full (unbalanced) dataset consists of randomly sampled test and validation sets of 1M tweets, with the remaining tweets in the training set.

    The Balanced test set is a subset of the test set, chosen to improve emoji class balance.

    The Image subsets consist of image-containing tweets.

    Finally, emoji_map_1791.csv provides information about the emoji labels and possible metadata.
