34 datasets found
  1. Twitter Friends

    • kaggle.com
    Updated Sep 2, 2016
    Cite
    Hubert Wassner (2016). Twitter Friends [Dataset]. https://www.kaggle.com/hwassner/TwitterFriends/discussion
    Explore at:
    Dataset updated
    Sep 2, 2016
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hubert Wassner
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Twitter Friends and hashtags

    Context

    This dataset is an extract of a wider database that collects Twitter users' friends (the other accounts a user follows). The overall goal is to study users' interests through the accounts they follow and the hashtags they have used.

    Content

    It is a list of Twitter user records. In the JSON format, each Twitter user is stored as one object in a list of more than 40,000 objects. Each object holds:

    • avatar : URL to the profile picture

    • followerCount : the number of followers of this user

    • friendsCount : the number of accounts this user follows.

    • friendName : stores the @name (without the '@') of the user (beware this name can be changed by the user)

    • id : user ID, this number can not change (you can retrieve screen name with this service : https://tweeterid.com/)

    • friends : the list of IDs the user follows (data stored is IDs of users followed by this user)

    • lang : the language declared by the user (in this dataset there is only "en" (english))

    • lastSeen : the timestamp of this user's last tweet.

    • tags : the hashtags (with or without #) used by the user. These are the "trending topics" the user tweeted about.

    • tweetID : Id of the last tweet posted by this user.

    You also have the CSV format which uses the same naming convention.

    These users were selected because they tweeted on Twitter trending topics. Only users with at least 100 followers who follow at least 100 other accounts are included, in order to filter out spam and non-informative or empty accounts.
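
    As a rough illustration of working with records shaped like the field list above, the sketch below counts how many users used each hashtag. The two sample records are made up, and it is an assumption that `tags` is stored as a list of strings; the real export may differ.

```python
import json
from collections import Counter


def normalise_tag(tag: str) -> str:
    """Lower-case a hashtag and drop a leading '#', if any."""
    return tag.lstrip("#").lower()


def top_hashtags(users: list[dict], n: int = 3) -> list[tuple[str, int]]:
    """Count how many users used each hashtag and return the n most common."""
    counts = Counter()
    for user in users:
        # A set avoids counting the same tag twice for one user.
        counts.update({normalise_tag(t) for t in user.get("tags", [])})
    return counts.most_common(n)


# Two made-up records following the field names listed above (assumed layout).
sample_json = """
[
  {"id": 1, "friendName": "alice", "followerCount": 250, "friendsCount": 300,
   "friends": [2, 3], "lang": "en", "tags": ["#python", "data"]},
  {"id": 2, "friendName": "bob", "followerCount": 120, "friendsCount": 150,
   "friends": [3], "lang": "en", "tags": ["Python", "#news"]}
]
"""
users = json.loads(sample_json)
print(top_hashtags(users, 1))
```

    Normalising the tags first matters here because the description says hashtags appear both with and without the leading #.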

    Acknowledgements

    This data set was built by Hubert Wassner (me) using the Twitter public API. More data can be obtained on request (hubert.wassner AT gmail.com); at this time I have collected over 5 million profiles in different languages. Some more information can be found here (in French only): http://wassner.blogspot.fr/2016/06/recuperer-des-profils-twitter-par.html

    Past Research

    No public research has been done (until now) on this dataset. I made a private application that uses the full dataset (millions of full profiles); it is described here (in French): http://wassner.blogspot.fr/2016/09/twitter-profiling.html

    Inspiration

    One can analyse a lot with this dataset:

    • stats about followers & followings
    • manifold learning or unsupervised learning from friend list
    • hashtag prediction from friend list
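
    One possible starting point for the friend-list ideas above is a set-overlap measure between users. The sketch below uses Jaccard similarity on the `friends` field; the toy IDs are hypothetical, and this is only one of many similarity measures, not the author's method.

```python
def jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(target_id: int, friends_by_user: dict[int, set[int]]) -> tuple[int, float]:
    """Return the other user whose friend set overlaps most with the target's."""
    target = friends_by_user[target_id]
    scores = {
        uid: jaccard(target, friends)
        for uid, friends in friends_by_user.items()
        if uid != target_id
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy data: user ID -> set of followed IDs (the `friends` field).
friends_by_user = {
    1: {10, 11, 12},
    2: {10, 11, 13},
    3: {20, 21},
}
print(most_similar(1, friends_by_user))  # (2, 0.5)
```

    Users 1 and 2 share two of the four accounts they follow between them, hence the score of 0.5.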

    Contact

    Feel free to ask any question (or help request) via Twitter : @hwassner

    Enjoy! ;)

  2. Twitter Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated Jan 8, 2023
    + more versions
    Cite
    Bright Data (2024). Twitter Dataset [Dataset]. https://brightdata.com/products/datasets/twitter
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Jan 8, 2023
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Utilize our Twitter dataset for diverse applications to enrich business strategies and market insights. Analyzing this dataset provides a comprehensive understanding of social media trends, empowering organizations to refine their communication and marketing strategies. Access the entire dataset or customize a subset to fit your needs. Popular use cases include market research to identify trending topics and hashtags, AI training by reviewing factors such as tweet content, retweets, and user interactions for predictive analytics, and trend forecasting by examining correlations between specific themes and user engagement to uncover emerging social media preferences.

  3. Data from: Exploratory Twitter hashtag analysis of movie premieres in the...

    • portalcientificovalencia.univeuropea.com
    Updated 2024
    + more versions
    Cite
    Yeste, Víctor; Yeste, Víctor (2024). Exploratory Twitter hashtag analysis of movie premieres in the USA [Dataset]. https://portalcientificovalencia.univeuropea.com/documentos/67321ed1aea56d4af0485dad
    Explore at:
    Dataset updated
    2024
    Authors
    Yeste, Víctor; Yeste, Víctor
    Area covered
    United States
    Description

    This work is an exploratory, quantitative, non-experimental study with an inductive inference type and a longitudinal follow-up. It analyzes movie data and tweets published by users using the official Twitter hashtags of movie premieres the week before, the same week, and the week after each release date.

    The scope of the study is the collection of movies released in February 2022 in the USA, and the object of the study includes them and the tweets that refer to the film in the 3 closest weeks to their premiere dates. The tweets collected were classified by the week they were published, a time dimension called timepoint. The week before the release date has been designated as timepoint 1, the week of the release date is timepoint 2, and the week immediately afterward is timepoint 3. Another dimension that has been considered is whether the movie has domestic production: if one of the countries of origin is the United States, the movie is designated as domestic.

    The chosen variables are organized in two data tables, one for the movies and one for the collected tweets.

    Variables related to the movies:

    • id: Internal id of the movie
    • name: Title of the movie
    • hashtag: Official hashtag of the movie
    • countries: List of countries of the movie, separated by a semicolon
    • mpaa: Film ratings system by the Motion Picture Association of America. It is a completely voluntary rating system and ratings have no legal standing. The current ratings include G (general audiences), PG (parental guidance suggested), PG-13 (parents strongly cautioned), R (restricted, under 17 requires accompanying parent or adult guardian) and NC-17 (no one 17 and under admitted) (Film Ratings - Motion Picture Association, n.d.)
    • genres: List of genres of the movie, e.g., Action or Thriller, separated by a semicolon
    • release_date: Release date of the movie in the format YYYY-MM-DD
    • opening_grosses: Amount in US dollars that the movie obtained on the opening date (the first week after the release date)
    • opening_theaters: Number of US theaters that released the movie on the opening date (the first week after the release date)
    • rating_avg: Average rating of the movie

    Variables related to the tweets:

    • id: Internal id of the tweet
    • status_id: Twitter id of the tweet
    • movie_id: Internal id of the movie
    • timepoint: Week number related to the movie premiere that the tweet was published on. “1” is the week before the movie release, “2” is the week after the movie release and “3” is the second week after the movie release.
    • author_id: Twitter id of the author of the tweet
    • created_at: Date and time of the tweet, with format “YYYY-MM-DD HH:MM:SS”
    • quote_count: Number of the tweet’s quotes
    • reply_count: Number of the tweet’s replies
    • retweet_count: Number of the tweet’s retweets
    • like_count: Number of the tweet’s likes
    • sentiment: Sentiment analysis of the tweet’s content with a range from -1 (negative) to 1 (positive)

    This dataset has contributed to the elaboration of the book chapters:

    • Yeste, Víctor; Calduch-Losa, Ángeles (2022). Genre classification of movie releases in the USA: Exploring data with Twitter hashtags. In Narrativas emergentes para la comunicación digital (pp. 1012-1044). Dykinson, S. L.
    • Yeste, Víctor; Calduch-Losa, Ángeles (2022). Exploratory Twitter hashtag analysis of movie premieres in the USA. In Desafíos audiovisuales de la tecnología y los contenidos en la cultura digital (pp. 169-187). McGraw-Hill Interamericana de España S.L.
    • Yeste, Víctor; Calduch-Losa, Ángeles (2022). ANOVA to study movie premieres in the USA and online conversation on Twitter. The case of rating average using data from official Twitter hashtags. In El mapa y la brújula. Navegando por las metodologías de investigación en comunicación (pp. 151-168). Editorial Fragua.
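
    The timepoint classification described for this dataset could be reproduced along the following lines. This is only a sketch: it assumes each timepoint is a 7-day window and that the release day opens timepoint 2, which the description does not state exactly; the function name is hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def timepoint(release_date: str, created_at: str) -> Optional[int]:
    """Map a tweet to timepoint 1, 2 or 3, or None if outside the three weeks.

    Assumption: each timepoint is a 7-day window and the release day
    opens timepoint 2. The dataset may draw the boundaries differently.
    """
    release = date.fromisoformat(release_date)  # YYYY-MM-DD
    tweeted = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S").date()
    days = (tweeted - release).days
    if -7 <= days < 0:
        return 1  # week before the release
    if 0 <= days < 7:
        return 2  # week starting on the release date
    if 7 <= days < 14:
        return 3  # following week
    return None


print(timepoint("2022-02-04", "2022-02-01 10:00:00"))  # 1
```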

  4. COVID-19 Twitter Dataset

    • borealisdata.ca
    Updated Nov 10, 2020
    Cite
    Anatoliy Gruzd; Philip Mai (2020). COVID-19 Twitter Dataset [Dataset]. http://doi.org/10.5683/SP2/PXF2CU
    Explore at:
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    Borealis
    Authors
    Anatoliy Gruzd; Philip Mai
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The current dataset contains 237M Tweet IDs for Twitter posts that mentioned "COVID" as a keyword or as part of a hashtag (e.g., COVID-19, COVID19) between March and July of 2020.

    Sampling Method: hourly requests sent to Twitter Search API using Social Feed Manager, an open source software that harvests social media data and related content from Twitter and other platforms.

    NOTE:

    1) In accordance with Twitter API Terms, only Tweet IDs are provided as part of this dataset.

    2) To recollect tweets based on the list of Tweet IDs contained in these datasets, you will need to use tweet 'rehydration' programs like Hydrator (https://github.com/DocNow/hydrator) or Python library Twarc (https://github.com/DocNow/twarc).

    3) This dataset, like most datasets collected via the Twitter Search API, is a sample of the available tweets on this topic and is not meant to be comprehensive. Some COVID-related tweets might not be included in the dataset either because the tweets were collected using a standardized but intermittent (hourly) sampling protocol or because tweets used hashtags/keywords other than COVID (e.g., Coronavirus or #nCoV).

    4) To broaden this sample, consider comparing/merging this dataset with other COVID-19 related public datasets such as:

    • https://github.com/thepanacealab/covid19_twitter
    • https://ieee-dataport.org/open-access/corona-virus-covid-19-tweets-dataset
    • https://github.com/echen102/COVID-19-TweetIDs
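
    Before rehydrating, ID lists from several sources usually need to be merged and deduplicated, then split into request-sized chunks. The sketch below shows that preparation step only; it assumes one Tweet ID per line, and the batch size of 100 is an assumption to adjust to whatever the rehydration tool or API accepts. No network calls are made.

```python
from typing import Iterable, Iterator


def merge_ids(*sources: Iterable[str]) -> list[str]:
    """Merge Tweet ID lists, dropping blanks and duplicates, keeping first-seen order."""
    seen: dict[str, None] = {}
    for source in sources:
        for raw in source:
            tweet_id = raw.strip()
            if tweet_id:
                seen.setdefault(tweet_id, None)
    return list(seen)


def batches(ids: list[str], size: int = 100) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` IDs (size is an assumed limit)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Made-up IDs standing in for lines read from two ID files.
merged = merge_ids(["1\n", "2\n", "\n"], ["2", "3"])
print(merged)  # ['1', '2', '3']
```

    Each source can be any iterable of strings, so open file handles can be passed directly in place of the lists used here.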

  5. Homophobia Detection Dataset (Twitter/X) Dataset

    • paperswithcode.com
    Updated May 14, 2024
    + more versions
    Cite
    Homophobia Detection Dataset (Twitter/X) Dataset [Dataset]. https://paperswithcode.com/dataset/homophobia-detection-dataset-twitter-x
    Explore at:
    Dataset updated
    May 14, 2024
    Description

    Dataset Description

    Paper: TBC
    Point of Contact: Josh McGiff (Josh.McGiff@ul.ie)

    Dataset Summary

    This dataset was developed to address the significant gap in online hate speech detection, particularly focusing on homophobia, which is often neglected in sentiment analysis research. It comprises tweets scraped from X (formerly Twitter), which have been labeled for the presence of homophobic content by volunteers from diverse backgrounds. This dataset is the largest open-source labelled English dataset for homophobia detection known to the authors and aims to enhance online safety and inclusivity.

    Supported Tasks

    Task: Homophobic hate speech detection.

    Languages

    English.

    Dataset Structure

    Data Fields:

    • tweet_text: The text content of the tweet.
    • label: Binary label indicating the presence of homophobic content (0 = no homophobic content, 1 = homophobic content).
    • language: The language of the tweet, as tagged by X/Twitter.

    Dataset Creation

    Curation Rationale: The dataset was curated to enhance the detection and classification of homophobic content on social media platforms, particularly focusing on the gap where homophobia is underrepresented in current research.

    Source Data: Data was scraped from X (formerly Twitter) focusing on terms and accounts associated with the LGBTQIA+ community.

    Annotation Process: Annotations were made by three volunteers from different sexualities and gender identities using a majority vote for label assignment. Annotations were conducted in Microsoft Excel over several days.

    Personal and Sensitive Information: Usernames and other personal identifiers have been anonymized or removed. URLs have also been removed. The dataset contains sensitive content related to homophobia.
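
    The majority-vote step mentioned in the annotation process can be sketched as below. The function name and the sample votes are hypothetical; the authors worked in Microsoft Excel, so this only illustrates the rule, not their tooling.

```python
from collections import Counter


def majority_label(votes: list[int]) -> int:
    """Return the label chosen by most annotators.

    An odd number of binary votes (0 or 1) guarantees there is no tie.
    """
    if len(votes) % 2 == 0:
        raise ValueError("need an odd number of votes to avoid ties")
    return Counter(votes).most_common(1)[0][0]


# Three hypothetical annotator votes for one tweet.
print(majority_label([1, 0, 1]))  # 1
```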

    Considerations for Using the Data

    Social Impact: The dataset is intended for research purposes to combat online hate speech and improve inclusivity and safety on digital platforms.

    Ethical Considerations: Given the sensitive nature of hate speech, researchers should consider the impact of their work on marginalised communities and ensure that their use of the dataset aims to reduce harm and promote inclusivity.

    Legal and Privacy Concerns: Researchers should comply with legal standards and ethical guidelines regarding hate speech and data privacy.

    Additional Information

    License: CC-BY-4.0
    Citation: TBC

    Acknowledgements

    This work was conducted with the financial support of the Science Foundation Ireland Centre for Research Training in Artificial Intelligence under Grant No. 18/CRT/6223.

  6. Top 5 sources of place-tagged tweets in our data set.

    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Rudy Arthur; Hywel T. P. Williams (2023). Top 5 sources of place-tagged tweets in our data set. [Dataset]. http://doi.org/10.1371/journal.pone.0218454.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Rudy Arthur; Hywel T. P. Williams
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    There are 18,421,520 tweets in total.

  7. Geotagged Twitter posts from the United States: A tweet collection to...

    • search.gesis.org
    • datacatalogue.cessda.eu
    • +1 more
    Updated Mar 4, 2021
    Cite
    Pfeffer, Jürgen; Morstatter, Fred (2021). Geotagged Twitter posts from the United States: A tweet collection to investigate representativeness [Dataset]. http://doi.org/10.7802/1166
    Explore at:
    Dataset updated
    Mar 4, 2021
    Dataset provided by
    GESIS, Köln
    GESIS search
    Authors
    Pfeffer, Jürgen; Morstatter, Fred
    License

    https://www.gesis.org/en/institute/data-usage-terms

    Area covered
    United States
    Description

    This dataset consists of IDs of geotagged Twitter posts from within the United States. They are provided as files per day and state as well as per day and county. In addition, files containing the aggregated number of hashtags from these tweets are provided per day and state and per day and county. This data is organized as a ZIP-file per month containing several zip-files per day which hold the txt-files with the ID/hash information.

    Also part of the dataset are two shapefiles for the US counties and states and Python scripts for the data collection and sorting geotags into counties.

  8. Replication Data for Hashtag Co-occurrence Community Detection

    • dataone.org
    • dataverse.harvard.edu
    Updated Nov 22, 2023
    Cite
    Zhu, Xi; Alan M. MacEachren (2023). Replication Data for Hashtag Co-occurrence Community Detection [Dataset]. https://dataone.org/datasets/sha256%3Ae05ed893fdd0f93eb0847738fc1d2d4f8e95fb6ed5d293a95bd5f3fe04cfe1ea
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Zhu, Xi; Alan M. MacEachren
    Time period covered
    Jan 1, 2016 - Dec 31, 2017
    Description

    Geotagged public tweets from the Twitter streaming API. Date range: January 1, 2016 to December 31, 2017. Data size: 4 GB; about 170 million tweets with hashtags. Attributes: each tweet is associated with a tweet ID, timestamp, anonymized user ID, and a list of hashtags.
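
    Given per-tweet hashtag lists like those described above, co-occurrence counts (the edge weights a community detection step would typically start from) can be built as in this sketch. The sample tweets are made up, and the authors' actual pipeline is not described here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(tweets_hashtags: list[list[str]]) -> Counter:
    """Count how often each unordered pair of hashtags appears in the same tweet."""
    counts: Counter = Counter()
    for tags in tweets_hashtags:
        # Lower-case and deduplicate, then sort so each pair has one canonical order.
        unique = sorted({t.lower() for t in tags})
        counts.update(combinations(unique, 2))
    return counts


# Made-up hashtag lists, one per tweet.
tweets = [["A", "b"], ["a", "b", "c"], ["c"]]
pairs = cooccurrence(tweets)
print(pairs[("a", "b")])  # 2
```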

  9. Squid Game Netflix Twitter Data

    • kaggle.com
    zip
    Updated Oct 16, 2021
    Cite
    Deep Contractor (2021). Squid Game Netflix Twitter Data [Dataset]. https://www.kaggle.com/datasets/deepcontractor/squid-game-netflix-twitter-data/versions/6
    Explore at:
    Available download formats: zip (6803403 bytes)
    Dataset updated
    Oct 16, 2021
    Authors
    Deep Contractor
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description


    • The dataset contains the recent tweets about the record-breaking Netflix show "Squid Game"

    • The data was collected using the tweepy Python package to access the Twitter API.

  10. Selected tweets of a detected migrant who moved from Virginia to New York on...

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 14, 2023
    Cite
    Guanghua Chi; Fengyang Lin; Guangqing Chi; Joshua Blumenstock (2023). Selected tweets of a detected migrant who moved from Virginia to New York on 2014-09-04 based on our approach. [Dataset]. http://doi.org/10.1371/journal.pone.0239408.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Guanghua Chi; Fengyang Lin; Guangqing Chi; Joshua Blumenstock
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New York
    Description

    Selected tweets of a detected migrant who moved from Virginia to New York on 2014-09-04 based on our approach.

  11. 2011 UK Riots Tweets

    • researchdata.edu.au
    Updated May 30, 2017
    + more versions
    Cite
    RMIT University, Australia (2017). 2011 UK Riots Tweets [Dataset]. http://doi.org/10.4225/61/593f17d319bc1
    Explore at:
    Dataset updated
    May 30, 2017
    Dataset provided by
    RMIT University, Australia
    License

    Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0): https://creativecommons.org/licenses/by-nc-nd/3.0/
    License information was derived automatically

    Time period covered
    Aug 1, 2011 - Aug 31, 2011
    Area covered
    United Kingdom
    Description

    Collection of tweets captured at the time of the 2011 UK Riots. This collection is only partial, retrieved via the streaming API.

    The data provides a historical record of public discussion on Twitter during a significant social happening. It is also a useful resource for experimentation and methodological development. The data is both a social and an informational resource, enabling the analysis of a significant social event and the development and application of computational tools for, among other aims, natural language processing, information retrieval, and metadata analysis. In addition to the principal collection of tweets (UK Riots Database), a sub-collection has been extracted that includes only the geo-tagged tweets. Finally, these databases are stored on MongoDB and are made queryable using a special interface (see the UK Riots Database for access and instructions) that allows queries to be stored in another dataset, shared, and re-executed.

  12. Tweets Tagged with #NoJusticeNoLeBron

    • data.4tu.nl
    • 4tu.edu.hpc.n-helix.com
    zip
    Updated Oct 29, 2023
    Cite
    Timothy J Piper (2023). Tweets Tagged with #NoJusticeNoLeBron [Dataset]. http://doi.org/10.4121/uuid:57b4f590-8e3e-475c-a7c7-56052303d5cb
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 29, 2023
    Dataset provided by
    4TU (https://www.4tu.nl/)
    Authors
    Timothy J Piper
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is a table describing Tweets posted from 12/28/2016 through 12/29/2016 that were tagged with #NoJusticeNoLeBron

  13. Performance of the six frequency-based algorithms and our proposed...

    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Guanghua Chi; Fengyang Lin; Guangqing Chi; Joshua Blumenstock (2023). Performance of the six frequency-based algorithms and our proposed segment-based algorithm. [Dataset]. http://doi.org/10.1371/journal.pone.0239408.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Guanghua Chi; Fengyang Lin; Guangqing Chi; Joshua Blumenstock
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    See section “Traditional frequency-based methods” for the details of the six frequency-based methods.

  14. Data from: Detecting East Asian Prejudice on Social Media

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 22, 2024
    Cite
    Bertie Vidgen (2024). Detecting East Asian Prejudice on Social Media [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3816666
    Explore at:
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    David Broniatowski
    Scott Hale
    Rebekah Tromble
    Matthew Hall
    Austin Botelho
    Ella Guest
    Bertie Vidgen
    Helen Margetts
    Zeerak Waseem
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    East Asia
    Description

    This repository contains:

    A deep learning model which distinguishes between Hostility against East Asia, Criticism of East Asia, Discussion of East Asian prejudice and Neutral content. The F1 score is 0.83.

    A detailed annotation codebook used for marking up the tweets.

    A labelled dataset with 20,000 entries.

    A dataset with all 40,000 annotations, which can be used to investigate annotation processes for abusive content moderation.

    A list of thematic hashtag replacements.

    Three sets of annotations for the 1,000 most used hashtags in the original database of COVID-19 related tweets. Hashtags were annotated for COVID-19 relevance, East Asian relevance and stance.

    The outbreak of COVID-19 has transformed societies across the world as governments tackle the health, economic and social costs of the pandemic. It has also raised concerns about the spread of hateful language and prejudice online, especially hostility directed against East Asia. This data repository is for a classifier that detects and categorizes social media posts from Twitter into four classes: Hostility against East Asia, Criticism of East Asia, Meta-discussions of East Asian prejudice and a neutral class. The classifier achieves an F1 score of 0.83 across all four classes. We provide our final model (coded in Python), as well as a new 20,000 tweet training dataset used to make the classifier, two analyses of hashtags associated with East Asian prejudice and the annotation codebook. The classifier can be implemented by other researchers, assisting with both online content moderation processes and further research into the dynamics, prevalence and impact of East Asian prejudice online during this global pandemic.

    This work is a collaboration between The Alan Turing Institute and the Oxford Internet Institute. It was funded by the Criminal Justice Theme of the Alan Turing Institute under Wave 1 of The UKRI Strategic Priorities Fund, EPSRC Grant EP/T001569/1.

  15. Data from: Twitter corpus Janes-Tweet 1.0

    • clarin.si
    • live.european-language-grid.eu
    Updated Sep 5, 2017
    + more versions
    Cite
    Nikola Ljubešić; Tomaž Erjavec; Darja Fišer (2017). Twitter corpus Janes-Tweet 1.0 [Dataset]. https://www.clarin.si/repository/xmlui/handle/11356/1142
    Explore at:
    Dataset updated
    Sep 5, 2017
    Authors
    Nikola Ljubešić; Tomaž Erjavec; Darja Fišer
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Janes-Tweet is an annotated corpus of almost 10 million tweets posted from 2013-06 to 2017-06 by approx. 9,000 users that tweet mostly in Slovene. The corpus is structured into individual tweets, together with their metadata. The tweets in the corpus are tokenised, sentence segmented, word normalised, morphosyntactically tagged, lemmatised and annotated with named entities. Due to Twitter terms-of-service, the corpus is distributed in an encoded version. The included tweetpub program (also available and documented on https://github.com/clarinsi/tweetpub) should be used to decode it, which it does by fetching the original tweets and applying a diff operation on the distributed corpus. Note that the retrieved corpus can have fewer tweets than the distributed version if some have been removed from Twitter by their authors in the meantime.

  16. Hashtags used by museums Twitter accounts from REMED

    • portalcientificovalencia.univeuropea.com
    • figshare.com
    Updated 2024
    Cite
    Yeste, Víctor; Yeste, Víctor (2024). Hashtags used by museums Twitter accounts from REMED [Dataset]. https://portalcientificovalencia.univeuropea.com/documentos/67321ed1aea56d4af0485daa
    Explore at:
    Dataset updated
    2024
    Authors
    Yeste, Víctor; Yeste, Víctor
    Description

    This study consists of quantitative, explanatory, and non-experimental research using inductive inference longitudinally. Thus, the use of hashtags by the Twitter accounts of the set of museums that are part of REMED is studied, and the analysis of hashtag trends by Twitter users in Spanish is performed.

    The primary variable is the favorite count, and it is hypothesized from this study that it is possible to predict the primary variable five weeks later. The field of study is formed by the 104 Twitter accounts of the museums that are part of REMED (Red de Museos y Estrategias Digitales).

    Seven analysis variables explain the information related to the use of hashtags, both in the size of the Twitter accounts of museums of the sample chosen (prefix "m_" in the variables) and Twitter users in Spanish in general (prefix "tw_" in variables). All variables represent the data in count mode, which means that they sum up the total of the data collected for each tweet of each hashtag processed:

    • Number of tweets (variable name "num_tweets")
    • Number of retweets (variable name "retweet_count")
    • Number of favorites (variable name "favorite_count")
    • Number of followers of tweeters (variable name "user_num_followers")
    • Number of tweets published by tweeters (variable name "user_num_tweets")
    • Age in days of tweeters' Twitter accounts (variable name "user_age")
    • Number of tweets including a URL (variable name "url_inclusion")

    With the variables above, an investigation has been carried out by checking the correlations between the variables and performing a regression analysis. Thus, the relationships between the variables are ascertained and analyzed to determine if it is possible to predict the number of favorites of the hashtags used by museums.

    The initial intake is presented in the file cimed-2021-ini.csv, and the intake made 5 weeks later is presented in the file cimed-2021-end.csv.

    This dataset has contributed to the elaboration of the book chapter:

    Yeste Moreno, V.; Calduch-Losa, Á.; Serrano-Cobos, J. (2022). Estudio predictivo del uso colectivo de hashtags en museos de la red REMED. En CIMED21 - I Congreso internacional de museos y estrategias digitales. Editorial Universitat Politècnica de València. 251-265. https://doi.org/10.4995/CIMED21.2021.12281
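
    The correlation check mentioned above can be illustrated with a plain Pearson coefficient. This is a generic sketch, not the study's analysis: the numbers are made up, and pairing an initial-intake column with a later-intake column is only one possible way to use the two files.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Made-up per-hashtag counts: tweets at the first intake vs. favorites later.
num_tweets = [1.0, 2.0, 3.0, 4.0]
favorite_count = [2.0, 4.0, 6.0, 8.0]
r = pearson(num_tweets, favorite_count)
```

    In this toy case the second list is an exact multiple of the first, so the coefficient is 1 up to floating-point rounding.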

  17. An Archive of #DH2016 Tweets Published on Thursday 14 July 2016 GMT

    • city.figshare.com
    html
    Updated May 31, 2023
    + more versions
    Cite
    Ernesto Priego (2023). An Archive of #DH2016 Tweets Published on Thursday 14 July 2016 GMT [Dataset]. http://doi.org/10.6084/m9.figshare.3487103.v1
    Explore at:
    Available download formats: html
    Dataset updated
    May 31, 2023
    Dataset provided by
    City, University of London
    Authors
    Ernesto Priego
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background

    The Digital Humanities 2016 conference is taking/took place in Kraków, Poland, between Sunday 11 July and Saturday 16 July 2016. #DH2016 is/was the conference official hashtag.

    What This Output Is

    This is a CSV file containing a total of 3717 Tweets publicly published with the hashtag #DH2016 on Thursday 14 July 2016 GMT.

    The archive starts with a Tweet published on Thursday July 14 2016 at 00:01:04 +0000 and ends with a Tweet published on Thursday July 14 2016 at 23:49:14 +0000 (GMT). Previous days have been shared on a different output. A breakdown of Tweets per day so far:

    • Sunday 10 July 2016: 179 Tweets
    • Monday 11 July 2016: 981 Tweets
    • Tuesday 12 July 2016: 2318 Tweets
    • Wednesday 13 July 2016: 4175 Tweets
    • Thursday 14 July 2016: 3717 Tweets

    Methodology and Limitations

    The Tweets contained in this file were collected by Ernesto Priego using Martin Hawksey's TAGS 6.0. Only users with at least 1 follower were included in the archive. Retweets have been included (Retweets count as Tweets). The collection spreadsheet was customised to reflect the time zone and geographical location of the conference.

    The profile_image_url and entities_str metadata were removed before public sharing in this archive. Please bear in mind that the conference hashtag has been spammed so some Tweets collected may be from spam accounts. Some automated refining has been performed to remove Tweets not related to the conference but the data is likely to require further refining and deduplication. Both research and experience show that the Twitter search API is not 100% reliable. Large Tweet volumes affect the search collection process. The API might "over-represent the more central users", not offering "an accurate picture of peripheral activity" (Gonzalez-Bailon, Sandra, et al. 2012).

    Apart from the filters and limitations already declared, it cannot be guaranteed that this file contains each and every Tweet tagged with #dh2016 during the indicated period, and the dataset is shared for archival, comparative and indicative educational research purposes only.

    Only content from public accounts is included and was obtained from the Twitter Search API. The shared data is also publicly available to all Twitter users via the Twitter Search API and available to anyone with an Internet connection via the Twitter and Twitter Search web client and mobile apps without the need of a Twitter account.

    Each Tweet and its contents were published openly on the Web with the queried hashtag and are the responsibility of the original authors. Original Tweets are likely to be copyright their individual authors but please check individually. No private personal information is shared in this dataset. The collection and sharing of this dataset is enabled and allowed by Twitter's Privacy Policy. The sharing of this dataset complies with Twitter's Developer Rules of the Road. This dataset is shared to archive, document and encourage open educational research into scholarly activity on Twitter.

    Other Considerations

    Tweets published publicly by scholars during academic conferences are often tagged (labeled) with a hashtag dedicated to the conference in question.

    The purpose and function of hashtags is to organise and describe information/outputs under the relevant label in order to enhance the discoverability of the labeled information/outputs (Tweets in this case). A hashtag is metadata users choose freely to use so their content is associated, directly linked to and categorised with the chosen hashtag. Though every reason for Tweeters' use of hashtags cannot be generalised nor predicted, it can be argued that scholarly Twitter users form specialised, self-selecting public professional networks that tend to observe scholarly practices and accepted modes of social and professional behaviour. In general terms it can be argued that scholarly Twitter users willingly and consciously tag their public Tweets with a conference hashtag as a means to network and to promote, report from, reflect on, comment on and generally contribute publicly to the scholarly conversation around conferences. As Twitter users, conference Twitter hashtag contributors have agreed to Twitter's Privacy and data sharing policies. Professional associations like the Modern Language Association recognise Tweets as citeable scholarly outputs. Archiving scholarly Tweets is a means to preserve this form of rapid online scholarship that otherwise can very likely become unretrievable as time passes; Twitter's search API has well-known temporal limitations for retrospective historical search and collection.

    Beyond individual tweets as scholarly outputs, the collective scholarly activity on Twitter around a conference or academic project or event can provide interesting insights for the contemporary history of scholarly communications. To date, collecting in real time is the only relatively accurate method to archive tweets at a small scale. Though these datasets have limitations and are not thoroughly systematic, it is hoped they can contribute to developing new insights into the discipline's presence on Twitter over time.

    The CC-BY license has been applied to the output in the repository as a curated dataset. Authorial/curatorial/collection work has been performed on the file in order to make it available as part of the scholarly record. The data contained in the deposited file is otherwise freely available elsewhere through different methods and anyone not wishing to attribute the data to the creator of this output is needless to say free to do their own collection and clean their own data.

  18. Tweets Targeting Isis

    • kaggle.com
    zip
    Updated Nov 17, 2019
    Cite
    ActiveGalaXy (2019). Tweets Targeting Isis [Dataset]. https://www.kaggle.com/activegalaxy/isis-related-tweets
    Explore at:
    Available download formats: zip (10419329 bytes)
    Dataset updated
    Nov 17, 2019
    Authors
    ActiveGalaXy
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The image at the top of the page is a frame from today's (7/26/2016) Isis #TweetMovie from twitter, a "normal" day when two Isis operatives murdered a priest saying mass in a French church. (You can see this in the center left). A selection of data from this site is being made available here to Kaggle users.

    UPDATE: An excellent study by Audrey Alexander titled Digital Decay? is now available which traces the "change over time among English-language Islamic State sympathizers on Twitter."

    Intent

    This data set is intended to be a counterpoise to the How Isis Uses Twitter data set. That data set contains 17k tweets alleged to originate with "100+ pro-ISIS fanboys". This new set contains 122k tweets collected on two separate days, 7/4/2016 and 7/11/2016, which contained any of the following terms, with no further editing or selection:

    • isis
    • isil
    • daesh
    • islamicstate
    • raqqa
    • Mosul
    • "islamic state"
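
The term filter described above could be sketched as follows. This is only an illustration: the original collection pipeline is not documented here, so the whole-word, case-insensitive matching rule and the names `TERMS`, `PATTERN` and `mentions_term` are assumptions.

```python
import re

# Terms taken from the list above; lower-cased because matching ignores case.
TERMS = ["isis", "isil", "daesh", "islamicstate", "raqqa", "mosul", "islamic state"]

# Assumption: match whole words only, so "isis" does not fire inside "crisis".
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in TERMS) + r")\b",
    re.IGNORECASE,
)


def mentions_term(text: str) -> bool:
    """Return True if the tweet text contains any of the listed terms."""
    return PATTERN.search(text) is not None


print(mentions_term("News from Mosul today"))
print(mentions_term("A crisis in the market"))
```

A plain substring match would be simpler but would also keep unrelated words such as "crisis", which is one reason a collection built this way contains noise.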

    This is not a perfect counterpoise as it almost surely contains a small number of pro-Isis fanboy tweets. However, unless some entity, such as Kaggle, is willing to expend significant resources on a service like an expert-level Mechanical Turk or Zooniverse, a high-quality counterpoise is out of reach.

    A counterpoise provides a balance or backdrop against which to measure a primary object, in this case the original pro-Isis data. So if anyone wants to discriminate between pro-Isis tweets and other tweets concerning Isis you will need to model the original pro-Isis data or signal against the counterpoise which is signal + noise. Further background and some analysis can be found in this forum thread.

    This data comes from postmodernnews.com/token-tv.aspx which daily collects about 25MB of Isis tweets for the purposes of graphical display. PLEASE NOTE: This server is not currently active.

    Data Details

    There are several differences between the format of this data set and the pro-ISIS fanboy dataset:

    1. All the Twitter t.co links have been expanded where possible.
    2. There are no "description, location, followers, numberstatuses" data columns.

    I have also included my version of the original pro-ISIS fanboy set. This version has all the t.co links expanded where possible.

  19. Data from: Dataset: tweets and analysis related to the paper 'Signaling...

    • ssh.datastations.nl
    • datacatalogue.cessda.eu
    bin, csv, pdf, txt +2
    Updated Jun 8, 2017
    Cite
    DANS Data Station Social Sciences and Humanities (2017). Dataset: tweets and analysis related to the paper 'Signaling sarcasm: From hyperbole to hashtag' [Dataset]. http://doi.org/10.17026/dans-2ce-mcr3
    Explore at:
    Available download formats: zip (21511), xlsx (43028), pdf (56107), bin (144), txt (7549586), csv (1969)
    Dataset updated
    Jun 8, 2017
    Dataset provided by
    Data Archiving and Networked Services
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Date: Collection period: start=2013-06-01; end=2013-06-30;

  20. Raw data on use on #neoEBM on Twitter: 2018-2021

    • researchdata.edu.au
    • adelaide.figshare.com
    Updated Apr 1, 2021
    Cite
    Amy Keir (2021). Raw data on use on #neoEBM on Twitter: 2018-2021 [Dataset]. http://doi.org/10.25909/14329754.V1
    Explore at:
    Dataset updated
    Apr 1, 2021
    Dataset provided by
    The University of Adelaide
    Authors
    Amy Keir
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    Raw data describing the use of the hashtag #neoEBM on Twitter (social media platform) by numbers of Twitter users and use of the hashtag (monthly).

    The datasheet includes the top 20 users of the hashtag and details about each one (publicly available information about each of these users available on the social media platform).
Cite
Hubert Wassner (2016). Twitter Friends [Dataset]. https://www.kaggle.com/hwassner/TwitterFriends/discussion

Twitter Friends

40k full Twitter user profile data (including who they follow!)

Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 2, 2016
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Hubert Wassner
License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

Twitter Friends and hashtags

Context

This dataset is an extract of a wider database aimed at collecting Twitter users' friends (the other accounts a user follows). The overall goal is to study users' interests through who they follow and how this connects to the hashtags they have used.

Content

The data is a list of Twitter user records. In the JSON format, each Twitter user is stored as one object in a list of more than 40,000 objects. Each object holds:

  • avatar: URL of the profile picture

  • followerCount: the number of followers of this user

  • friendsCount: the number of accounts this user follows

  • friendName: the @name (without the '@') of the user (beware: this name can be changed by the user)

  • id: user ID; this number cannot change (you can retrieve the screen name with this service: https://tweeterid.com/)

  • friends: the list of IDs of the users followed by this user

  • lang: the language declared by the user (in this dataset there is only "en" (English))

  • lastSeen: the timestamp of this user's most recent tweet

  • tags: the hashtags (with or without #) used by the user. These are the "trending topics" the user tweeted about.

  • tweetID: ID of the last tweet posted by this user

The data is also provided in CSV format, which uses the same naming convention.
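
As a rough illustration of working with records in this shape, the sketch below parses a small inline sample rather than the real file. The sample values, the choice of string IDs, and the helper `normalize_tags` are assumptions made for the example; only the field names come from the description above.

```python
import json

# Hypothetical two-record sample using the field names described above.
SAMPLE = """[
  {"id": "1", "friendName": "alice", "followerCount": 250, "friendsCount": 180,
   "friends": ["2", "3"], "lang": "en", "tags": ["#Python", "data"]},
  {"id": "2", "friendName": "bob", "followerCount": 120, "friendsCount": 300,
   "friends": ["1"], "lang": "en", "tags": ["python"]}
]"""


def normalize_tags(tags):
    """Lower-case tags and drop a leading '#', since tags come with or without it."""
    return sorted({tag.lstrip("#").lower() for tag in tags})


users = json.loads(SAMPLE)
for user in users:
    # Print the screen name, how many followed IDs are stored, and the cleaned tags.
    print(user["friendName"], len(user["friends"]), normalize_tags(user["tags"]))
```

Normalising tags first avoids counting "#Python" and "python" as different hashtags in later analysis.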

These users were selected because they tweeted on Twitter trending topics. I kept only users who have at least 100 followers and follow at least 100 other accounts (in order to filter out spam and non-informative/empty accounts).
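
The selection rule can be expressed as a simple predicate. This is a sketch, not the author's code: it assumes that friendsCount holds the number of accounts a user follows (the usual Twitter meaning of "friends"), and the sample records and the name `is_informative` are hypothetical.

```python
MIN_FOLLOWERS = 100  # threshold stated in the description
MIN_FOLLOWING = 100  # threshold stated in the description


def is_informative(user: dict) -> bool:
    """Keep users with at least 100 followers who also follow at least 100 accounts."""
    return (
        user["followerCount"] >= MIN_FOLLOWERS
        and user["friendsCount"] >= MIN_FOLLOWING  # assumed: count of accounts followed
    )


# Hypothetical records for illustration only.
users = [
    {"friendName": "alice", "followerCount": 250, "friendsCount": 180},
    {"friendName": "spam_bot", "followerCount": 3, "friendsCount": 4000},
    {"friendName": "lurker", "followerCount": 120, "friendsCount": 12},
]

kept = [user["friendName"] for user in users if is_informative(user)]
print(kept)
```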

Acknowledgements

This dataset was built by Hubert Wassner (me) using the Twitter public API. More data can be obtained on request (hubert.wassner AT gmail.com); so far I have collected over 5 million profiles in different languages. More information can be found here (in French only): http://wassner.blogspot.fr/2016/06/recuperer-des-profils-twitter-par.html

Past Research

No public research has been done on this dataset so far. I built a private application, described here (in French): http://wassner.blogspot.fr/2016/09/twitter-profiling.html, which uses the full dataset (millions of full profiles).

Inspiration

One can analyse many things with this dataset:

  • stats about followers & followings
  • manifold learning or unsupervised learning from friend list
  • hashtag prediction from friend list
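
One possible starting point for the friend-list ideas above is a similarity measure between two users' friend lists. The sketch below uses Jaccard similarity, which is my choice for illustration and not something the dataset author prescribes; the friend IDs are made up.

```python
def jaccard(friends_a, friends_b) -> float:
    """Jaccard similarity of two friend-ID lists: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(friends_a), set(friends_b)
    union = set_a | set_b
    # Two empty lists share nothing measurable, so define the similarity as 0.
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical friend lists (IDs of followed accounts).
alice_friends = [1, 2, 3, 4]
bob_friends = [3, 4, 5, 6]

print(jaccard(alice_friends, bob_friends))
```

A matrix of such pairwise similarities could then feed a clustering or nearest-neighbour step, for example to suggest hashtags used by users with similar friend lists.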

Contact

Feel free to ask any questions (or request help) via Twitter: @hwassner

Enjoy! ;)
