34 datasets found

Twitter Friends
kaggle.com
Updated Sep 2, 2016
Cite
Hubert Wassner (2016). Twitter Friends [Dataset]. https://www.kaggle.com/hwassner/TwitterFriends/discussion
Dataset updated
Sep 2, 2016
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Hubert Wassner
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Twitter Friends and hashtags

Context

This dataset is an extract of a wider database that collects Twitter users' friends (the other accounts a user follows). The overall goal is to study users' interests through who they follow and how that connects to the hashtags they have used.

Content

It is a list of Twitter users' information. In the JSON format, each Twitter user is stored as one object in a list of more than 40,000 objects. Each object holds:

avatar : URL to the profile picture

followerCount : the number of followers of this user

friendsCount : the number of accounts this user follows (its "friends").

friendName : stores the @name (without the '@') of the user (beware this name can be changed by the user)

id : user ID, this number can not change (you can retrieve screen name with this service : https://tweeterid.com/)

friends : the list of IDs the user follows (data stored is IDs of users followed by this user)

lang : the language declared by the user (in this dataset there is only "en" (English))

lastSeen : the timestamp of the date when this user posted their last tweet.

tags : the hashtags (with or without #) used by the user. These are the "trending topics" the user tweeted about.

tweetID : Id of the last tweet posted by this user.

You also have the CSV format which uses the same naming convention.
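
A minimal sketch of how the JSON layout described above could be read in Python. The two sample records are invented for illustration, and the value types (string IDs, integer counts) as well as the helper names normalize_tag and tag_counts are assumptions, not part of the dataset.

```python
import json
from collections import Counter

# Hypothetical two-record sample that mirrors the field names listed above.
# The values are invented for illustration and are not taken from the dataset.
SAMPLE = """
[
  {"id": "1", "friendName": "alice", "followerCount": 250, "friendsCount": 180,
   "friends": ["2", "3"], "lang": "en", "tags": ["#python", "data"]},
  {"id": "2", "friendName": "bob", "followerCount": 120, "friendsCount": 300,
   "friends": ["1"], "lang": "en", "tags": ["#Python"]}
]
"""


def normalize_tag(tag):
    """Lower-case a hashtag and drop a leading '#', since tags may come with or without it."""
    return tag.lstrip("#").lower()


def tag_counts(users):
    """Count how many users used each normalized hashtag."""
    counts = Counter()
    for user in users:
        # A set ensures each user contributes at most once per hashtag.
        counts.update({normalize_tag(t) for t in user.get("tags", [])})
    return counts


users = json.loads(SAMPLE)
counts = tag_counts(users)
print(counts.most_common())
```

Normalizing the tags first matters because the description says hashtags may be stored with or without the leading #.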

These users were selected because they tweeted on Twitter trending topics. I selected users that have at least 100 followers and follow at least 100 other accounts (in order to filter out spam and non-informative/empty accounts).

Acknowledgements

This dataset was built by Hubert Wassner (me) using the Twitter public API. More data can be obtained on request (hubert.wassner AT gmail.com); at this time I have collected over 5 million profiles in different languages. Some more information can be found here (in French only): http://wassner.blogspot.fr/2016/06/recuperer-des-profils-twitter-par.html

Past Research

No public research has been done (until now) on this dataset. I made a private application, described here: http://wassner.blogspot.fr/2016/09/twitter-profiling.html (in French), which uses the full dataset (millions of full profiles).

Inspiration

One can analyse a lot with this dataset:

stats about followers & followings

manifold learning or unsupervised learning from friend list

hashtag prediction from friend list

Contact

Feel free to ask any question (or help request) via Twitter : @hwassner

Enjoy! ;)
Twitter Dataset
brightdata.com
.json, .csv, .xlsx
Updated Jan 8, 2023
+ more versions
Cite
Bright Data (2024). Twitter Dataset [Dataset]. https://brightdata.com/products/datasets/twitter
Available download formats: .json, .csv, .xlsx
Dataset updated
Jan 8, 2023
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Utilize our Twitter dataset for diverse applications to enrich business strategies and market insights. Analyzing this dataset provides a comprehensive understanding of social media trends, empowering organizations to refine their communication and marketing strategies. Access the entire dataset or customize a subset to fit your needs. Popular use cases include market research to identify trending topics and hashtags, AI training by reviewing factors such as tweet content, retweets, and user interactions for predictive analytics, and trend forecasting by examining correlations between specific themes and user engagement to uncover emerging social media preferences.
Data from: Exploratory Twitter hashtag analysis of movie premieres in the...
portalcientificovalencia.univeuropea.com
Updated 2024
+ more versions
Cite
Yeste, Víctor; Yeste, Víctor (2024). Exploratory Twitter hashtag analysis of movie premieres in the USA [Dataset]. https://portalcientificovalencia.univeuropea.com/documentos/67321ed1aea56d4af0485dad
Dataset updated
2024
Authors
Yeste, Víctor; Yeste, Víctor
Area covered
United States
Description
This work is an exploratory, quantitative, and non-experimental study with an inductive inference type and a longitudinal follow-up. It analyzes movie data and tweets published by users using the official Twitter hashtags of movie premieres the week before, the same week, and the week after each release date.

The scope of the study is the collection of movies released in February 2022 in the USA, and the object of the study includes them and the tweets that refer to the film in the 3 closest weeks to their premiere dates. The collected tweets were classified by the week they were published, so they are classified by a time dimension called timepoint. The week before the release date has been designated as timepoint 1, the week of the release date is timepoint 2, and the week immediately afterward is timepoint 3. Another dimension that has been considered is whether the movie has domestic production or not, which means that if one of the countries of origin is the United States, the movie is designated as domestic.

The chosen variables are organized in two data tables, one for the movies and one for the collected tweets.

Variables related to the movies:

id: Internal id of the movie
name: Title of the movie
hashtag: Official hashtag of the movie
countries: List of countries of the movie, separated by a semicolon
mpaa: Film ratings system by the Motion Picture Association of America. It is a completely voluntary rating system and ratings have no legal standing. The current ratings include G (general audiences), PG (parental guidance suggested), PG-13 (parents strongly cautioned), R (restricted, under 17 requires accompanying parent or adult guardian) and NC-17 (no one 17 and under admitted) (Film Ratings - Motion Picture Association, n.d.)
genres: List of genres of the movie, e.g., Action or Thriller, separated by a semicolon
release_date: Release date of the movie in the format YYYY-MM-DD
opening_grosses: Amount of US dollars that the movie obtained on the opening date (the first week after the release date)
opening_theaters: Number of US theaters that released the movie on the opening date (the first week after the release date)
rating_avg: Average rating of the movie

Variables related to the tweets:

id: Internal id of the tweet
status_id: Twitter id of the tweet
movie_id: Internal id of the movie
timepoint: Week number related to the movie premiere that the tweet was published on. “1” is the week before the movie release, “2” is the week after the movie release and “3” is the second week after the movie release.
author_id: Twitter id of the author of the tweet
created_at: Date and time of the tweet, with format “YYYY-MM-DD HH:MM:SS”
quote_count: Number of the tweet’s quotes
reply_count: Number of the tweet’s replies
retweet_count: Number of the tweet’s retweets
like_count: Number of the tweet’s likes
sentiment: Sentiment analysis of the tweet’s content with a range from -1 (negative) to 1 (positive)

This dataset has contributed to the elaboration of the book chapters:

Yeste, Víctor; Calduch-Losa, Ángeles (2022). Genre classification of movie releases in the USA: Exploring data with Twitter hashtags. In Narrativas emergentes para la comunicación digital (pp. 1012-1044). Dykinson, S. L.

Yeste, Víctor; Calduch-Losa, Ángeles (2022). Exploratory Twitter hashtag analysis of movie premieres in the USA. In Desafíos audiovisuales de la tecnología y los contenidos en la cultura digital (pp. 169-187). McGraw-Hill Interamericana de España S.L.

Yeste, Víctor; Calduch-Losa, Ángeles (2022). ANOVA to study movie premieres in the USA and online conversation on Twitter. The case of rating average using data from official Twitter hashtags. In El mapa y la brújula. Navegando por las metodologías de investigación en comunicación (pp. 151-168). Editorial Fragua.
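
The timepoint dimension described above can be sketched as a small function. This is only an illustration: it assumes that each week is a 7-day window measured from the release date (the description does not say whether calendar weeks were used instead), and the function name timepoint and the example dates are hypothetical.

```python
from datetime import date


def timepoint(release_date, tweet_date):
    """Return 1, 2 or 3 for the week before, of, or after the release; None otherwise.

    Assumption: each week is a 7-day window counted from the release date.
    """
    offset = (tweet_date - release_date).days
    if -7 <= offset < 0:
        return 1  # week before the release
    if 0 <= offset < 7:
        return 2  # week starting on the release date
    if 7 <= offset < 14:
        return 3  # week immediately afterward
    return None  # outside the three weeks covered by the study


release = date(2022, 2, 11)  # hypothetical release date in February 2022
print(timepoint(release, date(2022, 2, 8)))
print(timepoint(release, date(2022, 2, 11)))
print(timepoint(release, date(2022, 2, 20)))
```

Tweets outside the three windows return None so that they can be filtered out before any per-timepoint aggregation.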
COVID-19 Twitter Dataset
borealisdata.ca
Updated Nov 10, 2020
Cite
Anatoliy Gruzd; Philip Mai (2020). COVID-19 Twitter Dataset [Dataset]. http://doi.org/10.5683/SP2/PXF2CU
Unique identifier
https://doi.org/10.5683/SP2/PXF2CU
Dataset updated
Nov 10, 2020
Dataset provided by
Borealis
Authors
Anatoliy Gruzd; Philip Mai
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The current dataset contains 237M Tweet IDs for Twitter posts that mentioned "COVID" as a keyword or as part of a hashtag (e.g., COVID-19, COVID19) between March and July of 2020.

Sampling Method: hourly requests sent to Twitter Search API using Social Feed Manager, an open source software that harvests social media data and related content from Twitter and other platforms.

NOTE:

1) In accordance with Twitter API Terms, only Tweet IDs are provided as part of this dataset.

2) To recollect tweets based on the list of Tweet IDs contained in these datasets, you will need to use tweet 'rehydration' programs like Hydrator (https://github.com/DocNow/hydrator) or Python library Twarc (https://github.com/DocNow/twarc).

3) This dataset, like most datasets collected via the Twitter Search API, is a sample of the available tweets on this topic and is not meant to be comprehensive. Some COVID-related tweets might not be included in the dataset either because the tweets were collected using a standardized but intermittent (hourly) sampling protocol or because tweets used hashtags/keywords other than COVID (e.g., Coronavirus or #nCoV).

4) To broaden this sample, consider comparing/merging this dataset with other COVID-19 related public datasets such as:
https://github.com/thepanacealab/covid19_twitter
https://ieee-dataport.org/open-access/corona-virus-covid-19-tweets-dataset
https://github.com/echen102/COVID-19-TweetIDs
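
Point 4 suggests comparing or merging several Tweet ID collections. A minimal sketch of that step follows. It assumes each collection is available as plain text with one Tweet ID per line, which is not stated in the description, and the IDs and the helper name load_ids are invented for illustration.

```python
def load_ids(lines):
    """Parse one Tweet ID per line, skipping blank lines and non-numeric tokens such as headers."""
    ids = set()
    for line in lines:
        token = line.strip()
        if token.isdigit():
            ids.add(int(token))
    return ids


# Hypothetical ID lists standing in for two separately downloaded collections.
ours = load_ids(["1240000000000000001", "1240000000000000002", ""])
other = load_ids(["1240000000000000002", "1240000000000000003", "tweet_id"])

merged = ours | other   # union: every ID seen in either collection
overlap = ours & other  # intersection: IDs present in both collections
print(len(merged), len(overlap))
```

Using sets removes duplicates automatically, so the merged collection can be passed to a rehydration tool without requesting the same tweet twice.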
Homophobia Detection Dataset (Twitter/X) Dataset
paperswithcode.com
Updated May 14, 2024
+ more versions
Cite
Homophobia Detection Dataset (Twitter/X) Dataset [Dataset]. https://paperswithcode.com/dataset/homophobia-detection-dataset-twitter-x
Dataset updated
May 14, 2024
Description
Dataset Description

Paper: TBC

Point of Contact: Josh McGiff (Josh.McGiff@ul.ie)

Dataset Summary

This dataset was developed to address the significant gap in online hate speech detection, particularly focusing on homophobia, which is often neglected in sentiment analysis research. It comprises tweets scraped from X (formerly Twitter), which have been labeled for the presence of homophobic content by volunteers from diverse backgrounds. This dataset is the largest open-source labelled English dataset for homophobia detection known to the authors and aims to enhance online safety and inclusivity.

Supported Tasks

Task: Homophobic hate speech detection.

Languages: English.

Dataset Structure

Data Fields:

tweet_text: The text content of the tweet.

label: Binary label indicating the presence of homophobic content (0 = no homophobic content, 1 = homophobic content).

language: The language of the tweet, as tagged by X/Twitter.

Dataset Creation

Curation Rationale: The dataset was curated to enhance the detection and classification of homophobic content on social media platforms, particularly focusing on the gap where homophobia is underrepresented in current research.

Source Data: Data was scraped from X (formerly Twitter) focusing on terms and accounts associated with the LGBTQIA+ community.

Annotation Process: Annotations were made by three volunteers from different sexualities and gender identities using a majority vote for label assignment. Annotations were conducted in Microsoft Excel over several days.

Personal and Sensitive Information: Usernames and other personal identifiers have been anonymized or removed. URLs have also been removed. The dataset contains sensitive content related to homophobia.
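
The majority-vote label assignment mentioned in the annotation process can be sketched as follows. This is an illustrative reconstruction, not the authors' procedure (the description says the annotation was done in Microsoft Excel), and the function name majority_label is hypothetical.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by a strict majority of annotators.

    With three annotators and binary labels a strict majority always exists;
    the check guards against other inputs, such as an even number of votes.
    """
    label, count = Counter(votes).most_common(1)[0]
    if count * 2 <= len(votes):
        raise ValueError("no strict majority among votes")
    return label


print(majority_label([1, 0, 1]))
print(majority_label([0, 0, 1]))
```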

Considerations for Using the Data

Social Impact: The dataset is intended for research purposes to combat online hate speech and improve inclusivity and safety on digital platforms.

Ethical Considerations: Given the sensitive nature of hate speech, researchers should consider the impact of their work on marginalised communities and ensure that their use of the dataset aims to reduce harm and promote inclusivity.

Legal and Privacy Concerns: Researchers should comply with legal standards and ethical guidelines regarding hate speech and data privacy.

Additional Information

License: CC-BY-4.0

Citation: TBC

Acknowledgements

This work was conducted with the financial support of the Science Foundation Ireland Centre for Research Training in Artificial Intelligence under Grant No. 18/CRT/6223.
Top 5 sources of place-tagged tweets in our data set.
plos.figshare.com
xls
Updated Jun 2, 2023
Cite
Rudy Arthur; Hywel T. P. Williams (2023). Top 5 sources of place-tagged tweets in our data set. [Dataset]. http://doi.org/10.1371/journal.pone.0218454.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0218454.t002
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Rudy Arthur; Hywel T. P. Williams
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
There are 18,421,520 tweets in total.
Geotagged Twitter posts from the United States: A tweet collection to...
search.gesis.org
datacatalogue.cessda.eu
+1 more
Updated Mar 4, 2021
Cite
Pfeffer, Jürgen; Morstatter, Fred (2021). Geotagged Twitter posts from the United States: A tweet collection to investigate representativeness [Dataset]. http://doi.org/10.7802/1166
Unique identifier
https://doi.org/10.7802/1166
Dataset updated
Mar 4, 2021
Dataset provided by
GESIS, Köln
GESIS search
Authors
Pfeffer, Jürgen; Morstatter, Fred
License
https://www.gesis.org/en/institute/data-usage-terms
Area covered
United States
Description
This dataset consists of IDs of geotagged Twitter posts from within the United States. They are provided as files per day and state as well as per day and county. In addition, files containing the aggregated number of hashtags from these tweets are provided per day and state and per day and county. This data is organized as a ZIP-file per month containing several zip-files per day which hold the txt-files with the ID/hash information.

Also part of the dataset are two shapefiles for the US counties and states and Python scripts for the data collection and sorting geotags into counties.
Replication Data for Hashtag Co-occurrence Community Detection
dataone.org
dataverse.harvard.edu
Updated Nov 22, 2023
Cite
Zhu, Xi; Alan M. MacEachren (2023). Replication Data for Hashtag Co-occurrence Community Detection [Dataset]. https://dataone.org/datasets/sha256%3Ae05ed893fdd0f93eb0847738fc1d2d4f8e95fb6ed5d293a95bd5f3fe04cfe1ea
Dataset updated
Nov 22, 2023
Dataset provided by
Harvard Dataverse
Authors
Zhu, Xi; Alan M. MacEachren
Time period covered
Jan 1, 2016 - Dec 31, 2017
Description
Geotagged public tweets from Twitter streaming API. Date range: January 1, 2016 to December 31, 2017. Data size: 4 GB; about 170 million tweets with hashtags. Attributes: Each tweet is associated with a tweet id, timestamp, anonymized user ID, and a list of hashtags.
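
Since each tweet carries a list of hashtags, a co-occurrence count over those lists is a natural first step before any community detection. The sketch below only builds the weighted hashtag pairs and does not reproduce the authors' replication code. The sample hashtags, the lower-casing step, and the function name cooccurrence are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(tweets):
    """Count how often each unordered pair of hashtags appears in the same tweet."""
    counts = Counter()
    for hashtags in tweets:
        # Lower-case and de-duplicate so "#Flood" and "#flood" count as one tag (an assumption).
        unique = sorted({h.lower() for h in hashtags})
        counts.update(combinations(unique, 2))
    return counts


# Hypothetical hashtag lists, one per tweet.
tweets = [["Flood", "Houston"], ["flood", "houston", "harvey"], ["harvey"]]
edges = cooccurrence(tweets)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

The resulting pair counts can serve as edge weights of a hashtag graph, which is the usual input to a community detection algorithm.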
Squid Game Netflix Twitter Data
kaggle.com
zip
Updated Oct 16, 2021
Cite
Deep Contractor (2021). Squid Game Netflix Twitter Data [Dataset]. https://www.kaggle.com/datasets/deepcontractor/squid-game-netflix-twitter-data/versions/6
Available download formats: zip (6803403 bytes)
Dataset updated
Oct 16, 2021
Authors
Deep Contractor
License
https://creativecommons.org/publicdomain/zero/1.0/
Description

The dataset contains the recent tweets about the record-breaking Netflix show "Squid Game"

The data was collected using the tweepy Python package to access the Twitter API.
Selected tweets of a detected migrant who moved from Virginia to New York on...
plos.figshare.com
figshare.com
xls
Updated Jun 14, 2023
Cite
Guanghua Chi; Fengyang Lin; Guangqing Chi; Joshua Blumenstock (2023). Selected tweets of a detected migrant who moved from Virginia to New York on 2014-09-04 based on our approach. [Dataset]. http://doi.org/10.1371/journal.pone.0239408.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0239408.t003
Dataset updated
Jun 14, 2023
Dataset provided by
PLOS ONE
Authors
Guanghua Chi; Fengyang Lin; Guangqing Chi; Joshua Blumenstock
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
New York
Description
Selected tweets of a detected migrant who moved from Virginia to New York on 2014-09-04 based on our approach.
2011 UK Riots Tweets
researchdata.edu.au
Updated May 30, 2017
+ more versions
Cite
RMIT University, Australia (2017). 2011 UK Riots Tweets [Dataset]. http://doi.org/10.4225/61/593f17d319bc1
Unique identifier
https://doi.org/10.4225/61/593f17d319bc1
Dataset updated
May 30, 2017
Dataset provided by
RMIT University, Australia
License
Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0): https://creativecommons.org/licenses/by-nc-nd/3.0/
License information was derived automatically
Time period covered
Aug 1, 2011 - Aug 31, 2011
Area covered
United Kingdom
Description
Collection of tweets captured at the time of the 2011 UK Riots. This collection is only partial, retrieved via the streaming API.

The data provides a historical record of public discussion on Twitter during a significant social happening. It also represents a useful resource for experimentation and methodological development. The data is both a social and an informational resource, enabling the analysis of a significant social event and the development/application of computational tools for, among other aims, natural language processing, information retrieval, and metadata analysis. In addition to the principal collection of tweets (UK Riots Database), a sub-collection has been extracted that includes only the geo-tagged tweets. Finally, these databases are stored on MongoDB and are made queryable using a special interface (see the UK Riots Database for access and instructions) that allows queries to be stored in another dataset, shared, and re-executed.
Tweets Tagged with #NoJusticeNoLeBron
data.4tu.nl
4tu.edu.hpc.n-helix.com
zip
Updated Oct 29, 2023
Cite
Timothy J Piper (2023). Tweets Tagged with #NoJusticeNoLeBron [Dataset]. http://doi.org/10.4121/uuid:57b4f590-8e3e-475c-a7c7-56052303d5cb
Available download formats: zip
Unique identifier
https://doi.org/10.4121/uuid:57b4f590-8e3e-475c-a7c7-56052303d5cb
Dataset updated
Oct 29, 2023
Dataset provided by
4TU (https://www.4tu.nl/)
Authors
Timothy J Piper
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This is a table describing Tweets posted from 12/28/2016 through 12/29/2016 that were tagged with #NoJusticeNoLeBron
Performance of the six frequency-based algorithms and our proposed...
figshare.com
xls
Updated Jun 1, 2023
Cite
Guanghua Chi; Fengyang Lin; Guangqing Chi; Joshua Blumenstock (2023). Performance of the six frequency-based algorithms and our proposed segment-based algorithm. [Dataset]. http://doi.org/10.1371/journal.pone.0239408.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0239408.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Guanghua Chi; Fengyang Lin; Guangqing Chi; Joshua Blumenstock
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
See section “Traditional frequency-based methods” for the details of the six frequency-based methods.
Data from: Detecting East Asian Prejudice on Social Media
data.niaid.nih.gov
zenodo.org
Updated Jul 22, 2024
Cite
Bertie Vidgen (2024). Detecting East Asian Prejudice on Social Media [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3816666
Dataset updated
Jul 22, 2024
Dataset provided by
David Broniatowski
Scott Hale
Rebekah Tromble
Matthew Hall
Austin Botelho
Ella Guest
Bertie Vidgen
Helen Margetts
Zeerak Waseem
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
East Asia
Description
This repository contains:

A deep learning model which distinguishes between Hostility against East Asia, Criticism of East Asia, Discussion of East Asian prejudice and Neutral content. The F1 score is 0.83.

A detailed annotation codebook used for marking up the tweets.

A labelled dataset with 20,000 entries.

A dataset with all 40,000 annotations, which can be used to investigate annotation processes for abusive content moderation.

A list of thematic hashtag replacements.

Three sets of annotations for the 1,000 most used hashtags in the original database of COVID-19 related tweets. Hashtags were annotated for COVID-19 relevance, East Asian relevance and stance.

The outbreak of COVID-19 has transformed societies across the world as governments tackle the health, economic and social costs of the pandemic. It has also raised concerns about the spread of hateful language and prejudice online, especially hostility directed against East Asia. This data repository is for a classifier that detects and categorizes social media posts from Twitter into four classes: Hostility against East Asia, Criticism of East Asia, Meta-discussions of East Asian prejudice and a neutral class. The classifier achieves an F1 score of 0.83 across all four classes. We provide our final model (coded in Python), as well as a new 20,000 tweet training dataset used to make the classifier, two analyses of hashtags associated with East Asian prejudice and the annotation codebook. The classifier can be implemented by other researchers, assisting with both online content moderation processes and further research into the dynamics, prevalence and impact of East Asian prejudice online during this global pandemic.

This work is a collaboration between The Alan Turing Institute and the Oxford Internet Institute. It was funded by the Criminal Justice Theme of the Alan Turing Institute under Wave 1 of The UKRI Strategic Priorities Fund, EPSRC Grant EP/T001569/1.
Data from: Twitter corpus Janes-Tweet 1.0
clarin.si
live.european-language-grid.eu
Updated Sep 5, 2017
+ more versions
Cite
Nikola Ljubešić; Tomaž Erjavec; Darja Fišer (2017). Twitter corpus Janes-Tweet 1.0 [Dataset]. https://www.clarin.si/repository/xmlui/handle/11356/1142
Dataset updated
Sep 5, 2017
Authors
Nikola Ljubešić; Tomaž Erjavec; Darja Fišer
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Janes-Tweet is an annotated corpus of almost 10 million tweets posted from 2013-06 to 2017-06 by approx. 9,000 users that tweet mostly in Slovene. The corpus is structured into individual tweets, together with their metadata. The tweets in the corpus are tokenised, sentence segmented, word normalised, morphosyntactically tagged, lemmatised and annotated with named entities. Due to Twitter terms-of-service, the corpus is distributed in an encoded version. The included tweetpub program (also available and documented on https://github.com/clarinsi/tweetpub) should be used to decode it, which it does by fetching the original tweets and applying a diff operation on the distributed corpus. Note that the retrieved corpus can have fewer tweets than the distributed version if some have been removed from Twitter by their authors in the meantime.
Hashtags used by museums Twitter accounts from REMED
portalcientificovalencia.univeuropea.com
figshare.com
Updated 2024
Cite
Yeste, Víctor; Yeste, Víctor (2024). Hashtags used by museums Twitter accounts from REMED [Dataset]. https://portalcientificovalencia.univeuropea.com/documentos/67321ed1aea56d4af0485daa
Dataset updated
2024
Authors
Yeste, Víctor; Yeste, Víctor
Description
This study consists of quantitative, explanatory, and non-experimental research using inductive inference longitudinally. Thus, the use of hashtags by the Twitter accounts of the set of museums that are part of REMED is studied, and the analysis of hashtag trends by Twitter users in Spanish is performed.

The primary variable is the favorite count, and it is hypothesized from this study that it is possible to predict the primary variable five weeks later. The field of study is formed by the 104 Twitter accounts of the museums that are part of REMED (Red de Museos y Estrategias Digitales).

Seven analysis variables explain the information related to the use of hashtags, both in the size of the Twitter accounts of museums of the sample chosen (prefix "m_" in the variables) and Twitter users in Spanish in general (prefix "tw_" in variables). All variables represent the data in count mode, which means that they sum up the total of the data collected for each tweet of each hashtag processed:

Number of tweets (variable name "num_tweets")
Number of retweets (variable name "retweet_count")
Number of favorites (variable name "favorite_count")
Number of followers of tweeters (variable name "user_num_followers")
Number of tweets published by tweeters (variable name "user_num_tweets")
Age in days of tweeters' Twitter accounts (variable name "user_age")
Number of tweets including a URL (variable name "url_inclusion")

With the variables above, an investigation has been carried out by checking the correlations between the variables and performing a regression analysis. Thus, the relationships between the variables are ascertained and analyzed to determine if it is possible to predict the number of favorites of the hashtags used by museums.

The initial intake is presented in the file cimed-2021-ini.csv, and the intake made 5 weeks later is presented in the file cimed-2021-end.csv.

This dataset has contributed to the elaboration of the book chapter:

Yeste Moreno, V.; Calduch-Losa, Á.; Serrano-Cobos, J. (2022). Estudio predictivo del uso colectivo de hashtags en museos de la red REMED. En CIMED21 - I Congreso internacional de museos y estrategias digitales. Editorial Universitat Politècnica de València. 251-265. https://doi.org/10.4995/CIMED21.2021.12281
An Archive of #DH2016 Tweets Published on Thursday 14 July 2016 GMT
city.figshare.com
html
Updated May 31, 2023
+ more versions
Cite
Ernesto Priego (2023). An Archive of #DH2016 Tweets Published on Thursday 14 July 2016 GMT [Dataset]. http://doi.org/10.6084/m9.figshare.3487103.v1
Available download formats: html
Unique identifier
https://doi.org/10.6084/m9.figshare.3487103.v1
Dataset updated
May 31, 2023
Dataset provided by
City, University of London
Authors
Ernesto Priego
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background

The Digital Humanities 2016 conference is taking/took place in Kraków, Poland, between Sunday 11 July and Saturday 16 July 2016. #DH2016 is/was the conference official hashtag.

What This Output Is

This is a CSV file containing a total of 3717 Tweets publicly published with the hashtag #DH2016 on Thursday 14 July 2016 GMT.

The archive starts with a Tweet published on Thursday July 14 2016 at 00:01:04 +0000 and ends with a Tweet published on Thursday July 14 2016 at 23:49:14 +0000 (GMT). Previous days have been shared on a different output. A breakdown of Tweets per day so far:

Sunday 10 July 2016: 179 Tweets
Monday 11 July 2016: 981 Tweets
Tuesday 12 July 2016: 2318 Tweets
Wednesday 13 July 2016: 4175 Tweets
Thursday 14 July 2016: 3717 Tweets

Methodology and Limitations

The Tweets contained in this file were collected by Ernesto Priego using Martin Hawksey's TAGS 6.0. Only users with at least 1 follower were included in the archive. Retweets have been included (Retweets count as Tweets). The collection spreadsheet was customised to reflect the time zone and geographical location of the conference.

The profile_image_url and entities_str metadata were removed before public sharing in this archive. Please bear in mind that the conference hashtag has been spammed so some Tweets collected may be from spam accounts. Some automated refining has been performed to remove Tweets not related to the conference but the data is likely to require further refining and deduplication. Both research and experience show that the Twitter search API is not 100% reliable. Large Tweet volumes affect the search collection process. The API might "over-represent the more central users", not offering "an accurate picture of peripheral activity" (Gonzalez-Bailon, Sandra, et al. 2012).

Apart from the filters and limitations already declared, it cannot be guaranteed that this file contains each and every Tweet tagged with #dh2016 during the indicated period, and the dataset is shared for archival, comparative and indicative educational research purposes only.

Only content from public accounts is included and was obtained from the Twitter Search API. The shared data is also publicly available to all Twitter users via the Twitter Search API and available to anyone with an Internet connection via the Twitter and Twitter Search web client and mobile apps without the need of a Twitter account.

Each Tweet and its contents were published openly on the Web with the queried hashtag and are the responsibility of the original authors. Original Tweets are likely to be copyright their individual authors but please check individually. No private personal information is shared in this dataset. The collection and sharing of this dataset is enabled and allowed by Twitter's Privacy Policy. The sharing of this dataset complies with Twitter's Developer Rules of the Road. This dataset is shared to archive, document and encourage open educational research into scholarly activity on Twitter.

Other Considerations

Tweets published publicly by scholars during academic conferences are often tagged (labeled) with a hashtag dedicated to the conference in question.

The purpose and function of hashtags is to organise and describe information/outputs under the relevant label in order to enhance the discoverability of the labeled information/outputs (Tweets in this case). A hashtag is metadata users choose freely to use so their content is associated, directly linked to and categorised with the chosen hashtag. Though every reason for Tweeters' use of hashtags cannot be generalised nor predicted, it can be argued that scholarly Twitter users form specialised, self-selecting public professional networks that tend to observe scholarly practices and accepted modes of social and professional behaviour. In general terms it can be argued that scholarly Twitter users willingly and consciously tag their public Tweets with a conference hashtag as a means to network and to promote, report from, reflect on, comment on and generally contribute publicly to the scholarly conversation around conferences. As Twitter users, conference Twitter hashtag contributors have agreed to Twitter's Privacy and data sharing policies. Professional associations like the Modern Language Association recognise Tweets as citeable scholarly outputs. Archiving scholarly Tweets is a means to preserve this form of rapid online scholarship that otherwise can very likely become unretrievable as time passes; Twitter's search API has well-known temporal limitations for retrospective historical search and collection.

Beyond individual tweets as scholarly outputs, the collective scholarly activity on Twitter around a conference or academic project or event can provide interesting insights for the contemporary history of scholarly communications. To date, collecting in real time is the only relatively accurate method to archive tweets at a small scale. Though these datasets have limitations and are not thoroughly systematic, it is hoped they can contribute to developing new insights into the discipline's presence on Twitter over time.

The CC-BY license has been applied to the output in the repository as a curated dataset. Authorial/curatorial/collection work has been performed on the file in order to make it available as part of the scholarly record. The data contained in the deposited file is otherwise freely available elsewhere through different methods and anyone not wishing to attribute the data to the creator of this output is needless to say free to do their own collection and clean their own data.
Tweets Targeting Isis
kaggle.com
zip
Updated Nov 17, 2019
ActiveGalaXy (2019). Tweets Targeting Isis [Dataset]. https://www.kaggle.com/activegalaxy/isis-related-tweets
Available download formats: zip (10419329 bytes)
Dataset updated
Nov 17, 2019
Authors
ActiveGalaXy
License
CC0 1.0 Universal (Public Domain Dedication) https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The image at the top of the page is a frame from today's (7/26/2016) Isis #TweetMovie from twitter, a "normal" day when two Isis operatives murdered a priest saying mass in a French church. (You can see this in the center left). A selection of data from this site is being made available here to Kaggle users.

UPDATE: An excellent study by Audrey Alexander titled Digital Decay? is now available, which traces the "change over time among English-language Islamic State sympathizers on Twitter".

Intent

This data set is intended to be a counterpoise to the How Isis Uses Twitter data set. That data set contains 17k tweets alleged to originate with "100+ pro-ISIS fanboys". This new set contains 122k tweets collected on two separate days, 7/4/2016 and 7/11/2016, which contained any of the following terms, with no further editing or selection:

isis

isil

daesh

islamicstate

raqqa

Mosul

"islamic state"

This is not a perfect counterpoise as it almost surely contains a small number of pro-Isis fanboy tweets. However, unless some entity, such as Kaggle, is willing to expend significant resources on a service something like an expert level Mechanical Turk or Zooniverse, a high quality counterpoise is out of reach.
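
The term matching described above can be sketched roughly as follows. This is only an illustration: the description does not say how the collector matched terms, so case-insensitive substring matching is an assumption, and the function name `matches_any_term` is hypothetical.

```python
import re

# Search terms as listed in the dataset description.
TERMS = ["isis", "isil", "daesh", "islamicstate", "raqqa", "Mosul", "islamic state"]

# Assumption: case-insensitive substring matching. The real collection
# rule is not documented, so this is a sketch rather than a reproduction.
PATTERN = re.compile("|".join(re.escape(term) for term in TERMS), re.IGNORECASE)


def matches_any_term(text: str) -> bool:
    """Return True if the tweet text contains any of the listed terms."""
    return PATTERN.search(text) is not None
```

Note that plain substring matching also fires on unrelated words that happen to contain a term (for example "crisis" contains "isis"), which is one way noise can enter a collection made "with no further editing or selection".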

A counterpoise provides a balance or backdrop against which to measure a primary object, in this case the original pro-Isis data. So if anyone wants to discriminate between pro-Isis tweets and other tweets concerning Isis you will need to model the original pro-Isis data or signal against the counterpoise which is signal + noise. Further background and some analysis can be found in this forum thread.

This data comes from postmodernnews.com/token-tv.aspx which daily collects about 25MB of Isis tweets for the purposes of graphical display. PLEASE NOTE: This server is not currently active.

Data Details

There are several differences between the format of this data set and the pro-ISIS fanboy dataset:

1. All the Twitter t.co links have been expanded where possible.
2. There are no "description, location, followers, numberstatuses" data columns.

I have also included my version of the original pro-ISIS fanboy set. This version has all the t.co links expanded where possible.
Data from: Dataset: tweets and analysis related to the paper 'Signaling...
ssh.datastations.nl
datacatalogue.cessda.eu
bin, csv, pdf, txt +2
Updated Jun 8, 2017
DANS Data Station Social Sciences and Humanities (2017). Dataset: tweets and analysis related to the paper 'Signaling sarcasm: From hyperbole to hashtag' [Dataset]. http://doi.org/10.17026/dans-2ce-mcr3
Available download formats: zip (21511), xlsx (43028), pdf (56107), bin (144), txt (7549586), csv (1969)
Unique identifier
https://doi.org/10.17026/dans-2ce-mcr3
Dataset updated
Jun 8, 2017
Dataset provided by
Data Archiving and Networked Services
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Date: Collection period: start=2013-06-01; end=2013-06-30;
Raw data on use on #neoEBM on Twitter: 2018-2021
researchdata.edu.au
adelaide.figshare.com
Updated Apr 1, 2021
Amy Keir (2021). Raw data on use on #neoEBM on Twitter: 2018-2021 [Dataset]. http://doi.org/10.25909/14329754.V1
Unique identifier
https://doi.org/10.25909/14329754.V1
Dataset updated
Apr 1, 2021
Dataset provided by
The University of Adelaide
Authors
Amy Keir
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Raw data describing the use of the hashtag #neoEBM on Twitter (social media platform) by numbers of Twitter users and use of the hashtag (monthly).

The datasheet includes the top 20 users of the hashtag and details about each one (publicly available information about each of these users available on the social media platform).

Hubert Wassner (2016). Twitter Friends [Dataset]. https://www.kaggle.com/hwassner/TwitterFriends/discussion

Twitter Friends

40k full Twitter user profile data (including who they follow!)

Dataset updated

Sep 2, 2016

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Hubert Wassner

License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

Twitter Friends and hashtags

Context

This dataset is an extract of a wider database aimed at collecting Twitter users' friends (the other accounts one follows). The overall goal is to study users' interests through who they follow and the connection to the hashtags they have used.

Content

It is a list of Twitter users' information. In the JSON format, each Twitter user is stored as one object in a list of more than 40,000 objects. Each object holds:

avatar : URL of the profile picture
followerCount : the number of followers of this user
friendsCount : the number of accounts this user follows
friendName : the @name (without the '@') of the user (beware: this name can be changed by the user)
id : user ID; this number cannot change (you can retrieve the screen name with this service: https://tweeterid.com/)
friends : the list of IDs of the users followed by this user
lang : the language declared by the user (in this dataset there is only "en" (English))
lastSeen : the timestamp of the user's last tweet
tags : the hashtags (with or without #) used by the user. These are the "trending topics" the user tweeted about.
tweetID : ID of the last tweet posted by this user

You also have the CSV format which uses the same naming convention.

These users were selected because they tweeted on Twitter trending topics. I selected users that have at least 100 followers and follow at least 100 other accounts (in order to filter out spam and non-informative/empty accounts).
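
As a rough illustration of the selection rule above, the following sketch loads the JSON list and re-applies the two thresholds. It assumes the file is a single JSON array of objects with integer `followerCount` and `friendsCount` fields; the function names and the file path passed to `load_profiles` are hypothetical.

```python
import json


def load_profiles(path):
    """Load the profile list. Assumes the file holds one JSON array of objects."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def passes_filter(profile, min_followers=100, min_friends=100):
    """Apply the thresholds stated in the description.

    followerCount: accounts following this user.
    friendsCount: accounts this user follows (assumed, per usual Twitter usage).
    """
    return (
        profile["followerCount"] >= min_followers
        and profile["friendsCount"] >= min_friends
    )


def select_profiles(profiles):
    """Keep only the profiles that satisfy both thresholds."""
    return [profile for profile in profiles if passes_filter(profile)]
```

Since the published extract is described as already filtered, running `select_profiles` on it should mostly serve as a sanity check on the field semantics.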

Acknowledgements

This data set was built by Hubert Wassner (me) using the Twitter public API. More data can be obtained on request (hubert.wassner AT gmail.com); at this time I have collected over 5 million profiles in different languages. Some more information can be found here (in French only): http://wassner.blogspot.fr/2016/06/recuperer-des-profils-twitter-par.html

Past Research

No public research has been done (until now) on this dataset. I made a private application, described here (in French): http://wassner.blogspot.fr/2016/09/twitter-profiling.html, which uses the full dataset (millions of full profiles).

Inspiration

One can analyse many things with this dataset:

stats about followers & followings
manifold learning or unsupervised learning from friend list
hashtag prediction from friend list
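
One possible starting point for the last idea is a nearest-neighbour baseline: compare friend lists with Jaccard similarity and let the most similar profiles vote on hashtags. This is a minimal sketch, not the author's method; it assumes `friends` is a list of IDs and `tags` is a list of strings, and the function names are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two collections of user IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_tags(target_friends, profiles, k=3):
    """Suggest hashtags by majority vote among the k most similar profiles.

    Tags are normalised (leading '#' dropped, lower-cased) because the
    description says they may appear with or without '#'.
    """
    ranked = sorted(
        profiles,
        key=lambda profile: jaccard(target_friends, profile["friends"]),
        reverse=True,
    )
    votes = Counter(
        tag.lstrip("#").lower() for profile in ranked[:k] for tag in profile["tags"]
    )
    return [tag for tag, _ in votes.most_common()]
```

With around 40,000 profiles this brute-force ranking is workable for experiments; a larger extract would likely need an approximate method such as MinHash.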

Contact

Feel free to ask any question (or help request) via Twitter : @hwassner

Enjoy! ;)
