Author: Víctor Yeste. Universitat Politècnica de València.

The object of this study is the design of a cybermetric methodology whose objectives are to measure the success of the content published in online media and the possible prediction of the selected success variables.

In this case, due to the need to integrate data from two separate areas (web publishing and the analysis of shares and related topics on Twitter), a programmatic approach was chosen, accessing both the Google Analytics v4 Reporting API and the Twitter Standard API, always respecting their rate limits.

The website analyzed is hellofriki.com. It is an online media outlet whose primary intention is to meet the demand for information on its topics, publishing a large volume of daily news as well as analyses, reports, interviews, and many other information formats. All this content falls under the sections of cinema, series, video games, literature, and comics.

This dataset has contributed to the elaboration of the PhD thesis:

Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009

Data have been obtained from each breaking news article published online, according to the indicators described in the doctoral thesis.
All related data are stored in a database, divided into the following tables:

tesis_followers: user ID list of media account followers.

tesis_hometimeline: data from tweets posted by the media account sharing breaking news from the web.
- status_id: tweet ID
- created_at: date of publication
- text: content of the tweet
- path: URL extracted after processing the shortened URL in text
- post_shared: article ID in WordPress that is being shared
- retweet_count: number of retweets
- favorite_count: number of favorites

tesis_hometimeline_other: data from tweets posted by the media account that do not share breaking news from the web (other typologies: automatic Facebook shares, custom tweets without a link to an article, etc.), with the same fields as tesis_hometimeline.

tesis_posts: data of articles published by the web and processed for some analysis.
- stats_id: analysis ID
- post_id: article ID in WordPress
- post_date: article publication date in WordPress
- post_title: title of the article
- path: URL of the article on the media website
- tags: IDs of the WordPress tags related to the article
- uniquepageviews: unique page views
- entrancerate: entrance rate
- avgtimeonpage: average time on page
- exitrate: exit rate
- pageviewspersession: page views per session
- adsense_adunitsviewed: number of ads viewed by users
- adsense_viewableimpressionpercent: viewable ad impression rate
- adsense_ctr: ad click-through rate
- adsense_ecpm: estimated ad revenue per 1,000 page views

tesis_stats: data from a particular analysis, performed at each published breaking news item.
Fields with statistical values can be computed from the data in the other tables, but total and average calculations are saved for faster and easier further processing.
- id: ID of the analysis
- phase: phase of the thesis in which the analysis was carried out (currently all are 1)
- time: "0" if at the time of publication, "1" if 14 days later
- start_date: date and time of the measurement on the day of publication
- end_date: date and time of the measurement made 14 days later
- main_post_id: ID of the published article to be analysed
- main_post_theme: main section of the published article to be analysed
- superheroes_theme: "1" if about superheroes, "0" if not
- trailer_theme: "1" if about a trailer, "0" if not
- name: empty field, allowing a custom name to be added manually
- notes: empty field, allowing personalized notes to be added manually (e.g., that a tag was removed manually for being considered too generic, despite the fact that the editor added it)
- num_articles: number of articles analysed
- num_articles_with_traffic: number of articles analysed with traffic (which will be taken into account for traffic analysis)
- num_articles_with_tw_data: number of articles with data from when they were shared on the media's Twitter account
- num_terms: number of terms analysed
- uniquepageviews_total: total unique page views
- uniquepageviews_mean: average unique page views
- entrancerate_mean: average entrance rate
- avgtimeonpage_mean: average time on page
- exitrate_mean: average exit rate
- pageviewspersession_mean: average page views per session
- adsense_adunitsviewed_total: total ads viewed
- adsense_adunitsviewed_mean: average ads viewed
- adsense_viewableimpressionpercent_mean: average viewable ad impression rate
- adsense_ctr_mean: average ad click-through rate
- adsense_ecpm_mean: average estimated ad revenue per 1,000 page views
- retweet_count_total: total retweets
- retweet_count_mean: average retweets
- favorite_count_total: total favorites
- favorite_count_mean: average favorites
- terms_ini_num_tweets: total tweets on the terms on the day of publication
- terms_ini_retweet_count_total: total retweets on the terms on the day of publication
- terms_ini_retweet_count_mean: average retweets on the terms on the day of publication
- terms_ini_favorite_count_total: total favorites on the terms on the day of publication
- terms_ini_favorite_count_mean: average favorites on the terms on the day of publication
- terms_ini_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the terms on the day of publication
- terms_ini_user_num_followers_mean: average followers of users who have talked about the terms on the day of publication
- terms_ini_user_num_tweets_mean: average number of tweets published by users who have talked about the terms on the day of publication
- terms_ini_user_age_mean: average age in days of users who have talked about the terms on the day of publication
- terms_ini_url_inclusion_rate: URL inclusion rate of tweets talking about the terms on the day of publication
- terms_end_num_tweets: total tweets on the terms 14 days after publication
- terms_end_retweet_count_total: total retweets on the terms 14 days after publication
- terms_end_retweet_count_mean: average retweets on the terms 14 days after publication
- terms_end_favorite_count_total: total favorites on the terms 14 days after publication
- terms_end_favorite_count_mean: average favorites on the terms 14 days after publication
- terms_end_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the terms 14 days after publication
- terms_end_user_num_followers_mean: average followers of users who have talked about the terms 14 days after publication
- terms_end_user_num_tweets_mean: average number of tweets published by users who have talked about the terms 14 days after publication
- terms_end_user_age_mean: average age in days of users who have talked about the terms 14 days after publication
- terms_end_url_inclusion_rate: URL inclusion rate of tweets talking about the terms 14 days after publication

tesis_terms: data of the terms (tags) related to the processed articles.
- stats_id: analysis ID
- time: "0" if at the time of publication, "1" if 14 days later
- term_id: term (tag) ID in WordPress
- name: name of the term
- slug: URL slug of the term
- num_tweets: number of tweets
- retweet_count_total: total retweets
- retweet_count_mean: average retweets
- favorite_count_total: total favorites
- favorite_count_mean: average favorites
- followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the term
- user_num_followers_mean: average followers of users who were talking about the term
- user_num_tweets_mean: average number of tweets published by users who were talking about the term
- user_age_mean: average age in days of users who were talking about the term
- url_inclusion_rate: URL inclusion rate
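The relational layout described above can be sketched as a small SQLite schema. This is a hedged illustration: the column types are assumptions (the description only names the fields), and only two of the five tables are shown.

```python
import sqlite3

# Sketch of two of the tables described above; types are assumed.
schema = """
CREATE TABLE tesis_posts (
    stats_id            INTEGER,  -- analysis ID
    post_id             INTEGER,  -- article ID in WordPress
    post_date           TEXT,     -- publication date
    post_title          TEXT,
    path                TEXT,     -- URL of the article
    tags                TEXT,     -- related WordPress tag IDs
    uniquepageviews     INTEGER,
    entrancerate        REAL,
    avgtimeonpage       REAL,
    exitrate            REAL,
    pageviewspersession REAL
);
CREATE TABLE tesis_terms (
    stats_id               INTEGER,  -- analysis ID
    time                   INTEGER,  -- 0 = at publication, 1 = 14 days later
    term_id                INTEGER,  -- tag ID in WordPress
    name                   TEXT,
    slug                   TEXT,
    num_tweets             INTEGER,
    retweet_count_total    INTEGER,
    retweet_count_mean     REAL,
    favorite_count_total   INTEGER,
    favorite_count_mean    REAL,
    followers_talking_rate REAL,
    url_inclusion_rate     REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(schema)

# As the description notes, mean fields are derivable from the totals:
conn.execute("INSERT INTO tesis_terms VALUES "
             "(1, 0, 42, 'cine', 'cine', 10, 50, 5.0, 20, 2.0, 0.1, 0.3)")
row = conn.execute(
    "SELECT retweet_count_total * 1.0 / num_tweets FROM tesis_terms"
).fetchone()
print(row[0])  # 5.0
```

The query at the end illustrates why the precomputed means are a convenience rather than new information.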
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was used in the manuscript "Scaling laws and dynamics of hashtags on Twitter".
The Twitter data was obtained from a sample of 10% of all public tweets, provided by the Twitter streaming application programming interface. We extracted the hashtags from each tweet and counted how many times they were used in different time intervals. Time intervals of three different lengths were used: days, hours, and minutes. The tweets were published between November 1st 2015 and November 30th 2016, but not all time intervals between these dates are available.
Each of the four files in this dataset corresponds to one folder (archived using tar). Each folder contains .csv files compressed using gzip. The contents of the .csv files in each folder are:
hashtags_frequency_day.tar Counts of hashtags in each day. The name of each file in the folder indicates the date (GMT). The entries in each file are the hashtag and the count in the interval.
hashtags_frequency_hour.tar Counts of hashtags in each hour. The name of each file in the folder indicates the date (GMT). The entries in each file are the hashtag and the count in the interval.
hashtags_frequency_minutes.tar Counts of hashtags in each minute. The name of each file in the folder indicates the date (GMT, only a fraction of all days is available). The entries in each file are the hashtag and the count in the interval.
number_of_tweets.tar Counts of the number of tweets in each minute. The name of each file in the folder indicates the day. The entries in each file are the minute in the day (GMT) and count of tweets in our dataset.
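Once extracted, a file from any of the hashtag-frequency folders can be read with the standard library alone. The payload below is hypothetical, mimicking one hourly file with one `<hashtag>,<count>` row per line as described above.

```python
import csv
import gzip
import io

# Hypothetical contents of one hourly file after untarring the archive.
payload = gzip.compress(b"music,120\nnews,87\nfootball,45\n")

counts = {}
with gzip.open(io.BytesIO(payload), "rt") as f:
    for hashtag, count in csv.reader(f):
        counts[hashtag] = int(count)

print(counts["news"])        # 87
print(sum(counts.values()))  # 252
```

In practice the `io.BytesIO` wrapper would be replaced by the path of a `.csv.gz` file extracted from one of the tar archives.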
In the digital age, every minute counts as billions of users engage with online platforms worldwide. The year 2024 saw an astounding 251.1 million emails sent, 138.9 million Reels played on Facebook and Instagram, and 5.9 million Google searches conducted every 60 seconds.

Social media's continued dominance

Social media platforms remain at the forefront of online interactions, with Facebook leading the pack at over three billion monthly active users. The broader Meta ecosystem, including Instagram and WhatsApp, further solidifies its position in the digital landscape. TikTok, a relative newcomer, has rapidly gained traction, generating 186 million downloads in the fourth quarter of 2024 alone.

Evolving digital consumption patterns

While traditional streaming services like Netflix continue to dominate, with 362,962 hours streamed every minute, the digital media landscape is experiencing shifts in user preferences. Netflix recorded over 300 million paid subscribers worldwide as of the fourth quarter of 2024.
This dataset contains the tweet ids of 7,275,228 tweets related to the Women's March on January 21, 2017. They were collected between December 19, 2016 and January 23, 2017 from the Twitter API using Social Feed Manager, using the POST statuses/filter method of the Twitter Stream API. There is a README.txt file containing additional documentation on how the dataset was collected.

The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets. When hydrating, be aware that:
- Twitter limits hydration to 900 requests of 100 tweet ids per 15-minute window per set of user credentials.
- The Twitter API will not return tweets that have been deleted or belong to accounts that have been suspended, deleted, or made private. You should expect a large number of these tweets to be unavailable.

For tweets collected from the Twitter filter stream, this is not a complete set of tweets that match the filter. Gaps may exist because:
- Twitter limits the number of tweets returned by the filter at any point in time.
- Social Feed Manager stops and starts the Twitter filter stream every 30 minutes.
- In Social Feed Manager, collecting is turned off while a user is making changes to the collection criteria.
- There were some operational issues, e.g., network interruptions, during the collection period.

Per Twitter's Developer Policy, tweet ids may be publicly shared; tweets may not. Questions about this dataset can be sent to sfm@gwu.edu. George Washington University researchers should contact us for access to the tweets. This work is supported by grant #NARDI-14-50017-14 from the National Historical Publications and Records Commission.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We collected Twitter user data using Tweepy to access the Twitter API. We crawled the list of each user account's followers. Twitter allowed requests of a maximum of 200 tweets per time window, and because of Twitter API rate limits we could only make a request every 15 minutes. Next, we obtained the most recent tweets of each user in the study. We extracted the most common hashtags used in the sample tweets and crawled the 50 most recent tweets that contained each hashtag, as well as tweets that mentioned a particular user, for example '@username'. Initially, we chose 101 user accounts and documented the attributes of each user's account (number of followers, a list of followers, and the recent tweets of each follower).
https://creativecommons.org/publicdomain/zero/1.0/
This dataset comprises seven days of geo-tagged Tweets from the contiguous United States, collected between January 12 and January 18, 2013. Each Tweet includes exact GPS coordinates (longitude and latitude) and a timestamp (hour, minute, second) reported in Central Standard Time (CST). The data is suitable for tasks such as classification, regression, and clustering, offering insights into spatiotemporal trends in social media activity.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains the tweet ids of approximately 280 million tweets related to the 2016 United States presidential election. They were collected between July 13, 2016 and November 10, 2016 from the Twitter API using Social Feed Manager. These tweet ids are broken up into 12 collections, each collected either from the GET statuses/user_timeline method of the Twitter REST API or the POST statuses/filter method of the Twitter Stream API. The collections are:
- Candidates and key election hashtags (Twitter filter): election-filter[1-6].txt
- Democratic candidates (Twitter user timeline): democratic-candidate-timelines.txt
- Democratic Convention (Twitter filter): democratic-convention-filter.txt
- Democratic Party (Twitter user timeline): democratic-party-timelines.txt
- Election Day (Twitter filter): election-day.txt
- First presidential debate (Twitter filter): first-debate.txt
- GOP Convention (Twitter filter): republican-convention-filter.txt
- Republican candidates (Twitter user timeline): republican-candidate-timelines.txt
- Republican Party (Twitter user timeline): republican-party-timelines.txt
- Second presidential debate (Twitter filter): second-debate.txt
- Third presidential debate (Twitter filter): third-debate.txt
- Vice Presidential debate (Twitter filter): vp-debate.txt

There is also a README.txt file for each collection containing additional documentation on how it was collected.

The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets. When hydrating, be aware that:
- Twitter limits hydration to 900 requests of 100 tweet ids per 15-minute window per set of user credentials. This works out to 8,640,000 tweets per day, so hydrating this entire dataset will take 32 days.
- The Twitter API will not return tweets that have been deleted or belong to accounts that have been suspended, deleted, or made private. You should expect a large number of these tweets to be unavailable.
- There may be duplicate tweets across collections. Also, according to the Twitter documentation, duplicate tweets are possible for tweets collected from the Twitter filter stream.

For tweets collected from the Twitter filter stream, this is not a complete set of tweets that match the filter. Gaps may exist because:
- Twitter limits the number of tweets returned by the filter at any point in time.
- Social Feed Manager stops and starts the Twitter filter stream every 30 minutes.
- In Social Feed Manager, collecting is turned off while a user is making changes to the collection criteria.
- There were some operational issues, e.g., network interruptions, during the collection period.

Since some of the terms used to collect from the Twitter filter stream were broad (e.g., "election"), the dataset may contain tweets from elections other than the U.S. presidential election, including state elections, local elections, or elections in other countries.

Per Twitter's Developer Policy, tweet ids may be publicly shared; tweets may not. Questions about this dataset can be sent to sfm@gwu.edu. George Washington University researchers should contact us for access to the tweets. This work is supported by grant #NARDI-14-50017-14 from the National Historical Publications and Records Commission.
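The hydration estimate above can be checked with a little arithmetic:

```python
# Back-of-the-envelope check of the hydration rate limits described above.
requests_per_window = 900        # requests per 15-minute window
ids_per_request = 100            # tweet ids per statuses/lookup request
windows_per_day = 24 * 60 // 15  # 96 fifteen-minute windows per day

tweets_per_day = requests_per_window * ids_per_request * windows_per_day
print(tweets_per_day)  # 8640000

dataset_size = 280_000_000  # approximate number of tweet ids in this dataset
print(round(dataset_size / tweets_per_day, 1))  # 32.4
```

So with one set of credentials, hydrating the full dataset takes roughly 32 days of continuous requests, as stated.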
According to a survey conducted in June 2023, adults in the United States spent more time per day on TikTok than on any other leading social media platform. Overall, respondents reported spending an average of 53.8 minutes per day on the social video app. YouTube and Twitter ranked second and third, with respective averages of 48 and 34 minutes spent on the platforms per day.
U.S. teens have time for certain platforms
Different social media platforms attract different demographics, with teenagers in the United States being more drawn to TikTok and YouTube than to Facebook. In 2023, teenagers in the United States spent an average of almost two hours on YouTube and 1.5 hours on TikTok every day, while Facebook was used by teens for less than half an hour per day. Furthermore, social media habits differ between genders, as teen girls were more likely than boys to spend more time on Instagram.
TikTok is king for teens and Gen Z
Although spending 1.5 hours on the Generation Z app of choice may sound rather modest, some TikTok users devote much more of their time to the platform. According to a survey conducted in the United States in 2022, around eight percent of teenagers in the United States spent over five hours a day on TikTok, whereas another 22 percent reported spending between two and three hours daily on the video-based app.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We collected the data for our analysis by utilising the academic Twitter API (V2). The four-letter acronyms associated with the Myers-Briggs Type Indicator (MBTI) give people a short categorisation of their personality that is easily self-reported on social media in the form of a regular expression. As a result, people are much more likely to self-report their categorical MBTI rather than other personality types. The four letter MBTI acronyms are also unique to the Myers-Briggs questionnaire, meaning they can be easily queried using the Twitter API. This also means these personality types won't be confused with any other acronym or word, reducing the likelihood we incorrectly classify any users. When we initially explored Twitter, we found that some users self-reported their personality type in their biography and other users would self-report their personality types in their tweets. As a result, we formulated two methods for querying and labelling the Myers-Briggs personality type of accounts. We describe the two methods below:
Firstly, we used Tweepy's 'search_users' endpoint to obtain the set of users who currently self-report their MBTI in their username or biography. Due to the rate limits associated with this endpoint, we were limited to obtaining no more than 1000 users for each unique search query. Secondly, we used the Twitter API's 'full_archive_search' endpoint to obtain the set of users who self-reported their Myers-Briggs personality type in a Tweet since Twitter's creation (March 26, 2006). We searched for users who tweeted any of the three regular expressions, followed by their personality type: 'I am...', 'I am a...' or 'I am an...'. Note that we only searched for self-reports in Tweets and excluded Retweets, Quotes and Replies in our query, due to these having a much higher potential of incorrectly labelling an account. Furthermore, we were bound by rate limits of 300 requests per 15-minute window; however, there were no hard bounds on the number of tweets or users we could obtain. As a result, we ran this query for each personality type until the search was exhausted.
Note that in both cases, the queries were not case-sensitive. In the attached dataset, we provide both the Twitter User IDs and the Myers-Briggs Personality Types associated with the 68,958 users obtained using the two methods discussed above. We provide this dataset prior to any preprocessing steps performed in our paper.
https://www.etalab.gouv.fr/licence-ouverte-open-licence
Public tweets from the account @rerb on Twitter.
- 'ID': unique identifier of the tweet
- 'created_at': date, hour, minute, and second of the tweet in the UTC time zone
- 'text': tweet text
- 'retweet_count': number of retweets of the tweet
- 'favorite_count': number of times the tweet was added to favorites
- 'tweet_mentionne_excuse': whether the tweet mentions the word "excuse" (0 or 1)
- 'tweet_mentionne_regulation': whether the tweet mentions the word "regulation" (0 or 1)
- 'tweet_mentionne_bon_courage': whether the tweet mentions the phrase "bon courage" (0 or 1)
https://creativecommons.org/publicdomain/zero/1.0/
[READ THIS FIRST! DATASETS FOR Academic/Learning/Non-commercial purpose]
The US Election 2020 is very interesting to look into, as it was an election in the middle of a pandemic. My teammate and I created a Twitter crawler using the Twitter API and Tweepy for my Artificial Intelligence coursework. We chose Donald Trump as the subject of interest, as President Trump was known for his Twitter interaction.
I decided to deploy my crawler on post-voting day to conduct a sentiment analysis.
The tweet text in this dataset is suitable for sentiment analysis.
This raw dataset was crawled using the Tweepy library and the Twitter API, gathering 2,500 tweets per 15 minutes. There are a total of 247,500 rows and 13 columns, for a total of 3,217,500 cells of data. Data cleaning is needed before doing any analysis.
Dataset date range: 4th November 2020 - 11th November 2020. Tweets containing "Trump", "DonalTrump", or "realDonalTrump" were captured.
(The user = the user of the particular row)
- username: Twitter user handle
- accDesc: description on the user's profile
- location: location of the tweet
- following: total number of accounts the user is following
- followers: total number of followers of the user
- totaltweets: total tweets created by the user
- usercreated: date the user registered their Twitter account
- tweetcreated: date the tweet was created
- favouritecount: the tweet's favorite count (equivalent to a like on Facebook)
- retweetcount: the tweet's total retweets (equivalent to a share on Facebook)
- text: text body of the tweet
- tweetsource: device used to create the tweet
- hashtags: hashtags of the tweet in JSON format
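Since the hashtags column stores JSON, it needs decoding before analysis. A minimal sketch, assuming the column holds Tweepy's entity format (a list of objects with a "text" key); the row below is hypothetical:

```python
import json

# Hypothetical row from the dataset, using the field names listed above.
row = {
    "username": "example_user",
    "tweetcreated": "2020-11-05 10:15:00",
    "retweetcount": "12",
    "hashtags": '[{"text": "Election2020", "indices": [0, 13]}]',
}

# Decode the JSON hashtags column into a plain list of hashtag strings,
# and cast the numeric column, which is read from CSV as a string.
tags = [h["text"] for h in json.loads(row["hashtags"])]
retweets = int(row["retweetcount"])

print(tags)      # ['Election2020']
print(retweets)  # 12
```

This kind of decoding and casting is part of the data cleaning the description says is needed before analysis.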
Banner and thumbnail courtesy of > visuals < from unsplash.com
Many thanks to my teammates Jiacheng Loh and ChenZhen Li for their efforts.
Please do not use this dataset for any malicious purposes; I am not responsible for any damage done.
This dataset was gathered for learning purposes, not for commercial ones.
The data were publicly available, so I assume they are open for all.
The data were gathered at intervals of at least 15 minutes, so the distribution of tweet creation dates is not uniform and may not include all tweets created within the date range.
https://creativecommons.org/publicdomain/zero/1.0/
People across India scrambled for life-saving oxygen supplies on Friday and patients lay dying outside hospitals as the capital recorded the equivalent of one death from COVID-19 every five minutes.
For the second day running, the country’s overnight infection total was higher than ever recorded anywhere in the world since the pandemic began last year, at 332,730.
India’s second wave has hit with such ferocity that hospitals are running out of oxygen, beds, and anti-viral drugs. Many patients have been turned away because there was no space for them, doctors in Delhi said.
Mass cremations have been taking place as the crematoriums have run out of space. Ambulance sirens sounded throughout the day in the deserted streets of the capital, one of India’s worst-hit cities, where a lockdown is in place to try and stem the transmission of the virus.
The dataset consists of tweets made with the #IndiaWantsOxygen hashtag, covering tweets from the past week. It currently contains 25,440 tweets and will be updated on a daily basis.
The description of the features is given below:

| No | Column | Description |
| -- | -- | -- |
| 1 | user_name | The name of the user, as they’ve defined it. |
| 2 | user_location | The user-defined location for this account’s profile. |
| 3 | user_description | The user-defined UTF-8 string describing their account. |
| 4 | user_created | Time and date when the account was created. |
| 5 | user_followers | The number of followers the account currently has. |
| 6 | user_friends | The number of friends the account currently has. |
| 7 | user_favourites | The number of favorites the account currently has. |
| 8 | user_verified | When true, indicates that the user has a verified account. |
| 9 | date | UTC time and date when the Tweet was created. |
| 10 | text | The actual UTF-8 text of the Tweet. |
| 11 | hashtags | All the other hashtags posted in the tweet along with #IndiaWantsOxygen. |
| 12 | source | Utility used to post the Tweet; Tweets from the Twitter website have a source value of "web". |
| 13 | is_retweet | Indicates whether this Tweet has been Retweeted by the authenticating user. |
https://globalnews.ca/news/7785122/india-covid-19-hospitals-record/ Image courtesy: BBC and Reuters
The past few days have been really depressing after seeing these incidents. These tweets are the voice of Indians requesting help, and of people all over the globe asking their own countries to support India by providing oxygen tanks.
And I strongly believe that this is not just some data, but the pure emotions of people and their call for help. And I hope we as data scientists could contribute on this front by providing valuable information and insights.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data for temporal validity change prediction, an NLP task that will be defined in an upcoming publication. The dataset consists of five columns.
The duration labels (context_only_tv, combined_tv) are class indices of the following class distribution:
[no time-sensitive information, less than one minute, 1-5 minutes, 5-15 minutes, 15-45 minutes, 45 minutes - 2 hours, 2-6 hours, more than 6 hours, 1-3 days, 3-7 days, 1-4 weeks, more than one month]
Different dataset splits are provided.
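A sketch of mapping a duration to the class indices listed above. The bin edges and edge-inclusion rules are assumptions (the source only lists the labels), and `None` is taken to mean "no time-sensitive information":

```python
# Upper bound (in minutes, exclusive) of each duration class; the edges
# are assumed from the labels listed above.
BINS = [
    1,             # class 1: less than one minute
    5,             # class 2: 1-5 minutes
    15,            # class 3: 5-15 minutes
    45,            # class 4: 15-45 minutes
    120,           # class 5: 45 minutes - 2 hours
    360,           # class 6: 2-6 hours
    24 * 60,       # class 7: more than 6 hours (up to one day, assumed)
    3 * 24 * 60,   # class 8: 1-3 days
    7 * 24 * 60,   # class 9: 3-7 days
    28 * 24 * 60,  # class 10: 1-4 weeks
]

def duration_class(minutes):
    """Map a duration in minutes (or None) to a class index 0-11."""
    if minutes is None:
        return 0  # no time-sensitive information
    for i, upper in enumerate(BINS, start=1):
        if minutes < upper:
            return i
    return 11  # more than one month

print(duration_class(None))         # 0
print(duration_class(10))           # 3  (5-15 minutes)
print(duration_class(2 * 24 * 60))  # 8  (1-3 days)
```

The class indices in context_only_tv and combined_tv would then index into this same 12-label distribution.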
This chart shows the time slots in which the number of comments per minute about a TV programme was highest on Twitter in France between 2012 and 2013. It shows that 9 p.m. was the hour at which the most users posted tweets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The first public large-scale multilingual Twitter dataset related to the FIFA World Cup 2022, comprising over 28 million posts in 69 unique spoken languages, including Arabic, English, Spanish, French, and many others. This dataset aims to facilitate future research in sentiment analysis, cross-linguistic studies, event-based analytics, meme and hate speech detection, fake news detection, and social manipulation detection.
The file 🚨Qatar22WC.csv🚨 contains tweet-level and user-level metadata for our collected tweets.
🚀Codebook for FIFA World Cup 2022 Twitter Dataset🚀
| Column Name | Description |
| -- | -- |
| day, month, year | The date when the tweet was posted |
| hou, min, sec | Hour, minute, and second of the tweet timestamp |
| age_of_the_user_account | User account age in days |
| tweet_count | Total number of tweets posted by the user |
| location | User-defined location field |
| follower_count | Number of followers the user has |
| following_count | Number of accounts the user is following |
| follower_to_Following | Follower-to-following ratio |
| favouite_count | Number of tweets the user has liked |
| verified | Boolean indicating if the user is verified (1 = verified, 0 = not verified) |
| Avg_tweet_count | Average tweets per day of user activity |
| list_count | Number of lists the user is a member of |
| Tweet_Id | Tweet ID |
| is_reply_tweet | ID of the tweet being replied to (if applicable) |
| is_quote | Boolean indicating if the tweet is a quote |
| retid | Retweet ID if it is a retweet; NaN otherwise |
| lang | Language of the tweet |
| hashtags | The keyword or hashtag used to collect the tweet |
| is_image | Boolean indicating if the tweet has an associated image |
| is_video | Boolean indicating if the tweet has an associated video |
Examples of use case queries are described in the file 🚨fifa_wc_qatar22_examples_of_use_case_queries.ipynb🚨 and accessible via: https://github.com/khairied/Qata_FIFA_World_Cup_22
🚀 Please Cite This as: Daouadi, K. E., Boualleg, Y., Guehairia, O. & Taleb-Ahmed, A. (2025). Tracking the Global Pulse: The first public Twitter dataset from FIFA World Cup, Journal of Computational Social Science.
The number of Twitter users in the United States was forecast to continuously increase between 2024 and 2028 by a total of 4.3 million users (+5.32 percent). After the ninth consecutive increasing year, the Twitter user base is estimated to reach 85.08 million users, and therefore a new peak, in 2028. Notably, the number of Twitter users was continuously increasing over the past years.

User figures, shown here for the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period.

The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic, and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations, and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of Twitter users in countries like Canada and Mexico.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supporting figures and table. Figure S1, Tweet volume per minute: number of tweets per minute in the 12 datasets. (a–d) The six hours during the four debate events (“DEB”). For the other categories, we plot the six-hour volume centering around the peak within the data range: (e–h) normal period prior to the debate evenings (“PRE”); (i,j) national convention events, including the RNC and DNC (“CONV”); (k,l) breaking political news events, including the Benghazi attack and Romney's 47-percent video (“NEWS”). Figure S2, Changes in communication volume: diamond shapes indicate the mean value of each category; this figure shows the ratio of tweets mentioning a user to the total tweets at the peak hour. Figure S3, Lorenz curves for cumulative degree distributions of activity: increasing equality converges toward the diagonal line from the origin to the upper right, and increasing inequality converges toward a hyperbola rising to 100% of volume at the 100th percentile. Figure S4, Connectivity-concentration state spaces: for each of the twelve observed events, the Gini coefficient for the network's degree distribution is plotted against the average degree of the network. Table S1, Kolmogorov-Smirnov test (K-S test) for comparing the PRE curves with the remaining three curves in other conditions. (PDF)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Two datasets are published as part of my Bachelor's final thesis on hate speech, titled Hate Speech on Twitter: Analysis of LGBTIQ-phobia Before and After Elon Musk:
Both datasets aim to provide a detailed view of interactions on Twitter on the specified days.
The columns include: id, createdAt, source, lang, retweetCount, replyCount, likeCount, quoteCount, viewCount, bookmarkCount, isReply, conversationId, author_verified, author_blue_verified, author_followers, author_following, author_tweets, author_createdAt, hashtags, author_isAutomated, author_fastFollowersCount, author_favouritesCount, texto_analisis, toxicity, severe_toxicity, identity_attack, insult, profanity, threat. The 'texto_analisis' column contains the content of the tweet, with all user mentions removed to comply with privacy regulations such as GDPR. The 'toxicity', 'severe_toxicity', 'identity_attack', 'insult', 'profanity', and 'threat' columns have values ranging from 0 to 1, where 0 indicates the attribute is not present and 1 indicates it is strongly present. The 'createdAt' column represents the tweet's publication date.
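As a minimal sketch of how the toxicity columns can be used, the snippet below flags tweets whose Perspective 'toxicity' score exceeds a threshold. The 0.5 cutoff and the inline sample rows are illustrative assumptions, not part of the dataset.

```python
# Sketch: flagging high-toxicity tweets by their Perspective API score.
# The 0.5 threshold and the sample rows are illustrative assumptions.
def flag_toxic(rows, threshold=0.5):
    """Return rows whose 'toxicity' score (0 to 1) exceeds the threshold."""
    return [r for r in rows if float(r["toxicity"]) > threshold]

sample = [
    {"id": "1", "texto_analisis": "first example tweet", "toxicity": "0.82"},
    {"id": "2", "texto_analisis": "second example tweet", "toxicity": "0.10"},
]
print([r["id"] for r in flag_toxic(sample)])  # → ['1']
```

The same pattern applies to the other attribute columns ('severe_toxicity', 'identity_attack', 'insult', 'profanity', 'threat'), which share the 0-to-1 scale.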
For further details, you can find the code for processing and analysis in the project's GitHub repository.
Acknowledgements
We would like to acknowledge the use of tools and support provided by twitterapi.io for data extraction, as well as the Perspective API, which played a crucial role in analyzing tweet toxicity. These resources were indispensable for the successful completion of this project.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Each spreadsheet in the Excel file corresponds to one of the tested episodes and contains minute-by-minute values of Twitter volume (raw), TV viewership (raw), and EEG metrics (pre-processed) associated with Attention, Motivation, and Memory, as well as the composite EEG metric (the average of the other three metrics). Ad breaks and missing values are indicated in the Notes field of each spreadsheet. (XLSX)
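Since each sheet aligns Twitter volume, viewership, and the EEG metrics minute by minute, a natural first analysis is a correlation between two of the series. The sketch below computes a Pearson correlation on illustrative sample values; the actual sheet layout is the one described above.

```python
# Sketch: Pearson correlation between two minute-by-minute series,
# e.g. Twitter volume vs. the composite EEG metric. The sample values
# are illustrative, not taken from the dataset.
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

twitter_volume = [120, 150, 90, 200, 170]       # tweets per minute (illustrative)
eeg_composite = [0.42, 0.51, 0.33, 0.70, 0.58]  # composite EEG metric (illustrative)
print(round(pearson(twitter_volume, eeg_composite), 3))
```

Minutes flagged as ad breaks or missing in the Notes field would need to be dropped from both series before correlating.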
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A set of tweets collected over a few hours on one day, 26 Oct 2011. These were tweets that mentioned either Rajoy or Rubalcaba. The set includes files with the timestamps of all tweets, as well as processed tweet counts, total and per minute.
Author: Víctor Yeste. Universitat Politècnica de València.

The object of this study is the design of a cybermetric methodology whose objectives are to measure the success of the content published in online media and the possible prediction of the selected success variables.

In this case, given the need to integrate data from two separate areas, web publishing and the analysis of its shares and related topics on Twitter, programmatic access was chosen for both the Google Analytics v4 Reporting API and the Twitter Standard API, always respecting their limits.

The website analyzed is hellofriki.com, an online media outlet whose primary aim is to meet the demand for information on a set of topics by publishing a large number of daily items in the form of news, as well as analysis pieces, reports, interviews, and many other formats. All these contents fall under the sections of cinema, series, video games, literature, and comics.

This dataset has contributed to the elaboration of the PhD thesis: Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009

Data have been obtained from each breaking news article published online, according to the indicators described in the doctoral thesis.
All related data are stored in a database, divided into the following tables:

tesis_followers: user ID list of the media account's followers.

tesis_hometimeline: data from tweets posted by the media account sharing breaking news from the web.
- status_id: tweet ID
- created_at: date of publication
- text: content of the tweet
- path: URL extracted after processing the shortened URL in text
- post_shared: WordPress ID of the article being shared
- retweet_count: number of retweets
- favorite_count: number of favorites

tesis_hometimeline_other: data from tweets posted by the media account that do not share breaking news from the web (other typologies: automatic Facebook shares, custom tweets without a link to an article, etc.). Same fields as tesis_hometimeline.

tesis_posts: data of articles published on the web and processed for some analysis.
- stats_id: analysis ID
- post_id: article ID in WordPress
- post_date: article publication date in WordPress
- post_title: title of the article
- path: URL of the article on the media website
- tags: IDs of the WordPress tags related to the article
- uniquepageviews: unique page views
- entrancerate: entrance rate
- avgtimeonpage: average time on page
- exitrate: exit rate
- pageviewspersession: page views per session
- adsense_adunitsviewed: number of ads viewed by users
- adsense_viewableimpressionpercent: viewable ad impression percentage
- adsense_ctr: ad click-through rate
- adsense_ecpm: estimated ad revenue per 1000 page views

tesis_stats: data from a particular analysis, performed for each published breaking news item.
Fields with statistical values can be computed from the data in the other tables, but totals and averages are saved for faster and easier further processing.
- id: ID of the analysis
- phase: phase of the thesis in which the analysis was carried out (currently all are 1)
- time: "0" if at the time of publication, "1" if 14 days later
- start_date: date and time of the measurement on the day of publication
- end_date: date and time of the measurement made 14 days later
- main_post_id: ID of the published article to be analyzed
- main_post_theme: main section of the published article to analyze
- superheroes_theme: "1" if about superheroes, "0" if not
- trailer_theme: "1" if a trailer, "0" if not
- name: empty field, allows adding a custom name manually
- notes: empty field, allows adding personalized notes manually, e.g. if some tag was removed manually for being considered too generic despite the editor having set it
- num_articles: number of articles analyzed
- num_articles_with_traffic: number of analyzed articles with traffic (taken into account for traffic analysis)
- num_articles_with_tw_data: number of articles with data from when they were shared on the media's Twitter account
- num_terms: number of terms analyzed
- uniquepageviews_total: total unique page views
- uniquepageviews_mean: average unique page views
- entrancerate_mean: average entrance rate
- avgtimeonpage_mean: average time on page
- exitrate_mean: average exit rate
- pageviewspersession_mean: average page views per session
- total: total ads viewed
- adsense_adunitsviewed_mean: average ads viewed
- adsense_viewableimpressionpercent_mean: average viewable ad impression percentage
- adsense_ctr_mean: average ad click-through rate
- adsense_ecpm_mean: average estimated ad revenue per 1000 page views
- retweet_count_total: total retweets
- retweet_count_mean: average retweets
- favorite_count_total: total favorites
- favorite_count_mean: average favorites
- terms_ini_num_tweets: total tweets on the terms on the day of publication
- terms_ini_retweet_count_total: total retweets on the terms on the day of publication
- terms_ini_retweet_count_mean: average retweets on the terms on the day of publication
- terms_ini_favorite_count_total: total favorites on the terms on the day of publication
- terms_ini_favorite_count_mean: average favorites on the terms on the day of publication
- terms_ini_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the terms on the day of publication
- terms_ini_user_num_followers_mean: average followers of users who have talked about the terms on the day of publication
- terms_ini_user_num_tweets_mean: average number of tweets published by users who have talked about the terms on the day of publication
- terms_ini_user_age_mean: average age in days of users who have talked about the terms on the day of publication
- terms_ini_url_inclusion_rate: URL inclusion rate of tweets talking about the terms on the day of publication
- terms_end_num_tweets: total tweets on the terms 14 days after publication
- terms_end_retweet_count_total: total retweets on the terms 14 days after publication
- terms_end_retweet_count_mean: average retweets on the terms 14 days after publication
- terms_end_favorite_count_total: total favorites on the terms 14 days after publication
- terms_end_favorite_count_mean: average favorites on the terms 14 days after publication
- terms_end_followers_talking_rate: ratio of followers of the media Twitter account who have recently posted a tweet talking about the terms 14 days after publication
- terms_end_user_num_followers_mean: average followers of users who have talked about the terms 14 days after publication
- terms_end_user_num_tweets_mean: average number of tweets published by users who have talked about the terms 14 days after publication
- terms_end_user_age_mean: average age in days of users who have talked about the terms 14 days after publication
- terms_end_url_inclusion_rate: URL inclusion rate of tweets talking about the terms 14 days after publication

tesis_terms: data of the terms (tags) related to the processed articles.
- stats_id: analysis ID
- time: "0" if at the time of publication, "1" if 14 days later
- term_id: term (tag) ID in WordPress
- name: name of the term
- slug: URL slug of the term
- num_tweets: number of tweets
- retweet_count_total: total retweets
- retweet_count_mean: average retweets
- favorite_count_total: total favorites
- favorite_count_mean: average favorites
- followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the term
- user_num_followers_mean: average followers of users who were talking about the term
- user_num_tweets_mean: average number of tweets published by users who were talking about the term
- user_age_mean: average age in days of users who were talking about the term
- url_inclusion_rate: URL inclusion rate
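As a minimal sketch of how the schema above can be queried, the snippet below computes a mean of unique page views over articles that actually received traffic, mirroring the num_articles_with_traffic distinction in tesis_stats. The in-memory SQLite database and the sample rows are assumptions for illustration; the thesis data may live in another engine.

```python
# Sketch: computing an average of uniquepageviews over articles with traffic,
# using the tesis_posts fields described above. The in-memory SQLite database
# and the sample rows are illustrative assumptions.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE tesis_posts (post_id INTEGER, post_title TEXT, uniquepageviews INTEGER)"
)
con.executemany(
    "INSERT INTO tesis_posts VALUES (?, ?, ?)",
    [(1, "Post A", 500), (2, "Post B", 300), (3, "Post C", 0)],
)

# Only articles with traffic count toward the mean, as with num_articles_with_traffic.
(mean_views,) = con.execute(
    "SELECT AVG(uniquepageviews) FROM tesis_posts WHERE uniquepageviews > 0"
).fetchone()
print(mean_views)  # → 400.0
```

The same pattern extends to the Twitter-side tables, e.g. averaging retweet_count over tesis_hometimeline rows joined to tesis_posts via post_shared.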