65 datasets found
  1. Data_Sheet_5_What Does Twitter Say About Self-Regulated Learning? Mapping...

    • frontiersin.figshare.com
    txt
    Updated Jun 15, 2023
    Cite
    Mohammad Khalil; Gleb Belokrys (2023). Data_Sheet_5_What Does Twitter Say About Self-Regulated Learning? Mapping Tweets From 2011 to 2021.CSV [Dataset]. http://doi.org/10.3389/fpsyg.2022.820813.s005
    Available download formats: txt
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    Frontiers
    Authors
    Mohammad Khalil; Gleb Belokrys
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Social network services such as Twitter are important venues that can be used as rich data sources to mine public opinions about various topics. In this study, we used Twitter to collect data on one of the fastest-growing theories in education, namely Self-Regulated Learning (SRL), and carried out further analysis to investigate what Twitter says about SRL. This work uses three main analysis methods: descriptive analysis, topic modeling, and geocoding analysis. The collected dataset consists of 54,070 relevant SRL tweets posted between 2011 and 2021. The descriptive analysis uncovers a growing discussion on SRL on Twitter from 2011 until 2018, followed by a marked decrease until the collection day. For topic modeling, the text mining technique of Latent Dirichlet Allocation (LDA) was applied and revealed insights on computationally processed topics. Finally, the geocoding analysis uncovers a diverse community from all over the world, yet with a higher density of users from the Global North. Further implications are discussed in the paper.

  2. Data from: Mapping English-Language AI Research Controversies on Twitter,...

    • beta.ukdataservice.ac.uk
    Updated 2025
    Cite
    Noortje Suzanne Marres (2025). Mapping English-Language AI Research Controversies on Twitter, 2022 [Dataset]. http://doi.org/10.5255/ukda-sn-857742
    Dataset updated
    2025
    Dataset provided by
    DataCite (https://www.datacite.org/)
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    Noortje Suzanne Marres
    Description

    This submission consists of 12 data sets containing Twitter IDs pertaining to 6 AI controversies identified by UK-based experts in AI and Society as especially significant during the period 2012-2021. The data sets were collected by researchers at the University of Warwick as part of the 3-year international project “Shaping AI” which mapped controversies about “Artificial Intelligence” (AI) during 2012-2022. Research teams in the UK, France, Germany and Canada analysed controversies about AI in their countries across different spheres: research, policy and the media during this 10-year period. The UK team at the University of Warwick designed and undertook an analysis of research controversies about AI in the relevant period following a standpoint methodology. Our study began with an online consultation that took place in the Autumn of 2021, in which we asked UK-based experts in AI from across disciplines to identify what are the most important concerns, disputes and problematics that have arisen in the last 10 years in relation to AI as a strategic area of research.

    Based on the responses to this expert consultation—described in detail in Marres et al (2024) and Poletti et al (forthcoming)—we identified a broad range of relevant controversy topics, objects and problems. To select controversies for further analysis, we considered their research intensity, in the form of a frequency count of research publications mentioned by respondents in relation to controversy topics.

    On this basis, we selected 6 AI research controversies for further research: COMPAS; NHS+Deepmind; Gaydar; Facial recognition; Stochastic Parrots (LLMs) & Deeplearning as a solution for AI. For each of these controversies, we collected Twitter data by submitting queries to Twitter's academic API using TWARC between January 2022 and June 2022. Further details of the methods of data collection and curation can be found in the methods file with further detail of the queries in the ReadMe file.

  3. Data from: GeoCoV19: A Dataset of Hundreds of Millions of Multilingual...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jun 16, 2020
    Cite
    Umair Qazi; Muhammad Imran; Ferda Ofli (2020). GeoCoV19: A Dataset of Hundreds of Millions of Multilingual COVID-19 Tweets with Location Information [Dataset]. http://doi.org/10.5281/zenodo.3878599
    Available download formats: zip
    Dataset updated
    Jun 16, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Umair Qazi; Muhammad Imran; Ferda Ofli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present GeoCoV19, a large-scale Twitter dataset related to the ongoing COVID-19 pandemic. The dataset was collected over a period of 90 days, from February 1 to May 1, 2020, and consists of more than 524 million multilingual tweets. As geolocation information is essential for many tasks such as disease tracking and surveillance, we employed a gazetteer-based approach to extract toponyms from user location and tweet content and derive their geolocation using Nominatim (OpenStreetMap) data at different levels of geolocation granularity. In terms of geographical coverage, the dataset spans 218 countries and 47K cities worldwide. The tweets in the dataset are from more than 43 million Twitter users, including around 209K verified accounts. These users posted tweets in 62 different languages.

  4. Data from: GeoCoV19: A Dataset of Hundreds of Millions of Multilingual...

    • ieee-dataport.org
    Updated Jun 24, 2020
    Cite
    Muhammad Imran (2020). GeoCoV19: A Dataset of Hundreds of Millions of Multilingual COVID-19 Tweets with Location Information [Dataset]. https://ieee-dataport.org/open-access/geocov19-dataset-hundreds-millions-multilingual-covid-19-tweets-location-information
    Dataset updated
    Jun 24, 2020
    Authors
    Muhammad Imran
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present GeoCoV19

  5. World - Twitter Sentiment By Country

    • kaggle.com
    Updated Nov 10, 2020
    Cite
    William Jiang (2020). World - Twitter Sentiment By Country [Dataset]. https://www.kaggle.com/wjia26/twittersentimentbycountry/code
    Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    William Jiang
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Area covered
    World
    Description

    Introduction

    Ever wondered what people are saying about certain countries? Whether it's in a positive/negative light? What are the most commonly used phrases/words to describe the country? In this dataset I present tweets where a certain country gets mentioned in the hashtags (e.g. #HongKong, #NewZealand). It contains around 150 countries in the world. I've added an additional field called polarity which has the sentiment computed from the text field. Feel free to explore! Feedback is much appreciated!

    Content

    Each row represents a tweet. Creation dates of tweets range from 12/07/2020 to 25/07/2020. The dataset will be updated on a monthly cadence.

    • The country can be derived from the file_name field (this field is very Tableau-friendly when it comes to plotting maps).
    • The date on which the tweet was created is in the created_at field.
    • The search query used to query the Twitter search engine is in the search_query field.
    • The full text of the tweet is in the text field.
    • The sentiment is in the polarity field (I've used the VADER model from NLTK to compute this).
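
    As a rough illustration of how these fields might be used, the sketch below averages the polarity field per file_name value using only the Python standard library. The column names come from the description above; the sample rows, the date format and the helper name mean_polarity_by_file are made up for the example and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def mean_polarity_by_file(rows):
    """Average the 'polarity' column for each distinct 'file_name' value."""
    totals = defaultdict(lambda: [0.0, 0])  # file_name -> [sum, count]
    for row in rows:
        entry = totals[row["file_name"]]
        entry[0] += float(row["polarity"])
        entry[1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


# Hypothetical sample rows; real files may format values differently.
SAMPLE = """file_name,created_at,search_query,text,polarity
HongKong,2020-07-12,#HongKong,example tweet one,0.5
HongKong,2020-07-13,#HongKong,example tweet two,-0.25
NewZealand,2020-07-12,#NewZealand,example tweet three,0.75
"""

averages = mean_polarity_by_file(csv.DictReader(io.StringIO(SAMPLE)))
print(averages)
```

    To run this on a downloaded file, pass csv.DictReader(open(path, newline="", encoding="utf-8")) instead of the in-memory sample.
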

    Notes

    There may be slight duplication of tweet IDs before 22/07/2020. I have since fixed this bug.

    Acknowledgements

    Thanks to the tweepy package for making the data extraction via Twitter API so easy.

    Shameless Plug

    Feel free to check out my blog if you want to learn how I built the data lake via AWS or for other data shenanigans.

    Here's an App I built using a live version of this data.

  6. Replication Data for: Analysing the performance of a location inference...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    Cite
    Serere, Helen Ngonidzashe (2023). Replication Data for: Analysing the performance of a location inference method on various Twitter source distribution [Dataset]. http://doi.org/10.7910/DVN/LOTEGM
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Serere, Helen Ngonidzashe
    Description

    Sample of tweets generated within a USA bounding box between August 2019 and April 2020. The data was used for the paper titled: Enhanced geocoding precision for location inference of tweet text using spaCy, Nominatim and Google Maps. A comparative analysis of the influence of data selection. The paper was submitted to the journal PLoS ONE. Two datasets have been submitted:

    1. Dataset A: consisting of 133,577 geocoded tweets
    2. Dataset B: consisting of 133,587 geocoded tweets

  7. Coronavirus (COVID-19) Geo-tagged Tweets Dataset

    • ieee-dataport.org
    Updated May 18, 2025
    Cite
    Rabindra Lamsal (2025). Coronavirus (COVID-19) Geo-tagged Tweets Dataset [Dataset]. https://ieee-dataport.org/open-access/coronavirus-covid-19-geo-tagged-tweets-dataset
    Dataset updated
    May 18, 2025
    Authors
    Rabindra Lamsal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Only the tweet IDs are shared. The tweet IDs in this dataset belong to tweets that were created with an exact location.

  8. Twitter Emoji Prediction

    • kaggle.com
    Updated Feb 10, 2019
    Cite
    HariAS (2019). Twitter Emoji Prediction [Dataset]. https://www.kaggle.com/hariharasudhanas/twitter-emoji-prediction/code
    Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Feb 10, 2019
    Dataset provided by
    Kaggle
    Authors
    HariAS
    Description

    Content

    Train.csv contains tweets, and the labels are emojis. You can find the emoji-label mapping in Mapping.csv. Predict the emojis to use for the test set.

    Approaches

    The best method among those tried was a bidirectional LSTM with GloVe embeddings (42B).

    License

    Belongs to the original author on Twitter

  9. Street and Traffic SRs Web/Twitter Activity Map

    • data.wu.ac.at
    csv, json, xml
    Updated Apr 15, 2014
    Cite
    KCMO Information Technology People Soft CRM cases (2014). Street and Traffic SRs Web/Twitter Activity Map [Dataset]. https://data.wu.ac.at/odso/data_kcmo_org/bTliay1ua3k1
    Available download formats: json, xml, csv
    Dataset updated
    Apr 15, 2014
    Dataset provided by
    KCMO Information Technology People Soft CRM cases
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Updated daily

  10. Twitter Sentiment Geographical Index (MIT & Harvard)

    • sdgstoday-sdsn.hub.arcgis.com
    Updated May 31, 2023
    Cite
    Sustainable Development Solutions Network (2023). Twitter Sentiment Geographical Index (MIT & Harvard) [Dataset]. https://sdgstoday-sdsn.hub.arcgis.com/maps/a49e84eca1694e6fad9eda6e8ecc86af
    Dataset updated
    May 31, 2023
    Dataset authored and provided by
    Sustainable Development Solutions Network
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This web map is part of SDGs Today. Please see sdgstoday.org. Promoting well-being is one of the key targets of the Sustainable Development Goals at the United Nations. Many governments worldwide are incorporating subjective well-being (SWB) indicators to complement traditional objective and economic metrics. Our Twitter Sentiment Geographical Index (TSGI) can provide a high-granularity monitor of well-being worldwide. This dataset is a joint effort of the Sustainable Urbanization Lab at MIT and the Center for Geographic Analysis at Harvard.

  11. #nowplaying

    • zenodo.org
    • explore.openaire.eu
    application/gzip
    Updated Jan 24, 2020
    Cite
    Eva Zangerle (2020). #nowplaying [Dataset]. http://doi.org/10.5281/zenodo.2594483
    Available download formats: application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Eva Zangerle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains a dump of the #nowplaying dataset, which contains so-called listening events of users who publish the music they are currently listening to on Twitter. In particular, this dataset includes tracks which have been tweeted using the hashtags #nowplaying, #listento or #listeningto. In this dataset, we provide the track and artist of a listening event and metadata on the tweet (date sent, user, source). Furthermore, we provide a mapping of tracks to their respective MusicBrainz identifiers. The dataset features a total of 126 million listening events.

    This archive contains the nowplaying.csv file, the main file which contains the following fields:

    • user id (each user is identified by a unique hash value)
    • source of the tweet (how it was sent; as provided by the Twitter API)
    • timestamp of the time the tweet underlying the listening event was sent
    • track title
    • artist name
    • musicbrainz identifier of the recording (cf. https://musicbrainz.org/)

    In case you make use of our dataset in a scientific setting, we kindly ask you to cite the following paper:


    Eva Zangerle, Martin Pichl, Wolfgang Gassler, and Günther Specht. 2014. #nowplaying Music Dataset: Extracting Listening Behavior from Twitter. In Proceedings of the First International Workshop on Internet-Scale Multimedia Management (WISMM '14). ACM, New York, NY, USA, 21-26.

    If you have any questions or suggestions regarding the dataset, please do not hesitate to contact Eva Zangerle (eva.zangerle@uibk.ac.at).

  12. Data from: Analyzing Mentions of Death in Covid-19 Tweets

    • zenodo.org
    Updated Jul 6, 2024
    Cite
    Divya Mani Adhikari; Muhammad Imran; Umair Qazi; Ingmar Weber (2024). Analyzing Mentions of Death in Covid-19 Tweets [Dataset]. http://doi.org/10.5281/zenodo.10839649
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Divya Mani Adhikari; Muhammad Imran; Umair Qazi; Ingmar Weber
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Dataset preparation and annotation

    The dataset is a subset of the TBCOV dataset collected at QCRI filtered for mentions of personally related COVID-19 deaths. The filtering was done using regular expressions such as my * passed, my * died, my * succumbed & lost * battle. A sample of the dataset was annotated on Appen. Please see 'annotation-instructions.txt' for the full instructions provided to the annotators.
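
    A minimal sketch of how wildcard phrases like these could be turned into regular expressions is shown below. It is not the authors' filtering code: the interpretation of "*" as one to three whitespace-separated words, the case-insensitive matching, and the helper names phrase_to_regex and mentions_death are all assumptions made for illustration.

```python
import re

# Wildcard phrases quoted in the description; "*" stands for the omitted words.
PHRASES = ["my * passed", "my * died", "my * succumbed", "lost * battle"]


def phrase_to_regex(phrase):
    """Compile a wildcard phrase into a regex.

    Assumption: "*" matches one to three whitespace-separated words.
    """
    parts = [re.escape(part.strip()) for part in phrase.split("*")]
    wildcard = r"\s+(?:\S+\s+){1,3}?"
    return re.compile(r"\b" + wildcard.join(parts) + r"\b", re.IGNORECASE)


PATTERNS = [phrase_to_regex(p) for p in PHRASES]


def mentions_death(text):
    """Return True if any of the wildcard phrases matches the text."""
    return any(pattern.search(text) for pattern in PATTERNS)


print(mentions_death("My dear mom passed away in April"))
print(mentions_death("I passed my exam"))
```
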

    Dataset description

    The "classifier_filtered_english.csv" file contains 33k deduplicated and classifier-filtered tweets (following X's content redistribution policy) for the 6 countries (Australia, Canada, India, Italy, United Kingdom, and United States) from March 2020 to March 2021, with classifier-labeled death labels, regular expression-filtered gender and relationship labels, and the user device label. The full 57k regex-filtered collection of tweets can be made available in special cases for academics and researchers.


    date: the date of the tweet

    country_name: the country name from Nominatim API

    tweet_id: the ID of the tweet

    url: the full URL of the tweet

    full_text: the full-text content of the tweet (also includes the URL of any media attached)

    does_the_tweet_refer_to_the_covidrelated_death_of_one_or_more_individuals_personally_known_to_the_tweets_author: the classifier predicted label for the death (also includes the original labels for the annotated samples)

    what_is_the_relationship_between_the_tweets_author_and_the_victim_mentioned: the annotated relationship labels

    relative_to_the_time_of_the_tweet_when_did_the_mentioned_death_occur: the annotated relative time labels

    user_is_verified: if the user is verified or not

    user_gender: the gender of the Twitter user (from the user profile)

    user_device: the Twitter client the user uses

    has_media: if the tweet has any attached media

    has_url: if the tweet text contains a URL

    matched_device: the device (Apple or Android) based on the Twitter client

    regex_gender: the gender inferred from regular expression-based filtering

    regex_relationship: the relationship label from regular expression-based filtering

    Inferring gender using regular expressions

    We first determine the mapping between different relationship labels mentioned in the tweet to the gender. We do not use any relationship like "cousin" from which we cannot easily infer the gender.

    Male relationships: 'father', 'dad', 'daddy', 'papa', 'pop', 'pa', 'son', 'brother', 'uncle', 'nephew', 'grandfather', 'grandpa', 'gramps', 'husband', 'boyfriend', 'fiancé', 'groom', 'partner', 'beau', 'friend', 'buddy', 'pal', 'mate', 'companion', 'boy', 'gentleman', 'man', 'father-in-law', 'brother-in-law', 'stepfather', 'stepbrother'

    Female relationships: 'mother', 'mom', 'mama', 'mum', 'ma', 'daughter', 'sister', 'aunt', 'niece', 'grandmother', 'grandma', 'granny', 'wife', 'girlfriend', 'fiancée', 'bride', 'partner', 'girl', 'lady', 'woman', 'miss', 'mother-in-law', 'sister-in-law', 'stepmother', 'stepsister'

    Based on these mappings, we used the following regex for each gender label to determine the gender of the deceased mentioned in the tweet.

    "[m|M]y\s(" + "|".join([r + "s?" for r in relationships]) + ")\s(died|succumbed|deceased)"

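    The expression above can be assembled and tried out as follows. This is a sketch rather than the authors' code: the relationship lists and the pattern are copied from the description, while the raw-string spelling and the helper names build_pattern and infer_genders are assumptions for illustration. Since 'partner' appears in both lists, the helper returns every label that matches instead of picking one.

```python
import re

# Relationship lists copied from the description.
MALE = ['father', 'dad', 'daddy', 'papa', 'pop', 'pa', 'son', 'brother',
        'uncle', 'nephew', 'grandfather', 'grandpa', 'gramps', 'husband',
        'boyfriend', 'fiancé', 'groom', 'partner', 'beau', 'friend', 'buddy',
        'pal', 'mate', 'companion', 'boy', 'gentleman', 'man',
        'father-in-law', 'brother-in-law', 'stepfather', 'stepbrother']
FEMALE = ['mother', 'mom', 'mama', 'mum', 'ma', 'daughter', 'sister', 'aunt',
          'niece', 'grandmother', 'grandma', 'granny', 'wife', 'girlfriend',
          'fiancée', 'bride', 'partner', 'girl', 'lady', 'woman', 'miss',
          'mother-in-law', 'sister-in-law', 'stepmother', 'stepsister']


def build_pattern(relationships):
    """Build the regex from the description, written with raw strings.

    Note: the character class [m|M] also accepts a literal "|".
    """
    return re.compile(
        r"[m|M]y\s("
        + "|".join(r + "s?" for r in relationships)
        + r")\s(died|succumbed|deceased)"
    )


PATTERNS = {"male": build_pattern(MALE), "female": build_pattern(FEMALE)}


def infer_genders(text):
    """Return every gender label whose pattern matches the text."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(text)]


print(infer_genders("My father died last week"))
print(infer_genders("my partner died"))
```
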
    Age groups from relationship labels

    First, we get the relationship labels using regex filtering, and then we group them into different age-group categories as shown in the following table. The UK and the US use different age groups because of the different age group definitions in the official data.

    Category      Relationship (from tweets)    Age Group (UK)  Age Group (US)
    Grandparents  grandfather, grandmother      65+             65+
    Parents       father, mother, uncle, aunt   45-64           35-64
    Siblings      brother, sister, cousin       15-44           15-34
    Children      son, daughter, nephew, niece  0-14            0-14
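
    The grouping in the table can be expressed as a small lookup. The sketch below copies the categories, relationships and age ranges from the table; the data layout and the function name age_group are illustrative choices, not part of the dataset.

```python
# Category -> (relationships, UK age group, US age group), copied from the table.
AGE_GROUPS = {
    "Grandparents": (["grandfather", "grandmother"], "65+", "65+"),
    "Parents": (["father", "mother", "uncle", "aunt"], "45-64", "35-64"),
    "Siblings": (["brother", "sister", "cousin"], "15-44", "15-34"),
    "Children": (["son", "daughter", "nephew", "niece"], "0-14", "0-14"),
}


def age_group(relationship, country):
    """Return the age group for a relationship label, or None if unmapped.

    country must be "UK" or "US", the two definitions given in the table.
    """
    if country not in ("UK", "US"):
        raise ValueError("country must be 'UK' or 'US'")
    for relationships, uk_group, us_group in AGE_GROUPS.values():
        if relationship in relationships:
            return uk_group if country == "UK" else us_group
    return None


print(age_group("uncle", "UK"))
print(age_group("cousin", "US"))
```
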

    Training the classifier

    The 'english-training.csv' file contains about 13k deduplicated human-annotated tweets. We use a random seed (42) to create the train/test split. The model Covid-Bert-V2 was fine-tuned on the training set for 2 epochs with the following hyperparameters (obtained using 10-fold CV): random_seed: 42, batch_size: 32, dropout: 0.1. We obtained an F1-score of 0.81 on the test set. We used about 5% (671) of the combined and deduplicated annotated tweets as the test set and about 2% (255) as the validation set; the remaining 12,494 tweets were used for fine-tuning the model. The tweets were preprocessed to replace mentions, URLs, emojis, etc. with generic keywords. The model was trained on a system with a single Nvidia A4000 16GB GPU. The fine-tuned model is also available as the 'model.bin' file. The code for fine-tuning the model and reproducing the experiments is available in this GitHub repository.
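
    For orientation, a seeded split with the sizes quoted above might look like the sketch below. The description does not say which tool or procedure produced the actual split, so the plain shuffle-and-slice approach and the function name split_indices are assumptions; only the seed (42) and the three subset sizes come from the text, and the resulting subsets will not reproduce the authors' exact partition.

```python
import random


def split_indices(n_items, n_test, n_val, seed=42):
    """Shuffle item indices with a fixed seed and slice them into three parts."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


# Sizes quoted in the description: 12,494 train, 255 validation, 671 test.
total = 12494 + 255 + 671
train, val, test = split_indices(total, n_test=671, n_val=255)
print(len(train), len(val), len(test))
```
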

    Datasheet

    We also include a datasheet for the dataset following the recommendation of "Datasheets for Datasets" (Gebru et al.), which provides more information about how the dataset was created and how it can be used. Please see "Datasheet.pdf".

    NOTE: We recommend that researchers try to rehydrate the individual tweets to ensure that the user has not deleted the tweet since posting. This gives users a mechanism to opt out of having their data analyzed.

    Please only use your institutional email when requesting the dataset as anything else (like gmail.com) will be rejected. The dataset will only be made available on reasonable request for Academics and Researchers. Please mention why you need the dataset and how you plan to use the dataset when making a request.

  13. Data from: French Entity-Linking dataset between annotated tweets collected...

    • data.niaid.nih.gov
    Updated Mar 25, 2023
    Cite
    Caillaut (2023). French Entity-Linking dataset between annotated tweets collected during major crises in France and French Wikipedia corpus [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7767293
    Dataset updated
    Mar 25, 2023
    Dataset authored and provided by
    Caillaut
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    France, French
    Description

    Most of the available datasets are not particularly adapted to our target application: geolocate natural disasters from social networks. First, social media posts are largely underrepresented in these datasets, and the only Twitter dataset lacks Entity-Linking annotations. Second, none of the datasets focuses on a crisis or natural disaster event.

    To mitigate these issues, we extracted a collection of French tweets written during earthquakes and major floods that have occurred in France in recent years. We set up Label-Studio in order to annotate these tweets. A total of 4617 tweets were annotated, including 1678 tweets posted during earthquakes and 2939 during floods. For each annotated tweet, mentions were annotated using the set of labels described earlier in the paper as well as, when possible, the target Wikipedia title.

    Named “RéSoCIO” in reference to the research project in which it was carried out, the dataset resulting from this work contains a total of 12,828 annotated mentions and 1,513 distinct Wikipedia entities. 85% of mentions were associated with a Wikipedia page, and 94% if we ignore the RISKNAT and DAMAGES labels, which are often difficult to map to an existing entity.

    Labels     #Mentions  #Linked  #Entities
    PERSON     315        263      136
    ORG        863        790      281
    GEOLOC     4375       4234     701
    TRANSPORT  250        203      101
    EVENT      35         21       16
    FACILITY   129        94       49
    RISKNAT    5502       4994     128
    DAMAGES    1136       121      56
    OTHER      223        200      46
    Total      12828      1322     1513

    Overview of the mentions annotated in the Twitter dataset. #Mentions shows the total number of mentions per label, #Linked the number of mentions linked to an entity and #Entities the number of distinct entities per label present in the dataset.
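
    The linking rates quoted for the Twitter dataset (85% overall, 94% without RISKNAT and DAMAGES) can be recomputed from the per-label rows of the table. The sketch below does that arithmetic; the figures are copied from the per-label rows, and the function name link_rate is an illustrative choice.

```python
# Label -> (#Mentions, #Linked), copied from the per-label rows of the table.
ROWS = {
    "PERSON": (315, 263),
    "ORG": (863, 790),
    "GEOLOC": (4375, 4234),
    "TRANSPORT": (250, 203),
    "EVENT": (35, 21),
    "FACILITY": (129, 94),
    "RISKNAT": (5502, 4994),
    "DAMAGES": (1136, 121),
    "OTHER": (223, 200),
}


def link_rate(rows, exclude=()):
    """Share of mentions linked to an entity, optionally skipping some labels."""
    mentions = sum(m for label, (m, _) in rows.items() if label not in exclude)
    linked = sum(l for label, (_, l) in rows.items() if label not in exclude)
    return linked / mentions


overall = link_rate(ROWS)
core = link_rate(ROWS, exclude=("RISKNAT", "DAMAGES"))
print(f"{overall:.1%} overall, {core:.1%} without RISKNAT and DAMAGES")
```
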

    Labels     #Mentions  #Linked   #Entities
    PERSON     1100102    1098406   557697
    ORG        750925     749504    130394
    GEOLOC     2729702    2728296   215924
    TRANSPORT  161539     160487    53405
    EVENT      798433     798251    86471
    FACILITY   258835     258513    109867
    RISKNAT    5502       4994      127
    DAMAGES    1136       121       56
    OTHER      4340621    4339658   682458
    Total      10146795   10138230  1836399

    Overview of the mentions annotated in the full dataset. #Mentions shows the total number of mentions per label, #Linked the number of mentions linked to an entity and #Entities the number of distinct entities per label present in the dataset.

  14. Twitter data for "Remapping and visualizing baseball labor"

    • iro.uiowa.edu
    zip
    Updated Dec 13, 2017
    Cite
    Katherine Walden (2017). Twitter data for "Remapping and visualizing baseball labor" [Dataset]. https://iro.uiowa.edu/esploro/outputs/dataset/Twitter-data-for-Remapping-and-visualizing/9983736668802771
    Available download formats: zip (470983 bytes)
    Dataset updated
    Dec 13, 2017
    Dataset provided by
    University of Iowa
    Authors
    Katherine Walden
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Time period covered
    2019
    Description

    Recent baseball scholarship has drawn attention to U.S. professional baseball’s complex twentieth century labor dynamics and expanding global presence. From debates around desegregation to discussions about the sport’s increasingly multicultural identity and global presence, the cultural politics of U.S. professional baseball is connected to the problem of baseball labor. However, most scholars address these topics by focusing on Major League Baseball (MLB), ignoring other teams and leagues—Minor League Baseball (MiLB)—that develop players for Major League teams. Considering Minor League Baseball is critical to understanding the professional game in the United States, since players who populate Major League rosters constitute a fraction of U.S. professional baseball’s entire labor force. As a digital humanities dissertation on baseball labor and globalization, this project uses digital humanities approaches and tools to analyze and visualize a quantitative data set, exploring how Minor League Baseball relates to and complicates MLB-dominated narratives around globalization and diversity in U.S. professional baseball labor. This project addresses how MiLB demographics and global dimensions shifted over time, as well as how the timeline and movement of foreign-born players through the Minor Leagues differs from their U.S.-born counterparts. This project emphasizes the centrality and necessity of including MiLB data in studies of baseball’s labor and ideological significance or cultural meaning, making that argument by drawing on data analysis, visualization, and mapping to address how MiLB labor complicates or supplements existing understandings of the relationship between U.S. professional baseball’s global reach and “national pastime” claims.

  15. A dataset of Spanish tweets on people and communities LGBTQI+ during the...

    • produccioncientifica.uhu.es
    • zenodo.org
    Updated 2025
    Cite
    Mata, Jacinto; Gualda, Estrella (2025). A dataset of Spanish tweets on people and communities LGBTQI+ during the COVID-19 pandemic 2020-2022 [LGBTQI+ Dataset 2020-2022_es] [Dataset]. https://produccioncientifica.uhu.es/documentos/67bc32b7478fbf5d29390ca9
    Dataset updated
    2025
    Authors
    Mata, Jacinto; Gualda, Estrella
    Description

    The LGBTQI+ Dataset 2020-2022_es is a collection of 410,015 original tweets extracted from the social network Twitter between January 1, 2020, and December 31, 2022. To ensure data quality and relevance, retweets, replies, and other duplicate content were excluded, retaining only original tweets. The tweets were collected by Jacinto Mata (University of Huelva, I2C/CITES) with the support of the Python programming language, using the twarc2 tool and the Academic API v2 of Twitter. This data collection is part of the project “Conspiracy Theories and Hate Speech Online: Comparison of patterns in narratives and social networks about COVID-19, immigrants and refugees and LGBTI people [NON-CONSPIRA-HATE!]”, PID2021-123983OB-I00, funded by MCIN/AEI/10.13039/501100011033/ by FEDER/EU.

    The search criteria (words and hashtags) used for the data collection followed the objectives of the aforementioned project and were defined by Estrella Gualda, Francisco Javier Santos Fernández and Jacinto Mata (University of Huelva, Spain). Terms and hashtags used for the search and extraction of tweets were: #orgullogay, #orgullotrans, #OrgulloLGTB, #OrgulloLGTBI, #Díadelorgullo, #TRANSFOBIA, #transexuales, #LGTB, #LGTBI, #LGTBIQ, #LGTBQ, #LGTBQ+, anti-gay, "anti gay", anti-trans, "anti trans", "Ley Anti-LGTB", "ley trans", "anti-ley trans".

    This dataset, collected in the frame of the NON-CONSPIRA-HATE! project, aims to identify and map online hate speech narratives and conspiracy theories towards LGBTIQ+ people and community. Additionally, the dataset is intended to compare communication patterns in social media (rhetoric, language, micro-discourses, semantic networks, emotions, etc.) deployed in different datasets collected in this project. This dataset also contributes to mapping the actors, communities, and networks that spread hate messages and conspiracy theories, aiming to understand the patterns and strategies implemented by extremist sectors on social media. The dataset includes messages that address a wide range of topics related to the LGBTQI+ community, such as rights, visibility, the fight against discrimination and transphobia, as well as debates surrounding the Trans Law and other related issues. It includes expressions of support and celebration of Pride as well as hate speech and opposition to LGBTQI+ rights, along with debates and controversies surrounding these issues.

    This dataset offers a wide range of possibilities for research in various disciplines, as the following examples illustrate:

    Social Sciences & Digital Humanities:
    - Analysis of opinions, attitudes, and trends toward LGBTIQ+ people and the LGBTIQ+ community.
    - Studies on the evolution of public discourse and polarization around issues such as transphobia, hate speech, disinformation, LGBTIQ+ rights and pride, and others.
    - Analysis of social and political actors, leaders, or organizations disseminating diverse narratives on LGBTIQ+ issues.
    - Research on the impact of specific events (e.g., Pride Day) on social media conversations.
    - Investigations of social and semantic networks around LGBTIQ+ people and the LGBTIQ+ community.
    - Analysis of narratives, discourses, and rhetoric around gender identity and sexual diversity.
    - Comparative studies on the representation of LGBTIQ+ people and the LGBTIQ+ community in different cultural or geographic contexts.

    Computer Science and Artificial Intelligence:
    - Development of algorithms for the automatic detection of hate speech, discriminatory language, or offensive content.
    - Training natural language processing (NLP) models to analyze sentiments and emotions in texts related to LGBTIQ+ people and the LGBTIQ+ community.

    For more information on other technical details of the dataset and the structure of the .jsonl data, see the “Readme.txt” file.
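Since the data is distributed as .jsonl (one JSON object per line), a first pass over it might look like the minimal Python sketch below. The record layout is an assumption: the field name `text` and the sample records are invented for illustration, and the actual structure is the one documented in the “Readme.txt” file.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_matching(records: Iterable[dict], term: str, field: str = "text") -> int:
    """Count records whose `field` contains `term`, ignoring case.

    The default field name "text" is a hypothetical choice, not taken
    from the dataset's documentation.
    """
    needle = term.lower()
    return sum(1 for r in records if needle in str(r.get(field, "")).lower())


# Invented in-memory sample standing in for the real file.
sample = [
    '{"id": "1", "text": "Feliz #OrgulloLGTBI"}',
    "",
    '{"id": "2", "text": "Debate sobre la ley trans"}',
]
records = list(iter_jsonl(sample))
print(len(records), count_matching(records, "#orgullolgtbi"))
```

For a real file, the same generator can be fed an open file handle, for example `with open(path, encoding="utf-8") as fh: records = list(iter_jsonl(fh))`, where `path` is wherever the downloaded file was saved.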

  16. X/Twitter: Countries with the largest audience 2025

    • statista.com
    Updated Jun 19, 2025
    Statista (2025). X/Twitter: Countries with the largest audience 2025 [Dataset]. https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/
    Dataset updated
    Jun 19, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2025
    Area covered
    Worldwide
    Description

    Social network X/Twitter is particularly popular in the United States, and as of February 2025, the microblogging service had an audience reach of 103.9 million users in the country. Japan and India were ranked second and third with more than 70 million and 25 million users respectively.

    Global Twitter usage

    As of the second quarter of 2021, X/Twitter had 206 million monetizable daily active users worldwide. The most-followed Twitter accounts include figures such as Elon Musk, Justin Bieber and former U.S. president Barack Obama.

    X/Twitter and politics

    X/Twitter has become an increasingly relevant tool in domestic and international politics. The platform has become a way to promote policies and interact with citizens and other officials, and most world leaders and foreign ministries have an official Twitter account. Former U.S. president Donald Trump used to be a prolific Twitter user before the platform permanently suspended his account in January 2021. During an August 2018 survey, 61 percent of respondents stated that Trump's use of Twitter as President of the United States was inappropriate.

  17. Live Maps (Mature)

    • data-salemva.opendata.arcgis.com
    Updated Jun 16, 2016
    esri_en (2016). Live Maps (Mature) [Dataset]. https://data-salemva.opendata.arcgis.com/items/d74ca978920a4f31aab9fdbc4ff1ef1a
    Dataset updated
    Jun 16, 2016
    Dataset provided by
    Esri (http://esri.com/)
    Authors
    esri_en
    Description

    Live Maps is a configurable app template that provides the ability to consume live data feeds from a variety of sources.

    Use Cases
    - Provide a map that shows locations of health care facilities and the reported cases of influenza.
    - Present the locations of political campaign events with related tweets.

    Configurable Options
    Live Maps is used to combine social media feeds with your operational content. It can be configured using the following options:
    - Map: Choose the web map used in your application.
    - Title: The application name displayed in the header.
    - Subtitle: The application subtitle displayed in the header.
    - Color: Choose the color scheme for the application.
    - Feed: The live feed to use in the application; currently supports Twitter, Flickr, and SickWeather.
    - Keyword: Optional search keyword for feeds like Twitter and Flickr.
    - Interval: The interval in minutes to switch between records.
    - Refresh interval: The interval in minutes to refresh the feed.

    Supported Devices
    This application is responsively designed to support use in browsers on desktops, mobile phones, and tablets.

    Data Requirements
    This application has no data requirements.

    Get Started
    This application can be created in the following ways:
    - Click the Create a Web App button on this page
    - Share a map and choose to Create a Web App
    - On the Content page, click Create - App - From Template

    Click the Download button to access the source code. Do this if you want to host the app on your own server and optionally customize it to add features or change styling.

  18. Mapping ecological concepts using twitter

    • figshare.com
    xml
    Updated May 30, 2023
    Timothée Poisot (2023). Mapping ecological concepts using twitter [Dataset]. http://doi.org/10.6084/m9.figshare.827286.v1
    Available download formats: xml
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Timothée Poisot
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Interactions between key concepts mentioned on Twitter in tweets containing words from the field of ecology. See the URL for more details on the methodology. These data come from a series of relatively short sampling sessions.

  19. Replication Tweet Data of "Does the rich man’s club employ social media to...

    • dataverse.harvard.edu
    Updated Jan 20, 2025
    Md. Saidur Rahman Khan (2025). Replication Tweet Data of "Does the rich man’s club employ social media to advance digital public diplomacy? Mapping the interactional network dynamics of OECD leaders’ cross-border communication on X (formerly Twitter)" [Dataset]. http://doi.org/10.7910/DVN/STHR49
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Md. Saidur Rahman Khan
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains the tweets, mentions, identities of mentioned persons, hashtags, and X URLs posted by OECD leaders during the study period.

  20. DeepCube: Post-processing and annotated datasets of social media data

    • data.niaid.nih.gov
    Updated Mar 15, 2024
    Alexandros Mokas (2024). DeepCube: Post-processing and annotated datasets of social media data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7732930
    Dataset updated
    Mar 15, 2024
    Dataset provided by
    Eleni Kamateri
    Giannis Tsampoulatidis
    Alexandros Mokas
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Researcher(s): Alexandros Mokas, Eleni Kamateri

    Supervisor: Ioannis Tsampoulatidis

    This repository contains 3 social media datasets:

    2 Post-processing datasets: These datasets contain post-processing data extracted from the analysis of social media posts collected for two different use cases during the first two years of the DeepCube project. More specifically, these include:

    The UC2 dataset contains the post-processing analysis of the Twitter data collected for the DeepCube use case (UC2) dealing with climate-induced migration in Africa. It contains 5,695,253 social media posts in total, collected from Twitter using the initial version of the UC2 search criteria defined by Universitat De Valencia, focused on Ethiopia and Somalia and covering 26 June 2021 to March 2023.

    The UC5 dataset contains the post-processing analysis of the Twitter and Instagram data collected for the DeepCube use case (UC5) related to sustainable and environmentally friendly tourism. It contains 58,143 social media posts in total (12,881 collected from Twitter and 45,262 collected from Instagram), gathered using the initial version of the UC5 search criteria defined by MURMURATION SAS, focused on Brazil and covering 26 June 2021 to March 2023.

    1 Annotated dataset: An additional annotated dataset was created that contains post-processing data along with annotations of Twitter posts collected for UC2 for the years 2010-2022. More specifically, it includes:

    The UC2 dataset contains the post-processing of the Twitter data collected for the DeepCube use case (UC2) dealing with climate-induced migration in Africa. It contains 1,721 annotated social media posts in total (412 relevant and 1,309 irrelevant), collected from Twitter, focused on Somalia and covering 1 January 2010 to 31 December 2022.

    For every social media post retrieved from Twitter and Instagram, a preprocessing step was performed. This involved a three-step analysis of each post using the appropriate web service. First, the location of the post was automatically extracted from the text using a location extraction service. Second, the images included in the post were analyzed using a concept extraction service, which identified and provided the top ten concepts that best described the image. These concepts included items such as "person," "building," "drought," "sun," and so on. Finally, the sentiment expressed in the post's text was determined by using a sentiment analysis service. The sentiment was classified as either positive, negative, or neutral.
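The three-step analysis described above can be pictured with the following Python sketch. It is only an illustration under stated assumptions: the `Post` and `Processed` types, the `preprocess` function, and the three callables standing in for the location, concept, and sentiment services are all hypothetical names, since the dataset description does not document the actual service interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

SENTIMENTS = {"positive", "negative", "neutral"}


@dataclass
class Post:
    """Hypothetical minimal representation of a social media post."""
    text: str
    image_urls: List[str] = field(default_factory=list)


@dataclass
class Processed:
    location: Optional[str]
    concepts: List[List[str]]  # one ranked concept list per image
    sentiment: str


def preprocess(
    post: Post,
    locate: Callable[[str], Optional[str]],
    describe_image: Callable[[str], List[str]],
    classify: Callable[[str], str],
    top_k: int = 10,
) -> Processed:
    """Run the three steps in order; the callables are stand-ins for web services."""
    location = locate(post.text)                      # step 1: location from text
    concepts = [describe_image(url)[:top_k] for url in post.image_urls]  # step 2
    sentiment = classify(post.text)                   # step 3: sentiment of text
    if sentiment not in SENTIMENTS:
        raise ValueError(f"unexpected sentiment label: {sentiment!r}")
    return Processed(location, concepts, sentiment)


# Stub services with invented behaviour, used only to exercise the sketch.
result = preprocess(
    Post("Severe drought reported near Baidoa", ["img-1"]),
    locate=lambda text: "Baidoa" if "Baidoa" in text else None,
    describe_image=lambda url: [f"concept-{i}" for i in range(12)],
    classify=lambda text: "negative",
)
print(result.location, len(result.concepts[0]), result.sentiment)
```

Passing the services in as callables keeps the sketch independent of any particular provider; real services would replace the stub lambdas.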

    After the social media posts were preprocessed, they were visualized using the Social Media Web Application. This intuitive, user-friendly online application was designed for both expert and non-expert users and offers a web-based user interface for filtering and visualizing the collected social media data. The application provides various filtering options, an interactive map, a timeline, and a collection of graphs to help users analyze the data. Moreover, this application provides users with the option to download aggregated data for specific periods by applying filters and clicking the "Download Posts" button. This feature allows users to easily extract and analyze social media data outside of the web application, providing greater flexibility and control over data analysis.

    The dataset is provided by INFALIA. INFALIA, being a spin-off of the CERTH institute and a partner of an EU research project, releases this dataset containing Tweet IDs and post pre-processing data for the sole purpose of enabling the validation of the research conducted within the DeepCube project. Moreover, Twitter Content provided in this dataset to third parties remains subject to the Twitter Policy, and those third parties must agree to the Twitter Terms of Service, Privacy Policy, Developer Agreement, and Developer Policy (https://developer.twitter.com/en/developer-terms) before receiving this download.
