49 datasets found
  1. Data from: The LGBTQ+ Minority Stress on Social Media (MiSSoM) Dataset: A...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jan 13, 2024
    Cite
    Cory Cascalheira (2024). The LGBTQ+ Minority Stress on Social Media (MiSSoM) Dataset: A Labeled Dataset for Natural Language Processing and Machine Learning [Dataset]. http://doi.org/10.7910/DVN/GPRSXH
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 13, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Cory Cascalheira
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Aug 18, 2012 - Sep 18, 2021
    Dataset funded by
    New Mexico State University Adams-Cahill Graduate Research Award
    American Psychological Association Early Graduate Research Award
    National Institute on Alcohol Abuse and Alcoholism
    Ewha Women’s University
    National Institute on Drug Abuse
    National Institutes of Health
    National Science Foundation
    American Psychological Association Michael Sullivan Diversity Scholarship
    Description

    Minority stress is the leading theoretical construct for understanding LGBTQ+ health disparities. As such, there is an urgent need to develop innovative policies and technologies to reduce minority stress. To spur technological innovation, we created the largest labeled datasets on minority stress using natural language from subreddits related to sexual and gender minority people. A team of mental health clinicians, LGBTQ+ health experts, and computer scientists developed two datasets: (1) the publicly available LGBTQ+ Minority Stress on Social Media (MiSSoM) dataset and (2) the advanced request-only version of the dataset, LGBTQ+ MiSSoM+. Both datasets have seven labels related to minority stress, including an overall composite label and six sublabels. LGBTQ+ MiSSoM (N = 27,709) includes both human- and machine-annotated labels and comes preprocessed with features (e.g., topic models, psycholinguistic attributes, sentiment, clinical keywords, word embeddings, n-grams, lexicons). LGBTQ+ MiSSoM+ includes all the characteristics of the open-access dataset, but also includes the original Reddit text and sentence-level labeling for a subset of posts (N = 5,772). Benchmark supervised machine learning analyses revealed that features of the LGBTQ+ MiSSoM datasets can predict overall minority stress quite well (F1 = 0.869). Benchmark performance in predicting the other labels, namely prejudiced events (F1 = 0.942), expected rejection (F1 = 0.964), internalized stigma (F1 = 0.952), identity concealment (F1 = 0.971), gender dysphoria (F1 = 0.947), and minority coping (F1 = 0.917), was excellent.

  2. Adverse effects of using the Internet and social networking websites or apps...

    • pilot.open.canada.ca
    • www150.statcan.gc.ca
    • +2 more
    csv, html, xml
    Updated Jan 17, 2023
    + more versions
    Cite
    Statistics Canada (2023). Adverse effects of using the Internet and social networking websites or apps by gender and age group, inactive [Dataset]. https://pilot.open.canada.ca/data/dataset/80c88ac9-8ea1-4ff7-856e-560f7683d660
    Explore at:
    Available download formats: html, csv, xml
    Dataset updated
    Jan 17, 2023
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    Percentage of Internet users who have experienced selected personal effects in their life because of the Internet and the use of social networking websites or apps, during the past 12 months.

  3. Selected social outcomes of using the Internet and social networking...

    • ouvert.canada.ca
    • www150.statcan.gc.ca
    • +1 more
    csv, html, xml
    Updated Jan 17, 2023
    Cite
    Statistics Canada (2023). Selected social outcomes of using the Internet and social networking websites or apps by gender and age group [Dataset]. https://ouvert.canada.ca/data/dataset/971e1d31-a88f-41f6-a68d-1e1f236da491
    Explore at:
    Available download formats: xml, csv, html
    Dataset updated
    Jan 17, 2023
    Dataset provided by
    Statistics Canada
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    Percentage of Canadians who have experienced selected personal effects in their life because of the Internet and the use of social networking websites or apps, during the past 12 months.

  4. Data from: Dataset relating the Social activity of Open Research Data on...

    • zenodo.org
    bin, csv
    Updated Jul 17, 2024
    + more versions
    Cite
    Juliana Elisa Raffaghelli; Stefania Manca (2024). Dataset relating the Social activity of Open Research Data on ResearchGate [Dataset]. http://doi.org/10.5281/zenodo.5554565
    Explore at:
    Available download formats: bin, csv
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Juliana Elisa Raffaghelli; Stefania Manca
    Description

    The potential of Open Research Data (ORD) within the context of open science and digital scholarship can be frustrated if data remains unused. Although current research has investigated the way ORD is being published, researchers’ behaviour in publishing and sharing ORD on academic social networks (ASN) remains insufficiently explored. The research to which this dataset is connected aims to illustrate some parameters of social activity around self-archived ORD on ResearchGate. The study analyses whether ORD publication leads to social activity (reads and citations) around the ORD and their linked published articles, including possible associations between the social activity and the researchers’ profile (scientific domain, gender, region, professional position, reputation) as well as the quality of the ORD published.

    The current dataset consists of:

    A- The .csv file, extracted as a random sample of 752 ORD items from ResearchGate. The dataset has been cleaned and anonymized. The variables relating to the researchers' profiles and to citations of the ORD-linked research were obtained from the researchers' profiles and published research. However, for the purpose of anonymisation, these variables have been coded and the original information removed.

    B- The codebook explaining the variables and metrics contained in the file (A)

    C- The R script. This script contains a number of explorations not reported in the final paper. The quantitative techniques applied include descriptive statistics, logistic regression and K-means cluster analysis.

    D- Five tables, three figures and two annexes (Logistic Regression I and II) created on the basis of the dataset (A)

    The results have been interpreted in terms of three main aspects.

    • First, social activity around self-archived ORD on ResearchGate (operationalized as reads and citations) remains underdeveloped overall, regardless of the quality of the published ORD.
    • Second, the moderating effects on ORD proved relevant, revealing traditional dynamics within the “innovative” practice of engaging with data.
    • Third, ResearchGate as an ASN appears to be in a situation similar to that of other data platforms and repositories in terms of social activity around ORD.


  5. Instagram: distribution of global audiences 2024, by age and gender

    • statista.com
    Updated Jun 17, 2025
    Cite
    Stacy Jo Dixon (2025). Instagram: distribution of global audiences 2024, by age and gender [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of April 2024, around 16.5 percent of global active Instagram users were men between the ages of 18 and 24 years. More than half of the global Instagram population was aged 34 years or younger.

    Teens and social media

    As one of the biggest social networks worldwide, Instagram is especially popular with teenagers. As of fall 2020, the photo-sharing app ranked third among teenagers' preferred social networks in the United States, behind Snapchat and TikTok. Instagram was one of the most influential advertising channels among female Gen Z users when making purchasing decisions. Teens report feeling more confident, popular, and better about themselves when using social media, and less lonely, depressed and anxious.

    Social media can also have negative effects on teens, and these are much more pronounced among those with low emotional well-being. It was found that 35 percent of teenagers with low social-emotional well-being reported having experienced cyberbullying when using social media, while in comparison only five percent of teenagers with high social-emotional well-being stated the same. As such, social media can have a big impact on already fragile states of mind.
    
  6. Instagram: distribution of global audiences 2024, by gender

    • statista.com
    Updated Jun 17, 2025
    Cite
    Stacy Jo Dixon (2025). Instagram: distribution of global audiences 2024, by gender [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of January 2024, Instagram was slightly more popular with men than women, with men accounting for 50.6 percent of the platform’s global users. Additionally, the social media app was most popular amongst younger audiences, with almost 32 percent of users aged between 18 and 24 years.

    Instagram’s Global Audience

    As of January 2024, Instagram was the fourth most popular social media platform globally, reaching two billion monthly active users (MAU). This number is projected to keep growing with no signs of slowing down, which is not a surprise as the global online social penetration rate across all regions is constantly increasing.

    As of January 2024, the country with the largest Instagram audience was India with 362.9 million users, followed by the United States with 169.7 million users.

    Who is winning over the generations?

    Even though Instagram’s audience is almost twice the size of TikTok’s on a global scale, TikTok has shown itself to be a fierce competitor, particularly amongst younger audiences. TikTok was the most downloaded mobile app globally in 2022, generating 672 million downloads. As of 2022, Generation Z in the United States spent more time on TikTok than on Instagram monthly.
    
  7. Navigating News Narratives: A Media Bias Analysis Dataset

    • figshare.com
    txt
    Updated Dec 8, 2023
    + more versions
    Cite
    Shaina Raza (2023). Navigating News Narratives: A Media Bias Analysis Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.24422122.v4
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    figshare
    Authors
    Shaina Raza
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The prevalence of bias in the news media has become a critical issue, affecting public perception on a range of important topics such as political views, health, insurance, resource distributions, religion, race, age, gender, occupation, and climate change. The media has a moral responsibility to ensure accurate information dissemination and to increase awareness about important issues and the potential risks associated with them. This highlights the need for a solution that can help mitigate the spread of false or misleading information and restore public trust in the media.

    Data description: This is a dataset for news media bias covering different dimensions of bias: political, hate speech, toxicity, sexism, ageism, gender identity, gender discrimination, race/ethnicity, climate change, occupation, and spirituality, which makes it a unique contribution. The dataset used for this project does not contain any personally identifiable information (PII).

    The data structure is tabulated as follows:

    • Text: The main content.
    • Dimension: Descriptive category of the text.
    • Biased_Words: A compilation of words regarded as biased.
    • Aspect: Specific sub-topic within the main content.
    • Label: Indicates the presence (True) or absence (False) of bias. The label is ternary: highly biased, slightly biased, and neutral.
    • Toxicity: Indicates the presence (True) or absence (False) of bias.
    • Identity_mention: Mention of any identity based on word match.

    Annotation Scheme

    The labels and annotations in the dataset are generated through a system of Active Learning, cycling through:

    • Manual Labeling
    • Semi-Supervised Learning
    • Human Verification

    The scheme comprises:

    • Bias Label: Specifies the degree of bias (e.g., no bias, mild, or strong).
    • Words/Phrases Level Biases: Pinpoints specific biased terms or phrases.
    • Subjective Bias (Aspect): Highlights biases pertinent to content dimensions.

    Due to the nuances of semantic match algorithms, certain labels such as 'identity' and 'aspect' may appear distinctively different.

    List of datasets used: We curated different news categories, such as climate crisis news summaries, occupational, and spiritual/faith/general news, using RSS to capture different dimensions of news media bias. The annotation is performed using active learning to label each sentence (neutral, slightly biased, or highly biased) and to pick biased words from the news.

    We also utilize publicly available data from the following sources, with attribution to their authors:

    • MBIC (media bias): Spinde, Timo, Lada Rudnitckaia, Kanishka Sinha, Felix Hamborg, Bela Gipp, and Karsten Donnay. "MBIC--A Media Bias Annotation Dataset Including Annotator Characteristics." arXiv preprint arXiv:2105.11910 (2021). https://zenodo.org/records/4474336
    • Hyperpartisan news: Kiesel, Johannes, Maria Mestre, Rishabh Shukla, Emmanuel Vincent, Payam Adineh, David Corney, Benno Stein, and Martin Potthast. "Semeval-2019 task 4: Hyperpartisan news detection." In Proceedings of the 13th International Workshop on Semantic Evaluation, pp. 829-839. 2019. https://huggingface.co/datasets/hyperpartisan_news_detection
    • Toxic comment classification: Adams, C.J., Jeffrey Sorensen, Julia Elliott, Lucas Dixon, Mark McDonald, Nithum, and Will Cukierski. 2017. "Toxic Comment Classification Challenge." Kaggle. https://kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge.
    • Jigsaw Unintended Bias: Adams, C.J., Daniel Borkan, Inversion, Jeffrey Sorensen, Lucas Dixon, Lucy Vasserman, and Nithum. 2019. "Jigsaw Unintended Bias in Toxicity Classification." Kaggle. https://kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification.
    • Age Bias: Díaz, Mark, Isaac Johnson, Amanda Lazar, Anne Marie Piper, and Darren Gergle. "Addressing age-related bias in sentiment analysis." In Proceedings of the 2018 chi conference on human factors in computing systems, pp. 1-14. 2018. Age Bias Training and Testing Data - Age Bias and Sentiment Analysis Dataverse (harvard.edu)
    • Multi-dimensional news Ukraine: Färber, Michael, Victoria Burkard, Adam Jatowt, and Sora Lim. "A multidimensional dataset based on crowdsourcing for analyzing and detecting news bias." In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 3007-3014. 2020. https://zenodo.org/records/3885351#.ZF0KoxHMLtV
    • Social biases: Sap, Maarten, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. "Social bias frames: Reasoning about social and power implications of language." arXiv preprint arXiv:1911.03891 (2019). https://maartensap.com/social-bias-frames/

    Goal of this dataset: We want to offer open and free access to the dataset, ensuring a wide reach to researchers and AI practitioners across the world. The dataset should be user-friendly, and uploading and accessing data should be straightforward, to facilitate usage.

    If you use this dataset, please cite us.

    Navigating News Narratives: A Media Bias Analysis Dataset © 2023 by Shaina Raza, Vector Institute is licensed under CC BY-NC 4.0
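
    As a rough illustration of the record layout listed in the description above, one row could be modeled as follows. This is only a sketch under stated assumptions: the class name BiasRecord, the helper is_biased, the Python types, and the exact label spellings are hypothetical and not taken from the dataset files. The description gives Label both as True/False and as ternary, so the sketch follows the ternary reading.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed spellings of the ternary label; the actual files may encode them differently.
ALLOWED_LABELS = {"highly biased", "slightly biased", "neutral"}


@dataclass
class BiasRecord:
    """One row, using the field names from the description (types are assumptions)."""
    text: str                                               # Text: the main content
    dimension: str                                          # Dimension: category of the text
    biased_words: List[str] = field(default_factory=list)   # Biased_Words
    aspect: str = ""                                        # Aspect: sub-topic
    label: str = "neutral"                                  # Label: ternary bias label
    toxicity: bool = False                                  # Toxicity: True/False flag
    identity_mention: str = ""                              # Identity_mention

    def __post_init__(self) -> None:
        # Reject labels outside the three values named in the description.
        if self.label not in ALLOWED_LABELS:
            raise ValueError(f"unexpected label: {self.label!r}")


def is_biased(record: BiasRecord) -> bool:
    """Collapse the ternary label into a binary biased / not-biased flag."""
    return record.label != "neutral"
```

    Collapsing the ternary label into a boolean, as is_biased does, is one possible way to reconcile the two descriptions of Label; whether the dataset authors intend this is not stated.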

  8. Which Gender Uses Social Media More By Region?

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). Which Gender Uses Social Media More By Region? [Dataset]. https://www.searchlogistics.com/learn/statistics/social-media-addiction-statistics/
    Explore at:
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Regional use of social media has a significant effect on the male and female social media statistics.

  9. NAMEXTEND Dataset

    • paperswithcode.com
    Updated Feb 2, 2025
    + more versions
    Cite
    Jonathan Drechsel; Steffen Herbold (2025). NAMEXTEND Dataset [Dataset]. https://paperswithcode.com/dataset/namextend
    Explore at:
    Dataset updated
    Feb 2, 2025
    Authors
    Jonathan Drechsel; Steffen Herbold
    Description

    This dataset extends NAMEXACT by including words that can be used as names, but may not exclusively be used as names in every context.

    Dataset Details

    Dataset Description

    Unlike NAMEXACT, this dataset contains words that are mostly used as names, but may also be used in other contexts, such as:

    • Christian (believer in Christianity)
    • Drew (simple past of the verb to draw)
    • Florence (an Italian city)
    • Henry (the SI unit of inductance)
    • Mercedes (a car brand)

    In addition, names with ambiguous gender are included, once for each gender. For instance, Skyler is included as a female (F) name with a probability of 37.3%, and as a male (M) name with a probability of 62.7%.

    Dataset Sources

    Repository: github.com/aieng-lab/gradiend

    Original Dataset: Gender by Name

    Dataset Structure

    • name: the name
    • gender: the gender of the name (M for male and F for female)
    • count: the count value of this name (raw value from the original dataset)
    • probability: the probability of this name (raw value from the original dataset; not normalized to this dataset!)
    • gender_agreement: a value describing the certainty that this name has an unambiguous gender, computed as the maximum probability of that name across both genders, e.g., max(37.3%, 62.7%) = 62.7% for Skyler. For names with a unique gender in this dataset, this value is 1.0.
    • primary_gender: equal to gender for names with a unique gender in this dataset; otherwise the gender with the higher probability for that name
    • genders: label B if both genders are contained for this name in this dataset, otherwise equal to gender
    • prob_F: the probability of that name being used as a female name (i.e., 0.0 or 1.0 if genders != B)
    • prob_M: the probability of that name being used as a male name
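
    The derived fields described above (gender_agreement, primary_gender, genders, prob_F, prob_M) can be sketched as a small function. This is a minimal illustration, not the authors' code: the function name derive_fields and the input format (a dict of per-gender shares for a single name, given as fractions) are assumptions made for the example.

```python
from typing import Dict


def derive_fields(prob_by_gender: Dict[str, float]) -> Dict[str, object]:
    """Derive the per-name fields from the share of each gender for one name.

    `prob_by_gender` maps "F" and/or "M" to the probability that the name is
    used with that gender (hypothetical input format, values as fractions).
    """
    if not prob_by_gender or not set(prob_by_gender) <= {"F", "M"}:
        raise ValueError("expected probabilities keyed by 'F' and/or 'M'")

    if len(prob_by_gender) == 1:
        # Name occurs with a single gender: agreement is 1.0 by definition.
        (gender,) = prob_by_gender
        return {
            "genders": gender,
            "primary_gender": gender,
            "gender_agreement": 1.0,
            "prob_F": 1.0 if gender == "F" else 0.0,
            "prob_M": 1.0 if gender == "M" else 0.0,
        }

    # Name occurs with both genders: the more probable gender is primary,
    # and the agreement is the maximum of the two probabilities.
    primary = max(prob_by_gender, key=prob_by_gender.get)
    return {
        "genders": "B",
        "primary_gender": primary,
        "gender_agreement": prob_by_gender[primary],
        "prob_F": prob_by_gender["F"],
        "prob_M": prob_by_gender["M"],
    }


# The Skyler example from the description: 37.3% female, 62.7% male.
skyler = derive_fields({"F": 0.373, "M": 0.627})
```

    For the Skyler example, this yields genders B, primary_gender M, and a gender_agreement of 0.627.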

    Dataset Creation

    Source Data

    The data is created by filtering Gender by Name.

    Data Collection and Processing

    The original data is filtered to contain only names with a count of at least 100 to remove very rare names. This threshold reduces the total number of names by 72%, from 133910 to 37425.
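
    The filtering step and the stated reduction can be checked with a few lines of Python. This is a sketch only: the function names and the (name, count) row format are hypothetical, while the threshold of 100 and the totals 133910 and 37425 come from the description.

```python
from typing import Iterable, List, Tuple

MIN_COUNT = 100  # threshold stated in the description


def filter_rare(rows: Iterable[Tuple[str, int]],
                min_count: int = MIN_COUNT) -> List[Tuple[str, int]]:
    """Keep only (name, count) rows whose count reaches the threshold."""
    return [(name, count) for name, count in rows if count >= min_count]


def reduction_percent(before: int, after: int) -> float:
    """Percentage of entries removed when going from `before` to `after`."""
    return 100.0 * (before - after) / before


# Totals quoted in the description: 133910 names before, 37425 after filtering.
print(round(reduction_percent(133910, 37425)))  # 72
```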

    Bias, Risks, and Limitations

    The original dataset provides counts of names (with their gender) for male and female babies from open-source government authorities in the US (1880-2019), UK (2011-2018), Canada (2011-2018), and Australia (1944-2019).

  10. Which Gender Uses Social Media More By Platform?

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). Which Gender Uses Social Media More By Platform? [Dataset]. https://www.searchlogistics.com/learn/statistics/social-media-addiction-statistics/
    Explore at:
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The results of which gender uses which platforms are in.

  11. Investigating the Relationship between Instagram and Gender of User

    • figshare.com
    txt
    Updated Sep 15, 2016
    Cite
    Chris P Crunch (2016). Investigating the Relationship between Instagram and Gender of User [Dataset]. http://doi.org/10.6084/m9.figshare.3830094.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Sep 15, 2016
    Dataset provided by
    figshare
    Authors
    Chris P Crunch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A random sample of 20 York University students was surveyed. Two questions were asked: 1) their biological sex, and 2) whether or not they had opened an account on Instagram, a social network for sharing photos and videos through the Instagram app on mobile devices. Of the 20 students, 19 were asked inside Lumbers, a building with laboratories for multiple scientific disciplines.

    In the collected data, "Gender" refers to the biological sex of the participant. F stands for female and M stands for male. The second variable, "Opened an Instagram Account", records whether the participant has ever registered for an Instagram account. The only responses to the second question were "yes" and "no".

  12. D1.1.ALLINTERACT_RawData

    • zenodo.org
    zip
    Updated May 18, 2023
    Cite
    Marta Soler-Gallart (2023). D1.1.ALLINTERACT_RawData [Dataset]. http://doi.org/10.5281/zenodo.6586938
    Explore at:
    Available download formats: zip
    Dataset updated
    May 18, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marta Soler-Gallart
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of EC Horizon 2020 project ALLINTERACT Widening and diversifying citizen engagement in science (872396).
    It contains the raw data obtained from the fieldwork, which consists of: 1) Literature Review, 2) Social Media Analytics, 3) Focus Groups, 4) Survey and 5) Social Media Communicative Observation.
    1) Literature Review
    The objective of the literature review was to address the following topics in gender and education: a) How citizens benefit from scientific research, b) Citizen awareness of the impact of scientific research, c) Awareness-raising initiatives succeeding at engaging citizens in scientific participation, including the Open Access movement and citizen science initiatives, d) Awareness-raising actions that foster the recruitment of new talent in sciences and e) Policies that promote awareness-raising actions and citizen engagement in science.
    In order to do so, the searches were carried out in the top scientific databases, namely Web of Science (mainly in those journals indexed in Journal Citation Reports) and Scopus. The articles were published between 2010-2021 in journals indexed Q1 or Q2 in JCR or in Q1 journals indexed in Scopus. Relevant reports from EU-funded research projects and official EU documents were also included.
    We provide one Word file with the following information for each topic (a-e) in gender and education:
    - Keywords used
    - Criteria of selection
    - Identified sources
    - Outcomes
    - Annexes: Grids with the details of the identified sources

    2) Social Media Analytics
    It is the raw data obtained from social media interactions (Twitter, Facebook, Instagram and Reddit) among citizens about citizen participation in science and research with social impact related to two Sustainable Development Goals: Quality Education and Gender Equality.
    The data collection followed a twofold strategy: 1) Top-Down, in which researchers identified and selected relevant Twitter and Instagram hashtags and Facebook and Reddit pages, and 2) Bottom-Up, in which Twitter hashtags were selected based on daily Trending Topics.
    The data was collected between March 9th and March 16th 2021 and has been obtained, cleaned and anonymized following Allinteract - Social Media Analytics Protocol (Flecha & Pulido, 2021).
    We provide five Excel files (one for each social network explored). Each file contains the main information of the extracted messages; however, the information extracted in each case is slightly different.
    -Twitter: Tweet ID, Time, Tweet Type, Retweeted By, Number of Retweets, Hashtags
    -Facebook: Post ID, Video, Type, Likes, Created Time, Updated Time, Comment ID, Comment Likes, Comment Time, Page Likes
    -Instagram: Likes, comments, date
    -Reddit: Row ID, sub_id, sub_title, sub_score, sub_date, comment_id, comment_score, comment_date

    3) Focus Groups
    This data file contains the pseudonymized transcription of a total of 6 focus groups in gender and 6 in education, which were conducted between October 2021 and February 2022. These focus groups constitute the pre-test, and the groups are therefore divided into control and experimental groups. The participants of the gender focus groups were women (including vulnerable women) from a women’s group, members of an LGBTQI group and women (including young women) from a women’s group. The participants of the education focus groups were parents, teachers and students.
    We provide a Word file with the literal transcriptions of the focus groups in the language in which the focus groups were conducted (English, Spanish or Portuguese).

    4) Survey
    This data file contains the anonymous answers to the survey conducted with participants from 12 countries using a CATI/CAWI method. The survey was conducted between November 2021 and February 2022 and consists of 59 questions. The data were analyzed with SPSS software.
    We provide an Excel file with the 59 questions and the answers of 7507 participants.

    5) Social Media Communicative Observation
    The Social Media Communicative Observation aims to explore the effects of introducing scientific pieces of evidence in social media interactions as an initiative to increase participation through awareness. In order to do so, scientific evidence on gender and education were introduced in 10 Facebook groups (5 related to gender and 5 to education), 10 Reddit communities (5 related to gender and 5 to education) and 2 Social Impact Platforms (Sappho and Adhyayana).
    We provide an Excel file with the anonymized interactions among users around the introduced piece of evidence. This Excel file contains the following information: Group of documents, document name, code, start, final, weight, segment, changed by, changed, created, comment, area and percentage (%).

    Funding: We acknowledge support of this work by the project "ALLINTERACT Widening and diversifying citizen engagement in science" (872396) from the European Commission Horizon 2020 programme.

    Contact information
    Ramón Flecha (PI): ramon.flecha@ub.edu
    Marta Soler Gallart (KMC Coordinator): marta.soler@ub.edu
    Pavel Ovseiko (Ethics Chair): pavel.ovseiko@rdm.ox.ac.uk
    ALLINTERACT Project: allinteract@ub.edu

    References
    Flecha, R., & Pulido, C. (2021). Allinteract - Social Media Analytics Protocol. Licensed under a Creative Commons Attribution - NonCommercial - ShareAlike 4.0 International License. Available at https://archive.org/details/@crea_research

    How to cite this dataset
    Soler-Gallart, M. (2021). D1.1.Allinteract Raw Data. Licensed under a Creative Commons Attribution - NonCommercial - ShareAlike 4.0 International License.

  13. Twitter users in the United States 2019-2028

    • statista.com
    • ai-chatbox.pro
    Updated Jun 13, 2024
    Cite
    Statista Research Department (2024). Twitter users in the United States 2019-2028 [Dataset]. https://www.statista.com/topics/3196/social-media-usage-in-the-united-states/
    Explore at:
    Dataset updated
    Jun 13, 2024
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Area covered
    United States
    Description

    The number of Twitter users in the United States was forecast to increase continuously between 2024 and 2028 by a total of 4.3 million users (+5.32 percent). After the ninth consecutive year of growth, the Twitter user base is estimated to reach 85.08 million users, and therefore a new peak, in 2028. Notably, the number of Twitter users has increased continuously over the past years.

    User figures, shown here for the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period.

    The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

    Find more key insights for the number of Twitter users in countries like Canada and Mexico.

  14. Replication Data for: On the Mechanics of NFT Valuation: AI Ethics and...

    • dataone.org
    • dataverse.harvard.edu
    • +2 more
    Updated Nov 8, 2023
    + more versions
    Cite
    Zhang, Luyao; Yutong Sun; Yutong Quan; Jiaxun Cao; Tong Xin (2023). Replication Data for: On the Mechanics of NFT Valuation: AI Ethics and Social Media [Dataset]. http://doi.org/10.7910/DVN/YMZC30
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Zhang, Luyao; Yutong Sun; Yutong Quan; Jiaxun Cao; Tong Xin
    Description

    As CryptoPunks pioneers the innovation of non-fungible tokens (NFTs) in AI and art, the valuation mechanics of NFTs have become a trending topic. Earlier research identifies the impact of ethics and society on the price prediction of CryptoPunks. Since the boom of the NFT market in 2021, discussion of CryptoPunks has spread on social media. Still, the existing literature has not considered social sentiment factors after this historical turning point in NFT valuation. In this paper, we study how sentiments on social media, together with gender and skin tone, contribute to NFT valuations through an empirical analysis of social media, blockchain, and crypto exchange data. We find evidence that social sentiment is a significant contributor to the price prediction of CryptoPunks. Furthermore, we document structural changes in the valuation mechanics before and after 2021. Although people's attitudes towards CryptoPunks are primarily positive, our findings reflect imbalances in transaction activities and pricing based on gender and skin tone. Our results are consistent and robust when controlling for the rarity of an NFT based on its set of human-readable attributes, including gender and skin tone. Our research contributes to interdisciplinary study at the intersection of AI, Ethics, and Society, focusing on the ecosystem of decentralized AI or blockchain. We provide our data and code for replicability as open access on GitHub.

  15. The Climate Change Twitter Dataset

    • data.mendeley.com
    Updated May 19, 2022
    Cite
    Dimitrios Effrosynidis (2022). The Climate Change Twitter Dataset [Dataset]. http://doi.org/10.17632/mw8yd7z9wc.2
    Dataset updated
    May 19, 2022
    Authors
    Dimitrios Effrosynidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    If you use the dataset, cite the paper: https://doi.org/10.1016/j.eswa.2022.117541

    The most comprehensive dataset to date regarding climate change and human opinions via Twitter. It has the longest temporal coverage, spanning over 13 years, includes over 15 million tweets spatially distributed across the world, and provides the geolocation of most tweets. Seven dimensions of information are tied to each tweet, namely geolocation, user gender, climate change stance and sentiment, aggressiveness, deviations from historic temperature, and topic modeling, accompanied by information on environmental disaster events. These dimensions were produced by testing and evaluating a wide range of state-of-the-art machine learning algorithms and methods, both supervised and unsupervised, including BERT, RNN, LSTM, CNN, SVM, Naive Bayes, VADER, TextBlob, Flair, and LDA.

    The following columns are in the dataset:

    ➡ created_at: The timestamp of the tweet.
    ➡ id: The unique id of the tweet.
    ➡ lng: The longitude where the tweet was written.
    ➡ lat: The latitude where the tweet was written.
    ➡ topic: Categorization of the tweet into one of ten topics, namely seriousness of gas emissions, importance of human intervention, global stance, significance of pollution awareness events, weather extremes, impact of resource overconsumption, Donald Trump versus science, ideological positions on global warming, politics, and undefined.
    ➡ sentiment: A score on a continuous scale from -1 to 1. Values closer to 1 indicate positive sentiment, values closer to -1 indicate negative sentiment, and values close to 0 indicate no sentiment or a neutral one.
    ➡ stance: Whether the tweet supports the belief in man-made climate change (believer), rejects it (denier), or neither supports nor rejects it (neutral).
    ➡ gender: Whether the user who posted the tweet is male, female, or undefined.
    ➡ temperature_avg: The temperature deviation in Celsius, relative to the January 1951-December 1980 average, at the time and place the tweet was written.
    ➡ aggressiveness: Whether or not the tweet contains aggressive language.
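The column descriptions above can be sketched as a small parser. This is a minimal illustration, not the dataset authors' code: the two sample rows are invented, the exact string values used in the real files (for example for `aggressiveness`) are not confirmed by this description, and the `NEUTRAL_BAND` cut-off for "close to 0" is an assumption chosen only for the example.

```python
import csv
import io

# Hypothetical sample rows: the column names come from the description,
# but every value below is invented for illustration only.
SAMPLE = """created_at,id,lng,lat,topic,sentiment,stance,gender,temperature_avg,aggressiveness
2019-05-01 12:00:00,1001,-73.9,40.7,Weather Extremes,-0.62,believer,female,1.3,not aggressive
2019-05-02 08:30:00,1002,,,Politics,0.01,neutral,undefined,,aggressive
"""

# Assumed cut-off for "values close to 0"; the description does not define one.
NEUTRAL_BAND = 0.05


def sentiment_label(score, band=NEUTRAL_BAND):
    """Map a continuous score in [-1, 1] to a coarse label."""
    if score > band:
        return "positive"
    if score < -band:
        return "negative"
    return "neutral"


def to_float(text):
    """Return a float, or None when the field is empty (e.g. no geolocation)."""
    return float(text) if text else None


def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    score = float(row["sentiment"])
    return {
        "id": row["id"],  # kept as a string to avoid precision loss on large ids
        "lng": to_float(row["lng"]),
        "lat": to_float(row["lat"]),
        "stance": row["stance"],
        "sentiment": score,
        "sentiment_label": sentiment_label(score),
        "temperature_avg": to_float(row["temperature_avg"]),
    }


rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
for r in rows:
    print(r["id"], r["stance"], r["sentiment_label"], r["lng"], r["lat"])
```

Treating empty geolocation and temperature fields as `None` reflects the statement that most, but not all, tweets carry a geolocation.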

    Because Twitter forbids publishing the text of tweets, the text must be retrieved through a process called hydrating. Tools such as Twarc or Hydrator can be used to hydrate tweets.
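Hydration starts from the tweet ids. As a rough sketch of the preparation step, the snippet below writes a de-duplicated list of ids to a plain-text file with one id per line. That file layout is an assumption about what a hydration tool expects, so check the documentation of Twarc or Hydrator for the exact input format and invocation. The function name `write_id_file` and the sample ids are hypothetical.

```python
import tempfile
from pathlib import Path


def write_id_file(tweet_ids, path):
    """Write unique tweet ids to `path`, one per line, preserving first-seen order.

    Returns the number of ids written.
    """
    seen = set()
    unique = []
    for tid in tweet_ids:
        tid = str(tid).strip()
        if tid and tid not in seen:
            seen.add(tid)
            unique.append(tid)
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return len(unique)


# Hypothetical ids, e.g. taken from the dataset's `id` column.
sample_ids = ["1001", "1002", "1001", " 1003 ", ""]

with tempfile.TemporaryDirectory() as tmp:
    id_path = Path(tmp) / "ids.txt"
    written = write_id_file(sample_ids, id_path)
    contents = id_path.read_text(encoding="utf-8")

print(f"{written} ids written")
```

Keeping ids as strings avoids any precision loss that can occur when very large integer ids pass through floating-point columns.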

  16. Social Media Usage By Country

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). Social Media Usage By Country [Dataset]. https://www.searchlogistics.com/learn/statistics/social-media-addiction-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The results might surprise you when looking at internet users that are active on social media in each country.

  17. 🌟 Emoji Trends Dataset

    • kaggle.com
    Updated Jul 31, 2024
    Cite
    Waqar Ali (2024). 🌟 Emoji Trends Dataset [Dataset]. https://www.kaggle.com/datasets/waqi786/emoji-trends-dataset
    Dataset updated
    Jul 31, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Waqar Ali
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset provides a detailed analysis of emoji usage across various social media platforms. It captures how different emojis are used in different contexts, reflecting emotions, trends, and user demographics.

    With emojis becoming a universal digital language, this dataset helps researchers, marketers, and data analysts explore how people express emotions online and identify patterns in social media communication.

    📌 Key Features:

    😊 Emoji Details:
    Emoji 🎭: The specific emoji used in a post, comment, or message.
    Context 💬: The meaning or emotion associated with the emoji (e.g., Happy, Love, Funny, Sad).
    Platform 🌐: The social media platform where the emoji was used (e.g., Facebook, Instagram, Twitter).

    👤 User Demographics:
    User Age 🎂: Age of the user who posted the emoji (ranges from 13 to 65 years).
    User Gender 🚻: Gender of the user (Male/Female).

    📈 Additional Insights:
    Emoji Popularity 🔥: Frequency of each emoji’s usage across platforms.
    Trends Over Time 📅: How emoji usage changes based on trends or events.
    Regional Usage Patterns 🌍: How different cultures and regions use emojis differently.

    📊 Use Cases & Applications:
    🔹 Understanding emoji trends across social media
    🔹 Analyzing emotional expression through digital communication
    🔹 Exploring demographic differences in emoji usage
    🔹 Identifying platform-specific emoji preferences
    🔹 Enhancing sentiment analysis models with emoji insights

    ⚠️ Important Note: This dataset is synthetically generated for educational and analytical purposes. It does not contain real user data but is designed to reflect real-world trends in emoji usage.

  18. ENDGBV Social Media Outreach, Paid Advertising, and the NYC HOPE Resource...

    • data.cityofnewyork.us
    • datasets.ai
    Updated Oct 12, 2021
    Cite
    Mayor's Office to End Domestic and Gender-Based Violence (ENDGBV) (2021). ENDGBV Social Media Outreach, Paid Advertising, and the NYC HOPE Resource Directory during COVID-19 [Dataset]. https://data.cityofnewyork.us/Public-Safety/ENDGBV-Social-Media-Outreach-Paid-Advertising-and-/q7bn-wnne
    Available download formats: csv, application/rssxml, application/rdfxml, tsv, json, xml
    Dataset updated
    Oct 12, 2021
    Dataset authored and provided by
    Mayor's Office to End Domestic and Gender-Based Violence (ENDGBV)
    Area covered
    New York
    Description

    This data set contains information on the number of visits and new visitors to the NYC HOPE website (https://www1.nyc.gov/nychope/site/page/home). The website provides information on domestic and gender-based violence, including resources and services that are available in New York City.

  19. Social Media Usage By Age

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). Social Media Usage By Age [Dataset]. https://www.searchlogistics.com/learn/statistics/social-media-addiction-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Gen Z and Millennials are the biggest social media users of all age groups.

  20. Social Media Worldwide Usage Statistics

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). Social Media Worldwide Usage Statistics [Dataset]. https://www.searchlogistics.com/learn/statistics/social-media-addiction-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    56.8% of the world’s total population is active on social media.

