4 datasets found
  1. f

    Data_Sheet_1_Fake News or Weak Science? Visibility and Characterization of...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 3, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nadia Arif; Majed Al-Jefri; Isabella Harb Bizzi; Gianni Boitano Perano; Michel Goldman; Inam Haq; Kee Leng Chua; Manuela Mengozzi; Marie Neunez; Helen Smith; Pietro Ghezzi (2023). Data_Sheet_1_Fake News or Weak Science? Visibility and Characterization of Antivaccine Webpages Returned by Google in Different Languages and Countries.XLSX [Dataset]. http://doi.org/10.3389/fimmu.2018.01215.s001
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Nadia Arif; Majed Al-Jefri; Isabella Harb Bizzi; Gianni Boitano Perano; Michel Goldman; Inam Haq; Kee Leng Chua; Manuela Mengozzi; Marie Neunez; Helen Smith; Pietro Ghezzi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The 1998 Lancet paper by Wakefield et al., despite subsequent retraction and evidence indicating no causal link between vaccinations and autism, triggered significant parental concern. The aim of this study was to analyze the online information available on this topic. Using localized versions of Google, we searched “autism vaccine” in English, French, Italian, Portuguese, Mandarin, and Arabic and analyzed 200 websites for each search engine result page (SERP). A common feature was the newsworthiness of the topic, with news outlets representing 25–50% of the SERP, followed by unaffiliated websites (blogs, social media) that represented 27–41% and included most of the vaccine-negative websites. Between 12 and 24% of websites had a negative stance on vaccines, while most websites were pro-vaccine (43–70%). However, their ranking by Google varied. While in Google.com, the first vaccine-negative website was the 43rd in the SERP, there was one vaccine-negative webpage in the top 10 websites in both the British and Australian localized versions and in French and two in Italian, Portuguese, and Mandarin, suggesting that the information quality algorithm used by Google may work better in English. Many webpages mentioned celebrities in the context of the link between vaccines and autism, with Donald Trump most frequently. Few websites (1–5%) promoted complementary and alternative medicine (CAM) but 50–100% of these were also vaccine-negative suggesting that CAM users are more exposed to vaccine-negative information. This analysis highlights the need for monitoring the web for information impacting on vaccine uptake.

  2. Data from: Five Years of COVID-19 Discourse on Instagram: A Labeled...

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Oct 21, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nirmalya Thakur, Ph.D.; Nirmalya Thakur, Ph.D. (2024). Five Years of COVID-19 Discourse on Instagram: A Labeled Instagram Dataset of Over Half a Million Posts for Multilingual Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.13896353
    Explore at:
    binAvailable download formats
    Dataset updated
    Oct 21, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Nirmalya Thakur, Ph.D.; Nirmalya Thakur, Ph.D.
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 6, 2024
    Description

    Please cite the following paper when using this dataset:

    N. Thakur, “Five Years of COVID-19 Discourse on Instagram: A Labeled Instagram Dataset of Over Half a Million Posts for Multilingual Sentiment Analysis”, Proceedings of the 7th International Conference on Machine Learning and Natural Language Processing (MLNLP 2024), Chengdu, China, October 18-20, 2024 (Paper accepted for publication, Preprint available at: https://arxiv.org/abs/2410.03293)

    Abstract

    The outbreak of COVID-19 served as a catalyst for content creation and dissemination on social media platforms, as such platforms serve as virtual communities where people can connect and communicate with one another seamlessly. While there have been several works related to the mining and analysis of COVID-19-related posts on social media platforms such as Twitter (or X), YouTube, Facebook, and TikTok, there is still limited research that focuses on the public discourse on Instagram in this context. Furthermore, the prior works in this field have only focused on the development and analysis of datasets of Instagram posts published during the first few months of the outbreak. The work presented in this paper aims to address this research gap and presents a novel multilingual dataset of 500,153 Instagram posts about COVID-19 published between January 2020 and September 2024. This dataset contains Instagram posts in 161 different languages. After the development of this dataset, multilingual sentiment analysis was performed using VADER and twitter-xlm-roberta-base-sentiment. This process involved classifying each post as positive, negative, or neutral. The results of sentiment analysis are presented as a separate attribute in this dataset.

    For each of these posts, the Post ID, Post Description, Date of publication, language code, full version of the language, and sentiment label are presented as separate attributes in the dataset.

    The Instagram posts in this dataset are present in 161 different languages out of which the top 10 languages in terms of frequency are English (343041 posts), Spanish (30220 posts), Hindi (15832 posts), Portuguese (15779 posts), Indonesian (11491 posts), Tamil (9592 posts), Arabic (9416 posts), German (7822 posts), Italian (5162 posts), Turkish (4632 posts)

    There are 535,021 distinct hashtags in this dataset with the top 10 hashtags in terms of frequency being #covid19 (169865 posts), #covid (132485 posts), #coronavirus (117518 posts), #covid_19 (104069 posts), #covidtesting (95095 posts), #coronavirusupdates (75439 posts), #corona (39416 posts), #healthcare (38975 posts), #staysafe (36740 posts), #coronavirusoutbreak (34567 posts)

    The following is a description of the attributes present in this dataset

    • Post ID: Unique ID of each Instagram post
    • Post Description: Complete description of each post in the language in which it was originally published
    • Date: Date of publication in MM/DD/YYYY format
    • Language code: Language code (for example: “en”) that represents the language of the post as detected using the Google Translate API
    • Full Language: Full form of the language (for example: “English”) that represents the language of the post as detected using the Google Translate API
    • Sentiment: Results of sentiment analysis (using the preprocessed version of each post) where each post was classified as positive, negative, or neutral

    Open Research Questions

    This dataset is expected to be helpful for the investigation of the following research questions and even beyond:

    1. How does sentiment toward COVID-19 vary across different languages?
    2. How has public sentiment toward COVID-19 evolved from 2020 to the present?
    3. How do cultural differences affect social media discourse about COVID-19 across various languages?
    4. How has COVID-19 impacted mental health, as reflected in social media posts across different languages?
    5. How effective were public health campaigns in shifting public sentiment in different languages?
    6. What patterns of vaccine hesitancy or support are present in different languages?
    7. How did geopolitical events influence public sentiment about COVID-19 in multilingual social media discourse?
    8. What role does social media discourse play in shaping public behavior toward COVID-19 in different linguistic communities?
    9. How does the sentiment of minority or underrepresented languages compare to that of major world languages regarding COVID-19?
    10. What insights can be gained by comparing the sentiment of COVID-19 posts in widely spoken languages (e.g., English, Spanish) to those in less common languages?

    All the Instagram posts that were collected during this data mining process to develop this dataset were publicly available on Instagram and did not require a user to log in to Instagram to view the same (at the time of writing this paper).

  3. f

    data_sheet_2_Fake News or Weak Science? Visibility and Characterization of...

    • figshare.com
    xlsx
    Updated Jun 1, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nadia Arif; Majed Al-Jefri; Isabella Harb Bizzi; Gianni Boitano Perano; Michel Goldman; Inam Haq; Kee Leng Chua; Manuela Mengozzi; Marie Neunez; Helen Smith; Pietro Ghezzi (2023). data_sheet_2_Fake News or Weak Science? Visibility and Characterization of Antivaccine Webpages Returned by Google in Different Languages and Countries.xlsx [Dataset]. http://doi.org/10.3389/fimmu.2018.01215.s002
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Nadia Arif; Majed Al-Jefri; Isabella Harb Bizzi; Gianni Boitano Perano; Michel Goldman; Inam Haq; Kee Leng Chua; Manuela Mengozzi; Marie Neunez; Helen Smith; Pietro Ghezzi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The 1998 Lancet paper by Wakefield et al., despite subsequent retraction and evidence indicating no causal link between vaccinations and autism, triggered significant parental concern. The aim of this study was to analyze the online information available on this topic. Using localized versions of Google, we searched “autism vaccine” in English, French, Italian, Portuguese, Mandarin, and Arabic and analyzed 200 websites for each search engine result page (SERP). A common feature was the newsworthiness of the topic, with news outlets representing 25–50% of the SERP, followed by unaffiliated websites (blogs, social media) that represented 27–41% and included most of the vaccine-negative websites. Between 12 and 24% of websites had a negative stance on vaccines, while most websites were pro-vaccine (43–70%). However, their ranking by Google varied. While in Google.com, the first vaccine-negative website was the 43rd in the SERP, there was one vaccine-negative webpage in the top 10 websites in both the British and Australian localized versions and in French and two in Italian, Portuguese, and Mandarin, suggesting that the information quality algorithm used by Google may work better in English. Many webpages mentioned celebrities in the context of the link between vaccines and autism, with Donald Trump most frequently. Few websites (1–5%) promoted complementary and alternative medicine (CAM) but 50–100% of these were also vaccine-negative suggesting that CAM users are more exposed to vaccine-negative information. This analysis highlights the need for monitoring the web for information impacting on vaccine uptake.

  4. P

    mC4 Dataset

    • paperswithcode.com
    • opendatalab.com
    • +1more
    Updated Mar 24, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Linting Xue; Noah Constant; Adam Roberts; Mihir Kale; Rami Al-Rfou; Aditya Siddhant; Aditya Barua; Colin Raffel (2021). mC4 Dataset [Dataset]. https://paperswithcode.com/dataset/mc4
    Explore at:
    Dataset updated
    Mar 24, 2021
    Authors
    Linting Xue; Noah Constant; Adam Roberts; Mihir Kale; Rami Al-Rfou; Aditya Siddhant; Aditya Barua; Colin Raffel
    Description

    mC4 is a multilingual variant of the C4 dataset called mC4. mC4 comprises natural text in 101 languages drawn from the public Common Crawl web scrape.

  5. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Nadia Arif; Majed Al-Jefri; Isabella Harb Bizzi; Gianni Boitano Perano; Michel Goldman; Inam Haq; Kee Leng Chua; Manuela Mengozzi; Marie Neunez; Helen Smith; Pietro Ghezzi (2023). Data_Sheet_1_Fake News or Weak Science? Visibility and Characterization of Antivaccine Webpages Returned by Google in Different Languages and Countries.XLSX [Dataset]. http://doi.org/10.3389/fimmu.2018.01215.s001

Data_Sheet_1_Fake News or Weak Science? Visibility and Characterization of Antivaccine Webpages Returned by Google in Different Languages and Countries.XLSX

Related Article
Explore at:
xlsxAvailable download formats
Dataset updated
Jun 3, 2023
Dataset provided by
Frontiers
Authors
Nadia Arif; Majed Al-Jefri; Isabella Harb Bizzi; Gianni Boitano Perano; Michel Goldman; Inam Haq; Kee Leng Chua; Manuela Mengozzi; Marie Neunez; Helen Smith; Pietro Ghezzi
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The 1998 Lancet paper by Wakefield et al., despite subsequent retraction and evidence indicating no causal link between vaccinations and autism, triggered significant parental concern. The aim of this study was to analyze the online information available on this topic. Using localized versions of Google, we searched “autism vaccine” in English, French, Italian, Portuguese, Mandarin, and Arabic and analyzed 200 websites for each search engine result page (SERP). A common feature was the newsworthiness of the topic, with news outlets representing 25–50% of the SERP, followed by unaffiliated websites (blogs, social media) that represented 27–41% and included most of the vaccine-negative websites. Between 12 and 24% of websites had a negative stance on vaccines, while most websites were pro-vaccine (43–70%). However, their ranking by Google varied. While in Google.com, the first vaccine-negative website was the 43rd in the SERP, there was one vaccine-negative webpage in the top 10 websites in both the British and Australian localized versions and in French and two in Italian, Portuguese, and Mandarin, suggesting that the information quality algorithm used by Google may work better in English. Many webpages mentioned celebrities in the context of the link between vaccines and autism, with Donald Trump most frequently. Few websites (1–5%) promoted complementary and alternative medicine (CAM) but 50–100% of these were also vaccine-negative suggesting that CAM users are more exposed to vaccine-negative information. This analysis highlights the need for monitoring the web for information impacting on vaccine uptake.

Search
Clear search
Close search
Google apps
Main menu