5 datasets found
  1. Summary of results comparing Google Analytics and SimilarWeb for total...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 13, 2023
    Cite
    Bernard J. Jansen; Soon-gyo Jung; Joni Salminen (2023). Summary of results comparing Google Analytics and SimilarWeb for total visits, unique visitors, bounce rate, and average session duration. [Dataset]. http://doi.org/10.1371/journal.pone.0268212.t006
    Available download formats: xls
    Dataset updated: Jun 13, 2023
    Dataset provided by: PLOS (http://plos.org/)
    Authors: Bernard J. Jansen; Soon-gyo Jung; Joni Salminen
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Differences use Google Analytics as the baseline. The "Hypotheses Supported" results are based on paired t-tests.
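
    As a rough illustration of the comparison described above, the sketch below computes the mean difference relative to a baseline and the paired t statistic using only the Python standard library. The function name and the visit counts are hypothetical and are not taken from the dataset; a p-value would additionally require the t distribution (for example from a statistics library).

```python
import math
import statistics


def paired_t_statistic(baseline, other):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples.

    Differences are computed as other - baseline, so the baseline plays the
    role that Google Analytics plays in the description above.
    """
    if len(baseline) != len(other) or len(baseline) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [o - b for b, o in zip(baseline, other)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, t_stat, len(diffs) - 1


# Hypothetical total visits for five sites (illustrative numbers only).
ga_visits = [1000.0, 1500.0, 1200.0, 900.0, 2000.0]  # baseline tool
sw_visits = [1100.0, 1400.0, 1300.0, 1000.0, 2100.0]  # compared tool

mean_diff, t_stat, dof = paired_t_statistic(ga_visits, sw_visits)
print(f"mean difference = {mean_diff:.1f}, t = {t_stat:.3f}, df = {dof}")
```

    The pairing matters here: each site is measured by both tools, so the test is run on the per-site differences rather than on two independent samples.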

  2. Data from: Analysis of the Quantitative Impact of Social Networks General...

    • figshare.com
    • produccioncientifica.ucm.es
    doc
    Updated Oct 14, 2022
    Cite
    David Parra; Santiago Martínez Arias; Sergio Mena Muñoz (2022). Analysis of the Quantitative Impact of Social Networks General Data.doc [Dataset]. http://doi.org/10.6084/m9.figshare.21329421.v1
    Available download formats: doc
    Dataset updated: Oct 14, 2022
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: David Parra; Santiago Martínez Arias; Sergio Mena Muñoz
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General data collected for the study "Analysis of the Quantitative Impact of Social Networks on Web Traffic of Cybermedia in the 27 Countries of the European Union". Four research questions are posed: What percentage of the total web traffic generated by cybermedia in the European Union comes from social networks? Is that percentage higher or lower than the share provided by direct traffic and by search engines via SEO positioning? Which social networks have the greatest impact? And is there any relationship between the weight of social networks in the web traffic of a cybermedia outlet and factors such as the average duration of the user's visit, the number of page views, or the bounce rate, understood in its formal sense of not performing any interaction on the visited page beyond reading its content?
    To answer these questions, we first selected the cybermedia with the highest web traffic in the 27 countries that currently make up the European Union after the United Kingdom left on December 31, 2020. In each country we selected five media outlets using a combination of the global web traffic metrics provided by the tools Alexa (https://www.alexa.com/), which ceased to be operational on May 1, 2022, and SimilarWeb (https://www.similarweb.com/). We did not use local metrics by country, since the results obtained with these two tools were sufficiently significant and our objective is not to establish a ranking of cybermedia by nation but to examine the relevance of social networks in their web traffic. In all cases, we selected cybermedia owned by a journalistic company, ruling out those belonging to telecommunications portals or service providers; some are classic news companies (both newspapers and television broadcasters) while others are digital natives, a distinction that does not affect the nature of the proposed research.
    Next, we examined the web traffic data of these cybermedia for the period covering October, November and December 2021 and January, February and March 2022. We believe that this six-month stretch smooths out possible one-off variations in a single month, reinforcing the precision of the data obtained. To obtain this data, we used the SimilarWeb tool, currently the most precise tool for examining the web traffic of a portal, although it is limited to traffic coming from desktops and laptops and does not take into account traffic from mobile devices, which currently cannot be determined with the measurement tools on the market. The dataset includes:

    • Web traffic general data: average visit duration, pages per visit and bounce rate
    • Web traffic origin by country
    • Percentage of traffic generated from social media over total web traffic
    • Distribution of web traffic generated from social networks
    • Comparison of web traffic generated from social networks with direct and search procedures
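
    The percentage of traffic from social networks, averaged over a six-month window as described above, could be computed along the following lines. This is only a sketch: the channel names, the record layout and all numbers are hypothetical and do not come from the dataset.

```python
from statistics import mean

# Hypothetical monthly visits by channel for one outlet (illustrative only).
monthly_visits = [
    {"month": "2021-10", "direct": 500_000, "search": 300_000, "social": 100_000, "other": 100_000},
    {"month": "2021-11", "direct": 480_000, "search": 300_000, "social": 120_000, "other": 100_000},
    {"month": "2021-12", "direct": 520_000, "search": 300_000, "social": 80_000, "other": 100_000},
    {"month": "2022-01", "direct": 500_000, "search": 300_000, "social": 100_000, "other": 100_000},
    {"month": "2022-02", "direct": 490_000, "search": 300_000, "social": 110_000, "other": 100_000},
    {"month": "2022-03", "direct": 510_000, "search": 300_000, "social": 90_000, "other": 100_000},
]


def channel_share(row, channel):
    """Percentage of one month's total visits that came from `channel`."""
    total = sum(value for key, value in row.items() if key != "month")
    return 100.0 * row[channel] / total


def average_share(rows, channel):
    """Mean of the monthly percentages, which damps one-off monthly swings."""
    return mean(channel_share(row, channel) for row in rows)


social_pct = average_share(monthly_visits, "social")
direct_pct = average_share(monthly_visits, "direct")
search_pct = average_share(monthly_visits, "search")
print(f"social {social_pct:.1f}% | direct {direct_pct:.1f}% | search {search_pct:.1f}%")
```

    Averaging the monthly percentages, rather than pooling all visits, gives each month equal weight; which of the two the study used is not stated in the description.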

  3. Women in Headlines: Bias

    • kaggle.com
    zip
    Updated Jan 22, 2023
    Cite
    The Devastator (2023). Women in Headlines: Bias [Dataset]. https://www.kaggle.com/datasets/thedevastator/women-in-headlines-bias
    Available download formats: zip (30108592 bytes)
    Dataset updated: Jan 22, 2023
    Authors: The Devastator
    Description

    Women in Headlines: Bias

    Investigating Gendered Language, Temporal Trends, and Themes

    By Amber Thomas [source]

    About this dataset

    This dataset contains all of the data used in the Pudding essay When Women Make Headlines, published in January 2022. It was created to analyze gendered language, bias and language themes in news headlines from across the world. It contains headlines from the top 50 news publications and news agencies in four major countries (USA, UK, India and South Africa) as published by SimilarWeb (as of 2021-06-06).

    To collect this data we used RapidAPI's Google News API to query headlines containing one or more keywords selected based on existing research by Huimin Xu & team and The Swaddle team. We analyzed the words used in headlines by manually curating two dictionaries: gendered words about women (words that are explicitly gendered) and words that denote societal/behavioral stereotypes about women. To calculate bias scores, we used technology developed through Yasmeen Hitti & team’s research on gender bias text analysis. To categorize the words used into themes (violence/crime, empowerment, race/ethnicity/identity, etc.), we manually curated four dictionaries using Natural Language Processing packages for Python such as spaCy and NLTK. In addition, inverting polarity scores with the vaderSentiment algorithm helped shed light on differences between women-centered and non-women-centered polarity levels, as well as differences between the global polarity baselines of each country's most visited publications and news agencies according to SimilarWeb 2020 statistics.

    This dataset gives journalists, researchers and educators studying gender equity in media outlets around the world further insight into potential disparities with just a few lines of code. Any discoveries made using this data should provide valuable support for evidence-based argumentation. Let us advocate for greater awareness of female representation and better-quality coverage!

    How to use the dataset

    This dataset provides a comprehensive look at the portrayal of women in headlines from 2010-2020. Using this dataset, researchers and data scientists can explore a range of topics including language used to describe women, bias associated with different topics or publications, and temporal patterns in headlines about women over time.

    To use this dataset effectively, it is helpful to understand the structure of the data. The columns include headline_no_site (the text of the headline without any information about which publication it is from), time (the date and time that the article was published), country (the country where it was published), bias score (calculated using Gender Bias Taxonomy V1.0) and year (the year that the article was published).

    By exploring these columns individually or combining them into groups, such as by publication or by topic, there are many ways to make meaningful discoveries with this dataset. For example, one could explore whether certain news outlets employ more gender-biased language when writing about female subjects than other outlets, or investigate whether female-centric stories have higher or lower bias scores than average for a particular topic across multiple countries over time. This type of analysis helps researchers understand how media coverage of women has evolved worldwide over recent years.
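
    A grouping of the kind described above might look like the sketch below, which averages bias scores per country and year with the standard library only. The sample rows are invented, and the column name `bias` for the bias score is an assumption; check the header of headlines_reduced_temporal.csv before relying on it.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; the "bias" column name is an assumption, not confirmed
# by the dataset description.
SAMPLE_CSV = """headline_no_site,time,country,bias,year
Example headline A,2019-03-01 10:00:00,USA,0.20,2019
Example headline B,2019-06-11 08:30:00,USA,0.40,2019
Example headline C,2020-01-05 12:15:00,India,0.10,2020
Example headline D,2020-02-09 09:45:00,India,0.30,2020
"""


def mean_bias_by_group(csv_file, keys=("country", "year"), bias_column="bias"):
    """Return {group tuple: mean bias score} for rows read from an open CSV file."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, row count]
    for row in csv.DictReader(csv_file):
        group = tuple(row[key] for key in keys)
        totals[group][0] += float(row[bias_column])
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


result = mean_bias_by_group(io.StringIO(SAMPLE_CSV))
for group, value in sorted(result.items()):
    print(group, round(value, 3))
```

    To run this against the real file, pass an open file handle instead of the `io.StringIO` sample, and change `keys` to group by other columns.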

    Research Ideas

    • A comparative, cross-country study of the usage of gendered language and the prevalence of gender bias in headlines to better understand regional differences.
    • Creating an interactive visualization showing the evolution of headline bias scores over time with respect to a certain topic or population group (such as women).
    • Analyzing how different themes are covered in headlines featuring women compared to those without, such as crime or violence versus empowerment or race and ethnicity, to see if there’s any difference in how they are portrayed by the media.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    See the dataset description for more information.

    Columns

    File: headlines_reduced_temporal.csv | Column name | Description | |:---------------------|:-------------------------------------------------------------------------------------...

  4. Dynamic web page change content detection

    • zenodo.org
    Updated Apr 30, 2025
    Cite
    Damir Pozderac; Ehlimana Cogo; Irfan Prazina; Emir Cogo; Šeila Bećirović; Vensada Okanovic (2025). Dynamic web page change content detection [Dataset]. http://doi.org/10.5281/zenodo.12699013
    Dataset updated: Apr 30, 2025
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Damir Pozderac; Ehlimana Cogo; Irfan Prazina; Emir Cogo; Šeila Bećirović; Vensada Okanovic
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains four parts. The "SimilarWeb dataset with screenshots" was created by scraping web elements, their CSS, and corresponding screenshots at three different time intervals for around 100 web pages. Based on this data, the "SimilarWeb dataset with SSIM column" was created, with a target column containing the structural similarity index measure (SSIM) of the captured screenshots. This part of the dataset is used to train machine learning regression models. The "Accessible web pages dataset" and "General use web pages dataset" parts are used to evaluate the approach.
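
    To make the SSIM target column more concrete, here is a minimal sketch of the measure for two grayscale screenshots. It assumes the simplified global form (one window covering the whole image) with the conventional constants K1 = 0.01 and K2 = 0.03; the dataset's authors may have used a windowed implementation from an image library instead. The function name and the tiny pixel grids are hypothetical.

```python
def global_ssim(img_a, img_b, dynamic_range=255.0):
    """Simplified SSIM over whole images given as equal-sized 2D lists of gray levels."""
    a = [float(p) for row in img_a for p in row]
    b = [float(p) for row in img_b for p in row]
    if not a or len(a) != len(b):
        raise ValueError("images must be non-empty and have the same number of pixels")
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((p - mu_a) ** 2 for p in a) / n
    var_b = sum((q - mu_b) ** 2 for q in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    c1 = (0.01 * dynamic_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilises the contrast/structure term
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


# Hypothetical 2x2 "screenshots" of one page captured at two points in time.
shot_t0 = [[10, 20], [30, 40]]
shot_t1 = [[10, 20], [30, 200]]  # one region of the page changed

same_score = global_ssim(shot_t0, shot_t0)
changed_score = global_ssim(shot_t0, shot_t1)
print(f"unchanged page: {same_score:.3f}, changed page: {changed_score:.3f}")
```

    A score near 1 indicates that the two captures are almost identical, and lower scores indicate a larger visual change, which is what makes the value usable as a regression target.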

  5. Mapping 'the constructive turn' in comment sections of news websites

    • data.niaid.nih.gov
    Updated Nov 16, 2021
    Cite
    Monika Mačiulienė; Hannes Cools (2021). Mapping 'the constructive turn' in comment sections of news websites [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5703623
    Dataset updated: Nov 16, 2021
    Dataset provided by: Vilnius Gediminas Technical University; KU Leuven
    Authors: Monika Mačiulienė; Hannes Cools
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The research project critically examines the guidelines of the comment sections of the twenty largest online news outlets over the last ten years. Rather than focusing on the familiar negative comments of news consumers and their narratives, we analyze and compare the news outlets’ guidelines and how they have led to what we call ‘a constructive turn’. We propose our own theoretical framework to analyze what is encouraged and what is discouraged in news outlets’ guidelines. Results show an increasing focus on constructiveness in the guidelines of the comment sections and a shift towards more positivity, rather than a focus on deleting and filtering negative or toxic comments. Although platforms differ in their views on the role of commenting and the definition of constructiveness, the turn towards the constructive design of the commenting platform is shared among them.

    This dataset contains the commentary guidelines of the top 20 English-language online news websites of December 2020, based on research conducted by Similar Web (Source: Similar Web for Gazette). For each news publication, the current commentary guidelines were scraped from the internet, alongside earlier versions of their guidelines. In total, three moments were used to map the guidelines: 2021, 2015 and 2010. The content was analysed through coding using Nvivo software. We applied a bottom-up approach, creating simple codes and eventually grouping them together. Each set of guidelines was coded on what behaviour was encouraged and what was discouraged by the news outlet, and what kind of discussion environment the news outlet expects from its commenters in general (e.g. entertaining, healthy, inclusive etc.).

    This dataset contains coded content for the project. The following logic was used in uploading the documents:

    1 - Nvivo project file - can be opened using Nvivo for Mac - contains all information (files, codes, etc.)

    We also upload more user-friendly data (the following documents are uploaded in MS Word format):

    2 - Codebook (provides the logical structure of coding applied + number of codes for each category)
    3 - Code excerpts for discouraged elements found in the content
    4 - Code excerpts for encouraged elements found in the content
    5 - Code excerpts for discussion environment elements found in the content

    Disclaimer: The user-generated content guidelines of news media companies are their own intellectual property and we do not own any rights to it.


Summary of results comparing Google Analytics and SimilarWeb for total visits, unique visitors, bounce rate, and average session duration.

2 scholarly articles cite this dataset (View in Google Scholar)