32 datasets found
  1. Data from: Analysis of the Quantitative Impact of Social Networks General...

    • figshare.com
    • produccioncientifica.ucm.es
    doc
    Updated Oct 14, 2022
    Cite
    David Parra; Santiago Martínez Arias; Sergio Mena Muñoz (2022). Analysis of the Quantitative Impact of Social Networks General Data.doc [Dataset]. http://doi.org/10.6084/m9.figshare.21329421.v1
    Available download formats: doc
    Dataset updated
    Oct 14, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    David Parra; Santiago Martínez Arias; Sergio Mena Muñoz
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General data collected for the study "Analysis of the Quantitative Impact of Social Networks on Web Traffic of Cybermedia in the 27 Countries of the European Union". Four research questions are posed: What percentage of the total web traffic generated by cybermedia in the European Union comes from social networks? Is that percentage higher or lower than the share provided by direct traffic and by search engines via SEO positioning? Which social networks have the greatest impact? And is there any relationship between the weight of social networks in the web traffic of a cybermedia outlet and circumstances such as the average duration of the user's visit, the number of page views, or the bounce rate (understood formally as performing no interaction on the visited page beyond reading its content)? To answer these questions, we first selected the cybermedia with the highest web traffic in the 27 countries that are currently part of the European Union after the United Kingdom left on December 31, 2020. In each nation we selected five media using a combination of the global web traffic metrics provided by the tools Alexa (https://www.alexa.com/), which ceased to be operational on May 1, 2022, and SimilarWeb (https://www.similarweb.com/). We did not use local metrics by country, since the results obtained with these two tools were sufficiently significant and our objective is not to establish a ranking of cybermedia by nation but to examine the relevance of social networks in their web traffic. In all cases, we selected cybermedia owned by a journalistic company, ruling out those belonging to telecommunications portals or service providers; some are classic news organizations (both newspapers and television channels) while others are digital natives, a distinction that does not affect the nature of the research.
    We then examined the web traffic data of these cybermedia for the months of October, November and December 2021 and January, February and March 2022. We believe that this six-month stretch smooths out one-off variations in any single month, reinforcing the precision of the data obtained. To obtain this data we used the SimilarWeb tool, currently the most precise tool available for examining the web traffic of a portal, although it is limited to traffic coming from desktops and laptops and does not take into account traffic from mobile devices, which is currently impossible to determine with the measurement tools on the market. The dataset includes:

    • Web traffic general data: average visit duration, pages per visit and bounce rate
    • Web traffic origin by country
    • Percentage of traffic generated from social media over total web traffic
    • Distribution of web traffic generated from social networks
    • Comparison of web traffic generated from social networks with direct and search procedures

  2. Alternative Data Market Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Dec 8, 2024
    Cite
    Archive Market Research (2024). Alternative Data Market Report [Dataset]. https://www.archivemarketresearch.com/reports/alternative-data-market-5021
    Available download formats: doc, ppt, pdf
    Dataset updated
    Dec 8, 2024
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    global
    Variables measured
    Market Size
    Description

    The Alternative Data Market size was valued at USD 7.20 billion in 2023 and is projected to reach USD 126.50 billion by 2032, exhibiting a CAGR of 50.6% during the forecast period. The alternative data market covers the use and processing of information that is not held in financial databases. Such data includes social network posts, satellite images, credit card transactions, web traffic and many others. It is mostly used in finance to make investment decisions, manage risk and analyze competitors, giving a broader view of market trends and consumer attitudes. Demand for data from unconventional sources is growing as firms strive to get ahead in highly competitive markets. Current trends include the use of AI and machine learning to process large data sets and the broadening use of alternative data in industries beyond finance. Recent developments include: In April 2023, Thinknum Alternative Data launched new data fields in its employee sentiment datasets for people analytics teams and investors to use as an 'employee NPS' proxy, and to help highly rated employers set up interviews through employee referrals. In September 2022, Thinknum Alternative Data announced its plan to combine data from Similarweb, SensorTower, Thinknum, Caplight, and Pathmatics with Lagoon, a sophisticated infrastructure platform, to deliver an alternative data source for investment research, due diligence, deal sourcing and origination, and post-acquisition strategies in private markets. In May 2022, M Science LLC launched a consumer spending trends platform, providing daily, weekly, monthly, and semi-annual visibility into consumer behaviors and competitive benchmarking. The consumer spending platform provided real-time insights into consumer spending patterns for Australian brands and an unparalleled business performance analysis.
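
    The growth figure above follows the usual compound annual growth rate relation, CAGR = (end / start)^(1/years) - 1. The sketch below only illustrates that arithmetic with the two market sizes quoted in the description. The function name `cagr` and the year counts tried are assumptions, since the description does not say which years bound the forecast period.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Market sizes quoted in the description, in USD billions.
START, END = 7.20, 126.50

# The number of compounding years is an assumption, so try a few.
for years in (7, 8, 9):
    print(f"{years} years: {cagr(START, END, years):.1%}")
```

    With these inputs, a 7-year horizon gives roughly 50.6% and a 9-year horizon (2023 to 2032) roughly 37.5%, so the quoted rate depends on which base year the report uses.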

  3. Similarweb's Surge: A Sign of Digital Dominance? (SMWB) (Forecast)

    • kappasignal.com
    Updated May 22, 2024
    Cite
    KappaSignal (2024). Similarweb's Surge: A Sign of Digital Dominance? (SMWB) (Forecast) [Dataset]. https://www.kappasignal.com/2024/05/similarwebs-surge-sign-of-digital.html
    Dataset updated
    May 22, 2024
    Dataset authored and provided by
    KappaSignal
    License

    https://www.kappasignal.com/p/legal-disclaimer.html

    Description

    This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.

    Similarweb's Surge: A Sign of Digital Dominance? (SMWB)

    Financial data:

    • Historical daily stock prices (open, high, low, close, volume)

    • Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)

    • Technical indicators (e.g., moving averages, RSI, MACD, average directional index, Aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, Bollinger Bands indicators, Fibonacci, Williams percent range, commodity channel index)

    Machine learning features:

    • Feature engineering based on financial data and technical indicators

    • Sentiment analysis data from social media and news articles

    • Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)

    Potential Applications:

    • Stock price prediction

    • Portfolio optimization

    • Algorithmic trading

    • Market sentiment analysis

    • Risk management

    Use Cases:

    • Researchers investigating the effectiveness of machine learning in stock market prediction

    • Analysts developing quantitative trading Buy/Sell strategies

    • Individuals interested in building their own stock market prediction models

    • Students learning about machine learning and financial applications

    Additional Notes:

    • The dataset may include different levels of granularity (e.g., daily, hourly)

    • Data cleaning and preprocessing are essential before model training

    • Regular updates are recommended to maintain the accuracy and relevance of the data

  4. Host country of organization for 86 websites in study.

    • plos.figshare.com
    xls
    Updated Jun 15, 2023
    Cite
    Bernard J. Jansen; Soon-gyo Jung; Joni Salminen (2023). Host country of organization for 86 websites in study. [Dataset]. http://doi.org/10.1371/journal.pone.0268212.t003
    Available download formats: xls
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Bernard J. Jansen; Soon-gyo Jung; Joni Salminen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Host country of organization for 86 websites in study.

  5. Summary of results comparing Google Analytics and SimilarWeb for total...

    • plos.figshare.com
    xls
    Updated Jun 13, 2023
    Cite
    Bernard J. Jansen; Soon-gyo Jung; Joni Salminen (2023). Summary of results comparing Google Analytics and SimilarWeb for total visits, unique visitors, bounce rate, and average session duration. [Dataset]. http://doi.org/10.1371/journal.pone.0268212.t006
    Available download formats: xls
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Bernard J. Jansen; Soon-gyo Jung; Joni Salminen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Difference uses Google Analytics as the Baseline. Results based on Paired t-Test for Hypotheses Supported.
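
    A paired t-test of the kind mentioned above compares two measurements of the same sites by testing whether the mean of the per-site differences is zero. The following is a minimal sketch of the test statistic only. The function name `paired_t_statistic` and the sample numbers are hypothetical and are not taken from the study's data.

```python
import math
import statistics


def paired_t_statistic(baseline, other):
    """Return (t, degrees_of_freedom) for two paired samples of equal length."""
    if len(baseline) != len(other) or len(baseline) < 2:
        raise ValueError("need two samples of equal length with at least 2 pairs")
    # Differences are taken relative to the baseline measurement.
    diffs = [o - b for b, o in zip(baseline, other)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical monthly visits (in thousands) for five sites.
google_analytics = [120.0, 85.0, 200.0, 40.0, 150.0]  # baseline
similarweb = [110.0, 80.0, 185.0, 38.0, 140.0]

t, df = paired_t_statistic(google_analytics, similarweb)
print(f"t = {t:.3f}, df = {df}")
```

    The resulting t value would then be compared against a t distribution with `df` degrees of freedom to obtain a p-value, for example with a statistics library.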

  6. ‘Popular Website Traffic Over Time’ analyzed by Analyst-2

    • analyst-2.ai
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Popular Website Traffic Over Time ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-popular-website-traffic-over-time-62e4/62549059/?iid=003-357&v=presentation
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Popular Website Traffic Over Time ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/popular-website-traffice on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    Background

    Have you ever been in a conversation where the question comes up: who uses Bing? This question comes up occasionally because people wonder if these sites have any views. For this research study, we are going to explore website traffic for many popular websites.

    Methodology

    The data collected originates from SimilarWeb.com.

    Source

    For the analysis and study, go to The Concept Center

    This dataset was created by Chase Willden and contains around 0 samples along with 1/1/2017, Social Media, technical information and other features such as: - 12/1/2016 - 3/1/2017 - and more.

    How to use this dataset

    • Analyze 11/1/2016 in relation to 2/1/2017
    • Study the influence of 4/1/2017 on 1/1/2017
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit Chase Willden

    --- Original source retains full ownership of the source dataset ---

  7. Traffic Acquisition to LAMs Websites

    • zenodo.org
    • data.niaid.nih.gov
    Updated Apr 30, 2022
    Cite
    Ioannis C. Drivas; Dimitrios Kouis (2022). Traffic Acquisition to LAMs Websites [Dataset]. http://doi.org/10.5281/zenodo.6505277
    Dataset updated
    Apr 30, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ioannis C. Drivas; Dimitrios Kouis
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Preliminary research efforts regarding social media platforms and their contribution to website traffic in LAMs. Through the SimilarWeb API, the leading social networks (Facebook, Twitter, YouTube, Instagram, Reddit, Pinterest, LinkedIn) that drove traffic to each of the 220 cases in our dataset were identified and analyzed in the first sheet. Aggregated results showed that Facebook was responsible for 46.1% of social traffic (second sheet).

  8. Comparison of definitions of total visits, unique visitors, bounce rate, and...

    • plos.figshare.com
    xls
    Updated Jun 13, 2023
    Cite
    Bernard J. Jansen; Soon-gyo Jung; Joni Salminen (2023). Comparison of definitions of total visits, unique visitors, bounce rate, and session duration conceptually and for the two analytics platforms: Google Analytics and SimilarWeb. [Dataset]. http://doi.org/10.1371/journal.pone.0268212.t002
    Available download formats: xls
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Bernard J. Jansen; Soon-gyo Jung; Joni Salminen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of definitions of total visits, unique visitors, bounce rate, and session duration conceptually and for the two analytics platforms: Google Analytics and SimilarWeb.

  9. SimilarWeb (SMWB) - Tracking Digital Trends: Will it Drive Growth?...

    • kappasignal.com
    Updated Oct 5, 2024
    Cite
    KappaSignal (2024). SimilarWeb (SMWB) - Tracking Digital Trends: Will it Drive Growth? (Forecast) [Dataset]. https://www.kappasignal.com/2024/10/similarweb-smwb-tracking-digital-trends.html
    Dataset updated
    Oct 5, 2024
    Dataset authored and provided by
    KappaSignal
    License

    https://www.kappasignal.com/p/legal-disclaimer.html

    Description

    This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.

    SimilarWeb (SMWB) - Tracking Digital Trends: Will it Drive Growth?

    Financial data:

    • Historical daily stock prices (open, high, low, close, volume)

    • Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)

    • Technical indicators (e.g., moving averages, RSI, MACD, average directional index, Aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, Bollinger Bands indicators, Fibonacci, Williams percent range, commodity channel index)

    Machine learning features:

    • Feature engineering based on financial data and technical indicators

    • Sentiment analysis data from social media and news articles

    • Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)

    Potential Applications:

    • Stock price prediction

    • Portfolio optimization

    • Algorithmic trading

    • Market sentiment analysis

    • Risk management

    Use Cases:

    • Researchers investigating the effectiveness of machine learning in stock market prediction

    • Analysts developing quantitative trading Buy/Sell strategies

    • Individuals interested in building their own stock market prediction models

    • Students learning about machine learning and financial applications

    Additional Notes:

    • The dataset may include different levels of granularity (e.g., daily, hourly)

    • Data cleaning and preprocessing are essential before model training

    • Regular updates are recommended to maintain the accuracy and relevance of the data

  10. SMWB Similarweb Ltd. Ordinary Shares (Forecast)

    • kappasignal.com
    Updated Dec 7, 2022
    Cite
    KappaSignal (2022). SMWB Similarweb Ltd. Ordinary Shares (Forecast) [Dataset]. https://www.kappasignal.com/2022/12/smwb-similarweb-ltd-ordinary-shares.html
    Dataset updated
    Dec 7, 2022
    Dataset authored and provided by
    KappaSignal
    License

    https://www.kappasignal.com/p/legal-disclaimer.html

    Description

    This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.

    SMWB Similarweb Ltd. Ordinary Shares

    Financial data:

    • Historical daily stock prices (open, high, low, close, volume)

    • Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)

    • Technical indicators (e.g., moving averages, RSI, MACD, average directional index, Aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, Bollinger Bands indicators, Fibonacci, Williams percent range, commodity channel index)

    Machine learning features:

    • Feature engineering based on financial data and technical indicators

    • Sentiment analysis data from social media and news articles

    • Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)

    Potential Applications:

    • Stock price prediction

    • Portfolio optimization

    • Algorithmic trading

    • Market sentiment analysis

    • Risk management

    Use Cases:

    • Researchers investigating the effectiveness of machine learning in stock market prediction

    • Analysts developing quantitative trading Buy/Sell strategies

    • Individuals interested in building their own stock market prediction models

    • Students learning about machine learning and financial applications

    Additional Notes:

    • The dataset may include different levels of granularity (e.g., daily, hourly)

    • Data cleaning and preprocessing are essential before model training

    • Regular updates are recommended to maintain the accuracy and relevance of the data

  11. Website type for the 86 websites in study.

    • figshare.com
    xls
    Updated Jun 13, 2023
    Cite
    Bernard J. Jansen; Soon-gyo Jung; Joni Salminen (2023). Website type for the 86 websites in study. [Dataset]. http://doi.org/10.1371/journal.pone.0268212.t005
    Available download formats: xls
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Bernard J. Jansen; Soon-gyo Jung; Joni Salminen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Website type for the 86 websites in study.

  12. Industry vertical of organization for 86 websites in study.

    • plos.figshare.com
    xls
    Updated Jun 15, 2023
    Cite
    Bernard J. Jansen; Soon-gyo Jung; Joni Salminen (2023). Industry vertical of organization for 86 websites in study. [Dataset]. http://doi.org/10.1371/journal.pone.0268212.t004
    Available download formats: xls
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Bernard J. Jansen; Soon-gyo Jung; Joni Salminen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Industry vertical of organization for 86 websites in study.

  13. Dynamic web page change content detection

    • zenodo.org
    Updated Apr 30, 2025
    Cite
    Damir Pozderac; Ehlimana Cogo; Irfan Prazina; Emir Cogo; Šeila Bećirović; Vensada Okanovic (2025). Dynamic web page change content detection [Dataset]. http://doi.org/10.5281/zenodo.12699013
    Dataset updated
    Apr 30, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Damir Pozderac; Ehlimana Cogo; Irfan Prazina; Emir Cogo; Šeila Bećirović; Vensada Okanovic
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains 4 parts. "SimilarWeb dataset with screenshots" was created by scraping web elements, their CSS, and corresponding screenshots at three different time intervals for around 100 web pages. Based on this data, the "SimilarWeb dataset with SSIM column" was created, with the target column containing the structural similarity index measure (SSIM) of the captured screenshots. This part of the dataset is used to train machine learning regression models. To evaluate the approach, the "Accessible web pages dataset" and "General use web pages dataset" parts are used.
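
    To give a sense of what an SSIM target value measures, here is a simplified sketch that applies the SSIM formula once over a whole grayscale image (a single global window). This is an assumption made for brevity: the description does not state how the authors computed SSIM, and common implementations average the same formula over local sliding windows. The function name `global_ssim` and the pixel values are hypothetical.

```python
from statistics import fmean


def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM of two equal-length sequences of grayscale pixel values."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    # Stabilising constants with the usual defaults K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = fmean(x), fmean(y)
    n = len(x)
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (var_x + var_y + c2)
    )


# Two hypothetical 2x2 "screenshots", flattened row by row.
before = [10, 10, 200, 200]
after = [10, 10, 200, 50]  # one pixel changed between captures

print(global_ssim(before, before))  # identical captures score (approximately) 1
print(global_ssim(before, after))   # a changed capture scores below 1
```

    A score near 1 means the two captures are nearly identical, so lower scores can serve as a signal that the page content changed between captures.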

  14. Comparison of user, site, and network-centric approaches to web analytics...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Bernard J. Jansen; Soon-gyo Jung; Joni Salminen (2023). Comparison of user, site, and network-centric approaches to web analytics data collection showing advantages, disadvantages, and examples of each approach at the time of the study. [Dataset]. http://doi.org/10.1371/journal.pone.0268212.t001
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Bernard J. Jansen; Soon-gyo Jung; Joni Salminen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of user, site, and network-centric approaches to web analytics data collection showing advantages, disadvantages, and examples of each approach at the time of the study.

  15. Fraudulent Bank Websites, Phishing E-mails and Similar Scams | DATA.GOV.HK

    • data.gov.hk
    Updated Oct 27, 2018
    Cite
    data.gov.hk (2018). Fraudulent Bank Websites, Phishing E-mails and Similar Scams | DATA.GOV.HK [Dataset]. https://data.gov.hk/en-data/dataset/hk-hkma-banksvf-fraudulent-bank-scams
    Dataset updated
    Oct 27, 2018
    Dataset provided by
    data.gov.hk
    Description

    This API provides information on press releases issued by authorized institutions, and on similar press releases issued by the HKMA in the past, regarding fraudulent bank websites, phishing e-mails and similar scams.

  16. Curlie Enhanced with LLM Annotations: Two Datasets for Advancing...

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Dec 21, 2023
    Cite
    Peter Nutter; Mika Senghaas; Ludek Cizinsky (2023). Curlie Enhanced with LLM Annotations: Two Datasets for Advancing Homepage2Vec's Multilingual Website Classification [Dataset]. http://doi.org/10.5281/zenodo.10413068
    Available download formats: csv
    Dataset updated
    Dec 21, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Peter Nutter; Mika Senghaas; Ludek Cizinsky
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Advancing Homepage2Vec with LLM-Generated Datasets for Multilingual Website Classification

    This dataset contains two subsets of labeled website data, specifically created to enhance the performance of Homepage2Vec, a multi-label model for website classification. The datasets were generated using Large Language Models (LLMs) to provide more accurate and diverse topic annotations for websites, addressing a limitation of existing Homepage2Vec training data.

    Key Features:

    • LLM-generated annotations: Both datasets feature website topic labels generated using LLMs, a novel approach to creating high-quality training data for website classification models.
    • Improved multi-label classification: Fine-tuning Homepage2Vec with these datasets has been shown to improve its macro F1 score from 38% to 43% evaluated on a human-labeled dataset, demonstrating their effectiveness in capturing a broader range of website topics.
    • Multilingual applicability: The datasets facilitate classification of websites in multiple languages, reflecting the inherent multilingual nature of Homepage2Vec.
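
    The macro F1 score cited in the features above averages the per-label F1 scores so that every topic counts equally, however rare it is. The sketch below shows that calculation for multi-label predictions. The function name `macro_f1`, the topic names, and the example sites are hypothetical, and scoring a label with no true or predicted instances as 0 is an assumed convention that libraries handle differently.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1 for multi-label data (one set of labels per item)."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # Assumed convention: a label that never occurs scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical topics and annotations for four websites.
labels = ["News", "Sports", "Science"]
truth = [{"News"}, {"Sports", "News"}, {"Science"}, {"Sports"}]
predicted = [{"News"}, {"Sports"}, {"News"}, {"Sports"}]

print(f"macro F1 = {macro_f1(truth, predicted, labels):.2f}")
```

    In this toy case the per-label F1 scores are 0.5 (News), 1.0 (Sports) and 0.0 (Science), giving a macro F1 of 0.5. The missed Science label lowers the score as much as a frequent label would.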

    Dataset Composition:

    • curlie-gpt3.5-10k: 10,000 websites labeled using GPT-3.5, context 2 and 1-shot
    • curlie-gpt4-10k: 10,000 websites labeled using GPT-4, context 2 and zero-shot

    Intended Use:

    • Fine-tuning and advancing Homepage2Vec or similar website classification models
    • Research on LLM-generated datasets for text classification tasks
    • Exploration of multilingual website classification

    Additional Information:

    Acknowledgments:

    This dataset was created as part of a project at EPFL's Data Science Lab (DLab) in collaboration with Prof. Robert West and Tiziano Piccardi.

  17. Optimising website design for people with learning disabilities using...

    • b2find.eudat.eu
    Updated Oct 22, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2023). Optimising website design for people with learning disabilities using 'trade-off' analysis - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/bada5cbe-7f8c-5bfb-ae6a-7b709926c464
    Dataset updated
    Oct 22, 2023
    Description

    Although the internet may greatly assist information provision for people with learning disabilities (LD), much material is out of reach of this cohort, partly because of difficulties in navigating an electronic environment. However, there is little evidence regarding what design features aid website use, and advice on the subject is conflicting. This study seeks to determine how web-mediated information can be optimally presented and organised for this cohort. The study will involve website usability testing with people with LD, comparing various designs and focusing on different attributes (text size, layout and navigation). Tasks will be suitable for people with low literacy skills, involving only one action. 'Trade-off' techniques, a tool of market research never before exploited in this context, will be employed to analyse the data. They will both determine which attributes of a site have the greatest impact on performance and provide a relative weighting of the importance of each. The findings will provide a rich picture of how information for people with LD may be best presented electronically. Data: time-on-task for information retrieval tasks using different web page interface designs; observational and interview fieldnotes.

  18. Web Analytics Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jun 30, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Growth Market Reports (2025). Web Analytics Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/web-analytics-market-global-industry-analysis
    Available download formats: pdf, pptx, csv
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Web Analytics Market Outlook

    According to our latest research, the global web analytics market size was valued at USD 8.4 billion in 2024, reflecting robust growth driven by the increasing adoption of digital platforms across industries. The market is projected to expand at a compound annual growth rate (CAGR) of 17.2% from 2025 to 2033, reaching an estimated USD 36.8 billion by 2033. This significant upsurge is primarily attributed to the escalating demand for actionable insights, data-driven decision-making, and the proliferation of online consumer activity. As per the latest research, enterprises worldwide are leveraging advanced web analytics tools to enhance customer engagement, improve marketing strategies, and drive business outcomes.




    One of the principal growth factors fueling the web analytics market is the exponential increase in digitalization and internet penetration. Organizations across various sectors are rapidly transitioning their operations online, resulting in a surge of data generation through multiple digital touchpoints. This digital transformation has heightened the need for sophisticated web analytics solutions that can process vast volumes of data, extract meaningful patterns, and provide actionable insights. Moreover, the rise in e-commerce activities, coupled with the growing popularity of social media platforms, has created a fertile environment for the adoption of web analytics, enabling businesses to track consumer behavior, measure campaign effectiveness, and optimize user experiences.




    Another critical driver for the web analytics market is the integration of artificial intelligence (AI) and machine learning (ML) technologies. These advanced technologies are revolutionizing the way organizations analyze web data by enabling predictive analytics, real-time reporting, and personalized recommendations. AI-powered web analytics tools can automatically identify trends, anomalies, and customer preferences, empowering businesses to make data-driven decisions faster and more accurately. Furthermore, the increasing focus on omnichannel marketing strategies and the need to unify customer data across different platforms have further accelerated the demand for comprehensive web analytics solutions.




    The regulatory landscape and growing emphasis on data privacy and compliance are also shaping the web analytics market. With the implementation of stringent data protection regulations such as the General Data Protection Regulation (GDPR) in Europe and the California Consumer Privacy Act (CCPA) in the United States, organizations are compelled to adopt web analytics tools that ensure data security and privacy. This has led to the development of privacy-centric analytics platforms that offer enhanced data governance features, enabling businesses to comply with global regulatory requirements while still deriving valuable insights from web data. The ability to balance data-driven innovation with privacy considerations is becoming a key differentiator for vendors in this dynamic market.




    From a regional perspective, North America continues to dominate the web analytics market, accounting for the largest share in 2024, followed by Europe and Asia Pacific. The region’s leadership is attributed to the presence of major technology providers, a mature digital ecosystem, and high levels of investment in analytics infrastructure. However, Asia Pacific is expected to witness the fastest growth during the forecast period, driven by the rapid adoption of digital technologies, expanding internet user base, and increasing investments in e-commerce and digital marketing. The growing awareness among businesses in emerging economies about the benefits of web analytics is further propelling market growth in this region.





    Component Analysis



    The web analytics market by component is bifurcated into software and services, with each segment playing a pivotal role in market expansion. The software segment holds the lion’s share of the market, driven by the continuous evolution of analytics plat

  19. Real time Advertiser's Auction

    • kaggle.com
    Updated Jun 2, 2020
    Cite
    Saurav Anand (2020). Real time Advertiser's Auction [Dataset]. https://www.kaggle.com/saurav9786/real-time-advertisers-auction/notebooks
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jun 2, 2020
    Dataset provided by
    Kaggle
    Authors
    Saurav Anand
    Description

    INTRODUCTION

    60% of the digital ad inventory is sold by publishers in Real Time first price Auctions.

    Once a user lands on a webpage, bidders (advertisers) bid for the different ad slots on the page; the one with the highest bid displays their ad in the ad space and pays the amount they bid. This process encourages bid shading: bidding less than the perceived value of the ad space to maximize the bidder's own utility while maintaining a particular win rate at the lowest prices.

    Hence, for publishers, it becomes important to value their inventory (all the users that visit their website * all the ad slots they have on their websites) correctly so that a reserve price, or a minimum price can be set up for the auctions.

    In a first price auction, the highest bidder wins and pays the price they bid if it exceeds the reserve price. The optimal strategy of a bidder is to shade their bids (bid less than their true value of the inventory). However, a bidder needs to win a certain amount to achieve their goals. This suggests they need to shade as much as possible while maintaining a certain win rate.
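
    The auction rule described above can be sketched in a few lines. This is only an illustration, not code from the dataset: the function names `run_first_price_auction` and `shaded_bid`, the `shade` fraction, and the tie-breaking rule (first highest bid wins) are all assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple


def shaded_bid(true_value: float, shade: float) -> float:
    """Bid below the perceived value; `shade` is a hypothetical fraction in [0, 1)."""
    return true_value * (1.0 - shade)


def run_first_price_auction(
    bids: Sequence[float], reserve_price: float
) -> Optional[Tuple[int, float]]:
    """Return (winner_index, price_paid), or None if the slot stays unfilled.

    The highest bidder wins and pays their own bid, but only if that bid
    strictly exceeds the reserve price. Ties go to the first highest bid
    (an assumption; the description does not specify tie-breaking).
    """
    if not bids:
        return None
    winner = max(range(len(bids)), key=lambda i: bids[i])
    if bids[winner] <= reserve_price:
        return None  # no bid beats the reserve: unfilled impression
    return winner, bids[winner]


# Three bidders (bids in CPM) and a reserve price of 1.0
print(run_first_price_auction([1.2, 2.5, 0.8], reserve_price=1.0))
# Raising the reserve above every bid leaves the slot unfilled
print(run_first_price_auction([1.2, 2.5, 0.8], reserve_price=3.0))
```

    Raising `reserve_price` in this sketch turns low winning bids into losses, which is the pressure on bid shading that the description refers to.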

    A bidder perceives a certain value out of every impression they win. Each bidder would like to maintain the value they derived out of this set of websites (given in the dataset) in June with a maximum deviation of 20%.

    Setting a reserve price induces this by causing bidders to lose at lower bids, which encourages higher bidding and more publisher revenue. However, since most of this takes place through automated systems, there might be an unknown delay between setting reserve prices, the resulting reduction in the bidder's win rate, the bidder changing their bid-shading algorithm, and the increase in publisher revenue.

    IMPORTANT TERMS:
    o Publisher – person who owns and publishes content on the website
    o Inventory – all the users that visit the website * all the ad slots present on the website for the observation period
    o Impressions – showing an ad to a user constitutes one impression. If the ad slot is present but an ad is not shown, it counts as an “unfilled impression”. Inventory is the sum of impressions + unfilled impressions.
    o CPM – cost per mille. This is one of the most important ways to measure performance. It is calculated as revenue/impressions * 1000. 'bids' and 'price' are measured in terms of CPM.
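
    As a small worked example of the definitions above, the CPM and inventory formulas can be written out directly. The function names and the sample figures are hypothetical and are not taken from the dataset.

```python
def cpm(revenue: float, impressions: int) -> float:
    """Cost per mille: revenue / impressions * 1000."""
    if impressions <= 0:
        raise ValueError("CPM is undefined without at least one impression")
    return revenue / impressions * 1000


def inventory(impressions: int, unfilled_impressions: int) -> int:
    """Inventory is the sum of impressions and unfilled impressions."""
    return impressions + unfilled_impressions


# Hypothetical figures: 50.0 in revenue earned over 20,000 shown ads,
# with a further 5,000 ad slots left unfilled.
print(round(cpm(50.0, 20_000), 2))
print(inventory(20_000, 5_000))
```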

  20. Competitor Analysis Tools Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Apr 24, 2025
    Cite
    Data Insights Market (2025). Competitor Analysis Tools Report [Dataset]. https://www.datainsightsmarket.com/reports/competitor-analysis-tools-1943431
    Explore at:
    Available download formats: doc, pdf, ppt
    Dataset updated
    Apr 24, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The market for competitor analysis tools is experiencing robust growth, driven by the increasing importance of competitive intelligence in today's dynamic business landscape. The surge in digital marketing and the need for businesses, both SMEs and large enterprises, to understand their competitive positioning fuels demand for sophisticated tools offering comprehensive data analysis and actionable insights. Cloud-based solutions are dominating the market due to their scalability, accessibility, and cost-effectiveness compared to on-premises deployments. Key players like SEMrush, Ahrefs, and SimilarWeb are establishing strong market presence through continuous innovation, comprehensive feature sets, and targeted marketing strategies. However, the market also faces challenges, including the rising costs of data acquisition and the complexity of integrating various tools into existing workflows. The competitive landscape is characterized by a mix of established players and emerging niche providers. Differentiation is achieved through unique data sources, specialized analytics capabilities, and the ability to integrate seamlessly with other marketing and business intelligence platforms. The North American and European markets currently hold a significant share, owing to high technology adoption and established digital marketing ecosystems. However, growth is expected in Asia-Pacific regions as businesses in developing economies increasingly adopt digital strategies and seek competitive advantages. The forecast period (2025-2033) suggests continued expansion, propelled by technological advancements like AI-powered insights and the expanding use of social media analytics within competitor analysis. The market's segmentation reflects varying needs across different business sizes and deployment preferences. 
While large enterprises typically opt for comprehensive, feature-rich solutions capable of handling large datasets and integrating with various systems, SMEs often prioritize cost-effective, user-friendly tools providing essential insights. The choice between cloud-based and on-premises solutions depends on factors like IT infrastructure, security considerations, and budget constraints. As the market matures, we anticipate further consolidation through mergers and acquisitions, and the emergence of more specialized tools catering to specific industry needs. The overall trajectory indicates continued strong growth, with a focus on enhanced data analysis, improved user experiences, and seamless integration within broader business intelligence platforms.

Cite
David Parra; Santiago Martínez Arias; Sergio Mena Muñoz (2022). Analysis of the Quantitative Impact of Social Networks General Data.doc [Dataset]. http://doi.org/10.6084/m9.figshare.21329421.v1

Data from: Analysis of the Quantitative Impact of Social Networks General Data.doc

Explore at:
Available download formats: doc
Dataset updated
Oct 14, 2022
Dataset provided by
figshare
Figshare (http://figshare.com/)
Authors
David Parra; Santiago Martínez Arias; Sergio Mena Muñoz
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

General data collected for the study "Analysis of the Quantitative Impact of Social Networks on Web Traffic of Cybermedia in the 27 Countries of the European Union". Four research questions are posed: What percentage of the total web traffic generated by cybermedia in the European Union comes from social networks? Is that percentage higher or lower than that provided through direct traffic and through search engines via SEO positioning? Which social networks have a greater impact? And is there any relationship between the specific weight of social networks in the web traffic of a cybermedia outlet and factors such as the average duration of the user's visit, the number of page views, or the bounce rate (understood formally as not performing any kind of interaction on the visited page beyond reading its content)? To answer these questions, we first selected the cybermedia with the highest web traffic in the 27 countries that currently make up the European Union after the United Kingdom left on December 31, 2020. In each nation we selected five media outlets using a combination of the global web traffic metrics provided by the tools Alexa (https://www.alexa.com/), which ceased to be operational on May 1, 2022, and SimilarWeb (https://www.similarweb.com/). We did not use local metrics by country, since the results obtained with these two tools were sufficiently significant and our objective is not to establish a ranking of cybermedia by nation but to examine the relevance of social networks in their web traffic. In all cases, the selected cybermedia are owned by a journalistic company, ruling out those belonging to telecommunications portals or service providers; in some cases they are classic news companies (both newspapers and television broadcasters), while in others they are digital natives, without this circumstance affecting the nature of the proposed research.
We then examined the web traffic data of these cybermedia. The period selected covers October, November and December 2021 and January, February and March 2022. We believe that this six-month stretch smooths out possible one-off variations in a single month, reinforcing the precision of the data obtained. To obtain this data, we used the SimilarWeb tool, currently the most precise tool available for examining the web traffic of a portal, although it is limited to traffic coming from desktops and laptops and does not take into account traffic from mobile devices, which is currently impossible to determine with the measurement tools on the market. It includes:

• Web traffic general data: average visit duration, pages per visit and bounce rate
• Web traffic origin by country
• Percentage of traffic generated from social media over total web traffic
• Distribution of web traffic generated from social networks
• Comparison of web traffic generated from social networks with direct and search traffic
