100+ datasets found

O
Corporate Website — Analytics — Top 100 search terms
data.qld.gov.au
researchdata.edu.au
html
Updated Aug 16, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Brisbane City Council (2025). Corporate Website — Analytics — Top 100 search terms [Dataset]. https://www.data.qld.gov.au/dataset/corporate-website-analytics-top-100-search-terms
Explore at:
htmlAvailable download formats
Dataset updated
Aug 16, 2025
Dataset authored and provided by
Brisbane City Council
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is available on Brisbane City Council’s open data website – data.brisbane.qld.gov.au. The site provides additional features for viewing and interacting with the data and for downloading the data in various formats.

Monthly analytics reports for the Brisbane City Council website

Information regarding the sessions for Brisbane City Council website during the month including search terms used.
Web Analytics Market By Solution (Search Engine Tracking And Ranking, Heat...
verifiedmarketresearch.com
Updated Nov 15, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
VERIFIED MARKET RESEARCH (2024). Web Analytics Market By Solution (Search Engine Tracking And Ranking, Heat Map Analytics), By Application (Social Media Management, Display Advertising Optimization), By Vertical (Baking, Financial Services And Insurance (BFSI), Retail), And Region for 2026-2032 [Dataset]. https://www.verifiedmarketresearch.com/product/web-analytics-market/
Explore at:
Dataset updated
Nov 15, 2024
Dataset provided by
Verified Market Researchhttps://www.verifiedmarketresearch.com/
Authors
VERIFIED MARKET RESEARCH
License
https://www.verifiedmarketresearch.com/privacy-policy/https://www.verifiedmarketresearch.com/privacy-policy/
Time period covered
2026 - 2032
Area covered
Global
Description
Web Analytics Market was valued at USD 6.16 Billion in 2024 and is projected to reach USD 13.6 Billion by 2032, growing at a CAGR of 18.58% from 2026 to 2032.

Web Analytics Market Drivers

Data-Driven Decision Making: Businesses increasingly rely on data-driven insights to optimize their online strategies. Web analytics provides valuable data on website traffic, user behavior, and conversion rates, enabling data-driven decision-making.

E-commerce Growth: The rapid growth of e-commerce has fueled the demand for web analytics tools to track online sales, customer behavior, and marketing campaign effectiveness.

Mobile Dominance: The increasing use of mobile devices for internet browsing has made mobile analytics a crucial aspect of web analytics. Businesses need to understand how users interact with their websites and apps on mobile devices.

analytics tools can be complex to implement and use, requiring technical expertise.
Z
Network Traffic Analysis: Data and Code
data.niaid.nih.gov
Updated Jun 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Honig, Joshua (2024). Network Traffic Analysis: Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11479410
Explore at:
Dataset updated
Jun 12, 2024
Dataset provided by
Ferrell, Nathan
Soni, Shreena
Homan, Sophia
Moran, Madeline
Honig, Joshua
Chan-Tin, Eric
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Code:

Packet_Features_Generator.py & Features.py

To run this code:

pkt_features.py [-h] -i TXTFILE [-x X] [-y Y] [-z Z] [-ml] [-s S] -j

-h, --help show this help message and exit -i TXTFILE input text file -x X Add first X number of total packets as features. -y Y Add first Y number of negative packets as features. -z Z Add first Z number of positive packets as features. -ml Output to text file all websites in the format of websiteNumber1,feature1,feature2,... -s S Generate samples using size s. -j

Purpose:

Turns a text file containing lists of incomeing and outgoing network packet sizes into separate website objects with associative features.

Uses Features.py to calcualte the features.

startMachineLearning.sh & machineLearning.py

To run this code:

bash startMachineLearning.sh

This code then runs machineLearning.py in a tmux session with the nessisary file paths and flags

Options (to be edited within this file):

--evaluate-only to test 5 fold cross validation accuracy

--test-scaling-normalization to test 6 different combinations of scalers and normalizers

Note: once the best combination is determined, it should be added to the data_preprocessing function in machineLearning.py for future use

--grid-search to test the best grid search hyperparameters - note: the possible hyperparameters must be added to train_model under 'if not evaluateOnly:' - once best hyperparameters are determined, add them to train_model under 'if evaluateOnly:'

Purpose:

Using the .ml file generated by Packet_Features_Generator.py & Features.py, this program trains a RandomForest Classifier on the provided data and provides results using cross validation. These results include the best scaling and normailzation options for each data set as well as the best grid search hyperparameters based on the provided ranges.

Data

Encrypted network traffic was collected on an isolated computer visiting different Wikipedia and New York Times articles, different Google search queres (collected in the form of their autocomplete results and their results page), and different actions taken on a Virtual Reality head set.

Data for this experiment was stored and analyzed in the form of a txt file for each experiment which contains:

First number is a classification number to denote what website, query, or vr action is taking place.

The remaining numbers in each line denote:

The size of a packet,

and the direction it is traveling.

negative numbers denote incoming packets

positive numbers denote outgoing packets

Figure 4 Data

This data uses specific lines from the Virtual Reality.txt file.

The action 'LongText Search' refers to a user searching for "Saint Basils Cathedral" with text in the Wander app.

The action 'ShortText Search' refers to a user searching for "Mexico" with text in the Wander app.

The .xlsx and .csv file are identical

Each file includes (from right to left):

The origional packet data,

each line of data organized from smallest to largest packet size in order to calculate the mean and standard deviation of each packet capture,

and the final Cumulative Distrubution Function (CDF) caluclation that generated the Figure 4 Graph.
Website Statistics
data.wu.ac.at
data.europa.eu
csv, pdf
Updated Jun 11, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lincolnshire County Council (2018). Website Statistics [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/M2ZkZDBjOTUtMzNhYi00YWRjLWI1OWMtZmUzMzA5NjM0ZTdk
Explore at:
csv, pdfAvailable download formats
Dataset updated
Jun 11, 2018
Dataset provided by
Lincolnshire County Councilhttp://www.lincolnshire.gov.uk/
License
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.

Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.

Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.

Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.

Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.

Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with javascript enabled, which excludes web crawlers and API calls.

These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.
DataForSEO Labs API for keyword research and search analytics, real-time...
datarade.ai
.json
Updated Jun 4, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
DataForSEO (2021). DataForSEO Labs API for keyword research and search analytics, real-time data for all Google locations and languages [Dataset]. https://datarade.ai/data-products/dataforseo-labs-api-for-keyword-research-and-search-analytics-dataforseo
Explore at:
.jsonAvailable download formats
Dataset updated
Jun 4, 2021
Dataset provided by
Authors
DataForSEO
Area covered
Korea (Democratic People's Republic of), Tokelau, Armenia, Azerbaijan, Kenya, Cocos (Keeling) Islands, Mauritania, Micronesia (Federated States of), Morocco, Isle of Man
Description
DataForSEO Labs API offers three powerful keyword research algorithms and historical keyword data:

• Related Keywords from the “searches related to” element of Google SERP. • Keyword Suggestions that match the specified seed keyword with additional words before, after, or within the seed key phrase. • Keyword Ideas that fall into the same category as specified seed keywords. • Historical Search Volume with current cost-per-click, and competition values.

Based on in-market categories of Google Ads, you can get keyword ideas from the relevant Categories For Domain and discover relevant Keywords For Categories. You can also obtain Top Google Searches with AdWords and Bing Ads metrics, product categories, and Google SERP data.

You will find well-rounded ways to scout the competitors:

• Domain Whois Overview with ranking and traffic info from organic and paid search. • Ranked Keywords that any domain or URL has positions for in SERP. • SERP Competitors and the rankings they hold for the keywords you specify. • Competitors Domain with a full overview of its rankings and traffic from organic and paid search. • Domain Intersection keywords for which both specified domains rank within the same SERPs. • Subdomains for the target domain you specify along with the ranking distribution across organic and paid search. • Relevant Pages of the specified domain with rankings and traffic data. • Domain Rank Overview with ranking and traffic data from organic and paid search. • Historical Rank Overview with historical data on rankings and traffic of the specified domain from organic and paid search. • Page Intersection keywords for which the specified pages rank within the same SERP.

All DataForSEO Labs API endpoints function in the Live mode. This means you will be provided with the results in response right after sending the necessary parameters with a POST request.

The limit is 2000 API calls per minute, however, you can contact our support team if your project requires higher rates.

We offer well-rounded API documentation, GUI for API usage control, comprehensive client libraries for different programming languages, free sandbox API testing, ad hoc integration, and deployment support.

We have a pay-as-you-go pricing model. You simply add funds to your account and use them to get data. The account balance doesn't expire.
google.com Website Traffic, Ranking, Analytics [July 2025]
semrush.com
Updated Aug 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Semrush (2025). google.com Website Traffic, Ranking, Analytics [July 2025] [Dataset]. https://www.semrush.com/website/google.com/overview/
Explore at:
Dataset updated
Aug 12, 2025
Dataset authored and provided by
Semrushhttps://fr.semrush.com/
License
https://www.semrush.com/company/legal/terms-of-service/https://www.semrush.com/company/legal/terms-of-service/
Time period covered
Aug 12, 2025
Area covered
Worldwide
Variables measured
visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
Measurement technique
Semrush Traffic Analytics; Click-stream data
Description
google.com is ranked #1 in US with 101.35B Traffic. Categories: Online Services. Learn more about website traffic, market share, and more!
m
Enterprise Website Analytics Software Market Size And Projections
marketresearchintellect.com
Updated Jun 19, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Market Research Intellect (2025). Enterprise Website Analytics Software Market Size And Projections [Dataset]. https://www.marketresearchintellect.com/product/global-enterprise-website-analytics-software-market-size-and-forecast/
Explore at:
Dataset updated
Jun 19, 2025
Dataset authored and provided by
Market Research Intellect
License
https://www.marketresearchintellect.com/privacy-policyhttps://www.marketresearchintellect.com/privacy-policy
Area covered
Global
Description
Check out Market Research Intellect's Enterprise Website Analytics Software Market Report, valued at USD 3.5 billion in 2024, with a projected growth to USD 8.1 billion by 2033 at a CAGR of 12.8% (2026-2033).
youtube.com Website Traffic, Ranking, Analytics [July 2025]
semrush.com
Updated Aug 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Semrush (2025). youtube.com Website Traffic, Ranking, Analytics [July 2025] [Dataset]. https://www.semrush.com/website/youtube.com/overview/
Explore at:
Dataset updated
Aug 12, 2025
Dataset authored and provided by
Semrushhttps://fr.semrush.com/
License
https://www.semrush.com/company/legal/terms-of-service/https://www.semrush.com/company/legal/terms-of-service/
Time period covered
Aug 12, 2025
Area covered
YouTube, Worldwide
Variables measured
visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
Measurement technique
Semrush Traffic Analytics; Click-stream data
Description
youtube.com is ranked #1 in KR with 47.12B Traffic. Categories: Newspapers, Online Services. Learn more about website traffic, market share, and more!
Total global visitor traffic to Google.com 2024
statista.com
Updated Jan 22, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Total global visitor traffic to Google.com 2024 [Dataset]. https://www.statista.com/statistics/268252/web-visitor-traffic-to-googlecom/
Explore at:
Dataset updated
Jan 22, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Oct 2023 - Mar 2024
Area covered
Worldwide
Description
In March 2024, search platform Google.com generated approximately 85.5 billion visits, down from 87 billion platform visits in October 2023. Google is a global search platform and one of the biggest online companies worldwide.
health-check.jp Website Traffic, Ranking, Analytics [June 2025]
fadxfab.com
Updated Jul 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Semrush (2025). health-check.jp Website Traffic, Ranking, Analytics [June 2025] [Dataset]. https://fadxfab.com/website/health-check.jp/overview/
Explore at:
Dataset updated
Jul 12, 2025
Dataset authored and provided by
Semrushhttps://fr.semrush.com/
License
https://fadxfab.com/company/legal/terms-of-service/https://fadxfab.com/company/legal/terms-of-service/
Time period covered
Jul 12, 2025
Area covered
Worldwide
Variables measured
visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
Measurement technique
Semrush Traffic Analytics; Click-stream data
Description
health-check.jp is ranked #8145 in JP with 320.84K Traffic. Categories: Online Services. Learn more about website traffic, market share, and more!
D
Site Analytics: Catalog Search Terms
data.transportation.gov
application/rdfxml +5
Updated Aug 11, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Site Analytics: Catalog Search Terms [Dataset]. https://data.transportation.gov/Administrative/Site-Analytics-Catalog-Search-Terms/nvqb-c5pv
Explore at:
csv, json, application/rssxml, tsv, application/rdfxml, xmlAvailable download formats
Dataset updated
Aug 11, 2025
Description
The Catalog Search Terms dataset captures the words and phrases input by users in search bars that look through the data catalog for relevant information. Data can also be categorized by user segments.
Impact of AI on website traffic anticipated by digital marketers worldwide...
statista.com
Updated Jul 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Impact of AI on website traffic anticipated by digital marketers worldwide 2023 [Dataset]. https://www.statista.com/statistics/1410386/impact-ai-website-traffic-worldwide/
Explore at:
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
2023
Area covered
Worldwide
Description
According to the results of a survey conducted worldwide in 2023, nearly **** of responding digital marketers believed artificial intelligence (AI) would have a positive impact on website search traffic in the next five years. Some ** percent stated AI would have a neutral effect, while ** percent agreed that the technology would negatively impact search traffic.
W
Website Speed Test Report
datainsightsmarket.com
doc, pdf, ppt
Updated Feb 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Data Insights Market (2025). Website Speed Test Report [Dataset]. https://www.datainsightsmarket.com/reports/website-speed-test-1981921
Explore at:
pdf, doc, pptAvailable download formats
Dataset updated
Feb 10, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global website speed test market size was valued at USD 363.9 million in 2022 and is projected to expand at a compound annual growth rate (CAGR) of 15.3% from 2023 to 2032. The growth of the market is attributed to the increasing adoption of online platforms and the need for businesses to optimize their websites for better user experience. Key drivers of the website speed test market include the growing demand for mobile web browsing, the proliferation of content-heavy websites, and the increasing use of personalized content. Additionally, the increasing adoption of cloud-based solutions and the growing awareness of the importance of website performance are expected to drive the growth of the market over the forecast period.
W
Website Down Checker Report
marketresearchforecast.com
doc, pdf, ppt
Updated Feb 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Market Research Forecast (2025). Website Down Checker Report [Dataset]. https://www.marketresearchforecast.com/reports/website-down-checker-17553
Explore at:
pdf, doc, pptAvailable download formats
Dataset updated
Feb 1, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
Website Down Checker Market Overview: The global website down checker market is expected to reach a value of approximately USD 65.6 million by 2033, exhibiting a CAGR of 6.7% during the forecast period (2025-2033). This growth can be attributed to the increasing reliance on websites for business operations and online transactions, coupled with the growing prevalence of sophisticated cyber threats. The demand for website down checkers is also driven by the need for proactive monitoring and rapid response to website outages. Cloud-based solutions are gaining traction due to their flexibility, scalability, and cost-effectiveness. The market is segmented based on type (cloud-based and on-premises) and application (personal and enterprise). Competitive Landscape and Regional Insights: Major players in the website down checker market include: • Website Down Checker • Website Status Checker • IsSiteDown.co.uk • Downdetector • Is My Website Down • Domsignal • Is Site Down • OnlineOrNot • Uptrends • Site24x7 Check Website Availability North America is expected to retain its dominant position in the market due to the high adoption of advanced technologies and stringent data privacy regulations. The Asia Pacific region is projected to exhibit significant growth potential due to the rapidly expanding digital landscape and increasing awareness about the importance of website uptime. Key factors shaping the market include the rise of mobile internet, growing adoption of the Internet of Things (IoT), and the need for ensuring uninterrupted online services.
looka.com Website Traffic, Ranking, Analytics [July 2025]
semrush.com
Updated Aug 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Semrush (2025). looka.com Website Traffic, Ranking, Analytics [July 2025] [Dataset]. https://www.semrush.com/website/looka.com/overview/
Explore at:
Dataset updated
Aug 12, 2025
Dataset authored and provided by
Semrushhttps://fr.semrush.com/
License
https://www.semrush.com/company/legal/terms-of-service/https://www.semrush.com/company/legal/terms-of-service/
Time period covered
Aug 12, 2025
Area covered
Worldwide
Variables measured
visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
Measurement technique
Semrush Traffic Analytics; Click-stream data
Description
looka.com is ranked #5490 in IN with 3.33M Traffic. Categories: AI, Online Services. Learn more about website traffic, market share, and more!
d
TagX Web Browsing clickstream Data - 300K Users North America, EU - GDPR -...
datarade.ai
.json, .csv, .xls
Updated Sep 16, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
TagX (2024). TagX Web Browsing clickstream Data - 300K Users North America, EU - GDPR - CCPA Compliant [Dataset]. https://datarade.ai/data-products/tagx-web-browsing-clickstream-data-300k-users-north-america-tagx
Explore at:
.json, .csv, .xlsAvailable download formats
Dataset updated
Sep 16, 2024
Dataset authored and provided by
TagX
Area covered
United States
Description
TagX Web Browsing Clickstream Data: Unveiling Digital Behavior Across North America and EU Unique Insights into Online User Behavior TagX Web Browsing clickstream Data offers an unparalleled window into the digital lives of 1 million users across North America and the European Union. This comprehensive dataset stands out in the market due to its breadth, depth, and stringent compliance with data protection regulations. What Makes Our Data Unique?

Extensive Geographic Coverage: Spanning two major markets, our data provides a holistic view of web browsing patterns in developed economies. Large User Base: With 300K active users, our dataset offers statistically significant insights across various demographics and user segments. GDPR and CCPA Compliance: We prioritize user privacy and data protection, ensuring that our data collection and processing methods adhere to the strictest regulatory standards. Real-time Updates: Our clickstream data is continuously refreshed, providing up-to-the-minute insights into evolving online trends and user behaviors. Granular Data Points: We capture a wide array of metrics, including time spent on websites, click patterns, search queries, and user journey flows.

Data Sourcing: Ethical and Transparent Our web browsing clickstream data is sourced through a network of partnered websites and applications. Users explicitly opt-in to data collection, ensuring transparency and consent. We employ advanced anonymization techniques to protect individual privacy while maintaining the integrity and value of the aggregated data. Key aspects of our data sourcing process include:

Voluntary user participation through clear opt-in mechanisms Regular audits of data collection methods to ensure ongoing compliance Collaboration with privacy experts to implement best practices in data anonymization Continuous monitoring of regulatory landscapes to adapt our processes as needed

Primary Use Cases and Verticals TagX Web Browsing clickstream Data serves a multitude of industries and use cases, including but not limited to:

Digital Marketing and Advertising:

Audience segmentation and targeting Campaign performance optimization Competitor analysis and benchmarking

E-commerce and Retail:

Customer journey mapping Product recommendation enhancements Cart abandonment analysis

Media and Entertainment:

Content consumption trends Audience engagement metrics Cross-platform user behavior analysis

Financial Services:

Risk assessment based on online behavior Fraud detection through anomaly identification Investment trend analysis

Technology and Software:

User experience optimization Feature adoption tracking Competitive intelligence

Market Research and Consulting:

Consumer behavior studies Industry trend analysis Digital transformation strategies

Integration with Broader Data Offering TagX Web Browsing clickstream Data is a cornerstone of our comprehensive digital intelligence suite. It seamlessly integrates with our other data products to provide a 360-degree view of online user behavior:

Social Media Engagement Data: Combine clickstream insights with social media interactions for a holistic understanding of digital footprints. Mobile App Usage Data: Cross-reference web browsing patterns with mobile app usage to map the complete digital journey. Purchase Intent Signals: Enrich clickstream data with purchase intent indicators to power predictive analytics and targeted marketing efforts. Demographic Overlays: Enhance web browsing data with demographic information for more precise audience segmentation and targeting.

By leveraging these complementary datasets, businesses can unlock deeper insights and drive more impactful strategies across their digital initiatives. Data Quality and Scale We pride ourselves on delivering high-quality, reliable data at scale:

Rigorous Data Cleaning: Advanced algorithms filter out bot traffic, VPNs, and other non-human interactions. Regular Quality Checks: Our data science team conducts ongoing audits to ensure data accuracy and consistency. Scalable Infrastructure: Our robust data processing pipeline can handle billions of daily events, ensuring comprehensive coverage. Historical Data Availability: Access up to 24 months of historical data for trend analysis and longitudinal studies. Customizable Data Feeds: Tailor the data delivery to your specific needs, from raw clickstream events to aggregated insights.

Empowering Data-Driven Decision Making In today's digital-first world, understanding online user behavior is crucial for businesses across all sectors. TagX Web Browsing clickstream Data empowers organizations to make informed decisions, optimize their digital strategies, and stay ahead of the competition. Whether you're a marketer looking to refine your targeting, a product manager seeking to enhance user experience, or a researcher exploring digital trends, our cli...
d
Custom Built Data Collection Tools
datarade.ai
Updated Nov 23, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Decision Software (2023). Custom Built Data Collection Tools [Dataset]. https://datarade.ai/data-categories/web-browsing-data/datasets
Explore at:
.json, .xml, .csv, .xls, .txtAvailable download formats
Dataset updated
Nov 23, 2023
Dataset authored and provided by
Decision Software
Area covered
Kuwait, Jamaica, Iceland, Tanzania, Mongolia, Estonia, Saint Barthélemy, Mozambique, Aruba, Hungary
Description
Our advanced data extraction tool is designed to empower businesses, researchers, and developers by providing an efficient and reliable way to collect and organize information from any online source. Whether you're gathering market insights, monitoring competitors, tracking trends, or building data-driven applications, our platform offers a perfect solution for automating the extraction and processing of structured data from websites. With seamless integration of AI, our tool takes the process a step further, enabling smarter, more refined data extraction that adapts to your needs over time.

In a digital world where information is continuously updated, timely access to data is critical. Our tool allows you to set up automated data extraction schedules, ensuring that you always have access to the most current information. Whether you're tracking stock prices, monitoring social media trends, or gathering product information, you can configure extraction schedules to suit your needs. Our AI-powered system also allows the tool to learn and optimize based on the data it collects, improving efficiency and accuracy with repeated use. From frequent updates by the minute to less frequent daily, weekly, or monthly collections, our platform handles it all seamlessly.

Our tool doesn’t just gather data—it organizes it. The extracted information is automatically structured into easily usable formats like CSV, JSON, or XML, making it ready for immediate use in applications, databases, or reports. We offer flexibility in the output format to ensure smooth integration with your existing tools and workflows. With AI-enhanced data parsing, the system recognizes and categorizes information more effectively, providing higher quality data for analysis, visualization, or importing into third-party systems.

Whether you’re collecting data from a handful of pages or millions, our system is built to scale. We can handle both small and large-scale extraction tasks with high reliability and performance. Our infrastructure ensures fast, efficient processing, even for the most demanding tasks. With parallel extraction capabilities, you can gather data from multiple sources simultaneously, reducing the time it takes to compile large datasets. AI-powered optimization further improves performance, making the extraction process faster and more adaptive to fluctuating data volumes.

Our tool doesn’t stop at extraction. We provide options for enriching the data by cross-referencing it with other sources or applying custom rules to transform raw information into more meaningful insights. This leads to a more insightful and actionable dataset, giving you a competitive edge through superior data-driven decision-making.

Modern websites often use dynamic content generated by JavaScript, which can be challenging to extract. Our tool, enhanced with AI, is designed to handle even the most complex web architectures, including dynamic loading, infinite scrolling, and paginated content.

Finally, our platform provides detailed logs of all extraction activities, giving you full visibility into the process. With built-in analytics, AI-powered insights can help you monitor progress, and identify issues.

In today’s fast-paced digital world, access to accurate, real-time data is critical for success. Our AI-integrated data extraction tool offers a reliable, flexible, and scalable solution to help you gather and organize the information you need with minimal effort. Whether you’re looking to gain a competitive edge, conduct in-depth research, or build sophisticated applications, our platform is designed to meet your needs and exceed expectations.
Information Organizations and Websites Performance
kaggle.com
Updated Sep 17, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Information Management Research Lab (2020). Information Organizations and Websites Performance [Dataset]. http://doi.org/10.34740/kaggle/dsv/1494933
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/1494933
Dataset updated
Sep 17, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Information Management Research Lab
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Notice: You can check the new version 0.9.6 at the official page of Information Management Lab and at the Google Data Studio as well.

Description of the Report and Topic Justification

Now that the ICTs have matured, Information Organizations such as Libraries, Archives and Museums, also known as LAMs, proceed into the utilization of web technologies that are capable to expand the visibility and findability of their content. Within the current flourishing era of the semantic web, LAMs have voluminous amounts of web-based collections that are presented and digitally preserved through their websites. However, prior efforts indicate that LAMs suffer from fragmentation regarding the determination of well-informed strategies for improving the visibility and findability of their content on the Web (Vállez and Ventura, 2020; Krstić and Masliković, 2019; Voorbij, 2010). Several reasons related to this drawback. As such, administrators’ lack of data analytics competency in extracting and utilizing technical and behavioral datasets for improving visibility and awareness from analytics platforms; the difficulties in understanding web metrics that integrated into performance measurement systems; and hence the reduced capabilities in defining key performance indicators for greater usability, visibility, and awareness.

In this enriched and updated technical report, the authors proceed into an examination of 504 unique websites of Libraries, Archives and Museums from all over the world. It is noted that the current report has been expanded by up to 14,81% of the prior one Version 0.9.5 of 439 domains examinations. The report aims to visualize the performance of the websites in terms of technical aspects such as their adequacy to metadata description of their content and collections, their loading speed, and security. This constitutes an important stepping-stone for optimization, as the higher the alignment with the technical compliencies, the greater the users’ behavior and usability within the examined websites, and thus their findability and visibility level in search engines (Drivas et al. 2020; Mavridis and Symeonidis 2015; Agarwal et al. 2012).

One step further, within this version, we include behavioral analytics about users engagement with the content of the LAMs websites. More specifically, web analytics metrics are included such as Visit Duration, Pages per Visit, and Bounce Rates for 121 domains. We also include web analytics regarding the channels that these websites acquire their users, such as Direct traffic, Search Engines, Referral, Social Media, Email, and Display Advertising. SimilarWeb API was used to gather web data about the involved metrics.

In the first pages of this report, general information is presented regarding the names of the examined organizations. This also includes their type, their geographical location, information about the adopted Content Management Systems (CMSs), and web server software types of integration per website. Furthermore, several other data are visualized related to the size of the examined Information Organizations in terms of the number of unique webpages within a website, the number of images, internal and external links and so on.

Moreover, as a team, we proceed into the development of several factors that are capable to quantify the performance of websites. Reliability analysis takes place for measuring the internal consistency and discriminant validity of the proposed factors and their included variables. For testing the reliability, cohesion, and consistency of the included metrics, Cronbach’s Alpha (a), McDonald’s ω and Guttman λ-2 and λ-6 are used.
- For Cronbach’s, a range of .550 up to .750 indicates an acceptable level of reliability and .800 or higher a very good level (Ursachi, Horodnic, and Zait, 2015). - McDonald’s ω indicator has the advantage to measure the strength of the association between the proposed variables. More specifically, the closer to .999 the higher the strength association between the variables and vice versa (Şimşek and Noyan, 2013). - Gutman’s λ-2 and λ-6 work verifiably to Cronbach’s a as they estimate the trustworthiness of variance of the gathered web analytics metrics. Low values less than .450 indicate high bias among the harvested web metrics, while values higher than .600 and above increase the trustworthiness of the sample (Callender and Osburn, 1979). -Kaiser–Meyer–Olkin (KMO) and Bartlett’s Test of Sphericity indicators are used for measuring the cohesion of the involved metrics. KMO and Bartlett’s test indicates that the closer the value is to .999 amongst the involved items, the higher the cohesion and consistency of them for potential categorization (Dziuban and S...
Global market share of leading desktop search engines 2015-2025
statista.com
Updated Apr 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Global market share of leading desktop search engines 2015-2025 [Dataset]. https://www.statista.com/statistics/216573/worldwide-market-share-of-search-engines/
Explore at:
Dataset updated
Apr 28, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Jan 2015 - Mar 2025
Area covered
Worldwide
Description
As of March 2025, Google represented 79.1 percent of the global online search engine market on desktop devices. Despite being much ahead of its competitors, this represents the lowest share ever recorded by the search engine in these devices for over two decades. Meanwhile, its long-time competitor Bing accounted for 12.21 percent, as tools like Yahoo and Yandex held shares of over 2.9 percent each. Google and the global search market Ever since the introduction of Google Search in 1997, the company has dominated the search engine market, while the shares of all other tools has been rather lopsided. The majority of Google revenues are generated through advertising. Its parent corporation, Alphabet, was one of the biggest internet companies worldwide as of 2024, with a market capitalization of 2.02 trillion U.S. dollars. The company has also expanded its services to mail, productivity tools, enterprise products, mobile devices, and other ventures. As a result, Google earned one of the highest tech company revenues in 2024 with roughly 348.16 billion U.S. dollars. Search engine usage in different countries Google is the most frequently used search engine worldwide. But in some countries, its alternatives are leading or competing with it to some extent. As of the last quarter of 2023, more than 63 percent of internet users in Russia used Yandex, whereas Google users represented little over 33 percent. Meanwhile, Baidu was the most used search engine in China, despite a strong decrease in the percentage of internet users in the country accessing it. In other countries, like Japan and Mexico, people tend to use Yahoo along with Google. By the end of 2024, nearly half of the respondents in Japan said that they had used Yahoo in the past four weeks. In the same year, over 21 percent of users in Mexico said they used Yahoo.
n
Repository Analytics and Metrics Portal (RAMP) 2018 data
data.niaid.nih.gov
dataone.org
+2more
zip
Updated Jul 27, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jonathan Wheeler; Kenning Arlitsch (2021). Repository Analytics and Metrics Portal (RAMP) 2018 data [Dataset]. http://doi.org/10.5061/dryad.ffbg79cvp
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.ffbg79cvp
Dataset updated
Jul 27, 2021
Dataset provided by
University of New Mexico
Montana State University
Authors
Jonathan Wheeler; Kenning Arlitsch
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance use data of institutional repositories. The data are a subset of data from RAMP, the Repository Analytics and Metrics Portal (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2018. For a description of the data collection, processing, and output methods, please see the "methods" section below. Note that the RAMP data model changed in August, 2018 and two sets of documentation are provided to describe data collection and processing before and after the change.

Methods

RAMP Data Documentation – January 1, 2017 through August 18, 2018

Data Collection

RAMP data were downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

Data from January 1, 2017 through August 18, 2018 were downloaded in one dataset per participating IR. The following fields were downloaded for each URL, with one row per URL:

url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property. impressions: The number of times the URL appears within the SERP. clicks: The number of clicks on a URL which took users to a page outside of the SERP. clickThrough: Calculated as the number of clicks divided by the number of impressions. position: The position of the URL within the SERP. country: The country from which the corresponding search originated. device: The device used for the search. date: The date of the search.

Following data processing describe below, on ingest into RAMP an additional field, citableContent, is added to the page level data.

Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

Data Processing

Upon download from GSC, data are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the data which records whether each URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

Processed data are then saved in a series of Elasticsearch indices. From January 1, 2017, through August 18, 2018, RAMP stored data in one index per participating IR.

About Citable Content Downloads

Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

For any specified date range, the steps to calculate CCD are:

Filter data to only include rows where "citableContent" is set to "Yes." Sum the value of the "clicks" field on these rows.

Output to CSV

Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above.

The data in these CSV files include the following fields:

url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property. impressions: The number of times the URL appears within the SERP. clicks: The number of clicks on a URL which took users to a page outside of the SERP. clickThrough: Calculated as the number of clicks divided by the number of impressions. position: The position of the URL within the SERP. country: The country from which the corresponding search originated. device: The device used for the search. date: The date of the search. citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No. index: The Elasticsearch index corresponding to page click data for a single IR. repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the index field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.

Filenames for files containing these data follow the format 2018-01_RAMP_all.csv. Using this example, the file 2018-01_RAMP_all.csv contains all data for all RAMP participating IR for the month of January, 2018.

Data Collection from August 19, 2018 Onward

RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:

url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property. impressions: The number of times the URL appears within the SERP. clicks: The number of clicks on a URL which took users to a page outside of the SERP. clickThrough: Calculated as the number of clicks divided by the number of impressions. position: The position of the URL within the SERP. date: The date of the search.

Following data processing describe below, on ingest into RAMP a additional field, citableContent, is added to the page level data.

The second set includes similar information, but instead of being aggregated at the page level, the data are grouped based on the country from which the user submitted the corresponding search, and the type of device used. The following fields are downloaded for combination of country and device, with one row per country/device combination:

country: The country from which the corresponding search originated. device: The device used for the search. impressions: The number of times the URL appears within the SERP. clicks: The number of clicks on a URL which took users to a page outside of the SERP. clickThrough: Calculated as the number of clicks divided by the number of impressions. position: The position of the URL within the SERP. date: The date of the search.

Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

Data Processing

Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."

The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.

Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR. One index includes the page level data, the second index includes the country of origin and device type data.

About Citable Content Downloads

Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository

Facebook

Twitter

Click to copy link

Link copied

Cite

Brisbane City Council (2025). Corporate Website — Analytics — Top 100 search terms [Dataset]. https://www.data.qld.gov.au/dataset/corporate-website-analytics-top-100-search-terms

Corporate Website — Analytics — Top 100 search terms

Explore at:

htmlAvailable download formats

Dataset updated

Aug 16, 2025

Dataset authored and provided by

Brisbane City Council

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This dataset is available on Brisbane City Council’s open data website – data.brisbane.qld.gov.au. The site provides additional features for viewing and interacting with the data and for downloading the data in various formats.

Monthly analytics reports for the Brisbane City Council website

Information regarding the sessions for Brisbane City Council website during the month including search terms used.

Clear search

Close search

Google apps

Main menu

Corporate Website — Analytics — Top 100 search terms

Web Analytics Market By Solution (Search Engine Tracking And Ranking, Heat...

Network Traffic Analysis: Data and Code

Website Statistics

DataForSEO Labs API for keyword research and search analytics, real-time...

google.com Website Traffic, Ranking, Analytics [July 2025]

Enterprise Website Analytics Software Market Size And Projections

youtube.com Website Traffic, Ranking, Analytics [July 2025]

Total global visitor traffic to Google.com 2024

health-check.jp Website Traffic, Ranking, Analytics [June 2025]

Site Analytics: Catalog Search Terms

Impact of AI on website traffic anticipated by digital marketers worldwide...

Website Speed Test Report

Website Down Checker Report

looka.com Website Traffic, Ranking, Analytics [July 2025]

TagX Web Browsing clickstream Data - 300K Users North America, EU - GDPR -...

Custom Built Data Collection Tools

Information Organizations and Websites Performance

Description of the Report and Topic Justification

Global market share of leading desktop search engines 2015-2025

Repository Analytics and Metrics Portal (RAMP) 2018 data

Corporate Website — Analytics — Top 100 search termsSee More Versions

Corporate Website — Analytics — Top 100 search terms