22 datasets found

Leading websites worldwide 2024, by monthly visits
statista.com
flwrdeptvarieties.store
Updated Mar 24, 2025
Cite
Statista (2025). Leading websites worldwide 2024, by monthly visits [Dataset]. https://www.statista.com/statistics/1201880/most-visited-websites-worldwide/
Dataset updated
Mar 24, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Nov 2024
Area covered
World
Description
In November 2024, Google.com was the most popular website worldwide, with 136 billion average monthly visits. The online platform has held the top spot since June 2010, when it pulled ahead of Yahoo into first place. Second-ranked YouTube generated more than 72.8 billion monthly visits in the measured period.

The internet leaders: search, social, and e-commerce

Social networks, search engines, and e-commerce websites shape the online experience as we know it. While Google leads the global online search market by far, YouTube and Facebook have become the world's most popular websites for user-generated content, solidifying Alphabet's and Meta's leadership over the online landscape. Meanwhile, websites such as Amazon and eBay generate millions in profits from the sale and distribution of goods, making the e-market sector an integral part of the global retail scene.

What is next for online content?

User-generated content powers social media and websites like Reddit and Wikipedia, and it keeps the internet's engines moving. However, the rise of generative artificial intelligence will bring significant changes to how online content is produced and handled. ChatGPT is already transforming how online search is performed, and news of Google's 2024 deal to license Reddit content for training large language models (LLMs) signals that the internet is likely to go through a new revolution. While AI's impact on the online market might bring both opportunities and challenges, effective content management will remain crucial for profitability on the web.
DataForSEO Labs API for keyword research and search analytics, real-time...
datarade.ai
.json
Updated Jun 4, 2021
Cite
DataForSEO (2021). DataForSEO Labs API for keyword research and search analytics, real-time data for all Google locations and languages [Dataset]. https://datarade.ai/data-products/dataforseo-labs-api-for-keyword-research-and-search-analytics-dataforseo
Available download formats: .json
Dataset updated
Jun 4, 2021
Dataset provided by
Authors
DataForSEO
Area covered
Tokelau, Korea (Democratic People's Republic of), Kenya, Morocco, Isle of Man, Cocos (Keeling) Islands, Mauritania, Micronesia (Federated States of), Azerbaijan, Armenia
Description
DataForSEO Labs API offers three powerful keyword research algorithms and historical keyword data:

• Related Keywords from the “searches related to” element of Google SERP.
• Keyword Suggestions that match the specified seed keyword with additional words before, after, or within the seed key phrase.
• Keyword Ideas that fall into the same category as specified seed keywords.
• Historical Search Volume with current cost-per-click and competition values.

Based on in-market categories of Google Ads, you can get keyword ideas from the relevant Categories For Domain and discover relevant Keywords For Categories. You can also obtain Top Google Searches with AdWords and Bing Ads metrics, product categories, and Google SERP data.

You will find well-rounded ways to scout the competitors:

• Domain Whois Overview with ranking and traffic info from organic and paid search.
• Ranked Keywords that any domain or URL has positions for in SERP.
• SERP Competitors and the rankings they hold for the keywords you specify.
• Competitors Domain with a full overview of its rankings and traffic from organic and paid search.
• Domain Intersection keywords for which both specified domains rank within the same SERPs.
• Subdomains for the target domain you specify along with the ranking distribution across organic and paid search.
• Relevant Pages of the specified domain with rankings and traffic data.
• Domain Rank Overview with ranking and traffic data from organic and paid search.
• Historical Rank Overview with historical data on rankings and traffic of the specified domain from organic and paid search.
• Page Intersection keywords for which the specified pages rank within the same SERP.

All DataForSEO Labs API endpoints function in Live mode: you receive the results in the response right after sending the necessary parameters in a POST request.

The limit is 2000 API calls per minute; however, you can contact our support team if your project requires higher rates.
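
To stay under a per-minute call limit such as the 2000 calls mentioned above, a client can space its requests out. The following is a minimal sketch of one way to do that, not part of DataForSEO's client libraries; the class name `MinIntervalThrottle` and its parameters are hypothetical, and no request is actually sent.

```python
import time


class MinIntervalThrottle:
    """Spaces calls at least 60 / calls_per_minute seconds apart (hypothetical helper)."""

    def __init__(self, calls_per_minute, clock=time.monotonic, sleep=time.sleep):
        # 2000 calls per minute corresponds to one call every 0.03 seconds.
        self.min_interval = 60.0 / calls_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


throttle = MinIntervalThrottle(calls_per_minute=2000)
# Call throttle.wait() immediately before each POST request.
```

Spacing calls evenly is simpler than a sliding-window counter, at the cost of not allowing short bursts.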

We offer well-rounded API documentation, GUI for API usage control, comprehensive client libraries for different programming languages, free sandbox API testing, ad hoc integration, and deployment support.

We have a pay-as-you-go pricing model. You simply add funds to your account and use them to get data. The account balance doesn't expire.
ScrapeHero Data Cloud - Free and Easy to use
datarade.ai
.json, .csv
Updated Feb 8, 2022
Cite
Scrapehero (2022). ScrapeHero Data Cloud - Free and Easy to use [Dataset]. https://datarade.ai/data-products/scrapehero-data-cloud-free-and-easy-to-use-scrapehero
Available download formats: .json, .csv
Dataset updated
Feb 8, 2022
Dataset provided by
ScrapeHero
Authors
Scrapehero
Area covered
Bhutan, Bahamas, Dominica, Ghana, Slovakia, Anguilla, Niue, Portugal, Bahrain, Chad
Description
The Easiest Way to Collect Data from the Internet

Download anything you see on the internet into spreadsheets within a few clicks using our ready-made web crawlers, or with a few lines of code using our APIs.

We have made it as simple as possible to collect data from websites

Easy to Use Crawlers

Amazon Product Details and Pricing Scraper: Get product information, pricing, FBA, best seller rank, and much more from Amazon.

Google Maps Search Results: Get details like place name, phone number, address, website, ratings, and open hours from Google Maps or Google Places search results.

Twitter Scraper: Get tweets, Twitter handle, content, number of replies, number of retweets, and more. All you need to provide is a URL to a profile, hashtag, or an advanced search URL from Twitter.

Amazon Product Reviews and Ratings: Get customer reviews for any product on Amazon and get details like product name, brand, reviews and ratings, and more.

Google Reviews Scraper: Scrape Google reviews and get details like business or location name, address, review, ratings, and more for businesses and places.

Walmart Product Details & Pricing: Get the product name, pricing, number of ratings, reviews, product images, URL, and other product-related data from Walmart.

Amazon Search Results Scraper: Get product search rank, pricing, availability, best seller rank, and much more from Amazon.

Amazon Best Sellers: Get the bestseller rank, product name, pricing, number of ratings, rating, product images, and more from any Amazon Bestseller List.

Google Search Scraper: Scrape Google search results and get details like search rank, paid and organic results, knowledge graph, related search results, and more.

Walmart Product Reviews & Ratings: Get customer reviews for any product on Walmart.com and get details like product name, brand, reviews, and ratings.

Scrape Emails and Contact Details: Get emails, addresses, contact numbers, and social media links from any website.

Walmart Search Results Scraper: Get product details such as pricing, availability, reviews, ratings, and more from Walmart search results and categories.

Glassdoor Job Listings: Scrape job details such as job title, salary, job description, location, company name, number of reviews, and ratings from Glassdoor.

Indeed Job Listings: Scrape job details such as job title, salary, job description, location, company name, number of reviews, and ratings from Indeed.

LinkedIn Jobs Scraper (Premium): Scrape job listings on LinkedIn and extract job details such as job title, job description, location, company name, number of reviews, and more.

Redfin Scraper (Premium): Scrape real estate listings from Redfin. Extract property details such as address, price, mortgage, Redfin estimate, broker name, and more.

Yelp Business Details Scraper: Scrape business details such as phone number, address, website, and more from Yelp search and business details pages.

Zillow Scraper (Premium): Scrape real estate listings from Zillow. Extract property details such as address, price, broker name, and more.

Amazon product offers and third party sellers: Get product pricing, delivery details, FBA, seller details, and much more from the Amazon offer listing page.

Realtor Scraper (Premium): Scrape real estate listings from Realtor.com. Extract property details such as address, price, area, broker, and more.

Target Product Details & Pricing: Get product details from search results and category pages such as pricing, availability, rating, reviews, and 20+ data points from Target.

Trulia Scraper (Premium): Scrape real estate listings from Trulia. Extract property details such as address, price, area, mortgage, and more.

Amazon Customer FAQs: Get FAQs for any product on Amazon and get details like the question, the answer, the answering user's name, and more.

Yellow Pages Scraper: Get details like business name, phone number, address, website, ratings, and more from Yellow Pages search results.
Google Analytics Sample
kaggle.com
zip
Updated Sep 19, 2019
Cite
Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/datasets/bigquery/google-analytics-sample
Available download formats: zip (0 bytes)
Dataset updated
Sep 19, 2019
Dataset provided by
BigQuery (https://cloud.google.com/bigquery)
Google (http://google.com/)
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.
Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.
Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
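
As an illustration of the bounce-rate question above, here is a minimal sketch that applies the stated definition (the percentage of visits with a single pageview) per traffic source. It assumes each session has already been reduced to a `(source, pageviews)` pair; the function name and the sample values are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict


def bounce_rate_by_source(sessions):
    """Return {source: percentage of sessions with exactly one pageview}."""
    totals = defaultdict(int)
    bounces = defaultdict(int)
    for source, pageviews in sessions:
        totals[source] += 1
        if pageviews == 1:
            bounces[source] += 1
    return {source: 100.0 * bounces[source] / totals[source] for source in totals}


# Hypothetical sessions: (traffic source, number of pageviews in the visit).
sample_sessions = [
    ("google", 1), ("google", 4),
    ("direct", 1), ("direct", 1), ("direct", 1), ("direct", 2),
]
rates = bounce_rate_by_source(sample_sessions)
print(rates)
```

The same grouping could be expressed as a SQL aggregate over the BigQuery table; the Python version is only meant to make the definition concrete.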
Share of global mobile website traffic 2015-2024
statista.com
flwrdeptvarieties.store
Updated Jan 28, 2025
Cite
Statista (2025). Share of global mobile website traffic 2015-2024 [Dataset]. https://www.statista.com/statistics/277125/share-of-website-traffic-coming-from-mobile-devices/
Dataset updated
Jan 28, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
Mobile accounts for approximately half of web traffic worldwide. In the last quarter of 2024, mobile devices (excluding tablets) generated 62.54 percent of global website traffic. The mobile share consistently hovered around the 50 percent mark from the beginning of 2017, before surpassing it in 2020.

Mobile traffic

Due to limited infrastructure and financial constraints, many emerging digital markets skipped the desktop internet phase entirely and moved straight to mobile internet via smartphone and tablet devices. India is a prime example of a market with a significant mobile-first online population. Other countries with a significant share of mobile internet traffic include Nigeria, Ghana, and Kenya. In most African markets, mobile accounts for more than half of web traffic. By contrast, mobile makes up only around 45.49 percent of online traffic in the United States.

Mobile usage

The most popular mobile internet activities worldwide include watching movies or videos online, e-mail usage, and accessing social media. Apps are a very popular way to watch video on the go, and the most-downloaded entertainment apps in the Apple App Store are Netflix, Tencent Video, and Amazon Prime Video.

eCommerce events history in electronics store

kaggle.com

Updated Mar 29, 2021

Cite

Michael Kechinov (2021). eCommerce events history in electronics store [Dataset]. https://www.kaggle.com/datasets/mkechinov/ecommerce-events-history-in-electronics-store/code

Dataset updated

Mar 29, 2021

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Michael Kechinov

Description

About

This file contains behavior data for 5 months (Oct 2019 – Feb 2020) from a large electronics online store.

Each row in the file represents an event. All events are related to products and users. Each event is like a many-to-many relation between products and users.

Data collected by the Open CDP project. Feel free to use this open source customer data platform.

How to read it

There are different types of events. See below.

Semantics (or how to read it):

User user_id, during session user_session, added product product_id of brand brand and category category_code to the shopping cart (property event_type equals cart) at price price at event_time.

File structure

Property	Description
event_time	Time when the event happened (in UTC).
event_type	Only one kind of event: purchase.
product_id	ID of a product.
category_id	Product's category ID.
category_code	Product's category taxonomy (code name), if it was possible to derive one. Usually present for meaningful categories and skipped for different kinds of accessories.
brand	Lowercased brand name. May be missing.
price	Price of a product as a float. Always present.
user_id	Permanent user ID.
user_session	Temporary ID of the user's session. It stays the same within a session and changes every time the user returns to the online store after a long pause.

Event types

Events can be:

view - a user viewed a product
cart - a user added a product to shopping cart
remove_from_cart - a user removed a product from shopping cart
purchase - a user purchased a product

Multiple purchases per session

A session can have multiple purchase events. It's ok, because it's a single order.
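
Because several purchase events in one session form a single order, orders can be reconstructed by grouping purchase rows on user_session. The following is a minimal sketch under stated assumptions: the file is comma-separated with the column names from the table above, and the sample rows (including the timestamp format) are invented for illustration. The function name `orders_by_session` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; the real file has more columns and many more events.
SAMPLE_CSV = """event_time,event_type,product_id,price,user_id,user_session
2019-10-01 00:00:00 UTC,purchase,101,10.50,1,s1
2019-10-01 00:01:00 UTC,purchase,102,4.50,1,s1
2019-10-01 00:05:00 UTC,purchase,103,20.00,2,s2
"""


def orders_by_session(rows):
    """Group purchase events into one order per user_session."""
    orders = defaultdict(lambda: {"items": [], "total": 0.0})
    for row in rows:
        if row["event_type"] != "purchase":
            continue  # ignore any non-purchase events
        order = orders[row["user_session"]]
        order["items"].append(row["product_id"])
        order["total"] += float(row["price"])
    return dict(orders)


orders = orders_by_session(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for session, order in orders.items():
    print(session, order["items"], order["total"])
```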

Many thanks

Thanks to REES46 Marketing Platform for this dataset.

Using datasets in your works, books, education materials

You can use this dataset for free. Just mention the source of it: link to this page and link to REES46 Marketing Platform.

Ecommerce Data | Product & Customer Review Data | Scrape Any Website | FREE...
datarade.ai
.json, .xml, .csv
Updated Nov 30, 2023
Cite
PromptCloud (2023). Ecommerce Data | Product & Customer Review Data | Scrape Any Website | FREE Sample Available | Custom Scraping Services | PromptCloud [Dataset]. https://datarade.ai/data-products/ecommerce-data-product-and-customer-review-dataset-from-eco-promptcloud
Available download formats: .json, .xml, .csv
Dataset updated
Nov 30, 2023
Dataset authored and provided by
PromptCloud
Area covered
Peru, Mayotte, Bahrain, Monaco, Wallis and Futuna, Nigeria, Botswana, Latvia, Niger, Estonia
Description
PromptCloud offers specialized data extraction services for eCommerce businesses, focusing on acquiring detailed product and customer review datasets from a variety of eCommerce websites. This service is instrumental for businesses aiming to refine their eCommerce strategies through in-depth market analysis, competitive research, and enhanced customer insights.

Customization is a key aspect of PromptCloud's offerings. PromptCloud provides bespoke scraping services, tailored to the unique requirements of each business. This adaptability is especially beneficial for companies seeking a competitive advantage in the dynamic eCommerce market. A distinctive feature of PromptCloud's approach is the provision of a free sample, allowing potential clients to experience the quality and accuracy of their data firsthand. This commitment to quality is reflected in their use of advanced technologies that ensure the delivery of precise, up-to-date data.

PromptCloud's versatility extends to data delivery, offering various formats like JSON, CSV, and XML. This flexibility facilitates seamless integration of data into different business systems, highlighting their focus on creating user-friendly and effective solutions.

PromptCloud positions itself as a vital resource for eCommerce businesses looking to utilize data for strategic planning and customer understanding. Their tailored scraping services, combined with a commitment to delivering current and accurate data, make PromptCloud the best option for businesses seeking to improve their market presence and deepen their understanding of customer behavior.

We are committed to putting data at the heart of your business. Reach out for a no-frills PromptCloud experience: professional, technologically advanced, and reliable.

Play Store Apps

kaggle.com

Updated Sep 16, 2022

Cite

Aman Chauhan (2022). Play Store Apps [Dataset]. https://www.kaggle.com/datasets/whenamancodes/play-store-apps

Dataset updated

Sep 16, 2022

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Aman Chauhan

License

Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically

Description

While many public datasets (on Kaggle and the like) provide Apple App Store data, there are not many counterpart datasets available for Google Play Store apps anywhere on the web. On digging deeper, I found that the iTunes App Store page uses a nicely indexed, appendix-like structure that allows for simple and easy web scraping. The Google Play Store, on the other hand, uses sophisticated modern-day techniques (like dynamic page load) using jQuery, making scraping more challenging.

Each app (row) has values for category, rating, size, and more.

The Play Store apps data has enormous potential to drive app-making businesses to success. Actionable insights can be drawn for developers to work on and capture the Android market!

googleplaystore.csv

Columns	Description
App	Application name
Category	Category the app belongs to
Ratings	Overall user rating of the app (as when scraped)
Reviews	Number of user reviews for the app (as when scraped)
Size	Size of the app (as when scraped)
Installs	Number of user downloads/installs for the app (as when scraped)
Type	Paid or Free
Price	Price of the app (as when scraped)
Content Rating	Age group the app is targeted at - Children / Mature 21+ / Adult
Genre	An app can belong to multiple genres (apart from its main category). For eg, a musical family game will belong to
Current Ver	Current version of the app available on Play Store (as when scraped)
Android Ver	Min required Android version (as when scraped)

googleplaystore_user_reviews.csv

Columns	Description
App	Name of app
Translated Reviews	User review (Preprocessed and translated to English)
Sentiment	Positive/Negative/Neutral (Preprocessed)
Sentiment_polarity	Sentiment polarity score
Sentiment_subjectivity	Sentiment subjectivity score
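
As a small illustration of how the review columns above might be used, the sketch below averages Sentiment_polarity per app. It assumes a comma-separated file with the column names listed above and assumes that missing scores appear as `nan` or as empty cells; the sample rows and the function name `mean_polarity_by_app` are invented for illustration.

```python
import csv
import io
import math
from collections import defaultdict

# Invented sample rows using the column names from the table above.
SAMPLE_CSV = """App,Sentiment,Sentiment_polarity
AppA,Positive,0.5
AppA,Negative,-0.25
AppB,Neutral,0.0
AppB,nan,nan
"""


def mean_polarity_by_app(rows):
    """Return {app: mean Sentiment_polarity}, skipping missing scores."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        try:
            score = float(row["Sentiment_polarity"])
        except (TypeError, ValueError):
            continue  # empty or non-numeric cell
        if math.isnan(score):
            continue  # "nan" parses as a float but carries no information
        sums[row["App"]] += score
        counts[row["App"]] += 1
    return {app: sums[app] / counts[app] for app in counts}


means = mean_polarity_by_app(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(means)
```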

Web map service DTK50
data.europa.eu
wms
Updated Mar 1, 2022
Cite
(2022). Web map service DTK50 [Dataset]. https://data.europa.eu/data/datasets/54c4b80a-66c5-45a5-9e98-e3e9fde2c34b
Available download formats: wms
Dataset updated
Mar 1, 2022
Description
As part of building the spatial data infrastructure of the Free State of Thuringia (GDI-Th), selected geodata are made available to internal and external users for free use. From the geobase data held in Geoproxy, the central component for spatial data storage and provision, data collections of particular public interest are offered to everyone as public data, without access restrictions and free of charge, through the Geoclient as a viewing service. This service provides data from the Digital Topographic Map 1:50 000 (DTK50).
An analysis of the current overlay journals
zenodo.org
data.niaid.nih.gov
csv
Updated Oct 18, 2022
Cite
Antti M. Rousi; Mikael Laakso (2022). An analysis of the current overlay journals [Dataset]. http://doi.org/10.5281/zenodo.6617002
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.6617002
Dataset updated
Oct 18, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Antti M. Rousi; Mikael Laakso
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Research data to accompany the article "Overlay journals: a study of the current landscape" (https://doi.org/10.1177/09610006221125208)

Identifying the sample of overlay journals was an explorative process that ran from April 2021 to February 2022. The sample of investigated overlay journals was identified by using the websites of Episciences.org (2021), Scholastica (2021), Free Journal Network (2021), Open Journals (2021), PubPub (2022), and Wikipedia (2021). In total, this study identified 34 overlay journals. Please see the paper for more details about the excluded journal types.

The journal ISSN numbers, manuscript source repositories, first overlay volumes, article volumes, publication languages, peer-review type, licence for published articles, author costs, publisher types, submission policy, and preprint availability policy were observed by inspecting journal editorial policies and submission guidelines found from journal websites. The overlay journals’ ISSN numbers were identified by examining journal websites and cross-checking this information with the Ulrich’s periodicals database (Ulrichsweb, 2021). Journals that published review reports, either with reviewers’ names or anonymously, were classified as operating with open peer-review. Publisher types defined by Laakso and Björk (2013) were used to categorise the findings concerning the publishers. If the journal website did not include publisher information, the editorial board was interpreted to publish the journal.

The Organisation for Economic Co-operation and Development (OECD) field of science classification was used to categorise the journals into different domains of science. The journals’ primary OECD field of sciences were defined by the authors through examining the journal websites.

Whether the journals were indexed in the Directory of Open Access Journals (DOAJ), Scopus, or Clarivate Analytics’ Web of Science Core collection’s journal master list was examined by searching the services with journal ISSN numbers and journal titles.

The identified overlay journals were examined from the viewpoint of both qualitative and quantitative journal metrics. The qualitative metrics comprised the Nordic expert panel rankings of scientific journals, namely the Finnish Publication Forum, the Danish Bibliometric Research Indicator and the Norwegian Register for Scientific Journals, Series and Publishers. Searches were conducted from the web portals of the above services with both ISSN numbers and journal titles. Clarivate Analytics’ Journal Citation Reports database was searched with the use of both ISSN numbers and journal titles to identify whether the journals had a Journal Citation Indicator (JCI), Two-Year Impact Factor (IF) and an Impact Factor ranking (IF rank). The examined Journal Impact Factors and Impact Factor rankings were for the year 2020 (as released in 2021).
Ancillary Monitor Corpus: Common Crawl - german web (YEAR 2016 – VERSION 1)
lindat.mff.cuni.cz
Updated Nov 12, 2024
Cite
Jan Oliver Rüdiger (2024). Ancillary Monitor Corpus: Common Crawl - german web (YEAR 2016 – VERSION 1) [Dataset]. https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-5790
Dataset updated
Nov 12, 2024
Authors
Jan Oliver Rüdiger
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
*** german version see below ***

The ‘Ancillary Monitor Corpus: Common Crawl - german web’ was designed to enable a broad-based linguistic analysis of the German-language (visible) internet over time, with the aim of achieving comparability with DeReKo (the ‘German Reference Corpus’ of the Leibniz Institute for the German Language; DeReKo volume: 57 billion tokens as of DeReKo Release 2024-I). The corpus is separated by year (here: 2016) and versioned (here: version 1). Across all years 2013-2024, version 1 comprises 97.45 billion tokens.

The corpus is based on the data dumps from CommonCrawl (https://commoncrawl.org/). CommonCrawl is a non-profit organisation that provides copies of the visible Internet free of charge for research purposes.

The CommonCrawl WET raw data was first filtered by TLD (top-level domain). Only pages ending in the following TLDs were taken into account: ‘.at; .bayern; .berlin; .ch; .cologne; .de; .gmbh; .hamburg; .koeln; .nrw; .ruhr; .saarland; .swiss; .tirol; .wien; .zuerich’. These are the exclusive German-language TLDs according to ICANN (https://data.iana.org/TLD/tlds-alpha-by-domain.txt) as of 1 June 2024 - TLDs with a purely corporate reference (e.g. ‘.edeka; .bmw; .ford’) were excluded. The language of the individual documents (URLs) was then estimated with the help of NTextCat (https://github.com/ivanakcheurov/ntextcat) (via the CORE14 profile of NTextCat) - only those documents/URLs for which German was the most likely language were processed further (e.g. to exclude foreign-language material such as individual subpages). The third step involved filtering for manual selectors and filtering for 1:1 duplicates (within one year).
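
The first filtering step described above (keeping only pages under the listed TLDs) can be sketched as follows. This is an illustration only, not the author's pipeline, which used CorpusExplorer and supplementary scripts; the function names are hypothetical, and the later language-detection and deduplication steps are not shown.

```python
from urllib.parse import urlsplit

# TLDs copied from the description above.
GERMAN_TLDS = {
    "at", "bayern", "berlin", "ch", "cologne", "de", "gmbh", "hamburg",
    "koeln", "nrw", "ruhr", "saarland", "swiss", "tirol", "wien", "zuerich",
}


def tld_of(url):
    """Return the last label of the URL's host name, or None if there is none."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    if "." not in host:
        return None
    return host.rsplit(".", 1)[-1]


def keep_url(url):
    """True if the URL's top-level domain is in the German-language TLD list."""
    return tld_of(url) in GERMAN_TLDS


urls = [
    "https://www.example.de/page",
    "https://shop.example.wien/",
    "http://example.com/de/",
]
kept = [u for u in urls if keep_url(u)]
print(kept)
```

Only the host name is inspected, so a path segment such as `/de/` on a `.com` domain does not qualify.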

The filtering and subsequent processing was carried out using CorpusExplorer (http://hdl.handle.net/11234/1-2634) and our own (supplementary) scripts, and the TreeTagger (http://hdl.handle.net/11372/LRT-323) was used for automatic annotation. The corpus was processed on the HELIX HPC cluster. The author would like to take this opportunity to thank the state of Baden-Württemberg and the German Research Foundation (DFG) for the possibility to use the bwHPC/HELIX HPC cluster - funding code HPC cluster: INST 35/1597-1 FUGG.

Data content:
- Tokens and record boundaries
- Automatic lemma and POS annotation (using TreeTagger)
- Metadata:
  - GUID - Unique identifier of the document
  - YEAR - Year of capture (please use this information for data slices)
  - Url - Full URL
  - Tld - Top-Level Domain
  - Domain - Domain without TLD (but with sub-domains if applicable)
  - DomainFull - Complete domain (incl. TLD)
  - Datum - (System Information): Date of the CorpusExplorer (date of capture by CommonCrawl - not date of creation/modification of the document).
  - Hash - (System Information): SHA1 hash of the CommonCrawl
  - Pfad - (System Information): Path of the cluster (raw data) - is supplied by the system.

Please note that the files are saved as *.cec6.gz. These are binary files of the CorpusExplorer (see above). These files ensure efficient archiving. You can use both CorpusExplorer and the ‘CEC6-Converter’ (available for Linux, MacOS and Windows - see: https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-5705) to convert the data. The data can be exported in the following formats:

CATMA v6

CoNLL

CSV

CSV (only meta-data)

DTA TCF-XML

DWDS TEI-XML

HTML

IDS I5-XML

IDS KorAP XML

IMS Open Corpus Workbench

JSON

OPUS Corpus Collection XCES

Plaintext

SaltXML

SlashA XML

SketchEngine VERT

SPEEDy/CODEX (JSON)

TLV-XML

TreeTagger

TXM

WebLicht

XML

Please note that an export increases the storage space requirement considerably. The ‘CorpusExplorerConsole’ (https://github.com/notesjor/CorpusExplorer.Terminal.Console - available for Linux, MacOS and Windows) also offers a simple solution for editing and analysing. If you have any questions, please contact the author.

Legal information

The data was downloaded on 01.11.2024. The use, processing and distribution is subject to §60d UrhG (German copyright law), which authorises the use for non-commercial purposes in research and teaching. LINDAT/CLARIN is responsible for long-term archiving in accordance with §69d para. 5 and ensures that only authorised persons can access the data. The data has been checked to the best of our knowledge and belief (on a random basis). Should you nevertheless find legal violations (e.g. right to be forgotten, personal rights, etc.), please write an e-mail to the author (amc_report@jan-oliver-ruediger.de) with the following information: 1) why this content is undesirable (please outline only briefly) and 2) how the content can be identified - e.g. file name, URL or domain, etc. The author will endeavour to identify the content. The author will endeavour to remove the content and re-upload the data (modified) within two weeks (new version). If
Most popular travel and tourism websites worldwide 2025
statista.com
flwrdeptvarieties.store
Updated Mar 20, 2025
Cite
Statista (2025). Most popular travel and tourism websites worldwide 2025 [Dataset]. https://www.statista.com/statistics/1215457/most-visited-travel-and-tourism-websites-worldwide/
Dataset updated
Mar 20, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Feb 2025
Area covered
Worldwide
Description
In February 2025, booking.com was the most visited travel and tourism website worldwide. That month, Booking’s web page recorded around 517 million visits. Tripadvisor.com and airbnb.com followed in the ranking, with roughly 120 million and 99 million visits, respectively.

Popular online travel agencies in the U.S.

Online travel agencies (OTAs), such as Booking.com and Expedia, offer a wide variety of services, including online hotel bookings, flight reservations, and car rentals. According to the Statista Consumer Insights Global survey, Expedia and Booking.com were the most popular brands for online flight reservations in the United States in 2024. For online bookings of hotels and private accommodation in the U.S., Booking.com was the most popular brand, followed by Airbnb, Expedia, and Hotels.com.

Booking Holdings vs. Expedia Group

Booking.com is one of the most popular sites of the online travel group Booking Holdings, the leading online travel agency worldwide based on revenue, which also owns brands like Priceline, Kayak, and Agoda. In 2024, Booking Holdings' revenue amounted to almost 24 billion U.S. dollars, the highest figure reported by the company to date. Meanwhile, global revenue of Expedia Group, which manages brands like Expedia, Hotels.com, and Vrbo, reached nearly 14 billion U.S. dollars that year.
Total global visitor traffic to Google.com 2024
statista.com
Updated Jan 22, 2025
Cite
Statista (2025). Total global visitor traffic to Google.com 2024 [Dataset]. https://www.statista.com/statistics/268252/web-visitor-traffic-to-googlecom/
Dataset updated
Jan 22, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Oct 2023 - Mar 2024
Area covered
Worldwide
Description
In March 2024, search platform Google.com generated approximately 85.5 billion visits, down from 87 billion platform visits in October 2023. Google is a global search platform and one of the biggest online companies worldwide.
Traffic
open-data-cgvar.hub.arcgis.com
ps-dubai.hub.arcgis.com
Updated Mar 11, 2014
Cite
Conseil Départemental du Var (2014). Traffic [Dataset]. https://open-data-cgvar.hub.arcgis.com/datasets/traffic
Dataset updated
Mar 11, 2014
Dataset authored and provided by
Conseil Départemental du Var
License
http://opendata.regionpaca.fr/fileadmin//user_upload/tx_ausyopendata/licences/Licence-Ouverte-Open-Licence-ETALAB.pdf
Area covered
Description
The map layers in this service provide color-coded maps of the traffic conditions you can expect for the present time (the default). The map shows present traffic as a blend of live and typical information. Live speeds are used wherever available and are established from real-time sensor readings. Typical speeds come from a record of average speeds, which are collected over several weeks within the last year or so. Layers also show current incident locations where available. By changing the map time, the service can also provide past and future conditions. Live readings from sensors are saved for 12 hours, so setting the map time back within 12 hours allows you to see actual recorded traffic speeds, supplemented with typical averages by default. You can choose to turn off the average speeds and see only the recorded live traffic speeds for any time within the 12-hour window. Predictive traffic conditions are shown for any time in the future.

The color-coded traffic map layer can be used to represent relative traffic speeds; this is a common type of map for online services and is used to provide context for routing, navigation, and field operations. A color-coded traffic map can be requested for the current time and any time in the future. A map for a future request might be used for planning purposes.

The map also includes dynamic traffic incidents showing the location of accidents, construction, closures, and other issues that could potentially impact the flow of traffic. Traffic incidents are commonly used to provide context for routing, navigation and field operations. Incidents are not features; they cannot be exported and stored for later use or additional analysis.

Data source

Esri’s typical speed records and live and predictive traffic feeds come directly from HERE (www.HERE.com). HERE collects billions of GPS and cell phone probe records per month and, where available, uses sensor and toll-tag data to augment the probe data collected. An advanced algorithm compiles the data and computes accurate speeds. The real-time and predictive traffic data is updated every five minutes through traffic feeds.

Data coverage

The service works globally and can be used to visualize traffic speeds and incidents in many countries. Check the service coverage web map to determine availability in your area of interest and to learn whether a country currently supports traffic. The support for traffic incidents can be determined by identifying a country. For detailed information on this service, visit the directions and routing documentation and the ArcGIS Help.

Symbology

Traffic speeds are displayed as a percentage of free-flow speeds, which is frequently the speed limit or how fast cars tend to travel when unencumbered by other vehicles. The streets are color coded as follows:

Green (fast): 85 - 100% of free-flow speeds
Yellow (moderate): 65 - 85%
Orange (slow): 45 - 65%
Red (stop and go): 0 - 45%

To view live traffic only (that is, excluding typical traffic conditions), enable the Live Traffic layer and disable the Traffic layer. (You can find these layers under World/Traffic > [region] > [region] Traffic.) To view more comprehensive traffic information that includes live and typical conditions, disable the Live Traffic layer and enable the Traffic layer.

ArcGIS Online organization subscription

Important Note: The World Traffic map service is available for users with an ArcGIS Online organizational subscription. To access this map service, you'll need to sign in with an account that is a member of an organizational subscription. If you don't have an organizational subscription, you can create a new account and then sign up for a 30-day trial of ArcGIS Online.
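The color bands in the description above can be illustrated with a short Python sketch. This is not part of the service or of any Esri API: the function name `traffic_color` and its parameters are hypothetical, and because the published bands share their boundary values (85, 65 and 45 percent), the sketch assumes that a boundary value belongs to the faster band.

```python
def traffic_color(speed: float, free_flow_speed: float) -> str:
    """Map an observed speed to a color band.

    Hypothetical helper: thresholds follow the description's bands;
    assigning boundary values to the faster band is an assumption.
    """
    if free_flow_speed <= 0:
        raise ValueError("free_flow_speed must be positive")
    # Express the observed speed as a percentage of free-flow speed,
    # clamped to the 0-100 range used by the color bands.
    pct = max(0.0, min(100.0, 100.0 * speed / free_flow_speed))
    if pct >= 85:
        return "green"   # fast
    if pct >= 65:
        return "yellow"  # moderate
    if pct >= 45:
        return "orange"  # slow
    return "red"         # stop and go


if __name__ == "__main__":
    # Observed speeds on a street with a free-flow speed of 50.
    for observed in (50, 35, 25, 10):
        print(observed, traffic_color(observed, 50))
```

Speeds above the free-flow speed are clamped to 100 percent, so they fall into the green band.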
Global market share of leading desktop search engines 2015-2025
statista.com
Updated Jan 23, 2025
Cite
Statista (2025). Global market share of leading desktop search engines 2015-2025 [Dataset]. https://www.statista.com/statistics/216573/worldwide-market-share-of-search-engines/
Dataset updated
Jan 23, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jan 2015 - Jan 2025
Area covered
Worldwide
Description
As of January 2025, online search engine Bing accounted for 12.23 percent of the global desktop search market, while market leader Google had a share of around 78.83 percent. Meanwhile, Yahoo's market share was 3.07 percent.

Google in the global market

Ever since the introduction of Google Search in 1997, the company has dominated the search engine market, while the shares of all other tools have been rather lopsided. The majority of Google revenues are generated through advertising. Its parent corporation, Alphabet, was one of the biggest internet companies worldwide as of 2023, with a market capitalization of 1.6 trillion U.S. dollars. The company has also expanded its services to mail, productivity tools, enterprise products, mobile devices, and other ventures. As a result, Google earned one of the highest tech company revenues in 2023 with roughly 305.6 billion U.S. dollars.

Search engine usage in different countries

Google is the most frequently used search engine worldwide. But in some countries, its alternatives are leading or competing with it to some extent. As of the last quarter of 2023, more than 63 percent of internet users in Russia used Yandex, whereas Google users were nearly 36 percent. Meanwhile, Baidu was the most used search engine in China, despite a strong percentage decrease of internet users in the country accessing it. In other countries, like Japan and Mexico, people tend to use Yahoo along with Google. In the first quarter of 2022, nearly 56 percent of the respondents in Japan said that they had used Yahoo in the past four weeks. In the same year, over 27 percent of users in Mexico said they used Yahoo. Another search engine, Bing, operated by Microsoft, was the second most popular search engine in the United Kingdom after Google.
Total global visitor traffic to Wikipedia.org 2024
statista.com
Updated Nov 11, 2024
Cite
Statista (2024). Total global visitor traffic to Wikipedia.org 2024 [Dataset]. https://www.statista.com/statistics/1259907/wikipedia-website-traffic/
Dataset updated
Nov 11, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Oct 2023 - Mar 2024
Area covered
World
Description
In March 2024, close to 4.4 billion unique global visitors had visited Wikipedia.org, slightly down from 4.4 billion visitors since August of the same year. Wikipedia is a free online encyclopedia with articles generated by volunteers worldwide. The platform is hosted by the Wikimedia Foundation.
Book piracy sites in the U.S. 2017
statista.com
Updated Mar 22, 2021
Cite
Statista (2021). Book piracy sites in the U.S. 2017 [Dataset]. https://www.statista.com/statistics/688411/book-piracy-sites/
Dataset updated
Mar 22, 2021
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2017
Area covered
United States
Description
According to a survey held in the United States in 2017, 50 percent of respondents admitted to using 4shared.com to access e-books illegally. Book sharing platforms like 4shared.com may appear innocent at first glance, but this particular site is the most popular among consumers looking to illegally download e-books, with Uploaded.net and Bookos.org ranking second and third as the most used websites for this purpose.

Does downloading e-books illegally really matter?

Illegal e-book downloads are a serious problem for authors, and present real risks to a writer’s career. This kind of piracy can directly affect an author’s income as genuine sales give way to free, illegal downloads which are shared across the web and passed from reader to reader.

Unfortunately, social media platforms only fuel this behavior. Reddit has multiple forums about e-book piracy. These forums allow users to discuss different piracy methods and give each other tips on the best illegitimate e-book download apps, websites and torrent files.

Can e-book piracy be stopped?

There are ongoing efforts to prevent e-book piracy from continuing or getting worse. Sadly, this is not an easy task, given the sheer number of options available to readers seeking ways to access paid content for free. Online guides for authors about illegal book downloads can help in tackling the problem when it arises or assist book writers in weighing up whether or not to try to address the issue. Methods such as digital rights management (DRM) could theoretically help to decrease illegal e-book distribution, but this is not a popular option as it heavily restricts how readers can access books online. Sadly though, e-book piracy is almost impossible to stop.
Leading e-commerce and shopping websites worldwide 2024, based on visit...
statista.com
flwrdeptvarieties.store
Updated Feb 26, 2025
Cite
Statista (2025). Leading e-commerce and shopping websites worldwide 2024, based on visit share [Dataset]. https://www.statista.com/statistics/1198949/most-visited-websites-in-the-retail-sector-worldwide/
Dataset updated
Feb 26, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Apr 2024
Area covered
Worldwide
Description
Amazon's global platform 'amazon.com' was the most popular e-commerce and shopping website worldwide, accounting for more than 12.92 percent of desktop visits to sites in this category in April 2024. Second place went to eBay.com with roughly three percent.

Amazon leads the way

There is no denying the dominance of Amazon in the e-commerce industry. By 2026, Amazon's worldwide net sales are estimated to exceed one trillion U.S. dollars. In 2022, amazon.com garnered over three billion monthly visitors, maintaining its spot as the most popular retail website worldwide. As of April 2024, the leading social media traffic referrers to amazon.com were YouTube, Facebook, and X.

Online shopping

Amazon’s strong position is also due to shoppers’ preference for online marketplaces. As of April 2024, nearly one-third of online consumers opted for online marketplaces over all other digital channels. This category of platforms was also ranked as the e-commerce channel delivering the best customer experience in 2024. According to shoppers worldwide, the three most important changes that could make their digital shopping experience better were faster delivery, free returns, and more convenient shipping conditions.
Share of mobile internet traffic in global regions 2025
statista.com
flwrdeptvarieties.store
Updated Jan 29, 2025
Cite
Statista (2025). Share of mobile internet traffic in global regions 2025 [Dataset]. https://www.statista.com/statistics/306528/share-of-mobile-internet-traffic-in-global-regions/
Dataset updated
Jan 29, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jan 2025
Area covered
Worldwide
Description
In January 2025, mobile devices excluding tablets accounted for over 62 percent of web page views worldwide. Meanwhile, over 75 percent of web page views in Africa were generated via mobile. In contrast, just under half of web traffic in North America still took place via desktop connections, with mobile accounting for 51.1 percent of total web traffic. While regional infrastructure remains an important factor in broadband vs. mobile coverage, most of the world has had its eyes on the recent 5G rollout across the globe, spearheaded by tech leaders China and the United States. The number of mobile 5G subscriptions worldwide is forecast to reach more than 8 billion by 2028.

Social media: room for growth in Africa and southern Asia

Overall, more than 92 percent of the world’s mobile internet subscribers are also active on social media. A fast-growing market, with newcomers such as TikTok taking the world by storm, marketers have been cashing in on social media’s reach. Overall, social media penetration is highest in Europe and America while in Africa and southern Asia, there is still room for growth. As of 2021, Facebook and Google-owned YouTube are the most popular social media platforms worldwide.

Facebook and Instagram are most effective

With nearly 3 billion users, it is no wonder that Facebook remains the social media avenue of choice for the majority of marketers across the world. Instagram, meanwhile, was the second most popular outlet. Both platforms are low-cost and support short-form content, known for its universal consumer appeal and answering to the most important benefits of using these kinds of platforms for business and advertising purposes.
Most visited conservative websites in the U.S. 2023
statista.com
Updated Dec 11, 2023
Cite
Statista (2023). Most visited conservative websites in the U.S. 2023 [Dataset]. https://www.statista.com/statistics/1340485/usa-most-visited-conservative-websites/
Dataset updated
Dec 11, 2023
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Sep 2023
Area covered
United States
Description
In September 2023, Fox News ranked first among the most popular multiplatform conservative and right-wing websites in the United States with over 78.6 million unique visitors from mobile and desktop connections. Far-right website and printed magazine Epoch Times ranked second with approximately 5.9 million unique monthly visitors.

