22 datasets found
  1. Leading websites worldwide 2024, by monthly visits

    • statista.com
    • flwrdeptvarieties.store
    Updated Mar 24, 2025
    Cite
    Statista (2025). Leading websites worldwide 2024, by monthly visits [Dataset]. https://www.statista.com/statistics/1201880/most-visited-websites-worldwide/
    Explore at:
    Dataset updated
    Mar 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 2024
    Area covered
    World
    Description

    In November 2024, Google.com was the most popular website worldwide with 136 billion average monthly visits. The online platform has held the top spot as the most popular website since June 2010, when it pulled ahead of Yahoo into first place. Second-ranked YouTube generated more than 72.8 billion monthly visits in the measured period.

    The internet leaders: search, social, and e-commerce

    Social networks, search engines, and e-commerce websites shape the online experience as we know it. While Google leads the global online search market by far, YouTube and Facebook have become the world’s most popular websites for user-generated content, solidifying Alphabet’s and Meta’s leadership over the online landscape. Meanwhile, websites such as Amazon and eBay generate millions in profits from the sale and distribution of goods, making the e-market sector an integral part of the global retail scene.

    What is next for online content?

    Powering social media and websites like Reddit and Wikipedia, user-generated content keeps moving the internet’s engines. However, the rise of generative artificial intelligence will bring significant changes to how online content is produced and handled. ChatGPT is already transforming how online search is performed, and news of Google's 2024 deal for licensing Reddit content to train large language models (LLMs) signals that the internet is likely to go through a new revolution. While AI's impact on the online market might bring both opportunities and challenges, effective content management will remain crucial for profitability on the web.

  2. DataForSEO Labs API for keyword research and search analytics, real-time...

    • datarade.ai
    .json
    Updated Jun 4, 2021
    Cite
    DataForSEO (2021). DataForSEO Labs API for keyword research and search analytics, real-time data for all Google locations and languages [Dataset]. https://datarade.ai/data-products/dataforseo-labs-api-for-keyword-research-and-search-analytics-dataforseo
    Explore at:
    Available download formats: .json
    Dataset updated
    Jun 4, 2021
    Dataset provided by
    Authors
    DataForSEO
    Area covered
    Tokelau, Korea (Democratic People's Republic of), Kenya, Morocco, Isle of Man, Cocos (Keeling) Islands, Mauritania, Micronesia (Federated States of), Azerbaijan, Armenia
    Description

    DataForSEO Labs API offers three powerful keyword research algorithms and historical keyword data:

    • Related Keywords from the “searches related to” element of Google SERP.
    • Keyword Suggestions that match the specified seed keyword with additional words before, after, or within the seed key phrase.
    • Keyword Ideas that fall into the same category as specified seed keywords.
    • Historical Search Volume with current cost-per-click, and competition values.

    Based on in-market categories of Google Ads, you can get keyword ideas from the relevant Categories For Domain and discover relevant Keywords For Categories. You can also obtain Top Google Searches with AdWords and Bing Ads metrics, product categories, and Google SERP data.

    You will find well-rounded ways to scout the competitors:

    • Domain Whois Overview with ranking and traffic info from organic and paid search.
    • Ranked Keywords that any domain or URL has positions for in SERP.
    • SERP Competitors and the rankings they hold for the keywords you specify.
    • Competitors Domain with a full overview of its rankings and traffic from organic and paid search.
    • Domain Intersection keywords for which both specified domains rank within the same SERPs.
    • Subdomains for the target domain you specify along with the ranking distribution across organic and paid search.
    • Relevant Pages of the specified domain with rankings and traffic data.
    • Domain Rank Overview with ranking and traffic data from organic and paid search.
    • Historical Rank Overview with historical data on rankings and traffic of the specified domain from organic and paid search.
    • Page Intersection keywords for which the specified pages rank within the same SERP.

    All DataForSEO Labs API endpoints function in the Live mode. This means you will be provided with the results in response right after sending the necessary parameters with a POST request.

    The limit is 2000 API calls per minute; however, you can contact our support team if your project requires higher rates.
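
    A client can stay under the per-minute limit stated above by tracking its own call timestamps. The following is a minimal sketch of a sliding-window limiter, not part of any DataForSEO client library: the class name `MinuteRateLimiter`, its methods, and the injectable clock are all illustrative assumptions.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Sliding-window limiter: at most max_calls per `window` seconds.

    Hypothetical helper; the default of 2000 mirrors the limit quoted above.
    """

    def __init__(self, max_calls=2000, window=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock      # injectable so the logic can be tested
        self.calls = deque()    # timestamps of calls inside the window

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window - (now - self.calls[0])

    def record(self):
        """Register one call at the current time."""
        self.calls.append(self.clock())
```

    A caller would sleep for `wait_time()` seconds, send its POST request, and then call `record()`.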

    We offer well-rounded API documentation, GUI for API usage control, comprehensive client libraries for different programming languages, free sandbox API testing, ad hoc integration, and deployment support.

    We have a pay-as-you-go pricing model. You simply add funds to your account and use them to get data. The account balance doesn't expire.

  3. ScrapeHero Data Cloud - Free and Easy to use

    • datarade.ai
    .json, .csv
    Updated Feb 8, 2022
    Cite
    Scrapehero (2022). ScrapeHero Data Cloud - Free and Easy to use [Dataset]. https://datarade.ai/data-products/scrapehero-data-cloud-free-and-easy-to-use-scrapehero
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    Feb 8, 2022
    Dataset provided by
    ScrapeHero
    Authors
    Scrapehero
    Area covered
    Bhutan, Bahamas, Dominica, Ghana, Slovakia, Anguilla, Niue, Portugal, Bahrain, Chad
    Description

    The Easiest Way to Collect Data from the Internet

    Download anything you see on the internet into spreadsheets within a few clicks using our ready-made web crawlers, or with a few lines of code using our APIs.

    We have made it as simple as possible to collect data from websites.

    Easy to Use Crawlers

    Amazon Product Details and Pricing Scraper: Get product information, pricing, FBA, best seller rank, and much more from Amazon.

    Google Maps Search Results: Get details like place name, phone number, address, website, ratings, and open hours from Google Maps or Google Places search results.

    Twitter Scraper: Get tweets, Twitter handle, content, number of replies, number of retweets, and more. All you need to provide is a URL to a profile, hashtag, or an advanced search URL from Twitter.

    Amazon Product Reviews and Ratings: Get customer reviews for any product on Amazon and get details like product name, brand, reviews and ratings, and more from Amazon.

    Google Reviews Scraper: Scrape Google reviews and get details like business or location name, address, review, ratings, and more for businesses and places.

    Walmart Product Details & Pricing: Get the product name, pricing, number of ratings, reviews, product images, URL, and other product-related data from Walmart.

    Amazon Search Results Scraper: Get product search rank, pricing, availability, best seller rank, and much more from Amazon.

    Amazon Best Sellers: Get the bestseller rank, product name, pricing, number of ratings, rating, product images, and more from any Amazon Bestseller List.

    Google Search Scraper: Scrape Google search results and get details like search rank, paid and organic results, knowledge graph, related search results, and more.

    Walmart Product Reviews & Ratings: Get customer reviews for any product on Walmart.com and get details like product name, brand, reviews, and ratings.

    Scrape Emails and Contact Details: Get emails, addresses, contact numbers, and social media links from any website.

    Walmart Search Results Scraper: Get product details such as pricing, availability, reviews, ratings, and more from Walmart search results and categories.

    Glassdoor Job Listings: Scrape job details such as job title, salary, job description, location, company name, number of reviews, and ratings from Glassdoor.

    Indeed Job Listings: Scrape job details such as job title, salary, job description, location, company name, number of reviews, and ratings from Indeed.

    LinkedIn Jobs Scraper (Premium): Scrape job listings on LinkedIn and extract job details such as job title, job description, location, company name, number of reviews, and more.

    Redfin Scraper (Premium): Scrape real estate listings from Redfin. Extract property details such as address, price, mortgage, Redfin estimate, broker name, and more.

    Yelp Business Details Scraper: Scrape business details such as phone number, address, website, and more from Yelp search and business details pages.

    Zillow Scraper (Premium): Scrape real estate listings from Zillow. Extract property details such as address, price, broker, broker name, and more.

    Amazon Product Offers and Third Party Sellers: Get product pricing, delivery details, FBA, seller details, and much more from the Amazon offer listing page.

    Realtor Scraper (Premium): Scrape real estate listings from Realtor.com. Extract property details such as address, price, area, broker, and more.

    Target Product Details & Pricing: Get product details from search results and category pages such as pricing, availability, rating, reviews, and 20+ data points from Target.

    Trulia Scraper (Premium): Scrape real estate listings from Trulia. Extract property details such as address, price, area, mortgage, and more.

    Amazon Customer FAQs: Get FAQs for any product on Amazon and get details like the question, answer, answered user name, and more.

    Yellow Pages Scraper: Get details like business name, phone number, address, website, ratings, and more from Yellow Pages search results.

  4. Google Analytics Sample

    • kaggle.com
    zip
    Updated Sep 19, 2019
    Cite
    Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/datasets/bigquery/google-analytics-sample
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Sep 19, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

    Content

    The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

    • Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.
    • Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.
    • Transactional data: information about the transactions that occur on the Google Merchandise Store website.

    Fork this kernel to get started.

    Acknowledgements

    Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

    Banner Photo by Edho Pratama from Unsplash.

    Inspiration

    What is the total number of transactions generated per device browser in July 2017?

    The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

    What was the average number of product pageviews for users who made a purchase in July 2017?

    What was the average number of product pageviews for users who did not make a purchase in July 2017?

    What was the average total transactions per user that made a purchase in July 2017?

    What is the average amount of money spent per session in July 2017?

    What is the sequence of pages viewed?
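
    The "real bounce rate" defined in the questions above (the percentage of visits with a single pageview) can be sketched on toy data. This is an illustration only: it assumes each session has already been reduced to a (traffic source, pageviews) pair, which is a simplification of the actual export schema, and the function name and sample rows are made up.

```python
from collections import defaultdict


def bounce_rate_by_source(sessions):
    """Return {source: percent of sessions with exactly one pageview}."""
    totals = defaultdict(int)
    bounces = defaultdict(int)
    for source, pageviews in sessions:
        totals[source] += 1
        if pageviews == 1:
            bounces[source] += 1
    return {s: 100.0 * bounces[s] / totals[s] for s in totals}


# Hypothetical sessions: (traffic source, number of pageviews)
sample = [("google", 1), ("google", 4), ("direct", 1), ("direct", 1)]
rates = bounce_rate_by_source(sample)
```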

  5. Share of global mobile website traffic 2015-2024

    • statista.com
    • flwrdeptvarieties.store
    Updated Jan 28, 2025
    Cite
    Statista (2025). Share of global mobile website traffic 2015-2024 [Dataset]. https://www.statista.com/statistics/277125/share-of-website-traffic-coming-from-mobile-devices/
    Explore at:
    Dataset updated
    Jan 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    Mobile accounts for approximately half of web traffic worldwide. In the last quarter of 2024, mobile devices (excluding tablets) generated 62.54 percent of global website traffic. Mobiles and smartphones consistently hovered around the 50 percent mark from the beginning of 2017, before surpassing it in 2020.

    Mobile traffic

    Due to limited infrastructure and financial constraints, many emerging digital markets skipped the desktop internet phase entirely and moved straight onto mobile internet via smartphone and tablet devices. India is a prime example of a market with a significant mobile-first online population. Other countries with a significant share of mobile internet traffic include Nigeria, Ghana and Kenya. In most African markets, mobile accounts for more than half of the web traffic. By contrast, mobile only makes up around 45.49 percent of online traffic in the United States.

    Mobile usage

    The most popular mobile internet activities worldwide include watching movies or videos online, e-mail usage and accessing social media. Apps are a very popular way to watch video on the go, and the most-downloaded entertainment apps in the Apple App Store are Netflix, Tencent Video and Amazon Prime Video.

  6. eCommerce events history in electronics store

    • kaggle.com
    Updated Mar 29, 2021
    Cite
    Michael Kechinov (2021). eCommerce events history in electronics store [Dataset]. https://www.kaggle.com/datasets/mkechinov/ecommerce-events-history-in-electronics-store/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 29, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Michael Kechinov
    Description

    About

    This file contains behavior data for 5 months (Oct 2019 – Feb 2020) from a large electronics online store.

    Each row in the file represents an event. All events are related to products and users. Each event is like a many-to-many relation between products and users.

    The data was collected by the Open CDP project. Feel free to use this open-source customer data platform.

    More datasets

    Check out other datasets:

    1. https://www.kaggle.com/mkechinov/ecommerce-behavior-data-from-multi-category-store
    2. https://www.kaggle.com/mkechinov/ecommerce-purchase-history-from-electronics-store
    3. https://www.kaggle.com/mkechinov/ecommerce-events-history-in-cosmetics-shop
    4. https://www.kaggle.com/mkechinov/ecommerce-purchase-history-from-jewelry-store
    5. https://www.kaggle.com/mkechinov/ecommerce-events-history-in-electronics-store - you're reading it right now
    6. [NEW] https://www.kaggle.com/datasets/mkechinov/direct-messaging

    How to read it

    There are different types of events. See below.

    Semantics (or how to read it):

    User user_id, during session user_session, added to the shopping cart (property event_type is equal to cart) the product product_id of brand brand in category category_code, with price price, at event_time.

    File structure

    Property | Description
    event_time | Time when the event happened (in UTC).
    event_type | Only one kind of event: purchase.
    product_id | ID of a product.
    category_id | Product's category ID.
    category_code | Product's category taxonomy (code name) if it was possible to make it. Usually present for meaningful categories and skipped for different kinds of accessories.
    brand | Downcased string of the brand name. Can be missing.
    price | Float price of a product. Present.
    user_id | Permanent user ID.
    user_session | Temporary user's session ID. It is the same within each user's session and changes every time the user comes back to the online store after a long pause.

    Event types

    Events can be:

    • view - a user viewed a product
    • cart - a user added a product to shopping cart
    • remove_from_cart - a user removed a product from shopping cart
    • purchase - a user purchased a product

    Multiple purchases per session

    A session can have multiple purchase events. It's ok, because it's a single order.
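
    Since all purchase events in one session form a single order, orders can be rebuilt by grouping on user_session. The sketch below uses the column names from the file structure above; the function name `orders_from_events` and the sample rows are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict


def orders_from_events(events):
    """Collapse purchase events into one order per user_session."""
    orders = defaultdict(
        lambda: {"user_id": None, "product_ids": [], "total": 0.0}
    )
    for ev in events:
        if ev["event_type"] != "purchase":
            continue  # views and cart changes are not part of an order
        order = orders[ev["user_session"]]
        order["user_id"] = ev["user_id"]
        order["product_ids"].append(ev["product_id"])
        order["total"] += ev["price"]
    return dict(orders)


# Made-up rows using the documented column names.
events = [
    {"event_type": "view", "user_session": "s1", "user_id": 7,
     "product_id": 10, "price": 5.0},
    {"event_type": "purchase", "user_session": "s1", "user_id": 7,
     "product_id": 10, "price": 5.0},
    {"event_type": "purchase", "user_session": "s1", "user_id": 7,
     "product_id": 11, "price": 2.5},
    {"event_type": "purchase", "user_session": "s2", "user_id": 8,
     "product_id": 12, "price": 1.0},
]
orders = orders_from_events(events)
```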

    Many thanks

    Thanks to REES46 Marketing Platform for this dataset.

    Using datasets in your works, books, education materials

    You can use this dataset for free. Just mention the source of it: link to this page and link to REES46 Marketing Platform.

  7. Ecommerce Data | Product & Customer Review Data | Scrape Any Website | FREE...

    • datarade.ai
    .json, .xml, .csv
    Updated Nov 30, 2023
    Cite
    PromptCloud (2023). Ecommerce Data | Product & Customer Review Data | Scrape Any Website | FREE Sample Available | Custom Scraping Services | PromptCloud [Dataset]. https://datarade.ai/data-products/ecommerce-data-product-and-customer-review-dataset-from-eco-promptcloud
    Explore at:
    Available download formats: .json, .xml, .csv
    Dataset updated
    Nov 30, 2023
    Dataset authored and provided by
    PromptCloud
    Area covered
    Peru, Mayotte, Bahrain, Monaco, Wallis and Futuna, Nigeria, Botswana, Latvia, Niger, Estonia
    Description

    PromptCloud offers specialized data extraction services for eCommerce businesses, focusing on acquiring detailed product and customer review datasets from a variety of eCommerce websites. This service is instrumental for businesses aiming to refine their eCommerce strategies through in-depth market analysis, competitive research, and enhanced customer insights.

    Customization is a key aspect of PromptCloud's offerings. PromptCloud provides bespoke scraping services, tailored to the unique requirements of each business. This adaptability is especially beneficial for companies seeking a competitive advantage in the dynamic eCommerce market. A distinctive feature of PromptCloud's approach is the provision of a free sample, allowing potential clients to experience the quality and accuracy of their data firsthand. This commitment to quality is reflected in their use of advanced technologies that ensure the delivery of precise, up-to-date data.

    PromptCloud's versatility extends to data delivery, offering various formats like JSON, CSV, and XML. This flexibility facilitates seamless integration of data into different business systems, highlighting their focus on creating user-friendly and effective solutions.

    PromptCloud positions itself as a vital resource for eCommerce businesses looking to utilize data for strategic planning and customer understanding. Their tailored scraping services, combined with a commitment to delivering current and accurate data, make PromptCloud the best option for businesses seeking to improve their market presence and deepen their understanding of customer behavior.

    We are committed to putting data at the heart of your business. Reach out for a no-frills PromptCloud experience: professional, technologically ahead, and reliable.

  8. Play Store Apps

    • kaggle.com
    Updated Sep 16, 2022
    + more versions
    Cite
    Aman Chauhan (2022). Play Store Apps [Dataset]. https://www.kaggle.com/datasets/whenamancodes/play-store-apps
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 16, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aman Chauhan
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    While many public datasets (on Kaggle and the like) provide Apple App Store data, there are not many counterpart datasets available for Google Play Store apps anywhere on the web. On digging deeper, I found that the iTunes App Store page deploys a nicely indexed appendix-like structure to allow for simple and easy web scraping. On the other hand, Google Play Store uses sophisticated modern-day techniques (like dynamic page load) using jQuery, making scraping more challenging.

    Each app (row) has values for category, rating, size, and more.

    The Play Store apps data has enormous potential to drive app-making businesses to success. Actionable insights can be drawn for developers to work on and capture the Android market!

    googleplaystore.csv

    Column | Description
    App | Application name
    Category | Category the app belongs to
    Ratings | Overall user rating of the app (as when scraped)
    Reviews | Number of user reviews for the app (as when scraped)
    Size | Size of the app (as when scraped)
    Installs | Number of user downloads/installs for the app (as when scraped)
    Type | Paid or Free
    Price | Price of the app (as when scraped)
    Content Rating | Age group the app is targeted at - Children / Mature 21+ / Adult
    Genre | An app can belong to multiple genres (apart from its main category). For eg, a musical family game will belong to
    Current Ver | Current version of the app available on Play Store (as when scraped)
    Android Ver | Min required Android version (as when scraped)

    googleplaystore_user_reviews.csv

    Column | Description
    App | Name of app
    Translated Reviews | User review (Preprocessed and translated to English)
    Sentiment | Positive/Negative/Neutral (Preprocessed)
    Sentiment_polarity | Sentiment polarity score
    Sentiment_subjectivity | Sentiment subjectivity score
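
    As a rough illustration of working with the review columns listed above, the sketch below averages Sentiment_polarity per App. It assumes the CSV header uses exactly the names "App" and "Sentiment_polarity" and that missing scores appear as blank or "nan" cells; both are assumptions, and the sample rows are invented.

```python
import csv
import io
import math
from collections import defaultdict


def mean_polarity_by_app(reviews_file):
    """Average Sentiment_polarity per App, skipping rows without a score."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(reviews_file):
        try:
            score = float(row["Sentiment_polarity"])
        except (TypeError, ValueError):
            continue  # blank or non-numeric cell
        if math.isnan(score):
            continue  # "nan" parses as a float but carries no information
        sums[row["App"]] += score
        counts[row["App"]] += 1
    return {app: sums[app] / counts[app] for app in sums}


# Invented rows standing in for googleplaystore_user_reviews.csv
sample = io.StringIO(
    "App,Sentiment_polarity\n"
    "A,0.5\n"
    "A,-0.5\n"
    "B,1.0\n"
    "B,\n"
    "B,nan\n"
)
means = mean_polarity_by_app(sample)
```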


  9. Web map service DTK50

    • data.europa.eu
    wms
    Updated Mar 1, 2022
    + more versions
    Cite
    (2022). Web map service DTK50 [Dataset]. https://data.europa.eu/data/datasets/54c4b80a-66c5-45a5-9e98-e3e9fde2c34b
    Explore at:
    Available download formats: wms
    Dataset updated
    Mar 1, 2022
    Description

    In the course of the construction of the spatial data infrastructure of the Free State of Thuringia (GDI-Th) selected geodata are made available to internal and external users for free use. From the geobase data of the central spatial data storage and spatial data provision component Geoproxy, data collections of particular public interest are made available to everyone as public data without restriction of access and free of charge via the Geoclient as a viewing service. These are data from the Digital Topographic Map 1:50 000 (DTK50).

  10. An analysis of the current overlay journals

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Oct 18, 2022
    Cite
    Antti M. Rousi; Mikael Laakso (2022). An analysis of the current overlay journals [Dataset]. http://doi.org/10.5281/zenodo.6617002
    Explore at:
    Available download formats: csv
    Dataset updated
    Oct 18, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Antti M. Rousi; Mikael Laakso
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Research data to accompany the article "Overlay journals: a study of the current landscape" (https://doi.org/10.1177/09610006221125208)

    Identifying the sample of overlay journals was an explorative process (from April 2021 to February 2022). The sample of investigated overlay journals was identified by using the websites of Episciences.org (2021), Scholastica (2021), Free Journal Network (2021), Open Journals (2021), PubPub (2022), and Wikipedia (2021). In total, this study identified 34 overlay journals. Please see the paper for more details about the excluded journal types.

    The journal ISSN numbers, manuscript source repositories, first overlay volumes, article volumes, publication languages, peer-review type, licence for published articles, author costs, publisher types, submission policy, and preprint availability policy were observed by inspecting journal editorial policies and submission guidelines found on journal websites. The overlay journals’ ISSN numbers were identified by examining journal websites and cross-checking this information with Ulrich’s periodicals database (Ulrichsweb, 2021). Journals that published review reports, either with reviewers’ names or anonymously, were classified as operating with open peer-review. Publisher types defined by Laakso and Björk (2013) were used to categorise the findings concerning the publishers. If the journal website did not include publisher information, the editorial board was interpreted to publish the journal.

    The Organisation for Economic Co-operation and Development (OECD) field of science classification was used to categorise the journals into different domains of science. The journals’ primary OECD fields of science were determined by the authors through examining the journal websites.

    Whether the journals were indexed in the Directory of Open Access Journals (DOAJ), Scopus, or Clarivate Analytics’ Web of Science Core collection’s journal master list was examined by searching the services with journal ISSN numbers and journal titles.

    The identified overlay journals were examined from the viewpoint of both qualitative and quantitative journal metrics. The qualitative metrics comprised the Nordic expert panel rankings of scientific journals, namely the Finnish Publication Forum, the Danish Bibliometric Research Indicator and the Norwegian Register for Scientific Journals, Series and Publishers. Searches were conducted from the web portals of the above services with both ISSN numbers and journal titles. Clarivate Analytics’ Journal Citation Reports database was searched with the use of both ISSN numbers and journal titles to identify whether the journals had a Journal Citation Indicator (JCI), Two-Year Impact Factor (IF) and an Impact Factor ranking (IF rank). The examined Journal Impact Factors and Impact Factor rankings were for the year 2020 (as released in 2021).

  11. Ancillary Monitor Corpus: Common Crawl - german web (YEAR 2016 – VERSION 1)

    • lindat.mff.cuni.cz
    Updated Nov 12, 2024
    + more versions
    Cite
    Jan Oliver Rüdiger (2024). Ancillary Monitor Corpus: Common Crawl - german web (YEAR 2016 – VERSION 1) [Dataset]. https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-5790
    Explore at:
    Dataset updated
    Nov 12, 2024
    Authors
    Jan Oliver Rüdiger
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    *** For the German version, see below ***

    The ‘Ancillary Monitor Corpus: Common Crawl - german web’ was designed to enable a broad-based linguistic analysis of the German-language (visible) internet over time, with the aim of achieving comparability with DeReKo (the ‘German Reference Corpus’ of the Leibniz Institute for the German Language; DeReKo volume: 57 billion tokens as of DeReKo Release 2024-I). The corpus is separated by year (here: 2016) and versioned (here: version 1). Across all years (2013-2024), version 1 comprises 97.45 billion tokens.

    The corpus is based on the data dumps from CommonCrawl (https://commoncrawl.org/). CommonCrawl is a non-profit organisation that provides copies of the visible Internet free of charge for research purposes.

    The CommonCrawl WET raw data was first filtered by TLD (top-level domain). Only pages ending in the following TLDs were taken into account: ‘.at; .bayern; .berlin; .ch; .cologne; .de; .gmbh; .hamburg; .koeln; .nrw; .ruhr; .saarland; .swiss; .tirol; .wien; .zuerich’. These are the exclusive German-language TLDs according to ICANN (https://data.iana.org/TLD/tlds-alpha-by-domain.txt) as of 1 June 2024 - TLDs with a purely corporate reference (e.g. ‘.edeka; .bmw; .ford’) were excluded. The language of the individual documents (URLs) was then estimated with the help of NTextCat (https://github.com/ivanakcheurov/ntextcat) (via the CORE14 profile of NTextCat) - only those documents/URLs for which German was the most likely language were processed further (e.g. to exclude foreign-language material such as individual subpages). The third step involved filtering for manual selectors and filtering for 1:1 duplicates (within one year).
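
    The first filtering step described above (keeping only pages under the listed TLDs) might look roughly like the following. This is a sketch, not the author's CorpusExplorer pipeline: the helper names `tld_of` and `keep_url` are hypothetical, and only the TLD list itself is taken from the description.

```python
from urllib.parse import urlsplit

# TLDs listed in the corpus description above.
GERMAN_TLDS = {
    "at", "bayern", "berlin", "ch", "cologne", "de", "gmbh", "hamburg",
    "koeln", "nrw", "ruhr", "saarland", "swiss", "tirol", "wien", "zuerich",
}


def tld_of(url):
    """Return the lower-cased last label of the URL's host, or ''."""
    host = urlsplit(url).hostname or ""  # None when the URL has no scheme
    return host.rsplit(".", 1)[-1] if "." in host else ""


def keep_url(url):
    """True if the URL's top-level domain is in the German-language list."""
    return tld_of(url) in GERMAN_TLDS


# Example URLs are placeholders, not taken from the corpus.
urls = ["https://www.example.de/page", "https://example.com/", "http://shop.example.wien"]
kept = [u for u in urls if keep_url(u)]
```

    The later steps (language identification with NTextCat and duplicate removal) are not covered by this sketch.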

    The filtering and subsequent processing was carried out using CorpusExplorer (http://hdl.handle.net/11234/1-2634) and our own (supplementary) scripts, and the TreeTagger (http://hdl.handle.net/11372/LRT-323) was used for automatic annotation. The corpus was processed on the HELIX HPC cluster. The author would like to take this opportunity to thank the state of Baden-Württemberg and the German Research Foundation (DFG) for the possibility to use the bwHPC/HELIX HPC cluster - funding code HPC cluster: INST 35/1597-1 FUGG.

    Data content:

    • Tokens and record boundaries
    • Automatic lemma and POS annotation (using TreeTagger)
    • Metadata:
        • GUID - Unique identifier of the document
        • YEAR - Year of capture (please use this information for data slices)
        • Url - Full URL
        • Tld - Top-Level Domain
        • Domain - Domain without TLD (but with sub-domains if applicable)
        • DomainFull - Complete domain (incl. TLD)
        • Datum - (System Information): Date of the CorpusExplorer (date of capture by CommonCrawl - not date of creation/modification of the document).
        • Hash - (System Information): SHA1 hash of the CommonCrawl
        • Pfad - (System Information): Path of the cluster (raw data) - is supplied by the system.

    Please note that the files are saved as *.cec6.gz. These are binary files of the CorpusExplorer (see above). These files ensure efficient archiving. You can use both CorpusExplorer and the ‘CEC6-Converter’ (available for Linux, MacOS and Windows - see: https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-5705) to convert the data. The data can be exported in the following formats:

    • CATMA v6
    • CoNLL
    • CSV
    • CSV (only meta-data)
    • DTA TCF-XML
    • DWDS TEI-XML
    • HTML
    • IDS I5-XML
    • IDS KorAP XML
    • IMS Open Corpus Workbench
    • JSON
    • OPUS Corpus Collection XCES
    • Plaintext
    • SaltXML
    • SlashA XML
    • SketchEngine VERT
    • SPEEDy/CODEX (JSON)
    • TLV-XML
    • TreeTagger
    • TXM
    • WebLicht
    • XML

    Please note that an export increases the storage space requirement considerably. The ‘CorpusExplorerConsole’ (https://github.com/notesjor/CorpusExplorer.Terminal.Console - available for Linux, MacOS and Windows) also offers a simple solution for editing and analysing. If you have any questions, please contact the author.

    Legal information

    The data was downloaded on 01.11.2024. The use, processing and distribution are subject to §60d UrhG (German copyright law), which authorises the use for non-commercial purposes in research and teaching. LINDAT/CLARIN is responsible for long-term archiving in accordance with §69d para. 5 and ensures that only authorised persons can access the data. The data has been checked to the best of our knowledge and belief (on a random basis) - should you nevertheless find legal violations (e.g. right to be forgotten, personal rights, etc.), please write an e-mail to the author (amc_report@jan-oliver-ruediger.de) with the following information: 1) why this content is undesirable (please outline only briefly) and 2) how the content can be identified - e.g. file name, URL or domain, etc. The author will endeavour to identify the content. The author will endeavour to remove the content and re-upload the data (modified) within two weeks (new version). If

  12. Most popular travel and tourism websites worldwide 2025

    • statista.com
    • flwrdeptvarieties.store
    Updated Mar 20, 2025
    Statista (2025). Most popular travel and tourism websites worldwide 2025 [Dataset]. https://www.statista.com/statistics/1215457/most-visited-travel-and-tourism-websites-worldwide/
    Explore at:
    Dataset updated
    Mar 20, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2025
    Area covered
    Worldwide
    Description

    In February 2025, booking.com was the most visited travel and tourism website worldwide. That month, Booking’s web page recorded around 517 million visits. Tripadvisor.com and airbnb.com followed in the ranking, with roughly 120 million and 99 million visits, respectively.

    Popular online travel agencies in the U.S.

    Online travel agencies (OTAs), such as Booking.com and Expedia, offer a wide variety of services, including online hotel bookings, flight reservations, and car rentals. According to the Statista Consumer Insights Global survey, Expedia and Booking.com were the most popular brands for online flight reservations in the United States in 2024. For hotel and private accommodation online bookings in the U.S., Booking.com was the most popular brand, followed by Airbnb, Expedia, and Hotels.com.

    Booking Holdings vs. Expedia Group

    Booking.com is one of the most popular sites of online travel group Booking Holdings, the leading online travel agency worldwide based on revenue, which also owns brands like Priceline, Kayak, and Agoda. In 2024, Booking Holdings' revenue amounted to almost 24 billion U.S. dollars, the highest figure reported by the company to date. Meanwhile, global revenue of Expedia Group, which manages brands like Expedia, Hotels.com, and Vrbo, reached nearly 14 billion U.S. dollars that year.

  13. Total global visitor traffic to Google.com 2024

    • statista.com
    Updated Jan 22, 2025
    Statista (2025). Total global visitor traffic to Google.com 2024 [Dataset]. https://www.statista.com/statistics/268252/web-visitor-traffic-to-googlecom/
    Explore at:
    Dataset updated
    Jan 22, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 2023 - Mar 2024
    Area covered
    Worldwide
    Description

    In March 2024, search platform Google.com generated approximately 85.5 billion visits, down from 87 billion platform visits in October 2023. Google is a global search platform and one of the biggest online companies worldwide.

  14. Traffic

    • open-data-cgvar.hub.arcgis.com
    • ps-dubai.hub.arcgis.com
    • +1 more
    Updated Mar 11, 2014
    Conseil Départemental du Var (2014). Traffic [Dataset]. https://open-data-cgvar.hub.arcgis.com/datasets/traffic
    Explore at:
    Dataset updated
    Mar 11, 2014
    Dataset authored and provided by
    Conseil Départemental du Var
    License

    http://opendata.regionpaca.fr/fileadmin//user_upload/tx_ausyopendata/licences/Licence-Ouverte-Open-Licence-ETALAB.pdf

    Description

    The map layers in this service provide color-coded maps of the traffic conditions you can expect for the present time (the default). The map shows present traffic as a blend of live and typical information. Live speeds are used wherever available and are established from real-time sensor readings. Typical speeds come from a record of average speeds, which are collected over several weeks within the last year or so. Layers also show current incident locations where available. By changing the map time, the service can also provide past and future conditions. Live readings from sensors are saved for 12 hours, so setting the map time back within 12 hours allows you to see actual recorded traffic speeds, supplemented with typical averages by default. You can choose to turn off the average speeds and see only the recorded live traffic speeds for any time within the 12-hour window. Predictive traffic conditions are shown for any time in the future.

    The color-coded traffic map layer can be used to represent relative traffic speeds; this is a common type of map for online services and is used to provide context for routing, navigation, and field operations. A color-coded traffic map can be requested for the current time and any time in the future. A map for a future request might be used for planning purposes.

    The map also includes dynamic traffic incidents showing the location of accidents, construction, closures, and other issues that could potentially impact the flow of traffic. Traffic incidents are commonly used to provide context for routing, navigation and field operations. Incidents are not features; they cannot be exported and stored for later use or additional analysis.

    Data source

    Esri’s typical speed records and live and predictive traffic feeds come directly from HERE (www.HERE.com). HERE collects billions of GPS and cell phone probe records per month and, where available, uses sensor and toll-tag data to augment the probe data collected. An advanced algorithm compiles the data and computes accurate speeds. The real-time and predictive traffic data is updated every five minutes through traffic feeds.

    Data coverage

    The service works globally and can be used to visualize traffic speeds and incidents in many countries. Check the service coverage web map to determine availability in your area of interest. Look at the coverage map to learn whether a country currently supports traffic. The support for traffic incidents can be determined by identifying a country. For detailed information on this service, visit the directions and routing documentation and the ArcGIS Help.

    Symbology

    Traffic speeds are displayed as a percentage of free-flow speeds, which is frequently the speed limit or how fast cars tend to travel when unencumbered by other vehicles. The streets are color coded as follows:

    • Green (fast): 85 - 100% of free flow speeds
    • Yellow (moderate): 65 - 85%
    • Orange (slow): 45 - 65%
    • Red (stop and go): 0 - 45%

    To view live traffic only (that is, excluding typical traffic conditions), enable the Live Traffic layer and disable the Traffic layer. (You can find these layers under World/Traffic > [region] > [region] Traffic). To view more comprehensive traffic information that includes live and typical conditions, disable the Live Traffic layer and enable the Traffic layer.

    ArcGIS Online organization subscription

    Important Note: The World Traffic map service is available for users with an ArcGIS Online organizational subscription. To access this map service, you'll need to sign in with an account that is a member of an organizational subscription. If you don't have an organizational subscription, you can create a new account and then sign up for a 30-day trial of ArcGIS Online.
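The color bands listed under Symbology can be expressed as a small classification function. The sketch below is illustrative only and is not part of the ArcGIS service: the function name `traffic_color` is hypothetical, and because the published bands share their boundary values (85%, 65%, 45%), it assumes that a boundary value belongs to the faster band.

```python
def traffic_color(speed, free_flow_speed):
    """Map a measured speed to the color band described above.

    Both speeds must use the same unit. Boundary values (85, 65, 45 percent)
    are assigned to the faster band; this is an assumption, since the
    published ranges overlap at their edges.
    """
    if free_flow_speed <= 0:
        raise ValueError("free_flow_speed must be positive")
    percent = 100.0 * speed / free_flow_speed
    if percent >= 85:
        return "green"   # fast
    if percent >= 65:
        return "yellow"  # moderate
    if percent >= 45:
        return "orange"  # slow
    return "red"         # stop and go
```

A road with a free-flow speed of 50 km/h and a measured speed of 35 km/h runs at 70 percent of free flow and would be drawn yellow under these assumptions. Speeds above the free-flow speed fall into the green band here.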

  15. Global market share of leading desktop search engines 2015-2025

    • statista.com
    Updated Jan 23, 2025
    Statista (2025). Global market share of leading desktop search engines 2015-2025 [Dataset]. https://www.statista.com/statistics/216573/worldwide-market-share-of-search-engines/
    Explore at:
    Dataset updated
    Jan 23, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2015 - Jan 2025
    Area covered
    Worldwide
    Description

    As of January 2025, online search engine Bing accounted for 12.23 percent of the global desktop search market, while market leader Google had a share of around 78.83 percent. Meanwhile, Yahoo's market share was 3.07 percent.

    Google in the global market

    Ever since the introduction of Google Search in 1997, the company has dominated the search engine market, while the shares of all other tools have been rather lopsided. The majority of Google revenues are generated through advertising. Its parent corporation, Alphabet, was one of the biggest internet companies worldwide as of 2023, with a market capitalization of 1.6 trillion U.S. dollars. The company has also expanded its services to mail, productivity tools, enterprise products, mobile devices, and other ventures. As a result, Google earned one of the highest tech company revenues in 2023 with roughly 305.6 billion U.S. dollars.

    Search engine usage in different countries

    Google is the most frequently used search engine worldwide. But in some countries, its alternatives are leading or competing with it to some extent. As of the last quarter of 2023, more than 63 percent of internet users in Russia used Yandex, whereas Google users were nearly 36 percent. Meanwhile, Baidu was the most used search engine in China, despite a strong percentage decrease of internet users in the country accessing it. In other countries, like Japan and Mexico, people tend to use Yahoo along with Google. In the first quarter of 2022, nearly 56 percent of the respondents in Japan said that they had used Yahoo in the past four weeks. In the same year, over 27 percent of users in Mexico said they used Yahoo. Another search engine, Bing, operated by Microsoft, was the second most popular search engine in the United Kingdom after Google.

  16. Total global visitor traffic to Wikipedia.org 2024

    • statista.com
    Updated Nov 11, 2024
    Statista (2024). Total global visitor traffic to Wikipedia.org 2024 [Dataset]. https://www.statista.com/statistics/1259907/wikipedia-website-traffic/
    Explore at:
    Dataset updated
    Nov 11, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 2023 - Mar 2024
    Area covered
    World
    Description

    In March 2024, close to 4.4 billion unique global visitors had visited Wikipedia.org, slightly down from 4.4 billion visitors since August of the same year. Wikipedia is a free online encyclopedia with articles generated by volunteers worldwide. The platform is hosted by the Wikimedia Foundation.

  17. Book piracy sites in the U.S. 2017

    • statista.com
    Updated Mar 22, 2021
    Statista (2021). Book piracy sites in the U.S. 2017 [Dataset]. https://www.statista.com/statistics/688411/book-piracy-sites/
    Explore at:
    Dataset updated
    Mar 22, 2021
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2017
    Area covered
    United States
    Description

    According to a survey held in the United States in 2017, 50 percent of respondents admitted to using 4shared.com to access e-books illegally. Book sharing platforms like 4shared.com may appear innocent at first glance, but this particular site is the most popular among consumers looking to illegally download e-books, with Uploaded.net and Bookos.org ranking second and third as the most used websites for this purpose.

    Does downloading e-books illegally really matter?

    Illegal e-book downloads are a serious problem for authors, and present real risks to a writer’s career. This kind of piracy can directly affect an author’s income as genuine sales give way to free, illegal downloads which are shared across the web and passed from reader to reader.

    Unfortunately, social media platforms only fuel this behavior. Reddit has multiple forums about e-book piracy. These forums allow users to discuss different piracy methods and give each other tips on the best illegitimate e-book download apps, websites and torrent files.

    Can e-book piracy be stopped?

    There are ongoing efforts to prevent e-book piracy from continuing or getting worse. Sadly this is not an easy task, given the sheer number of options available to readers seeking ways to access paid content for free. Online guides for authors about illegal book downloads can help in tackling the problem when it arises or assist book writers in weighing up whether or not to try to address the issue. Methods such as digital rights management (DRM) could theoretically help to decrease illegal e-book distribution, but this is not a popular option as it heavily restricts how readers can access books online. Sadly though, e-book piracy is almost impossible to stop.

  18. Leading e-commerce and shopping websites worldwide 2024, based on visit...

    • statista.com
    • flwrdeptvarieties.store
    • +1 more
    Updated Feb 26, 2025
    Statista (2025). Leading e-commerce and shopping websites worldwide 2024, based on visit share [Dataset]. https://www.statista.com/statistics/1198949/most-visited-websites-in-the-retail-sector-worldwide/
    Explore at:
    Dataset updated
    Feb 26, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Apr 2024
    Area covered
    Worldwide
    Description

    Amazon's global platform 'amazon.com' was the most popular e-commerce and shopping website worldwide, accounting for more than 12.92 percent of desktop visits to sites in this category in April 2024. Second place went to eBay.com with roughly three percent.

    Amazon leads the way

    There is no denying the dominance of Amazon in the e-commerce industry. By 2026, Amazon's worldwide net sales are estimated to exceed one trillion U.S. dollars. In 2022, amazon.com garnered over three billion monthly visitors, maintaining its spot as the most popular retail website worldwide. As of April 2024, the leading social media traffic referrers to amazon.com were YouTube, Facebook, and X.

    Online shopping

    Amazon’s strong position is also due to shoppers’ preference for online marketplaces. As of April 2024, nearly one-third of online consumers opted for online marketplaces over all other digital channels. This category of platforms was also ranked as the e-commerce channel delivering the best customer experience in 2024. According to shoppers worldwide, the three most important changes that could make their digital shopping experience better were faster delivery, free returns, and more convenient shipping conditions.

  19. Share of mobile internet traffic in global regions 2025

    • statista.com
    • flwrdeptvarieties.store
    Updated Jan 29, 2025
    Statista (2025). Share of mobile internet traffic in global regions 2025 [Dataset]. https://www.statista.com/statistics/306528/share-of-mobile-internet-traffic-in-global-regions/
    Explore at:
    Dataset updated
    Jan 29, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2025
    Area covered
    Worldwide
    Description

    In January 2025, mobile devices excluding tablets accounted for over 62 percent of web page views worldwide. Meanwhile, over 75 percent of webpage views in Africa were generated via mobile. In contrast, just over half of web traffic in North America took place via mobile, which accounted for 51.1 percent of total web traffic. While regional infrastructure remains an important factor in broadband vs. mobile coverage, most of the world has had their eyes on the recent 5G rollout across the globe, spearheaded by tech-leaders China and the United States. The number of mobile 5G subscriptions worldwide is forecast to reach more than 8 billion by 2028.

    Social media: room for growth in Africa and southern Asia

    Overall, more than 92 percent of the world’s mobile internet subscribers are also active on social media. A fast-growing market, with newcomers such as TikTok taking the world by storm, marketers have been cashing in on social media’s reach. Overall, social media penetration is highest in Europe and America while in Africa and southern Asia, there is still room for growth. As of 2021, Facebook and Google-owned YouTube are the most popular social media platforms worldwide.

    Facebook and Instagram are most effective

    With nearly 3 billion users, it is no wonder that Facebook remains the social media avenue of choice for the majority of marketers across the world. Instagram, meanwhile, was the second most popular outlet. Both platforms are low-cost and support short-form content, known for its universal consumer appeal and answering to the most important benefits of using these kinds of platforms for business and advertising purposes.

  20. Most visited conservative websites in the U.S. 2023

    • statista.com
    Updated Dec 11, 2023
    Statista (2023). Most visited conservative websites in the U.S. 2023 [Dataset]. https://www.statista.com/statistics/1340485/usa-most-visited-conservative-websites/
    Explore at:
    Dataset updated
    Dec 11, 2023
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Sep 2023
    Area covered
    United States
    Description

    In September 2023, Fox News ranked first among the most popular multiplatform conservative and right-wing websites in the United States with over 78.6 million unique visitors from mobile and desktop connections. Far-right website and printed magazine Epoch Times ranked second with approximately 5.9 million unique monthly visitors.

Statista (2025). Leading websites worldwide 2024, by monthly visits [Dataset]. https://www.statista.com/statistics/1201880/most-visited-websites-worldwide/
Leading websites worldwide 2024, by monthly visits

Explore at:
90 scholarly articles cite this dataset (View in Google Scholar)
