20 datasets found
  1. (Dataset) The most visited health websites in the world

    • data.mendeley.com
    • narcis.nl
    Updated Jan 11, 2021
    Cite
    Patricia Acosta-Vargas (2021). (Dataset) The most visited health websites in the world [Dataset]. http://doi.org/10.17632/n468trh5my.1
    Dataset updated
    Jan 11, 2021
    Authors
    Patricia Acosta-Vargas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    World
    Description

    Evaluation of the most visited health websites in the world

  2. ‘Popular Website Traffic Over Time’ analyzed by Analyst-2

    • analyst-2.ai
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Popular Website Traffic Over Time’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-popular-website-traffic-over-time-62e4/62549059/?iid=003-357&v=presentation
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Popular Website Traffic Over Time’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/popular-website-traffice on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    Background

    Have you ever been in a conversation where the question comes up: who uses Bing? The question arises occasionally because people wonder whether these sites get any views. In this research study, we explore traffic for many popular websites.

    Methodology

    The data collected originates from SimilarWeb.com.

    Source

    For the analysis and study, go to The Concept Center

    This dataset was created by Chase Willden and contains around 0 samples along with 1/1/2017, Social Media, technical information and other features such as: - 12/1/2016 - 3/1/2017 - and more.

    How to use this dataset

    • Analyze 11/1/2016 in relation to 2/1/2017
    • Study the influence of 4/1/2017 on 1/1/2017
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit Chase Willden

    --- Original source retains full ownership of the source dataset ---

  3. Click Global Data | Web Traffic Data + Transaction Data | Consumer and B2B...

    • datarade.ai
    .csv
    Updated Mar 13, 2025
    Cite
    Consumer Edge (2025). Click Global Data | Web Traffic Data + Transaction Data | Consumer and B2B Shopper Insights | 59 Countries, 3-Day Lag, Daily Delivery [Dataset]. https://datarade.ai/data-products/click-global-data-web-traffic-data-transaction-data-con-consumer-edge
    Available download formats: .csv
    Dataset updated
    Mar 13, 2025
    Dataset authored and provided by
    Consumer Edge
    Area covered
    Marshall Islands, Bermuda, Congo, South Africa, Bosnia and Herzegovina, Nauru, El Salvador, Sri Lanka, Finland, Montserrat
    Description

    Click Web Traffic Combined with Transaction Data: A New Dimension of Shopper Insights

    Consumer Edge is a leader in alternative consumer data for public and private investors and corporate clients. Click enhances the unparalleled accuracy of CE Transact by allowing investors to delve deeper and browse further into global online web traffic for CE Transact companies and more. Leverage the unique fusion of web traffic and transaction datasets to understand the addressable market and understand spending behavior on consumer and B2B websites. See the impact of changes in marketing spend, search engine algorithms, and social media awareness on visits to a merchant’s website, and discover the extent to which product mix and pricing drive or hinder visits and dwell time. Plus, Click uncovers a more global view of traffic trends in geographies not covered by Transact. Doubleclick into better forecasting, with Click.

    Consumer Edge’s Click is available in machine-readable file delivery and enables:

    • Comprehensive Global Coverage: Insights across 620+ brands and 59 countries, including key markets in the US, Europe, Asia, and Latin America.
    • Integrated Data Ecosystem: Click seamlessly maps web traffic data to CE entities and stock tickers, enabling a unified view across various business intelligence tools.
    • Near Real-Time Insights: Daily data delivery with a 5-day lag ensures timely, actionable insights for agile decision-making.
    • Enhanced Forecasting Capabilities: Combining web traffic indicators with transaction data helps identify patterns and predict revenue performance.

    Use Case: Analyze Year Over Year Growth Rate by Region

    Problem: A public investor wants to understand how a company’s year-over-year growth differs by region.

    Solution: The firm leveraged Consumer Edge Click data to:

    • Gain visibility into key metrics like views, bounce rate, visits, and addressable spend
    • Analyze year-over-year growth rates for a time period
    • Break out data by geographic region to see growth trends

    Metrics Include:

    • Spend
    • Items
    • Volume
    • Transactions
    • Price Per Volume

    Inquire about a Click subscription to perform more complex, near real-time analyses on public tickers and private brands as well as for industries beyond CPG, such as:

    • Monitor web traffic as a leading indicator of stock performance and consumer demand
    • Analyze customer interest and sentiment at the brand and sub-brand levels

    Consumer Edge offers a variety of datasets covering the US, Europe (UK, Austria, France, Germany, Italy, Spain), and across the globe, with subscription options serving a wide range of business needs.

    Consumer Edge is the Leader in Data-Driven Insights Focused on the Global Consumer

  4. Top 100 Batsman

    • kaggle.com
    Updated Jan 8, 2023
    Cite
    Rahul Wadwani (2023). Top 100 Batsman [Dataset]. http://doi.org/10.34740/kaggle/dsv/4824563
    Dataset updated
    Jan 8, 2023
    Dataset provided by
    Kaggle
    Authors
    Rahul Wadwani
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This web-scraped dataset, collected from the cricbuzz website, contains the top 100 batsmen, with the best-performing player ranked at the top, in the file top100batsman.csv. It covers only the top 100 players who have performed best in Test cricket. The data was collected on 7th January 2023.

    Dataset contains:

    • test_ranking: the current Test ranking of the player
    • player id: the player id, which is unique and assigned by cricbuzz
    • batsman: the name of the batsman
    • rating: this column is provided by the ICC
    • team: the name of the team to which the player belongs
    • matches: the number of matches played by the player to date
    • innings: the number of times the player has batted
    • runs: the total number of runs scored by the batsman
    • high_score: the highest score achieved by the batsman
    • average: the ratio of the total number of runs scored to the number of times the batsman got out
    • strike_rate: the overall strike rate of the batsman, calculated as runs scored divided by balls played
    • century: the number of centuries scored by the batsman
    • double_century: the number of double centuries scored by the batsman
    • half_century: the number of half-centuries scored by the batsman
    • fours: the total number of fours hit to date
    • sixes: the total number of sixes hit to date
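
    The two derived columns above (average and strike_rate) can be sketched as follows. This is a minimal illustration with made-up figures, not code from the dataset author. The description only says strike rate is runs divided by balls played; scaling to runs per 100 balls is the usual cricket convention and is an assumption here, exposed as the hypothetical `per_balls` parameter.

```python
def batting_average(runs, dismissals):
    """Runs scored divided by the number of times the batsman got out."""
    if dismissals == 0:
        return None  # undefined when the batsman was never dismissed
    return runs / dismissals


def strike_rate(runs, balls_faced, per_balls=100):
    """Runs scored per `per_balls` balls faced (per_balls=1 gives the plain ratio)."""
    if balls_faced == 0:
        return None
    return per_balls * runs / balls_faced


# Hypothetical figures, not taken from top100batsman.csv.
print(batting_average(runs=5000, dismissals=100))  # 50.0
print(strike_rate(runs=5000, balls_faced=10000))   # 50.0
```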

  5. Google Landmarks Dataset v2

    • github.com
    • opendatalab.com
    Updated Sep 27, 2019
    Cite
    Google (2019). Google Landmarks Dataset v2 [Dataset]. https://github.com/cvdfoundation/google-landmark
    Dataset updated
    Sep 27, 2019
    Dataset provided by
    Google (http://google.com/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into 3 sets of images: train, index and test. The dataset was presented in our CVPR'20 paper. In this repository, we present download links for all dataset files and relevant code for metric computation. This dataset was associated with two Kaggle challenges, on landmark recognition and landmark retrieval. Results were discussed as part of a CVPR'19 workshop. In this repository, we also provide scores for the top 10 teams in the challenges, based on the latest ground-truth version. Please visit the challenge and workshop webpages for more details on the data, tasks and technical solutions from top teams.

  6. Recipes dataset from allrecipes

    • crawlfeeds.com
    csv, zip
    Updated Jul 3, 2025
    Cite
    Crawl Feeds (2025). Recipes dataset from allrecipes [Dataset]. https://crawlfeeds.com/datasets/recipes-dataset-from-allrecipes
    Available download formats: zip, csv
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Unleash the culinary potential with our comprehensive Recipes dataset from Allrecipes. This dataset provides detailed information on a vast collection of recipes sourced from Allrecipes, one of the world's most popular recipe websites. Ideal for chefs, food enthusiasts, developers, and data scientists, this dataset offers an extensive range of culinary possibilities.

    The dataset includes key details such as recipe titles, ingredients, preparation instructions, cooking times, user ratings, and dietary categories. With recipes spanning various cuisines, dietary preferences, and meal types, this dataset is a valuable resource for creating recipe apps, conducting nutritional analysis, or exploring new culinary trends.

    Looking for more data to fuel your food-related projects? Check out our Food & Beverage Data for diverse datasets designed to inspire and empower innovation in the food and beverage industry.

    Enhance your food-related projects with structured, high-quality data from Allrecipes. Whether developing a recipe recommendation engine, building a food blog, or researching cooking trends, this dataset is your go-to resource for delicious inspiration and data-driven culinary insights.

  7. Global Starlink Web Cache Latency & Traceroute Measurement Dataset

    • zenodo.org
    Updated Feb 6, 2025
    Cite
    Qi Zhang; Zeqi Lai; Qian Wu; Jihao Li; HEWU LI (2025). Global Starlink Web Cache Latency & Traceroute Measurement Dataset [Dataset]. http://doi.org/10.5281/zenodo.14800115
    Dataset updated
    Feb 6, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Qi Zhang; Zeqi Lai; Qian Wu; Jihao Li; HEWU LI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains global web cache latency measurements collected via RIPE Atlas probes equipped with Starlink terminals across five continents, spanning over 24 hours and resulting in ~2 Million measurements. The measurements aim to evaluate the user-perceived latency of accessing popular websites through low-earth orbit (LEO) satellite networks.

    This dataset is a product of Spache, a research project on web caching from space. Please refer to its WWW'25 paper for more details and analysis results.

    Dataset File Content

    The dataset includes the following files:

    • Metadata

      • Target website list: A list of the top 50 most popular websites according to Alexa ranking.
        • RIPE Atlas Measurement IDs: For each website, the corresponding RIPE Atlas Measurement IDs for both Ping and Traceroute measurements are provided.
        • Note: microsoftonline.com (originally ranked 41st) is not included in the list due to its unresolvable domain name.
    • Measurement results - Raw Data

      • Ping and Traceroute results: Raw measurement results for each target website, including detailed information on each measurement.
    • Measurement results - Preprocessed Latency

      • Ping RTT latency: Preprocessed data containing the minimum RTT (Round Trip Time, in milliseconds) for each Ping measurement to all target websites.
        • Probe information: Corresponding Probe IDs, along with their respective countries and continents at the time of measurement.

    This dataset is intended to support research on web caching, particularly in the context of satellite Internet. Please cite both this dataset and the associated paper if you find this data useful.
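
    The preprocessed latency files keep the minimum RTT per Ping measurement. A minimal sketch of that reduction is shown below. The record layout (a `result` list whose entries carry an `rtt` in milliseconds, or a timeout marker) is an assumption modeled on RIPE Atlas ping results, and the sample values are invented; the dataset's actual preprocessing script is not described here.

```python
def min_rtt(ping_result):
    """Return the smallest RTT in milliseconds, or None if no reply carried one."""
    rtts = [
        reply["rtt"]
        for reply in ping_result.get("result", [])
        if isinstance(reply.get("rtt"), (int, float))
    ]
    return min(rtts) if rtts else None


# Hypothetical record: two replies and one timeout marker.
sample = {"prb_id": 1, "result": [{"rtt": 48.2}, {"x": "*"}, {"rtt": 41.7}]}
print(min_rtt(sample))  # 41.7
```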

  8. WebBench

    • huggingface.co
    Updated May 28, 2025
    Cite
    Halluminate (2025). WebBench [Dataset]. https://huggingface.co/datasets/Halluminate/WebBench
    Dataset updated
    May 28, 2025
    Dataset provided by
    Halluminate, Inc.
    Authors
    Halluminate
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Web Bench: A real-world benchmark for Browser Agents

    WebBench is an open, task-oriented benchmark that measures how well browser agents handle realistic web workflows. It contains 2,454 tasks spread across 452 live websites selected from the global top-1000 by traffic. Last updated: May 28, 2025

    Dataset Composition

    Category | Description | Example | Count (% of dataset)
    READ | Tasks that require searching and extracting information | “Navigate to the news section and…

    See the full description on the dataset page: https://huggingface.co/datasets/Halluminate/WebBench.

  9. Grips Competitive Intelligence (global e-commerce data)

    • datarade.ai
    Updated Jul 17, 2023
    Cite
    Grips Intelligence (2023). Grips Competitive Intelligence (global e-commerce data) [Dataset]. https://datarade.ai/data-products/grips-competitive-intelligence-global-e-commerce-data-grips-intelligence
    Dataset updated
    Jul 17, 2023
    Dataset authored and provided by
    Grips Intelligence
    Area covered
    United States of America, United Kingdom, Germany
    Description

    Website visitation is nice, but sales and revenue are better. Grips tracks e-commerce-based sales across 5,000+ product categories, 30k retailers, and brands, enabling you to understand market size, share, opportunities, and threats.

    Use Cases

    Domain e-commerce performance: Harness the power of data-driven analysis to evaluate critical metrics such as revenue, average order value (AOV), conversion rate, channels, and product assortment for an extensive selection of 30,000 leading e-commerce retailers, enabling you to make strategic decisions and stay ahead in the dynamic online marketplace.

    Product Category e-commerce performance: Unlock the potential of your business with our game-changing Share of Wallet analysis. Gain valuable insights into the market size and growth of over 5,000 product categories, as well as your retailer or brand's market share within each category.

    Brand e-commerce performance: Gain deep insights into the market size, share, and revenue growth of 30,000 top e-commerce brands in the digital ecosystem, exploring key metrics such as units sold, average price, and more. Empower your business with comprehensive data to make informed decisions and capitalize on lucrative opportunities in the ever-evolving online marketplace.

    Data Methodology

    We have a unique mix of sources from where we gather digital signals.

    • Raw data collection - we have developed several productivity tools, including Retailer Benchmarking, which collectively create the world’s largest transactional dataset - public data captured from millions of sites and partnerships with top data providers.

    • Data processing - cleaning and formatting, classification of products, sites and more preparation for the modelling phase.

    • Data modeling: from the billions of digital signals we extrapolate in detail how global e-commerce sites and products are performing.

    7-day free trial available. Sign up for free at: https://gripsintelligence.com/

  10. Buttondetection2 Dataset

    • universe.roboflow.com
    zip
    Updated Jul 26, 2022
    Cite
    Elsa Weber (2022). Buttondetection2 Dataset [Dataset]. https://universe.roboflow.com/elsa-weber/buttondetection2/model/13
    Available download formats: zip
    Dataset updated
    Jul 26, 2022
    Dataset authored and provided by
    Elsa Weber
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Clickable Elements Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Accessibility Enhancement: ButtonDetection2 can be used to improve the accessibility of websites and apps for visually impaired users. By identifying clickable elements such as links, buttons, and fields, the model can help screen readers and other assistive technologies better understand the interface and guide users through the navigation process more effectively.

    2. Automated UI Testing: ButtonDetection2 can be employed to automate user interface testing for websites and apps. By identifying clickable elements, the model can streamline the testing process by automatically clicking buttons, links, and fields to ensure that they function as expected, reducing manual efforts and speeding up the overall QA process.

    3. UX Analysis and Optimization: ButtonDetection2 can be used by UX designers and developers to analyze and optimize the design of websites and apps. By detecting clickable elements, the model can help identify areas of the interface that may be confusing or difficult for users to interact with, providing insights for designing more user-friendly experiences.

    4. Web Scraping/Data Extraction: ButtonDetection2 can be employed for web scraping and data extraction tasks. The model can identify clickable elements within webpages, facilitating automated extraction of relevant data such as product details, contact information, or event details by navigating through the appropriate links, buttons, and fields within the site.

    5. Augmented Reality Navigation: ButtonDetection2 can be integrated into augmented reality applications to enhance real-world interactions with digital interfaces. By detecting clickable elements such as buttons and links, the model can overlay visual indicators or audio cues on top of the real-world view, providing users with a more intuitive way to interact with digital interfaces in AR environments.

  11. Qatar Number Dataset

    • listtodata.com
    .csv, .xls, .txt
    Updated Jul 17, 2025
    Cite
    List to Data (2025). Qatar Number Dataset [Dataset]. https://listtodata.com/qatar-dataset
    Available download formats: .csv, .xls, .txt
    Dataset updated
    Jul 17, 2025
    Dataset authored and provided by
    List to Data
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2025 - Dec 31, 2025
    Area covered
    Bahrain, Qatar
    Variables measured
    phone numbers, email address, full name, address, city, state, gender, age, income, IP address
    Description

    With the Qatar number dataset you can send your offers directly to prospects and promote your business. You can also use this database on any CRM platform. Together, these features are intended to give you a respectable profit margin. We can provide lists based on your needs while upholding all business rules. The Qatar number dataset contains only authentic data. List to Data aims to provide the most reliable information, so you can expect almost no bounce-back data from this source. We are here to help our clients grow their online businesses, and you can get a good, fast return on investment (ROI).

    Qatar phone data is now a basic need for businesses that depend on telemarketing and SMS marketing, so this database is in high demand. Our organization has gathered millions of phone number lists from across the world, for both businesses and consumers. To launch your business in Qatar, you can acquire this dataset. Qatar phone data comes at an extremely low cost and is meant to solve your marketing problem. To make things simpler, you can choose your targeted database when launching your products. We also create contact directories by business area category. List to Data keeps the database updated: if any false information is ever added, we promptly remove it. The Qatar phone number list is a genuine dataset that gives you effective details for internet marketing.

    After purchase, you can instantly download the file in Excel or CSV format. Anyone who wants to make a large profit cannot ignore the Qatar phone number list. In the end, the Qatar phone number list is the product that you need now. You can also view the other products on our website and get more information there. The product is easy to buy, and the price is fixed. This contact list is intended to generate more revenue for you and to help your business reach the top in a short amount of time.

  12. Controlled Anomalies Time Series (CATS) Dataset

    • zenodo.org
    bin, csv
    Updated Jul 11, 2024
    + more versions
    Cite
    Patrick Fleith (2024). Controlled Anomalies Time Series (CATS) Dataset [Dataset]. http://doi.org/10.5281/zenodo.8338435
    Available download formats: csv, bin
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Solenix Engineering GmbH
    Authors
    Patrick Fleith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Controlled Anomalies Time Series (CATS) Dataset consists of commands, external stimuli, and telemetry readings of a simulated complex dynamical system with 200 injected anomalies.

    The CATS Dataset exhibits a set of desirable properties that make it very suitable for benchmarking Anomaly Detection Algorithms in Multivariate Time Series [1]:

    • Multivariate (17 variables), including sensor readings and control signals. It simulates the operational behaviour of an arbitrary complex system including:
      • 4 Deliberate Actuations / Control Commands sent by a simulated operator / controller, for instance, commands of an operator to turn ON/OFF some equipment.
      • 3 Environmental Stimuli / External Forces acting on the system and affecting its behaviour, for instance, the wind affecting the orientation of a large ground antenna.
      • 10 Telemetry Readings representing the observable states of the complex system by means of sensors, for instance, a position, a temperature, a pressure, a voltage, current, humidity, velocity, acceleration, etc.
    • 5 million timestamps. Sensor readings are sampled at 1 Hz.
      • 1 million nominal observations (the first 1 million datapoints). This is suitable to start learning the "normal" behaviour.
      • 4 million observations that include both nominal and anomalous segments. This is suitable to evaluate both semi-supervised approaches (novelty detection) as well as unsupervised approaches (outlier detection).
    • 200 anomalous segments. One anomalous segment may contain several successive anomalous observations / timestamps. Only the last 4 million observations contain anomalous segments.
    • Different types of anomalies to understand what anomaly types can be detected by different approaches. The categories are available in the dataset and in the metadata.
    • Fine control over ground truth. As this is a simulated system with deliberate anomaly injection, the start and end time of the anomalous behaviour is known very precisely. In contrast to real world datasets, there is no risk that the ground truth contains mislabelled segments which is often the case for real data.
    • Suitable for root cause analysis. In addition to the anomaly category, the time series channel in which the anomaly first developed is recorded and made available as part of the metadata. This can be useful to evaluate how well algorithms trace anomalies back to the right root cause channel.
    • Affected channels. In addition to the root cause channel in which the anomaly first developed, we provide information on channels possibly affected by the anomaly. This can also be useful to evaluate the explainability of anomaly detection systems, which may point to the anomalous channels (root cause and affected).
    • Obvious anomalies. The simulated anomalies have been designed to be "easy" for human eyes to detect (i.e., there are very large spikes or oscillations), hence also detectable for most algorithms. This makes the synthetic dataset useful for screening tasks (i.e., to eliminate algorithms that are not capable of detecting those obvious anomalies). However, during our initial experiments, the dataset turned out to be challenging enough even for state-of-the-art anomaly detection approaches, making it suitable also for regular benchmark studies.
    • Context provided. Some variables can only be considered anomalous in relation to other behaviours. A typical example consists of a light and switch pair. The light being either on or off is nominal, the same goes for the switch, but having the switch on and the light off shall be considered anomalous. In the CATS dataset, users can choose (or not) to use the available context, and external stimuli, to test the usefulness of the context for detecting anomalies in this simulation.
    • Pure signal ideal for robustness-to-noise analysis. The simulated signals are provided without noise: while this may seem unrealistic at first, it is an advantage since users of the dataset can decide to add on top of the provided series any type of noise and choose an amplitude. This makes it well suited to test how sensitive and robust detection algorithms are against various levels of noise.
    • No missing data. You can drop whatever data you want to assess the impact of missing values on your detector with respect to a clean baseline.
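
    Two of the properties above (a nominal-only prefix for learning "normal" behaviour, and noise-free signals to which users add their own noise) can be sketched as follows. This is a minimal illustration on a toy series, not code from the dataset authors; the function names, the toy data, and the choice of Gaussian noise are assumptions made for the example.

```python
import random


def split_nominal(rows, nominal_count):
    """Split a time-ordered series into a nominal prefix and the mixed remainder."""
    return rows[:nominal_count], rows[nominal_count:]


def add_gaussian_noise(values, sigma, seed=0):
    """Return a copy of `values` with zero-mean Gaussian noise of std `sigma` added."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


# Toy stand-in for one telemetry channel. The real data has 5 million rows,
# of which the first 1 million are nominal.
series = [float(i % 10) for i in range(50)]
train, test = split_nominal(series, nominal_count=10)
noisy_test = add_gaussian_noise(test, sigma=0.1)
print(len(train), len(test), len(noisy_test))  # 10 40 40
```

    With the real files, `nominal_count` would be 1,000,000, and the noise amplitude `sigma` is the knob to vary in a robustness-to-noise study.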

    Change Log

    Version 2

    • Metadata: we include a metadata.csv with information about:
      • Anomaly categories
      • Root cause channel (signal in which the anomaly is first visible)
      • Affected channel (signal to which the anomaly might propagate through coupled system dynamics)
    • Removal of anomaly overlaps: version 1 contained anomalies which overlapped with each other resulting in only 190 distinct anomalous segments. Now, there are no more anomaly overlaps.
    • Two data files: CSV and parquet for convenience.

    [1] Example Benchmark of Anomaly Detection in Time Series: “Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. Anomaly Detection in Time Series: A Comprehensive Evaluation. PVLDB, 15(9): 1779 - 1797, 2022. doi:10.14778/3538598.3538602”

    About Solenix

    Solenix is an international company providing software engineering, consulting services and software products for the space market. Solenix is a dynamic company that brings innovative technologies and concepts to the aerospace market, keeping up to date with technical advancements and actively promoting spin-in and spin-out technology activities. We combine modern solutions which complement conventional practices. We aspire to achieve maximum customer satisfaction by fostering collaboration, constructivism, and flexibility.

  13. Number of internet users worldwide 2014-2029

    • statista.com
    Updated Apr 11, 2025
    Cite
    Statista Research Department (2025). Number of internet users worldwide 2014-2029 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Area covered
    World
    Description

    The global number of internet users was forecast to increase continuously between 2024 and 2029, by 1.3 billion users in total (+23.66 percent). After the fifteenth consecutive year of growth, the number of users is estimated to reach 7 billion, a new peak, in 2029. Notably, the number of internet users has increased continuously over the past years.

    Depicted is the estimated number of individuals in the country or region at hand who use the internet. As the data source clarifies, connection quality and usage frequency are distinct aspects that are not taken into account here.

    The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

    Find more key insights for the number of internet users in regions like the Americas and Asia.
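
    As a quick consistency check on the figures quoted above, the absolute increase and the percentage increase together imply a 2024 baseline and a 2029 level. The sketch below only rearranges the two numbers given in the description; the implied values are derived here and are not published figures.

```python
increase_bn = 1.3   # forecast absolute increase, 2024 to 2029 (billions of users)
growth_pct = 23.66  # the same increase as a percentage of the 2024 level

# increase = baseline * growth_pct / 100, so baseline = increase / (growth_pct / 100)
baseline_2024_bn = increase_bn / (growth_pct / 100)
forecast_2029_bn = baseline_2024_bn + increase_bn

print(f"implied 2024 level: {baseline_2024_bn:.2f} bn")  # 5.49 bn
print(f"implied 2029 level: {forecast_2029_bn:.2f} bn")  # 6.79 bn
```

    The implied 2029 level of roughly 6.8 billion is in line with the rounded "7 billion" in the description.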

  14. NFL Play Statistics dataset (secondary)

    • kaggle.com
    Updated Apr 27, 2020
    Cite
    Todd Steussie (2020). NFL Play Statistics dataset (secondary) [Dataset]. https://www.kaggle.com/toddsteussie/nfl-play-statistics-secondary-datasets/code
    Dataset updated
    Apr 27, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Todd Steussie
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The NFL is one of the most popular sports leagues in the world. Many of us are stat geeks who want to understand not just what happened but also who and why. This NFL dataset provides a comprehensive view of NFL games, statistics, participation, and much more. The dataset includes NFL play data from 2004 to the present.

    This NFL dataset provides play-by-play data from the 2004 to 2019 seasons. The dataset also includes play and participation information for players, coaches, and game officials. Additional data tables included in this file cover the NFL Draft from 1989 to present, the NFL Combine from 1999 to present, NFL rosters from 1998 to present, NFL schedules, stadium information and much more. The granularity of NFL statistics varies by NFL season. The current version of NFL statistics has been collected since 2012. All information sources used to create this dataset are from publicly accessible websites and the NFL GSIS dataset.

    All information sources used to create this dataset are from publicly accessible websites and NFL documentation. Although my current life is focused on data science, this project has a special place in my heart, since it links my previous profession in the NFL with my current passion for data analysis.

  15. Imgur Most Viral and Secret Santa

    • kaggle.com
    Updated Apr 18, 2020
    Cite
    Ghalib93 (2020). Imgur Most Viral and Secret Santa [Dataset]. https://www.kaggle.com/ghalib93/imgur-most-viral-and-secret-santa/code
    Dataset updated
    Apr 18, 2020
    Dataset provided by
    Kaggle
    Authors
    Ghalib93
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Imgur is an image hosting and sharing website founded in 2009. It became one of the most popular websites around the world, with approximately 250 million users. The website does not require registration and anyone can browse its content. However, an account must be created in order to post. It is famous for an event that it created in 2013, in which members register to send gifts to and receive gifts from other members on the website. The event takes place during Christmas time, and people share their gifts via the website, where they post pictures of the process or of what they received under a specific tag. The data provided covers two sections that I think are important to understanding certain patterns within the Imgur community. The first is the Most Viral section and the second is the Secret Santa tag.

    I have participated twice in the Imgur Secret Santa event and have always found funny and interesting posts in its Most Viral section. With the help of the Kaggle community, I would like to identify trends in the data provided and perhaps compare the Secret Santa data with the Most Viral data.

    Content

    There are two Dataframes included and they are almost identical in the number of columns:

    • The first Dataframe is Imgur Most Viral posts. This contains many of the posts that were labelled as Viral by The Imgur community and team using specific algorithms to track number of likes and dislikes across multiple platforms. The posts might be videos, gifs, pictures or just text.

    • The second Dataframe is Imgur Secret Santa Tag. Secret Santa is an annual Imgur tradition where members can sign up to send gifts to and receive gifts from other members during the Christmas holiday. This contains many of the posts that were tagged with Secret Santa by the Imgur community. The posts might be videos, gifs, pictures or just text. There is a (is_viral) column in this Dataframe that is not available in the Most Viral Dataframe since all of the posts there are viral.

      Data Dictionary

      Feature | Type | Dataset | Description
      account_id | object | Imgur_Viral/imgur_secret_santa | Unique Account ID per member
      comment_count | float64 | Imgur_Viral/imgur_secret_santa | Number of comments made in the post
      datetime | float64 | Imgur_Viral/imgur_secret_santa | TimeStamp containing Date and Time Details
      downs | float64 | Imgur_Viral/imgur_secret_santa | Number of dislikes for the post
      favorite_count | float64 | Imgur_Viral/imgur_secret_santa | Number of users that marked the post as a favourite
      id | object | Imgur_Viral/imgur_secret_santa | Unique Post ID. Even if it was posted by the same member, different posts will have different IDs
      images_count | float64 | Imgur_Viral/imgur_secret_santa | Number of images included in the post
      points | float64 | Imgur_Viral/imgur_secret_santa | Each post will have calculated points based on (ups - downs)
      score | float64 | Imgur_Viral/imgur_secret_santa | Ticket number
      tags | object | Imgur_Viral/imgur_secret_santa | Tags are sub albums that the post will show under
      title | object | Imgur_Viral/imgur_secret_santa | Title of the post
      ups | float64 | Imgur_Viral/imgur_secret_santa | Number of likes for the post
      views | float64 | Imgur_Viral/imgur_secret_santa | Number of people that viewed the post
      is_most_viral | boolean | imgur_secret_santa | If the post is viral or not
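
    The data dictionary describes points as being based on (ups - downs). A minimal sketch of how one might check that relationship is given below. It assumes points is exactly ups minus downs, which the description does not guarantee, and it uses made-up sample rows; only the column names come from the data dictionary.

```python
# Hypothetical sample rows; only the column names come from the data dictionary.
posts = [
    {"id": "a1", "ups": 120.0, "downs": 20.0, "points": 100.0},
    {"id": "b2", "ups": 15.0, "downs": 5.0, "points": 12.0},
]


def points_mismatches(rows):
    """Return the ids of rows whose stored points differ from ups - downs."""
    return [row["id"] for row in rows if row["points"] != row["ups"] - row["downs"]]


# Rows listed here would need a closer look before trusting the points column.
print(points_mismatches(posts))
```

    If many rows turn up in the result for the real data, that would suggest points is derived from more than the simple difference.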

    Acknowledgements

    I would like to thank Imgur for providing an API that made collecting data from its website easier. With their help we might be able to better understand certain trends that emerge from its community.

    Inspiration

    There is no problem to solve with this data; it is just a fun way to explore and learn more about programming and analyzing data. I hope you enjoy playing with the data as much as I enjoyed collecting it and browsing the website.

  16. Average daily time spent on social media worldwide 2012-2025

    • statista.com
    Updated Jun 19, 2025
    Cite
    Statista (2025). Average daily time spent on social media worldwide 2012-2025 [Dataset]. https://www.statista.com/statistics/433871/daily-social-media-usage-worldwide/
    Dataset updated
    Jun 19, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    How much time do people spend on social media? As of 2025, the average daily social media usage of internet users worldwide amounted to 141 minutes per day, down from 143 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of 3 hours and 49 minutes on social media each day. In comparison, the daily time spent on social media in the U.S. was just 2 hours and 16 minutes.

    Global social media usage

    Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events and friends.

    Global impact of social media

    Social media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics and heightened everyday distractions.

  17. Top 3000+ Cryptocurrency Dataset

    • kaggle.com
    Updated Apr 9, 2023
    Cite
    Sourav Banerjee (2023). Top 3000+ Cryptocurrency Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/cryptocurrency-dataset-2021-395-types-of-crypto
    Dataset updated
    Apr 9, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sourav Banerjee
    Description

    Context

    A cryptocurrency, crypto-currency, or crypto is a collection of binary data which is designed to work as a medium of exchange. Individual coin ownership records are stored in a ledger, which is a computerized database using strong cryptography to secure transaction records, to control the creation of additional coins, and to verify the transfer of coin ownership. Cryptocurrencies are generally fiat currencies, as they are not backed by or convertible into a commodity. Some crypto schemes use validators to maintain the cryptocurrency. In a proof-of-stake model, owners put up their tokens as collateral. In return, they get authority over the token in proportion to the amount they stake. Generally, these token stakers get additional ownership in the token over time via network fees, newly minted tokens, or other such reward mechanisms.

    Cryptocurrency does not exist in physical form (like paper money) and is typically not issued by a central authority. Cryptocurrencies typically use decentralized control as opposed to a central bank digital currency (CBDC). When a cryptocurrency is minted or created prior to issuance or issued by a single issuer, it is generally considered centralized. When implemented with decentralized control, each cryptocurrency works through distributed ledger technology, typically a blockchain, that serves as a public financial transaction database.

    A cryptocurrency is a tradable digital asset or digital form of money, built on blockchain technology that only exists online. Cryptocurrencies use encryption to authenticate and protect transactions, hence their name. There are currently over a thousand different cryptocurrencies in the world, and many see them as the key to a fairer future economy.

    Bitcoin, first released as open-source software in 2009, is the first decentralized cryptocurrency. Since the release of bitcoin, many other cryptocurrencies have been created.

    Content

    This Dataset is a collection of records of 3000+ Different Cryptocurrencies.

    • Top 395+ from 2021

    • Top 3000+ from 2023

    Structure of the Dataset

    https://i.imgur.com/qGVJaHl.png

    Acknowledgements

    This Data is collected from: https://finance.yahoo.com/. If you want to learn more, you can visit the Website.

    Cover Photo by Worldspectrum: https://www.pexels.com/photo/ripple-etehereum-and-bitcoin-and-micro-sdhc-card-844124/

  18. YouTube's Channels Dataset

    • kaggle.com
    Updated Mar 31, 2021
    Cite
    HarshitHGupta (2021). YouTube's Channels Dataset [Dataset]. https://www.kaggle.com/datasets/harshithgupta/youtubes-channels-dataset
    Dataset updated
    Mar 31, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    HarshitHGupta
    Area covered
    YouTube
    Description

    Context

    YouTube is an American online video-sharing platform headquartered in San Bruno, California. The service, created in February 2005 by three former PayPal employees—Chad Hurley, Steve Chen, and Jawed Karim—was bought by Google in November 2006 for US$1.65 billion and now operates as one of the company's subsidiaries. YouTube is the second most-visited website after Google Search, according to Alexa Internet rankings.

    YouTube allows users to upload, view, rate, share, add to playlists, report, comment on videos, and subscribe to other users. Available content includes video clips, TV show clips, music videos, short and documentary films, audio recordings, movie trailers, live streams, video blogging, short original videos, and educational videos.

    YouTube (the world-famous video sharing website) maintains a list of the top trending videos on the platform. According to Variety magazine, “To determine the year’s top-trending videos, YouTube uses a combination of factors including measuring users’ interactions (number of views, shares, comments, and likes). Note that they’re not the most-viewed videos overall for the calendar year”. Top performers on the YouTube trending list are music videos (such as the famously viral “Gangnam Style”), celebrity and/or reality TV performances, and the random dude-with-a-camera viral videos that YouTube is well-known for.

    This dataset is a daily record of the top trending YouTube videos.

    Note that this dataset is a structurally improved version of this dataset.

    Acknowledgements

    This dataset was collected using the YouTube API. This description is quoted from Wikipedia.

  19. MyAnimeList - Anime Dataset with Reviews

    • kaggle.com
    Updated Mar 29, 2023
    Cite
    Harsh Raj (2023). MyAnimeList - Anime Dataset with Reviews [Dataset]. https://www.kaggle.com/datasets/ansh0007/myanimelist-anime-dataset-with-reviews
    Dataset updated
    Mar 29, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Harsh Raj
    License

    https://cdla.io/sharing-1-0/

    Description

    The Kaggle data set "Anime Comments Scrapped from https://myanimelist.net" is a valuable resource for anyone interested in exploring the world of anime. It is a collection of comments and reviews on various anime titles, sourced from the popular anime review website MyAnimeList. The data set was scraped using the Octoparse software, which is a powerful web scraping tool used to extract data from websites.

    The data set contains five columns of information, namely S.no, Title, Date of comment, User name, and text. The S.no column contains a unique identifier for each comment in the data set, while the Title column contains the name of the anime being reviewed. The Date of comment column indicates the date when the comment was posted, while the User name column shows the username of the person who posted the comment. Finally, the text column contains the actual comment or review left by the user on the anime in question.

    The data set is a great resource for anyone looking to analyze or explore anime-related content. Researchers and analysts can use the data set to gain insights into the opinions and sentiments of anime fans towards various titles. For example, one can use the data set to analyze which anime titles are the most popular or controversial among fans, and why. Similarly, researchers can analyze how the opinions and sentiments of anime fans have changed over time for specific anime titles.

    Another potential use case for the data set is in building recommendation systems for anime fans. By analyzing the text column of the data set, one can extract information about what anime fans like or dislike about certain anime titles. This information can then be used to build recommendation systems that suggest new anime titles to fans based on their preferences.

    The data set can also be used to build natural language processing (NLP) models for sentiment analysis. By training NLP models on the comments and reviews in the data set, researchers can build algorithms that automatically classify comments as positive, negative, or neutral. These models can then be used to analyze large volumes of comments and reviews quickly and efficiently.

    Furthermore, the data set can be used to perform network analyses of the relationships between anime titles and users. By analyzing which anime titles are reviewed or commented on by which users, one can identify clusters of users with similar tastes in anime. These clusters can then be used to build communities of anime fans with similar tastes, and to facilitate discussions and recommendations between these users.

    Another important point to note about the "Anime Comments Scrapped from https://myanimelist.net" data set is that it contains a large number of comments. Specifically, the data set includes over 30,000 comments on various anime titles. This makes the data set a rich source of information for anyone looking to perform large-scale analyses or build machine learning models.

    Overall, the "Anime Comments Scrapped from https://myanimelist.net" data set is a valuable resource for anyone interested in exploring the world of anime. It contains a wealth of information on the opinions and sentiments of anime fans towards various titles, and can be used for a variety of research and analysis purposes. Whether you are an anime enthusiast, a data analyst, or a machine learning researcher, this data set has something to offer.

  20. World Athletics Marathon Ranking List

    • kaggle.com
    Updated Aug 5, 2023
    Cite
    Marcel Caraciolo (2023). World Athletics Marathon Ranking List [Dataset]. https://www.kaggle.com/datasets/marcelcaraciolo/world-athletics-marathon-ranking-list/discussion
    Dataset updated
    Aug 5, 2023
    Dataset provided by
    Kaggle
    Authors
    Marcel Caraciolo
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction

    World Athletics, previously known as the International Amateur Athletic Federation, is the international governing organization for the sport of athletics, covering track and field and several running modalities (road, race walking, ultra, mountain running, etc.). One of World Athletics' tasks is to organize and publish a global ranking system to compare athletes' performances across a range of sports categories. By applying standardised compilation methods (under specific rules), it is possible to evaluate the comparative quality of the participating fields at competitions of the same type and to produce competition performance rankings. The rankings are designed to recognize and celebrate the achievements of athletes participating in marathon events worldwide. The list takes into account various factors such as race results, timing, and the competitive level of the event.

    In this analysis we will focus on the World Athletics Marathon ranking list from 2019 until June 2023. Our goal is to evaluate the outstanding performances of the best marathon runners in the world. It is important to note that this analysis will be limited to the listed athletes' performances across different races and events recognized by the World Athletics organization. We will attempt to answer many questions, such as which countries appear among the top 100 marathon runners, how countries (based on athletes' nationalities) have evolved in the ranking from 2019 to 2023 (is Kenya really the country with the most top runners in the world?), the age distribution for men and women, and curiosities such as the performance of Eliud Kipchoge (the fastest marathon runner in the world), the Brazilian performances, and even how long athletes can keep their names in the ranking list.

    Motivation

    My name is Marcel Caraciolo, and I am currently doing a Data Science Specialization at the Cesar School, a famous technology university in Recife, Pernambuco, Brazil. This project is part of the evaluation for a course named 'Data Visualization' taught by Professor Eronides Neto. The initial reason is to apply exploratory data analysis and visualization techniques in sports analytics, and since I am a marathon enthusiast and a passionate runner, I would like to understand the athlete profiles of the best marathoners in the world. This analysis could be useful for anyone interested in a current data snapshot of marathon performances, and furthermore as a basis for enthusiasts and journalists interested in sports data analytics.

    Datasets

    For this study, I had to scrape the website of World Athletics, the organization that provides the marathon ranking lists. The data in original form can be found here. The parsed data can be found here on the Kaggle webpage.

    Parsing and preparing the data provided was a little challenging, since I needed to loop over all the marathon ranking lists organized by month-date and sex. For each ranking list I also had to loop over all the pages, since the ranking was split into tables of 50 rows per page. All the result files of the World Athletics ranking list over the past 4 years (January 2019 - June 2023) are saved as comma-separated text files. After a second pass over the ranking lists I could also find some stats about the races considered when computing the ranking score. I could extract the race description, the date of the event and the race type (marathon (42km) or half-marathon (21km)).
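
    The nested loop described above (ranking date, then sex, then page) can be sketched as follows. This is only an illustration under stated assumptions: it assumes one ranking list per month, the labels "men" and "women" are placeholders, and total_rows_for is a hypothetical callback standing in for whatever reports the size of a list. No network requests are made, and none of these names come from the original notebook.

```python
import math
from itertools import product

ROWS_PER_PAGE = 50  # from the description: each page holds a table of 50 rows


def page_count(total_rows, rows_per_page=ROWS_PER_PAGE):
    """Number of pages needed to show total_rows rows."""
    return math.ceil(total_rows / rows_per_page)


def month_starts(first=(2019, 1), last=(2023, 6)):
    """Yield (year, month) pairs from first to last, inclusive."""
    year, month = first
    while (year, month) <= last:
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def scrape_plan(total_rows_for):
    """Yield (year, month, sex, page) for every page that would be fetched.

    total_rows_for is a hypothetical callback returning the number of ranked
    athletes in the list for a given year, month and sex.
    """
    for (year, month), sex in product(month_starts(), ("men", "women")):
        pages = page_count(total_rows_for(year, month, sex))
        for page in range(1, pages + 1):
            yield year, month, sex, page


# Example: pretend every list has 120 athletes, i.e. 3 pages per list.
plan = list(scrape_plan(lambda year, month, sex: 120))
print(len(plan), plan[0], plan[-1])
```

    Separating the plan from the actual fetching keeps the loop logic easy to check before any requests are sent.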

    The data scraping notebook can be found following this link:

    Data Dictionary

    Data Dictionary for worldathletics/RANKINGDATE_SEX_WORLDATHLETICS_MARATHON_RANKINGS.csv

    rank,competitor,dob,nat,score,events,competitor_id,sex,rank_date

    Variable | Definition | Key | Notes
    rank | Position in the World Athletics Marathon Ranking list | 1,2,3.. | Integer
    competitor | Name of the Athlete | Joshua Eliud, ...
    dob | Birth date ...