20 datasets found

(Dataset) The most visited health websites in the world
data.mendeley.com
narcis.nl
Updated Jan 11, 2021
Cite
Patricia Acosta-Vargas (2021). (Dataset) The most visited health websites in the world [Dataset]. http://doi.org/10.17632/n468trh5my.1
Explore at:
Unique identifier
https://doi.org/10.17632/n468trh5my.1
Dataset updated
Jan 11, 2021
Authors
Patricia Acosta-Vargas
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
World
Description
Evaluation of the most visited health websites in the world
‘Popular Website Traffic Over Time’ analyzed by Analyst-2
analyst-2.ai
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Popular Website Traffic Over Time’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-popular-website-traffic-over-time-62e4/62549059/?iid=003-357&v=presentation
Explore at:
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Popular Website Traffic Over Time’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/popular-website-traffice on 13 February 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

Background

Have you ever been in a conversation where the question comes up: who uses Bing? This question comes up occasionally because people wonder whether these sites get any views. For this research study, we explore traffic for many popular websites.

Methodology

The data collected originates from SimilarWeb.com.

Source

For the analysis and study, go to The Concept Center

This dataset was created by Chase Willden and contains around 0 samples along with 1/1/2017, Social Media, technical information and other features such as: - 12/1/2016 - 3/1/2017 - and more.

How to use this dataset

Analyze 11/1/2016 in relation to 2/1/2017

Study the influence of 4/1/2017 on 1/1/2017

Acknowledgements

If you use this dataset in your research, please credit Chase Willden

--- Original source retains full ownership of the source dataset ---
Click Global Data | Web Traffic Data + Transaction Data | Consumer and B2B...
datarade.ai
.csv
Updated Mar 13, 2025
Cite
Consumer Edge (2025). Click Global Data | Web Traffic Data + Transaction Data | Consumer and B2B Shopper Insights | 59 Countries, 3-Day Lag, Daily Delivery [Dataset]. https://datarade.ai/data-products/click-global-data-web-traffic-data-transaction-data-con-consumer-edge
Explore at:
Available download formats: .csv
Dataset updated
Mar 13, 2025
Dataset authored and provided by
Consumer Edge
Area covered
Marshall Islands, Bermuda, Congo, South Africa, Bosnia and Herzegovina, Nauru, El Salvador, Sri Lanka, Finland, Montserrat
Description
Click Web Traffic Combined with Transaction Data: A New Dimension of Shopper Insights

Consumer Edge is a leader in alternative consumer data for public and private investors and corporate clients. Click enhances the unparalleled accuracy of CE Transact by allowing investors to delve deeper and browse further into global online web traffic for CE Transact companies and more. Leverage the unique fusion of web traffic and transaction datasets to understand the addressable market and understand spending behavior on consumer and B2B websites. See the impact of changes in marketing spend, search engine algorithms, and social media awareness on visits to a merchant’s website, and discover the extent to which product mix and pricing drive or hinder visits and dwell time. Plus, Click uncovers a more global view of traffic trends in geographies not covered by Transact. Doubleclick into better forecasting, with Click.

Consumer Edge’s Click is available in machine-readable file delivery and enables:
• Comprehensive Global Coverage: Insights across 620+ brands and 59 countries, including key markets in the US, Europe, Asia, and Latin America.
• Integrated Data Ecosystem: Click seamlessly maps web traffic data to CE entities and stock tickers, enabling a unified view across various business intelligence tools.
• Near Real-Time Insights: Daily data delivery with a 5-day lag ensures timely, actionable insights for agile decision-making.
• Enhanced Forecasting Capabilities: Combining web traffic indicators with transaction data helps identify patterns and predict revenue performance.

Use Case: Analyze Year Over Year Growth Rate by Region

Problem A public investor wants to understand how a company’s year-over-year growth differs by region.

Solution The firm leveraged Consumer Edge Click data to:
• Gain visibility into key metrics like views, bounce rate, visits, and addressable spend
• Analyze year-over-year growth rates for a time period
• Break out data by geographic region to see growth trends

Metrics Include:
• Spend
• Items
• Volume
• Transactions
• Price Per Volume
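
The year-over-year comparison described in this use case can be sketched in a few lines. This is only an illustration of the arithmetic, not Consumer Edge's method: the record layout, the region names, and every number below are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (region, year, visits). Values are made up for illustration.
records = [
    ("US", 2023, 1000), ("US", 2024, 1200),
    ("Europe", 2023, 800), ("Europe", 2024, 760),
]


def yoy_growth_by_region(rows, prior_year, current_year):
    """Return {region: fractional growth} between two years."""
    totals = defaultdict(lambda: defaultdict(float))
    for region, year, value in rows:
        totals[region][year] += value

    growth = {}
    for region, by_year in totals.items():
        prior = by_year.get(prior_year, 0.0)
        if prior == 0:
            continue  # growth is undefined without a prior-year baseline
        growth[region] = (by_year.get(current_year, 0.0) - prior) / prior
    return growth


growth = yoy_growth_by_region(records, 2023, 2024)
for region, g in growth.items():
    print(f"{region}: {g:+.1%}")
```

Regions with no prior-year value are skipped rather than reported as infinite growth, which is one possible design choice among several.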

Inquire about a Click subscription to perform more complex, near real-time analyses on public tickers and private brands as well as for industries beyond CPG like:
• Monitor web traffic as a leading indicator of stock performance and consumer demand
• Analyze customer interest and sentiment at the brand and sub-brand levels

Consumer Edge offers a variety of datasets covering the US, Europe (UK, Austria, France, Germany, Italy, Spain), and across the globe, with subscription options serving a wide range of business needs.

Consumer Edge is the Leader in Data-Driven Insights Focused on the Global Consumer
Top 100 Batsman
kaggle.com
Updated Jan 8, 2023
Cite
Rahul Wadwani (2023). Top 100 Batsman [Dataset]. http://doi.org/10.34740/kaggle/dsv/4824563
Explore at:
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/4824563
Dataset updated
Jan 8, 2023
Dataset provided by
Kaggle
Authors
Rahul Wadwani
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This web-scraped dataset, collected from the Cricbuzz website, contains the top 100 batsmen, ordered with the best performers at the top, as ranked in the accompanying top100batsman.csv file. It covers only the top 100 players who have performed best in Test cricket, and the data was collected on 7 January 2023.

Dataset contains:
test_ranking: the current Test ranking of the player.
player id: the player ID, which is unique and assigned by Cricbuzz.
batsman: the name of the batsman.
rating: the player's rating to date, as provided by the ICC.
team: the name of the team to which the player belongs.
matches: the number of matches played by the player to date.
innings: the number of times the player has batted in matches.
runs: total number of runs scored by the batsman.
high_score: highest score achieved by the batsman.
average: the ratio of the total number of runs scored to the number of times the batsman got out.
strike_rate: the overall strike rate of the batsman, calculated as runs scored divided by balls played.
century: number of centuries (100) scored by the batsman.
double_century: number of double centuries scored by the batsman.
half_century: number of half-centuries scored by the batsman.
fours: total number of fours hit to date.
sixes: total number of sixes hit to date.
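
The `average` and `strike_rate` definitions above can be sketched as follows. This is a minimal illustration, not code from the dataset author: the function names and sample totals are hypothetical. Strike rate is conventionally quoted per 100 balls, so the sketch exposes that scaling as a parameter; whether the CSV column uses it should be checked against the data.

```python
def batting_average(runs: int, dismissals: int) -> float:
    """Runs scored divided by the number of times the batsman got out."""
    if dismissals == 0:
        raise ValueError("average is undefined for a batsman who was never out")
    return runs / dismissals


def strike_rate(runs: int, balls_faced: int, per_balls: int = 100) -> float:
    """Runs scored divided by balls faced, scaled to `per_balls` deliveries."""
    if balls_faced == 0:
        raise ValueError("strike rate is undefined when no balls were faced")
    return per_balls * runs / balls_faced


# Hypothetical career totals, for illustration only.
print(batting_average(runs=5000, dismissals=100))  # 50.0
print(strike_rate(runs=5000, balls_faced=10000))   # 50.0
```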
Google Landmarks Dataset v2
github.com
opendatalab.com
Updated Sep 27, 2019
Cite
Google (2019). Google Landmarks Dataset v2 [Dataset]. https://github.com/cvdfoundation/google-landmark
Explore at:
Dataset updated
Sep 27, 2019
Dataset provided by
Google (http://google.com/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into 3 sets of images: train, index and test. The dataset was presented in our CVPR'20 paper. In this repository, we present download links for all dataset files and relevant code for metric computation. This dataset was associated to two Kaggle challenges, on landmark recognition and landmark retrieval. Results were discussed as part of a CVPR'19 workshop. In this repository, we also provide scores for the top 10 teams in the challenges, based on the latest ground-truth version. Please visit the challenge and workshop webpages for more details on the data, tasks and technical solutions from top teams.
Recipes dataset from allrecipes
crawlfeeds.com
csv, zip
Updated Jul 3, 2025
Cite
Crawl Feeds (2025). Recipes dataset from allrecipes [Dataset]. https://crawlfeeds.com/datasets/recipes-dataset-from-allrecipes
Explore at:
Available download formats: zip, csv
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Unleash the culinary potential with our comprehensive Recipes dataset from Allrecipes. This dataset provides detailed information on a vast collection of recipes sourced from Allrecipes, one of the world's most popular recipe websites. Ideal for chefs, food enthusiasts, developers, and data scientists, this dataset offers an extensive range of culinary possibilities.

The dataset includes key details such as recipe titles, ingredients, preparation instructions, cooking times, user ratings, and dietary categories. With recipes spanning various cuisines, dietary preferences, and meal types, this dataset is a valuable resource for creating recipe apps, conducting nutritional analysis, or exploring new culinary trends.

Looking for more data to fuel your food-related projects? Check out our Food & Beverage Data for diverse datasets designed to inspire and empower innovation in the food and beverage industry.

Enhance your food-related projects with structured, high-quality data from Allrecipes. Whether developing a recipe recommendation engine, building a food blog, or researching cooking trends, this dataset is your go-to resource for delicious inspiration and data-driven culinary insights.
Global Starlink Web Cache Latency & Traceroute Measurement Dataset
zenodo.org
Updated Feb 6, 2025
Cite
Qi Zhang; Zeqi Lai; Qian Wu; Jihao Li; Hewu Li (2025). Global Starlink Web Cache Latency & Traceroute Measurement Dataset [Dataset]. http://doi.org/10.5281/zenodo.14800115
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.14800115
Dataset updated
Feb 6, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Qi Zhang; Zeqi Lai; Qian Wu; Jihao Li; Hewu Li
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains global web cache latency measurements collected via RIPE Atlas probes equipped with Starlink terminals across five continents, spanning over 24 hours and resulting in ~2 Million measurements. The measurements aim to evaluate the user-perceived latency of accessing popular websites through low-earth orbit (LEO) satellite networks.

This dataset is a product of Spache, a research project on web caching from space. Please refer to its WWW'25 paper for more details and analysis results.

Dataset File Content

The dataset includes the following files:

Metadata

Target website list: A list of the top 50 most popular websites according to Alexa ranking.

RIPE Atlas Measurement IDs: For each website, the corresponding RIPE Atlas Measurement IDs for both Ping and Traceroute measurements are provided.

Note: microsoftonline.com (originally ranked 41st) is not included in the list due to its unresolvable domain name.

Measurement results - Raw Data

Ping and Traceroute results: Raw measurement results for each target website, including detailed information on each measurement.

Note: For details on the measurement result formats, please refer to the RIPE Atlas documentation.

Measurement results - Preprocessed Latency

Ping RTT latency: Preprocessed data containing the minimum RTT (Round Trip Time, in milliseconds) for each Ping measurement to all target websites.

Probe information: Corresponding Probe IDs, along with their respective countries and continents at the time of measurement.
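
The preprocessing step described above (taking the minimum RTT of each Ping measurement) can be sketched as follows. This is an assumption-laden illustration rather than the authors' script: it assumes each raw result is a dictionary whose `result` list holds one entry per reply, with an `rtt` field in milliseconds for answered packets, as in the RIPE Atlas ping result format. Verify the field names against the RIPE Atlas documentation before relying on it. The sample values are made up.

```python
def min_rtt_ms(measurement: dict):
    """Return the smallest RTT (ms) among the replies, or None if all were lost."""
    rtts = [reply["rtt"] for reply in measurement.get("result", []) if "rtt" in reply]
    return min(rtts) if rtts else None


# Hypothetical results: one with two answered packets, one with total loss.
sample = {"prb_id": 1, "result": [{"rtt": 41.2}, {"x": "*"}, {"rtt": 38.7}]}
lost = {"prb_id": 2, "result": [{"x": "*"}, {"x": "*"}, {"x": "*"}]}

print(min_rtt_ms(sample))  # 38.7
print(min_rtt_ms(lost))    # None
```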

This dataset is intended to support research on web caching, particularly in the context of satellite Internet. Please cite both this dataset and the associated paper if you find this data useful.
WebBench
huggingface.co
Updated May 28, 2025
Cite
Halluminate (2025). WebBench [Dataset]. https://huggingface.co/datasets/Halluminate/WebBench
Explore at:
Dataset updated
May 28, 2025
Dataset provided by
Halluminate, Inc.
Authors
Halluminate
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Web Bench: A real-world benchmark for Browser Agents

WebBench is an open, task-oriented benchmark that measures how well browser agents handle realistic web workflows. It contains 2,454 tasks spread across 452 live websites selected from the global top-1000 by traffic. Last updated: May 28, 2025

Dataset Composition

Category Description Example Count (% of dataset)

READ Tasks that require searching and extracting information “Navigate to the news section and… See the full description on the dataset page: https://huggingface.co/datasets/Halluminate/WebBench.
Grips Competitive Intelligence (global e-commerce data)
datarade.ai
Updated Jul 17, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Grips Intelligence (2023). Grips Competitive Intelligence (global e-commerce data) [Dataset]. https://datarade.ai/data-products/grips-competitive-intelligence-global-e-commerce-data-grips-intelligence
Explore at:
Dataset updated
Jul 17, 2023
Dataset authored and provided by
Grips Intelligence
Area covered
United States of America, United Kingdom, Germany
Description
Website visitation is nice, but sales and revenue are better. Grips tracks e-commerce-based sales across 5,000+ product categories, 30k retailers, and brands, enabling you to understand market size, share, opportunities, and threats.

Use Cases

Domain e-commerce performance Harness the power of data-driven analysis to evaluate critical metrics such as revenue, average order value (AOV), conversion rate, channels, and product assortment for an extensive selection of 30,000 leading e-commerce retailers, enabling you to make strategic decisions and stay ahead in the dynamic online marketplace.

Product Category e-commerce performance Unlock the potential of your business with our game-changing Share of Wallet analysis. Gain valuable insights into the market size and growth of 5,000+ product categories, as well as your retailer or brand's market share within each category.

Brand e-commerce performance Gain deep insights into the market size, share, and revenue growth of 30,000 top e-commerce brands in the digital ecosystem, exploring key metrics such as units sold, average price, and more. Empower your business with comprehensive data to make informed decisions and capitalize on lucrative opportunities in the ever-evolving online marketplace.

Data Methodology

We have a unique mix of sources from where we gather digital signals.

Raw data collection - we have developed several productivity tools, including Retailer Benchmarking, which collectively create the world’s largest transactional dataset - public data captured from millions of sites and partnerships with top data providers.

Data processing - cleaning and formatting, classification of products, sites and more preparation for the modelling phase.

Data modeling: from the billions of digital signals we extrapolate in detail how global e-commerce sites and products are performing.

7-day free trial available Sign up for free at: https://gripsintelligence.com/
Buttondetection2 Dataset
universe.roboflow.com
zip
Updated Jul 26, 2022
Cite
Elsa Weber (2022). Buttondetection2 Dataset [Dataset]. https://universe.roboflow.com/elsa-weber/buttondetection2/model/13
Explore at:
Available download formats: zip
Dataset updated
Jul 26, 2022
Dataset authored and provided by
Elsa Weber
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Clickable Elements Bounding Boxes
Description
Here are a few use cases for this project:

Accessibility Enhancement: ButtonDetection2 can be used to improve the accessibility of websites and apps for visually impaired users. By identifying clickable elements such as links, buttons, and fields, the model can help screen readers and other assistive technologies better understand the interface and guide users through the navigation process more effectively.

Automated UI Testing: ButtonDetection2 can be employed to automate user interface testing for websites and apps. By identifying clickable elements, the model can streamline the testing process by automatically clicking buttons, links, and fields to ensure that they function as expected, reducing manual efforts and speeding up the overall QA process.

UX Analysis and Optimization: ButtonDetection2 can be used by UX designers and developers to analyze and optimize the design of websites and apps. By detecting clickable elements, the model can help identify areas of the interface that may be confusing or difficult for users to interact with, providing insights for designing more user-friendly experiences.

Web Scraping/Data Extraction: ButtonDetection2 can be employed for web scraping and data extraction tasks. The model can identify clickable elements within webpages, facilitating automated extraction of relevant data such as product details, contact information, or event details by navigating through the appropriate links, buttons, and fields within the site.

Augmented Reality Navigation: ButtonDetection2 can be integrated into augmented reality applications to enhance real-world interactions with digital interfaces. By detecting clickable elements such as buttons and links, the model can overlay visual indicators or audio cues on top of the real-world view, providing users with a more intuitive way to interact with digital interfaces in AR environments.
Qatar Number Dataset
listtodata.com
.csv, .xls, .txt
Updated Jul 17, 2025
Cite
List to Data (2025). Qatar Number Dataset [Dataset]. https://listtodata.com/qatar-dataset
Explore at:
Available download formats: .csv, .xls, .txt
Dataset updated
Jul 17, 2025
Dataset authored and provided by
List to Data
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Jan 1, 2025 - Dec 31, 2025
Area covered
Bahrain, Qatar
Variables measured
phone numbers, email address, full name, address, city, state, gender, age, income, IP address
Description
The Qatar number dataset lets you send your offers directly and promote your business at the highest level. You can also use this database on any CRM platform. All of these parts working together will give you a respectable profit margin. We can provide lists based on your needs and uphold all business rules. The Qatar number dataset contains only authentic data. List to Data is one of the websites that can provide you with the most reliable information, so you will receive almost no bounce-back data from this source. We are here to help our clients grow their online businesses, and you can get a good, instant return on investment (ROI).

Qatar phone data is now a basic need for businesses. Without telemarketing and SMS marketing, no one can grow at this time, so this database is in heavy demand. Our organization has gathered millions of phone number lists from all across the world for both businesses and consumers. To launch your business in Qatar, you can acquire this dataset. Qatar phone data comes at an extremely low price and will solve your marketing issue. To make it simpler, you can choose your targeted database while launching your items. We also create contact directories using business area categories. List to Data keeps the database updated: if any false information is ever added, we promptly remove it.

The Qatar phone number list is a genuine dataset. It will provide you with the best and most effective details when you conduct internet marketing. After purchase, you can instantly download the file, which comes in Excel or CSV format. Anyone who wants to make a huge profit cannot ignore the Qatar phone number list. In the end, the Qatar phone number list is the product that you need now. You can also view the other products on our website and get more information there.

The product is easy to buy, and the price is fixed. These contact details will generate more revenue for you, and you can see your business at the top in a short amount of time.
Controlled Anomalies Time Series (CATS) Dataset
zenodo.org
bin, csv
Updated Jul 11, 2024
Cite
Patrick Fleith (2024). Controlled Anomalies Time Series (CATS) Dataset [Dataset]. http://doi.org/10.5281/zenodo.8338435
Explore at:
Available download formats: csv, bin
Unique identifier
https://doi.org/10.5281/zenodo.8338435
Dataset updated
Jul 11, 2024
Dataset provided by
Solenix Engineering GmbH
Authors
Patrick Fleith
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Controlled Anomalies Time Series (CATS) Dataset consists of commands, external stimuli, and telemetry readings of a simulated complex dynamical system with 200 injected anomalies.

The CATS Dataset exhibits a set of desirable properties that make it very suitable for benchmarking Anomaly Detection Algorithms in Multivariate Time Series [1]:

Multivariate (17 variables) including sensor readings and control signals. It simulates the operational behaviour of an arbitrary complex system including:

4 Deliberate Actuations / Control Commands sent by a simulated operator / controller, for instance, commands of an operator to turn ON/OFF some equipment.

3 Environmental Stimuli / External Forces acting on the system and affecting its behaviour, for instance, the wind affecting the orientation of a large ground antenna.

10 Telemetry Readings representing the observable states of the complex system by means of sensors, for instance, a position, a temperature, a pressure, a voltage, current, humidity, velocity, acceleration, etc.

5 million timestamps. Sensor readings are at 1 Hz sampling frequency.

1 million nominal observations (the first 1 million datapoints). This is suitable to start learning the "normal" behaviour.

4 million observations that include both nominal and anomalous segments. This is suitable to evaluate both semi-supervised approaches (novelty detection) as well as unsupervised approaches (outlier detection).

200 anomalous segments. One anomalous segment may contain several successive anomalous observations / timestamps. Only the last 4 million observations contain anomalous segments.

Different types of anomalies to understand what anomaly types can be detected by different approaches. The categories are available in the dataset and in the metadata.

Fine control over ground truth. As this is a simulated system with deliberate anomaly injection, the start and end time of the anomalous behaviour is known very precisely. In contrast to real world datasets, there is no risk that the ground truth contains mislabelled segments which is often the case for real data.

Suitable for root cause analysis. In addition to the anomaly category, the time series channel in which the anomaly first developed is recorded and made available as part of the metadata. This can be useful to evaluate how well algorithms trace anomalies back to the right root cause channel.

Affected channels. In addition to the knowledge of the root cause channel in which the anomaly first developed itself, we provide information of channels possibly affected by the anomaly. This can also be useful to evaluate the explainability of anomaly detection systems which may point out to the anomalous channels (root cause and affected).

Obvious anomalies. The simulated anomalies have been designed to be "easy" for human eyes to detect (i.e., there are very large spikes or oscillations), hence also detectable by most algorithms. This makes the synthetic dataset useful for screening tasks (i.e., to eliminate algorithms that are not capable of detecting those obvious anomalies). However, during our initial experiments, the dataset turned out to be challenging enough even for state-of-the-art anomaly detection approaches, making it suitable also for regular benchmark studies.

Context provided. Some variables can only be considered anomalous in relation to other behaviours. A typical example consists of a light and switch pair. The light being either on or off is nominal, the same goes for the switch, but having the switch on and the light off shall be considered anomalous. In the CATS dataset, users can choose (or not) to use the available context, and external stimuli, to test the usefulness of the context for detecting anomalies in this simulation.

Pure signal ideal for robustness-to-noise analysis. The simulated signals are provided without noise: while this may seem unrealistic at first, it is an advantage since users of the dataset can decide to add on top of the provided series any type of noise and choose an amplitude. This makes it well suited to test how sensitive and robust detection algorithms are against various levels of noise.

No missing data. You can drop whatever data you want to assess the impact of missing values on your detector with respect to a clean baseline.
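
The intended workflow (learn "normal" behaviour on the nominal prefix, evaluate on the remainder, optionally add noise) can be sketched with a deliberately naive baseline. This is not the author's method or a recommended detector: it is a single-channel mean and standard-deviation threshold on toy data, and every name and number below is hypothetical. For the real dataset the split index would be 1,000,000, per the description above.

```python
import random
import statistics


def split_nominal(series, n_nominal):
    """Split a series into the nominal prefix and the mixed remainder."""
    return series[:n_nominal], series[n_nominal:]


def fit_threshold(nominal, k=4.0):
    """Fit a mean +/- k * std band on nominal data only."""
    mean = statistics.fmean(nominal)
    std = statistics.pstdev(nominal)
    return mean - k * std, mean + k * std


def flag_outliers(values, low, high):
    """Return the indices of values that fall outside the fitted band."""
    return [i for i, v in enumerate(values) if v < low or v > high]


def add_noise(values, sigma, seed=0):
    """Add zero-mean Gaussian noise of a chosen amplitude to a clean signal."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


# Toy stand-in for one telemetry channel: 1000 nominal points, then 200 mixed.
rng = random.Random(42)
series = [rng.uniform(-1.0, 1.0) for _ in range(1200)]
series[1050] = 25.0  # injected spike at index 50 of the mixed part

train, evaluate = split_nominal(series, n_nominal=1000)
low, high = fit_threshold(train)
flagged = flag_outliers(evaluate, low, high)
print(flagged)  # [50]

# Robustness check: repeat the detection on a noisy copy of the same segment.
noisy_flagged = flag_outliers(add_noise(evaluate, sigma=0.1), low, high)
print(50 in noisy_flagged)
```

Because the band is fitted only on the nominal prefix, this corresponds to the semi-supervised (novelty detection) setting mentioned above.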

Change Log

Version 2

Metadata: we include a metadata.csv with information about:

Anomaly categories

Root cause channel (signal in which the anomaly is first visible)

Affected channel (signal to which the anomaly might propagate through coupled system dynamics)

Removal of anomaly overlaps: version 1 contained anomalies which overlapped with each other resulting in only 190 distinct anomalous segments. Now, there are no more anomaly overlaps.

Two data files: CSV and parquet for convenience.
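
To see why overlapping anomalies reduce the count of distinct segments, as noted for version 1 above, consider merging labelled intervals. The sketch below is illustrative only and is not the procedure used to build the dataset: the `(start, end)` tuples are hypothetical, and treating touching intervals as overlapping is a choice made here.

```python
def merge_overlapping(segments):
    """Merge (start, end) intervals that overlap or touch into distinct segments."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Extend the previous segment instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical anomaly intervals, for illustration only.
segments = [(10, 20), (15, 30), (40, 50), (50, 55), (70, 80)]
distinct = merge_overlapping(segments)
print(len(segments), "->", len(distinct))  # 5 -> 3
print(distinct)  # [(10, 30), (40, 55), (70, 80)]
```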

[1] Example Benchmark of Anomaly Detection in Time Series: “Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. Anomaly Detection in Time Series: A Comprehensive Evaluation. PVLDB, 15(9): 1779 - 1797, 2022. doi:10.14778/3538598.3538602”

About Solenix

Solenix is an international company providing software engineering, consulting services and software products for the space market. Solenix is a dynamic company that brings innovative technologies and concepts to the aerospace market, keeping up to date with technical advancements and actively promoting spin-in and spin-out technology activities. We combine modern solutions which complement conventional practices. We aspire to achieve maximum customer satisfaction by fostering collaboration, constructivism, and flexibility.
Number of internet users worldwide 2014-2029
statista.com
Updated Apr 11, 2025
Cite
Statista Research Department (2025). Number of internet users worldwide 2014-2029 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
Explore at:
Dataset updated
Apr 11, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Area covered
World
Description
The global number of internet users was forecast to continuously increase between 2024 and 2029 by a total of 1.3 billion users (+23.66 percent). After the fifteenth consecutive year of increase, the number of users is estimated to reach 7 billion, a new peak, in 2029. Notably, the number of internet users has been continuously increasing over the past years. Depicted is the estimated number of individuals in the country or region at hand who use the internet. As the data source clarifies, connection quality and usage frequency are distinct aspects, not taken into account here. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of internet users in regions like the Americas and Asia.
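
The figures quoted above can be checked against each other with simple arithmetic. The sketch below only rearranges the stated numbers; the variable names are hypothetical, and because 1.3 billion is itself rounded, the implied values are approximate.

```python
increase_bn = 1.3    # forecast increase 2024-2029, in billions (from the description)
growth_pct = 23.66   # the same increase expressed as a percentage

# increase = baseline * growth, so baseline = increase / growth
baseline_2024_bn = increase_bn / (growth_pct / 100)
forecast_2029_bn = baseline_2024_bn + increase_bn

print(round(baseline_2024_bn, 2))  # 5.49
print(round(forecast_2029_bn, 2))  # 6.79
```

An implied 2029 value of roughly 6.8 billion is consistent with the stated "7 billion" once rounded to the nearest billion.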
NFL Play Statistics dataset (secondary)
kaggle.com
Updated Apr 27, 2020
Cite
Todd Steussie (2020). NFL Play Statistics dataset (secondary) [Dataset]. https://www.kaggle.com/toddsteussie/nfl-play-statistics-secondary-datasets/code
Explore at:
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 27, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Todd Steussie
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The NFL is one of the most popular sports leagues in the world. Many of us are stat geeks who want to understand not just what happened but also who and why. This NFL dataset provides a comprehensive view of NFL games, statistics, participation, and much more. The dataset includes NFL play data from 2004 to the present.

This NFL dataset provides play-by-play data from the 2004 to 2019 seasons. The dataset also includes play and participation information for players, coaches, and game officials. Additional data tables included in this file cover the NFL Draft from 1989 to present, the NFL Combine from 1999 to present, NFL rosters from 1998 to present, NFL schedules, stadium information and much more. The granularity of NFL statistics varies by NFL season. The current version of NFL statistics has been collected since 2012. All information sources used to create this dataset are from publicly accessible websites and the NFL GSIS dataset.

All information sources used to create this dataset are from publically accessible websites and NFL documentation. Although my current life is focused on data science, this project has a special place in my heart, since it links my previous profession in the NFL with my current passion for data analysis.

Imgur Most Viral and Secret Santa

kaggle.com

Updated Apr 18, 2020

Cite

Ghalib93 (2020). Imgur Most Viral and Secret Santa [Dataset]. https://www.kaggle.com/ghalib93/imgur-most-viral-and-secret-santa/code

Dataset updated

Apr 18, 2020

Dataset provided by

Kaggle

Authors

Ghalib93

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

Imgur is an image hosting and sharing website founded in 2009. It has become one of the most popular websites in the world, with approximately 250 million users. The website does not require registration to browse its content, but an account is needed to post. It is known for an event it created in 2013, in which members register to send gifts to and receive gifts from other members of the website. The event takes place around Christmas, and people share their gifts on the website by posting pictures of the process or of what they received under a specific tag. The data provided here covers two sections that I think are important for understanding certain patterns within the Imgur community: the Most Viral section and the Secret Santa tag.

I have participated twice in the Imgur Secret Santa event and have always found funny and interesting posts in the Most Viral section. With the help of the Kaggle community, I would like to identify trends in the data provided and perhaps compare the Secret Santa data with the Most Viral data.

Content

There are two Dataframes included and they are almost identical in the number of columns:

The first Dataframe is Imgur Most Viral posts. It contains many of the posts that were labelled as viral by the Imgur community and team, using specific algorithms to track the number of likes and dislikes across multiple platforms. The posts might be videos, gifs, pictures or just text.

The second Dataframe is Imgur Secret Santa Tag. Secret Santa is an annual Imgur tradition where members can sign up to send gifts to and receive gifts from other members during the Christmas holiday. It contains many of the posts that were tagged with Secret Santa by the Imgur community. The posts might be videos, gifs, pictures or just text. There is an (is_viral) column in this Dataframe that is not available in the Most Viral Dataframe, since all of the posts there are viral.

Data Dictionary

Feature	Type	Dataset	Description
account_id	object	Imgur_Viral/imgur_secret_santa	Unique Account ID per member
comment_count	float64	Imgur_Viral/imgur_secret_santa	Number of comments made in the post
datetime	float64	Imgur_Viral/imgur_secret_santa	TimeStamp containing Date and Time Details
downs	float64	Imgur_Viral/imgur_secret_santa	Number of dislikes for the post
favorite_count	float64	Imgur_Viral/imgur_secret_santa	Number of users that marked the post as a favourite
id	object	Imgur_Viral/imgur_secret_santa	Unique Post ID. Even if it was posted by the same member, different posts will have different IDs
images_count	float64	Imgur_Viral/imgur_secret_santa	Number of images included in the post
points	float64	Imgur_Viral/imgur_secret_santa	Each post will have calculated points based on (ups - downs)
score	float64	Imgur_Viral/imgur_secret_santa	Ticket number
tags	object	Imgur_Viral/imgur_secret_santa	Tags are sub albums that the post will show under
title	object	Imgur_Viral/imgur_secret_santa	Title of the post
ups	float64	Imgur_Viral/imgur_secret_santa	Number of likes for the post
views	float64	Imgur_Viral/imgur_secret_santa	Number of people that viewed the post
is_most_viral	boolean	imgur_secret_santa	If the post is viral or not
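A minimal sketch of how the data dictionary might be used, assuming the files are CSV exports with the column names listed in the table. The sample rows are invented for illustration, and treating `datetime` as Unix epoch seconds (UTC) is an assumption, since the dictionary only says it is a float64 timestamp. The check tests the documented rule that `points` is calculated as `ups - downs`.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical sample rows; the real files and their values are not shown here.
SAMPLE = """account_id,datetime,ups,downs,points,views
a1,1577836800.0,120.0,20.0,100.0,5000.0
b2,1577923200.0,10.0,15.0,-5.0,300.0
"""

def load_posts(text):
    """Parse CSV text into dicts with numeric fields converted to float."""
    posts = []
    for row in csv.DictReader(io.StringIO(text)):
        post = {"account_id": row["account_id"]}
        for name in ("ups", "downs", "points", "views"):
            post[name] = float(row[name])
        # Assumption: the float timestamp is seconds since the Unix epoch (UTC).
        post["posted_at"] = datetime.fromtimestamp(
            float(row["datetime"]), tz=timezone.utc
        )
        posts.append(post)
    return posts

def points_mismatches(posts):
    """Return posts whose points differ from ups - downs."""
    return [p for p in posts if p["points"] != p["ups"] - p["downs"]]

posts = load_posts(SAMPLE)
print(len(points_mismatches(posts)), "rows violate points = ups - downs")
```

Only the standard library is used here; the same check could be written with a DataFrame library if one is available.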

Acknowledgements

I would like to thank Imgur for providing an API that made collecting data from its website easier. With their help we might be able to better understand certain trends that emerge from its community.

Inspiration

There is no problem to solve with this data; it is just a fun way to explore and learn more about programming and analyzing data. I hope you enjoy playing with the data as much as I enjoyed collecting it and browsing the website.

Average daily time spent on social media worldwide 2012-2025
statista.com
Updated Jun 19, 2025
Cite
Statista (2025). Average daily time spent on social media worldwide 2012-2025 [Dataset]. https://www.statista.com/statistics/433871/daily-social-media-usage-worldwide/
Dataset updated
Jun 19, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
How much time do people spend on social media? As of 2025, the average daily social media usage of internet users worldwide amounted to 141 minutes per day, down from 143 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of 3 hours and 49 minutes on social media each day. In comparison, the daily time spent on social media in the U.S. was just 2 hours and 16 minutes.

Global social media usage

Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events and friends.

Global impact of social media

Social media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics and heightened everyday distractions.
Top 3000+ Cryptocurrency Dataset
kaggle.com
Updated Apr 9, 2023
Cite
Sourav Banerjee (2023). Top 3000+ Cryptocurrency Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/cryptocurrency-dataset-2021-395-types-of-crypto
Dataset updated
Apr 9, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sourav Banerjee
Description
Context

A cryptocurrency, crypto-currency, or crypto is a collection of binary data which is designed to work as a medium of exchange. Individual coin ownership records are stored in a ledger, which is a computerized database using strong cryptography to secure transaction records, to control the creation of additional coins, and to verify the transfer of coin ownership. Cryptocurrencies are generally fiat currencies, as they are not backed by or convertible into a commodity. Some crypto schemes use validators to maintain the cryptocurrency. In a proof-of-stake model, owners put up their tokens as collateral. In return, they get authority over the token in proportion to the amount they stake. Generally, these token stakers gain additional ownership in the token over time via network fees, newly minted tokens, or other such reward mechanisms.

Cryptocurrency does not exist in physical form (like paper money) and is typically not issued by a central authority. Cryptocurrencies typically use decentralized control as opposed to a central bank digital currency (CBDC). When a cryptocurrency is minted or created prior to issuance or issued by a single issuer, it is generally considered centralized. When implemented with decentralized control, each cryptocurrency works through distributed ledger technology, typically a blockchain, that serves as a public financial transaction database.

A cryptocurrency is a tradable digital asset or digital form of money, built on blockchain technology that only exists online. Cryptocurrencies use encryption to authenticate and protect transactions, hence their name. There are currently over a thousand different cryptocurrencies in the world, and many see them as the key to a fairer future economy.

Bitcoin, first released as open-source software in 2009, is the first decentralized cryptocurrency. Since the release of bitcoin, many other cryptocurrencies have been created.

Content

This Dataset is a collection of records of 3000+ Different Cryptocurrencies.

* Top 395+ from 2021
* Top 3000+ from 2023

Structure of the Dataset

https://i.imgur.com/qGVJaHl.png

Acknowledgements

This Data is collected from: https://finance.yahoo.com/. If you want to learn more, you can visit the Website.

Cover Photo by Worldspectrum: https://www.pexels.com/photo/ripple-etehereum-and-bitcoin-and-micro-sdhc-card-844124/
YouTube's Channels Dataset
kaggle.com
Updated Mar 31, 2021
Cite
HarshitHGupta (2021). YouTube's Channels Dataset [Dataset]. https://www.kaggle.com/datasets/harshithgupta/youtubes-channels-dataset
Dataset updated
Mar 31, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
HarshitHGupta
Area covered
YouTube
Description
Context

YouTube is an American online video-sharing platform headquartered in San Bruno, California. The service, created in February 2005 by three former PayPal employees—Chad Hurley, Steve Chen, and Jawed Karim—was bought by Google in November 2006 for US$1.65 billion and now operates as one of the company's subsidiaries. YouTube is the second most-visited website after Google Search, according to Alexa Internet rankings.

YouTube allows users to upload, view, rate, share, add to playlists, report, comment on videos, and subscribe to other users. Available content includes video clips, TV show clips, music videos, short and documentary films, audio recordings, movie trailers, live streams, video blogging, short original videos, and educational videos.

YouTube (the world-famous video sharing website) maintains a list of the top trending videos on the platform. According to Variety magazine, “To determine the year’s top-trending videos, YouTube uses a combination of factors including measuring users interactions (number of views, shares, comments, and likes). Note that they’re not the most-viewed videos overall for the calendar year”. Top performers on the YouTube trending list are music videos (such as the famously viral “Gangnam Style”), celebrity and/or reality TV performances, and the random dude-with-a-camera viral videos that YouTube is well-known for.

This dataset is a daily record of the top trending YouTube videos.

Note that this dataset is a structurally improved version of this dataset.

Acknowledgements

This dataset was collected using the YouTube API. This Description is cited in Wikipedia.
MyAnimeList - Anime Dataset with Reviews
kaggle.com
Updated Mar 29, 2023
Cite
Harsh Raj (2023). MyAnimeList - Anime Dataset with Reviews [Dataset]. https://www.kaggle.com/datasets/ansh0007/myanimelist-anime-dataset-with-reviews
Dataset updated
Mar 29, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Harsh Raj
License
https://cdla.io/sharing-1-0/
Description
The Kaggle data set "Anime Comments Scrapped from https://myanimelist.net" is a valuable resource for anyone interested in exploring the world of anime. It is a collection of comments and reviews on various anime titles, sourced from the popular anime review website MyAnimeList. The data set was scraped using the Octoparse software, which is a powerful web scraping tool used to extract data from websites.

The data set contains five columns of information, namely S.no, Title, Date of comment, User name, and text. The S.no column contains a unique identifier for each comment in the data set, while the Title column contains the name of the anime being reviewed. The Date of comment column indicates the date when the comment was posted, while the User name column shows the username of the person who posted the comment. Finally, the text column contains the actual comment or review left by the user on the anime in question.

The data set is a great resource for anyone looking to analyze or explore anime-related content. Researchers and analysts can use the data set to gain insights into the opinions and sentiments of anime fans towards various titles. For example, one can use the data set to analyze which anime titles are the most popular or controversial among fans, and why. Similarly, researchers can analyze how the opinions and sentiments of anime fans have changed over time for specific anime titles.
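As a rough illustration of the popularity analysis mentioned above, the sketch below counts comments per anime title. It assumes a CSV export whose header matches the five columns described (S.no, Title, Date of comment, User name, text); the sample rows and the date format are invented for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the five-column layout described above.
SAMPLE = """S.no,Title,Date of comment,User name,text
1,Anime A,2023-01-05,user1,Great pacing
2,Anime B,2023-01-06,user2,Not for me
3,Anime A,2023-02-01,user3,Loved the ending
"""

def comments_per_title(text):
    """Count how many comments each title received."""
    reader = csv.DictReader(io.StringIO(text))
    return Counter(row["Title"] for row in reader)

counts = comments_per_title(SAMPLE)
for title, n in counts.most_common():
    print(title, n)
```

Comment volume is only a proxy for popularity; it says nothing about whether the comments are positive or negative.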

Another potential use case for the data set is in building recommendation systems for anime fans. By analyzing the text column of the data set, one can extract information about what anime fans like or dislike about certain anime titles. This information can then be used to build recommendation systems that suggest new anime titles to fans based on their preferences.

The data set can also be used to build natural language processing (NLP) models for sentiment analysis. By training NLP models on the comments and reviews in the data set, researchers can build algorithms that automatically classify comments as positive, negative, or neutral. These models can then be used to analyze large volumes of comments and reviews quickly and efficiently.

Furthermore, the data set can be used to perform network analyses of the relationships between anime titles and users. By analyzing which anime titles are reviewed or commented on by which users, one can identify clusters of users with similar tastes in anime. These clusters can then be used to build communities of anime fans with similar tastes, and to facilitate discussions and recommendations between these users.
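One simple way to approximate the "users with similar tastes" idea is to compare the sets of titles each user commented on. The sketch below uses Jaccard similarity, which is a choice made here for illustration and not something the dataset description prescribes; the (user, title) pairs and the 0.5 threshold are hypothetical.

```python
from itertools import combinations

# Hypothetical (user name, title) pairs extracted from the comments.
COMMENTS = [
    ("user1", "Anime A"), ("user1", "Anime B"),
    ("user2", "Anime A"), ("user2", "Anime B"),
    ("user3", "Anime C"),
]

def titles_by_user(pairs):
    """Map each user to the set of titles they commented on."""
    out = {}
    for user, title in pairs:
        out.setdefault(user, set()).add(title)
    return out

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_pairs(pairs, threshold=0.5):
    """Return (user, user, similarity) for pairs at or above the threshold."""
    by_user = titles_by_user(pairs)
    result = []
    for u, v in combinations(sorted(by_user), 2):
        score = jaccard(by_user[u], by_user[v])
        if score >= threshold:
            result.append((u, v, score))
    return result

matches = similar_pairs(COMMENTS)
print(matches)
```

Comparing every pair of users grows quadratically with the number of users, so a larger analysis would likely need a graph library or an approximate method.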

Another important point to note about the "Anime Comments Scrapped from https://myanimelist.net" data set is that it contains a large number of comments. Specifically, the data set includes over 30,000 comments on various anime titles. This makes the data set a rich source of information for anyone looking to perform large-scale analyses or build machine learning models.

Overall, the "Anime Comments Scrapped from https://myanimelist.net" data set is a valuable resource for anyone interested in exploring the world of anime. It contains a wealth of information on the opinions and sentiments of anime fans towards various titles, and can be used for a variety of research and analysis purposes. Whether you are an anime enthusiast, a data analyst, or a machine learning researcher, this data set has something to offer.
World Athletics Marathon Ranking List
kaggle.com
Updated Aug 5, 2023
Cite
Marcel Caraciolo (2023). World Athletics Marathon Ranking List [Dataset]. https://www.kaggle.com/datasets/marcelcaraciolo/world-athletics-marathon-ranking-list/discussion
Dataset updated
Aug 5, 2023
Dataset provided by
Kaggle
Authors
Marcel Caraciolo
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Introduction

World Athletics, previously known as the International Amateur Athletic Federation, is the international governing organization for the sport of athletics, covering track and field and several running modalities (road, race walking, ultra, mountain running, etc.). One of World Athletics' tasks is to organize and publish a global ranking system to compare athletes' performances across a range of sports categories. By applying standardised compilation methods (under specific rules), it is possible to evaluate the comparative quality of the participating fields at competitions of the same type and to produce competition performance rankings. The rankings are designed to recognize and celebrate the achievements of athletes participating in marathon events worldwide. The list takes into account various factors such as race results, timing, and the competitive level of the event.

In this analysis we will focus on the World Athletics Marathon ranking list from 2019 until June 2023. Our goal is to evaluate the outstanding performances of the best marathon runners in the world. It is important to note that this analysis is limited to the listed athletes' performances across different races and events recognized by the World Athletics organization. We will attempt to answer many questions, such as which countries appear among the top 100 marathon runners, how countries (based on nationality) have moved in the ranking from 2019 to 2023 (is Kenya really the country with the most top runners in the world?), the age distribution for men and women, and curiosities such as the performance of Eliud Kipchoge (the fastest marathon runner in the world), the Brazilian performances, and even how long athletes can keep their names in the ranking list.

Motivation

My name is Marcel Caraciolo, and I am currently doing a Data Science Specialization at the Cesar School, a famous technology university in Recife, Pernambuco, Brazil. This project is part of the evaluation for a course named 'Data Visualization' taught by Professor Eronides Neto. The initial aim is to apply exploratory data analysis and visualization techniques to sports analytics, and since I am a marathon enthusiast and a passionate runner, I would like to understand the profiles of the best marathoners in the world. This analysis could be useful for anyone interested in a current data snapshot of marathon performances, and furthermore as a basis for enthusiasts and journalists interested in sports data analytics.

Datasets

For this study, I had to scrape the website of World Athletics, the organization that provides the marathon ranking lists. The data in its original form can be found here. The parsed data can be found here on the Kaggle webpage.

World Athletics Marathon Performance Ranking list 2019-2023

Parsing and preparing the data was a little challenging, since I needed to loop over all the marathon ranking lists organized by month-date and sex. For each ranking list I also had to loop over all the pages, since the ranking was split into tables of 50 rows per page. All the result files of the World Athletics ranking list over the past 4 years (January 2019 - June 2023) are saved as comma-separated text files. On a second look at the ranking lists, I also found some stats about the races considered when computing the ranking score. I could extract the race description, the date of the event and the race type (marathon (42km) or half-marathon (21km)).
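The nested loop described above (ranking date, then sex, then page) can be sketched without any network access. This is not the author's scraper: the monthly granularity, the `men`/`women` labels, and the fixed list size are assumptions made for illustration; only the 50-rows-per-page figure and the January 2019 to June 2023 range come from the description.

```python
import math

ROWS_PER_PAGE = 50  # page size stated in the description

def month_range(start_year, start_month, end_year, end_month):
    """Yield (year, month) tuples from start to end, inclusive."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1

def page_count(total_rows, rows_per_page=ROWS_PER_PAGE):
    """Number of pages needed to show total_rows rows."""
    return math.ceil(total_rows / rows_per_page)

def scrape_plan(rows_in_list):
    """Enumerate (year, month, sex, page) for hypothetical lists of equal size."""
    for year, month in month_range(2019, 1, 2023, 6):
        for sex in ("men", "women"):  # hypothetical labels
            for page in range(1, page_count(rows_in_list) + 1):
                yield year, month, sex, page

plan = list(scrape_plan(rows_in_list=120))
print(len(plan), "page requests in this hypothetical plan")
```

In a real scraper each tuple would be turned into a request, and the page count would be read from the site instead of being fixed in advance.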

The data scraping notebook can be found following this link:

Data Dictionary

Data Dictionary for worldathletics/RANKINGDATE_SEX_WORLDATHLETICS_MARATHON_RANKINGS.csv

rank,competitor,dob,nat,score,events,competitor_id,sex,rank_date

Variable	Definition	Key	Notes
rank	Position in the World Athletics Marathon Ranking list	1,2,3..	Integer
competitor	Name of the Athlete	Joshua Eliud, ...
dob	Birth date ...
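A minimal sketch of reading one of the ranking files, assuming it is a comma-separated file whose header matches the column list given above. The sample row is entirely hypothetical, including the date formats and the `sex` value, because the data dictionary is truncated in this listing.

```python
import csv
import io

# Column order as listed in the data dictionary header line.
EXPECTED_COLUMNS = [
    "rank", "competitor", "dob", "nat", "score",
    "events", "competitor_id", "sex", "rank_date",
]

# Hypothetical row: values and formats are invented for illustration.
SAMPLE = """rank,competitor,dob,nat,score,events,competitor_id,sex,rank_date
1,Example Runner,1990-01-01,KEN,1500,2,12345,men,2023-06-27
"""

def read_rankings(text):
    """Parse ranking CSV text and verify the header before reading rows."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for row in reader:
        row["rank"] = int(row["rank"])  # the dictionary lists rank as Integer
        rows.append(row)
    return rows

rows = read_rankings(SAMPLE)
print(rows[0]["competitor"], rows[0]["rank"])
```

Other columns are left as strings because their types are not documented in the visible part of the dictionary.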
