Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Evaluation of the most visited health websites in the world
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Popular Website Traffic Over Time’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/popular-website-traffice on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Background
Have you ever been in a conversation where the question comes up: who uses Bing? The question arises occasionally because people wonder whether such sites get any views. In this research study, we explore traffic for many popular websites.
Methodology
The data collected originates from SimilarWeb.com.
Source
For the analysis and study, go to The Concept Center
This dataset was created by Chase Willden and contains around 0 samples along with 1/1/2017, Social Media, technical information and other features such as: - 12/1/2016 - 3/1/2017 - and more.
- Analyze 11/1/2016 in relation to 2/1/2017
- Study the influence of 4/1/2017 on 1/1/2017
- More datasets
If you use this dataset in your research, please credit Chase Willden
--- Original source retains full ownership of the source dataset ---
Click Web Traffic Combined with Transaction Data: A New Dimension of Shopper Insights
Consumer Edge is a leader in alternative consumer data for public and private investors and corporate clients. Click enhances the unparalleled accuracy of CE Transact by allowing investors to delve deeper and browse further into global online web traffic for CE Transact companies and more. Leverage the unique fusion of web traffic and transaction datasets to understand the addressable market and understand spending behavior on consumer and B2B websites. See the impact of changes in marketing spend, search engine algorithms, and social media awareness on visits to a merchant’s website, and discover the extent to which product mix and pricing drive or hinder visits and dwell time. Plus, Click uncovers a more global view of traffic trends in geographies not covered by Transact. Doubleclick into better forecasting, with Click.
Consumer Edge’s Click is available in machine-readable file delivery and enables:
- Comprehensive Global Coverage: Insights across 620+ brands and 59 countries, including key markets in the US, Europe, Asia, and Latin America.
- Integrated Data Ecosystem: Click seamlessly maps web traffic data to CE entities and stock tickers, enabling a unified view across various business intelligence tools.
- Near Real-Time Insights: Daily data delivery with a 5-day lag ensures timely, actionable insights for agile decision-making.
- Enhanced Forecasting Capabilities: Combining web traffic indicators with transaction data helps identify patterns and predict revenue performance.
Use Case: Analyze Year Over Year Growth Rate by Region
Problem: A public investor wants to understand how a company’s year-over-year growth differs by region.
Solution: The firm leveraged Consumer Edge Click data to:
- Gain visibility into key metrics like views, bounce rate, visits, and addressable spend
- Analyze year-over-year growth rates for a time period
- Break out data by geographic region to see growth trends
Metrics Include:
- Spend
- Items
- Volume
- Transactions
- Price Per Volume
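The year-over-year comparison in this use case can be sketched in a few lines of Python. This is a minimal illustration, not Consumer Edge's method: the `(region, year, visits)` row layout, the function name `yoy_growth_by_region`, and the sample figures are all assumptions made for the example.

```python
from collections import defaultdict

def yoy_growth_by_region(rows, year):
    """Return {region: growth} for `year` versus `year - 1`.

    `rows` is an iterable of (region, year, visits) tuples; this layout is
    hypothetical and does not reflect the actual Click file schema.
    """
    totals = defaultdict(float)
    for region, row_year, visits in rows:
        totals[(region, row_year)] += visits

    growth = {}
    for (region, row_year), current in totals.items():
        if row_year != year:
            continue
        previous = totals.get((region, year - 1))
        if previous:  # skip regions with a missing or zero prior-year baseline
            growth[region] = (current - previous) / previous
    return growth

# Made-up sample figures, for illustration only.
sample = [
    ("US", 2022, 1000), ("US", 2023, 1200),
    ("Europe", 2022, 500), ("Europe", 2023, 450),
]
for region, rate in sorted(yoy_growth_by_region(sample, 2023).items()):
    print(f"{region}: {rate:+.1%}")
```

Regions without a prior-year baseline are left out rather than reported as infinite growth, which is one of several reasonable choices.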
Inquire about a Click subscription to perform more complex, near real-time analyses on public tickers and private brands, as well as for industries beyond CPG, such as:
- Monitoring web traffic as a leading indicator of stock performance and consumer demand
- Analyzing customer interest and sentiment at the brand and sub-brand levels
Consumer Edge offers a variety of datasets covering the US, Europe (UK, Austria, France, Germany, Italy, Spain), and across the globe, with subscription options serving a wide range of business needs.
Consumer Edge is the Leader in Data-Driven Insights Focused on the Global Consumer
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This web-scraped dataset, collected from the cricbuzz website, contains the top 100 batsmen in Test cricket, ordered with the best-performing player at the top. The rankings are provided in the top100batsman.csv file. The dataset covers only the top 100 players, and the data was collected on 7 January 2023.
Dataset contains:
- test_ranking: the current Test ranking of the player
- player id: the player ID, which is unique and assigned by cricbuzz
- batsman: the name of the batsman
- rating: the rating provided by the ICC
- team: the name of the team to which the player belongs
- matches: the number of matches played by the player to date
- innings: the number of times the player has batted
- runs: the total number of runs scored by the batsman
- high_score: the highest score achieved by the batsman
- average: the ratio of the total number of runs scored to the number of times the batsman got out
- strike_rate: the overall strike rate of the batsman, calculated as runs scored divided by balls played
- century: the number of centuries scored by the batsman
- double_century: the number of double centuries scored by the batsman
- half_century: the number of half-centuries scored by the batsman
- fours: the total number of fours hit to date
- sixes: the total number of sixes hit to date
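The two derived columns, average and strike_rate, can be illustrated with a short Python sketch. The function names and the `dismissals` and `balls_faced` inputs are hypothetical, since the CSV as described has no such columns. Strike rate is conventionally quoted per 100 balls in cricket; whether this file's strike_rate column is scaled that way is an assumption, so the scale is exposed as a parameter.

```python
def batting_average(runs, dismissals):
    """Runs scored divided by the number of times the batsman got out.

    Returns None when the batsman has never been dismissed, because the
    ratio is undefined in that case.
    """
    if dismissals == 0:
        return None
    return runs / dismissals

def strike_rate(runs, balls_faced, per_balls=100):
    """Runs scored per `per_balls` balls faced.

    per_balls=100 follows the usual cricket convention (an assumption about
    this dataset); per_balls=1 gives the plain runs/balls ratio described
    in the column list.
    """
    if balls_faced == 0:
        return None
    return per_balls * runs / balls_faced

# Made-up numbers, for illustration only.
print(batting_average(runs=5000, dismissals=100))
print(strike_rate(runs=50, balls_faced=100))
```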
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into 3 sets of images: train, index and test. The dataset was presented in our CVPR'20 paper. In this repository, we present download links for all dataset files and relevant code for metric computation. This dataset was associated with two Kaggle challenges, on landmark recognition and landmark retrieval. Results were discussed as part of a CVPR'19 workshop. In this repository, we also provide scores for the top 10 teams in the challenges, based on the latest ground-truth version. Please visit the challenge and workshop webpages for more details on the data, tasks and technical solutions from top teams.
https://crawlfeeds.com/privacy_policy
Unleash the culinary potential with our comprehensive Recipes dataset from Allrecipes. This dataset provides detailed information on a vast collection of recipes sourced from Allrecipes, one of the world's most popular recipe websites. Ideal for chefs, food enthusiasts, developers, and data scientists, this dataset offers an extensive range of culinary possibilities.
The dataset includes key details such as recipe titles, ingredients, preparation instructions, cooking times, user ratings, and dietary categories. With recipes spanning various cuisines, dietary preferences, and meal types, this dataset is a valuable resource for creating recipe apps, conducting nutritional analysis, or exploring new culinary trends.
Looking for more data to fuel your food-related projects? Check out our Food & Beverage Data for diverse datasets designed to inspire and empower innovation in the food and beverage industry.
Enhance your food-related projects with structured, high-quality data from Allrecipes. Whether developing a recipe recommendation engine, building a food blog, or researching cooking trends, this dataset is your go-to resource for delicious inspiration and data-driven culinary insights.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains global web cache latency measurements collected via RIPE Atlas probes equipped with Starlink terminals across five continents, spanning over 24 hours and resulting in ~2 Million measurements. The measurements aim to evaluate the user-perceived latency of accessing popular websites through low-earth orbit (LEO) satellite networks.
This dataset is a product of Spache, a research project on web caching from space. Please refer to its WWW'25 paper for more details and analysis results.
The dataset includes the following files:
This dataset is intended to support research on web caching, particularly in the context of satellite Internet. Please cite both this dataset and the associated paper if you find this data useful.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Web Bench: A real-world benchmark for Browser Agents
WebBench is an open, task-oriented benchmark that measures how well browser agents handle realistic web workflows. It contains 2,454 tasks spread across 452 live websites selected from the global top-1000 by traffic. Last updated: May 28, 2025
Dataset Composition
Category Description Example Count (% of dataset)
READ Tasks that require searching and extracting information “Navigate to the news section and… See the full description on the dataset page: https://huggingface.co/datasets/Halluminate/WebBench.
Website visitation is nice, but sales and revenue are better. Grips tracks e-commerce-based sales across 5,000+ product categories, 30k retailers, and brands, enabling you to understand market size, share, opportunities, and threats.
Use Cases
Domain e-commerce performance Harness the power of data-driven analysis to evaluate critical metrics such as revenue, average order value (AOV), conversion rate, channels, and product assortment for an extensive selection of 30,000 leading e-commerce retailers, enabling you to make strategic decisions and stay ahead in the dynamic online marketplace.
Product Category e-commerce performance Unlock the potential of your business with our game-changing Share of Wallet analysis. Gain valuable insights into the market size and growth of more than 5,000 product categories, as well as your retailer or brand's market share within each category.
Brand e-commerce performance Gain deep insights into the market size, share, and revenue growth of 30,000 top e-commerce brands in the digital ecosystem, exploring key metrics such as units sold, average price, and more. Empower your business with comprehensive data to make informed decisions and capitalize on lucrative opportunities in the ever-evolving online marketplace.
Data Methodology
We have a unique mix of sources from where we gather digital signals.
Raw data collection - we have developed several productivity tools, including Retailer Benchmarking, which collectively create the world’s largest transactional dataset - public data captured from millions of sites and partnerships with top data providers.
Data processing - cleaning and formatting, classification of products and sites, and further preparation for the modelling phase.
Data modelling - from the billions of digital signals we extrapolate in detail how global e-commerce sites and products are performing.
7-day free trial available Sign up for free at: https://gripsintelligence.com/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Accessibility Enhancement: ButtonDetection2 can be used to improve the accessibility of websites and apps for visually impaired users. By identifying clickable elements such as links, buttons, and fields, the model can help screen readers and other assistive technologies better understand the interface and guide users through the navigation process more effectively.
Automated UI Testing: ButtonDetection2 can be employed to automate user interface testing for websites and apps. By identifying clickable elements, the model can streamline the testing process by automatically clicking buttons, links, and fields to ensure that they function as expected, reducing manual efforts and speeding up the overall QA process.
UX Analysis and Optimization: ButtonDetection2 can be used by UX designers and developers to analyze and optimize the design of websites and apps. By detecting clickable elements, the model can help identify areas of the interface that may be confusing or difficult for users to interact with, providing insights for designing more user-friendly experiences.
Web Scraping/Data Extraction: ButtonDetection2 can be employed for web scraping and data extraction tasks. The model can identify clickable elements within webpages, facilitating automated extraction of relevant data such as product details, contact information, or event details by navigating through the appropriate links, buttons, and fields within the site.
Augmented Reality Navigation: ButtonDetection2 can be integrated into augmented reality applications to enhance real-world interactions with digital interfaces. By detecting clickable elements such as buttons and links, the model can overlay visual indicators or audio cues on top of the real-world view, providing users with a more intuitive way to interact with digital interfaces in AR environments.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Qatar number dataset lets you send your offers directly, and it will promote your business at the highest level. You can also use this database on any CRM platform. All of these parts working together will give you a respectable profit margin. We can provide lists based on your needs and uphold all business rules. The Qatar number dataset contains only authentic data. As noted, List to Data is one of the websites that can provide you with the most reliable information. Therefore, it is guaranteed that you will receive almost no bounce-back data from this source. We are here to help our clients grow their online businesses, and you can get a good and immediate return on investment (ROI).

Qatar phone data is now a basic need for businesses. Without telemarketing and SMS marketing, no one can grow at this time, so this database is in heavy demand. Our organization has gathered millions of phone number lists from all across the world, for both businesses and consumers. To launch your business in Qatar, you can acquire this dataset. Qatar phone data comes at an extremely low price and will solve your marketing issue. To make it simpler, you can choose your targeted database while launching your items. We also create contact directories using business area categories. List to Data keeps the database updated, so if any false information is ever added, we promptly remove it.

The Qatar phone number list is a genuine dataset. It will provide you with the best and most effective details when you conduct internet marketing. After purchase, you can instantly download the file, which comes in Excel or CSV format. Anyone who wants to make a huge profit cannot ignore the Qatar phone number list. In the end, the Qatar phone number list is the product that you need now. You can also view the other products on our website and get more information there.
Although the product is an easy-to-buy service, the price is also fixed. This contact address will indeed generate more revenue for you, and you can see your business at the top in a short amount of time.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Controlled Anomalies Time Series (CATS) Dataset consists of commands, external stimuli, and telemetry readings of a simulated complex dynamical system with 200 injected anomalies.
The CATS Dataset exhibits a set of desirable properties that make it very suitable for benchmarking Anomaly Detection Algorithms in Multivariate Time Series [1]:
Change Log
Version 2
[1] Example Benchmark of Anomaly Detection in Time Series: “Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. Anomaly Detection in Time Series: A Comprehensive Evaluation. PVLDB, 15(9): 1779 - 1797, 2022. doi:10.14778/3538598.3538602”
About Solenix
Solenix is an international company providing software engineering, consulting services and software products for the space market. Solenix is a dynamic company that brings innovative technologies and concepts to the aerospace market, keeping up to date with technical advancements and actively promoting spin-in and spin-out technology activities. We combine modern solutions which complement conventional practices. We aspire to achieve maximum customer satisfaction by fostering collaboration, constructivism, and flexibility.
The global number of internet users was forecast to increase continuously between 2024 and 2029 by a total of 1.3 billion users (+23.66 percent). After the fifteenth consecutive year of growth, the number of users is estimated to reach 7 billion, a new peak, in 2029. Notably, the number of internet users has increased continuously over the past years. Depicted is the estimated number of individuals in the country or region at hand that use the internet. As the data source clarifies, connection quality and usage frequency are distinct aspects that are not taken into account here. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of internet users in countries like the Americas and Asia.
https://creativecommons.org/publicdomain/zero/1.0/
The NFL is one of the most popular sports leagues in the world. Many of us are stat geeks who want to understand not just what happened but also who and why. This NFL dataset provides a comprehensive view of NFL games, statistics, participation, and much more. The dataset includes NFL play data from 2004 to the present.
This NFL dataset provides play-by-play data from the 2004 to 2019 seasons. The dataset also includes play and participation information for players, coaches, and game officials. Additional data tables included in this file include the NFL Draft from 1989 to present, the NFL Combine from 1999 to present, NFL rosters from 1998 to present, NFL schedules, stadium information and much more. The granularity of NFL statistics varies by NFL season. The current version of NFL statistics has been collected since 2012. All information sources used to create this dataset are from publicly accessible websites and the NFL GSIS dataset.
All information sources used to create this dataset are from publicly accessible websites and NFL documentation. Although my current life is focused on data science, this project has a special place in my heart, since it links my previous profession in the NFL with my current passion for data analysis.
https://creativecommons.org/publicdomain/zero/1.0/
Imgur is an image hosting and sharing website founded in 2009. It became one of the most popular websites around the world with approximately 250 million users. The website does not require registration and anyone can browse its content. However, to be able to post an account must be created. It is famous for an event that it created in 2013 where members get to register to send/receive gifts from other members on the website. The event takes place during Christmas time and people share their gifts via the website where they post pictures of the process or what they received in a specific tag. Today the data provided covers two sections that I think are important to understanding certain patterns within the Imgur community. The first is the Most Viral section and the second is the Secret Santa tag.
I have participated twice in the Imgur Secret Santa event and have always found funny and interesting posts in its Most Viral section. With the help of the Kaggle community, I would like to identify trends in the data provided and perhaps compare the Secret Santa data with the Most Viral data.
There are two Dataframes included and they are almost identical in the number of columns:
The first Dataframe is Imgur Most Viral posts. This contains many of the posts that were labelled as Viral by The Imgur community and team using specific algorithms to track number of likes and dislikes across multiple platforms. The posts might be videos, gifs, pictures or just text.
The second Dataframe is Imgur Secret Santa Tag. Secret Santa is an annual Imgur tradition where members can sign up to send gifts to and receive gifts from other members during the Christmas holiday. This contains many of the posts that were tagged with Secret Santa by the Imgur community. The posts might be videos, gifs, pictures or just text. There is a (is_viral) column in this Dataframe that is not available in the Most Viral Dataframe since all of the posts there are viral.
Feature | Type | Dataset | Description |
---|---|---|---|
account_id | object | Imgur_Viral/imgur_secret_santa | Unique Account ID per member |
comment_count | float64 | Imgur_Viral/imgur_secret_santa | Number of comments made in the post |
datetime | float64 | Imgur_Viral/imgur_secret_santa | TimeStamp containing Date and Time Details |
downs | float64 | Imgur_Viral/imgur_secret_santa | Number of dislikes for the post |
favorite_count | float64 | Imgur_Viral/imgur_secret_santa | Number of users that marked the post as a favourite |
id | object | Imgur_Viral/imgur_secret_santa | Unique Post ID. Even if it was posted by the same member, different posts will have different IDs |
images_count | float64 | Imgur_Viral/imgur_secret_santa | Number of images included in the post |
points | float64 | Imgur_Viral/imgur_secret_santa | Each post will have calculated points based on (ups - downs) |
score | float64 | Imgur_Viral/imgur_secret_santa | Ticket number |
tags | object | Imgur_Viral/imgur_secret_santa | Tags are sub albums that the post will show under |
title | object | Imgur_Viral/imgur_secret_santa | Title of the post |
ups | float64 | Imgur_Viral/imgur_secret_santa | Number of likes for the post |
views | float64 | Imgur_Viral/imgur_secret_santa | Number of people that viewed the post |
is_most_viral | boolean | imgur_secret_santa | If the post is viral or not |
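The table states that points are calculated from (ups - downs). A minimal Python sketch of that relationship follows, using plain dictionaries with the column names from the table; the helper names and the sample posts are hypothetical, and real rows may need handling for missing values.

```python
def compute_points(post):
    """Return ups minus downs, the relationship stated for the points column."""
    return post["ups"] - post["downs"]

def posts_with_mismatched_points(posts):
    """Return the IDs of posts whose stored points differ from ups - downs."""
    return [p["id"] for p in posts if p["points"] != compute_points(p)]

# Made-up rows, for illustration only.
sample_posts = [
    {"id": "a1", "ups": 120.0, "downs": 20.0, "points": 100.0},
    {"id": "b2", "ups": 50.0, "downs": 5.0, "points": 40.0},
]
print(posts_with_mismatched_points(sample_posts))
```

A check like this is a quick way to confirm whether the stored points column really follows the stated formula before relying on it.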
I would like to thank Imgur for providing an API that made collecting data from its website easier. With their help, we might be able to better understand certain trends that emerge from its community.
There is no problem to solve with this data; it is just a fun way to explore and learn more about programming and analyzing data. I hope you enjoy playing with the data as much as I enjoyed collecting it and browsing the website.
How much time do people spend on social media? As of 2025, the average daily social media usage of internet users worldwide amounted to 141 minutes per day, down from 143 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of 3 hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in the U.S. was just 2 hours and 16 minutes.
Global social media usage
Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events and friends.
Global impact of social media
Social media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics and heightened everyday distractions.
A cryptocurrency, crypto-currency, or crypto is a collection of binary data which is designed to work as a medium of exchange. Individual coin ownership records are stored in a ledger, which is a computerized database using strong cryptography to secure transaction records, to control the creation of additional coins, and to verify the transfer of coin ownership. Cryptocurrencies are generally fiat currencies, as they are not backed by or convertible into a commodity. Some crypto schemes use validators to maintain the cryptocurrency. In a proof-of-stake model, owners put up their tokens as collateral. In return, they get authority over the token in proportion to the amount they stake. Generally, these token stakers get additional ownership in the token over time via network fees, newly minted tokens, or other such reward mechanisms.
Cryptocurrency does not exist in physical form (like paper money) and is typically not issued by a central authority. Cryptocurrencies typically use decentralized control as opposed to a central bank digital currency (CBDC). When a cryptocurrency is minted or created prior to issuance or issued by a single issuer, it is generally considered centralized. When implemented with decentralized control, each cryptocurrency works through distributed ledger technology, typically a blockchain, that serves as a public financial transaction database.
A cryptocurrency is a tradable digital asset or digital form of money, built on blockchain technology that only exists online. Cryptocurrencies use encryption to authenticate and protect transactions, hence their name. There are currently over a thousand different cryptocurrencies in the world, and many see them as the key to a fairer future economy.
Bitcoin, first released as open-source software in 2009, is the first decentralized cryptocurrency. Since the release of bitcoin, many other cryptocurrencies have been created.
This dataset is a collection of records of 3000+ different cryptocurrencies:
- Top 395+ from 2021
- Top 3000+ from 2023
This Data is collected from: https://finance.yahoo.com/. If you want to learn more, you can visit the Website.
Cover Photo by Worldspectrum: https://www.pexels.com/photo/ripple-etehereum-and-bitcoin-and-micro-sdhc-card-844124/
YouTube is an American online video-sharing platform headquartered in San Bruno, California. The service, created in February 2005 by three former PayPal employees—Chad Hurley, Steve Chen, and Jawed Karim—was bought by Google in November 2006 for US$1.65 billion and now operates as one of the company's subsidiaries. YouTube is the second most-visited website after Google Search, according to Alexa Internet rankings.
YouTube allows users to upload, view, rate, share, add to playlists, report, comment on videos, and subscribe to other users. Available content includes video clips, TV show clips, music videos, short and documentary films, audio recordings, movie trailers, live streams, video blogging, short original videos, and educational videos.
YouTube (the world-famous video sharing website) maintains a list of the top trending videos on the platform. According to Variety magazine, “To determine the year’s top-trending videos, YouTube uses a combination of factors including measuring users interactions (number of views, shares, comments, and likes). Note that they’re not the most-viewed videos overall for the calendar year”. Top performers on the YouTube trending list are music videos (such as the famously viral “Gangnam Style”), celebrity and/or reality TV performances, and the random dude-with-a-camera viral videos that YouTube is well-known for.
This dataset is a daily record of the top trending YouTube videos.
Note that this dataset is a structurally improved version of this dataset.
This dataset was collected using the YouTube API. This description is cited from Wikipedia.
https://cdla.io/sharing-1-0/
The Kaggle data set "Anime Comments Scrapped from https://myanimelist.net" is a valuable resource for anyone interested in exploring the world of anime. It is a collection of comments and reviews on various anime titles, sourced from the popular anime review website MyAnimeList. The data set was scraped using the Octoparse software, which is a powerful web scraping tool used to extract data from websites.
The data set contains five columns of information, namely S.no, Title, Date of comment, User name, and text. The S.no column contains a unique identifier for each comment in the data set, while the Title column contains the name of the anime being reviewed. The Date of comment column indicates the date when the comment was posted, while the User name column shows the username of the person who posted the comment. Finally, the text column contains the actual comment or review left by the user on the anime in question.
The data set is a great resource for anyone looking to analyze or explore anime-related content. Researchers and analysts can use the data set to gain insights into the opinions and sentiments of anime fans towards various titles. For example, one can use the data set to analyze which anime titles are the most popular or controversial among fans, and why. Similarly, researchers can analyze how the opinions and sentiments of anime fans have changed over time for specific anime titles.
Another potential use case for the data set is in building recommendation systems for anime fans. By analyzing the text column of the data set, one can extract information about what anime fans like or dislike about certain anime titles. This information can then be used to build recommendation systems that suggest new anime titles to fans based on their preferences.
The data set can also be used to build natural language processing (NLP) models for sentiment analysis. By training NLP models on the comments and reviews in the data set, researchers can build algorithms that automatically classify comments as positive, negative, or neutral. These models can then be used to analyze large volumes of comments and reviews quickly and efficiently.
Furthermore, the data set can be used to perform network analyses of the relationships between anime titles and users. By analyzing which anime titles are reviewed or commented on by which users, one can identify clusters of users with similar tastes in anime. These clusters can then be used to build communities of anime fans with similar tastes, and to facilitate discussions and recommendations between these users.
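One simple way to approach the user-similarity idea above is to compare the sets of titles each user has commented on. The sketch below uses Jaccard similarity, which is a swapped-in technique chosen for illustration and not something the dataset description prescribes. The column names "User name" and "Title" come from the description; the function names, the threshold, and the sample rows are assumptions.

```python
from collections import defaultdict
from itertools import combinations

def titles_by_user(comments):
    """Map each user name to the set of anime titles they commented on."""
    seen = defaultdict(set)
    for row in comments:
        seen[row["User name"]].add(row["Title"])
    return seen

def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_user_pairs(comments, threshold=0.5):
    """Return (user, user, score) for pairs whose overlap meets the threshold."""
    seen = titles_by_user(comments)
    pairs = []
    for u, v in combinations(sorted(seen), 2):
        score = jaccard(seen[u], seen[v])
        if score >= threshold:
            pairs.append((u, v, score))
    return pairs

# Made-up rows, for illustration only.
sample = [
    {"User name": "ann", "Title": "X"}, {"User name": "ann", "Title": "Y"},
    {"User name": "bob", "Title": "X"}, {"User name": "bob", "Title": "Y"},
    {"User name": "bob", "Title": "Z"}, {"User name": "cy", "Title": "W"},
]
for u, v, score in similar_user_pairs(sample):
    print(f"{u} ~ {v}: {score:.2f}")
```

Comparing every pair of users grows quadratically with the number of users, so a larger analysis would likely need sampling or a dedicated graph library.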
Another important point to note about the "Anime Comments Scrapped from https://myanimelist.net" data set is that it contains a large number of comments. Specifically, the data set includes over 30,000 comments on various anime titles. This makes the data set a rich source of information for anyone looking to perform large-scale analyses or build machine learning models.
Overall, the "Anime Comments Scrapped from https://myanimelist.net" data set is a valuable resource for anyone interested in exploring the world of anime. It contains a wealth of information on the opinions and sentiments of anime fans towards various titles, and can be used for a variety of research and analysis purposes. Whether you are an anime enthusiast, a data analyst, or a machine learning researcher, this data set has something to offer.
https://creativecommons.org/publicdomain/zero/1.0/
World Athletics, previously known as the International Amateur Athletic Federation, is the international governing body for the sport of athletics, covering track and field and several running disciplines (road running, race walking, ultra running, mountain running, etc.). One of World Athletics' tasks is to organize and publish a global ranking system that compares athletes' performances across a range of event categories. By applying standardised compilation methods (under specific rules), it is possible to evaluate the comparative quality of the participating fields at competitions of the same type and to produce competition performance rankings. The marathon rankings are designed to recognize and celebrate the achievements of athletes participating in marathon events worldwide. The list takes into account factors such as race results, timing, and the competitive level of the event.
In this analysis we focus on the World Athletics Marathon ranking list from 2019 until June 2023. Our goal is to evaluate the outstanding performances of the best marathon runners in the world. Note that this analysis is limited to the listed athletes' performances across the races and events recognized by World Athletics. We will attempt to answer several questions: which countries appear most often among the top 100 marathon runners; how each country's presence in the ranking (based on athlete nationality) evolved from 2019 to 2023 (is Kenya really the country with the most top runners in the world?); and how ages are distributed for men and women. We also look at curiosities such as the performance of Eliud Kipchoge (the fastest marathon runner in the world), the performances of Brazilian athletes, and how long athletes keep their names on the ranking list.
My name is Marcel Caraciolo, and I am currently doing a Data Science Specialization at Cesar School, a famous technology university in Recife, Pernambuco, Brazil. This project is part of the evaluation for a course named 'Data Visualization', taught by Professor Eronides Neto. The initial motivation is to apply exploratory data analysis and visualization techniques to sports analytics. Since I am a marathon enthusiast and a passionate runner, I would like to understand the profiles of the best marathoners in the world. This analysis could be useful for anyone who wants a current snapshot of marathon performances, and as a basis for enthusiasts and journalists interested in sports data analytics.
For this study, I had to scrape the website of World Athletics, the organization that provides the marathon ranking lists. The data in its original form can be found here. The parsed data can be found here on Kaggle.
Parsing and preparing the data was a little challenging, since I needed to loop over all the marathon ranking lists, which are organized by ranking date and sex. For each ranking list I also had to loop over all the pages, since the ranking is split into tables of 50 rows per page. All the resulting data files for the World Athletics ranking lists over the past four years (January 2019 to June 2023) are saved as comma-separated text files. After a second look at the ranking lists, I also found some statistics about the races used to compute the ranking score: I could extract the race description, the date of the event, and the race type (marathon, 42 km, or half-marathon, 21 km).
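The nested loop described above (ranking date, sex, then pages of 50 rows) can be sketched as follows. This is not the author's scraper: `fetch_page` is a hypothetical callable standing in for the real HTTP request and HTML parsing, and the date strings and sex labels are assumed values. The loop stops when a page returns fewer than 50 rows.

```python
from itertools import product

PAGE_SIZE = 50  # rows per page, as stated in the text


def collect_rankings(rank_dates, sexes, fetch_page):
    """Gather all rows for every (rank_date, sex) list, page by page.

    fetch_page(rank_date, sex, page) is a hypothetical function that
    returns a list of row dicts for one page.
    """
    rows = []
    for rank_date, sex in product(rank_dates, sexes):
        page = 1
        while True:
            batch = fetch_page(rank_date, sex, page)
            for row in batch:
                rows.append({**row, "sex": sex, "rank_date": rank_date})
            if len(batch) < PAGE_SIZE:  # a short (or empty) page is the last one
                break
            page += 1
    return rows


def fake_fetch(rank_date, sex, page):
    """Stand-in for the real request: pretends each list has 120 athletes."""
    start = (page - 1) * PAGE_SIZE
    end = min(start + PAGE_SIZE, 120)
    return [{"rank": i + 1} for i in range(start, end)]


rows = collect_rankings(["2019-01-01", "2019-02-01"], ["men", "women"], fake_fetch)
print(len(rows))  # 2 dates x 2 sexes x 120 athletes = 480
```

Passing the fetch function in as a parameter keeps the looping logic testable without any network access.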
The data scraping notebook can be found following this link:
Data Dictionary for worldathletics/RANKINGDATE_SEX_WORLDATHLETICS_MARATHON_RANKINGS.csv
rank,competitor,dob,nat,score,events,competitor_id,sex,rank_date
Variable | Definition | Key | Notes |
---|---|---|---|
rank | Position in the World Athletics Marathon Ranking list | 1,2,3.. | Integer |
competitor | Name of the Athlete | Joshua Eliud, ... | |
dob | Birth date ... |
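As a rough illustration of how the ranking files could be used to count nationalities among the top-ranked athletes, the sketch below reads a CSV with the header shown above. The sample rows are invented placeholders (names, scores, IDs, and dates are not real data), and `dob` is left empty because its format is not given here.

```python
import csv
import io
from collections import Counter

# Header taken from the data dictionary; the rows are made-up placeholders.
SAMPLE = """rank,competitor,dob,nat,score,events,competitor_id,sex,rank_date
1,Runner One,,KEN,1500,Marathon,101,men,2023-06-01
2,Runner Two,,ETH,1480,Marathon,102,men,2023-06-01
3,Runner Three,,KEN,1470,Marathon,103,men,2023-06-01
"""


def top_countries(csv_text, top_n=100):
    """Count nationalities among athletes ranked within the top_n positions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["nat"] for row in reader if int(row["rank"]) <= top_n)


country_counts = top_countries(SAMPLE)
print(country_counts.most_common())
```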
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Evaluation of the most visited health websites in the world