https://creativecommons.org/publicdomain/zero/1.0/
This is a historical dataset on the modern Olympic Games, including all the Games from Athens 1896 to Rio 2016. I scraped this data from www.sports-reference.com in May 2018. The R code I used to scrape and wrangle the data is on GitHub. I recommend checking my kernel before starting your own analysis.
Note that the Winter and Summer Games were held in the same year up until 1992. After that, they were staggered so that the Winter Games occur on a four-year cycle starting with 1994, with Summer in 1996, Winter in 1998, and so on. A common mistake people make when analyzing this data is to assume that the Summer and Winter Games have always been staggered.
The file athlete_events.csv contains 271116 rows and 15 columns. Each row corresponds to an individual athlete competing in an individual Olympic event (athlete-events). The columns are:
The Olympic data on www.sports-reference.com is the result of an incredible amount of research by a group of Olympic history enthusiasts and self-proclaimed 'statistorians'. Check out their blog for more information. All I did was consolidate their decades of work into a convenient format for data analysis.
This dataset provides an opportunity to ask questions about how the Olympics have evolved over time, including questions about the participation and performance of women, different nations, and different sports and events.
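The note about staggering can be checked directly against the data. The following is a minimal sketch, assuming the file has `Year` and `Season` columns (the column list is not reproduced above, so these names are an assumption) and using a few made-up rows in place of athlete_events.csv:

```python
import csv
import io
from collections import defaultdict

# Tiny stand-in for athlete_events.csv; the real file would be read with open().
# The column names "Year" and "Season" are assumed, not taken from the text above.
SAMPLE = """Year,Season
1988,Summer
1988,Winter
1992,Summer
1992,Winter
1994,Winter
1996,Summer
1998,Winter
"""

def seasons_by_year(rows):
    """Map each year to the set of seasons held in that year."""
    seasons = defaultdict(set)
    for row in rows:
        seasons[int(row["Year"])].add(row["Season"])
    return dict(seasons)

def shared_years(seasons):
    """Return the sorted years in which both Summer and Winter Games were held."""
    return sorted(y for y, s in seasons.items() if s == {"Summer", "Winter"})

seasons = seasons_by_year(csv.DictReader(io.StringIO(SAMPLE)))
both = shared_years(seasons)
print(both)
```

Grouping by year alone would mix Summer and Winter rows for the years up to 1992, so grouping by both year and season is likely the safer default for this kind of analysis.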
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The dataset is complete. Data was updated daily during the Olympics.
You can support the dataset via the upvote button!
The Paris 2024 Olympic Summer Games dataset provides comprehensive information about the Summer Olympics held in 2024. It covers various aspects of the event, including participating countries, athletes, sports disciplines, medal standings, and key event details. More about the Olympic Games on the official site Olympics Paris 2024 and Wiki.
| Table | Description | Note |
|---|---|---|
| athletes.csv | personal information about all athletes | released |
| coaches.csv | personal information about all coaches | released |
| events.csv | all events that took place | released |
| medals.csv | all medal holders | released |
| medals_total.csv | all medals (grouped by country) | released |
| medalists.csv | all medalists | released |
| nocs.csv | all NOCs (code, country, country_long) | released |
| schedule.csv | day-by-day schedule of all events | released |
| schedule_preliminary.csv | preliminary schedule of all events | released |
| teams.csv | all teams | released |
| technical_officials.csv | all technical officials (referees, judges, jury members) | released |
| results | all results | released |
| torch_route.csv | torch relay places | released |
| vanues.csv | all Olympic venues | released |
I am very thankful to Luca Fontana, zenzombie and others for their efforts in helping me to make the dataset better. Luca Fontana manually checked the medalists.csv table, and zenzombie covered the dataset with tests.
If you have any questions or suggestions please start a discussion.
Every 4 years I love watching the Summer Olympics. I'm not interested enough in any one sport to watch it every week, but watching the variety of sports during the Olympics is always great. Furthermore, I love seeing the medal rankings constantly change and love looking back at past years to see which countries are doing better or worse at each Games. Hence, I wanted a dataset covering over 100 years of Games and all the medals won by each country. Unfortunately, after an admittedly short search through current Kaggle datasets I found no dataset that contained details of all the countries' medals as well as their rank, so I thought I should add this one, which I created late last year. Obviously this dataset only contains results up to 2012; for any analyses I do, I hope to combine this data with other datasets available for more recent Games.
I cleaned and prepared this data with SQL on Microsoft SSMS, using datasets taken from DataCamp.com (a great website for anyone learning data science, in my opinion). I edited the data to update the details of athletes who had lost their medals (generally due to doping) and those who had gained medals in their place. I also edited the original dataset to account for ever-changing IOC codes (for example, combining ROC (Republic of China) and PRC (People's Republic of China) with CHN (China)), and added former countries to DataCamp's Countries dataset (e.g. East and West Germany) in order to separate these for analyses of past Games. Unfortunately, I didn't add details for all former countries (e.g. Bohemia and Netherlands Antilles), so some of the results do have NULL for the country. The dataset in its current form contains the ranking and the number of gold, silver, bronze and total medals for each country, each year. This data may not exactly match what can be found online, but it is correct to the best of my ability, at least for the more recent years.
As I've already said, the original datasets are from DataCamp.com as part of their SQL courses (the specific tables used are SummerOlympics and Countries). DataCamp was fundamental for me in learning SQL and I am still using it to learn more. This data was then edited using information primarily found via Wikipedia (in regard to IOC codes and athletes losing medals), though to be clear I did cross-check multiple Wikipedia pages, and if the information did not align then I checked other sources.
The original reason I put together this dataset was to understand whether being the host country affected how well you did (as it seemed to do in Tokyo), but I am now also interested in how countries' rankings have changed over time, or perhaps whether a country's gold medal count is correlated with its silver medal count. I would love to hear of other potential analyses of this data.
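The gold-versus-silver question could be approached with a Pearson correlation coefficient. The sketch below is only an illustration: the medal counts are made-up numbers for five hypothetical countries, not values from the dataset, and the function name `pearson` is my own.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up medal counts for five hypothetical countries in one Games.
gold = [10, 8, 5, 2, 0]
silver = [9, 9, 4, 3, 1]
r = pearson(gold, silver)
print(round(r, 3))
```

A value near 1 would suggest that countries winning many golds also tend to win many silvers. With real data it may be worth computing this per Games rather than over all years pooled, since the total number of medals awarded changes over time.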
In the history of the Summer Olympics, the United States has been the most successful nation ever, with a combined total of 2,761 medals won across 29 Olympic Games. More than one thousand of these were gold, with almost 900 silver medals, and nearly 800 bronze medals.
Emerging nations
While European and Anglophone nations have traditionally dominated the medals tables, recent decades have seen the emergence and increased participation of athletes representing developing nations. One nation in particular has enjoyed a great deal of success in recent years, with China having won over 700 medals in the Summer Olympics, despite only having taken part in 12 Games.
How big of a problem is doping at the Olympics?
In recent history, one of the biggest threats to the reputation of the Olympics has been the issue of doping. On record, the worst Olympics in terms of medals stripped came in 2008, when 50 medals in total were lost by disqualified athletes. Meanwhile, Russia ranked as the country with the highest number of Olympic medals stripped as of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Changes in attitudes toward the Olympic Games at the individual level.
This data package contains tabular data collected during 2013-2016 for the determination of the status of reintroduced fishers on the Olympic Peninsula. The fisher, Pekania pennanti, once occupied coniferous forests at low to middle elevations throughout much of the Western United States, but was extirpated from Washington State during the last century. The fisher was listed as a State endangered species in October 1998. In 2006 Washington State developed a Fisher Recovery Plan, with a goal of establishing multiple self-sustaining fisher populations in Washington. In 2007, the NPS and WDFW completed a Fisher Reintroduction Plan and Environmental Assessment for Olympic National Park. The goal of that effort was to restore fishers to Olympic National Park (ONP) and Washington State. The project was designed to take up to 10 years to complete, and to be conducted in two phases. During Phase 1, 90 fishers were translocated from central British Columbia to the Olympic Peninsula from 2008 to 2010, and the initial success of the reintroduction was monitored by radio-tracking translocated fishers (2008–2011). Data were collected on post-release survival, movements, home-range establishment, and reproduction. Initial findings indicate that survival was highly variable among release years. In addition, access constraints in a large wilderness area prevented the reliable determination of breeding success for most of the released females, creating additional uncertainties about the current status of reintroduced fishers on the Olympic Peninsula. The need for a second monitoring phase, consisting of non-invasive surveys of fisher distribution, was identified in both the State and Federal fisher recovery planning efforts. The goal of Phase 2 of the fisher monitoring in the Olympic Recovery Area was to evaluate the status of reintroduced fishers on the Olympic Peninsula from 2013–2016.
A fisher occupancy study was conducted from 2013 to 2016 on the Olympic Peninsula to evaluate the success of the reintroduction of 90 fishers from 2008 to 2010. The objectives of the study were to determine the current fisher distribution, the proportion of the recovery area currently occupied, and the genetic characteristics and reproductive success of the fisher population, via DNA analyses. The initial findings indicate that fishers are widely distributed across the Olympic Peninsula both inside and outside the recovery area, and the presence of second and third generation fishers indicates substantial reproductive success by founder individuals and their descendants. Data within this package include data for sampling locations, sample site visits, photo results, and genetic data.
This dataset has information about over 180,000 people who have been part of the Olympic Games, including athletes, referees, and others. For each person, you can find details like their name, gender, nationality, birthdate, the sport they took part in, and physical stats like height and weight. It also includes links to their official Olympedia profiles. The data covers many years and sports, making it a great source to explore changes and trends in athlete backgrounds, performance, and participation over time.
https://creativecommons.org/publicdomain/zero/1.0/
Speed skating has been featured as a sport in the Winter Olympics since the first Winter Games in 1924. Women's events were first added to the Olympic program at the 1960 Squaw Valley Olympics. Though there is some variation between men and women, races typically consist of the following events:
This dataset contains the top three athletes in each event at every Winter Olympics to date.
Throughout modern Olympic history, the 100 meter sprint has generally been regarded as the most high-profile and popular event of each Summer Games. The men's event has been included in every Olympics, while the women's event has been included since 1928. Athletes from the United States have won both events more than any other nation, with sixteen victories in the men's race and nine in the women's, although Jamaica has emerged as a sprinting superpower since the millennium.
World's fastest man
The only athlete to ever win three Olympic golds in the 100m sprint was Jamaica's Usain Bolt, who also set the current world record of 9.58 seconds in 2009. In 2016, Bolt even became the first athlete to ever win a "triple-triple" in sprinting, by claiming gold in the 100m, 200m and 4x100m in three consecutive Olympics; however, his 2008 gold medal in the 4x100m was rescinded in 2017 when a teammate tested positive for banned substances. Despite this, Bolt is widely considered to be the greatest sprinter of all time, with eight gold medals to his name, winning every Olympic final in which he participated.
World's fastest woman
The fastest woman of all time was Florence Griffith-Joyner, whose world record of 10.49 seconds has stood since 1988. "Flo-Jo" also set an Olympic record of 10.62 seconds in 1988 (often given as 10.52, but with a wind assistance of +1.0 metres per second); this record stood for 33 years before Elaine Thompson Herah topped it by 0.01 seconds at the Tokyo 2020 Games.
https://www.kappasignal.com/p/legal-disclaimer.html
This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.
Historical daily stock prices (open, high, low, close, volume)
Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)
Technical indicators (e.g., moving averages, RSI, MACD, average directional index, aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, bollinger bands indicators, fibonacci, williams percent range, commodity channel index)
Feature engineering based on financial data and technical indicators
Sentiment analysis data from social media and news articles
Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)
Stock price prediction
Portfolio optimization
Algorithmic trading
Market sentiment analysis
Risk management
Researchers investigating the effectiveness of machine learning in stock market prediction
Analysts developing quantitative trading Buy/Sell strategies
Individuals interested in building their own stock market prediction models
Students learning about machine learning and financial applications
The dataset may include different levels of granularity (e.g., daily, hourly)
Data cleaning and preprocessing are essential before model training
Regular updates are recommended to maintain the accuracy and relevance of the data
Dataset containing a list of countries with typical personal names in each country (regardless of ethnicity).
• Country Code
• Country
• Name of athlete
• Sport
Pandas and Python web scraping were used to generate data from the following sites:
• Tennis ATP and WTA (https://live-tennis.eu/en/wta-live-ranking)
• Badminton league (https://bwfbadminton.com/)
• Olympic athletes Rio 2016 Summer Olympics
This is a complete historical dataset of Winter Olympic medalists in Biathlon from Squaw Valley 1960 to Beijing 2022. I scraped the data from Olympic. I made this small dataset because I couldn't find any data about this amazing competition, which is my favorite sport. You can find the simple analysis here.
The table biathlon_medals.csv is really small and contains only 286 rows and 9 columns. Each row corresponds to an individual athlete or team that won a medal in an Olympic biathlon event. All team events are recorded under the country and do not list the team members. This means that athlete medals can be counted only for individual races.
A description of each column is given alongside that column.
Some features of the dataset:
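Because team events are recorded by country, any per-athlete medal count has to skip them. A minimal sketch of that filter follows; the field names `winner` and `is_team_event` and the sample rows are hypothetical placeholders, not the actual columns or contents of biathlon_medals.csv.

```python
from collections import Counter

def athlete_medal_counts(rows):
    """Count medals per athlete, skipping team events (which name a country, not members)."""
    counts = Counter()
    for row in rows:
        if row["is_team_event"]:
            continue  # the winner here is a country, so no athlete can be credited
        counts[row["winner"]] += 1
    return counts

# Made-up rows; "winner" and "is_team_event" are hypothetical field names.
rows = [
    {"winner": "Athlete A", "is_team_event": False},
    {"winner": "Athlete A", "is_team_event": False},
    {"winner": "Athlete B", "is_team_event": False},
    {"winner": "Country X", "is_team_event": True},
]
counts = athlete_medal_counts(rows)
print(counts.most_common(1))
```

Counts produced this way understate the totals of athletes who also won relay medals, which is a limitation of the data rather than of the code.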
- military patrol medals (1924) have not been included;
- 2 silver medals and no bronze were awarded in the 2010 men's individual distance;
- 2 silver medals at the 2014 Olympics were stripped and have not been redistributed (they are not included).
Dataset was created on 17 September 2018. Dataset was updated on 17 April 2022.
This dataset provides an opportunity to ask questions about the participation and performance of men, women, teams, countries, and events in biathlon.
Inspired by Ukraine Biathlon Team - the winner of the Gold medal in Sochi 2014, I used the winner team picture for a cover image (from left to right: Valj Semerenko, Juliya Dzhyma, Olena Pidhrushna, and Vita Semerenko).
The dataset contains a list of World Championships matches from 1989 to 2024 as well as matches from the 2006 to 2022 Winter Olympics.
For all matches the following is recorded: the tournament, the tournament round, the date and time the match started, the teams, the final score and the points representing the final result. The most complete data are in data_with_og.csv.
I created this dataset because I could not find anything similar and easily available on the internet. However, I would like to thank the people from samizdat.cz for their advice and tips that made it easier to create.
The main source of data was supposed to be the official IIHF website, but I soon found its limits. Easily accessible and retrievable data is only for World Championships since 2016. I eventually got the older data and also the data for the Olympic Games from the only sufficiently comprehensive data source from which it is not too difficult to download the data - Wikipedia.
For the overlapping years with the IIHF website (2014 and 2015), I checked that the data matched. For older years it is not in my power to double-check all the data from Wikipedia, so I must point out that the dataset may contain errors. I hope, however, that I have not introduced any errors in my download procedure. You can check the whole dataset creation process in my notebook (I'll be glad for suggestions to improve it, because currently it's still a bit of a mess, especially in the Wikipedia section).
The data this week comes from Adam Vagnar, who also blogged about this dataset. There's a LOT of data here: match-level results, player details, and match-level statistics for some matches. Throughout this dataset all matches are played 2 vs 2, so there are columns for 2 winners (1 team) and 2 losers (1 team). The data is clean and relatively ready for analysis, although there are some duplicated columns and the data is wide because of the 2 players per team.
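Since the wide layout puts four players on each row, per-player analysis usually starts by reshaping to one row per player appearance. The sketch below shows one way to do that in plain Python; the column names (`w_player1`, `l_player1`, and so on), the `match_id` field, and the sample matches are assumptions for illustration and should be checked against the data dictionary.

```python
# Column names below mirror a winner/loser layout with two players per team;
# they are assumptions for illustration, not taken from the data dictionary.
PLAYER_COLUMNS = {
    "w_player1": "win",
    "w_player2": "win",
    "l_player1": "loss",
    "l_player2": "loss",
}

def to_long(matches):
    """Turn one row per match into one row per player appearance."""
    long_rows = []
    for match in matches:
        for column, outcome in PLAYER_COLUMNS.items():
            long_rows.append(
                {"match_id": match["match_id"], "player": match[column], "outcome": outcome}
            )
    return long_rows

# Made-up matches with placeholder player names.
matches = [
    {"match_id": 1, "w_player1": "A", "w_player2": "B", "l_player1": "C", "l_player2": "D"},
    {"match_id": 2, "w_player1": "C", "w_player2": "D", "l_player1": "A", "l_player2": "B"},
]
long_rows = to_long(matches)
wins = sum(1 for r in long_rows if r["player"] == "A" and r["outcome"] == "win")
print(len(long_rows), wins)
```

Each match yields four long rows, so win and loss tallies per player become a simple filter and count.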
Check out the data dictionary, or Wikipedia for some longer-form details around what the various match statistics mean.
Most of the data is from the international FIVB tournaments but about 1/3 is from the US-centric AVP.
The FIVB Beach Volleyball World Tour (known between 2003 and 2012 as the FIVB Beach Volleyball Swatch World Tour for sponsorship reasons) is the worldwide professional beach volleyball tour for both men and women organized by the Fédération Internationale de Volleyball (FIVB). The World Tour was introduced for men in 1989 while the women first competed in 1992.
Winning the World Tour is considered to be one of the highest honours in international beach volleyball, being surpassed only by the World Championships, and the Beach Volleyball tournament at the Summer Olympic Games.
FiveThirtyEight examined the disadvantage of serving in beach volleyball, although they used Olympic-level data. Again, Adam Vagnar also covered this data on his blog.
TidyTuesday
A weekly data project aimed at the R ecosystem. As this project was born out of the R4DS Online Learning Community and the R for Data Science textbook, an emphasis was placed on understanding how to summarize and arrange data to make meaningful charts with ggplot2, tidyr, dplyr, and other tools in the tidyverse ecosystem. However, any code-based methodology is welcome - just please remember to share the code used to generate the results.
Join the R4DS Online Learning Community in the weekly #TidyTuesday event! Every week we post a raw dataset, a chart or article related to that dataset, and ask you to explore the data. While the dataset will be “tamed”, it will not always be tidy!
We will have many sources of data and want to emphasize that no causation is implied. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our guidelines are to use the data provided to practice your data tidying and plotting techniques. Participants are invited to consider for themselves what nuancing factors might underlie these relationships.
The intent of Tidy Tuesday is to provide a safe and supportive forum for individuals to practice their wrangling and data visualization skills independent of drawing conclusions. While we understand that the two are related, the focus of this practice is purely on building skills with real-world data.
Context
The 2022 Commonwealth Games, officially known as the XXII Commonwealth Games and commonly known as Birmingham 2022, is an international multi-sport event for members of the Commonwealth of Nations that is currently taking place in Birmingham, England, from 28 July to 8 August 2022.
Birmingham was announced as host on 21 December 2017, marking England's third time hosting the Commonwealth Games after London 1934 and Manchester 2002, and the 7th Games in the United Kingdom after London and Manchester, Cardiff 1958, Edinburgh 1970 and 1986, and Glasgow 2014.
The Commonwealth Games bring nations together in a colourful celebration of sport and human performance. But the Games have evolved dramatically since their beginnings in 1930.
Held every four years, with a hiatus during World War II, the Games have grown from featuring 11 countries and 400 athletes, to a global spectacle of 4,600 sports men and women from across 72 nations and territories.
Underpinned by the core values of humanity, equality and destiny, the Games aim to unite the Commonwealth family through a glorious festival of sport. Often referred to as the ‘Friendly Games’, the event is renowned for inspiring athletes to compete in the spirit of friendship and fair play.
Some of the most memorable sporting moments in history took place at the Commonwealth Games:
At the 1954 Vancouver Games, Roger Bannister and John Landy became the first people to break the four-minute mile in a race that became known as the ‘Miracle Mile’.
Chantal Petitclerc became the first gold medal winner in a para-sport in 2002, an occasion that marked the first time an event for an athlete with a disability had been part of the official programme.
And women’s boxing became a mainstay of the Commonwealth Games in 2014 with Team England’s Nicola Adams taking the first gold medal in the flyweight division.
The encouraging ethos of the Games has stirred athletes to sprint faster, leap higher and push themselves to the very limits of what the human body is capable of.
The 2022 Games will be the first time West Midlands has played host to the event, following London 1934, and Manchester 2002. As preparations for the Birmingham 2022 Commonwealth Games take shape, the West Midlands become part of a lasting legacy. One that displays world-class teamwork, athleticism and friendship.
Sources
The data is sourced from the official games website: https://olympics.com/
Inspiration
I have always wanted to work with comprehensive data regarding any big event such as sporting events like the Olympics, Commonwealth Games etc.
Insights
Use this data to get insights such as medals won per sport or event, and medal counts. Along with the "Indian medal winners at Commonwealth Games 2022" dataset, you will also get the "Commonwealth Games 2022 medals tally dataset" to explore different insights.
https://creativecommons.org/publicdomain/zero/1.0/
The 2022 Commonwealth Games, officially known as the XXII Commonwealth Games and commonly known as Birmingham 2022, is an international multi-sport event for members of the Commonwealth of Nations that is currently taking place in Birmingham, England, from 28 July to 8 August 2022.
Here I present to you some data regarding these games including Medal Standings, Athlete Counts, Event Schedule and more to be updated soon...
The data is sourced from Wikipedia: https://en.wikipedia.org/wiki/2022_Commonwealth_Games and the official games website: https://www.birmingham2022.com/
I have always wanted to work with comprehensive data regarding any big event such as sporting events like the Olympics, Commonwealth Games etc. So I will try to create different kinds of datasets related to CWGXXII and save them here for public use. I will also try to create an EDA notebook soon.
For Medal Standings: https://en.wikipedia.org/wiki/2022_Commonwealth_Games_medal_table.