Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
I've been recently exploring Microsoft Azure and have been playing this game for the past 4 or so years. I am also a software developer by profession. I did a simple pipeline that gets data from the official Clash Royale API using (Python) Jupyter Notebooks and Azure VMs. I tried searching for public Clash Royale datasets, but the ones I saw don't quite have that much data from my perspective, so I decided to create one for the whole community.
I started pulling in the data at the beginning of the month of December until season 18 ended. This covers the season reset last December 07, and the latest balance changes last December 09. This dataset also contains ladder data for the new Legendary card Mother Witch.
The amount of data that I have, with the latest dataset, has ballooned to around 37.9 M distinct/ unique ladder matches that were (pseudo) randomly being pulled from a pool of 300k+ clans. If you think that this is A LOT, this could only be a percent of a percent (even lower) of the real amount of ladder battle data. It still may not reflect the whole population, also, the majority of my data are matches between players of 4000 trophies or more.
I don't see any reason for me not to share this to the public as the data is now considerably large that working on it and producing insights will take more than just a few hours of "hobby" time to do.
Feel free to use it on your own research and analysis, but don't forget to credit me.
Also, please don't monetize this dataset.
Stay safe. Stay healthy.
Happy holidays!
Card Ids Master List is in the discussion, I also created a simple notebook to load the data and made a sample n=20 rows, so you can get an idea on what the fields are.
With this data, the following can possibly be answered 1. Which cards are the strongest? The weakest? 2. Which win-con is the most winning? 3. Which cards are always with a specific win-con? 4. When 2 opposing players are using maxed decks, which win-con is the most winning? 5. Most widely used cards? Win-Cons? 6. What are the different metas in different arenas and trophy ranges? 7. Is ladder matchmaking algorithm rigged? (MOST CONTROVERSIAL)
(and many more)
I have 2 VMs running a total of 14 processes, and for each of these processes, I've divided a pool of 300k+ clans into the same number of groups. This went on 24/7, non-stop for the whole season. Each process will then randomize the list of clans it is assigned to and will iterate through each clan, and get that clan's members' ladder data. It is important to note that I also have a pool of 470 hand-picked clans that I always get data from, as these clans were the starting point that eventually enabled me to get the 300k+ clans. There are clans who have minimal ladder data, there are some clans who have A LOT.
To prevent out of memory exceptions, as my VMs are not really that powerful (I'm using Azure free credits), I've put on a time and limit of battles extracted per member.
My account: https://royaleapi.com/player/89L2CLRP My clan: https://royaleapi.com/clan/J898GQ
Thank you to SUPERCELL for creating this FREEMIUM game that has tested countless people's patience, as well as the durability of countless mobile devices after being smashed against a wall, and thrown on the floor.
Thank you to Microsoft for Azure and free monthly credits
Thank you to Python and Jupyter notebooks.
Thank you Kaggle for hosting this dataset.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
I've been recently exploring Microsoft Azure and have been playing this game for the past 4 or so years. I am also a software developer by profession. I did a simple pipeline that gets data from the official Clash Royale API using (Python) Jupyter Notebooks and Azure VMs. I tried searching for public Clash Royale datasets, but the ones I saw don't quite have that much data from my perspective, so I decided to create one for the whole community.
I started pulling in the data at the beginning of the month of December until season 18 ended. This covers the season reset last December 07, and the latest balance changes last December 09. This dataset also contains ladder data for the new Legendary card Mother Witch.
The amount of data that I have, with the latest dataset, has ballooned to around 37.9 M distinct/ unique ladder matches that were (pseudo) randomly being pulled from a pool of 300k+ clans. If you think that this is A LOT, this could only be a percent of a percent (even lower) of the real amount of ladder battle data. It still may not reflect the whole population, also, the majority of my data are matches between players of 4000 trophies or more.
I don't see any reason for me not to share this to the public as the data is now considerably large that working on it and producing insights will take more than just a few hours of "hobby" time to do.
Feel free to use it on your own research and analysis, but don't forget to credit me.
Also, please don't monetize this dataset.
Stay safe. Stay healthy.
Happy holidays!
Card Ids Master List is in the discussion, I also created a simple notebook to load the data and made a sample n=20 rows, so you can get an idea on what the fields are.
With this data, the following can possibly be answered 1. Which cards are the strongest? The weakest? 2. Which win-con is the most winning? 3. Which cards are always with a specific win-con? 4. When 2 opposing players are using maxed decks, which win-con is the most winning? 5. Most widely used cards? Win-Cons? 6. What are the different metas in different arenas and trophy ranges? 7. Is ladder matchmaking algorithm rigged? (MOST CONTROVERSIAL)
(and many more)
I have 2 VMs running a total of 14 processes, and for each of these processes, I've divided a pool of 300k+ clans into the same number of groups. This went on 24/7, non-stop for the whole season. Each process will then randomize the list of clans it is assigned to and will iterate through each clan, and get that clan's members' ladder data. It is important to note that I also have a pool of 470 hand-picked clans that I always get data from, as these clans were the starting point that eventually enabled me to get the 300k+ clans. There are clans who have minimal ladder data, there are some clans who have A LOT.
To prevent out of memory exceptions, as my VMs are not really that powerful (I'm using Azure free credits), I've put on a time and limit of battles extracted per member.
My account: https://royaleapi.com/player/89L2CLRP My clan: https://royaleapi.com/clan/J898GQ
Thank you to SUPERCELL for creating this FREEMIUM game that has tested countless people's patience, as well as the durability of countless mobile devices after being smashed against a wall, and thrown on the floor.
Thank you to Microsoft for Azure and free monthly credits
Thank you to Python and Jupyter notebooks.
Thank you Kaggle for hosting this dataset.