Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In association football, predicting the likelihood and outcome of a shot at goal is useful but challenging. Expected goal (xG) models can be used in a variety of ways, including evaluating performance and designing offensive strategies. This study proposed a novel framework that uses the events preceding a shot to improve the accuracy of the expected goals (xG) metric. A combination of previously explored and unexplored temporal features is utilized in the proposed framework. The new features include “advancement factor” and “player position column”. A random forest model was used, which performed better than published single-event-based models in the literature. Results further demonstrated a significant improvement in model performance with the inclusion of preceding event information. The proposed framework and model enable the discovery of event sequences that improve xG, which include opportunities built up from the sides of the 18-yard box, shots attempted from in front of the goal within the opposition’s 18-yard box, and shots from successful passes to the far post.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Test data results for comparison between expected goals statistic and traditional metrics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
XG is a dataset for object detection tasks - it contains Objects annotations for 340 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AUC ROC of the optimal model in this research, evaluated on test data. The model used players’ FIFA ratings as a proxy for player ability.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by ShyamSUBEDI
Released under MIT
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides detailed information on football (soccer) shots, capturing various contextual and technical aspects of each attempt. It is designed for sports analytics, machine learning models, and tactical analysis. It was created with the aim of building a basic xG model.
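As a rough illustration of what a basic xG model of this kind might look like, the sketch below derives two commonly used shot features, the distance to the goal centre and the angle subtended by the goal mouth, and passes them through a logistic function. The coordinate convention (metres, goal centre at the origin), the function names, and especially the coefficients are assumptions made for illustration. They are not taken from this dataset and are not fitted values.

```python
import math

GOAL_WIDTH_M = 7.32  # distance between the posts under the Laws of the Game


def shot_features(x, y):
    """Return (distance, angle) for a shot taken at (x, y).

    Assumed convention: metres, goal centre at (0, 0), x is the
    distance from the goal line and y the lateral offset.
    """
    distance = math.hypot(x, y)
    half = GOAL_WIDTH_M / 2
    # Angle between the lines from the shot location to each post.
    angle = math.atan2(y + half, x) - math.atan2(y - half, x)
    return distance, angle


def basic_xg(x, y, intercept=-1.0, w_distance=-0.1, w_angle=1.5):
    """Logistic xG estimate; the default coefficients are made up."""
    distance, angle = shot_features(x, y)
    z = intercept + w_distance * distance + w_angle * angle
    return 1.0 / (1.0 + math.exp(-z))


close_range = basic_xg(11.0, 0.0)  # central shot, 11 m from goal
long_range = basic_xg(30.0, 0.0)   # central shot, 30 m from goal
```

In practice the coefficients would be estimated from labelled shots (goal or no goal), for example with logistic regression, rather than set by hand.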
expected-goals-thesis
A repository for analysis on Expected Goals using StatsBomb and Wyscout data.
StatsBomb data
This repository assumes that the StatsBomb open-data has already been cloned to a local directory.
Versioning
The original thesis was run from a particular version of the data and mplsoccer (my football plotting library). The original code is here:… See the full description on the dataset page: https://huggingface.co/datasets/fadhilra101/xg-thesis.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Premier League Players Performance Dataset
This dataset provides a comprehensive overview of player performance in the Premier League capturing a wide array of metrics related to gameplay, scoring, passing, and defensive actions. With records detailing individual player statistics across different teams, this dataset is a valuable resource for analysts, data scientists, and fans who are interested in diving into player performance data from one of the world’s top soccer leagues.
Each entry represents a single player's profile, featuring data on expected goals (xG), expected assists (xAG), touches, dribbles, tackles, and more. This dataset is ideal for analyzing various aspects of player contribution, both offensively and defensively, and understanding their impact on team performance.
Dataset Columns
Player: Name of the player
Team: Team the player belongs to
'#': Player's jersey number
Nation: Nationality of the player
Position: Primary playing position on the field
Age: Age of the player
Minutes: Total minutes played
Goals: Number of goals scored
Assists: Number of assists
Penalty Shoot on Goal: Penalty shots taken on goal
Penalty Shoot: Total penalty shots attempted
Total Shoot: Total shots attempted
Shoot on Target: Shots successfully on target
Yellow Cards: Number of yellow cards received
Red Cards: Number of red cards received
Touches: Total ball touches
Dribbles: Total dribbles attempted
Tackles: Total tackles made
Blocks: Total blocks
Expected Goals (xG): Expected goals, calculated based on shooting positions and likelihood of scoring
Non-Penalty xG (npxG): Expected goals excluding penalties
Expected Assists (xAG): Expected assists, based on actions leading to an expected goal (xG)
Shot-Creating Actions: Actions leading to a shot attempt
Goal-Creating Actions: Actions leading to a goal
Passes Completed: Successful passes completed
Passes Attempted: Total passes attempted
Pass Completion %: Pass completion rate, expressed as a percentage (some entries have missing values here)
Progressive Passes: Passes advancing the ball significantly toward the opponent’s goal
Carries: Total ball carries
Progressive Carries: Carries advancing the ball significantly toward the opponent’s goal
Dribble Attempts: Total dribbles attempted
Successful Dribbles: Total successful dribbles
Date: Date of record collection or game date
Potential Use Cases
Data Visualization: Explore relationships between various performance metrics to identify patterns.
Player Comparisons: Compare individual players based on goals, assists, xG, xAG, and other metrics.
Team Analysis: Evaluate contributions of players within the same team to gain insights into team dynamics.
Predictive Modeling: Use the dataset to build models for predicting game outcomes, goals, or assists based on player performance metrics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
including location-based services (LBSs) and efficient network management. However
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
League positions resulting in specific consequences for teams in each league.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are the measurements (S-parameters and far-field patterns) of the prototype discussed in Chapter 4 of the PhD thesis Advanced Electromagnetic Modelling of the Next Generation (XG) Wireless Communication Systems.
This dataset contains shot statistics and xG values of shots taken during the match between Arsenal and Nottingham Forest on 12.08.2023.
https://creativecommons.org/publicdomain/zero/1.0/
Most publicly available football (soccer) statistics are limited to aggregated data such as Goals, Shots, Fouls, and Cards. When assessing performance or building predictive models, this simple aggregation, without any context, can be misleading. For example, a team that produced 10 shots on target from long range has a lower chance of scoring than a club that produced the same number of shots from inside the box. However, metrics derived from this simple count of shots will assess the two teams identically.
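To make the point concrete, the sketch below compares two teams with identical shot counts but different shot locations. The per-shot scoring probabilities (0.03 for a long-range shot, 0.12 for a shot from inside the box) are illustrative assumptions, not values from this data set.

```python
# Hypothetical per-shot scoring probabilities (assumed for illustration).
LONG_RANGE_XG = 0.03
INSIDE_BOX_XG = 0.12

team_a_shots = [LONG_RANGE_XG] * 10  # ten shots on target from long range
team_b_shots = [INSIDE_BOX_XG] * 10  # ten shots on target from inside the box


def summarise(shots):
    """Return the raw shot count and the summed xG for a list of shots."""
    return {"shots": len(shots), "xg": round(sum(shots), 2)}


team_a = summarise(team_a_shots)
team_b = summarise(team_b_shots)
```

Both teams register 10 shots, so a count-based metric cannot separate them, while the summed xG differs by a factor of four under these assumed probabilities.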
A football game generates hundreds of events and it is very important and interesting to take into account the context in which those events were generated. This incredibly rich data set should keep football analytics enthusiasts awake for long hours as the size of the data set and number of questions that can be asked is huge.
There are 4 main files containing the data:
1) Competition data: Contains information regarding competition id, competition name, season id, season name, country and gender.
2) Match data: Match information for each match including competition and season information, stadium and referee information, home and away team information as well as the data version the match was collected under.
3) Lineup data: Records the lineup information for the players, managers and referees involved with each match. The following variables are collected in the lineups of each match: team id, team name and lineup. The lineup array is a nested data frame inside the lineup object. It contains the following information for each team: player id, player name, player nickname, jersey number and country.
4) Event data: Event data comprises general attributes and event-specific attributes. General attributes are recorded for most event types, depending only on applicability. Event-specific attributes help describe the event type in more detail as well as describe the outcome of the event type.
The open data specification document in the doc folder describes the structure of the data along with all attributes in great detail. Take a look at this file for deeper understanding of the data.
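A minimal sketch of pulling shots out of one match's event data is shown below. It assumes each event is a JSON object with a `type.name` field, and that shot events carry a nested `shot` object with `statsbomb_xg` and `outcome.name`. These field names, the helper names, and the example path are assumptions to be checked against the specification document.

```python
import json
from pathlib import Path


def load_events(path):
    """Load one match's event list from a StatsBomb-style JSON file."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def extract_shots(events):
    """Keep only shot events and flatten the fields an xG analysis needs."""
    shots = []
    for event in events:
        if event.get("type", {}).get("name") != "Shot":
            continue
        shot = event.get("shot", {})
        shots.append({
            "team": event.get("team", {}).get("name"),
            "player": event.get("player", {}).get("name"),
            "location": event.get("location"),
            "xg": shot.get("statsbomb_xg"),
            "is_goal": shot.get("outcome", {}).get("name") == "Goal",
        })
    return shots


# Example (hypothetical path and match id):
# shots = extract_shots(load_events("open-data/data/events/12345.json"))
```

Using `.get` with defaults keeps the helper tolerant of events that lack optional attributes, which matters because event-specific attributes are only recorded where applicable.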
This data is from the StatsBomb Open Data repository. StatsBomb are committed to sharing new data and research publicly to enhance understanding of the game of Football. They want to actively encourage new research and analysis at all levels. Therefore they have made certain leagues of StatsBomb Data freely available for public use for research projects and genuine interest in football analytics.
There are many questions we can ask with such detailed event data. Here are just a few examples:
What is the value of a shot? Or what is the probability of a shot being a goal given its location, shooter, league, assist method, game state, number of players on the pitch, and time? Models that answer this are known as expected goals (xG) models.
When are teams more likely to score?
Which teams are the best or sloppiest at holding the lead?
Which teams or players make the best use of set pieces?
How do players compare when they shoot with their weak foot versus their strong foot? Or which players are ambidextrous?
Identify different styles of play (shooting from long range vs shooting from the box, crossing the ball vs passing the ball, use of headers).
Which teams have a bias for attacking on a particular flank?
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset contains all the data used in the paper, titled 'On-chip topological beamformer for multi-link terahertz 6G to XG wireless'.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A comparative analysis of DC, CTGAN-DC, XGBoost, CTGAN-XG, and TVAE-XG models in Kawasaki Disease experiments.
DATA REQUESTED IN JUNE 1985 BY YGS. DATA SUPPLIED BY AUTHORS IN SEP 1985. FIRST MEASUREMENT OF INTERFERENCE STRUCTURE FUNCTION XG3(X) SCATTERING POSITIVE AND NEGATIVE MUONS OFF CARBON TARGET. BCDMS (NA4) COLLABORATION. Q**2 IN THE RANGE 40 TO 180 GEV**2. WARNING(VVE-94): SOURCE OF THE NUMERICAL DATA UNKNOWN, DATA ON XG3(X) FROM THE SAME EXPERIMENT SUPPLIED BY AUTHORS SEE IN PART=1 OF THE RECORD.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The present study intended to determine the nationality of the fastest 100-mile ultra-marathoners and the country/events where the fastest 100-mile races are held. A machine learning model based on the XGBoost algorithm was built to predict the running speed from the athlete’s age (Age group), gender (Gender), country of origin (Athlete country) and where the race occurred (Event country). Model explainability tools were then used to investigate how each independent variable influenced the predicted running speed. A total of 172,110 race records from 65,392 unique runners from 68 different countries participating in races held in 44 different countries were used for analyses. The model rates Event country (0.53) as the most important predictor (based on data entropy reduction), followed by Athlete country (0.21), Age group (0.14), and Gender (0.13). In terms of participation, the United States leads by far, followed by Great Britain, Canada, South Africa, and Japan, in both athlete and event counts. The fastest 100-mile races are held in Romania, Israel, Switzerland, Finland, Russia, the Netherlands, France, Denmark, Czechia, and Taiwan. The fastest athletes come mostly from Eastern European countries (Lithuania, Latvia, Ukraine, Finland, Russia, Hungary, Slovakia) and also Israel. In contrast, the slowest athletes come from Asian countries like China, Thailand, Vietnam, Indonesia, Malaysia, and Brunei. The difference between male and female predictions is relatively small at about 0.25 km/h. The fastest age group is 25–29 years, but the average speeds of groups 20–24 and 30–34 years are close. Participation, however, peaks for the age group 40–44 years. The model predicts the event location (country of event) as the most important predictor for a fast 100-mile race time. The fastest race courses were in Romania, Israel, Switzerland, Finland, Russia, the Netherlands, France, Denmark, Czechia, and Taiwan.
Athletes and coaches can use these findings for their race preparation to find the most appropriate racecourse for a fast 100-mile race time.
This dataset contains shot statistics of Arsenal players for the 2023/24 season.
Player: The name of the player.
Nation: The name or abbreviation of the player's country.
Pos: The position the player plays (e.g., Forward, Midfielder, Defender).
Age: The age of the player.
90s: Total minutes played divided by 90, i.e. playing time expressed as a number of 90-minute units.
Gls: The total number of goals scored by the player.
Sh: The total number of shots taken by the player.
SoT: The number of shots on target by the player (shots that are on goal).
SoT%: The percentage of shots on target by the player. Calculation: (SoT / Sh) * 100.
Sh/90: The average number of shots per 90 minutes played by the player. Calculation: Sh / 90s.
SoT/90: The average number of shots on target per 90 minutes played by the player. Calculation: SoT / 90s.
G/Sh: The average number of goals per shot by the player. Calculation: Gls / Sh.
G/SoT: The average number of goals per shot on target by the player. Calculation: Gls / SoT.
Dist: The average distance of the player’s shots (typically measured from the goal).
FK: The number of goals scored from free kicks by the player.
PK: The number of goals scored from penalties by the player.
PKatt: The number of penalties taken by the player.
xG: The expected goals of the player. This metric measures the probability of a shot resulting in a goal, accounting for the quality of the chances.
npxG: The expected goals of the player excluding penalties. This calculates xG without considering penalty shots.
npxG/Sh: The ratio of expected goals excluding penalties per shot by the player. Calculation: npxG / Sh.
G-xG: The difference between the goals scored and the expected goals. Calculation: Gls - xG. This shows how much more or less the player has scored compared to the expected goals.
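The calculated columns above can be reproduced from the raw counts. The sketch below does so for a single player record. The function name and the sample numbers are made up for illustration, and returning `None` when a denominator is zero is an assumption about how to handle players without shots or minutes.

```python
def derived_shot_metrics(gls, sh, sot, nineties, xg, npxg):
    """Compute the ratio columns from the raw counts described above."""

    def ratio(numerator, denominator):
        # Assumption: undefined ratios are reported as None, not 0.
        return numerator / denominator if denominator else None

    sot_share = ratio(sot, sh)
    return {
        "SoT%": None if sot_share is None else sot_share * 100,
        "Sh/90": ratio(sh, nineties),
        "SoT/90": ratio(sot, nineties),
        "G/Sh": ratio(gls, sh),
        "G/SoT": ratio(gls, sot),
        "npxG/Sh": ratio(npxg, sh),
        "G-xG": gls - xg,
    }


# Made-up sample record: 10 goals from 50 shots (20 on target) in 25.0 90s.
example = derived_shot_metrics(
    gls=10, sh=50, sot=20, nineties=25.0, xg=8.5, npxg=7.0
)
```

A positive G-xG, as in this made-up record, means the player scored more goals than the chances were expected to yield.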
Champions League 2023/2024 Dataset
Overview
This dataset provides detailed statistics for the UEFA Champions League 2023/2024 season, focusing on team performance across various metrics. The data is sourced from FBref, a comprehensive platform for football statistics. This single-table dataset includes metrics such as matches played, wins, losses, goals scored, expected goals (xG), and more for each team participating in the Champions League.
The dataset is structured as a single CSV file with the following headers:
Data Source
The data has been scraped from FBref, a well-known source for football statistics. FBref provides detailed and historical data for various football competitions worldwide, including the UEFA Champions League.
Acknowledgements
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This data presents details of projects undertaken pursuant to the Streamlining Order XG/XO-100-2012. This Order provides the Canada Energy Regulator’s approval for the construction of certain classes of oil and gas projects regulated under the Canada Energy Regulator Act. The data ranges from 2003 to current; it is updated annually.