Data Sources: The following data sources were used for this model:
- Player attributes: FIFA 16-21 data
- Injury history: Transfermarkt injury history data, scraped using the worldfootballR R package
Players/seasons in scope:
- The original scope was all players who played in the English Premier League at any point between the 2016/17 and 2020/21 seasons
- Because of difficulties in joining 3 datasets from entirely different sources, the final dataset came to 685 rows covering 317 players
Training Data:
- 3 separate data sources were combined to create a dataset that included player attributes (e.g. pace, height, weight), player injury history and player game time
- Data was grouped at the player-year level
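The player-year grouping described above can be sketched roughly as follows. This is a minimal illustration in Python, not the author's pipeline (the source mentions the worldfootballR R package); the record layouts, field names, and sample values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative only.
attributes = [
    {"player": "Player A", "season": "2016/17", "pace": 80, "height_cm": 180, "weight_kg": 75},
    {"player": "Player B", "season": "2016/17", "pace": 70, "height_cm": 188, "weight_kg": 82},
]
injuries = [
    {"player": "Player A", "season": "2016/17", "days_out": 14},
    {"player": "Player A", "season": "2016/17", "days_out": 7},
]
game_time = [
    {"player": "Player A", "season": "2016/17", "minutes": 1200},
    {"player": "Player A", "season": "2016/17", "minutes": 600},
    {"player": "Player C", "season": "2016/17", "minutes": 900},
]


def build_player_year_rows(attributes, injuries, game_time):
    """Join the three sources on (player, season), one row per player-year."""
    # Aggregate injuries per player-year: count and total days out.
    injury_totals = defaultdict(lambda: {"injury_count": 0, "days_out": 0})
    for rec in injuries:
        key = (rec["player"], rec["season"])
        injury_totals[key]["injury_count"] += 1
        injury_totals[key]["days_out"] += rec["days_out"]

    # Aggregate minutes played per player-year.
    minutes_totals = defaultdict(int)
    for rec in game_time:
        minutes_totals[(rec["player"], rec["season"])] += rec["minutes"]

    rows = []
    for rec in attributes:
        key = (rec["player"], rec["season"])
        if key not in minutes_totals:
            continue  # no matching game-time record: the player-year is dropped
        # Player-years without recorded injuries get zero counts.
        inj = injury_totals.get(key, {"injury_count": 0, "days_out": 0})
        rows.append({**rec, "minutes": minutes_totals[key], **inj})
    return rows


rows = build_player_year_rows(attributes, injuries, game_time)
```

Dropping player-years that lack a match in another source is one plausible way a join across differently keyed sources ends up with fewer rows than the original scope; whether the author handled unmatched records this way is not stated.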
By Throwback Thursday [source]
This dataset provides comprehensive information on injuries that occurred in the National Football League (NFL) during the period from 2012 to 2017. The dataset includes details such as the type of injury sustained by players, the specific situation or event that led to the injury, and the type of season (regular season or playoffs) during which each injury occurred.
The Injury Type column categorizes the various types of injuries suffered by players, providing insights into specific anatomical areas or specific conditions. For example, it may include injuries like concussions, ankle sprains, knee ligament tears, shoulder dislocations, and many others.
The Scenario column offers further granularity by describing the specific situation or event that caused each injury. It can provide context about whether an injury happened during a tackle, collision with another player or object on field (such as goalposts), blocking maneuvers gone wrong, falls to the ground resulting from being off-balance while making plays, and other possible scenarios leading to player harm.
The Season Type column classifies when exactly each injury occurred within a particular year. It differentiates between regular season games and playoff matches – identifying whether an incident took place during high-stakes postseason competition or routine games throughout the regular season.
The Injuries column contains numeric data detailing how many times a particular combination of year, injury type, scenario, and season type occurred within this dataset's timeframe, measuring how often each unique combination occurred.
Overall, this extensive dataset provides valuable insight into NFL injuries over a six-year span. By understanding which types of injuries are most prevalent under certain scenarios and during different seasons of play, such as regular seasons versus playoffs, stakeholders within professional football can identify potential areas for improvement in safety measures and develop strategies aimed at reducing on-field player harm.
The dataset contains six columns:
Year: This column represents the year in which the injury occurred. It allows you to filter and analyze data based on specific years.
Injury Type: This column indicates the specific type of injury sustained by players. It includes various categories such as concussions, fractures, sprains, strains, etc.
Scenario: The scenario column describes the situation or event that led to each injury. It provides context for understanding how injuries occur during football games.
Season Type: This column categorizes injuries based on whether they occurred during regular season games or playoff games.
Injuries: The number of injuries recorded for each specific combination of year, injury type, scenario, and season type is mentioned in this column's numeric values.
Using this dataset effectively involves several steps:
Data Exploration: Start by examining all available columns carefully and making note of their meanings and data types (categorical or numeric).
Filtering Data by Year or Season Type: If you are interested in analyzing injuries during particular years or specific season types (regular season vs playoffs), apply filters on one or both of these columns.
3a. Analyzing Injury Types: To gain insights into the types of injuries reported within the periods selected by your filters (e.g., a given year), group the data by Injury Type and calculate aggregate statistics such as maximum occurrences or average frequency across years/seasons.
3b. Scenario-based Analysis: Group the data by Scenario and calculate aggregate values to determine which situations or events lead to more injuries.
Exploring Injury Trends: Explore the overall trend of injuries throughout the 2012-2017 period to identify any significant patterns, spikes, or declines in injury occurrence.
Visualizing Data: Utilize appropriate visualization techniques such as bar graphs, line charts, or pie charts to present your findings effectively. These visualizations will help you communicate your analysis concisely and provide clear insights into both common injuries and specific scenarios.
Drawing Conclusions: Based on your analysis of the
- Understanding trends in NFL injuries: This dataset can be used to analyze the number and types of in...
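The filtering and grouping steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the column names follow the description, but the rows, the Season Type values, and the counts are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical rows using the column names described above; values are made up.
rows = [
    {"Year": 2012, "Injury Type": "Concussion", "Scenario": "Tackle",
     "Season Type": "Regular Season", "Injuries": 10},
    {"Year": 2012, "Injury Type": "Concussion", "Scenario": "Collision",
     "Season Type": "Playoffs", "Injuries": 2},
    {"Year": 2013, "Injury Type": "Ankle Sprain", "Scenario": "Tackle",
     "Season Type": "Regular Season", "Injuries": 7},
    {"Year": 2013, "Injury Type": "Concussion", "Scenario": "Tackle",
     "Season Type": "Regular Season", "Injuries": 5},
]


def filter_rows(rows, year=None, season_type=None):
    """Keep rows matching the given year and/or season type (None = no filter)."""
    return [
        r for r in rows
        if (year is None or r["Year"] == year)
        and (season_type is None or r["Season Type"] == season_type)
    ]


def total_by(rows, column):
    """Sum the Injuries count for each distinct value of `column`."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[column]] += r["Injuries"]
    return dict(totals)


by_type = total_by(rows, "Injury Type")                       # step 3a
by_scenario = total_by(rows, "Scenario")                      # step 3b
by_year = total_by(rows, "Year")                              # overall trend
scenarios_2012 = total_by(filter_rows(rows, year=2012), "Scenario")
```

The same `total_by` helper covers the injury-type, scenario, and yearly-trend steps because each is a sum of the Injuries column over one grouping column.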
Current approaches to injury prevention focus on the transfer of evidence into daily practice. One promising approach is to influence the attitudes and beliefs of players. The objective of this study was to record players' perceptions of injury prevention. A survey was performed among players of one German high-level football (soccer) club. 139 professional and youth players between 13 and 35 years of age completed a standardized questionnaire (response rate = 98%). It included categories covering (1) history of lower extremity injuries, (2) perceptions regarding risk factors and (3) regularly used prevention strategies. The majority of players (84.2%) had a previous injury. 47.5% of respondents believe that contact with other players is a risk factor, followed by fatigue (38.1%) and environmental factors (25.9%). The relevance of previous injuries as a risk factor is perceived differently by injured (25%) and uninjured players (0.0%). Nearly all players (91.5%) perform stretching to p...
Background: Football (soccer) is endorsed as a health-promoting physical activity worldwide. When football programs are introduced as part of general health promotion programs, equal access and limitation of pre-participation disparities with regard to injury risk are important. The aim of this study was to explore if disparity with regard to parents’ educational level, player body mass index (BMI), and self-reported health are determinants of football injury in community-based football programs, separately or in interaction with age or gender. Methodology/Principal Findings: Four community football clubs with 1230 youth players agreed to participate in the cross-sectional study during the 2006 season. The study constructs (parents’ educational level, player BMI, and self-reported health) were operationalized into questionnaire items. The 1-year prevalence of football injury was defined as the primary outcome measure. Data were collected via a postal survey and analyzed using a series of hierarchical statistical computations investigating associations with the primary outcome measure and interactions between the study variables. The survey was returned by 827 (67.2%) youth players. The 1-year injury prevalence increased with age. For youths with parents with higher formal education, boys reported more injuries and girls reported fewer injuries than expected; for youths with lower educated parents there was a tendency towards the opposite pattern. Youths reporting injuries had higher standardized BMI compared with youths not reporting injuries. Children not reporting full health were slightly overrepresented among those reporting injuries and underrepresented for those reporting no injury. Conclusion: Pre-participation disparities in terms of parents’ educational level, through interaction with gender, BMI, and self-reported general health are associated with increased injury risk in community-based youth football.
When introduced as a general health promotion, football associations should adjust community-based youth programs to accommodate children and adolescents with increased pre-participation injury risk.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains comprehensive data from 800 Chinese university football players participating in collegiate and provincial leagues. The goal is to predict whether a player will suffer an injury in the next academic season using machine learning classification methods.
Injury_Next_Season: Binary classification where injury is defined as training/competition-related injury causing ≥7 consecutive days of absence, verified by university medical center and coaching staff.
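The threshold part of this label definition can be sketched as a small function. This is only an illustration of the ≥7 consecutive days rule; the function name and the input format (a list of consecutive-day absence spells, assumed to be already verified and training/competition-related) are hypothetical and not part of the dataset.

```python
MIN_ABSENCE_DAYS = 7  # threshold stated in the label definition


def label_injury_next_season(absence_spells):
    """Return 1 if any verified absence spell lasted at least 7 consecutive days.

    `absence_spells` is a hypothetical input: the lengths (in days) of each
    injury-related absence in the following season.
    """
    return int(any(days >= MIN_ABSENCE_DAYS for days in absence_spells))
```

For example, spells of 3 and 6 days would give label 0, while a single 7-day spell would give label 1.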
This dataset bridges sports science and machine learning, offering insights into university-level athletic injury prediction. It's particularly valuable for researchers in sports medicine, preventive healthcare, and applied machine learning.
This dataset is intended for academic research and educational purposes. Please respect data privacy and usage guidelines.
http://rdm.uva.nl/en/support/confidential-data.html
GRoin Injury Prevention (GRIP) study
Project
· GRIP: a project about groin injuries in Dutch elite soccer players.
· Groin injury: one of the most common injuries in soccer, with a long rehabilitation period and a high risk of sustaining a recurrent injury.
· It is therefore important to study the prevention and rehabilitation of groin injuries.
Research
· Objective: to gain more insight into the risk factors for and the treatment of groin injuries.
· 3 research questions:
How many soccer players will get injured during one soccer season?
What are the risk factors for sustaining a groin injury?
Which treatment is the most effective for players with a groin injury?
Data
· Participants: approx. 300 players of professional soccer clubs (first + second division in the Netherlands).
· Start of the season: screening of all participants via a questionnaire (to collect baseline characteristics of the players + HAGOS) and measurements/physical tests (hip ROM, strength and performance tests).
· During the season (2015-2016): training exposure will be recorded by a member of the medical staff. Information about groin injuries in their team (details about the injury* and the recovery period) and repeated measurements will also be registered.
· Fileset contains 6 SPSS files (*.sav) and details about the protocols used.
· Because of the sensitive nature of the data, the fileset is confidential and will be shared only under strict conditions. For more information contact researchdata-kcbsv@hva.nl.
This environment contains the data and the R code. The data set consists of time series from psychological and physiological self-reports as well as GPS sensors, across two consecutive football seasons, for multiple individual professional youth players. More detailed information about what kind of data we collected, and how, can be found in the article.
Background - Non-contact anterior cruciate ligament injuries (NC-ACLI) are becoming more common in football due to the involvement of actions such as cutting, landing, pivoting, changing directions, and jumping that expose the knee to angular stresses, rotational motions, and anterior translation force. These risk factors are addressed in the current injury prevention programs (IPP), but these programs have been shown to be less effective in further lowering NC-ACLI risk. The missing piece in these IPPs is their inability to target neurocognitive risk factors when a relationship between cognitive competencies and the mechanism of NC-ACLI is evidenced. This gap serves as the basis for the review. Purpose – To map the literature that has investigated the application of neurocognitive training (NCT) for the prevention of NC-ACLI in footballers. Study Design – Scoping Review. Methods – Broad eligibility criteria were applied through the population, concept, and context framework. Restrictions were applied to languag...
A literature search was conducted in approximately 24 physiotherapy databases, 10 sports sciences and rehabilitation databases and 5 neuroscience databases including SPORTDiscus, COCHRANE, PEDro, SCOPUS, PubMed, and MEDLINE. Keywords like ‘anterior cruci* ligament’, ‘neurocogni* training’ or ‘cogniti* intervention’, ‘football’ or ‘soccer’, and ‘injury prevent*’ were utilised. In an attempt to retrieve most literature on the topic, each of the neurocognitive components listed in the DSM-V criteria was used as a search word. Additionally, so as not to miss any literature, the types of neurocognitive approaches applied to other sports, such as Visuomotor Reaction Time, Self-Talk, Attention changing, Stress Management, Mindfulness, and Imagery, were used as search terms in place of ‘neurocognitive’. Grey literature was searched for on Google Scholar and Nottingham University (NU Search) using the same search terms.
This search was conducted to look for all the relevant evidence published or unpublished from th...
# A Neurocognitive Approach to Prevent Non-Contact Anterior Cruciate Ligament Injury in Football - A Scoping Review.
https://doi.org/10.5061/dryad.kwh70rzcc
List of Abbreviations
JBI | Joanna Briggs Institute |
---|---|
OSF | Open Science Framework |
FIFA | Fédération Internationale de Football Association |
ACL | Anterior Cruciate Ligament |
ACLI | ACL injury |
IPP | Injury Prevention Programs ... |
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Anthropometric data of football players with a head injury determined by video analysis.
In the 2023/24 season, the opening month of the Premier League saw the most player injuries, with a total of 100 across all clubs. Meanwhile, peaks were also seen in November and December 2023.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
ABSTRACT Objective: To determine the incidence and risk factors for injuries that occurred during the matches of the Brazilian Football Championship. Methods: A prospective study was carried out with the collection of data referring to injuries that occurred during the 2019 Brazilian Football Championship. The injuries were recorded by the responsible physician of each team, through an online injury mapping system. Results: Among the 645 athletes who were included in the study, 214 (33.2%) of the players had at least one injury during the tournament. In total, 257 injuries were recorded during the Brazilian Championship, with an average of 0.68 injuries per game. 59.1% of the injured athletes were over 26 years old. The most common type of injury was muscle strain (37.7%) and forwards were the most affected (33.6%). Conclusion: Muscle injuries were the most frequent in the tournament, with the thigh muscles being the most affected. Most of the affected players were over 26 years old, there were 20.5 injuries for every 1000 hours of play and the incidence of injuries was approximately 33%, with attackers being the most affected (33.6%). Level of Evidence III, Study of nonconsecutive patients; without consistently applied reference “gold” standard.
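The rates quoted in this abstract can be checked with a short calculation. The sketch below is an illustration only: the number of games (380, i.e. 20 clubs in a double round-robin), 22 players on the pitch, and 1.5 hours per match are assumptions that the abstract does not state, and the authors may have computed exposure differently.

```python
# Figures taken from the abstract.
INJURIES = 257
INJURED_ATHLETES = 214
ATHLETES = 645

# Assumptions NOT stated in the abstract.
GAMES = 380            # assumed: 20 clubs, double round-robin
PLAYERS_ON_PITCH = 22  # assumed: 11 per side
MATCH_HOURS = 1.5      # assumed: 90 minutes, ignoring stoppage time


def injuries_per_game(injuries, games):
    """Average number of injuries per match."""
    return injuries / games


def incidence_per_1000_hours(injuries, games, players=PLAYERS_ON_PITCH,
                             hours=MATCH_HOURS):
    """Injuries per 1000 player-hours of match exposure."""
    exposure_hours = games * players * hours
    return injuries / exposure_hours * 1000


per_game = injuries_per_game(INJURIES, GAMES)
per_1000h = incidence_per_1000_hours(INJURIES, GAMES)
injured_share = INJURED_ATHLETES / ATHLETES * 100

print(f"{per_game:.2f} injuries per game")        # 0.68
print(f"{per_1000h:.1f} injuries per 1000 hours")  # 20.5
print(f"{injured_share:.1f}% of athletes injured")  # 33.2
```

Under these assumptions the results agree with the reported 0.68 injuries per game, 20.5 injuries per 1000 hours, and 33.2% of athletes injured.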
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Coronavirus Disease-19 (COVID-19) pandemic forced the Norwegian male premier league football season to reschedule, compressing the fixture calendar substantially. Previous research has shown that a congested match schedule can affect injury rates in professional football. Therefore, we aimed to investigate whether the Norwegian premier league teams suffered more injuries in the more match-congested 2020 season than in the regular 2019 season. We invited all teams that participated in both seasons to export their injury data. Only teams that used the same medical staff to register injuries in both seasons were included, and to maximize data comparability between seasons, we applied a time-loss injury definition only. Seven of 13 teams agreed to participate and exported their injury data. Both seasons had 30 game weeks, but the 2020 season was 57 days shorter than the 2019 season. The match injury incidence did not differ significantly in the 2020 season compared to the 2019 season [incidence rate ratio 0.76 (0.48–1.20); p = 0.24]. Furthermore, we found no differences in the number of injuries, days lost to injury, matches missed due to injury, or injury severity. We could not detect any differences between the two seasons, suggesting the congested match calendar combined with the safety measures in the 2020 season can be a safe alternative in future seasons.
https://www.marketresearchforecast.com/privacy-policy
The player tracking market, valued at $5.4987 billion in 2025, is experiencing robust growth driven by the increasing adoption of advanced analytics in professional and amateur sports. Teams and coaches are leveraging data-driven insights to optimize player performance, prevent injuries, and enhance strategic decision-making during games. The integration of wearable technology, such as GPS trackers and sensors embedded in apparel, provides real-time data on player movement, speed, acceleration, and physiological metrics. This detailed information allows for personalized training programs, improved player recruitment strategies, and more effective in-game adjustments. Furthermore, the rising popularity of esports and the growing demand for enhanced fan engagement through interactive data visualization are contributing significantly to market expansion. Technological advancements, including improved sensor accuracy, longer battery life, and advanced data analytics platforms, are further fueling market growth. Despite the significant growth potential, the market faces certain challenges. High initial investment costs associated with technology acquisition and infrastructure development can pose a barrier to entry for smaller teams and leagues. Data privacy concerns and the need for robust data security measures also play a role. However, ongoing technological innovation and the increasing affordability of player tracking systems are gradually mitigating these restraints. We project a healthy CAGR for the next few years, driven by factors discussed above. The market segmentation is diverse, with key players like Zebra Technologies, Catapult Sports, and STATSports offering a range of solutions tailored to specific sports and performance needs. The competitive landscape is expected to remain dynamic, with ongoing innovations and strategic partnerships shaping the future of the player tracking market.
https://dataintelo.com/privacy-and-policy
The global football analysis software market size was valued at USD 1.2 billion in 2023 and is projected to reach USD 3.2 billion by 2032, growing at a compound annual growth rate (CAGR) of 11.2% during the forecast period. This significant growth is primarily driven by the increasing adoption of data analytics in sports for enhancing team performance, player efficiency, and strategic planning. The integration of advanced technologies such as machine learning (ML) and artificial intelligence (AI) into football analysis software is facilitating more precise and actionable insights, further fueling market expansion.
One of the key growth factors contributing to the expansion of the football analysis software market is the rising investment in sports technology by professional clubs and sports academies. These investments are aimed at utilizing sophisticated software solutions to analyze match data, evaluate player performance, and optimize team tactics. The growing popularity of football worldwide and the increasing competitiveness among clubs are compelling teams to adopt innovative technologies that can provide a competitive edge. Additionally, the proliferation of wearable technology and IoT devices is generating vast amounts of data, which can be analyzed using football analysis software to derive valuable insights.
Moreover, the increasing focus on player safety and injury prevention is driving the demand for football analysis software. By analyzing players' movements and physical conditions during training and matches, coaches and medical staff can identify the risk of injuries and implement preventive measures. The software can also assist in managing players' workload and providing personalized training programs, thereby enhancing overall player health and performance. This emphasis on player welfare and performance optimization is significantly contributing to the market's growth.
The growing trend of video analysis in sports is another crucial factor propelling the football analysis software market. Video analysis tools enable coaches to break down match footage, analyze key moments, and communicate strategies effectively to players. These tools are not only used for performance analysis but also for talent scouting and recruitment. By evaluating players' performances through video analytics, clubs can make informed decisions on player acquisitions and transfers. The integration of 3D simulation and augmented reality (AR) technologies in video analysis is further enhancing the capabilities of football analysis software.
In the realm of sports technology, Sports Graphics have emerged as a pivotal tool for enhancing the visual representation of data in football analysis software. These graphics are instrumental in transforming complex data sets into easily digestible visual formats, such as heat maps, player movement trails, and tactical diagrams. By employing Sports Graphics, coaches and analysts can effectively communicate strategies and insights to players, making it easier to understand and implement tactical adjustments. The integration of dynamic and interactive graphics into football analysis software is not only improving the clarity of data presentation but also enhancing the overall user experience. As the demand for visually engaging and informative content grows, Sports Graphics are set to play an increasingly important role in the evolution of football analysis tools.
Regionally, North America and Europe are leading the market due to the presence of advanced sports infrastructure and high adoption rates of sports technology. In North America, the United States has emerged as a significant market for football analysis software, driven by the increasing focus on soccer and the presence of major sports tech companies. Europe, with its rich football heritage and technologically advanced clubs, is also witnessing substantial growth. The Asia Pacific region is expected to grow at the highest CAGR during the forecast period, attributed to the rising popularity of football, increasing investments in sports infrastructure, and the growing adoption of technology in sports.
The football analysis software market can be segmented by components into software and services. The software segment is further divided into various types o
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Noncontact injuries are prevalent among professional football players. Yet, most research on this topic is retrospective, focusing solely on statistical correlations between Global Positioning System (GPS) metrics and injury occurrence, overlooking the multifactorial nature of injuries. This study introduces an automated injury identification and prediction approach using machine learning, leveraging GPS data and player-specific parameters. A sample of 34 male professional players from a Portuguese first-division team was analyzed, combining GPS data from Catapult receivers with descriptive variables for machine learning models—Support Vector Machines (SVMs), Feedforward Neural Networks (FNNs), and Adaptive Boosting (AdaBoost)—to predict injuries. These models, particularly the SVMs with cost-sensitive learning, showed high accuracy in detecting injury events, achieving a sensitivity of 71.43%, specificity of 74.19%, and overall accuracy of 74.22%. Key predictive factors included the player’s position, session type, player load, velocity and acceleration. The developed models are notable for their balanced sensitivity and specificity, efficiency without extensive manual data collection, and capability to predict injuries for short time frames. These advancements will aid coaching staff in identifying high-risk players, optimizing team performance, and reducing rehabilitation costs.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global Digital Twin Sports Injury Prevention market size stood at USD 1.42 billion in 2024 and is projected to reach USD 8.93 billion by 2033, expanding at a robust CAGR of 22.6% during the forecast period. The market’s rapid growth is primarily driven by increasing adoption of advanced analytics and simulation technologies in sports for injury prevention and athlete performance optimization. As the sports industry continues to embrace digital transformation, the integration of digital twin technology is revolutionizing how teams, trainers, and healthcare professionals predict, monitor, and mitigate sports injuries.
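The growth figure quoted above can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below assumes nine compounding years (2024 to 2033); the report does not state how it counted the forecast period, so this is an illustrative check rather than the publisher's method.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the paragraph above, in USD billions.
# The nine-year span (2024 -> 2033) is an assumption.
rate = cagr(1.42, 8.93, 9)
print(f"Implied CAGR: {rate:.1%}")
```

Under that assumption the implied rate is about 22.7%, close to the 22.6% stated in the report; the small gap is consistent with rounding of the published market sizes.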
One of the most significant growth factors in the Digital Twin Sports Injury Prevention market is the escalating emphasis on athlete health and safety. With professional sports organizations investing heavily in player welfare, the demand for predictive analytics and real-time monitoring has surged. Digital twin technology enables the creation of virtual replicas of athletes, allowing teams to simulate different scenarios and assess injury risks before they occur. This proactive approach not only reduces the incidence of injuries but also extends athletes’ careers and helps organizations avoid substantial financial losses associated with sidelined players. The rising awareness about long-term health effects of sports injuries, especially in high-impact sports such as football and basketball, further accelerates the adoption of digital twin solutions.
Another key driver is the technological advancements in artificial intelligence, machine learning, and IoT sensors. These technologies are foundational to digital twin solutions, providing the data collection, processing, and simulation capabilities necessary for accurate injury prediction and prevention. The continuous improvement in sensor accuracy, data analytics, and cloud computing has enabled real-time monitoring of athletes’ biomechanics, physiological parameters, and environmental factors. This has resulted in more precise modeling of injury risks and personalized training regimens, which are essential for both elite and amateur athletes. Moreover, collaborations between technology providers and sports organizations have fostered innovation, leading to the development of sophisticated, user-friendly digital twin platforms tailored to the unique needs of various sports.
The growing trend of data-driven decision-making in sports management also significantly fuels market expansion. Sports teams, athletic training centers, and healthcare providers are increasingly relying on digital insights to optimize performance and minimize injury risks. The integration of digital twin technology into existing sports management systems enhances the ability to track progress, identify potential health issues, and implement timely interventions. Additionally, the COVID-19 pandemic highlighted the importance of remote monitoring and virtual care, further boosting the adoption of cloud-based digital twin solutions. The convergence of these factors is expected to sustain the market’s high growth trajectory over the coming years.
Regionally, North America leads the Digital Twin Sports Injury Prevention market, accounting for the largest share due to advanced sports infrastructure, high adoption of cutting-edge technologies, and strong presence of leading sports organizations. Europe follows closely, driven by progressive healthcare systems and a strong focus on athlete well-being. The Asia Pacific region is witnessing the fastest growth, supported by rising investments in sports technology and increasing participation in professional sports. Latin America and the Middle East & Africa are also emerging as promising markets, although their adoption rates remain comparatively lower. Overall, the global landscape is characterized by rapid technological advancements and growing awareness of the benefits of digital twin solutions in sports injury prevention.
The Digital Twin Sports Injury Prevention market is segmented by component into software, hardware, and services, each playing a pivotal role in the ecosystem. Software solutions form the backbone of digital twin technology, enabling the creation, simulation, and analysis of virtual athlete models. These platforms integrate data from various sources, including wearables, motion capture sys
https://www.verifiedmarketresearch.com/privacy-policy/
Football Analysis Software Market size was valued at USD 1.20 Billion in 2024 and is projected to reach USD 3.28 Billion by 2032, growing at a CAGR of 13.5% during the forecast period 2026-2032.
• Performance Optimization: Player statistics, mobility, and fitness levels are monitored in real time, allowing coaches and trainers to maximize individual and team performance.
• Injury Prevention: Workload data and physical measurements are being recorded to help athletes avoid injuries and enhance their health management.
• Tactical Analysis: Match footage and situational data are being analyzed to help develop team plans and make tactical judgments.
• Talent Scouting: Potential players are assessed using video and data analytics, which improves recruitment accuracy across clubs and academies.
• Demand for Data-Driven Coaching: Coaching decisions are aided by tech that delivers actionable information from training and match results.
• Adoption by Professional Clubs: Clubs and federations are increasing their investment in advanced analytics tools to meet the high levels of league competitiveness.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The return-to-sport (RTS) process is multifaceted and complex, as multiple variables may interact and influence the time to RTS. These variables include intrinsic factors related to the player, such as anthropometrics and playing position, or extrinsic factors, such as competitive pressure. Providing an individualised estimation of time to return to play is often challenging, and clinical decision support tools are not common in sports medicine. This study uses epidemiological data to demonstrate a Bayesian Network (BN). We applied a BN that integrated clinical factors, non-clinical factors, and expert knowledge to classify days to RTS and injury severity (minimal, mild, moderate and severe) for individual players. Retrospective injury data of 3374 player seasons and 6143 time-loss injuries from seven seasons of the professional German football league (Bundesliga, 2014/2015 through 2020/2021) were collected from public databases and media resources. A total of twelve variables from three categories (player’s characteristics and anthropometrics, match information and injury information) were included. The response variables were 1) days to RTS (1–3, 4–7, 8–14, 15–28, 29–60, > 60) and 2) injury severity (minimal, mild, moderate, and severe). The sensitivity of the model for days to RTS was 0.24–0.97, while for severity categories it was 0.73–1.00. The user’s accuracy of the model for days to RTS was 0.52–0.83, while for severity categories it was 0.67–1.00. The BN can help to integrate different data types to model the probability of an outcome, such as days to return to sport. In our study, the BN may support coaches and players in 1) predicting days to RTS given an injury, 2) team planning via assessment of scenarios based on players’ characteristics and injury risk, and 3) understanding the relationships between injury risk factors and RTS. This study demonstrates how a Bayesian network may aid clinical decision making for RTS.
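The discretisation of days to RTS into the six response categories listed in the abstract can be sketched as below. This covers only the binning of the response variable, not the Bayesian network itself; the function and constant names are hypothetical.

```python
from bisect import bisect_left

# Upper bounds (inclusive) of the first five categories from the abstract.
UPPER_BOUNDS = [3, 7, 14, 28, 60]
LABELS = ["1-3", "4-7", "8-14", "15-28", "29-60", ">60"]


def rts_category(days):
    """Map a number of days to RTS onto the abstract's six categories."""
    if days < 1:
        raise ValueError("a time-loss injury implies at least 1 day to RTS")
    # bisect_left returns the index of the first bound >= days,
    # or len(UPPER_BOUNDS) when days exceeds every bound.
    return LABELS[bisect_left(UPPER_BOUNDS, days)]
```

For instance, 3 days falls in "1-3", 4 days in "4-7", and 61 days in ">60".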
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Frequency of keywords appearing in football research papers.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Emergency department surveillance of injuries and head injuries associated with baseball, football, soccer and ice hockey, children and youth, ages 5 to 18 years, 2004 to 2014