8 datasets found
  1. Horse Racing Results 2017-2020

    • kaggle.com
    Updated Jun 13, 2023
    Cite
    Bogdan Doicin (2023). Horse Racing Results 2017-2020 [Dataset]. https://www.kaggle.com/datasets/bogdandoicin/horse-racing-results-2017-2020
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    Kaggle
    Authors
    Bogdan Doicin
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset presents all the characteristics of the horses that raced in Hong Kong between 2017 and 2020. The data was taken from the Hong Kong Jockey Club website.

    The meaning of the columns:

    1. Date: Date of the race. In Hong Kong there is only one race meeting per day (racing takes place two days a week). Bear in mind that the racing season starts in September and ends in July.
    2. Track: The track the race was run on; in Hong Kong that is either Sha Tin or Happy Valley. This matters because horses often prefer one track over the other. Sha Tin is the main track.
    3. Race number: The race number (a race day consists of 8-11 races).
    4. Distance: The distance the race was run over, in meters. Some horses are specialists at sprint distances (e.g. 1200 meters), some at middle distances (e.g. 1600 meters), and some at longer distances (e.g. 2000 meters). Trainers could have specialities, too.
    5. Surface: Whether the race was run on the turf track (grass) or the dirt track (AW - All Weather), which is a sand-based surface. This has predictive power: some horses prefer turf racing, others prefer dirt racing. Trainers could have specialities here, too.
    6. Prize money: The total amount of prize money in the race. The higher the prize money, the better the race.
    7. Starting position: The start gate number (post position). High numbers are drawn "wide", while low numbers are drawn to the inside. This is important because high start gate numbers correlate with ground loss (a lot of "paths"): there is an increased chance of being positioned outside other horses in the turns. The exception is the 1000 meters at Sha Tin; at this distance there are no turns, and high numbers are usually not a disadvantage.
    8. Jockey: Who rode the horse in the race. Some jockeys win a lot more races than others.
    9. Jockey weight: The weight of the jockey. How much a jockey should weigh (at minimum) in a given race is not a coincidence; it is based on rules. A low weight (e.g. 50 kg) represents racing against better horses with a weight advantage, while a high weight (e.g. 60 kg) represents facing slower horses, but at a penalty.
    10. Country: Where the horse was born.
    11. Age: The age of the horse at the time of the race. A horse peaks at about 4 to 5 years old on average. Younger horses could improve more; older horses might get slower with age.
    12. Trainer Name: The name of the trainer. A trainer is obviously important. How good they are could be calculated with a winning percentage (wins/starts*100), but one could also calculate ROI based on odds. There could also be hidden patterns based on age, distance, surface, the form of stable mates in the time period, etc.
    13. Race time: The time of the race for the particular horse, in seconds.
    14. Path: A measure of how wide each horse was in the turn(s). A higher number means more ground loss due to a wide position in the turns, i.e. the horse did not run the shortest way possible.
    15. Final place: The finishing position in the race (e.g. 1st, 3rd, 4th).
    16. FGrating: A way to normalize race times so that they measure how fast a horse ran regardless of the track, the distance, or the conditions on race day.
    17. Odds: The odds the horse went off at in the market, which reflect the market's estimate of its probability of victory. This is important, as lower odds generally correspond to a better finishing position.
    18. Race Type: Mostly distinguishes between "handicap" races, where the horses do not carry the same jockey weight, and "non-handicap" races, where the horses carry the same jockey weight and the fastest horse most often wins.
    19. HorseId: Just an ID of the horse.
    20. JockeyId: Just an ID of the jockey.
    21. TrainerID: Just an ID of the trainer.
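
The Trainer Name entry above mentions a winning percentage (wins/starts*100) and an ROI based on odds. A minimal sketch of both calculations follows. It assumes the Odds column holds decimal odds (so a winning 1-unit stake returns `odds` units, stake included), which the description does not confirm; the sample rows and the function name `trainer_stats` are hypothetical and not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical rows: (trainer, final_place, decimal_odds).
# The values are made up for illustration only.
rows = [
    ("Trainer A", 1, 3.0),
    ("Trainer A", 4, 5.0),
    ("Trainer A", 1, 2.0),
    ("Trainer A", 7, 10.0),
    ("Trainer B", 2, 4.0),
    ("Trainer B", 1, 6.0),
]


def trainer_stats(rows):
    """Return {trainer: (win_pct, roi_pct)} for a 1-unit stake on every start."""
    starts = defaultdict(int)
    wins = defaultdict(int)
    returned = defaultdict(float)
    for trainer, place, odds in rows:
        starts[trainer] += 1
        if place == 1:
            wins[trainer] += 1
            # Assumption: decimal odds, so the payout includes the stake.
            returned[trainer] += odds
    stats = {}
    for trainer, n in starts.items():
        win_pct = wins[trainer] / n * 100           # wins/starts*100
        roi_pct = (returned[trainer] - n) / n * 100  # profit per unit staked
        stats[trainer] = (win_pct, roi_pct)
    return stats


stats = trainer_stats(rows)
for trainer, (win_pct, roi_pct) in sorted(stats.items()):
    print(f"{trainer}: win {win_pct:.1f}%, ROI {roi_pct:+.1f}%")
```

With real data the same grouping could be keyed on TrainerID instead of the name, which avoids problems with spelling variants.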
  2. Horse Racing

    • kaggle.com
    Updated Dec 6, 2020
    Cite
    Nikolay Kashavkin (2020). Horse Racing [Dataset]. https://www.kaggle.com/hwaitt/horse-racing/code
    Dataset updated
    Dec 6, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nikolay Kashavkin
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    Context

    This dataset contains horse racing data from 1990 to 2020.

    Content

    There are two different file types, races and horses, one pair for each year from 1990. I hope to update the current year data on a regular basis.

    races_* columns description:

    rid - Race id
    course - Course of the race; country code in brackets, AW means All Weather, no brackets means UK
    time - Time of the race in hh:mm format, London time zone
    date - Date of the race
    title - Title of the race
    rclass - Race class
    band - Band
    ages - Ages allowed
    distance - Distance
    condition - Surface condition
    hurdles - Hurdles, their type and number
    prizes - Place prizes
    winningTime - Best time shown
    prize - Total prize money (sum of the prizes column)
    metric - Distance in meters
    countryCode - Country of the race
    ncond - Condition type (created from the condition feature)
    class - Class type (created from the rclass feature)

    horses_* columns description:

    rid - Race id
    horseName - Horse name
    age - Horse age
    saddle - Saddle number where the horse starts
    decimalPrice - 1 / decimal price
    isFav - Whether the horse was a favorite before the start; there can be more than one favorite in a race
    trainerName - Trainer name
    jockeyName - Jockey name
    position - Finishing position; 40 if the horse did not finish
    positionL - How far the horse finished behind the horse ahead of it, in horse lengths
    dist - How far the horse finished behind the winner, in horse lengths
    weightSt - Horse weight in St
    weightLb - Horse weight in Lb
    overWeight - Overweight code
    outHandicap - Handicap
    headGear - Head gear code
    RPR - RP Rating
    TR - Topspeed
    OR - Official Rating
    father - Horse's father's name
    mother - Horse's mother's name
    gfather - Horse's grandfather's name
    runners - Total number of runners
    margin - Sum of decimalPrices for the race
    weight - Horse weight in kg
    res_win - Whether the horse won
    res_place - Whether the horse placed
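
Both file types carry the rid column, so a year's races file and horses file can presumably be combined on it. The sketch below uses small in-memory CSV strings as stand-ins for one such pair; the sample values, the course names, and the helper name `join_on_rid` are hypothetical, and the real files contain many more columns than shown here.

```python
import csv
import io

# Hypothetical stand-ins for one year's races and horses files.
races_csv = """rid,course,date,metric
101,Course A,2020-06-16,1609
102,Course B (IRE),2020-06-17,2012
"""

horses_csv = """rid,horseName,position,decimalPrice
101,Alpha,1,0.5
101,Bravo,2,0.25
102,Charlie,1,0.2
"""


def join_on_rid(races_file, horses_file):
    """Attach the race-level columns to every horse row that shares its rid."""
    races = {row["rid"]: row for row in csv.DictReader(races_file)}
    joined = []
    for horse in csv.DictReader(horses_file):
        race = races.get(horse["rid"])
        if race is None:
            continue  # skip horse rows whose race is missing
        joined.append({**race, **horse})
    return joined


runs = join_on_rid(io.StringIO(races_csv), io.StringIO(horses_csv))
for run in runs:
    print(run["date"], run["course"], run["horseName"], run["position"])
```

For files on disk, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`.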

    forward.csv contains information collected before a race starts. The odds are averages from Oddschecker.com; RPRc and TRc also hold current values.

    Note

    Please be aware that the prices provided are SP (starting prices), which are not available before the race starts. This means prices before the start may differ from SP. Favorites usually stay the same, though, and their pre-race prices are often higher than SP. In any case, you cannot predict profit accurately from SP prices alone.
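
The decimalPrice and margin columns described above are related: decimalPrice is 1 / decimal price, and margin is the sum of those values over a race. A minimal sketch of that relationship follows, with one simple way of turning the values into probabilities that sum to 1 (dividing each by the margin). The proportional scaling is an assumption chosen for illustration, not something the dataset author describes, and the odds shown are made up.

```python
def normalise(decimal_prices):
    """Return (margin, probabilities) for one race.

    margin is the sum of the 1/odds values; each probability is the
    corresponding value divided by that sum.
    """
    margin = sum(decimal_prices)
    if margin <= 0:
        raise ValueError("decimalPrice values must sum to a positive number")
    return margin, [p / margin for p in decimal_prices]


# Hypothetical decimal odds for a four-runner race.
odds = [2.0, 4.0, 4.0, 5.0]
decimal_prices = [1 / o for o in odds]  # mirrors the decimalPrice column

margin, probs = normalise(decimal_prices)
print(round(margin, 3))
print([round(p, 3) for p in probs])
```

A margin above 1 indicates that the quoted prices, taken together, imply more than 100% total probability.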

    Inspiration

    I suppose predicting horse racing results with machine learning methods is a difficult task. There are no highly correlated features, and the outcome classes are imbalanced. I tried to make my own predictions, but with no luck. I hope to get some inspiration from your research. Please share your experience with everyone or just with me. Thank you!

    Disclaimer

    The data provided has been collected from public open websites, without sign-ups, log-ins, or other restrictions from the sources. Please do not use this data for any commercial purposes.

  3. Korea Racing Association racing record information

    • data.go.kr
    json+xml
    Updated May 16, 2025
    Cite
    (2025). Korea Racing Association racing record information [Dataset]. https://www.data.go.kr/en/data/15058305/openapi.do
    Dataset updated
    May 16, 2025
    License

    https://data.go.kr/ugs/selectPortalPolicyView.do

    Description

    Korea Racing Authority provides information on races held at racecourses in Seoul, Busan, Gyeongnam, and Jeju. The information provided includes the name of the racecourse, race date, race day, race number, number of race days, race distance, grade conditions, burden classification, race conditions, age conditions, conditions by prize, weather, course, race name, 1st place prize, 2nd place prize, 3rd place prize, 4th place prize, 5th place prize, additional prize 1, additional prize 2, and additional prize 3 data, as well as information on participating horses and rankings, starting number, horse name, English horse name, horse number, nationality, age, gender, burden weight, rating (grade), jockey name, English jockey name, jockey number, trainer name, English trainer name, trainer number, owner name, English owner name, owner number, race record, horse weight, and record data by race route section.

    If none of the request variables (race year, race year and month, or race date) is entered, information for the past month based on the most recent race date is displayed.

  4. Mean displacement minima and maxima based on collated stride data.

    • plos.figshare.com
    xls
    Updated Jun 8, 2023
    Cite
    Kate Horan; Kieran Kourdache; James Coburn; Peter Day; Henry Carnall; Dan Harborne; Liam Brinkley; Lucy Hammond; Sean Millard; Bryony Lancaster; Thilo Pfau (2023). Mean displacement minima and maxima based on collated stride data. [Dataset]. http://doi.org/10.1371/journal.pone.0257820.t002
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Kate Horan; Kieran Kourdache; James Coburn; Peter Day; Henry Carnall; Dan Harborne; Liam Brinkley; Lucy Hammond; Sean Millard; Bryony Lancaster; Thilo Pfau
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Mean displacement minima and maxima based on collated stride data.

  5. Summary of output from linear mixed models: F and significance (p) values.

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Kate Horan; Kieran Kourdache; James Coburn; Peter Day; Henry Carnall; Dan Harborne; Liam Brinkley; Lucy Hammond; Sean Millard; Bryony Lancaster; Thilo Pfau (2023). Summary of output from linear mixed models: F and significance (p) values. [Dataset]. http://doi.org/10.1371/journal.pone.0257820.t006
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Kate Horan; Kieran Kourdache; James Coburn; Peter Day; Henry Carnall; Dan Harborne; Liam Brinkley; Lucy Hammond; Sean Millard; Bryony Lancaster; Thilo Pfau
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Summary of output from linear mixed models: F and significance (p) values.

  6. Number of gallop runs analysed for each shoe-surface combination.

    • plos.figshare.com
    xls
    Updated Jun 8, 2023
    Cite
    Kate Horan; Kieran Kourdache; James Coburn; Peter Day; Henry Carnall; Dan Harborne; Liam Brinkley; Lucy Hammond; Sean Millard; Bryony Lancaster; Thilo Pfau (2023). Number of gallop runs analysed for each shoe-surface combination. [Dataset]. http://doi.org/10.1371/journal.pone.0257820.t001
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Kate Horan; Kieran Kourdache; James Coburn; Peter Day; Henry Carnall; Dan Harborne; Liam Brinkley; Lucy Hammond; Sean Millard; Bryony Lancaster; Thilo Pfau
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Number of gallop runs analysed for each shoe-surface combination.

  7. Estimated marginal means and confidence intervals for shoe effects.

    • plos.figshare.com
    xls
    Updated Jun 5, 2023
    Cite
    Kate Horan; Kieran Kourdache; James Coburn; Peter Day; Henry Carnall; Dan Harborne; Liam Brinkley; Lucy Hammond; Sean Millard; Bryony Lancaster; Thilo Pfau (2023). Estimated marginal means and confidence intervals for shoe effects. [Dataset]. http://doi.org/10.1371/journal.pone.0257820.t004
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Kate Horan; Kieran Kourdache; James Coburn; Peter Day; Henry Carnall; Dan Harborne; Liam Brinkley; Lucy Hammond; Sean Millard; Bryony Lancaster; Thilo Pfau
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Estimated marginal means and confidence intervals for shoe effects.

  8. Estimated marginal means and confidence intervals for surface effects.

    • plos.figshare.com
    xls
    Updated Jun 8, 2023
    Cite
    Kate Horan; Kieran Kourdache; James Coburn; Peter Day; Henry Carnall; Dan Harborne; Liam Brinkley; Lucy Hammond; Sean Millard; Bryony Lancaster; Thilo Pfau (2023). Estimated marginal means and confidence intervals for surface effects. [Dataset]. http://doi.org/10.1371/journal.pone.0257820.t003
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Kate Horan; Kieran Kourdache; James Coburn; Peter Day; Henry Carnall; Dan Harborne; Liam Brinkley; Lucy Hammond; Sean Millard; Bryony Lancaster; Thilo Pfau
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Estimated marginal means and confidence intervals for surface effects.

