7 datasets found

race_ethnicity
kaggle.com
Updated Feb 10, 2022
Cite
Parvaneh Khosravi Zadeh (2022). race_ethnicity [Dataset]. https://www.kaggle.com/datasets/parvanekhosravizade/race-ethnicity
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 10, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Parvaneh Khosravi Zadeh
License
https://www.usa.gov/government-works/
Description
Context

Prevalence of Self-Reported Physical Inactivity by Race/Ethnicity, BRFSS, 2017–2020

Content

This dataset reports the prevalence of self-reported physical inactivity among US adults by race/ethnicity.

Acknowledgements

Content source: Division of Nutrition, Physical Activity, and Obesity, National Center for Chronic Disease Prevention and Health Promotion

Inspiration

This dataset gave me additional insight for analyzing the FitBit Fitness Tracker Data notebook in my Bellabeat analysis.
California Adults Who Met Physical Activity Guidelines for Americans, 2013
data.chhs.ca.gov
data.ca.gov
csv, xlsx, zip
Updated Aug 28, 2024
Cite
California Department of Public Health (2024). California Adults Who Met Physical Activity Guidelines for Americans, 2013 [Dataset]. https://data.chhs.ca.gov/dataset/california-adults-who-met-physical-activity-guidelines-for-americans-2013
Available download formats: csv, xlsx, zip
Dataset updated
Aug 28, 2024
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Area covered
California
Description
This dataset is from the 2013 California Dietary Practices Survey of Adults. This survey has been discontinued. Adults were asked a series of eight questions about their physical activity practices in the last month. These questions were borrowed from the Behavioral Risk Factor Surveillance System. Data displayed in this table represent California adults who met the aerobic recommendation for physical activity, as defined by the 2008 U.S. Department of Health and Human Services Physical Activity Guidelines for Americans and Objectives 2.1 and 2.2 of Healthy People 2020.

The California Dietary Practices Survey (CDPS), now discontinued, was the most extensive dietary and physical activity assessment of adults 18 years and older in the state of California. CDPS was designed in 1989 and was administered biennially in odd years through 2013. It was designed to monitor dietary trends, especially fruit and vegetable consumption, among California adults in order to evaluate their progress toward meeting the 2010 Dietary Guidelines for Americans and the Healthy People 2020 Objectives. For the data in this table, adults were asked a series of eight questions about their physical activity practices in the last month. Questions included:

1) During the past month, other than your regular job, did you participate in any physical activities or exercise such as running, calisthenics, golf, gardening or walking for exercise?
2) What type of physical activity or exercise did you spend the most time doing during the past month?
3) How many times per week or per month did you take part in this activity during the past month?
4) And when you took part in this activity, for how many minutes or hours did you usually keep at it?
5) During the past month, how many times per week or per month did you do physical activities or exercises to strengthen your muscles?

Questions 2, 3, and 4 were repeated to collect a second activity.

Data were collected using a list of participating CalFresh households and random digit dialing; approximately 1,400-1,500 adults (ages 18 and over) were interviewed by phone between June and October. Demographic data included gender, age, ethnicity, education level, income, physical activity level, overweight status, and food stamp eligibility status. Low-income adults were oversampled to provide greater sensitivity for analyzing trends among our target population.
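As a rough illustration of how answers to questions 3 and 4 could be turned into a "met the aerobic recommendation" flag: the 2008 Physical Activity Guidelines for Americans call for at least 150 minutes per week of moderate-intensity aerobic activity, 75 minutes of vigorous-intensity activity, or an equivalent combination. The sketch below counts each vigorous minute as two moderate minutes. The class and function names and the `vigorous` label are assumptions made for illustration; this is not the scoring procedure CDPS actually used.

```python
from dataclasses import dataclass

# Threshold from the 2008 Physical Activity Guidelines for Americans:
# 150 moderate-equivalent minutes of aerobic activity per week.
WEEKLY_TARGET_MINUTES = 150


@dataclass
class Activity:
    times_per_week: float       # answer to question 3, converted to a weekly rate
    minutes_per_session: float  # answer to question 4, converted to minutes
    vigorous: bool              # hypothetical intensity label, not a survey field


def moderate_equivalent_minutes(activities):
    """Sum weekly minutes, counting each vigorous minute as two moderate ones."""
    total = 0.0
    for activity in activities:
        weight = 2.0 if activity.vigorous else 1.0
        total += weight * activity.times_per_week * activity.minutes_per_session
    return total


def met_aerobic_recommendation(activities):
    """True when the weekly moderate-equivalent minutes reach the target."""
    return moderate_equivalent_minutes(activities) >= WEEKLY_TARGET_MINUTES
```

A respondent reporting a first and a second activity would be passed as a two-item list, for example `met_aerobic_recommendation([Activity(3, 30, False), Activity(2, 20, True)])`.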
The big dataset of ultra-marathon running
kaggle.com
Updated Jul 12, 2023
Cite
David (2023). The big dataset of ultra-marathon running [Dataset]. https://www.kaggle.com/datasets/aiaiaidavid/the-big-dataset-of-ultra-marathon-running
Dataset updated
Jul 12, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
David
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
According to Wikipedia, an ultramarathon, also called ultra distance or ultra running, is any footrace longer than the traditional marathon length of 42.195 kilometres (26 mi 385 yd). Various distances are raced competitively, from the shortest common ultramarathon of 31 miles (50 km) to over 200 miles (320 km). 50k and 100k are both World Athletics record distances, but some 100-mile (160 km) races are among the oldest and most prestigious events, especially in North America.

The data in this file is a large collection of ultra-marathon race records registered between 1798 and 2022, a period of well over two centuries, which makes it a formidable long-term sample. All data was obtained from public websites.

Although the original data is in the public domain, the race records, which originally contained the athletes' names, have been anonymized to comply with data protection laws and to preserve the athletes' privacy. However, a column Athlete ID has been created with a numerical ID representing each unique runner (so if Antonio Fernández participated in 5 races over different years, then the corresponding race records now hold his unique Athlete ID instead of his name). This way I have preserved valuable information.

The dataset contains 7,461,226 ultra-marathon race records from 1,641,168 unique athletes.

The following columns (with data types) are included:

Year of event (int64)

Event dates (object)

Event name (object)

Event distance/length (object)

Event number of finishers (int64)

Athlete performance (object)

Athlete club (object)

Athlete country (object)

Athlete year of birth (float64)

Athlete gender (object)

Athlete age category (object)

Athlete average speed (object)

Athlete ID (int64)

The Event name column includes country location information that can be extracted into a new column. Similarly, seasonal information beyond the Year of event can be found in the Event dates column. Both can be extracted with a bit of processing.

The Event distance/length column describes the type of race, covering the most popular UM race distances and lengths, and some other specific modalities (multi-day, etc.):

Distances: 50km, 100km, 50mi, 100mi

Lengths: 6h, 12h, 24h, 48h, 72h, 6d, 10d

Additionally, there is information on age, gender and speed (in km/h) in other columns.
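Because Event distance/length mixes fixed-distance and fixed-time races, a first processing step is often to split the two. The sketch below is a minimal example that assumes values shaped exactly like the ones listed above (a number followed by km, mi, h or d). The function name is hypothetical, and the real column may contain other spellings that this pattern does not recognise.

```python
import re

# Unit suffixes taken from the examples listed in the description.
DISTANCE_UNITS = {"km": 1.0, "mi": 1.609344}  # kilometres per unit
DURATION_UNITS = {"h": 1.0, "d": 24.0}        # hours per unit

_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(km|mi|h|d)\s*$", re.IGNORECASE)


def parse_event_length(raw):
    """Return ("distance", km) or ("duration", hours); None if unrecognised."""
    match = _PATTERN.match(raw)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2).lower()
    if unit in DISTANCE_UNITS:
        return ("distance", value * DISTANCE_UNITS[unit])
    return ("duration", value * DURATION_UNITS[unit])
```

Values that return None (for example, free-text descriptions of multi-day stage races) would need to be handled separately.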
2024 Marathon Results
kaggle.com
Updated Feb 27, 2025
Cite
Brian Rock (2025). 2024 Marathon Results [Dataset]. https://www.kaggle.com/datasets/runningwithrock/2024-marathon-results
Dataset updated
Feb 27, 2025
Dataset provided by
Kaggle
Authors
Brian Rock
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
This dataset contains a (mostly) complete set of results from marathons across the United States and Canada in 2024.

The dataset is restricted to races with more than 200 finishers. Some races are therefore excluded, but they account for a small share of the total number of finishers.

The dataset is also restricted to races that are USATF-certified. Most of the races are road marathons, although some trail races are included. But these are "road-like" trail marathons, where times are similar to the road and can be used for Boston qualifying purposes.

This dataset is similar to the one I created with results from 2023. The two datasets can be combined, but the race names differ in some cases. You'll have to clean up the race names to get them to group correctly.
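One possible way to approach the race-name cleanup is to reduce each name to a rough grouping key before combining the two years. The sketch below is only an assumption about what such a cleanup might look like: the function name is hypothetical, and the rules (lowercasing, dropping a four-digit year, stripping punctuation) are illustrative and will not resolve every naming difference, such as a changed sponsor.

```python
import re


def normalize_race_name(name):
    """Reduce a race name to a rough grouping key."""
    key = name.lower()
    key = re.sub(r"\b(19|20)\d{2}\b", " ", key)  # drop a four-digit year
    key = re.sub(r"[^a-z0-9]+", " ", key)         # punctuation becomes spaces
    return " ".join(key.split())                  # collapse repeated whitespace
```

With this key, made-up names such as "2024 Example City Marathon" and "Example-City Marathon (2023)" both map to "example city marathon". Note that non-ASCII letters are dropped by the second substitution, so names containing them need extra care.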

I initially collected these results to prepare the dataset for the 2026 Boston Marathon Cutoff Time Tracker. I also used it to update my percentile-based age grade calculator, to calculate the average marathon times for each age group, to identify a list of the largest races in the United States, and to support various other analyses.

If time permits, I plan to update this dataset to include additional information about each race - including the location and the weather on race day.
Horse Racing
kaggle.com
Updated Dec 6, 2020
Cite
Nikolay Kashavkin (2020). Horse Racing [Dataset]. https://www.kaggle.com/datasets/hwaitt/horse-racing/data
Dataset updated
Dec 6, 2020
Dataset provided by
Kaggle
Authors
Nikolay Kashavkin
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
Context

This dataset contains horse racing data from 1990 to 2020.

Content

There are two different file types, races and horses, one pair for each year from 1990. I hope to update the current year data on a regular basis.

races_* columns description:

rid - Race id
course - Course of the race; country code in brackets, AW means All Weather, no brackets means UK
time - Time of the race in hh:mm format, London TZ
date - Date of the race
title - Title of the race
rclass - Race class
band - Band
ages - Ages allowed
distance - Distance
condition - Surface condition
hurdles - Hurdles, their type and amount
prizes - Places prizes
winningTime - Best time shown
prize - Prizes total (sum of prizes column)
metric - Distance in meters
countryCode - Country of the race
ncond - Condition type (created from condition feature)
class - Class type (created from rclass feature)

horses_* columns description:

rid - Race id
horseName - Horse name
age - Horse age
saddle - Saddle # where horse starts
decimalPrice - 1/Decimal price
isFav - Was horse favorite before start? There can be more than one favorite in a race
trainerName - Trainer name
jockeyName - Jockey name
position - Finishing position, 40 if horse didn't finish
positionL - How far a horse finished behind the horse immediately ahead, in horse lengths
dist - How far a horse finished behind the winner, in horse lengths
weightSt - Horse weight in St
weightLb - Horse weight in Lb
overWeight - Overweight code
outHandicap - Handicap
headGear - Head gear code
RPR - RP Rating
TR - Topspeed
OR - Official Rating
father - Horse's Father name
mother - Horse's Mother name
gfather - Horse's Grandfather name
runners - Runners total
margin - Sum of decimalPrices for the race
weight - Horse weight in kg
res_win - Horse won or not
res_place - Horse placed or not
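Several of the columns above are derived from others, and the relationships can be sketched in a few lines. The example below assumes that weightSt and weightLb together give the weight as stones plus pounds, that decimalPrice is 1 divided by the decimal odds as described, and that margin is the plain sum of decimalPrice over the runners in a race. The function names are hypothetical, and the dataset author's exact derivation is not documented here.

```python
LB_PER_STONE = 14
KG_PER_LB = 0.45359237  # exact definition of the international pound


def weight_kg(weight_st, weight_lb):
    """Convert a weight given as stones plus pounds to kilograms."""
    return (weight_st * LB_PER_STONE + weight_lb) * KG_PER_LB


def implied_probability(decimal_odds):
    """1 / decimal odds, matching the description of decimalPrice."""
    return 1.0 / decimal_odds


def race_margin(decimal_prices):
    """Sum of decimalPrice values for one race, as in the margin column."""
    return sum(decimal_prices)
```

A margin above 1.0 means the implied probabilities add up to more than 100%, which is the bookmaker's built-in edge.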

forward.csv contains information collected before a race starts. The odds are averages from Oddschecker.com; RPRc and TRc also hold current values.

Note

Please be aware that the prices provided are the SP (starting prices), which are not available before the race starts. This means prices before the start may differ from the SP. Favorites usually stay the same, though, and their prices before the start are often higher than the SP. In any case, you can't predict profit accurately based only on SP prices.

Inspiration

I suppose predicting horse racing results with machine learning methods is a difficult task. There are no highly correlated features, and the outcome classes are imbalanced. I tried to make my own predictions, but with no luck. I hope to get some inspiration from your research. Please share your experience with everyone or just with me. Thank you!

Disclaimer

The data provided has been collected from public open websites that require no sign-up or log-in and impose no other restrictions. Please do not use this data for any commercial purposes.
Ethnicity and Economic Activity: Longitudinal Perspectives, 1971-2006
datacatalogue.cessda.eu
beta.ukdataservice.ac.uk
Updated Nov 28, 2024
Cite
Disney, R. (2024). Ethnicity and Economic Activity: Longitudinal Perspectives, 1971-2006 [Dataset]. http://doi.org/10.5255/UKDA-SN-6416-1
Explore at:
Unique identifier
https://doi.org/10.5255/UKDA-SN-6416-1
Dataset updated
Nov 28, 2024
Dataset provided by
Institute for Fiscal Studies
Authors
Disney, R.
Time period covered
Apr 1, 1971 - Dec 1, 2006
Area covered
England and Wales
Variables measured
Individuals, National
Measurement technique
Face-to-face interview, Postal survey
Description
Abstract copyright UK Data Service and data collection copyright owner.
Centre for Longitudinal Study Information and User Support (CeLSIUS) exists to assist people in UK higher education to analyse the Office for National Statistics Longitudinal Study (ONS LS). CeLSIUS is part of the Economic and Social Research Council's (ESRC) Census Programme for 2006-2011. Part of the service it offers is the provision of web-based tools and extracts, including the subset of the ONS LS.

Further information about CeLSIUS is available from the CeLSIUS web site and the ESRC Award web page.

Ethnicity and Economic Activity: Longitudinal Perspectives, 1971-2006 is an aggregated teaching dataset which has 1,120 records (i.e. combinations of values of the variables). It is based on 333,015 cases; cases are ONS LS sample members who were present at all three of the most recent censuses of England and Wales (1981, 1991 and 2001). The ONS LS sample provides a random, one per cent sample of the population of England and Wales, clustered by date of birth.

The documentation includes exercises to accompany the dataset with instructions for both SPSS and Stata users. Exercises centre on exploring the dataset by ethnicity, age, gender, country of birth and economic activity status.

Main Topics:

The dataset includes the following eight variables:
ethnicity grouped 2001
sex
age group in 2001
whether born outside UK (from 1981 Census)
whether economically active in 1981
whether economically active in 1991
whether economically active in 2001
weight variable
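Because the dataset is aggregated (each record stands for a combination of variable values rather than one person), any proportion has to be computed with the weight variable instead of by counting rows. The sketch below shows one way to do this in plain Python. The function name and the field names in the usage example are hypothetical, and the figures are invented for illustration; they are not taken from the dataset.

```python
from collections import defaultdict


def weighted_share(records, group_key, flag_key, weight_key="weight"):
    """Weighted proportion of records with a truthy flag, per group."""
    totals = defaultdict(float)
    hits = defaultdict(float)
    for rec in records:
        w = rec[weight_key]
        totals[rec[group_key]] += w
        if rec[flag_key]:
            hits[rec[group_key]] += w
    # Skip groups whose total weight is zero to avoid dividing by zero.
    return {g: hits[g] / totals[g] for g in totals if totals[g] > 0}


# Invented example rows; real records would come from the teaching dataset.
example = [
    {"ethnicity": "A", "active_2001": True, "weight": 30},
    {"ethnicity": "A", "active_2001": False, "weight": 10},
    {"ethnicity": "B", "active_2001": True, "weight": 5},
]
shares = weighted_share(example, "ethnicity", "active_2001")
```

The same function can be pointed at the 1981 or 1991 activity flags to compare groups across censuses.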
Synthetic Dataset- Patients at risk of sudden death: hypertrophic...
healthdatagateway.org
unknown
Updated Feb 29, 2024
Cite
This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158) (2024). Synthetic Dataset- Patients at risk of sudden death: hypertrophic cardiomyopathy [Dataset]. https://healthdatagateway.org/en/dataset/186
Available download formats: unknown
Dataset updated
Feb 29, 2024
Dataset authored and provided by
This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
License
https://www.pioneerdatahub.co.uk/data/data-request-process/
Description
Background:

A PIONEER synthetic dataset of 20,000 ethnically diverse hypertrophic cardiomyopathy patients created using CT-GAN generative AI. Data includes clinical & biological phenotyping, co-morbidities, investigations (ECG, ECHO), procedures & outcomes.

Well-created synthetic data establishes a governance risk-free environment for algorithm development & experimentation. This includes evaluating new treatment models, care management systems, clinical decision support, and more. Synthetic data is of particular use in rare diseases, where real data may be in short supply, or to replicate disease in less common patient demographics (e.g. ethnicities).

Familial hypertrophic cardiomyopathy (HCM) is a rare genetic condition characterised by thickening (hypertrophy) of the cardiac muscle, usually of the interventricular septum. Arrhythmias can be life threatening and HCM is associated with an increased risk of sudden death. Some affected individuals develop potentially fatal heart failure, which may require heart transplantation. Approximately 130,000 people have HCM in the UK, but there is a significant burden of undiagnosed disease and diagnostic delay.

Geography: The West Midlands (WM) has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.

Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.

Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can provide real world data to meet bespoke requirements.

Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.