https://www.usa.gov/government-works/
Prevalence of Self-Reported Physical Inactivity by Race/Ethnicity, BRFSS, 2017–2020
This dataset reports the prevalence of self-reported physical inactivity among US adults by race/ethnicity.
Content source: Division of Nutrition, Physical Activity, and Obesity, National Center for Chronic Disease Prevention and Health Promotion
This dataset gave me additional insight while analyzing the FitBit Fitness Tracker Data notebook for my Bellabeat analysis.
This dataset is from the 2013 California Dietary Practices Survey of Adults. This survey has been discontinued. Adults were asked a series of eight questions about their physical activity practices in the last month. These questions were borrowed from the Behavioral Risk Factor Surveillance System. Data displayed in this table represent California adults who met the aerobic recommendation for physical activity, as defined by the 2008 U.S. Department of Health and Human Services Physical Activity Guidelines for Americans and Objectives 2.1 and 2.2 of Healthy People 2020.
The California Dietary Practices Survey (CDPS), now discontinued, was the most extensive dietary and physical activity assessment of adults 18 years and older in the state of California. CDPS was designed in 1989 and was administered biennially in odd years up through 2013. The CDPS was designed to monitor dietary trends, especially fruit and vegetable consumption, among California adults for evaluating their progress toward meeting the 2010 Dietary Guidelines for Americans and the Healthy People 2020 Objectives. For the data in this table, adults were asked a series of eight questions about their physical activity practices in the last month. Questions included: 1) During the past month, other than your regular job, did you participate in any physical activities or exercise such as running, calisthenics, golf, gardening or walking for exercise? 2) What type of physical activity or exercise did you spend the most time doing during the past month? 3) How many times per week or per month did you take part in this activity during the past month? 4) And when you took part in this activity, for how many minutes or hours did you usually keep at it? 5) During the past month, how many times per week or per month did you do physical activities or exercises to strengthen your muscles? Questions 2, 3, and 4 were repeated to collect a second activity. Data were collected using a list of participating CalFresh households and random digit dial. Approximately 1,400-1,500 adults (ages 18 and over) were interviewed via phone survey between the months of June and October. Demographic data included gender, age, ethnicity, education level, income, physical activity level, overweight status, and food stamp eligibility status. Low-income adults were oversampled to provide greater sensitivity for analyzing trends among our target population.
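As a rough illustration of how answers to questions 3 and 4 could be turned into a yes/no on the aerobic recommendation, the sketch below assumes the 2008 guideline threshold of 150 minutes of moderate-intensity activity per week, or 75 minutes of vigorous-intensity activity, with vigorous minutes counted double. The function names and the moderate/vigorous split are hypothetical; the scoring procedure CDPS actually used is not described here.

```python
def weekly_minutes(times_per_week: float, minutes_per_session: float) -> float:
    """Total minutes per week for one reported activity."""
    return times_per_week * minutes_per_session


def meets_aerobic_recommendation(moderate_minutes: float,
                                 vigorous_minutes: float) -> bool:
    """Assumed rule: vigorous minutes count double toward a 150-minute weekly target."""
    moderate_equivalent = moderate_minutes + 2 * vigorous_minutes
    return moderate_equivalent >= 150


# Example: walking 3 times a week for 30 minutes, plus running twice for 20 minutes.
walking = weekly_minutes(3, 30)   # treated as moderate intensity (assumption)
running = weekly_minutes(2, 20)   # treated as vigorous intensity (assumption)
print(meets_aerobic_recommendation(walking, running))
```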
https://creativecommons.org/publicdomain/zero/1.0/
According to Wikipedia, an ultramarathon, also called ultra distance or ultra running, is any footrace longer than the traditional marathon length of 42.195 kilometres (26 mi 385 yd). Various distances are raced competitively, from the shortest common ultramarathon of 31 miles (50 km) to over 200 miles (320 km). 50k and 100k are both World Athletics record distances, but some 100-mile (160 km) races are among the oldest and most prestigious events, especially in North America.
The data in this file is a large collection of ultra-marathon race records registered between 1798 and 2022, a period of well over two centuries, which makes it a formidable long-term sample. All data was obtained from public websites.
Although the original data is in the public domain, the race records, which originally contained the athletes' names, have been anonymized to comply with data protection laws and to preserve the athletes' privacy. However, a column Athlete ID has been created with a numerical ID representing each unique runner (so if Antonio Fernández participated in 5 races over different years, then the corresponding race records now hold his unique Athlete ID instead of his name). This way I have preserved valuable information.
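The idea of replacing names with a stable numeric ID can be sketched as follows. This is only an illustration under the assumption that a runner is identified by an exact name match; the function name is hypothetical, and the matching the dataset author actually used is not described.

```python
def assign_athlete_ids(names):
    """Map each distinct name to an integer ID, in order of first appearance."""
    ids = {}
    anonymized = []
    for name in names:
        if name not in ids:
            ids[name] = len(ids)  # next unused ID
        anonymized.append(ids[name])
    return anonymized


# The same runner keeps the same ID across races.
records = ["Antonio Fernández", "Another Runner", "Antonio Fernández"]
print(assign_athlete_ids(records))
```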
The dataset contains 7,461,226 ultra-marathon race records from 1,641,168 unique athletes.
The following columns (with data types) are included:
The Event name column includes country location information that can be extracted into a new column. Similarly, seasonal information beyond the Year of event can be found in the Event dates column. Both can be extracted with a bit of processing.
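A minimal sketch of that processing, under two loudly stated assumptions: that event names end with a three-letter country code in parentheses, and that a single-day event date is written as DD.MM.YYYY. Neither format is confirmed here, multi-day date ranges are not handled, and the season mapping uses northern-hemisphere meteorological seasons.

```python
import re
from datetime import datetime

# Assumed format: "Some Race Name (ESP)" with the country code at the end.
COUNTRY_RE = re.compile(r"\(([A-Z]{3})\)\s*$")
SEASONS = ("winter", "spring", "summer", "autumn")


def extract_country(event_name):
    """Return the trailing country code, or None if the name has none."""
    match = COUNTRY_RE.search(event_name)
    return match.group(1) if match else None


def extract_season(event_date):
    """Return the season for an assumed DD.MM.YYYY date string."""
    month = datetime.strptime(event_date, "%d.%m.%Y").month
    # Dec-Feb -> winter, Mar-May -> spring, Jun-Aug -> summer, Sep-Nov -> autumn
    return SEASONS[(month % 12) // 3]


print(extract_country("Example 50 km (ESP)"), extract_season("15.07.2019"))
```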
The Event distance/length column describes the type of race, covering the most popular UM race distances and lengths, and some other specific modalities (multi-day, etc.):
Additionally, other columns hold information on age, gender and speed (in km/h).
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains a (mostly) complete set of results from marathons across the United States and Canada in 2024.
The dataset is restricted to races with more than 200 finishers. Some races are therefore excluded, but they account for a small share of the total number of finishers.
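For anyone applying the same restriction to other results, the filter could look roughly like this. It assumes one record per finisher with a `race` key, which is a hypothetical layout rather than the dataset's documented schema.

```python
from collections import Counter


def races_over_threshold(results, min_finishers=200):
    """Keep only rows from races with more than min_finishers finishers."""
    counts = Counter(row["race"] for row in results)
    keep = {race for race, n in counts.items() if n > min_finishers}
    return [row for row in results if row["race"] in keep]


# Tiny demonstration with a threshold of 2 instead of 200.
rows = [{"race": "A"}, {"race": "A"}, {"race": "A"}, {"race": "B"}]
print(len(races_over_threshold(rows, min_finishers=2)))
```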
The dataset is also restricted to races that are USATF-certified. Most of the races are road marathons, although some trail races are included. These are "road-like" trail marathons, where times are similar to road times and can be used for Boston qualifying purposes.
This dataset is similar to the one I created with results from 2023. The two datasets can be combined, but the race names differ in some cases. You'll have to clean up the race names to get them to group correctly.
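One possible starting point for that cleanup is to normalize names before grouping. The rules below (lowercasing, dropping four-digit years and punctuation) are assumptions about what differs between the two years, not a description of the actual differences, so expect to add race-specific fixes.

```python
import re


def normalize_race_name(name):
    """Lowercase, drop four-digit years and punctuation, collapse whitespace."""
    name = name.lower()
    name = re.sub(r"\b(?:19|20)\d{2}\b", " ", name)  # remove years such as 2023
    name = re.sub(r"\W+", " ", name)                 # punctuation -> space
    return " ".join(name.split())


# Two spellings of the same hypothetical race collapse to one key.
print(normalize_race_name("2024 Example City Marathon"))
print(normalize_race_name("Example-City Marathon, 2023"))
```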
I initially collected these results to prepare the dataset for the 2026 Boston Marathon Cutoff Time Tracker. I also used it to update my percentile-based age grade calculator, to calculate the average marathon times for each age group, to identify a list of the largest races in the United States, and to support various other analyses.
If time permits, I plan to update this dataset to include additional information about each race - including the location and the weather on race day.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset contains horse racing data from 1990 to 2020.
There are two different file types, races and horses, one pair for each year from 1990. I hope to update the current year data on a regular basis.
Columns in the races files:
rid - Race id
course - Course of the race, country code in brackets, AW means All Weather, no brackets means UK
time - Time of the race in hh:mm format, London TZ
date - Date of the race
title - Title of the race
rclass - Race class
band - Band
ages - Ages allowed
distance - Distance
condition - Surface condition
hurdles - Hurdles, their type and amount
prizes - Places prizes
winningTime - Best time shown
prize - Prizes total (sum of prizes column)
metric - Distance in meters
countryCode - Country of the race
ncond - Condition type (created from condition feature)
class - Class type (created from rclass feature)
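To show how a `metric` value in meters could relate to `distance`, here is a sketch that assumes distances are written in miles, furlongs and yards, for example "1m2f" or "5f". That notation is an assumption about the `distance` column, and the function name is hypothetical. The conversion factors themselves are exact: 1 mile = 1609.344 m, 1 furlong = 201.168 m, 1 yard = 0.9144 m.

```python
import re

# Assumed notation: optional miles ("m"), furlongs ("f") and yards ("y"), in that order.
DISTANCE_RE = re.compile(r"^(?:(\d+)m)?(?:(\d+)f)?(?:(\d+)y)?$")


def distance_to_meters(text):
    """Convert a string such as '1m2f' to meters."""
    match = DISTANCE_RE.match(text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized distance: {text!r}")
    miles, furlongs, yards = (int(g) if g else 0 for g in match.groups())
    return miles * 1609.344 + furlongs * 201.168 + yards * 0.9144


print(round(distance_to_meters("1m2f"), 2))
```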
Columns in the horses files:
rid - Race id
horseName - Horse name
age - Horse age
saddle - Saddle number where the horse starts
decimalPrice - 1/Decimal price
isFav - Was the horse a favorite before the start? There can be more than one favorite in a race
trainerName - Trainer name
jockeyName - Jockey name
position - Finishing position, 40 if the horse didn't finish
positionL - Distance behind the horse immediately ahead, in horse lengths
dist - Distance behind the winner, in horse lengths
weightSt - Horse weight in St
weightLb - Horse weight in Lb
overWeight - Overweight code
outHandicap - Handicap
headGear - Head gear code
RPR - RP Rating
TR - Topspeed
OR - Official Rating
father - Horse's Father name
mother - Horse's Mother name
gfather - Horse's Grandfather name
runners - Runners total
margin - Sum of decimalPrices for the race
weight - Horse weight in kg
res_win - Horse won or not
res_place - Horse placed or not
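The derived columns can be illustrated with a short sketch. It assumes that `weightSt` and `weightLb` are the stone and pound parts of a single weight (1 st = 14 lb, 1 lb = 0.45359237 kg), and that `margin` is the plain sum of the per-horse `decimalPrice` values in a race. Whether the dataset's `weight` and `margin` columns were computed exactly this way is an assumption, and the function names are hypothetical.

```python
LB_PER_STONE = 14
KG_PER_LB = 0.45359237


def weight_kg(weight_st, weight_lb):
    """Combine stones and pounds into kilograms."""
    return (weight_st * LB_PER_STONE + weight_lb) * KG_PER_LB


def implied_probability(decimal_odds):
    """1 / decimal odds, matching the description of decimalPrice."""
    return 1.0 / decimal_odds


def race_margin(decimal_prices):
    """Sum of decimalPrice values for all runners in one race."""
    return sum(decimal_prices)


# A three-runner race priced at decimal odds 2.0, 4.0 and 4.0 has no overround.
prices = [implied_probability(o) for o in (2.0, 4.0, 4.0)]
print(race_margin(prices), round(weight_kg(9, 0), 2))
```

A margin above 1 indicates the bookmakers' overround, so it can serve as a rough per-race normalizer for the implied probabilities.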
forward.csv contains information collected before a race starts. The odds are averages from Oddschecker.com; RPRc and TRc also have current values.
Please be aware that the prices provided are the SP (starting prices), which are not available before a race starts. This means prices before the start may differ from the SP. Favorites usually stay the same, though, and the prices on them are often higher than the SP. In any case, you can't predict profit accurately based only on SP prices.
I suppose predicting horse racing results with machine learning methods is a difficult task. There are no highly correlated features, and the outcome classes are imbalanced. I tried to make my own predictions, but with no luck. I hope to get some inspiration from your research. Please share your experience with everyone or just with me. Thank you!
The data provided has been collected from public open websites that impose no sign-ups, log-ins or other restrictions. Please do not use this data for any commercial purposes.
Abstract copyright UK Data Service and data collection copyright owner.
Centre for Longitudinal Study Information and User Support (CeLSIUS) exists to assist people in UK higher education to analyse the Office for National Statistics Longitudinal Study (ONS LS). CeLSIUS is part of the Economic and Social Research Council's (ESRC) Census Programme for 2006-2011. Part of the service it offers is the provision of web-based tools and extracts, including the subset of the ONS LS.
https://www.pioneerdatahub.co.uk/data/data-request-process/
Background:
A PIONEER synthetic dataset of 20,000 ethnically diverse hypertrophic cardiomyopathy patients created using CT-GAN generative AI. Data includes clinical & biological phenotyping, co-morbidities, investigations (ECG, ECHO), procedures & outcomes.
Well-created synthetic data establishes a governance risk-free environment for algorithm development & experimentation. This includes evaluating new treatment models, care management systems, clinical decision support, and more. Synthetic data is of particular use in rare diseases, where real data may be in short supply, or to replicate disease in less common patient demographics (e.g. ethnicities).
Familial hypertrophic cardiomyopathy (HCM) is a rare genetic condition characterised by thickening (hypertrophy) of the cardiac muscle, usually of the interventricular septum. Arrhythmias can be life threatening and HCM is associated with an increased risk of sudden death. Some affected individuals develop potentially fatal heart failure, which may require heart transplantation. Approximately 130,000 people have HCM in the UK, but there is a significant burden of undiagnosed disease and diagnostic delay.
Geography: The West Midlands (WM) has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.
Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.
Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can provide real world data to meet bespoke requirements.
Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.