13 datasets found
  1. Live births, by month

    • www150.statcan.gc.ca
    • ouvert.canada.ca
    • +1 more
    Updated Sep 24, 2025
    Cite
    Government of Canada, Statistics Canada (2025). Live births, by month [Dataset]. http://doi.org/10.25318/1310041501-eng
    Dataset updated
    Sep 24, 2025
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    Area covered
    Canada
    Description

    Number and percentage of live births, by month of birth, 1991 to most recent year.

  2. Data for: World's human migration patterns in 2000-2019 unveiled by...

    • data.niaid.nih.gov
    Updated Jul 11, 2024
    Cite
    Niva, Venla; Horton, Alexander; Virkki, Vili; Heino, Matias; Kallio, Marko; Kinnunen, Pekka; Abel, Guy J; Muttarak, Raya; Taka, Maija; Varis, Olli; Kummu, Matti (2024). Data for: World's human migration patterns in 2000-2019 unveiled by high-resolution data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7997133
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Wittgenstein Centre for Demography and Global Human Capital (http://www.oeaw.ac.at/wic/)
    Aalto University
    Authors
    Niva, Venla; Horton, Alexander; Virkki, Vili; Heino, Matias; Kallio, Marko; Kinnunen, Pekka; Abel, Guy J; Muttarak, Raya; Taka, Maija; Varis, Olli; Kummu, Matti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    World
    Description

    This dataset provides a global gridded (5 arc-min resolution) detailed annual net-migration dataset for 2000-2019. We also provide global annual birth and death rate datasets, which were used to estimate the net-migration, for the same years. The dataset is presented in detail, with some further analyses, in the following publication. Please cite this paper when using the data.

    Niva et al. 2023. World's human migration patterns in 2000-2019 unveiled by high-resolution data. Nature Human Behaviour 7: 2023–2037. Doi: https://doi.org/10.1038/s41562-023-01689-4

    You can explore the data in our online net-migration explorer: https://wdrg.aalto.fi/global-net-migration-explorer/

    Short introduction to the data

    For the dataset, we collected, gap-filled, and harmonised:

    • comprehensive national-level birth and death rate datasets for altogether 216 countries or sovereign states; and

    • sub-national data for births (covering 163 countries, divided altogether into 2555 admin units) and deaths (123 countries, 2067 admin units).

    These birth and death rates were downscaled with selected socio-economic indicators to a 5 arc-min grid for each year in 2000-2019. This allowed us to calculate the 'natural' population change; comparing it with the reported changes in population let us estimate the annual net-migration. For more on the methods and calculations, see Niva et al. (2023).
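
    The comparison of 'natural' and reported population change described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function name, the use of the start-of-year population as the base for natural change, and the example numbers are all assumptions. The full method is in Niva et al. (2023).

```python
def net_migration(pop_start, pop_end, birth_rate, death_rate):
    """Estimate net migration (persons) for one grid cell over one year.

    birth_rate and death_rate are per 1000 people per year, matching the
    units of the rate rasters. Using pop_start as the base population for
    natural change is a simplifying assumption of this sketch.
    """
    natural_change = pop_start * (birth_rate - death_rate) / 1000.0
    reported_change = pop_end - pop_start
    return reported_change - natural_change


# Hypothetical cell: 10,000 people growing to 10,300 within one year.
estimate = net_migration(10_000, 10_300, birth_rate=20.0, death_rate=8.0)
print(estimate)  # 180.0
```

    In this made-up cell, natural change accounts for 120 of the 300 additional people, so the remaining 180 are attributed to net in-migration.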

    We recommend using the data either over multiple years (we provide 3-, 5- and 20-year net-migration sums at gridded level) or aggregated over a larger area (we provide adm0, adm1 and adm2 level geospatial polygon files), because the gridded annual data contain some noise.

    Due to copyright issues we are not able to release all the original data collected, but it can be requested from the authors.

    List of datasets

    Birth and death rates:

    raster_birth_rate_2000_2019.tif: Gridded birth rate for 2000-2019 (5 arc-min; multiband tif)

    raster_death_rate_2000_2019.tif: Gridded death rate for 2000-2019 (5 arc-min; multiband tif)

    tabulated_adm1adm0_birth_rate.csv: Tabulated sub-national birth rate for 2000-2019 at the division to which data was collected (subnational data when available, otherwise national)

    tabulated_adm1adm0_death_rate.csv: Tabulated sub-national death rate for 2000-2019 at the division to which data was collected (subnational data when available, otherwise national)

    Net-migration:

    raster_netMgr_2000_2019_annual.tif: Gridded annual net-migration 2000-2019 (5 arc-min; multiband tif)

    raster_netMgr_2000_2019_3yrSum.tif: Gridded 3-yr sum net-migration 2000-2019 (5 arc-min; multiband tif)

    raster_netMgr_2000_2019_5yrSum.tif: Gridded 5-yr sum net-migration 2000-2019 (5 arc-min; multiband tif)

    raster_netMgr_2000_2019_20yrSum.tif: Gridded 20-yr sum net-migration 2000-2019 (5 arc-min)

    polyg_adm0_dataNetMgr.gpkg: National (adm 0 level) net-migration geospatial file (gpkg)

    polyg_adm1_dataNetMgr.gpkg: Provincial (adm 1 level) net-migration geospatial file (gpkg) (if not adm 1 level division, adm 0 used)

    polyg_adm2_dataNetMgr.gpkg: Communal (adm 2 level) net-migration geospatial file (gpkg) (if not adm 2 level division, adm 1 used; and if not adm 1 level division either, adm 0 used)

    Files to run online net migration explorer

    masterData.rds and admGeoms.rds are related to our online ‘Net-migration explorer’ tool (https://wdrg.aalto.fi/global-net-migration-explorer/). The source code of this application is available at https://github.com/vvirkki/net-migration-explorer. Running the application locally requires these two .rds files from this repository.

    Metadata

    Grids:

    Resolution: 5 arc-min (0.083333333 degrees)

    Spatial extent: -180, 180; -90, 90 (xmin, xmax, ymin, ymax)

    Coordinate ref system: EPSG:4326 - WGS 84

    Format: Multiband geotiff; each band for each year over 2000-2019

    Units:

    Birth and death rates: births/deaths per 1000 people per year

    Net-migration: persons per 1000 people per time period (year, 3yr, 5yr, 20yr, depending on the dataset)
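
    The grid metadata above implies a fixed raster size and a simple year-to-band mapping. The sketch below works these out in plain Python. It assumes that bands are stored in chronological order (band 1 = 2000) and that row 0 is the northern edge, as is usual for GeoTIFFs; neither is stated in the metadata, and the function names are illustrative.

```python
CELLS_PER_DEGREE = 12  # 5 arc-min cells: 60 / 5 = 12 per degree
XMIN, XMAX, YMIN, YMAX = -180.0, 180.0, -90.0, 90.0
FIRST_YEAR, LAST_YEAR = 2000, 2019


def grid_shape():
    """Return (rows, cols) of the global 5 arc-min grid."""
    rows = int((YMAX - YMIN) * CELLS_PER_DEGREE)
    cols = int((XMAX - XMIN) * CELLS_PER_DEGREE)
    return rows, cols


def band_for_year(year):
    """1-based band index, ASSUMING bands are in chronological order."""
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"year must be in {FIRST_YEAR}-{LAST_YEAR}")
    return year - FIRST_YEAR + 1


def cell_for_lonlat(lon, lat):
    """0-based (row, col), ASSUMING row 0 is the northern edge."""
    row = int((YMAX - lat) * CELLS_PER_DEGREE)
    col = int((lon - XMIN) * CELLS_PER_DEGREE)
    return row, col


print(grid_shape())         # (2160, 4320)
print(band_for_year(2019))  # 20
```

    Before relying on the band mapping, check the band descriptions in the actual file with a raster library.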

    Geospatial polygon (gpkg) files:

    Spatial extent: -180, 180; -90, 83.67 (xmin, xmax, ymin, ymax)

    Temporal extent: annual over 2000-2019

    Coordinate ref system: EPSG:4326 - WGS 84

    Format: gpkg

    Units:

    Net-migration: persons per 1000 people per year

  3. 'Climate Just' data - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Jun 9, 2025
    Cite
    ckan.publishing.service.gov.uk (2025). 'Climate Just' data - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/climate-just-data
    Dataset updated
    Jun 9, 2025
    Dataset provided by
    CKAN (https://ckan.org/)
    Description

    The 'Climate Just' Map Tool shows the geography of England’s vulnerability to climate change at a neighbourhood scale. The Climate Just Map Tool shows which places may be most disadvantaged through climate impacts. It aims to raise awareness about how social vulnerability combined with exposure to hazards, like flooding and heat, may lead to uneven impacts in different neighbourhoods, causing climate disadvantage. Climate Just Map Tool includes maps on:

    • Flooding (river/coastal and surface water)
    • Heat
    • Fuel poverty

    The flood and heat analysis for England is based on an assessment of social vulnerability in 2011 carried out by the University of Manchester. This has been combined with national datasets on exposure to flooding, using Environment Agency data, and exposure to heat, using UKCP09 data. Data is available at Middle Super Output Area (MSOA) level across England. Summaries of numbers of MSOAs are shown in the file named Climate Just-LA_summaries_vulnerability_disadvantage_Dec2014.xls

    Indicators include:

    Climate Just-Flood disadvantage_2011_Dec2014.xlsx
    • Fluvial flood disadvantage index
    • Pluvial flood disadvantage index (1 in 30 years)
    • Pluvial flood disadvantage index (1 in 100 years)
    • Pluvial flood disadvantage index (1 in 1000 years)

    Climate Just-Flood_hazard_exposure_2011_Dec2014.xlsx
    • Percentage of area at moderate and significant risk of fluvial flooding
    • Percentage of area at risk of surface water flooding (1 in 30 years)
    • Percentage of area at risk of surface water flooding (1 in 100 years)
    • Percentage of area at risk of surface water flooding (1 in 1000 years)

    Climate Just-SSVI_indices_2011_Dec2014.xlsx
    • Sensitivity - flood and heat
    • Ability to prepare - flood
    • Ability to respond - flood
    • Ability to recover - flood
    • Enhanced exposure - flood
    • Ability to prepare - heat
    • Ability to respond - heat
    • Ability to recover - heat
    • Enhanced exposure - heat
    • Socio-spatial vulnerability index - flood
    • Socio-spatial vulnerability index - heat

    Climate Just-SSVI_indicators_2011_Dec2014.xlsx
    • % children < 5 years old
    • % people > 75 years old
    • % people with long term ill-health/disability (activities limited a little or a lot)
    • % households with at least one person with long term ill-health/disability (activities limited a little or a lot)
    • % unemployed
    • % in low income occupations (routine & semi-routine)
    • % long term unemployed / never worked
    • % households with no adults in employment and dependent children
    • Average weekly household net income estimate (equivalised after housing costs) (Pounds)
    • % all pensioner households
    • % households rented from social landlords
    • % households rented from private landlords
    • % born outside UK and Ireland
    • Flood experience (% area associated with past events)
    • Insurance availability (% area with 1 in 75 chance of flooding)
    • % people with
    • % unemployed
    • % in low income occupations (routine & semi-routine)
    • % long term unemployed / never worked
    • % households with no adults in employment and dependent children
    • Average weekly household net income estimate (equivalised after housing costs) (Pounds)
    • % all pensioner households
    • % born outside UK and Ireland
    • Flood experience (% area associated with past events)
    • Insurance availability (% area with 1 in 75 chance of flooding)
    • % single pensioner households
    • % lone parent household with dependent children
    • % people who do not provide unpaid care
    • % disabled (activities limited a lot)
    • % households with no car
    • Crime score (IMD)
    • % area not road
    • Density of retail units (count /km2)
    • % change in number of local VAT-based units
    • % people with
    • % not home workers
    • % unemployed
    • % in low income occupations (routine & semi-routine)
    • % long term unemployed / never worked
    • % households with no adults in employment and dependent children
    • Average weekly household net income estimate (Pounds)
    • % all pensioner households
    • % born outside UK and Ireland
    • Insurance availability (% area with 1 in 75 chance of flooding)
    • % single pensioner households
    • % lone parent household with dependent children
    • % people who do not provide unpaid care
    • % disabled (activities limited a lot)
    • % households with no car
    • Travel time to nearest GP by walk/public transport (mins - representative time)
    • % of at risk population (no car) outside of 15 minutes by walk/public transport to nearest GP
    • Number of GPs within 15 minutes by walk/public transport
    • Number of GPs within 15 minutes by car
    • Travel time to nearest hospital by walk/public transport (mins - representative time)
    • Travel time to nearest hospital by car (mins - representative time)
    • % of at risk population outside of 30 minutes by walk/PT to nearest hospital
    • Number of hospitals within 30 minutes by walk/public transport
    • Number of hospitals within 30 minutes by car
    • % people with
    • % not home workers
    • Change in median house price 2004-09 (Pounds)
    • % area not green space
    • Area of domestic buildings per area of domestic gardens (m2 per m2)
    • % area not blue space
    • Distance to coast (m)
    • Elevation (m)
    • % households with the lowest floor level: Basement or semi-basement
    • % households with the lowest floor level: ground floor
    • % households with the lowest floor level: fifth floor or higher

  4. Data from: Adaptive benefits of group fission: evidence from blue monkeys

    • data.niaid.nih.gov
    • search.dataone.org
    • +2 more
    zip
    Updated May 3, 2025
    Cite
    Rory Wakeford; Marina Cords (2025). Adaptive benefits of group fission: evidence from blue monkeys [Dataset]. http://doi.org/10.5061/dryad.0cfxpnwbb
    Available download formats: zip
    Dataset updated
    May 3, 2025
    Dataset provided by
    Columbia University
    Authors
    Rory Wakeford; Marina Cords
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Permanent group fissions are thought to represent the tipping point at which a group has become too large and therefore splits into two, allowing for an evaluation of the consequences of living in too large a group and if fission can alleviate those costs. We first examined how adult female activity budgets (feeding, moving, resting) differed among periods surrounding (i.e., before and after) multiple fission events, accounting for seasonal variation, and using five mixed-effects beta regression models. We then assessed how rates of agonism differed among periods surrounding these fission events using two negative binomial models, one examining all agonistic interactions and one focusing on agonistic interactions that were lost. Our third analysis used a generalized linear mixed model to investigate a female’s likelihood of conception in a given month, based on her individual characteristics, which post-fission group size she joined, and whether that month fell before vs. after fission, vs. neither. Finally, we used a mixed effects Cox proportional hazards model to evaluate the relationship between infant survival, whether the infant’s mother joined the small vs. large post-fission group, and whether the month in which the infant was born fell before vs. after fission vs. neither. Here we present the three datasets used for these analyses, thus presenting individualized records of both behavioral and life history variables in relation to group fissions.

    Methods

    The datasets relate to seven fission events that occurred between 1999 and 2019 in the blue monkey population inhabiting the Kakamega Forest, western Kenya. 
We used data from all seven fissions for records of female conceptions and infant survival and data from the last five fissions only (2008 to 2019) for records of female behavior, because only these last five fissions occurred while the long-term monitoring protocol included focal animal follows of adult females, which allowed systematic recording of activity. Throughout the study period, a team of trained observers monitored the study groups for all or part of a day on a near daily basis. All group members could be identified as individuals. Observers documented which individuals were present and whether any sub-grouping occurred, meaning that group members were separated into two parties that traveled and foraged separately for at least part of the day. They also recorded all observed agonistic interactions, noting winners and losers when one and only one animal (the loser) showed submission. Beginning in September 2006, the team also conducted systematic 30-minute focal animal follows of adult females, selecting subjects to maintain even sampling across females and across the morning (until 10:30 AM), midday (10:30 AM-14:30 PM) and afternoon (14:30 and later). During focal follows, observers recorded the subject’s activity at 1-minute intervals: main activity categories included feeding (if the subject ingested food on or within 2 sec of the minute mark), moving (involving hindlimb locomotion), and resting. Observers also noted the food item if the focal subject was feeding and the identity of any social partner. Observers recorded all occurrences of agonistic interactions involving the focal subject during focal follows; agonistic interactions between the same opponents were considered separate events if there was a lull in aggressive behavior for at least 30 seconds. We used the census data to identify periods of sub-grouping. 
Specifically, we identified a sub-grouping period as when the group was split into spatially distinct parties on at least five days, and consecutive sub-grouping days were less than 14 days apart. We considered a fission to be complete when the two sub-groups had their first aggressive intergroup encounter. We designated four 60-day periods representing different times relative to each sub-grouping period. The earliest period was centered on the day that fell a year before the onset of sub-grouping. The last day of the second period fell immediately (a week) before the onset of sub-grouping, and the first day of the third period fell immediately (a week) after fission was complete. The fourth and latest period was centered on the day that fell one year after the date of fission. We aggregated activity records from focal follows for each female in each of the four periods. We calculated individuals’ activity budgets for each period by dividing the total number of instantaneous records when a female performed a given activity by the total number of instantaneous records when she was a focal subject. We accounted for seasonal variation by calculating a population-wide mean percentage for a given activity for each month using all focal follows from 2006 to 2013. We then calculated the mean during the time of year matching each 60-day analysis period as a weighted mean based on the number of days of each month that matched the analysis period. Finally, we expressed the percentage of a female’s activity budget as a deviation in percentage points from the mean time spent on that activity during the same time of year. To investigate how agonism rates varied by period, we aggregated all agonism that a female experienced during her focal samples in each period, breaking it down into total agonism and agonism losses. Agonistic interactions included aggressive (spatial displacements, threats, chases, contact aggression) and submissive (flee, cower, gecker, trill) behavior. 
Females did not need to be present in all four periods to be included in either analysis. However, we excluded females that were sampled for less than 6 hours in a given period, as these females were prone to having outlying data values. To analyze likelihood of conception, we focused on females who were adults at any time from October 1997 to December 2022. Females that were already reproductively mature (i.e., had already conceived their first offspring) in October 1997 were included in the dataset beginning that month. Females that matured after October 1997 were added to the dataset starting the month after their first confirmed conception. For females that died during the study period, the last month we included in the dataset was 7 months before their death or the month of their last birth, whichever occurred later. All other females remained in the data set through December 2022. We excluded the month of a female’s first conception because it had missing values for certain predictors, including time since last conception. Conceptions could be confirmed only if an offspring was born, whether it was first seen alive or dead (either stillbirth or peri-natal death). Therefore, the month of a female’s first conception fell 176 days before her first birth of a full-term infant (whether living or stillborn). For one female that had a miscarriage after her first confirmed birth, we omitted all months from seven months before the miscarriage to the month after the subsequent conception (because we could not confirm a value for the time since last conception for these months). We assigned each adult female a monthly reproductive status (pregnant, gave birth, conceived, or non-reproductive). We categorized a female as “pregnant” if she was pregnant the entire month, “gave birth” if she gave birth during that month, “conceived” if she conceived during that month, and “non-reproductive” if no other status applied. 
We created three categorical variables to assess the influence of fission on probability of conception at six months, one year, and two years. We calculated time since last conception and maternal age to the nearest month. We classified lactation stage as one of five categories based on the age of her most recent surviving infant: 1 (infant age < 5 months), 2 (infant age 5-9 months), 3 (infant age 10-15 months), 4 (infant age 15-32 months), and 5 (infant age > 32 months). We also created an exposure variable that equaled the number of days in each month in which a female could conceive. For months during which females gave birth, this value was the number of days remaining in the month after the birth. Pregnant females, who took a value of 0, were excluded from the model of conception probability. We added a variable identifying which post-fission group a female ended up in for months falling within 2 years before or after a fission event. For the infant survival analysis, we created three categorical variables to assess the influence of fission on infant survival, assigning each infant as being born before vs. after fission vs. neither, and using timescales of six months, one year, and two years to assess “before” and “after”. We used the infant’s mother’s age at the time of the infant’s birth and designated whether the infant was born during the peak birth season (December-March) or not. We added a variable identifying which post-fission group an infant’s mother ended up in for infants born two years before or after fission.
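
    The seasonally adjusted activity budget described in the Methods above can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function names, the data layout (a list of instantaneous records per female and period) and the example numbers are hypothetical.

```python
def activity_share(records, activity):
    """Percentage of a female's instantaneous records spent on one activity."""
    return 100.0 * records.count(activity) / len(records)


def seasonal_baseline(monthly_mean_pct, days_in_period):
    """Weighted mean of population-wide monthly percentages.

    Weights are the number of days of each month that fall inside the
    60-day analysis period.
    """
    total_days = sum(days_in_period.values())
    weighted = sum(monthly_mean_pct[m] * d for m, d in days_in_period.items())
    return weighted / total_days


def deviation_from_baseline(records, activity, monthly_mean_pct, days_in_period):
    """Deviation, in percentage points, from the seasonal mean."""
    return activity_share(records, activity) - seasonal_baseline(
        monthly_mean_pct, days_in_period
    )


# Hypothetical female: 100 focal records in a period spanning two months.
records = ["feeding"] * 45 + ["moving"] * 15 + ["resting"] * 40
monthly_feeding_pct = {"Mar": 40.0, "Apr": 50.0}  # made-up population means
days = {"Mar": 15, "Apr": 45}                     # 60-day period
print(deviation_from_baseline(records, "feeding", monthly_feeding_pct, days))  # -2.5
```

    Here the female fed in 45% of her records against a seasonal baseline of 47.5%, so her value is 2.5 percentage points below the mean for that time of year.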

  5. PISA Test Scores

    • kaggle.com
    zip
    Updated Dec 27, 2019
    Cite
    piAI (2019). PISA Test Scores [Dataset]. https://www.kaggle.com/datasets/econdata/pisa-test-scores/code
    Available download formats: zip (74778 bytes)
    Dataset updated
    Dec 27, 2019
    Authors
    piAI
    Description

    Context

    The Programme for International Student Assessment (PISA) is a test given every three years to 15-year-old students from around the world to evaluate their performance in mathematics, reading, and science. This test provides a quantitative way to compare the performance of students from different parts of the world. In this homework assignment, we will predict the reading scores of students from the United States of America on the 2009 PISA exam.

    The datasets pisa2009train.csv and pisa2009test.csv contain information about the demographics and schools for American students taking the exam, derived from 2009 PISA Public-Use Data Files distributed by the United States National Center for Education Statistics (NCES). While the datasets are not supposed to contain identifying information about students taking the test, by using the data you are bound by the NCES data use agreement, which prohibits any attempt to determine the identity of any student in the datasets.

    Each row in the datasets pisa2009train.csv and pisa2009test.csv represents one student taking the exam. The datasets have the following variables:

    Content

    grade: The grade in school of the student (most 15-year-olds in America are in 10th grade)

    male: Whether the student is male (1/0)

    raceeth: The race/ethnicity composite of the student

    preschool: Whether the student attended preschool (1/0)

    expectBachelors: Whether the student expects to obtain a bachelor's degree (1/0)

    motherHS: Whether the student's mother completed high school (1/0)

    motherBachelors: Whether the student's mother obtained a bachelor's degree (1/0)

    motherWork: Whether the student's mother has part-time or full-time work (1/0)

    fatherHS: Whether the student's father completed high school (1/0)

    fatherBachelors: Whether the student's father obtained a bachelor's degree (1/0)

    fatherWork: Whether the student's father has part-time or full-time work (1/0)

    selfBornUS: Whether the student was born in the United States of America (1/0)

    motherBornUS: Whether the student's mother was born in the United States of America (1/0)

    fatherBornUS: Whether the student's father was born in the United States of America (1/0)

    englishAtHome: Whether the student speaks English at home (1/0)

    computerForSchoolwork: Whether the student has access to a computer for schoolwork (1/0)

    read30MinsADay: Whether the student reads for pleasure for 30 minutes/day (1/0)

    minutesPerWeekEnglish: The number of minutes per week the student spends in English class

    studentsInEnglish: The number of students in this student's English class at school

    schoolHasLibrary: Whether this student's school has a library (1/0)

    publicSchool: Whether this student attends a public school (1/0)

    urban: Whether this student's school is in an urban area (1/0)

    schoolSize: The number of students in this student's school

    readingScore: The student's reading score, on a 1000-point scale
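
    As a starting point for the prediction task, the sketch below reads rows in the layout described above and scores a mean-only baseline by root-mean-square error. The inline rows are hypothetical and are not taken from the PISA files; only three of the listed columns are used, and the real files may contain missing values that this sketch does not handle.

```python
import csv
import io
import math

# Hypothetical rows for illustration only; NOT real PISA records.
SAMPLE = """grade,male,readingScore
10,1,480.0
10,0,520.0
11,0,560.0
9,1,440.0
"""


def load_rows(handle):
    """Parse rows, converting the columns used here to numbers."""
    return [
        {
            "grade": int(row["grade"]),
            "male": int(row["male"]),
            "readingScore": float(row["readingScore"]),
        }
        for row in csv.DictReader(handle)
    ]


def baseline_rmse(train, test):
    """Predict the training mean for every test student; return (mean, RMSE)."""
    mean_score = sum(r["readingScore"] for r in train) / len(train)
    squared_error = sum((r["readingScore"] - mean_score) ** 2 for r in test)
    return mean_score, math.sqrt(squared_error / len(test))


rows = load_rows(io.StringIO(SAMPLE))
mean_score, rmse = baseline_rmse(rows[:2], rows[2:])
print(mean_score, rmse)  # 500.0 60.0
```

    With the actual data, one would pass open file handles for pisa2009train.csv and pisa2009test.csv to load_rows instead of the inline sample; any model built on the other variables should beat this baseline.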

    Acknowledgements

    MITx ANALYTIX

  6. NCDS8; NCDS

    • datacatalogue.ukdataservice.ac.uk
    Updated Apr 15, 2024
    Cite
    University of London, Institute of Education, Centre for Longitudinal Studies (2024). NCDS8; NCDS [Dataset]. http://doi.org/10.5255/UKDA-SN-6137-2
    Dataset updated
    Apr 15, 2024
    Dataset provided by
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    University of London, Institute of Education, Centre for Longitudinal Studies
    Area covered
    United Kingdom
    Description

    The National Child Development Study (NCDS) is a continuing longitudinal study that seeks to follow the lives of all those living in Great Britain who were born in one particular week in 1958. The aim of the study is to improve understanding of the factors affecting human development over the whole lifespan.

    The NCDS has its origins in the Perinatal Mortality Survey (PMS) (the original PMS study is held at the UK Data Archive under SN 2137). This study was sponsored by the National Birthday Trust Fund and designed to examine the social and obstetric factors associated with stillbirth and death in early infancy among the 17,000 children born in England, Scotland and Wales in that one week. Selected data from the PMS form NCDS sweep 0, held alongside NCDS sweeps 1-3, under SN 5565.

    Survey and Biomeasures Data (GN 33004):

    To date there have been ten attempts to trace all members of the birth cohort in order to monitor their physical, educational and social development. The first three sweeps were carried out by the National Children's Bureau, in 1965, when respondents were aged 7, in 1969, aged 11, and in 1974, aged 16 (these sweeps form NCDS1-3, held together with NCDS0 under SN 5565). The fourth sweep, also carried out by the National Children's Bureau, was conducted in 1981, when respondents were aged 23 (held under SN 5566). In 1985 the NCDS moved to the Social Statistics Research Unit (SSRU) - now known as the Centre for Longitudinal Studies (CLS). The fifth sweep was carried out in 1991, when respondents were aged 33 (held under SN 5567). For the sixth sweep, conducted in 1999-2000, when respondents were aged 42 (NCDS6, held under SN 5578), fieldwork was combined with the 1999-2000 wave of the 1970 Birth Cohort Study (BCS70), which was also conducted by CLS (and held under GN 33229). The seventh sweep was conducted in 2004-2005 when the respondents were aged 46 (held under SN 5579), the eighth sweep was conducted in 2008-2009 when respondents were aged 50 (held under SN 6137), the ninth sweep was conducted in 2013 when respondents were aged 55 (held under SN 7669), and the tenth sweep was conducted in 2020-24 when the respondents were aged 60-64 (held under SN 9412).

    A Secure Access version of the NCDS is available under SN 9413, containing detailed sensitive variables not available under Safeguarded access (currently only sweep 10 data). Variables include uncommon health conditions (including age at diagnosis), full employment codes and income/finance details, and specific life circumstances (e.g. pregnancy details, year/age of emigration from GB).

    Four separate datasets covering responses to NCDS over all sweeps are available. National Child Development Deaths Dataset: Special Licence Access (SN 7717) covers deaths; National Child Development Study Response and Outcomes Dataset (SN 5560) covers all other responses and outcomes; National Child Development Study: Partnership Histories (SN 6940) includes data on live-in relationships; and National Child Development Study: Activity Histories (SN 6942) covers work and non-work activities. Users are advised to order these studies alongside the other waves of NCDS.

    From 2002-2004, a Biomedical Survey was completed and is available under Safeguarded Licence (SN 8731) and Special Licence (SL) (SN 5594). Proteomics analyses of blood samples are available under SL SN 9254.

    Linked Geographical Data (GN 33497):
    A number of geographical variables are available, under more restrictive access conditions, which can be linked to the NCDS EUL and SL access studies.

    Linked Administrative Data (GN 33396):
    A number of linked administrative datasets are available, under more restrictive access conditions, which can be linked to the NCDS EUL and SL access studies. These include a Deaths dataset (SN 7717) available under SL and the Linked Health Administrative Datasets (SN 8697) available under Secure Access.

    Multi-omics Data and Risk Scores Data (GN 33592)
    Proteomics analyses were run on the blood samples collected from NCDS participants in 2002-2004 and are available under SL SN 9254. Metabolomics analyses were conducted on respondents of sweep 10 and are available under SL SN 9411. Polygenic indices are available under SL SN 9439. Derived summary scores have been created that combine the estimated effects of many different genes on a specific trait or characteristic, such as a person's risk of Alzheimer's disease, asthma, substance abuse, or mental health disorders, for example. These scores can be combined with existing survey data to offer a more nuanced understanding of how cohort members' outcomes may be shaped.

    Additional Sub-Studies (GN 33562):
    In addition to the main NCDS sweeps, further studies have also been conducted on a range of subjects such as parent migration, unemployment, behavioural studies and respondent essays. The full list of NCDS studies available from the UK Data Service can be found on the NCDS series access data webpage.

    How to access genetic and/or bio-medical sample data from a range of longitudinal surveys:
    For information on how to access biomedical data from NCDS that are not held at the UKDS, see the CLS Genetic data and biological samples webpage.

    Further information about the full NCDS series can be found on the Centre for Longitudinal Studies website.

    NCDS8:
    The eighth sweep of NCDS was conducted in 2008-2009, when respondents were aged 50 years. The core aims of NCDS8 were to update the life history information collected in previous studies and to collect new information to help understand the ageing process. Many of the questions in the NCDS8 follow-up had been asked in earlier waves of the NCDS and the BCS, which allows comparisons both across the sweeps of NCDS and with the BCS cohort.

    The 2008-2009 survey comprised the following elements:
    • a 55-minute 'core' interview (including a Computer Assisted Personal Interview (CAPI), a Computer Assisted Self Interview (CASI), and a series of cognitive assessments)
    • a paper questionnaire
    Edition history:
    The NCDS8 has been deposited at the UK Data Archive in stages. For the first Archive edition (March 2009) an interim data file was deposited, based on 2,997 interviews completed between August and December 2008. This file comprised a subset of the full list of variables.
    The second Archive edition (the first full sample edition) was released in February 2010. This deposit included responses to the bulk of the questions fielded to cohort members in 2008-2009. The variables that were not included in this file were essentially those that required the most complex post-fieldwork editing in order to make them usable, mostly those that related to the four 'history' modules: housing history, relationship history, fertility history and economic activity history. In addition, variables relating to absent children, older children and specific details of recently-achieved qualifications were not included (although a series of derived summary variables relating to highest qualification were).
    For the third Archive edition (October 2012), the final version of NCDS8 was deposited. Two files, 'ncds_2008_followup.sav' and 'ncds8_unfolding_brackets.sav' replaced the previous single data file, a new User Guide replaced the previous version, and the Technical Report and Appendices were added to the documentation. For further details, see the User Guide.

  7. Optimal Timings Codebook.xlsx

    • figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    Emma Spillane; Shawn Walker; Christine McCourt (2023). Optimal Timings Codebook.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.15134376.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Emma Spillane; Shawn Walker; Christine McCourt
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    A single-centre retrospective case control study was conducted. The protocol defined cases as all neonatal deaths or NICU admissions occurring within an eight-year period from 2012 to 2020, although no neonatal deaths occurred during this period following a vaginal breech birth. Controls were identified as the two vaginal breech births directly prior to the case where no neonatal death nor NICU admission occurred. Two previous births were used to prevent bias on the understanding that an adverse outcome can affect clinical decision-making for subsequent births [12]. Any NICU admission was included because this indicates a neonate which requires additional observation, tests and/or intervention. Neonates who are not admitted are deemed as generally well [13]. Additionally, separation from the baby was considered an important outcome by our Patient and Public Involvement Group [14], who also requested more information on the timing of cord clamping.

    The study was conducted within the maternity unit at a London District General Hospital which serves a large population of 176,313 people. Two thirds are of white British ethnicity and one third from Black, Asian and Minority Ethnic (BAME) backgrounds. The community the hospital serves is thought of as affluent, with good employment rates, particularly employment in high-end jobs. The hospital itself serves a wider community than the borough it is situated within and has 5000 births per year. It has a level two NICU situated within the maternity unit. The Algorithm was not in use at the site, and none of the authors were employed by the Trust, during the time period covered by the study. Fifteen cases and thirty controls were identified from routine electronic health records. The Medical Record Numbers were sent to the Health Records Department for the complete files to be retrieved. Data were extracted by the lead researcher from the intrapartum care records and recorded anonymously in a Microsoft Excel spreadsheet.

    A structured data collection tool was developed based on Reitter et al. [13] The data collection tool consisted of information usually recorded in the notes during a breech birth and included: lead professional, type of breech, position, epidural, fetal monitoring, meconium, what emerged first, time each part of the breech born, documented manoeuvres used, time performed and information related to the condition of the neonate at birth.

    To calculate our sample size, based on the work of Reitter et al. [11], we hypothesised that the rate of exposure to a pelvis-to-head interval >3 minutes would be 25% among controls and 75% among cases. Using a case:control ratio of 1:2, we determined that 15 independent cases and 30 controls were required to infer an association between a pelvis-to-head interval >3 minutes and the composite neonatal outcome with a confidence interval of 95% and a power of 80%.

    First, we calculated the time to event interval for variables of interest. We then reported descriptive statistics for all variables, including means, medians and range for continuous variables. Exposures and confounders were converted into binary variables, reflecting the cut-offs used in the Algorithm. These were then tested against the primary outcome using the non-parametric chi-square, or Fisher’s Exact tests where cell frequencies were too small for the chi-square test. Logistic regression analysis was used to test the predictive values of meeting or exceeding the recommended time limits in the Physiological Breech Birth Algorithm. Further logistic regression analyses were conducted with all variables that showed an association with the composite neonatal outcome to determine their predictive value, and additional variables to explore their potential as confounding factors for investigation in future studies. Finally, a Receiver Operating Characteristics (ROC) curve analysis was conducted to compare the sensitivity and specificity of the 7-5-3 minute time limits. All statistical analyses were performed using IBM SPSS version 26.
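
    The description above mentions Fisher's Exact test for binary exposures with small cell counts. As a rough illustration only (the study itself used IBM SPSS), the following Python sketch computes a two-sided Fisher's exact p-value for a 2x2 table using the standard library. The function name and the example counts are hypothetical and are not taken from the study's data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts (NOT the study's data):
# rows = cases / controls, columns = exposed / not exposed.
p_value = fisher_exact_two_sided(11, 4, 8, 22)
print(f"two-sided p = {p_value:.4f}")
```

    The small tolerance in the comparison guards against floating-point ties between equally likely tables.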

  8. IMDB Dataset upto MAR 2025

    • kaggle.com
    zip
    Updated Apr 12, 2025
    Cite
    Arun Vithyasegar (2025). IMDB Dataset upto MAR 2025 [Dataset]. https://www.kaggle.com/datasets/arunvithyasegar/imdb-dataset-upto-mar-2025
    Explore at:
    Available download formats: zip (1774932228 bytes)
    Dataset updated
    Apr 12, 2025
    Authors
    Arun Vithyasegar
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    📦 About the Dataset

    This dataset provides a comprehensive snapshot of IMDb data up to March 2025, formatted as gzipped, tab-separated values (TSV) files encoded in UTF-8. Each file includes a header row detailing the columns, and missing values are denoted by \N.

    📁 Dataset Contents

    1. title.akas.tsv.gz: Contains alternative titles for films and shows, including:

    titleId: Unique identifier for the title.

    ordering: Sequence number for titles with the same titleId.

    title: Localized title.

    region: Region for this version of the title.

    language: Language of the title.

    types: Attributes like "alternative", "dvd", "festival", etc.

    attributes: Additional descriptors for the title.

    isOriginalTitle: Indicates if it's the original title (1) or not (0).

    2. title.basics.tsv.gz: Provides fundamental details about each title:

    tconst: Unique identifier for the title.

    titleType: Type of title (e.g., movie, short, tvSeries).

    primaryTitle: Main title used in promotional materials.

    originalTitle: Original title in the original language.

    isAdult: Indicates if the title is adult content (1) or not (0).

    startYear: Release year of the title.

    endYear: End year for TV series; otherwise null.

    runtimeMinutes: Runtime in minutes.

    genres: Up to three genres associated with the title.

    3. title.principals.tsv.gz: Details about principal cast and crew members:

    tconst: Unique identifier for the title.

    ordering: Sequence number for credits.

    nconst: Unique identifier for the person.

    category: Job category (e.g., actor, director).

    job: Specific job title, if applicable.

    characters: Names of characters played, if applicable.

    4. title.ratings.tsv.gz: Contains IMDb ratings and vote counts:

    tconst: Unique identifier for the title.

    averageRating: Weighted average of user ratings.

    numVotes: Number of votes the title has received.

    5. name.basics.tsv.gz: Information about individuals in the industry:

    nconst: Unique identifier for the person.

    primaryName: Name by which the person is most often credited.

    birthYear: Year of birth.

    deathYear: Year of death, if applicable.

    primaryProfession: Top three professions of the person.

    knownForTitles: Titles the person is known for.
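
    Since the files are described as gzipped, UTF-8, tab-separated and using \N for missing values, a reader along the following lines may be a reasonable starting point. This is a minimal standard-library sketch; the function name and the in-memory sample row are hypothetical, and quoting is disabled on the assumption that titles may contain literal quote characters.

```python
import csv
import gzip
import io


def read_imdb_tsv(source):
    """Yield one dict per row, mapping the missing-value marker to None.

    `source` may be a path or a binary file object holding gzipped TSV.
    """
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary text (an assumption).
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield {
                key: (None if value == r"\N" else value)
                for key, value in row.items()
            }


# Hypothetical two-line file built in memory, standing in for title.basics.tsv.gz.
sample = (
    "tconst\ttitleType\tprimaryTitle\tstartYear\tendYear\n"
    "tt_example\tshort\tAn \"Example\" Short\t1894\t\\N\n"
)
buffer = io.BytesIO(gzip.compress(sample.encode("utf-8")))
rows = list(read_imdb_tsv(buffer))
print(rows[0]["primaryTitle"], rows[0]["endYear"])
```

    For the real files, the same function can be given a file path such as "title.basics.tsv.gz"; the larger files are probably better processed row by row than loaded into a list.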

    💡 Inspiration

    This dataset is ideal for various analytical and machine learning projects, such as:

    Building movie recommendation systems.​

    Predicting movie ratings based on metadata.​

    Analyzing trends in genres, runtimes, and release years.​

    Studying the careers of actors, directors, and other professionals.​

  9. Data used in "A summer heatwave reduced activity, heart rate and autumn body...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Mar 31, 2023
    Cite
    Evans, Alina L.; Albon, Steve; Król, Elżbieta; Trondrud, L. Monica; Kumpula, Jouko; Speakman, John; Loe, Leif Egil; Pigeon, Gabriel; Ropstad, Erik (2023). Data used in "A summer heatwave reduced activity, heart rate and autumn body mass in a cold-adapted ungulate" [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001093609
    Explore at:
    Dataset updated
    Mar 31, 2023
    Authors
    Evans, Alina L.; Albon, Steve; Król, Elżbieta; Trondrud, L. Monica; Kumpula, Jouko; Speakman, John; Loe, Leif Egil; Pigeon, Gabriel; Ropstad, Erik
    Description

    Overview

    This dataset contains biologging data and R script used to produce the results in "A summer heatwave reduced activity, heart rate and autumn body mass in a cold-adapted ungulate", a submitted manuscript. The longitudinal data of female reindeer and calf body masses used in the paper is owned by the Finnish Reindeer Herders’ Association. Natural Resources Institute Finland (Luke) updates, saves and administrates this long-term reindeer herd data.

    Methods of data collection

    Animals and study area

    The study involved biologging (see below) 14 adult semi-domesticated reindeer females (Focal animals: Table S1) at the Kutuharju Reindeer Research Facility (Kaamanen, Northern Finland, 69° 8’ N, 26° 59’ E, Figure S1), during June–September 2018. Ten of these individuals had been intensively handled in June as part of another study (Trondrud, 2021). The 14 females were part of a herd of ~100 animals, belonging to the Reindeer Herders’ Association. The herding management included keeping reindeer in two large enclosures (~13.8 and ~15 km²) after calving until the rut, after which animals were moved to a winter enclosure (~15 km²) and then in spring to a calving paddock (~0.3 km²) to give birth (see Supporting Information for further details on the study area). Kutuharju reindeer graze freely on natural pastures from May to November and after that are provided with silage and pellets as a supplementary feed in winter. During the period from September to April animals are weighed 5–6 times. In September, body masses of the focal females did not differ from the rest of the herd.

    Heart rate (HR) and subcutaneous body temperature (Tsc) data

    In February 2018, the focal females were instrumented with a heart rate (HR) and temperature logger (DST centi-HRT, Star-Oddi, Gardabaer, Iceland). The surgical protocol is described in the Supporting Information. The DST centi-HRT sensors recorded HR and subcutaneous body temperature (Tsc) every 15 min. HR was automatically calculated from a 4-sec electrocardiogram (ECG) at 150 Hz measurement frequency, alongside an index for signal quality. Additional data processing is described in Supporting Information.

    Activity data

    The animals were fitted with collar-mounted tri-axial accelerometers (Vertex Plus Activity Sensor, Vectronic Aerospace GmbH, Berlin, Germany) to monitor their activity levels. These sensors recorded acceleration (g) in three directions representing back-forward, lateral, and dorsal-ventral movements at 8 Hz resolution. For each axis, partial dynamic body acceleration (PDBA) was calculated by subtracting the static acceleration using a 4 sec running average from the raw acceleration (Shepard et al., 2008). We estimated vectorial dynamic body acceleration (VeDBA) by calculating the square root of the sum of squared PDBAs (Wilson et al., 2020). We aggregated VeDBA data into 15-min sums (hereafter “sum VeDBA”) to match with HR and Tsc records. Corrections for time offsets are described in Supporting Information. Due to logger failures, only 10 of the 14 individuals had complete data from both loggers (activity and heart rate).

    Weather and climate data

    We set up a HOBO weather station (Onset Computer Corporation, Bourne, MA, USA) mounted on a 2 m tall tripod in May 2018 that measured air temperature (Ta, °C) at 15-minute intervals. The placement of the station was between the two summer paddocks. These measurements were matched to the nearest timestamps for VeDBA, HR and Tsc recordings. Also, we obtained weather records from the nearest public weather stations for the years 1990–2021 (Table S2). Weather station IDs and locations relative to the study area are shown in Figure S1 in the Supporting Information. The temperatures at the study site and the nearest weather station were strongly correlated (Pearson’s, r = 0.99), but temperatures were on average ~1.0°C higher at the study site (Figure S2).

    Statistical analyses

    All statistical analyses were conducted in R version 4.1.1 (The R Core Team, 2021). Mean values are presented with standard deviation (SD), and parameter estimates with standard error (SE).

    Environmental effects on activity states and transition probabilities

    We fitted hidden Markov models (HMM) to 15-min sum VeDBA using the package ‘momentuHMM’ (McClintock & Michelot, 2018). HMMs assume that the observed pattern is driven by an underlying latent state sequence (a finite Markov chain). These states can then be used as proxies to interpret the animal’s unobserved behaviour (Langrock et al., 2012). We assumed only two underlying states, thought to represent ‘inactive’ and ‘active’ (Figure S3). The ‘active’ state thus contains multiple forms of movement, e.g., foraging, walking, and running, but reindeer spend more than 50% of the time foraging in summer (Skogland, 1980). We fitted several HMMs to evaluate both external (temperature and time of day) and individual-level (calf status) effects on the probability to occupy each state (stationary state probabilities). The combination of the explanatory variables in each HMM is listed in Table S5. Ta was fitted as a continuous variable with piecewise polynomial spline with 8 knots, asserted from visual inspection of the model outputs. We included sine and cosine terms for time of day to account for cyclicity. In addition, to assess the impact of Ta on activity patterns, we fitted five temperature-day categories in interaction with time of day. These categories were based on 20% intervals of the distribution of temperature data from our local weather station, in the period 19 June to 19 August 2018, with ranges of < 10°C (cold), 10−13°C (cool), 13−16°C (intermediate), 16−20°C (warm) and ≥ 20°C (hot). We evaluated the significance of each variable on the transition probabilities from the confidence intervals of each estimate, and the goodness-of-fit of each model using Akaike information criteria (AIC) (Burnham & Anderson, 2002), retaining models within ΔAIC < 5. We extracted the most likely state occupied by an individual using the viterbi function, returning the optimal state pathway, i.e., a two-level categorical variable indicating whether the individual was most likely resting or active. We used this output to calculate daily activity budgets (% time spent active).

    Drivers of heart rate (HR) and subcutaneous body temperature (Tsc)

    We matched the activity states derived from the HMM to the HR and Tsc data. We opted to investigate the drivers of variation in HR and Tsc only within the inactive state. HR and Tsc were fitted as response variables in separate generalised additive mixed-effects models (GAMM), which included the following smooth terms: calendar day as a thin-plate regression spline, time of day (ToD, in hours, knots [k] = 10) as a cubic circular regression spline and individual as random intercept. All models were fitted using restricted maximum likelihood, a penalization value (λ) of 1.4 (Wood, 2017), and an autoregressive structure (AR1) to account for temporal autocorrelation. We used the ‘gam.check’ function from the ‘mgcv’ package to select k. The sum of VeDBA in the past 15 minutes was included as a predictor in all models. All models were fitted with the same set of explanatory variables: sum VeDBA, age, body mass (BM), lactation status, Ta, as well as the interaction between lactation status and Ta.

    Description of files

    1. Data
    • "kutuharju_weather.csv": weather data recorded from local weather station during study period
    • "Inari_Ivalo_lentoasema.csv": public weather data from weather station ID 102033, owned and managed by the Finnish Meteorological Institute
    • "activitydata.Rdata": dataset used in analyses of activity patterns in reindeer
    • "HR_temp_data.Rdata": dataset used in analyses of heart rate and body temperature responses in reindeer
    • "HRfigureData.Rdata" and "TempFigureData.Rdata": data files (lists) with model outputs generated in "heartrate_bodytemp_analyses.R" and used in "figures_in_paper.R"
    • "HMM_df_withStates.Rdata": data frame used in HMM models including output from viterbi function
    • "plotdf_m16.Rdata": dataframe for plotting output from model 16
    • "plotdf_m22.Rdata": dataframe for plotting output from model 22

    2. Scripts
    • "activitydata_HMMs.R": R script for data prep and hidden Markov models to analyse activity patterns in reindeer
    • "heartrate_bodytemp_analyses.R": R script for data prep and generalized additive mixed models to analyse heart rate and body temperature responses in reindeer
    • "figures_in_paper.R": R script for generating figures 1-3 in the manuscript

    3. HMM_model
    • "modelList.Rdata": list containing 2 items: string of all 25 HMM models created, and dataframe with model number and formula
    • "m16.Rdata" and "m22.Rdata": direct access to two best-fit models
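
    The activity processing described above (per-axis dynamic acceleration from a 4 sec running average at 8 Hz, VeDBA as the square root of the summed squares, then 15-min sums) can be sketched roughly as follows. This is an illustration in Python, not the authors' R code; the function names are hypothetical, and a centred running mean truncated at the edges is an assumption, since the description does not say how the window is aligned.

```python
from math import sqrt


def running_mean(values, window):
    """Centred running mean (assumed alignment); truncated at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def vedba(x, y, z, window):
    """Per-sample VeDBA from three raw acceleration axes."""
    dynamic = []
    for axis in (x, y, z):
        static = running_mean(axis, window)
        # Partial dynamic body acceleration: raw minus static component.
        dynamic.append([raw - s for raw, s in zip(axis, static)])
    return [sqrt(dx * dx + dy * dy + dz * dz) for dx, dy, dz in zip(*dynamic)]


def block_sums(values, block):
    """Sum consecutive blocks of `block` samples."""
    return [sum(values[i:i + block]) for i in range(0, len(values), block)]


SAMPLE_RATE_HZ = 8
WINDOW = 4 * SAMPLE_RATE_HZ            # 4 sec running average -> 32 samples
BLOCK = 15 * 60 * SAMPLE_RATE_HZ       # 15 min of 8 Hz data -> 7200 samples

# Tiny made-up signal, only to show the call pattern.
x = [0.0, 0.1, -0.1, 0.2, 0.0, -0.2, 0.1, 0.0]
y = [0.0] * 8
z = [1.0] * 8
print(block_sums(vedba(x, y, z, window=4), block=4))
```

    With real recordings, WINDOW and BLOCK would replace the small values used in the toy call.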

  10. Predict NHL Player Salaries

    • kaggle.com
    zip
    Updated Aug 18, 2017
    Cite
    Cam Nugent (2017). Predict NHL Player Salaries [Dataset]. https://www.kaggle.com/camnugent/predict-nhl-player-salaries
    Explore at:
    Available download formats: zip (187266 bytes)
    Dataset updated
    Aug 18, 2017
    Authors
    Cam Nugent
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context & Content

    This dataset features the salaries of 874 NHL players for the 2016/2017 season. I have randomly split the players into training (612 players) and test (262 players) populations. There are 151 predictor columns (described in the column legend section; if you're not familiar with hockey, the meaning of some of these may be a bit cryptic!) as well as a leading column with the player's 2016/2017 annual salary. For the test population the actual salaries have been broken off into a separate .csv file.
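
    A random split of this kind could be reproduced along the following lines. This is only a sketch: the dataset already ships with its own split, and the function name, the seed and the use of integer player indices are assumptions made for illustration.

```python
import random


def split_players(player_ids, n_train, seed=0):
    """Shuffle the ids with a fixed seed and cut them into train and test."""
    rng = random.Random(seed)
    shuffled = list(player_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# 874 players in total, 612 for training, which leaves 262 for testing.
train, test = split_players(range(874), n_train=612)
print(len(train), len(test))
```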

    Acknowledgements

    The raw Excel sheet was acquired from http://www.hockeyabstract.com/

    Inspiration

    Can you build a model to predict NHL players' salaries? What are the best predictors of how much a player will make?

    Column Legend

    Acronym - Meaning

    %FOT - Percentage of all on-ice faceoffs taken by this player.

    +/- - Plus/minus

    1G - First goals of a game

    A/60 - Events Against per 60 minutes, defaults to Corsi, but can be set to another stat

    A1 - First assists, primary assists

    A2 - Second assists, secondary assists

    BLK% - Percentage of all opposing shot attempts blocked by this player

    Born - Birth date

    C.Close - A player's shot attempt (Corsi) differential when the game was close

    C.Down - A player's shot attempt (Corsi) differential when the team was trailing

    C.Tied - A player's shot attempt (Corsi) differential when the team was tied

    C.Up - A player's shot attempt (Corsi) differential when the team was in the lead

    CA - Shot attempts allowed (Corsi, SAT) while this player was on the ice

    Cap Hit - The player's cap hit

    CBar - Crossbars hit

    CF - The team's shot attempts (Corsi, SAT) while this player was on the ice

    CF.QoC - A weighted average of the Corsi percentage of a player's opponents

    CF.QoT - A weighted average of the Corsi percentage of a player's linemates

    CHIP - Cap Hit of Injured Player is games lost to injury multiplied by cap hit per game

    City - City of birth

    Cntry - Country of birth

    DAP - Disciplined aggression proxy, which is hits and takeaways divided by minor penalties

    DFA - Dangerous Fenwick against, which is on-ice unblocked shot attempts weighted by shot quality

    DFF - Dangerous Fenwick for, which is on-ice unblocked shot attempts weighted by shot quality

    DFF.QoC - Quality of Competition metric based on Dangerous Fenwick, which is unblocked shot attempts weighted for shot quality

    DftRd - Round in which the player was drafted

    DftYr - Year drafted

    Diff - Events for minus events against, defaults to Corsi, but can be set to another stat

    Diff/60 - Events for minus events against, per 60 minutes, defaults to Corsi, but can be set to another stat

    DPS - Defensive point shares, a catch-all stat that measures a player's defensive contributions in points in the standings

    DSA - Dangerous shots allowed while this player was on the ice, which is rebounds plus rush shots

    DSF - The team's dangerous shots while this player was on the ice, which is rebounds plus rush shots

    DZF - Shifts this player has ended with a defensive zone faceoff

    dzFOL - Faceoffs lost in the defensive zone

    dzFOW - Faceoffs won in the defensive zone

    dzGAPF - Team goals allowed after faceoffs taken in the defensive zone

    dzGFPF - Team goals scored after faceoffs taken in the defensive zone

    DZS - Shifts this player has started with a defensive zone faceoff

    dzSAPF - Team shot attempts allowed after faceoffs taken in the defensive zone

    dzSFPF - Team shot attempts taken after faceoffs taken in the defensive zone

    E+/- - A player's expected +/-, based on his team and minutes played

    ENG - Empty-net goals

    Exp dzNGPF - Expected goal differential after faceoffs taken in the defensive zone, based on the number of them

    Exp dzNSPF - Expected shot differential after faceoffs taken in the defensive zone, based on the number of them

    Exp ozNGPF - Expected goal differential after faceoffs taken in the offensive zone, based on the number of them

    Exp ozNSPF - Expected shot differential after faceoffs taken in the offensive zone, based on the number of them

    F.Close - A player's unblocked shot attempt (Fenwick) differential when the game was close

    F.Down - A player's unblocked shot attempt (Fenwick) differential when the team was trailing

    F.Tied - A player's unblocked shot attempt (Fenwick) differential when the team was tied

    F.Up - A player's unblocked shot attempt (Fenwick) differential when the team was in the lead. Not the best acronym.

    F/60 - Events For per 60 minutes, defaults to Corsi, but can be set to another stat

    FA - Unblocked shot attempts allowed (Fenwick, USAT) while this player was on the ice

    FF - The team's unblocked shot attempts (Fenwick, USAT) while this player was on the ice

    First Name -

    FO% - Faceoff winning percentage

    FO%vsL - Faceoff winning percentage against lefthanded opponents

    FO%vsR - Faceoff winning percentage against righthanded opponents

    FOL - The team's faceoff losses...

  11. Growing Up in Scotland: Cohort 1, Sweep 8 Physical Activity Data, 2015-2016...

    • harmonydata.ac.uk
    Cite
    ScotCen Social Research, Growing Up in Scotland: Cohort 1, Sweep 8 Physical Activity Data, 2015-2016 / Studying Physical Activity in Children’s Environments across Scotland; SPACES [Dataset]. http://doi.org/10.5255/UKDA-SN-9120-1
    Explore at:
    Dataset provided by
    ScotCen Social Research
    University of Glasgow, MRC/CSO Social and Public Health Sciences Unit
    Time period covered
    2005 - Present
    Area covered
    Scotland
    Description

    The Growing Up in Scotland (GUS) study is a large-scale longitudinal social survey which follows the lives of several groups of Scottish children from infancy through childhood and adolescence. It aims to provide important information on children, young people and their families in Scotland. The study forms a central part of the Scottish Government's strategy for the long-term monitoring and evaluation of its policies for children and young people, with a specific focus on the early years. The study seeks both to describe the characteristics, circumstances and experiences of children in their early years in Scotland and, through its longitudinal design, to generate a better understanding of how children's start in life can shape their longer term prospects and development.

    Since 2005 fieldwork has been undertaken by the Scottish Centre for Social Research. The survey design for Birth Cohort 1 consisted of recruiting the parents of an initial total of 5,217 children aged 10 months old in 2005 and interviewing them annually until their child reached age six. Further fieldwork was then undertaken at ages 8, 10, 12, 14 and 17-18 with a sample boost added at age 12.

    Data for sweeps 1-9 were collected via an in-home, face-to-face interview with self-complete sections. Fieldwork for sweep 10 was disrupted due to the COVID pandemic. As a result, the final portion of the data was collected via web and telephone questionnaires. Sweep 11 data were gathered via web, telephone and face-to-face surveys of cohort members and their parent/carer.

    Further information about the survey may be found on the Growing Up in Scotland website.

    In May 2025, data and documentation for Cohort 1, Sweeps 1-11 were released as individual studies (SNs 9373-9383 and 9386-9387). Previously they were held under one study (SN 5760) which has been withdrawn from the data catalogue.

    SN 9120 - Growing Up in Scotland: Cohort 1, Sweep 8 Physical Activity Data, 2015-2016

    The Studying Physical Activity in Children's Environments across Scotland (SPACES) project aimed to investigate the ways in which the built, natural and social environment influences children's physical activity. The project employed an observational, cross-sectional design that sub-sampled from Birth Cohort 1 (BC1) of the GUS during the GUS Sweep 8 fieldwork. Children sub-sampled from GUS were invited to provide objectively measured physical activity data by wearing an accelerometer for eight days.

    This dataset provides a range of summary physical activity variables from this project. A total of 775 children provided valid data. As a sub-sample of GUS BC1, the summary level physical activity data can be linked, where appropriate, to other GUS BC1 datasets held on UKDS at the individual level. The physical activity data were collected between May 2015 and May 2016 by the MRC/CSO Social and Public Health Sciences Unit (SPHSU), University of Glasgow. To support a range of analytical projects, a series of summary variables have been derived and included in the dataset. These include minutes spent in different categories of physical activity and variables indicating whether the child met the recommended Scottish Government guidelines of 60 minutes of physical activity each day (calculated in two forms: an average of 60 minutes per day over all valid days; a stricter measure of actual 60 minutes per day on each valid day).
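
    The two forms of the guideline indicator described above differ only in how the valid days are combined. A minimal Python sketch of that distinction is given below; the function name, the input layout (one value of activity minutes per valid day) and the example values are assumptions for illustration, not the derivation code used for the dataset.

```python
def meets_guideline(daily_minutes, threshold=60):
    """Return (average_form, strict_form) for one child's valid days.

    average_form: mean minutes per day across valid days reaches the threshold.
    strict_form: every single valid day reaches the threshold.
    """
    if not daily_minutes:
        raise ValueError("at least one valid day is required")
    average_form = sum(daily_minutes) / len(daily_minutes) >= threshold
    strict_form = all(minutes >= threshold for minutes in daily_minutes)
    return average_form, strict_form


# Hypothetical child with four valid days: the mean is 67.5 minutes,
# but one day falls below 60.
print(meets_guideline([75, 90, 40, 65]))  # (True, False)
```

    A child who meets the strict form always meets the average form, but not the other way round.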

  12. PISA2009Test

    • kaggle.com
    zip
    Updated Jul 4, 2019
    Cite
    Vageesha Budanur (2019). PISA2009Test [Dataset]. https://www.kaggle.com/vageeshabudanur/pisa2009test
    Explore at:
    Available download formats: zip (74778 bytes)
    Dataset updated
    Jul 4, 2019
    Authors
    Vageesha Budanur
    Description

    The Programme for International Student Assessment (PISA) is a test given every three years to 15-year-old students from around the world to evaluate their performance in mathematics, reading, and science. This test provides a quantitative way to compare the performance of students from different parts of the world.

    We will predict the reading scores of students from the United States of America on the 2009 PISA exam.

    The datasets pisa2009train.csv and pisa2009test.csv contain information about the demographics and schools for American students taking the exam, derived from 2009 PISA Public-Use Data Files distributed by the United States National Center for Education Statistics (NCES). Each row in the datasets pisa2009train.csv and pisa2009test.csv represents one student taking the exam.

    The datasets have the following variables:

    grade: The grade in school of the student (most 15-year-olds in America are in 10th grade)

    male: Whether the student is male (1/0)

    raceeth: The race/ethnicity composite of the student

    preschool: Whether the student attended preschool (1/0)

    expectBachelors: Whether the student expects to obtain a bachelor's degree (1/0)

    motherHS: Whether the student's mother completed high school (1/0)

    motherBachelors: Whether the student's mother obtained a bachelor's degree (1/0)

    motherWork: Whether the student's mother has part-time or full-time work (1/0)

    fatherHS: Whether the student's father completed high school (1/0)

    fatherBachelors: Whether the student's father obtained a bachelor's degree (1/0)

    fatherWork: Whether the student's father has part-time or full-time work (1/0)

    selfBornUS: Whether the student was born in the United States of America (1/0)

    motherBornUS: Whether the student's mother was born in the United States of America (1/0)

    fatherBornUS: Whether the student's father was born in the United States of America (1/0)

    englishAtHome: Whether the student speaks English at home (1/0)

    computerForSchoolwork: Whether the student has access to a computer for schoolwork (1/0)

    read30MinsADay: Whether the student reads for pleasure for 30 minutes/day (1/0)

    minutesPerWeekEnglish: The number of minutes per week the student spends in English class

    studentsInEnglish: The number of students in this student's English class at school

    schoolHasLibrary: Whether this student's school has a library (1/0)

    publicSchool: Whether this student attends a public school (1/0)

    urban: Whether this student's school is in an urban area (1/0)

    schoolSize: The number of students in this student's school

    readingScore: The student's reading score, on a 1000-point scale
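
    As a very rough baseline for the prediction task described above, one could regress readingScore on a single predictor. The sketch below fits an ordinary least-squares line by hand in Python; the column names come from the variable list, but the rows are made up for illustration and the choice of read30MinsADay as the predictor is an assumption, not a recommendation from the dataset author.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up rows standing in for pisa2009train.csv (NOT real data).
rows = [
    {"read30MinsADay": 0, "readingScore": 400.0},
    {"read30MinsADay": 0, "readingScore": 420.0},
    {"read30MinsADay": 1, "readingScore": 500.0},
    {"read30MinsADay": 1, "readingScore": 520.0},
]
xs = [row["read30MinsADay"] for row in rows]
ys = [row["readingScore"] for row in rows]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 100.0 410.0
```

    With a 1/0 predictor, the intercept is the mean score of the 0 group and the slope is the difference between the two group means.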

  13. ATP Matches from 2001 to 2020

    • kaggle.com
    zip
    Updated Apr 10, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Afaf Athar (2022). ATP Matches from 2001 to 2020 [Dataset]. https://www.kaggle.com/datasets/afafathar3007/dataset
    Explore at:
    Available download formats: zip (3285962 bytes)
    Dataset updated
    Apr 10, 2022
    Authors
    Afaf Athar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    ATP Tennis Rankings, Results, and Stats

    This contains my master ATP player file, historical rankings, results, and match stats.

    The player file columns are player_id, first_name, last_name, hand, birth_date, country_code, and height (cm).

    The columns for the ranking files are ranking_date, ranking, player_id, and ranking_points (where available).

    ATP rankings are mostly complete from 1985 to the present. 1982 is missing, and rankings from 2001-2020 are only intermittent.

    Results and stats: There are up to three files per season: one for tour-level main-draw matches (e.g. 'atp_matches_2014.csv'), one for tour-level qualifying and challenger main-draw matches, and one for futures matches.

    Most of the columns in the results files are self-explanatory. I've also included a matches_data_dictionary.txt file to spell things out a bit more.

    To make the results files easier for more people to use, I've included a fair bit of redundancy with the biographical and ranking files: each row contains several columns of biographical information, along with ranking and ranking points, for both players. Ranking data, as well as age, are as of tourney_date, which is almost always the Monday at or near the beginning of the event.
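    Because age is tied to tourney_date rather than to the day a match was played, it can be recomputed from a player's birth_date and the event's tourney_date. The following is a minimal sketch, assuming both values have already been parsed into Python date objects; the function name age_on is hypothetical and the storage format of the dates in the files is not specified here.

```python
from datetime import date


def age_on(birth_date: date, tourney_date: date) -> int:
    """Return the player's age in whole years as of tourney_date.

    Hypothetical helper: the inputs are assumed to be already-parsed dates.
    """
    years = tourney_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the tourney year.
    if (tourney_date.month, tourney_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


# Illustrative values only, not taken from the dataset.
example_age = age_on(date(1990, 6, 15), date(2014, 1, 13))
```

    If the files store age as a fractional number of years, the same two dates can be subtracted directly and the day count divided by 365.25 for an approximate value.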

    MatchStats are included where I have them. In general, that means 1991-present for tour-level matches, 2008-present for challengers, and 2011-present for tour-level qualifying. The MatchStats columns should be self-explanatory, but they might not be what you're used to seeing; they are all integer totals (e.g. 1st serves in, not 1st serve percentage), from which traditional percentages can be calculated.
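    As an illustration of turning integer totals into a traditional percentage, here is a minimal sketch. The function name and the argument names first_serves_in and serve_points are assumptions made for this example; check matches_data_dictionary.txt for the actual column names.

```python
from typing import Optional


def percentage(part: int, whole: int) -> Optional[float]:
    """Return part / whole as a percentage, or None when whole is zero.

    Returning None avoids a division error for rows with no recorded points.
    """
    if whole == 0:
        return None
    return 100.0 * part / whole


# Illustrative totals only: 42 first serves in out of 70 service points.
first_serves_in = 42
serve_points = 70
first_serve_pct = percentage(first_serves_in, serve_points)
```

    The same helper applies to any pair of totals, for example first-serve points won divided by first serves in.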

    There are some tour-level matches with missing stats. Some are missing because ATP doesn't have them. Others I've deleted because they didn't pass some sanity check (loser won 60% of points, or match time was under 20 minutes, etc). Also, Davis Cup matches are included in the tour-level files, but there are no stats for Davis Cup matches until the last few seasons.
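    A sanity check of the kind described above might look like the sketch below. It is not the author's actual filter: the function name, the argument names, and the exact comparison operators are assumptions, and only the two thresholds (60% of points, 20 minutes) come from the text.

```python
def fails_sanity_check(
    winner_points: int,
    loser_points: int,
    minutes: float,
    max_loser_share: float = 0.60,
    min_minutes: float = 20,
) -> bool:
    """Return True when a match's stats look implausible.

    Assumed rules: the loser won at least max_loser_share of all points,
    or the match lasted less than min_minutes.
    """
    total_points = winner_points + loser_points
    if total_points == 0:
        # Assumption: no recorded points means the stats are unusable.
        return True
    loser_share = loser_points / total_points
    return loser_share >= max_loser_share or minutes < min_minutes


# Illustrative values only: a plausible match and a suspiciously short one.
plausible = fails_sanity_check(winner_points=70, loser_points=50, minutes=95)
too_short = fails_sanity_check(winner_points=30, loser_points=20, minutes=15)
```

    Rows that fail such a check would have their stats columns blanked rather than the whole match removed, which matches the statement that the stats, not the matches, were deleted.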

    Doubles: I've added tour-level doubles back to 2000. Filenames follow the convention atp_matches_doubles_yyyy.csv. I may eventually be able to add tour-level doubles from before 2000, as well as lower-level doubles for some years. Most of the columns are the same, though in a different order.

    Doubles updates are temporarily suspended as of late 2020.
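    The filename conventions can be expanded into a list of expected files per season. This is a minimal sketch: the helper names are hypothetical, the 2000-2020 range is inferred from the statements that doubles go back to 2000 and were suspended in late 2020, and whether every file is present in a given download is not guaranteed.

```python
from typing import List


def singles_filename(year: int) -> str:
    """Tour-level main-draw file, following the 'atp_matches_2014.csv' example."""
    return f"atp_matches_{year:04d}.csv"


def doubles_filename(year: int) -> str:
    """Tour-level doubles file, following atp_matches_doubles_yyyy.csv."""
    return f"atp_matches_doubles_{year:04d}.csv"


def expected_doubles_files(first_year: int = 2000, last_year: int = 2020) -> List[str]:
    """List doubles filenames for an inclusive range of seasons (assumed range)."""
    return [doubles_filename(year) for year in range(first_year, last_year + 1)]


doubles_files = expected_doubles_files()
```

    Comparing this list against the files actually found on disk is a quick way to spot missing seasons before loading anything.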

    Contributing: If you find a bug, please file an issue, and be as specific as possible.

    Feel free to correct bugs or fill in missing data via pull requests, but be aware that I will not merge PRs. But if that's the most convenient way for you to submit improvements to the data, that's fine; I can work with that.

    If you'd like to contribute to the project, I post "help wanted" issues, starting with a plea to fill in biographical data such as date of birth.

    Also, I encourage everyone to pitch in to the Match Charting Project by charting pro matches. It's not a direct contribution to this repo, but it is a great way to improve the existing state of tennis data.

    Attention: Please read, understand, and abide by the license below. It seems like a reasonable thing to ask, given the hundreds of hours I've put into amassing and maintaining this dataset. Unfortunately, a few bad apples have violated the license, and when people do that, it makes me considerably less motivated to continue updating.

    Also, if you're using this for academic/research purposes (great!), take a minute and cite it properly. It's not that hard, it helps others find a useful resource, and let's face it, you should be doing it anyway.

    License: Tennis databases, files, and algorithms by Jeff Sackmann / Tennis Abstract are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Based on a work at https://github.com/JeffSackmann.

