73 datasets found
  1. RACE Dataset

    • paperswithcode.com
    Updated Jan 28, 2021
    Cite
    Guokun Lai; Qizhe Xie; Hanxiao Liu; Yiming Yang; Eduard Hovy (2021). RACE Dataset [Dataset]. https://paperswithcode.com/dataset/race
    Dataset updated
    Jan 28, 2021
    Authors
    Guokun Lai; Qizhe Xie; Hanxiao Liu; Yiming Yang; Eduard Hovy
    Description

    The ReAding Comprehension dataset from Examinations (RACE) is a machine reading comprehension dataset consisting of 27,933 passages and 97,867 questions drawn from English exams for Chinese students aged 12-18. RACE consists of two subsets, RACE-M and RACE-H, from middle school and high school exams, respectively. RACE-M has 28,293 questions and RACE-H has 69,574. Each question has 4 candidate answers, exactly one of which is correct. The data generation process of RACE differs from that of most machine reading comprehension datasets: instead of being generated by heuristics or crowd-sourcing, the questions in RACE are created by domain experts specifically to test human reading skills.
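
    The item structure described above (one passage, one question, four candidate answers, one correct) can be sketched as a small record type with an accuracy helper. This is only an illustrative sketch: the class name, field names and the accuracy function are assumptions made for this example and are not taken from the RACE distribution itself. The two subset sizes are the figures quoted in the description.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class RaceQuestion:
    """One multiple-choice item. Field names are hypothetical, not RACE's own."""
    passage: str
    question: str
    options: Sequence[str]  # the four candidate answers
    answer_index: int       # position (0..3) of the single correct option

    def __post_init__(self) -> None:
        if len(self.options) != 4:
            raise ValueError("expected exactly 4 candidate answers")
        if not 0 <= self.answer_index < 4:
            raise ValueError("answer_index must be in the range 0..3")


def accuracy(items: List[RaceQuestion], predictions: List[int]) -> float:
    """Fraction of items whose predicted option index matches the correct one."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = sum(1 for item, pred in zip(items, predictions)
                  if item.answer_index == pred)
    return correct / len(items)


# Subset sizes quoted in the description; their sum is the total question count.
RACE_M_QUESTIONS = 28_293
RACE_H_QUESTIONS = 69_574
TOTAL_QUESTIONS = RACE_M_QUESTIONS + RACE_H_QUESTIONS
```

    With four options per question, a uniformly random guesser would be expected to score about 25% under this accuracy measure.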

  2. Dataset of book subjects that contain A race of female patriots : women and...

    • workwithdata.com
    Updated Nov 7, 2024
    Cite
    Work With Data (2024). Dataset of book subjects that contain A race of female patriots : women and public spirit on the British stage, 1688-1745 [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=A+race+of+female+patriots+:+women+and+public+spirit+on+the+British+stage%2C+1688-1745&j=1&j0=books
    Dataset updated
    Nov 7, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book subjects. It has 8 rows and is filtered to the book A race of female patriots : women and public spirit on the British stage, 1688-1745. It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.

  3. New Britain, CT Hispanic or Latino Population Distribution by Ancestries...

    • neilsberg.com
    csv, json
    Updated Jul 7, 2024
    Cite
    Neilsberg Research (2024). New Britain, CT Hispanic or Latino Population Distribution by Ancestries Dataset : Detailed Breakdown of Hispanic or Latino Origins // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/6058870f-2314-11ef-bd92-3860777c1fe6/
    Available download formats: json, csv
    Dataset updated
    Jul 7, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Connecticut, New Britain
    Variables measured
    Hispanic or Latino population with Cuban ancestry, Hispanic or Latino population with Mexican ancestry, Hispanic or Latino population with Puerto Rican ancestry, Hispanic or Latino population with Other Hispanic or Latino ancestry, Hispanic or Latino population with Cuban ancestry as Percent of Total Hispanic Population, Hispanic or Latino population with Mexican ancestry as Percent of Total Hispanic Population, Hispanic or Latino population with Puerto Rican ancestry as Percent of Total Hispanic Population, Hispanic or Latino population with Other Hispanic or Latino ancestry as Percent of Total Hispanic Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the two variables, namely (a) Origin / Ancestry for Hispanic population and (b) respective population as a percentage of the total Hispanic population, we initially analyzed and categorized the data for each of the ancestries across the Hispanic or Latino population. It is ensured that the population estimates used in this dataset pertain exclusively to ancestries for the Hispanic or Latino population. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the New Britain Hispanic or Latino population. It includes the distribution of New Britain's Hispanic or Latino population by ancestry, as identified by the Census Bureau. The dataset can be used to understand the origins of the Hispanic or Latino population of New Britain.

    Key observations

    Among the Hispanic population in New Britain, regardless of race, the largest group is of Puerto Rican origin, with a population of 25,104 (76.28% of the total Hispanic population).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Origin for Hispanic or Latino population include:

    • Mexican
    • Puerto Rican
    • Cuban
    • Other Hispanic or Latino

    Variables / Data Columns

    • Origin: This column displays the origin of the Hispanic or Latino population of New Britain.
    • Population: This column shows the population of the specific origin within the Hispanic or Latino population of New Britain.
    • % of Total Hispanic Population: This column displays each Hispanic origin as a percentage of New Britain's total Hispanic or Latino population. Please note that the percentages may not sum to exactly 100% due to rounding.
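
    The percentage column and the rounding caveat can be illustrated with a short sketch. The function names and the toy counts below are assumptions made for this example; the toy counts are deliberately artificial and are not values from the dataset. The only real figures used are the Puerto Rican count and share quoted under "Key observations", from which an approximate total can be back-calculated.

```python
from typing import Dict


def percent_shares(counts: Dict[str, int], ndigits: int = 2) -> Dict[str, float]:
    """Express each count as a percentage of the total, rounded to ndigits."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total population must be positive")
    return {origin: round(100.0 * n / total, ndigits)
            for origin, n in counts.items()}


def implied_total(count: int, percent: float) -> float:
    """Back out an approximate total from one group's count and its share."""
    return count / (percent / 100.0)


# Artificial counts (NOT from the dataset): three equal groups each round to
# 33.33%, so the rounded shares add up to slightly less than 100.
toy_shares = percent_shares({"A": 1, "B": 1, "C": 1})
toy_sum = sum(toy_shares.values())

# Figures quoted in the key observation: 25,104 people, 76.28% of the total.
# Because 76.28% is itself rounded, the result is only approximate.
approx_total = implied_total(25_104, 76.28)
```

    Under these assumptions the back-calculated total Hispanic population comes out at roughly 32,900.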

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for New Britain Population by Race & Ethnicity.

  4. Multi-race Human Body Data | 300,000 ID | Computer Vision Data| Image/Video...

    • datarade.ai
    Updated Mar 16, 2024
    Cite
    Nexdata (2024). Multi-race Human Body Data | 300,000 ID | Computer Vision Data| Image/Video Deep Learning (DL) Data [Dataset]. https://datarade.ai/data-products/nexdata-multi-race-human-body-data-300-000-id-image-vi-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Mar 16, 2024
    Dataset authored and provided by
    Nexdata
    Area covered
    Albania, Armenia, Japan, Latvia, Peru, State of, Macedonia (the former Yugoslav Republic of), Vietnam, El Salvador, Dominican Republic
    Description
    1. Specifications

    Data size : 200,000 ID

    Race distribution : Asians, Caucasians, black people

    Gender distribution : gender balance

    Age distribution : ranging from teenager to the elderly, the middle-aged and young people are the majorities

    Collecting environment : including indoor and outdoor scenes

    Data diversity : different shooting heights, different ages, different light conditions, different collecting environment, clothes in different seasons, multiple human poses

    Device : cameras

    Data format : the data format is .jpg/mp4, the annotation file format is .json, the camera parameter file format is .json, the point cloud file format is .pcd

    Accuracy : pose accuracy exceeds 97%; the accuracy of the gender, race, age, collecting environment and clothes labels also exceeds 97%

    2. About Nexdata

    Nexdata owns off-the-shelf PB-level Large Language Model (LLM) data, 1 million hours of audio data and 800TB of annotated imagery data. These ready-to-go machine learning (ML) data support instant delivery and quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/computervision?source=Datarade

  5. Estimates of the population for the UK, England, Wales, Scotland, and...

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Oct 8, 2024
    Cite
    Office for National Statistics (2024). Estimates of the population for the UK, England, Wales, Scotland, and Northern Ireland [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/populationestimatesforukenglandandwalesscotlandandnorthernireland
    Available download formats: xlsx
    Dataset updated
    Oct 8, 2024
    Dataset provided by
    Office for National Statistics http://www.ons.gov.uk/
    License

    Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Area covered
    Ireland, United Kingdom, England
    Description

    National and subnational mid-year population estimates for the UK and its constituent countries by administrative area, age and sex (including components of population change, median age and population density).

  6. Multi-race Human Face Data | 200,000 ID | Face Recognition Data| Image/Video...

    • datarade.ai
    Updated Dec 22, 2023
    Cite
    Nexdata (2023). Multi-race Human Face Data | 200,000 ID | Face Recognition Data| Image/Video AI Training Data | Biometric AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-multi-race-human-face-data-200-000-id-image-vi-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Dec 22, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    Germany, Bosnia and Herzegovina, Lao People's Democratic Republic, Belarus, Mexico, Chile, Bulgaria, Canada, Cambodia, Iran (Islamic Republic of)
    Description
    1. Specifications

    Product : Biometric Data

    Data size : 200,000 ID

    Race distribution : black people, Caucasian people, brown(Mexican) people, Indian people and Asian people

    Gender distribution : gender balance

    Age distribution : young, midlife and senior

    Collecting environment : including indoor and outdoor scenes

    Data diversity : different face poses, races, ages, light conditions and scenes

    Device : cellphone

    Data format : .jpg/png

    Accuracy : the accuracy of the face pose, race, gender and age labels is more than 97%

    2. About Nexdata

    Nexdata owns off-the-shelf PB-level Large Language Model (LLM) data, 1 million hours of audio data and 800TB of annotated imagery data. These ready-to-go biometric data support instant delivery and quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/computervision?source=Datarade

  7. Young people's earnings progression and geographic mobility, England and...

    • ons.gov.uk
    • cy.ons.gov.uk
    xls
    Updated Oct 23, 2018
    Cite
    Office for National Statistics (2018). Young people's earnings progression and geographic mobility, England and Wales [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/personalandhouseholdfinances/incomeandwealth/datasets/youngpeoplesearningsprogressionandgeographicmobilityenglandandwales
    Available download formats: xls
    Dataset updated
    Oct 23, 2018
    Dataset provided by
    Office for National Statistics http://www.ons.gov.uk/
    License

    Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Area covered
    Wales, England
    Description

    Supporting dataset using data from Census, Pay As You Earn (PAYE) and National Benefits Database. Tables contain data on earnings progression and geographic mobility from tax year ending 2012 to tax year ending 2016, broken down by characteristics such as age, sex, ethnicity, qualification level and local authority. The dataset also includes regression model output tables.

  8. Young people's well-being measures

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Oct 2, 2020
    Cite
    Office for National Statistics (2020). Young people's well-being measures [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/wellbeing/datasets/youngpeopleswellbeingmeasures
    Available download formats: xlsx
    Dataset updated
    Oct 2, 2020
    Dataset provided by
    Office for National Statistics http://www.ons.gov.uk/
    License

    Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Contains all the current domains and measures of national well-being for young people. As well as the latest data for each measure, a time series is presented where available, along with useful links to data sources and other websites that may be of interest.

  9. Ethnic group (England and Wales) 2011

    • statistics.ukdataservice.ac.uk
    csv, zip
    Updated Sep 20, 2022
    Cite
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service. (2022). Ethnic group (England and Wales) 2011 [Dataset]. https://statistics.ukdataservice.ac.uk/dataset/ethnic-group-england-and-wales-2011
    Available download formats: csv, zip
    Dataset updated
    Sep 20, 2022
    Dataset provided by
    Northern Ireland Statistics and Research Agency
    Office for National Statistics http://www.ons.gov.uk/
    UK Data Service https://ukdataservice.ac.uk/
    Authors
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service.
    License

    Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Area covered
    Wales, England
    Description

    Dataset population: Persons

    Ethnic group

    Ethnic group classifies people according to their own perceived ethnic group and cultural background.

    This topic contains ethnic group write-in responses without reference to the five broad ethnic group categories, e.g. all Irish people, irrespective of whether they are White, Mixed/multiple ethnic groups, Asian/Asian British, Black/African/Caribbean/Black British or Other ethnic group, are in the "Irish" response category. This topic was created as part of the commissioned table processing.

  10. Model estimates of deaths involving the coronavirus (COVID-19) by ethnic...

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Oct 16, 2020
    Cite
    Office for National Statistics (2020). Model estimates of deaths involving the coronavirus (COVID-19) by ethnic group for people in private households, England [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/datasets/modelestimatesofdeathsinvolvingthecoronaviruscovid19byethnicgroupforpeopleinprivatehouseholdsengland
    Available download formats: xlsx
    Dataset updated
    Oct 16, 2020
    Dataset provided by
    Office for National Statistics http://www.ons.gov.uk/
    License

    Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Model estimates of deaths involving the coronavirus (COVID-19) by ethnic group for people in private households in England.

  11. 5 measures of social capital by region and urban and rural by ethnicity

    • ons.gov.uk
    • cy.ons.gov.uk
    xls
    Updated May 19, 2016
    Cite
    Office for National Statistics (2016). 5 measures of social capital by region and urban and rural by ethnicity [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/wellbeing/datasets/5measuresofsocialcapitalbyregionandurbanandruralbyethnicity
    Available download formats: xls
    Dataset updated
    May 19, 2016
    Dataset provided by
    Office for National Statistics http://www.ons.gov.uk/
    License

    Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    How people feel about their neighbourhood across the UK. This dataset looks at 5 measures of social capital and shows the differences observed between regions, constituent countries, and urban and rural areas, by ethnicity.

  12. Population estimates time series dataset

    • ons.gov.uk
    • cy.ons.gov.uk
    csv, xlsx
    Updated Oct 8, 2024
    Cite
    Office for National Statistics (2024). Population estimates time series dataset [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/populationestimatestimeseriesdataset
    Available download formats: csv, xlsx
    Dataset updated
    Oct 8, 2024
    Dataset provided by
    Office for National Statistics http://www.ons.gov.uk/
    License

    Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    The mid-year estimates refer to the population on 30 June of the reference year and are produced in line with the standard United Nations (UN) definition for population estimates. They are the official set of population estimates for the UK and its constituent countries, the regions and counties of England, and local authorities and their equivalents.

  13. English Shopping List OCR Image Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). English Shopping List OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/english-shopping-list-ocr-image-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the English Shopping List Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the English language.

    Dataset Content & Diversity:

    Containing more than 2000 images, this English OCR dataset offers a wide distribution of different types of shopping list images. Within this dataset, you'll discover a variety of handwritten text on shopping lists, including sentences, individual item names, quantities, and comments. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.

    To ensure diversity and robustness in training your OCR model, we allow only a limited number (fewer than three) of unique images per handwriting. This ensures we have diverse types of handwriting to train your OCR model on. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image at least 80% of the space contains visible English text.

    The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.

    All these shopping lists were written, and the images captured, by native English speakers to ensure text quality, prevent toxic content, and exclude PII text. We used the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.

    This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of English text recognition models.

    Update & Custom Collection:

    We are committed to continually expanding this dataset by adding more images with the help of our native English crowd community.

    If you require a customized OCR dataset containing shopping list images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.

    Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage this shopping list image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the English language. Your journey to improved language understanding and processing begins here.

  14. Immigration system statistics data tables

    • gov.uk
    Updated May 22, 2025
    Cite
    Home Office (2025). Immigration system statistics data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/immigration-system-statistics-data-tables
    Explore at:
    Dataset updated
    May 22, 2025
    Dataset provided by
    GOV.UK
    Authors
    Home Office
    Description

    List of the data tables as part of the Immigration System Statistics Home Office release. Summary and detailed data tables covering the immigration system, including out-of-country and in-country visas, asylum, detention, and returns.

    If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.

    Accessible file formats

    The Microsoft Excel .xlsx files may not be suitable for users of assistive technology.
    If you use assistive technology (such as a screen reader) and need a version of these documents in a more accessible format, please email MigrationStatsEnquiries@homeoffice.gov.uk
    Please tell us what format you need. It will help us if you say what assistive technology you use.

    Related content

    Immigration system statistics, year ending March 2025
    Immigration system statistics quarterly release
    Immigration system statistics user guide
    Publishing detailed data tables in migration statistics
    Policy and legislative changes affecting migration to the UK: timeline
    Immigration statistics data archives

    Passenger arrivals

    Passenger arrivals summary tables, year ending March 2025 (MS Excel Spreadsheet, 66.5 KB): https://assets.publishing.service.gov.uk/media/68258d71aa3556876875ec80/passenger-arrivals-summary-mar-2025-tables.xlsx

    ‘Passengers refused entry at the border summary tables’ and ‘Passengers refused entry at the border detailed datasets’ have been discontinued. The latest published versions of these tables are from February 2025 and are available in the ‘Passenger refusals – release discontinued’ section. A similar data series, ‘Refused entry at port and subsequently departed’, is available within the Returns detailed and summary tables.

    Electronic travel authorisation

    Electronic travel authorisation detailed datasets, year ending March 2025 (MS Excel Spreadsheet, 56.7 KB): https://assets.publishing.service.gov.uk/media/681e406753add7d476d8187f/electronic-travel-authorisation-datasets-mar-2025.xlsx
    ETA_D01: Applications for electronic travel authorisations, by nationality
    ETA_D02: Outcomes of applications for electronic travel authorisations, by nationality

    Entry clearance visas granted outside the UK

    Entry clearance visas summary tables, year ending March 2025 (MS Excel Spreadsheet, 113 KB): https://assets.publishing.service.gov.uk/media/68247953b296b83ad5262ed7/visas-summary-mar-2025-tables.xlsx

    Entry clearance visa applications and outcomes detailed datasets, year ending March 2025 (MS Excel Spreadsheet, 29.1 MB): https://assets.publishing.service.gov.uk/media/682c4241010c5c28d1c7e820/entry-clearance-visa-outcomes-datasets-mar-2025.xlsx
    Vis_D01: Entry clearance visa applications, by nationality and visa type
    Vis_D02: Outcomes of entry clearance visa applications, by nationality, visa type, and outcome

    Additional dat

  15. Smoking Dataset from UK

    • kaggle.com
    Updated May 10, 2023
    Cite
    Utkarsh Singh (2023). Smoking Dataset from UK [Dataset]. https://www.kaggle.com/datasets/utkarshx27/smoking-dataset-from-uk/code
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 10, 2023
    Dataset provided by
    Kaggle http://kaggle.com/
    Authors
    Utkarsh Singh
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United Kingdom
    Description
    Survey data on smoking habits from the United Kingdom. The data set can be used for analyzing the demographic characteristics of smokers and types of tobacco consumed. A data frame with 1691 observations on the following 12 variables.
    
    Column: Description
    • gender: Gender with levels Female and Male.
    • age: Age.
    • marital_status: Marital status with levels Divorced, Married, Separated, Single and Widowed.
    • highest_qualification: Highest education level with levels A Levels, Degree, GCSE/CSE, GCSE/O Level, Higher/Sub Degree, No Qualification, ONC/BTEC and Other/Sub Degree.
    • nationality: Nationality with levels British, English, Irish, Scottish, Welsh, Other, Refused and Unknown.
    • ethnicity: Ethnicity with levels Asian, Black, Chinese, Mixed, White and Refused Unknown.
    • gross_income: Gross income with levels Under 2,600, 2,600 to 5,200, 5,200 to 10,400, 10,400 to 15,600, 15,600 to 20,800, 20,800 to 28,600, 28,600 to 36,400, Above 36,400, Refused and Unknown.
    • region: Region with levels London, Midlands And East Anglia, Scotland, South East, South West, The North and Wales.
    • smoke: Smoking status with levels No and Yes.
    • amt_weekends: Number of cigarettes smoked per day on weekends.
    • amt_weekdays: Number of cigarettes smoked per day on weekdays.
    • type: Type of cigarettes smoked with levels Packets, Hand-Rolled, Both/Mainly Packets and Both/Mainly Hand-Rolled.
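
    As a rough illustration of the kind of demographic analysis the description mentions, the sketch below computes the share of smokers per level of a chosen column. It assumes the data is available as CSV text whose header uses the column names listed above; the helper name and the four sample rows are invented for this example and are not records from the dataset.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable

# Hypothetical rows for illustration only; they are NOT taken from the dataset.
SAMPLE_CSV = """gender,age,region,smoke
Female,40,London,No
Male,35,Scotland,Yes
Female,52,Wales,Yes
Male,61,London,No
"""


def smoking_rate_by(rows: Iterable[Dict[str, str]], column: str) -> Dict[str, float]:
    """Return the fraction of rows with smoke == 'Yes' for each level of column."""
    totals: Counter = Counter()
    smokers: Counter = Counter()
    for row in rows:
        level = row[column]
        totals[level] += 1
        if row["smoke"] == "Yes":
            smokers[level] += 1
    return {level: smokers[level] / totals[level] for level in totals}


# Parse the sample text into dictionaries keyed by the header names.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates_by_gender = smoking_rate_by(rows, "gender")
rates_by_region = smoking_rate_by(rows, "region")
```

    The same helper could be pointed at any of the categorical columns, such as marital_status or highest_qualification, provided the file really uses those header names.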

    Source

    National STEM Centre, Large Datasets from stats4schools, https://www.stem.org.uk/resources/elibrary/resource/28452/large-datasets-stats4schools.

  16. Perspectives on the adaptations of immigrants in Britain - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Apr 30, 2023
    Cite
    (2023). Perspectives on the adaptations of immigrants in Britain - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/a06d1399-1bd0-52d1-8246-83da21705e4c
    Dataset updated
    Apr 30, 2023
    Area covered
    United Kingdom
    Description

    This data collection stems from work directly arising out of the project 'Unity out of Diversity? Perspectives on the adaptations of immigrants in Britain'. The main aim of the project was to examine perceptions of adaptation in academic, policy, and public spheres. The research generated new data in the form of: (1) focus groups conducted in Manchester and Glasgow between November 2014 and September 2015; (2) interviews with local and national 'policy stakeholders' conducted between January 2015 and September 2016. This data collection provides access to this new data and related documentation. The research also used existing data from various sources: (a) Existing surveys available via the UK Data Service such as: (1) Ethnic Minority British Election Study; (2) Citizenship Survey; and (3) Understanding Society. This data collection provides scripts that show how the data was transformed for analysis. (b) Textual data from journal article abstracts; newspaper articles; and Hansard debates. This data collection provides details of the methodology used to extract such data. (c) Online survey data from a related project funded by the British Academy, where Dr Lessard-Phillips was a co-applicant (PI: Dr Maria Sobolewska). This data collection provides a replication dataset and related documentation.

    The adaptation of immigrants (the immigrants' long-term integration into British society, and British society's response to it) has become an important topic of academic inquiry and debate among policy makers and the general public. Yet there is little systematic research or unified understanding of this process within and across these different arenas. This project aims to investigate the commonalities and differences in the various perceptions and understandings of adaptation and try to reconnect them.

    This will be done by using an original research design that will examine the multidimensionality of immigrant adaptation in British academia (via a meta-analysis of the current literature and quantitative analysis of secondary data). This will be contrasted with the subjective understandings and perceptions of adaptation in Britain among: - policy makers and third-sector stakeholders (via an analysis of policy documents and interviews); - minority and majority groups among the British population (via focus groups). This project will seek active involvement by academic and non-academic audiences. It will provide a thorough and updated understanding of immigrant adaptation and its dimensionality in Britain, reaching beyond academic and policy circles, with the aim of building a solid evidence base for future research and policy.

    The qualitative data was collected via focus groups with members of the public in Manchester and Glasgow (using purposive samples provided by local community organisations), and interviews with policy stakeholders (people working in local and national government and third sector organisations) selected based on their expertise on the topic (either via direct solicitation or adverts about the project).

  17. Returns and detention - Historic datasets

    • gov.uk
    Updated Aug 24, 2023
    Cite
    Home Office (2023). Returns and detention - Historic datasets [Dataset]. https://www.gov.uk/government/statistical-data-sets/returns-and-detention-datasets
    Dataset updated
    Aug 24, 2023
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Home Office
    Description

    This page contains data for the immigration system statistics up to March 2023.

    For current immigration system data, visit ‘Immigration system statistics data tables’.

    Immigration detention

    Immigration detention (MS Excel Spreadsheet, 9.8 MB): https://assets.publishing.service.gov.uk/media/6462567294f6df000cf5ea90/detention-datasets-mar-2023.xlsx
    Det_D01: Number of entries into immigration detention by nationality, age, sex and initial place of detention
    Det_D02: Number of people in immigration detention at the end of each quarter by nationality, age, sex, current place of detention and length of detention
    Det_D03: Number of occurrences of people leaving detention by nationality, age, sex, reason for leaving detention and length of detention
    This is not the latest data

    Returns

    Returns (MS Excel Spreadsheet, 14.4 MB): https://assets.publishing.service.gov.uk/media/646357c494f6df0010f5eb0a/returns-datasets-mar-2023.xlsx
    Ret_D01: Number of returns from the UK, by nationality, age, sex, type of return and return destination group
    Ret_D02: Number of returns from the UK, by type of return and country of destination
    Ret_D03: Number of foreign national offender returns from the UK, by nationality and return destination group
    Ret_D04: Number of foreign national offender returns from the UK, by destination
    This is not the latest data

  18. Country of birth by Ethnic group (England and Wales) 2011

    • statistics.ukdataservice.ac.uk
    csv, zip
    Updated Sep 20, 2022
    Cite
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service. (2022). Country of birth by Ethnic group (England and Wales) 2011 [Dataset]. https://statistics.ukdataservice.ac.uk/dataset/country-birth-ethnic-group-england-and-wales-2011
    Available download formats: csv, zip
    Dataset updated
    Sep 20, 2022
    Dataset provided by
    Northern Ireland Statistics and Research Agency
    Office for National Statistics (http://www.ons.gov.uk/)
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency; UK Data Service.
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    Wales, England
    Description

    Dataset population: Persons

    Country of birth

    Country of birth is the country in which a person was born. This topic records whether or not the person was born in a given country.

    For the full country of birth classification in England and Wales, please see the National Statistics Country Classification.

    Ethnic group

    Ethnic group classifies people according to their own perceived ethnic group and cultural background.

    This topic contains ethnic group write-in responses without reference to the five broad ethnic group categories, e.g. all Irish people, irrespective of whether they are White, Mixed/multiple ethnic groups, Asian/Asian British, Black/African/Caribbean/Black British or Other ethnic group, are in the 'Irish' response category. This topic was created as part of the commissioned table processing.

  19. National DNA Database statistics

    • gov.uk
    • s3.amazonaws.com
    Updated Jul 29, 2025
    Cite
    Home Office (2025). National DNA Database statistics [Dataset]. https://www.gov.uk/government/statistics/national-dna-database-statistics
    Dataset updated
    Jul 29, 2025
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Home Office
    Description

    These statistics include:

    • crime matches
    • subject samples
    • NDNAD breakdown
    • gender
    • ethnic appearance
    • age

    We are currently unable to provide figures on matches made against profiles on the National DNA Database.

    Statistics from Q1 2013 to Q4 2022 to 2023 are available on the National Archives: https://webarchive.nationalarchives.gov.uk/ukgwa/20230502153339/https://www.gov.uk/government/statistics/national-dna-database-statistics

    Figures for Q2 2014 to 2015 are unavailable. This is due to technical issues with the management information system.

  20. English Chain of Thought Prompt & Response Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). English Chain of Thought Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/english-chain-of-thought-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Welcome to the English Chain of Thought prompt-response dataset, a curated collection of 3,000 prompt and response pairs. The dataset is a resource for training large language models (LLMs) to generate well-reasoned answers and minimize inaccuracies. Its primary use is to improve LLMs' reasoning on arithmetic, common-sense, symbolic-reasoning, and other complex problems.

    Dataset Content:

    This COT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the English language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.

    Each prompt is accompanied by a response and a rationale, which give the language model additional information during training. These prompts, completions, and rationales were manually curated by native English speakers, drawing on various sources, including open-source datasets, news articles, websites, and other reliable references.

    Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.

    Prompt Diversity:

    To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.

    These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.

    Response Formats:

    To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationale helps the language model build a reasoning process for complex questions.

    These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.

    Data Format and Annotation Details:

    This fully labeled English Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.
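
    As a rough illustration of the record layout described above, the sketch below validates JSON records against the listed annotation fields and writes them out as CSV. The key names (`id`, `prompt_type`, `rich_text_presence`, and so on) and the sample record are assumptions made up for this example; the listing does not give the exact keys used in the distributed files.

```python
import csv
import io
import json
from collections import Counter

# Hypothetical key names, inferred from the annotation list in the description;
# the real keys in the distributed JSON/CSV files may differ.
FIELDS = [
    "id", "prompt", "prompt_type", "prompt_complexity", "prompt_category",
    "domain", "response", "rationale", "response_type", "rich_text_presence",
]

# Made-up sample record for illustration only; not taken from the dataset.
SAMPLE_JSON = json.dumps([
    {
        "id": "cot-0001",
        "prompt": "A shirt costs 20 and is discounted by 25%. What is the new price?",
        "prompt_type": "instruction",
        "prompt_complexity": "easy",
        "prompt_category": "percentages",
        "domain": "mathematics",
        "response": "15",
        "rationale": "25% of 20 is 5, and 20 - 5 = 15.",
        "response_type": "numerical",
        "rich_text_presence": False,
    }
])


def load_records(json_text):
    """Parse a JSON array of records and check each has every expected field."""
    records = json.loads(json_text)
    for record in records:
        missing = [name for name in FIELDS if name not in record]
        if missing:
            raise ValueError(f"record {record.get('id')!r} is missing {missing}")
    return records


def to_csv(records):
    """Serialise records to CSV text with one column per annotation field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = load_records(SAMPLE_JSON)
# Count prompts per complexity level (easy / medium / hard in the description).
complexity_counts = Counter(r["prompt_complexity"] for r in records)
csv_text = to_csv(records)
```

    Checking for missing fields at load time is a design choice for this sketch: it surfaces a mismatch between the assumed keys and the actual files immediately, before any later processing relies on them.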

    Quality and Accuracy:

    Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.

    The English text is free of spelling and grammatical errors. No copyrighted, toxic, or harmful content was used in the construction of this dataset.

    Continuous Updates and Customization:

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy English Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.
